Parse XML tags using Jsoup in Java


by Abdul Rawoof A R · Updated: Jan 25, 2024


Jsoup is a Java library for working with real-world HTML. It provides an API for parsing and manipulating HTML, and it can clean HTML documents.

It can extract data from HTML and perform other HTML-related tasks. With Jsoup, you can parse HTML or XML tags and pick out elements using the select method of the Document class. Jsoup also offers APIs for editing documents and adding new elements. This makes it a versatile tool for Java applications that must process, change, or parse XML data.

Here are a few situations in which you might use it:

  • Web scraping: Jsoup lets you pull data from XML documents downloaded from the internet, such as RSS feeds or API responses.
  • Data mining: Jsoup can extract data for analysis from large XML collections.
  • Automated testing: You can use Jsoup to read and verify the HTML produced by a web application.
  • Document processing: A Java program can process XML documents and modify them using Jsoup.
  • Integration: You can use Jsoup in a Java application to interpret XML documents received from other systems.
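As an illustration of the "process and change" use case, here is a minimal sketch that parses a small XML string, updates one element, and appends another. The order/status/items document and the markProcessed helper are hypothetical names made up for this example, and selectFirst assumes a reasonably recent Jsoup release (the 1.13.1 dependency listed in the Code section should do).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class XmlEditSketch {

    // Hypothetical helper: sets <status> to "processed" and appends one <item>.
    static Document markProcessed(String xml) {
        // Parser.xmlParser() keeps Jsoup from applying HTML rules to the input.
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

        Element status = doc.selectFirst("order > status");
        if (status != null) {
            status.text("processed"); // replace the element's text content
        }

        Element items = doc.selectFirst("order > items");
        if (items != null) {
            items.appendElement("item").text("gift-wrap"); // add a new child element
        }
        return doc;
    }

    public static void main(String[] args) {
        // Made-up sample document, for illustration only.
        String xml = "<order><status>new</status><items><item>book</item></items></order>";
        Document doc = markProcessed(xml);
        System.out.println(doc.outerHtml());
    }
}
```

The null checks are there because selectFirst returns null when nothing matches, which is easy to hit with XML from other systems.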


Here is an example of how you can parse XML tags using JSoup in Java for your application: 

Fig 1: Preview of the code snippet, which I copied from kandi.

Fig 2: Continuation of the code snippet.

Fig 3: Preview of the output that you will get when you run this code from your IDE.

Code

Using Jsoup, add the dependency below to your Maven pom.xml; your code will then look like this:

<!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public class Example {


    public static void main(String[] args) {
        String xml = "<categories>\n"
                + "    <category>abc\n"
                + "        <category>cde\n"
                + "            <item>someid_1</item>\n"
                + "            <item>someid_2</item>\n"
                + "            <item>someid_3</item>\n"
                + "            <item>someid_4</item>\n"
                + "        </category>\n"
                + "    </category>\n"
                + "    <category>xyz\n"
                + "       <category>zwd\n"
                + "          <category>hgw\n"
                + "             <item>someid_5</item>\n"
                + "          </category>\n"
                + "       </category>\n"
                + "    </category>\n"
                + " </categories>";

        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

        //if you are interested in Items only
        Elements items = doc.select("category > item");
        items.forEach(i -> {
            System.out.println("Parent text: " +i.parent().ownText());
            System.out.println("Item text: "+ i.text());
            System.out.println();
        });


        //if you are interested in categories having at least one direct item element
        Elements categories = doc.select("category:has(> item)");
        categories.forEach(c -> {
            System.out.println(c.ownText());
            Elements children = c.children();
            children.forEach(ch -> {
                System.out.println(ch.text());
            });
            System.out.println();
        });
    }
}

Output:

Parent text: cde
Item text: someid_1

Parent text: cde
Item text: someid_2

Parent text: cde
Item text: someid_3

Parent text: cde
Item text: someid_4

Parent text: hgw
Item text: someid_5

cde
someid_1
someid_2
someid_3
someid_4

hgw
someid_5

Instructions

  1. Copy the code using the "Copy" button above and paste it into a Java file in your IDE (IntelliJ preferred).
  2. Add the required dependencies and import them in the Java file.
  3. Run the file to generate the output.


I hope you found this useful. The following sections link to the dependent libraries and give version information.


I found this code snippet by searching for 'how to parse nested xml tag with same tag name' in kandi. You can try any such use case!

Environment Tested

I tested this solution in the following IDE and versions. Be mindful of changes when working with other versions.

  1. The solution was created in the IntelliJ IDE with Java JDK 11.0.17.
  2. The solution was tested on jsoup version 1.7.2.


Using this solution, we can parse XML tags using Jsoup in Java in a few simple steps. It also provides an easy, hassle-free way to create a hands-on working version of the code.

Dependent Libraries

jsoup by jhy

Java · Stars: 10188 · Version: jsoup-1.16.1
License: Permissive (MIT)

jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.


You can add the dependent library to your Gradle or Maven build files. The dependency XML is available from the library link above.

You can search for any dependent library on kandi, for example "jsoup java".

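FAQ

1. How to parse XML using Jsoup in Java?

To parse XML with Jsoup in Java, call Jsoup.parse() with the XML string, a base URI (an empty string is fine), and Parser.xmlParser(). Without the XML parser argument, Jsoup treats the input as HTML.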
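To make this concrete, here is a minimal sketch that reads titles and links from an RSS-like string. The feed content and the titlesAndLinks helper are hypothetical; only Jsoup.parse, Parser.xmlParser, select, and text come from Jsoup itself. RSS is a useful test case because link is a void element in HTML, so the HTML parser would not keep the URL inside it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.List;

public class RssTitlesSketch {

    // Hypothetical helper: returns "title -> link" for each <item> in the feed.
    static List<String> titlesAndLinks(String rss) {
        // The third argument selects the XML parser instead of the default HTML parser.
        Document doc = Jsoup.parse(rss, "", Parser.xmlParser());

        List<String> out = new ArrayList<>();
        for (Element item : doc.select("channel > item")) {
            String title = item.select("title").text();
            String link = item.select("link").text();
            out.add(title + " -> " + link);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up feed, for illustration only.
        String rss = "<rss><channel><title>Feed</title>"
                + "<item><title>First post</title><link>https://example.com/1</link></item>"
                + "</channel></rss>";
        titlesAndLinks(rss).forEach(System.out::println);
        // prints: First post -> https://example.com/1
    }
}
```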


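2. Can Jsoup handle large XML files?

Jsoup is designed primarily for HTML, and Jsoup.parse() builds the whole document tree in memory. For large XML files, consider a streaming XML parser such as SAX or StAX, or a binding framework such as JAXB.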
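For comparison, here is a rough sketch of the streaming approach using StAX (javax.xml.stream, part of the JDK). It is not Jsoup code; the readItems helper and the item element name are assumptions borrowed from the example above. For brevity it collects the values into a list, whereas a real large-file job would more likely handle each value as it is read.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class StaxItemsSketch {

    // Hypothetical helper: streams through the input and collects the text of every <item>.
    static List<String> readItems(Reader in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Turn off DTDs and external entities, a common precaution for untrusted input.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(in);
        List<String> items = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // Only start tags named "item" are of interest; other events are skipped.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "item".equals(reader.getLocalName())) {
                    items.add(reader.getElementText()); // reads up to the matching end tag
                }
            }
        } finally {
            reader.close();
        }
        return items;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<categories><category>cde"
                + "<item>someid_1</item><item>someid_2</item>"
                + "</category></categories>";
        System.out.println(readItems(new StringReader(xml)));
        // prints: [someid_1, someid_2]
    }
}
```

Because the reader only moves forward, memory use stays roughly constant regardless of file size, at the cost of losing the selector-style queries Jsoup offers.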


3. How to select XML elements with Jsoup selectors?

Pass a CSS-like selector to the select() method of the parsed Document (or of any Element). The selector targets specific XML elements, which you can then read or modify.
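As a small sketch of what such selectors can look like, the example below combines an attribute selector with a child combinator. The library/book/title document and the englishTitles helper are hypothetical and exist only for this illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.List;

public class SelectorSketch {

    // Hypothetical helper: titles of all <book> elements whose lang attribute is "en".
    static List<String> englishTitles(String xml) {
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

        List<String> titles = new ArrayList<>();
        // [lang=en] filters on the attribute value; "> title" takes direct children only.
        for (Element title : doc.select("book[lang=en] > title")) {
            titles.add(title.text());
        }
        return titles;
    }

    public static void main(String[] args) {
        // Made-up sample document, for illustration only.
        String xml = "<library>"
                + "<book lang=\"en\"><title>Dune</title></book>"
                + "<book lang=\"fr\"><title>Candide</title></book>"
                + "<book lang=\"en\"><title>Emma</title></book>"
                + "</library>";
        System.out.println(englishTitles(xml));
        // prints: [Dune, Emma]
    }
}
```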


4. Is Jsoup suitable for parsing complex XML structures?

The creators of Jsoup designed it for HTML parsing. For complex XML structures, consider using standard XML APIs such as DOM or StAX.


5. How to extract data from XML attributes using Jsoup?

Use Jsoup's Element.attr() method to retrieve values from XML attributes.
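A minimal sketch of attribute extraction follows. The catalog/product document and the pricesById helper are hypothetical. Note that attr() returns an empty string, not null, when the attribute is absent.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

import java.util.LinkedHashMap;
import java.util.Map;

public class AttrSketch {

    // Hypothetical helper: maps each product's id attribute to its price attribute.
    static Map<String, String> pricesById(String xml) {
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

        Map<String, String> prices = new LinkedHashMap<>();
        // product[id] skips products that have no id attribute at all.
        for (Element product : doc.select("product[id]")) {
            // attr() yields "" when the attribute is missing.
            prices.put(product.attr("id"), product.attr("price"));
        }
        return prices;
    }

    public static void main(String[] args) {
        // Made-up sample document, for illustration only.
        String xml = "<catalog>"
                + "<product id=\"p1\" price=\"9.99\"/>"
                + "<product id=\"p2\"/>"
                + "<product price=\"1.00\"/>"
                + "</catalog>";
        pricesById(xml).forEach((id, price) ->
                System.out.println(id + " costs '" + price + "'"));
    }
}
```

If you need to tell a missing attribute apart from an empty one, Element.hasAttr() can be checked before calling attr().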

Support

  1. For any support on kandi solution kits, please use the chat.
  2. For further learning resources, visit the Open Weaver Community learning page.
