Parse XML tags using Jsoup in Java


by Abdul Rawoof A R | Updated: Jan 24, 2023


JSoup is a Java library for working with real-world HTML. It offers methods to parse and manipulate HTML text.
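As a minimal sketch of what parsing HTML text looks like, the snippet below reads a small HTML string and extracts the text of its first link. The class name `HtmlParseSketch` and the helper `firstLinkText` are hypothetical names chosen for illustration; only `Jsoup.parse`, `select`, `first`, and `text` come from the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical example class, not part of the original solution.
public class HtmlParseSketch {

    // Returns the text of the first <a> element in the HTML, or "" if there is none.
    static String firstLinkText(String html) {
        Document doc = Jsoup.parse(html);          // default HTML parser
        Element link = doc.select("a").first();    // first() returns null when nothing matches
        return link == null ? "" : link.text();
    }

    public static void main(String[] args) {
        String html = "<p>Visit <a href=\"https://example.com\">the example site</a></p>";
        System.out.println(firstLinkText(html));   // prints "the example site"
    }
}
```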


JSoup can clean HTML documents, extract data from them, and handle other HTML-related tasks. With JSoup, you can parse HTML or XML and query tags using the select function of the Document class. JSoup also offers APIs for editing documents and adding new elements. This makes it an effective tool for Java applications that need to parse, process, or alter XML data. Here are some situations where it might be put to use:


  • Web scraping: Using JSoup, you can extract information from XML documents that you download from the internet, including RSS feeds or API responses. 
  • Data mining: JSoup can extract data for analysis from huge XML collections. 
  • Automated testing of web applications: JSoup can be used to read and validate the HTML or XML produced by a web application. 
  • Processing of XML documents: A Java program can process and change XML documents using JSoup. 
  • Data exchange: In a Java application, you can use JSoup to interpret XML documents that you receive from external systems. 
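To make the feed-style use cases above more concrete, here is a small sketch that pulls item titles out of an RSS-like XML string. The feed content, the class name `FeedTitlesSketch`, and the helper `itemTitles` are assumptions made up for illustration; a real feed would typically be downloaded first and may use different tag names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not part of the original solution.
public class FeedTitlesSketch {

    // Collects the text of every <title> that is a direct child of an <item>.
    static List<String> itemTitles(String xml) {
        // Parser.xmlParser() keeps the input as XML instead of normalising it as HTML.
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        List<String> titles = new ArrayList<>();
        for (Element title : doc.select("item > title")) {
            titles.add(title.text());
        }
        return titles;
    }

    public static void main(String[] args) {
        // Made-up feed used only for this sketch.
        String xml = "<rss><channel><title>Feed</title>"
                + "<item><title>First post</title></item>"
                + "<item><title>Second post</title></item>"
                + "</channel></rss>";
        System.out.println(itemTitles(xml));   // prints [First post, Second post]
    }
}
```

The child combinator in `item > title` skips the channel-level title, so only titles that belong to an item are returned.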


Here is an example of how you can parse XML tags using JSoup in Java for your application: 

Fig 1: Preview of the code snippet, which I copied from kandi.

Fig 2: code snippet continuation.

Fig 3: Preview of the output that you will get on running this code from your IDE.

Code

Using jsoup, your code will look like this:

<!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public class Example {


    public static void main(String[] args) {
        String xml = "<categories>\n"
                + "    <category>abc\n"
                + "        <category>cde\n"
                + "            <item>someid_1</item>\n"
                + "            <item>someid_2</item>\n"
                + "            <item>someid_3</item>\n"
                + "            <item>someid_4</item>\n"
                + "        </category>\n"
                + "    </category>\n"
                + "    <category>xyz\n"
                + "       <category>zwd\n"
                + "          <category>hgw\n"
                + "             <item>someid_5</item>\n"
                + "          </category>\n"
                + "       </category>\n"
                + "    </category>\n"
                + " </categories>";

        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

        //if you are interested in Items only
        Elements items = doc.select("category > item");
        items.forEach(i -> {
            System.out.println("Parent text: " +i.parent().ownText());
            System.out.println("Item text: "+ i.text());
            System.out.println();
        });


        //if you are interested in categories having at least one direct item element
        Elements categories = doc.select("category:has(> item)");
        categories.forEach(c -> {
            System.out.println(c.ownText());
            Elements children = c.children();
            children.forEach(ch -> {
                System.out.println(ch.text());
            });
            System.out.println();
        });
    }
}

Parent text: cde
Item text: someid_1

Parent text: cde
Item text: someid_2

Parent text: cde
Item text: someid_3

Parent text: cde
Item text: someid_4

Parent text: hgw
Item text: someid_5

cde
someid_1
someid_2
someid_3
someid_4

hgw
someid_5
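The output above shows only the immediate parent of each item. If the full chain of nested categories is needed, one possible approach is to walk up from each item with `parent()` and collect the own text of every enclosing category. The sketch below assumes the same tag names as the example; the class name `CategoryPathSketch`, the helper `itemPaths`, and the `/` separator are hypothetical choices, not part of the original solution.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example class, not part of the original solution.
public class CategoryPathSketch {

    // Maps each item's text to the path of category names that enclose it, outermost first.
    static Map<String, String> itemPaths(String xml) {
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        Map<String, String> paths = new LinkedHashMap<>();
        for (Element item : doc.select("item")) {
            Deque<String> names = new ArrayDeque<>();
            // Walk upwards; parent() returns null once the document root is passed.
            for (Element p = item.parent(); p != null; p = p.parent()) {
                if (p.tagName().equals("category")) {
                    names.addFirst(p.ownText());   // prepend so the outermost category comes first
                }
            }
            paths.put(item.text(), String.join("/", names));
        }
        return paths;
    }

    public static void main(String[] args) {
        // Compact version of the nested structure used in the example above.
        String xml = "<categories>"
                + "<category>abc<category>cde<item>someid_1</item></category></category>"
                + "<category>xyz<category>zwd<category>hgw<item>someid_5</item></category></category></category>"
                + "</categories>";
        itemPaths(xml).forEach((item, path) -> System.out.println(item + " -> " + path));
        // prints:
        // someid_1 -> abc/cde
        // someid_5 -> xyz/zwd/hgw
    }
}
```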

Instructions

  1. Copy the code using the "Copy" button above, and paste it into a Java file in your IDE (IntelliJ preferred).
  2. Add the required dependencies and import them in the Java file.
  3. Run the file to generate the output.


I hope you found this useful. I have added links to the dependent libraries and version information in the following sections.


I found this code snippet by searching for 'how to parse nested xml tag with same tag name' in kandi. You can try any such use case!

Environment Tested

I tested this solution in the following IDE and versions. Be mindful of changes when working with other versions.

  1. The solution was created in IntelliJ IDE with Java JDK 11.0.17.
  2. The solution was tested on jsoup version 1.7.2.


Using this solution, we can parse XML tags using jsoup in Java in a few simple steps. It also provides an easy, hassle-free way to create a hands-on working version of code that parses XML tags using jsoup in Java.

Dependent Libraries

jsoup by jhy

Java | Stars: 10188 | Version: jsoup-1.16.1
License: Permissive (MIT)

jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.


You can add the dependent library to your Gradle or Maven files. You can get the dependency XML from the link above.

You can search for any dependent library on kandi, such as jsoup java.

Support

  1. For any support on kandi solution kits, please use the chat.
  2. For further learning resources, visit the Open Weaver Community learning page.
