How to parse Nested XML with same tag name using Jsoup


by vigneshchennai74 · Updated: Feb 20, 2023



This Java example uses the jsoup library to parse an XML hierarchy in which categories are nested inside other categories under the same tag name, with items at the leaves. Instead of walking the entire document by hand, you select only the category and item elements relevant to your task or analysis.


The code can simplify the parsing and processing of hierarchical data in XML documents. It demonstrates two selector techniques that rely on document structure: the child combinator (category > item), which matches items that are direct children of a category, and the :has pseudo-selector (category:has(> item)), which matches categories that directly contain at least one item. Both techniques apply to other XML documents, making the code a useful starting point for working with XML data in Java.
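
The difference between these selectors is easiest to see on a tiny document. The sketch below is illustrative only: the class name SelectorComparison, the helper count, and the two-level sample XML are assumptions made for this example and are not part of the original solution. It counts how many elements each query matches when only the inner category holds an item.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class SelectorComparison {

    // Hypothetical sample: "outer" nests "inner", and only "inner" has a direct <item> child.
    static final String XML =
            "<categories>"
          + "<category>outer"
          + "<category>inner<item>id_1</item></category>"
          + "</category>"
          + "</categories>";

    // Parses the sample with the XML parser and counts the elements matched by the query.
    static int count(String cssQuery) {
        Document doc = Jsoup.parse(XML, "", Parser.xmlParser());
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Items that are direct children of a category.
        System.out.println(count("category > item"));      // prints 1
        // Categories with an item anywhere below them: both "outer" and "inner".
        System.out.println(count("category:has(item)"));   // prints 2
        // Categories with an item as a direct child: only "inner".
        System.out.println(count("category:has(> item)")); // prints 1
    }
}
```

Without the > inside :has, every ancestor category would match as well, which is usually not what you want when the same tag name is nested.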


The classes org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.parser.Parser, and org.jsoup.select.Elements are part of jsoup, a Java library for working with HTML and XML documents.

  • org.jsoup - The Jsoup class provides static methods for parsing HTML and XML documents. It takes the document's source as input, such as a URL or a string, and returns an org.jsoup.nodes.Document object that represents the parsed document.
  • org.jsoup.nodes - The Document class is an in-memory representation of an HTML or XML document. It provides methods for querying and manipulating the document, such as selecting elements by tag name, attribute value, or CSS selector.
  • org.jsoup.parser - The Parser class provides the parsers used to read a document. The HTML parser is the default; Parser.xmlParser() returns an XML parser for documents that require XML parsing rules.
  • org.jsoup.select - The Elements class is a list of elements, typically the result of a CSS selector query. It provides methods for iterating over the selected elements and for operating on them, such as getting their text content, their attributes, or their HTML representation.
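
As a rough illustration of why the XML parser matters here, the following sketch parses the same string with both parsers. The class name ParserComparison, its helper methods, and the one-line sample are assumptions made for this example. The default HTML parser wraps the input in html, head, and body elements, while the XML parser keeps the tree as written.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class ParserComparison {

    // Hypothetical one-category sample used only for this comparison.
    static final String XML = "<category>abc<item>someid_1</item></category>";

    // Default HTML parser: the input ends up inside a generated <body> element.
    static Document parseAsHtml() {
        return Jsoup.parse(XML);
    }

    // XML parser: no wrapper elements are added around the input.
    static Document parseAsXml() {
        return Jsoup.parse(XML, "", Parser.xmlParser());
    }

    public static void main(String[] args) {
        System.out.println(parseAsHtml().select("body").size());           // prints 1
        System.out.println(parseAsXml().select("body").size());            // prints 0
        System.out.println(parseAsXml().select("category > item").text()); // prints someid_1
    }
}
```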


Using jsoup to parse XML hierarchies of categories and items in this way can be helpful in any application that has to process nested XML data.


Code

This solution uses the jsoup library. Add it to your project with the following Maven dependency:

<!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public class Example {


    public static void main(String[] args) {
        String xml = "<categories>\n"
                + "    <category>abc\n"
                + "        <category>cde\n"
                + "            <item>someid_1</item>\n"
                + "            <item>someid_2</item>\n"
                + "            <item>someid_3</item>\n"
                + "            <item>someid_4</item>\n"
                + "        </category>\n"
                + "    </category>\n"
                + "    <category>xyz\n"
                + "       <category>zwd\n"
                + "          <category>hgw\n"
                + "             <item>someid_5</item>\n"
                + "          </category>\n"
                + "       </category>\n"
                + "    </category>\n"
                + " </categories>";

        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

        //if you are interested in Items only
        Elements items = doc.select("category > item");
        items.forEach(i -> {
            System.out.println("Parent text: " +i.parent().ownText());
            System.out.println("Item text: "+ i.text());
            System.out.println();
        });


        //if you are interested in categories having at least one direct item element
        Elements categories = doc.select("category:has(> item)");
        categories.forEach(c -> {
            System.out.println(c.ownText());
            Elements children = c.children();
            children.forEach(ch -> {
                System.out.println(ch.text());
            });
            System.out.println();
        });
    }
}

Output

Parent text: cde
Item text: someid_1

Parent text: cde
Item text: someid_2

Parent text: cde
Item text: someid_3

Parent text: cde
Item text: someid_4

Parent text: hgw
Item text: someid_5

cde
someid_1
someid_2
someid_3
someid_4

hgw
someid_5

  1. Copy the code above and paste it into your Java IDE.
  2. Add the jsoup library to your project, for example with the Maven dependency shown above.
  3. Run the file to get the output.
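
Once the example runs, a natural next step with same-named nested tags is to recover the full chain of categories above each item. The sketch below is one possible way to do that, not part of the original solution: the class CategoryPath and its methods pathOf and paths are hypothetical names, and the "/" separator is an arbitrary choice. It relies on Element.parents(), which lists ancestors from the closest outward.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class CategoryPath {

    // Builds a path such as "xyz/zwd/hgw" from the <category> ancestors of an item.
    static String pathOf(Element item) {
        Deque<String> names = new ArrayDeque<>();
        // Ancestors arrive closest-first, so each name is pushed to the front.
        for (Element ancestor : item.parents()) {
            if (ancestor.tagName().equals("category")) {
                names.addFirst(ancestor.ownText());
            }
        }
        return String.join("/", names);
    }

    // Returns one "path -> item" line per item that is a direct child of a category.
    static List<String> paths(String xml) {
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        List<String> lines = new ArrayList<>();
        for (Element item : doc.select("category > item")) {
            lines.add(pathOf(item) + " -> " + item.text());
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<categories>"
                + "<category>xyz<category>zwd<category>hgw"
                + "<item>someid_5</item>"
                + "</category></category></category>"
                + "</categories>";
        paths(xml).forEach(System.out::println); // prints xyz/zwd/hgw -> someid_5
    }
}
```

Filtering on the tag name keeps the root categories element out of the path, since it is an ancestor of every item but not a category itself.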


I hope you found this useful. Links to the dependent library and version information are in the following sections.


I found this code snippet by searching for "How to parse xml tag with same tag Name" on kandi. You can try any such use case!

Environment Tested

I tested this solution with the following versions. Be mindful of changes when working with other versions.


  1. The solution was created and executed with Java version "1.8.0_251".
  2. The solution was tested with jsoup library version "1.13.1".


This solution parses nested XML with the same tag name using jsoup in Java in a few simple steps, and gives you a working version of the code to start from.

Dependent Library

jsoup by jhy

Java · 10188 stars · Version: jsoup-1.16.1
License: Permissive (MIT)

jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.


If you do not have jsoup, which is required to run this code, you can add it by following the link above and copying the dependency declaration (for example, the Maven snippet shown in the Code section) from the jsoup page on kandi. You can search for any dependent library, like jsoup, on kandi.

Support

  1. For any support on kandi solution kits, please use the chat.
  2. For further learning resources, visit the Open Weaver Community learning page.
