Java Language – 175 – XML Parsing (SAX, DOM, StAX)

XML (eXtensible Markup Language) is a widely used format for representing structured data. Java ships with three standard APIs for parsing it: SAX (Simple API for XML), DOM (Document Object Model), and StAX (Streaming API for XML). In this article, we will look at how each one works, how they differ, and how to use them, with a code example for each.

1. Introduction to XML Parsing

XML parsing involves reading an XML document and extracting data from it. It is a fundamental task when working with XML data, as it allows you to access and manipulate the information within the XML document.

2. SAX (Simple API for XML) Parsing

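SAX parsing is an event-driven (push) approach to XML parsing. The parser reads the XML document sequentially and invokes callbacks as it encounters element start tags (together with their attributes), character content, and element end tags. Applications register a handler to respond to these events. Because the parser does not keep the document in memory, the handler has to track any state it needs itself.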

2.1. SAX Example

Let’s see an example of SAX parsing in Java:

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxParserExample {
    public static void main(String[] args) {
        try {
            // XMLReaderFactory is deprecated since Java 9; SAXParserFactory is the supported entry point.
            SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();

            DefaultHandler handler = new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
                    System.out.println("Start Element: " + qName);
                }

                @Override
                public void characters(char[] ch, int start, int length) throws SAXException {
                    // Skip whitespace-only chunks, such as the indentation between elements.
                    String text = new String(ch, start, length).trim();
                    if (!text.isEmpty()) {
                        System.out.println("Text: " + text);
                    }
                }
            };

            // The parser reads the file once and pushes events to the handler.
            saxParser.parse(new File("example.xml"), handler);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

In this example, we use the SAX API to parse an XML file and print the start elements and text content of the XML document.
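One detail worth illustrating: a SAX parser is allowed to deliver the text of a single element in several characters() calls, so printing each call on its own can split a value. A common way around this is to buffer text between startElement and endElement. The following is a minimal sketch of that idea, not a definitive implementation. The class name, the collectTitles helper, and the inline sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxTitleCollector {
    // Hypothetical helper: returns the text of every <title> element in the document.
    static List<String> collectTitles(String xml) throws Exception {
        List<String> titles = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inTitle;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if ("title".equals(qName)) {
                    inTitle = true;
                    text.setLength(0); // start a fresh buffer for this element
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called more than once per element, so append instead of printing.
                if (inTitle) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    titles.add(text.toString().trim());
                    inTitle = false;
                }
            }
        });
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // Sample document assumed for illustration only.
        String xml = "<library><book><title>Dune</title></book>"
                   + "<book><title>Emma</title></book></library>";
        System.out.println(collectTitles(xml)); // prints [Dune, Emma]
    }
}
```

The handler keeps its own state (the inTitle flag and the buffer) because SAX itself remembers nothing about where it is in the document.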

3. DOM (Document Object Model) Parsing

DOM parsing loads the entire XML document into memory as a tree of nodes. It allows random access to elements, attributes, and content, making it suitable for both reading and modifying XML data. The trade-off is memory use, which grows with the size of the document.

3.1. DOM Example

Here’s an example of DOM parsing in Java:

import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.File;

public class DomParserExample {
    public static void main(String[] args) {
        try {
            // Parse the whole file into an in-memory Document tree.
            File xmlFile = new File("example.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(xmlFile);

            // Merge adjacent text nodes and drop empty ones.
            doc.getDocumentElement().normalize();

            System.out.println("Root element: " + doc.getDocumentElement().getNodeName());

            // Find every <book> element anywhere in the document.
            NodeList nodeList = doc.getElementsByTagName("book");

            for (int i = 0; i < nodeList.getLength(); i++) {
                Node node = nodeList.item(i);
                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    Element element = (Element) node;
                    // item(0) is null if a book has no <title> or <author> child,
                    // so this assumes every book contains both.
                    System.out.println("Title: " + element.getElementsByTagName("title").item(0).getTextContent());
                    System.out.println("Author: " + element.getElementsByTagName("author").item(0).getTextContent());
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

This code reads an XML document, navigates the DOM tree, and extracts data from it, including book titles and authors.
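Since the DOM tree lives in memory, it can also be changed and written back out. The sketch below shows one way to do that: it appends a new book element to the tree and serializes the result with the standard javax.xml.transform API. The class name, the addBook helper, and the sample data are assumptions made for illustration, and the exact whitespace of the serialized output can vary between implementations.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomWriteExample {
    // Hypothetical helper: adds a <book> with <title> and <author> under the root element
    // and returns the modified document as a string.
    static String addBook(String xml, String title, String author) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Build the new subtree; nodes must be created by the owning Document.
        Element book = doc.createElement("book");
        Element titleElement = doc.createElement("title");
        titleElement.setTextContent(title);
        Element authorElement = doc.createElement("author");
        authorElement.setTextContent(author);
        book.appendChild(titleElement);
        book.appendChild(authorElement);
        doc.getDocumentElement().appendChild(book);

        // An identity Transformer copies the DOM tree to the output unchanged.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample document assumed for illustration only.
        String xml = "<library><book><title>Dune</title><author>Frank Herbert</author></book></library>";
        System.out.println(addBook(xml, "Emma", "Jane Austen"));
    }
}
```

Neither SAX nor the StAX reader offers an equivalent of this, because they never hold the document as a whole.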

4. StAX (Streaming API for XML) Parsing

StAX parsing is a stream-oriented approach to XML parsing. It provides a pull-style API: the application asks the parser for the next event when it is ready for it, rather than receiving callbacks as in SAX. Like SAX, StAX reads forward only and does not hold the whole document in memory, which makes it suitable for processing large XML documents.

4.1. StAX Example

Here’s an example of StAX parsing in Java:

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;
import java.io.InputStream;

public class StaxParserExample {
    public static void main(String[] args) {
        // try-with-resources closes the file; XMLStreamReader.close() does not close the underlying stream.
        try (InputStream in = new FileInputStream("example.xml")) {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader = factory.createXMLStreamReader(in);

            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // getElementText() reads all text up to the matching end tag, so it also
                    // works for empty elements. It expects the element to contain text only.
                    if ("title".equals(reader.getLocalName())) {
                        System.out.println("Title: " + reader.getElementText());
                    } else if ("author".equals(reader.getLocalName())) {
                        System.out.println("Author: " + reader.getElementText());
                    }
                }
            }
            reader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

This code uses StAX to read the XML document and selectively extract title and author information.
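Because the application drives the loop, a StAX reader can also stop as soon as it has what it needs, leaving the rest of the document unread. The following sketch looks up the author of one book and returns immediately once it is found. The class name, the findAuthor helper, and the sample data are assumptions made for illustration, and the code assumes that each title element appears before the author element of the same book.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxFindAuthor {
    // Hypothetical helper: returns the author of the first book with the given title,
    // or null if no such book exists.
    static String findAuthor(String xml, String wantedTitle) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            boolean matched = false; // true while we are inside the book we are looking for
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // only start tags are interesting here
                }
                String name = reader.getLocalName();
                if ("title".equals(name)) {
                    matched = wantedTitle.equals(reader.getElementText().trim());
                } else if ("author".equals(name) && matched) {
                    // Found it: return without pulling any further events.
                    return reader.getElementText().trim();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Sample document assumed for illustration only.
        String xml = "<library>"
                   + "<book><title>Dune</title><author>Frank Herbert</author></book>"
                   + "<book><title>Emma</title><author>Jane Austen</author></book>"
                   + "</library>";
        System.out.println(findAuthor(xml, "Emma")); // prints Jane Austen
    }
}
```

With SAX, the same early exit usually means throwing an exception from the handler to abort the parse, which is one reason pull parsing can be easier to read for lookups like this.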

5. Conclusion

XML parsing is a critical part of working with XML data in Java. The choice of SAX, DOM, or StAX depends on the specific requirements of your application. SAX and StAX both stream through the document without loading it into memory, which makes them efficient for large documents: SAX pushes events to your handler, while StAX lets your code pull events when it wants them. DOM holds the whole document in memory, which costs more but gives you random access and the ability to modify the tree. Understanding these trade-offs enables you to work effectively with XML data in Java applications.