Learn How to Write a Parser in Java

A parser is an important component in any programming language. Many open source parsers are available, so the developer first has to select the one that fits the requirement. In some situations, no suitable parser is freely available, and the developer has to write a custom parser in a language such as Java or C++. Typical reasons for writing a custom parser include performance problems, excessive complexity, flaws in the parsing of existing tools, or a mismatch with the requirement.

In this article, we will explore how parsing is performed in Java and look at the popular Java parsers.

What is parsing and what is a parser?
Before going into details, we must know the meaning of the terms ‘parsing’ and ‘parser’. Let’s have a look.

In simple words, parsing is the process of breaking a block of data down into smaller pieces according to a predefined set of rules, and then interpreting, modifying, or managing those pieces as required.

A parser is a software program that breaks the data into these smaller chunks. A parser can be written in any language, depending on the requirement.
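To make the definition concrete, here is a minimal sketch of a hand-written text parser. It assumes a hypothetical input format of semicolon-separated `key=value` pairs; the class name `KeyValueParserSketch` and the sample string are illustrative and not part of the listings later in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueParserSketch {

    // Breaks "key=value;key=value" input into pieces using two fixed rules:
    // ';' separates pairs, and the first '=' separates a key from its value.
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : input.split(";")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty pieces, e.g. after a trailing ';'
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in: " + trimmed);
            }
            String key = trimmed.substring(0, eq).trim();
            String value = trimmed.substring(eq + 1).trim();
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample input
        Map<String, String> fields = parse("firstname=Nick; lastname=Doe; salary=90000");
        System.out.println(fields.get("lastname"));
    }
}
```

The rules here are deliberately tiny. A real parser usually needs a more careful grammar, for example to handle escaped separators.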

What are the different types of parsers in Java?
Parsers can be categorized in different ways. In the simplest classification, a parser is either sequential or random (random-access). A sequential parser exposes only the data it is currently parsing; it moves forward through the input and cannot go back. A random parser lets you reach any part of the parsed data, so moving back and forth is possible. SAX and StAX parsers are examples of sequential parsers, and an XML DOM parser is an example of a random parser.
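The forward-only nature of a sequential parser can be illustrated with the StAX API that ships with the JDK. This is a small sketch, not a complete program: the class name `SequentialReadSketch` and the inline XML string are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class SequentialReadSketch {

    // Collects element names in document order using a forward-only cursor.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // next() is the only way to move; the cursor cannot step back
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical sample input
        String xml = "<company><employee><firstname>Nick</firstname></employee></company>";
        System.out.println(elementNames(xml));
    }
}
```

Anything the program wants to keep, such as the list of names here, has to be stored by the program itself, because the parser does not retain earlier data.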

Parsers can also be classified as text parsers or XML parsers. A text parser parses plain textual data, whereas an XML parser parses XML data (other structured formats, such as JSON, have their own dedicated parsers). In our discussion, we will focus on the popular Java DOM and SAX parsers.

DOM parser and SAX parser
DOM (Document Object Model) defines an interface for manipulating XML documents, and XML parsers are written by implementing this interface. DOM parsers are random parsers, which are suitable when:

Information about the structure of the document is important
You need to move back and forth within the structure

A DOM parser provides several Java interfaces and methods for working with XML data. It returns a tree structure containing all the elements of an XML document, and the tree can then be traversed to work with the data.
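As a rough illustration of traversing the tree, the following sketch walks every node recursively and prints an indented outline of the element names. The class name `DomTreeWalkSketch`, its helper methods, and the inline XML string are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomTreeWalkSketch {

    // Appends one indented line per element, then visits the children recursively.
    static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    // Parses the XML into a DOM tree and returns the outline of its elements.
    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input
        String xml = "<company><employee empid=\"3931\">"
            + "<firstname>Kaushik</firstname></employee></company>";
        System.out.print(outline(xml));
    }
}
```

Because the whole tree is held in memory, the walk can visit any node in any order, which is what makes DOM a random parser.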

SAX (Simple API for XML) is a sequential, event-based parser. It parses the XML data sequentially, from the root element to the end of the document. It does not build a tree structure; instead, it sends an event notification as it parses each element. SAX is suitable when:

Linear and sequential processing is required
The XML document is too large to hold in memory
The XML does not have complex nesting
Only part of the XML document needs to be processed

A SAX parser provides interfaces with callback methods that receive event notifications during parsing.
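The callback style can be sketched with the JDK's `DefaultHandler`, which supplies empty implementations of the SAX callbacks so that only the events of interest need to be overridden. The class name `SaxEventsSketch`, the nested `FirstNameHandler`, and the inline XML string are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventsSketch {

    // Receives parser events and collects the text of every <firstname> element.
    static class FirstNameHandler extends DefaultHandler {
        final List<String> firstNames = new ArrayList<>();
        private StringBuilder text; // non-null only while inside <firstname>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("firstname".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) {
                text.append(ch, start, length); // may be called several times per element
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("firstname".equals(qName)) {
                firstNames.add(text.toString().trim());
                text = null;
            }
        }
    }

    // Runs the SAX parser over the XML and returns the collected first names.
    static List<String> firstNames(String xml) {
        try {
            FirstNameHandler handler = new FirstNameHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.firstNames;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input
        String xml = "<company><employee><firstname>Kaushik</firstname></employee>"
            + "<employee><firstname>Nick</firstname></employee></company>";
        System.out.println(firstNames(xml));
    }
}
```

The handler keeps only the state it needs (the current text buffer and the result list), so memory use stays small even when the document is large.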

How to implement a DOM parser in Java?
In this section, we will work with an XML document and a DOM parser. The following sample XML file contains employee data for a company and is the input to the parser. The root element is ‘company’, which is at the top of the document. Below it, ‘employee’ is the next branch element; it contains employee data such as name and salary. Parsing starts from the root element.

Listing 1: Sample XML document for processing

<?xml version="1.0"?>
<company>
   <employee empid="3931">
      <firstname>Kaushik</firstname>
      <lastname>Pal</lastname>
      <nickname>Kaushik</nickname>
      <salary>85000</salary>
   </employee>
   <employee empid="4932">
      <firstname>Thomas</firstname>
      <lastname>saparoff</lastname>
      <nickname>Thomas</nickname>
      <salary>95000</salary>
   </employee>
   <employee empid="5935">
      <firstname>Nick</firstname>
      <lastname>Doe</lastname>
      <nickname>Nick</nickname>
      <salary>90000</salary>
   </employee>
</company>

Now, let us create a Java parser using the DOM parsing model. The program follows these steps to extract the data:

Import all the XML-related packages
Access the input data file and create a document builder
Extract the root element
Create a node list containing the ‘employee’ nodes
Iterate through the node list and extract the values

Listing 2: Implementing DOM parser

//Create a package
package com.eduonix.xml;
//Import all the packages
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

//Write a public parser class
public class TestDomParser {
   public static void main(String[] args){
      try {
         //Access input data file and create document builder
         File inputDataFile = new File("InputData.txt");
         DocumentBuilderFactory dbldrFactory 
            = DocumentBuilderFactory.newInstance();
         DocumentBuilder docBuilder = dbldrFactory.newDocumentBuilder();
         Document docmt = docBuilder.parse(inputDataFile);
         docmt.getDocumentElement().normalize();
         System.out.println("Name of the Root element:" 
            + docmt.getDocumentElement().getNodeName());
         //Create node list
         NodeList ndList = docmt.getElementsByTagName("employee");
         System.out.println("*****************************");
         //Iterate through the node list and extract values
         for (int tempval = 0; tempval < ndList.getLength(); tempval++) {
            Node nd = ndList.item(tempval);
            System.out.println("\nName of the current element: " 
               + nd.getNodeName());
            if (nd.getNodeType() == Node.ELEMENT_NODE) {
               Element elemnt = (Element) nd;
               System.out.println("Employee ID: " 
                  + elemnt.getAttribute("empid"));
               System.out.println("Employee First Name: " 
                  + elemnt
                  .getElementsByTagName("firstname")
                  .item(0)
                  .getTextContent());
               System.out.println("Employee Last Name: " 
                  + elemnt
                  .getElementsByTagName("lastname")
                  .item(0)
                  .getTextContent());
               System.out.println("Employee Nick Name: " 
                  + elemnt
                  .getElementsByTagName("nickname")
                  .item(0)
                  .getTextContent());
               System.out.println("Employee Salary: " 
                  + elemnt
                  .getElementsByTagName("salary")
                  .item(0)
                  .getTextContent());
            }
         }
      } catch (Exception e) {
         //Catch and print exception - if any
         e.printStackTrace();
      }
   }
}

Now compile and run the Java program, keeping the XML document (named InputData.txt in the listing) in the directory from which the program is run. The output of the application is shown below. It lists all the employee data found in the XML file.

Compiling the source code....
$javac com/eduonix/xml/TestDomParser.java 2>&1
Executing the program....
$java -Xmx128M -Xms16M com/eduonix/xml/TestDomParser 
Name of the Root element:company
*****************************

Name of the current element: employee
Employee ID: 3931
Employee First Name: Kaushik
Employee Last Name: Pal
Employee Nick Name: Kaushik
Employee Salary: 85000

Name of the current element: employee
Employee ID: 4932
Employee First Name: Thomas
Employee Last Name: saparoff
Employee Nick Name: Thomas
Employee Salary: 95000

Name of the current element: employee
Employee ID: 5935
Employee First Name: Nick
Employee Last Name: Doe
Employee Nick Name: Nick
Employee Salary: 90000

Parser best practices
For parsers, best practices depend on the situation and the developer’s requirements. A text parser is suitable when you are parsing text input, tokenizing or splitting it, and then making use of the data. XML parsers are suitable when you receive XML data as input. The following are some of the best-practice rules followed in XML parsing.

A DOM parser is the best fit when the number of elements is small (under roughly 1000, as a rule of thumb) and you need to add or delete elements. However, because DOM builds the whole tree in memory before you can start processing, performance and memory use become a concern for large documents. So, when you only need to work with part of a large XML document, DOM is not recommended.
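Adding and deleting elements is where the in-memory tree pays off. The sketch below removes every ‘nickname’ element and appends a new ‘employee’ element. The class name `DomEditSketch`, the employee ID 6001, and the name Maria are hypothetical values chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomEditSketch {

    // Parses an XML string into a DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Deletes all <nickname> elements and adds one new <employee> to the root.
    static void edit(Document doc) {
        NodeList nicknames = doc.getElementsByTagName("nickname");
        // The NodeList is live, so remove from the end to keep the indexes valid
        for (int i = nicknames.getLength() - 1; i >= 0; i--) {
            Node n = nicknames.item(i);
            n.getParentNode().removeChild(n);
        }
        Element employee = doc.createElement("employee");
        employee.setAttribute("empid", "6001"); // hypothetical ID
        Element firstName = doc.createElement("firstname");
        firstName.setTextContent("Maria"); // hypothetical name
        employee.appendChild(firstName);
        doc.getDocumentElement().appendChild(employee);
    }

    public static void main(String[] args) {
        Document doc = parse("<company><employee empid=\"3931\">"
            + "<firstname>Kaushik</firstname><nickname>Kaushik</nickname>"
            + "</employee></company>");
        edit(doc);
        System.out.println("employees: " + doc.getElementsByTagName("employee").getLength());
        System.out.println("nicknames: " + doc.getElementsByTagName("nickname").getLength());
    }
}
```

The changes exist only in memory. Writing them back to a file would require a separate serialization step, which is left out of this sketch.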

SAX is the best fit for large XML files with a linear structure and unique elements. It is lightweight and suitable for parsing shallow XML documents. Because it does not build a tree structure, it uses less memory and generally performs better than a DOM parser.

Conclusion
Parsing is an integral part of any programming language. Java has its own ways of parsing text and XML data. In this article, we first covered parsing as a generic concept and then discussed specific parsers, namely DOM and SAX. In the example section, we implemented a DOM parser in detail. We ended with the best practices commonly followed for parsing.
