How to Split a Big XML File Using a Java Application


In this tutorial I explain how to split a big XML file into multiple XML files based on a specified element (node). The example uses the following employee XML file, which contains a name, personal details, work details, a hire date and project details. The file is split on whichever child element is passed to the code.
Here is the sample XML file:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<employee>
 <name>
  <lastname>Kelly</lastname>
  <firstname>Grace</firstname>
 </name>
 <personaldetails>
  <email>test@gmail.com</email>
  <phone>999-999-9999</phone>
  <address>LIGH-1031</address>
  <city>Kukatpally</city>
  <state>Andhra Pradesh</state>
  <zipcode>500072</zipcode>
 </personaldetails>
 <workdetails>
  <email>test@codesuggestions.com</email>
  <phone>040-66287799</phone>
  <address>Patrikanagar</address>
  <city>Madhapur</city>
  <state>Andhra Pradesh</state>
  <zipcode>500081</zipcode>
 </workdetails>
 <hiredate>October 15, 2005</hiredate>
 <project>
  <product>Printer</product>
  <id>111</id>
  <price>$111.00</price>
 </project>
</employee>  

To split the above file into multiple XML files, i.e. a personaldetails XML, a project XML and a workdetails XML, we can use an API called VTD-XML.
http://vtd-xml.sourceforge.net/codeSample/cs1.html

http://ximpleware.wordpress.com
To run this application you need to download the following jar files:
  1. commons-io-2.4.jar
  2. vtd-xml.jar
  3. An Apache Commons Lang jar (it provides the ExceptionUtils class used in the complete code below)
Here are the steps:

STEP-1

The following VTD-XML classes are imported:
import com.ximpleware.AutoPilot;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;

STEP-2

The next step is to read emp.xml into memory. Note that the following code snippet throws java.io.IOException. We use the FileUtils API from Commons IO to read the XML file into a string.

String xml = FileUtils.readFileToString(new File("D:\\emp.xml"));
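
FileUtils.readFileToString(File) decodes the file with the platform default charset, while the sample file declares ISO-8859-1. The following is a minimal sketch, using only the JDK, of reading the file with an explicit charset instead. The class name XmlFileReader and the method readXml are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class XmlFileReader {

    // Reads the whole file and decodes it with the given charset.
    static String readXml(Path file, Charset charset) throws IOException {
        byte[] raw = Files.readAllBytes(file);
        return new String(raw, charset);
    }

    public static void main(String[] args) throws IOException {
        // The default path is the one used in this tutorial; pass another as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "D:\\emp.xml");
        String xml = readXml(file, StandardCharsets.ISO_8859_1);
        System.out.println(xml.length() + " characters read");
    }
}
```

For a file that contains only ASCII characters, such as the sample above, both approaches should give the same string.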

STEP-3

The next step is to convert the string to a byte buffer and parse it using the "parse(...)" method of VTDGen, which throws com.ximpleware.ParseException.

VTDGen vg = new VTDGen();
vg.setDoc(xml.getBytes()); // xml is the string read in STEP-2
vg.parse(true); // set namespace awareness to true

STEP-4

Next, retrieve the VTDNav object from VTDGen, instantiate the AutoPilot object and select the XPath expression. The "selectXPath(...)" call throws com.ximpleware.XPathParseException.

VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("/" + rootNode + "/" + splitNode);

Here rootNode is the root node of the XML document and splitNode is the node that we want to split out into a separate XML file.
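
To see which nodes an expression of this form selects before running the VTD-XML code, the JDK's built-in DOM-based XPath engine (javax.xml.xpath) can be used as a quick check. This is a sketch with a different XPath engine, not VTD-XML itself. XPathPreview, buildPath and countMatches are hypothetical names, and the short XML string is made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathPreview {

    // Builds the same expression as the tutorial: "/" + rootNode + "/" + splitNode.
    static String buildPath(String rootNode, String splitNode) {
        return "/" + rootNode + "/" + splitNode;
    }

    // Counts the nodes the expression selects, using the JDK's XPath engine.
    static int countMatches(String xml, String path) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                path, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<employee><name/><workdetails/><project/><project/></employee>";
        System.out.println(countMatches(xml, buildPath("employee", "workdetails"))); // prints 1
        System.out.println(countMatches(xml, buildPath("employee", "project")));     // prints 2
    }
}
```

The expression matches only direct children of the root element, and it can match more than one node when the split element is repeated.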

STEP-5

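The last step is to iterate over the matching nodes. The basic idea is that each call to the "evalXPath()" method of AutoPilot returns the VTD index of the next matching node (or -1 when there are no more matches) and automatically moves the cursor to that node. This part of the code throws com.ximpleware.NavException and com.ximpleware.XPathEvalException.

byte[] ba = vn.getXML().getBytes(); // the bytes of the parsed document
ByteArrayOutputStream out = new ByteArrayOutputStream();
while ((ap.evalXPath()) != -1) {
      long l = vn.getElementFragment();
      out.write(ba, (int) l, (int) (l >> 32)); // offset, length
}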
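
The long returned by getElementFragment() packs two values: the fragment's byte offset in the lower 32 bits and its length in the upper 32 bits, which is why the loop casts l and l >> 32 to int. The sketch below reproduces that unpacking with plain JDK code. The packed value is built by hand to stand in for what VTD-XML would return, and FragmentRange is a hypothetical helper name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class FragmentRange {

    // Lower 32 bits: byte offset of the fragment in the document.
    static int offset(long packed) {
        return (int) packed;
    }

    // Upper 32 bits: length of the fragment in bytes.
    static int length(long packed) {
        return (int) (packed >> 32);
    }

    // Copies the packed range out of the document bytes.
    static String extract(byte[] doc, long packed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(doc, offset(packed), length(packed));
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] doc = "<employee><id>111</id></employee>".getBytes(StandardCharsets.ISO_8859_1);
        // Hand-built stand-in for a getElementFragment() result:
        // offset 10 and length 12 cover "<id>111</id>".
        long packed = ((long) 12 << 32) | 10L;
        System.out.println(extract(doc, packed)); // prints <id>111</id>
    }
}
```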

Here is the complete code to split the XML file:


public static String splitXML(String xmlString, String rootNode, String splitNode) {
    String content = null;
    try {
        // Parse the document with namespace awareness enabled
        VTDGen vg = new VTDGen();
        vg.setDoc(xmlString.getBytes());
        vg.parse(true);
        VTDNav vn = vg.getNav();
        // Select every splitNode element directly under rootNode
        AutoPilot ap = new AutoPilot(vn);
        ap.selectXPath("/" + rootNode + "/" + splitNode);
        byte[] ba = vn.getXML().getBytes();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((ap.evalXPath()) != -1) {
            // Lower 32 bits hold the offset, upper 32 bits hold the length
            long l = vn.getElementFragment();
            out.write(ba, (int) l, (int) (l >> 32));
        }
        content = out.toString();
    } catch (Exception e) {
        System.err.println("::Exception has occurred in splitXML()::" + ExceptionUtils.getRootCauseMessage(e));
    }
    return (content == null) ? "" : content;
}

We can test the above code by passing the sample XML to it.
Here is the test code:

public static void main(String[] args) throws IOException {
    String xml = FileUtils.readFileToString(new File("D:\\emp.xml"));
    String test = splitXML(xml, "employee", "workdetails");
    System.out.println("----" + test);
}

Here are the outputs:

Here employee is the root node of the XML document and workdetails is the split node.

<workdetails>
 <email>test@codesuggestions.com</email>
 <phone>040-66287799</phone>
 <address>Patrikanagar</address>
 <city>Madhapur</city>
 <state>Andhra Pradesh</state>
 <zipcode>500081</zipcode>
</workdetails>

If we pass "personaldetails" as the split node, we get the following output:

<personaldetails>
 <email>test@gmail.com</email>
 <phone>999-999-9999</phone>
 <address>LIGH-1031</address>
 <city>Kukatpally</city>
 <state>Andhra Pradesh</state>
 <zipcode>500072</zipcode>
</personaldetails>
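
splitXML() returns the fragment as a string, so to end up with separate XML files each returned fragment still has to be written to disk. Here is a minimal sketch using java.nio.file. FragmentWriter, writeFragment, the output directory and the file naming (splitNode + ".xml") are assumptions and not part of the original code. The fragment string in main stands in for a value returned by splitXML().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class FragmentWriter {

    private static final String DECLARATION =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>\n";

    // Writes one fragment to <dir>/<splitNode>.xml, prefixed with an XML declaration.
    static Path writeFragment(Path dir, String splitNode, String fragment) throws IOException {
        Files.createDirectories(dir);
        Path target = dir.resolve(splitNode + ".xml");
        byte[] bytes = (DECLARATION + fragment).getBytes(StandardCharsets.ISO_8859_1);
        return Files.write(target, bytes);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output directory; pass another one as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "split-output");
        // Stand-in for: splitXML(xml, "employee", "project")
        String fragment = "<project><product>Printer</product><id>111</id></project>";
        System.out.println("Wrote " + writeFragment(dir, "project", fragment));
    }
}
```

If the XPath matches more than one node, splitXML() concatenates the fragments, so the written file would contain several top-level elements. In that case the fragments would need to be wrapped in a container element to form a well-formed document.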
