Wednesday, September 09, 2015

How to read XML nodes if you don't have an XML parser?

I use FileMaker to create database applications. One of the applications needs to access Web Services. The returned text is XML, so I use a plugin called BaseElements to read the nodes. However, a recent development requires the same functionality on the iPad, and FileMaker for iPad does not allow plugins.

Fortunately, what I need is not a general parser that translates XML into an array. It just has to return the text value of the node defined by the path.

To circumvent the issue, I decided to replicate the functionality of the BaseElements plugin's BE_XPath() function. It works like this: you pass it an XML string and a path, and it returns the result. For example, BE_XPath($xml;"//Data/myname") returns the text value at that path. If there is more than one "Data" node, you need to index it like an array, as in Data[1].

It is not at all difficult to interpret the XML. There is just one step to do before parsing it: add a carriage return after every node. Usually the XML is returned as continuous text without any carriage return characters. All I need to do is search for "><" and replace it with ">[CR]<", where [CR] is character code 13.

After these replacements, each node sits on its own line ending with a carriage return. For example,

<myname>xxx</myname><myphone>yyy</myphone>

will become

<myname>xxx</myname>
<myphone>yyy</myphone>
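
The replacement step can be sketched in a few lines. This is a minimal illustration in Python rather than FileMaker, and the function name split_nodes is my own, not something from the plugin:

```python
CR = "\r"  # carriage return, character code 13


def split_nodes(xml: str) -> list[str]:
    """Insert a CR between every '><' pair, then split into one entry per line."""
    return xml.replace("><", ">" + CR + "<").split(CR)


lines = split_nodes("<myname>xxx</myname><myphone>yyy</myphone>")
for line in lines:
    print(line)
```

In FileMaker itself the same replacement would be done with the built-in text functions; the Python version is only meant to show the idea.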

What you need to do is search for each part of the path, in sequence, line by line through the XML. My example uses Data and myname. So I just read each line and look for <Data> first. If it is found, then I look for <myname>. If </Data> is reached first, then <myname> is not found. If reading a line returns "" (there are no more lines) before the two nodes are found, then the path is not available.

If the path is Data[x], I just need to repeat the search for <Data> until the x-th occurrence is reached. It is as simple as that.
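
Here is a rough sketch of that search, again in Python as a stand-in for the FileMaker script. The names parse_path and find_node_line are hypothetical, and the sketch assumes the XML has already been split into one node per line and that the same node name is not nested inside itself:

```python
import re


def parse_path(path: str) -> list[tuple[str, int]]:
    """Turn '//Data[2]/myname' into [('Data', 2), ('myname', 1)]."""
    steps = []
    for part in path.strip("/").split("/"):
        m = re.fullmatch(r"(\w+)(?:\[(\d+)\])?", part)
        if m is None:
            raise ValueError(f"unsupported path step: {part!r}")
        steps.append((m.group(1), int(m.group(2) or 1)))
    return steps


def find_node_line(lines: list[str], path: str):
    """Return the line holding the node at `path`, or None if it is not there."""
    pos = 0
    parent = None
    line = None
    for name, wanted in parse_path(path):
        seen = 0
        while pos < len(lines):
            line = lines[pos]
            # The parent closed before we found the child: the path is not there.
            if parent and line.startswith(f"</{parent}>"):
                return None
            pos += 1
            # Matches <name>, <name attr="..."> and <name/>.
            if re.match(rf"<{re.escape(name)}[\s/>]", line):
                seen += 1
                if seen == wanted:  # reached the x-th occurrence
                    break
        else:
            return None  # ran out of lines before finding this step
        parent = name
    return line


doc = [
    "<Root>", "<Data>", "<myname>aaa</myname>", "</Data>",
    "<Data>", "<myname>bbb</myname>", "</Data>", "</Root>",
]
print(find_node_line(doc, "//Data/myname"))
print(find_node_line(doc, "//Data[2]/myname"))
```

Only the parent's closing tag is checked, so this is deliberately as simple as the description above and not a replacement for a real XPath engine.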

Once the node is found, it is time to remove the markup around the text. This is quite simple: replace the opening and closing tags, angle brackets included, with "". Don't worry about stray angle brackets in the value, as XML does not allow a literal "<" in node text (it must be escaped as &lt;), and ">" is normally escaped as &gt; as well.

Obviously, you need to look out for <myname/>. This means that the node exists but has no value. Since I am looking for the node text, it does not matter whether the node is missing or merely empty; I just return a blank unless it is specifically required to determine whether the node actually exists.
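
Stripping the markup and handling the empty node can be sketched together. This is an illustrative Python version with a made-up function name, node_text, and it assumes the line holds a single node with no child nodes inside it:

```python
import re
from typing import Optional


def node_text(line: Optional[str]) -> str:
    """Return the text of a one-node line; a missing or empty node gives ''."""
    if line is None:  # the path was not found at all
        return ""
    # Drop every <...> run: the opening tag, the closing tag, or a lone <name/>.
    return re.sub(r"<[^>]*>", "", line)


print(node_text("<myname>xxx</myname>"))
print(repr(node_text("<myname/>")))
```

If you do need to tell a missing node from an empty one, check for None before calling the function instead of relying on the blank result.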

The above assumes that the XML contains no carriage returns, tabs or spaces between nodes. If the XML is generated by hand and is "tidied", the situation is more complicated. You will need to reformat the XML first to remove all the carriage returns, linefeeds, and leading and trailing spaces, and then add the carriage returns as described above.
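
A possible way to do that clean-up, sketched in Python. The name normalize is hypothetical, and the sketch assumes that no text value runs over more than one line, because joining the lines would glue such a value together:

```python
def normalize(xml: str) -> str:
    """Collapse 'tidied' XML back into one continuous string."""
    # Treat CR, LF and CRLF alike, then trim spaces and tabs from each line.
    parts = [p.strip() for p in xml.replace("\r", "\n").split("\n")]
    return "".join(parts)


tidied = "<Data>\r\n\t<myname>xxx</myname>\r\n  <myphone>yyy</myphone>\r\n</Data>\n"
print(normalize(tidied))
```

The result is the continuous form that the "><" replacement expects.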



