Thursday, April 17, 2008

Simplifying libxml

As I mentioned in my previous post, XML handling on the iPhone is through the open source libxml library, which is a procedural C-based API. We can also use libxml in Cocoa, and if you have an eye toward re-using your code on the iPhone, it's probably not a bad idea to use libxml instead of NSXML. It also means I can show you how to use libxml without using the iPhone SDK and violating the NDA. 

To use libxml in your Cocoa projects, the main thing you need to do in your Xcode project settings is add /usr/include/libxml2/** to your Header Search Paths build setting.
This will tell your project where to find the header files for libxml. If the linker then complains about undefined libxml symbols, also add -lxml2 to Other Linker Flags so the library itself gets linked in.

Now, to use libxml to parse XML data contained in an instance of NSString, say we had:
NSString *xml; // string containing XML
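We need to make sure we import the necessary header files. The parsing functions are declared in parser.h and the node structures in tree.h:
#import <libxml/parser.h>
#import <libxml/tree.h>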
Next, we have to tell the library to parse this data. Being a procedural C library, libxml knows nothing about NSString, which is an Objective-C class cluster, so we have to convert our NSString into a C string, like this:
xmlDocPtr doc = xmlParseMemory([xml UTF8String], [xml lengthOfBytesUsingEncoding:NSUTF8StringEncoding]);
If this was successful, doc will not be NULL. (When you are finished with the document, release it with xmlFreeDoc(doc).) Now, how do we get values from this? Well, we can get the root node like this:
xmlNodePtr root = xmlDocGetRootElement(doc);
Sibling xmlNodes are chained together in a doubly linked list through their next and prev pointers, a construct that we don't use much in Objective-C, although it's likely used under the hood in the implementation of some of the collection classes. This abstraction is different from how we commonly work in Objective-C. Ordinarily, we have an object that represents the collection and we call methods on that collection object to get to the objects that it contains. In an old-school linked list like this, the xmlNode struct pointed at by an xmlNodePtr represents both the node itself and the collection. A well-formed document has only one root element, so let's look at how we would get to the children of the root node.
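To make the "a node is also the list" idea concrete, here is a minimal sketch in plain C. It does not use libxml at all: demo_node, demo_add_child, and demo_sibling_count are hypothetical stand-ins I made up for illustration, with link fields named after the ones in xmlNode (parent, children, next, prev).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for xmlNode, reduced to its name and link fields. */
typedef struct demo_node {
    const char *name;
    struct demo_node *parent;
    struct demo_node *children; /* first child, or NULL */
    struct demo_node *next;     /* next sibling, or NULL */
    struct demo_node *prev;     /* previous sibling, or NULL */
} demo_node;

/* Append child to the end of parent's child list, linking both directions. */
void demo_add_child(demo_node *parent, demo_node *child)
{
    assert(parent != NULL && child != NULL);
    child->parent = parent;
    child->next = NULL;
    if (parent->children == NULL) {
        child->prev = NULL;
        parent->children = child;
        return;
    }
    demo_node *last = parent->children;
    while (last->next != NULL)
        last = last->next;
    last->next = child;
    child->prev = last;
}

/* Count every node in the sibling list that contains node. Any sibling
   works as a handle to the whole list: rewind via prev, then walk next. */
size_t demo_sibling_count(const demo_node *node)
{
    size_t count = 0;
    if (node == NULL)
        return 0;
    while (node->prev != NULL)
        node = node->prev;
    for (; node != NULL; node = node->next)
        count++;
    return count;
}
```

Handing demo_sibling_count any one of three siblings gives 3, which is the point: there is no separate collection object, just nodes that know their neighbors.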

It's actually pretty easy. We declare another node pointer and point it at the node's children like so:
xmlNodePtr node = root->children;
Now, the children pointer gives us a single node, but that node is also our access to all of the node's siblings and children. To iterate through all the nodes at this level, we can loop like this:
xmlNodePtr cur_node;
for (cur_node = node; cur_node; cur_node = cur_node->next)
{
 // Do something
}
This loop keeps going until it gets to the last item in the linked list. The last item has NULL as its pointer to next, so the loop stops. Similarly, we can loop through a node's children like this:
xmlNodePtr children;
for (children = node->children; children; children = children->next)
{
 // Do something
}
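One thing to watch for when looping over children: with libxml's default parsing, whitespace between tags typically shows up as text nodes sitting among the element nodes, so a loop often wants to check each node's type field. Here is a small sketch of that filter in plain C. The demo_ names are hypothetical stand-ins rather than libxml API; in libxml the corresponding constants are XML_ELEMENT_NODE and XML_TEXT_NODE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for libxml's XML_ELEMENT_NODE and XML_TEXT_NODE. */
typedef enum { DEMO_ELEMENT_NODE = 1, DEMO_TEXT_NODE = 3 } demo_node_type;

/* Hypothetical stand-in for xmlNode with just the fields this loop needs. */
typedef struct demo_node {
    demo_node_type type;
    const char *name;
    struct demo_node *children; /* first child, or NULL */
    struct demo_node *next;     /* next sibling, or NULL */
} demo_node;

/* Count the direct children of parent that are elements, skipping
   text nodes such as the whitespace between tags. */
size_t demo_count_element_children(const demo_node *parent)
{
    assert(parent != NULL);
    size_t count = 0;
    for (const demo_node *child = parent->children; child != NULL; child = child->next) {
        if (child->type == DEMO_ELEMENT_NODE)
            count++;
    }
    return count;
}
```

The same type check drops straight into the "Do something" body of the loops above when you only care about elements.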
To get a node's name, we just look at the name member of the xmlNode struct, which is a C string (strictly speaking an xmlChar *, hence the cast below). So, if we want to find a node with a specific name among a node's children, we would do it like this:
NSString *nameToSearchFor = @"Id";
xmlNode *child = NULL;
for (child = node->children; child; child = child->next)
{
    // Compare the node's name against the UTF-8 bytes of the name we want
    if (strcmp((const char *)child->name, [nameToSearchFor cStringUsingEncoding:NSUTF8StringEncoding]) == 0)
    {
        // Do something with the node
    }
}
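If you find yourself writing that search more than once, it's worth pulling it into a function. Here is a rough sketch of the idea in plain C, using a hypothetical demo_node struct in place of xmlNode so that it stands on its own; demo_child_named is my own illustrative name, not something libxml provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for xmlNode with just the fields the search needs. */
typedef struct demo_node {
    const char *name;
    struct demo_node *children; /* first child, or NULL */
    struct demo_node *next;     /* next sibling, or NULL */
} demo_node;

/* Return the first direct child of parent whose name matches exactly
   (case-sensitive), or NULL if there is no such child. */
demo_node *demo_child_named(const demo_node *parent, const char *name)
{
    assert(name != NULL);
    if (parent == NULL)
        return NULL;
    for (demo_node *child = parent->children; child != NULL; child = child->next) {
        if (child->name != NULL && strcmp(child->name, name) == 0)
            return child;
    }
    return NULL;
}
```

Returning NULL for "not found" mirrors how the linked list itself signals its end, so callers can chain lookups and test the result once.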
To determine the value of a node, meaning the text between the begin and end tags in your XML like this:
<node>value</node>
It's a little more complicated, but not much. To obtain its value as a string:
xmlChar *ret = xmlNodeListGetString(doc, node->children, 1);
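Notice that we pass not the node, but the node's children pointer. This is because the text between the begin and end tags counts as a child. The string that comes back is allocated by libxml, so release it with xmlFree(ret) when you are done with it.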
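To see why the children pointer is the right thing to pass, it helps to picture what a function like this has to do: walk the child list and glue together the content of the text nodes it finds. The following is only a simplified sketch of that idea in plain C, not libxml's actual implementation (the real function also deals with things like entity references). The demo_ names and the content field layout are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for libxml's XML_ELEMENT_NODE and XML_TEXT_NODE. */
typedef enum { DEMO_ELEMENT_NODE = 1, DEMO_TEXT_NODE = 3 } demo_node_type;

/* Hypothetical stand-in for xmlNode; content is only used by text nodes. */
typedef struct demo_node {
    demo_node_type type;
    const char *name;
    const char *content;
    struct demo_node *children; /* first child, or NULL */
    struct demo_node *next;     /* next sibling, or NULL */
} demo_node;

/* Concatenate the content of every text node in the sibling list that
   starts at first. Returns a malloc'd string the caller must free, or
   NULL if allocation fails. An empty list yields an empty string. */
char *demo_list_text(const demo_node *first)
{
    size_t total = 0;
    for (const demo_node *n = first; n != NULL; n = n->next) {
        if (n->type == DEMO_TEXT_NODE && n->content != NULL)
            total += strlen(n->content);
    }

    char *out = malloc(total + 1);
    if (out == NULL)
        return NULL;

    char *end = out;
    for (const demo_node *n = first; n != NULL; n = n->next) {
        if (n->type == DEMO_TEXT_NODE && n->content != NULL) {
            size_t len = strlen(n->content);
            memcpy(end, n->content, len);
            end += len;
        }
    }
    assert(end == out + total); /* both passes must agree on the length */
    *end = '\0';
    return out;
}
```

As with the real call, the caller owns the returned string, so for a node built from <node>value</node> you would call demo_list_text(node->children), use the result, and then free it.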

At this point, I think you can see how mixing a procedural C API with Objective-C code looks a little ugly. On the other hand, I believe that Apple must have had good reason to not port NSXML to the iPhone, probably having to do with performance, but that is just conjecture on my part. So, how can we make our code look nicer without imposing significant additional overhead? 

Well, we can create a very low-overhead Objective-C wrapper. In the internals of the class, we use libxml for performance, but then convert to and from Objective-C objects in our accessors and mutators. This gives us a nice compromise between performance and readability, so our code can look like this without adding significant processing overhead:
NKDLibXMLDocument *doc = [NKDLibXMLDocument documentWithRawXML:result];
NKDLibXMLNode *root = [doc rootNode];
NKDLibXMLNode *user = [root childNamed:@"User"];
NSLog(@"Id: %@", [user valueForChildNamed:@"Id"]);
NSLog(@"FirstName: %@", [user valueForChildNamed:@"FirstName"]);
NSLog(@"LastName: %@", [user valueForChildNamed:@"LastName"]);
Doesn't that look nicer? Isn't it easy to tell what's going on in that code? We hide all the nastiness away in a couple of classes and then never have to deal with it again. Yeah, that's the ticket.

I've still got some work to do on the wrapper class, but I'll post it in the next day or two.
