Metadata, Xml, Fluidity, JSON and the meaning of life
August 5, 2006 on 2:46 am | In .NET Coding
In my various exploits with Relavance, I’ve been trying just about everything I can think of to get some semblance of CRUD operations for objects and a degree of interoperability with arbitrary platforms, all rolled into one. With a relational database, I could just create an XSD file describing my class, run it through XSDObjGen or, more recently, XSD.exe (v2.0; versions 1.0 and 1.1 were for the birds!), and then use any of the O/R mapping frameworks already out there. In particular, I’d like to write a .NET-based data provider that lets me interoperate with Ruby on Rails, ultimately through a RESTful interface. Since the O/R mapping frameworks pretty much all assume you’re storing your objects in a SQL database, and I wasn’t, I was on my own.
The first approach:
Virtually every platform can work with WSDL-described XML web services in some way. That was the first thing that crossed my mind and, of course, the first way I tried to implement the interaction.
The Overview:
Create concrete domain objects that hold the data and expose it through public properties, so they can easily be remoted via an XML web service. Then create a web service with a method for each of the so-called CRUD operations.
How I did it:
I started by modelling my domain objects (naked classes built just to hold data, with public properties to get and set the fields) in XSD files. I then ran them through an XSD class generator (such as XSDObjGen or XSD.exe), populated the resulting objects, and sent them across the wire, with XmlSerializer at the center of my .NET provider. I used XSLT to generate my CRUD business objects, one per domain object. Each had Get, Put (which did either an insert or an update, depending on whether the ID value was present), Find, and Delete, and each exposed the same methods for collections of its domain object. The stylesheet also generated a huge ASMX web service exposing every feature of every business object: I had about 26 domain object types, each with 10 to 12 CRUD-related methods exposed through the service, so it came to a few hundred methods in all.
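The Put-as-upsert rule is the one non-obvious behaviour in those generated business objects. Below is a minimal in-memory sketch in Python. There is no database or web service, and the Customer and CustomerStore names are hypothetical, not the generated code.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Customer:
    """A 'naked' domain object: public fields only, no behaviour (hypothetical type)."""
    name: str
    id: Optional[int] = None  # None means "not stored yet"


class CustomerStore:
    """Hypothetical CRUD business object for one domain type, backed by a dict."""

    def __init__(self) -> None:
        self._rows: Dict[int, Customer] = {}
        self._next_id = itertools.count(1)

    def get(self, item_id: int) -> Optional[Customer]:
        return self._rows.get(item_id)

    def put(self, item: Customer) -> Customer:
        # Upsert: no ID means insert with a fresh ID; an existing ID means update in place.
        if item.id is None:
            item.id = next(self._next_id)
        self._rows[item.id] = item
        return item

    def find(self, predicate: Callable[[Customer], bool]) -> List[Customer]:
        return [c for c in self._rows.values() if predicate(c)]

    def delete(self, item_id: int) -> bool:
        return self._rows.pop(item_id, None) is not None

    def put_all(self, items: List[Customer]) -> List[Customer]:
        # Collection variant, mirroring the per-collection methods described above.
        return [self.put(i) for i in items]
```

Keying the insert/update decision on the ID alone keeps the client side simple: it always calls Put and never needs to know whether the row already exists.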
The issues I ran into:
This was before .NET 2.0 was out, and since IXmlSerializable was a bit arcane and definitely not supported by Microsoft, I decided to use the default serializer. That is where the trouble started. Retrieving a set of 1000 objects took aeons (on the order of 12 seconds). I had never used a profiler at that point, so I had to deduce that serialization was the bottleneck from the fact that put and get operations took roughly the same time for the same number of similar objects. Short of something drastic like implementing IXmlSerializable for every object, I wanted to see whether there was anything at all I could do to speed things up another way. I rewrote all of my query logic, customizing the operations for each type of object, and in doing so sealed my fate: I couldn’t run the code generator again without serious trouble. Of course, I now know I could have avoided that by generating abstract classes and customizing only the concrete subclasses. I knew perfectly well how to use inheritance; it just wasn’t on my list of ‘things to do’ because I didn’t think of it. Honestly. It never crossed my mind, not once.
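The fix I missed is sometimes called the generation gap pattern. The generator owns a base class it can overwrite on every run, and all customization lives in a hand-written subclass that the generator never touches. Here is a hedged Python sketch with made-up names. C# would declare the base class abstract; here it is an ordinary class.

```python
class CustomerQueriesBase:
    """Stands in for generator output: safe to regenerate at any time (hypothetical)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def find_by_name(self, name):
        # Generic generated implementation: a plain linear scan.
        return [r for r in self.rows if r["name"] == name]


class CustomerQueries(CustomerQueriesBase):
    """Hand-written subclass: the generator never overwrites this file."""

    def __init__(self, rows):
        super().__init__(rows)
        # Customization: build a name index once so lookups skip the scan.
        self._by_name = {}
        for r in self.rows:
            self._by_name.setdefault(r["name"], []).append(r)

    def find_by_name(self, name):
        return list(self._by_name.get(name, []))
```

Rerunning the generator only replaces CustomerQueriesBase. The tuned query logic in CustomerQueries survives.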
What I did about it:
I scrapped the whole idea. Well, almost all of it. I was still going to use the web service idea, but I needed to do something about that crappy serialization. Along came .NET 2.0, which, since I have an MSDN subscription, was immediately available for download. There was practically no documentation for it anywhere, but I found Christian Weyer’s blog article on IXmlSerializable (see my blogroll at right). I rewrote my code generator, and this time, instead of using the XSD to create domain objects, I used a rough tree of nodes, something like the Composite pattern. That meant I only had to work with three types of domain objects, each of which could handle a wide range of data types. I wrote sample XML files for each and ran them through a parser I threw together to generate an associative base that represented the composite structures. I turned my new service loose and was ecstatic to see access times about ten times faster than before. There was still one problem, though: ten times faster was still slow. So I set out to read every article on performance I could find, most of which I only half understood. I moved all of my low-level calls, with their repeated blocks of if statements checking for nulls, into a ‘driver library’ that did all of the null and bounds checking for me. That helped, and it sped things up a little thanks to new methods like String.IsNullOrEmpty, but I was still having a great deal of difficulty introducing new object structures into the system: every time I added something new, I had to tweak code all over the place. I decided it was time to get away from all semblance of concrete structures and move to pure tree structures instead of my generated domain objects.
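The key idea behind the IXmlSerializable rewrite is that each type writes its own elements straight to a forward-only writer. A reflection-based serializer no longer has to work the structure out. Below is a rough Python analogue. It assumes xml.sax.saxutils.XMLGenerator as a stand-in for .NET's XmlWriter, and the customer record layout is hypothetical.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_customer(gen, customer):
    """Hand-written writer for one record: explicit element calls, no reflection."""
    gen.startElement("customer", {"id": str(customer["id"])})
    for field in ("name", "email"):
        gen.startElement(field, {})
        gen.characters(str(customer[field]))  # XMLGenerator escapes &, < and >
        gen.endElement(field)
    gen.endElement("customer")


def write_customers(customers, out):
    """Stream a collection: each record is written as soon as it is produced."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startElement("customers", {})
    for customer in customers:
        write_customer(gen, customer)
    gen.endElement("customers")
```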
Approach Number Two: Pure Composite
Ah, fluidity. I thought I must be a genius or something. Right. I tried my hand again at the Composite pattern, this time with an ArrayList-based node structure. It technically worked, and it did allow for fluidity, but as soon as I stored ANY value types in it, performance dropped substantially. That about sucked, so I looked around to see how to handle it. I had read about boxing and casting before, and thought that, though people complained about the performance, the cost couldn’t be all that substantial. Well, I was wrong, of course. “C#, unified types! Some crock!” I cursed. How in the world could I do this without having to cast everything?
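For reference, here is a minimal composite along those lines, with every value sitting in an untyped slot. The Node class is hypothetical. Python has no value-type boxing, so the sketch shows only the structure, not the C# performance issue. Adding a new field needs no new class, which is the fluidity I was after. The catch is that callers must know or check each value's type, which in C# means a cast from System.Object.

```python
class Node:
    """Hypothetical composite node whose value can be anything (like a System.Object slot)."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value
        self.children = []  # plays the role of the ArrayList of child nodes

    def add(self, name, value=None):
        child = Node(name, value)
        self.children.append(child)
        return child

    def child(self, name):
        for c in self.children:
            if c.name == name:
                return c
        return None

    def get(self, path):
        """Follow a '/'-separated path of child names; return None if any step is missing."""
        node = self
        for part in path.split("/"):
            node = node.child(part)
            if node is None:
                return None
        return node.value
```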
What I did:
I decided to stick with the basic composite design, but instead of putting the data values themselves in the nodes, I created ‘typed nodes’ like StringNode and IntegerNode, all of which implemented a common interface like INode { NodeTypeEnum GetType(); }. That way, I figured, I’d be able to get the type of each node during any traversal just by invoking GetType, and I’d know what I had without any sort of reflection-based typeof() bore. Well, that did work, sort of. I could now traverse trees of these nodes, but when it came time to get the data out, I couldn’t easily automate it, because I had no idea whether each node would hold a reference type or a value type. I thought I couldn’t use generics, because I didn’t know what the receiver would be: I can’t write X = Y without knowing what type to declare for X. Sure, System.Object would cover it, but I would still have to get the type and cast it EVERY time. That wasn’t any good either.
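Here is a sketch of the typed-node idea, with the type tag exposed as an enum. It is Python with hypothetical names, and the method is called node_type() rather than GetType(). The flatten function shows the problem described above: every consumer has to switch on the tag to know how to read a node's value.

```python
from enum import Enum, auto


class NodeType(Enum):
    STRING = auto()
    INTEGER = auto()
    COMPOSITE = auto()


class StringNode:
    def __init__(self, name, value):
        self.name, self.value = name, value

    def node_type(self):
        return NodeType.STRING


class IntegerNode:
    def __init__(self, name, value):
        self.name, self.value = name, value

    def node_type(self):
        return NodeType.INTEGER


class CompositeNode:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def node_type(self):
        return NodeType.COMPOSITE


def flatten(node, prefix=""):
    """Map 'parent/child' paths to leaf values by switching on each node's type tag."""
    path = prefix + node.name
    kind = node.node_type()
    if kind is NodeType.COMPOSITE:
        result = {}
        for child in node.children:
            result.update(flatten(child, path + "/"))
        return result
    if kind is NodeType.STRING:
        return {path: node.value}
    if kind is NodeType.INTEGER:
        return {path: int(node.value)}
    raise ValueError(f"unknown node type: {kind}")
```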
Approach 3: The Visitor pattern
OK, so now I was learning a ton (mostly from my mistakes and misconceptions constantly running me into the ground), but I still wasn’t there. I hadn’t found the way to do this, and fluidity still eluded me. I figured someone must have tried this before, so I went to the Microsoft newsgroups and asked how others might do it. One reply argued that boxing probably wasn’t the issue and that it was really OK to cast all over the place. I ignored it, but as I later found out, I shouldn’t have. Then a very kind consultant from TeamB helped me out by pointing me at the Visitor pattern: I could take these typed nodes, give each one the ability to Accept() a visitor, and have it pass the visitor on to its children in the tree. I wrote my own attempt at it, which technically was the Visitor pattern and made a half-cocked use of C# generics, but I was stuck writing a new type of visitor for everything.
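Here is a minimal, non-generic sketch of the Visitor pattern over typed nodes, in Python with hypothetical class names. Each node's accept() calls back the visit method for its own class (double dispatch), and composite nodes forward the visitor to their children. It also shows the cost I ran into: every new operation is a new visitor class.

```python
class Visitor:
    """Base visitor: one visit method per node class; composites forward to children."""

    def visit_string(self, node):
        pass

    def visit_integer(self, node):
        pass

    def visit_composite(self, node):
        for child in node.children:
            child.accept(self)


class StringNode:
    def __init__(self, name, value):
        self.name, self.value = name, value

    def accept(self, visitor):
        visitor.visit_string(self)


class IntegerNode:
    def __init__(self, name, value):
        self.name, self.value = name, value

    def accept(self, visitor):
        visitor.visit_integer(self)


class CompositeNode:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def accept(self, visitor):
        visitor.visit_composite(self)


class IntegerTotal(Visitor):
    """One operation: sum every integer node in the tree."""

    def __init__(self):
        self.total = 0

    def visit_integer(self, node):
        self.total += node.value


class StringCollector(Visitor):
    """Another operation means another visitor class."""

    def __init__(self):
        self.values = []

    def visit_string(self, node):
        self.values.append(node.value)
```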
The breaking point:
I almost gave up. I decided to implement the concrete objects, hand-write IXmlSerializable for each one, and say, “there is no good way!” I had actually begun doing so when I remembered that JSON.org described another way to represent objects: JavaScript Object Notation. Well, OK, the last time I had checked, the only interfaces to it used reflection all over and expected your objects to already exist in concrete form. That didn’t seem too helpful, but then I came across the latest version of Json.NET by Newtonsoft and saw that it included a forward-only parser in the style of a stream reader and stream writer. Wow, I wonder if I can use that?!? I downloaded it and immediately wrote up a quick set of test fixtures for it. Serializing 1000 objects via the writer took 14 milliseconds. Bingo! Deserializing them took 34 ms. Great! How was this voodoo happening? I was stunned.
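Json.NET's writer is a .NET API, so it isn't reproduced here. The same forward-only idea can be sketched in Python with the standard json module, whose JSONEncoder.iterencode() yields the encoded text in pieces. The write_json_array helper is hypothetical, and the sketch makes no claims about the timings quoted above.

```python
import io
import json


def write_json_array(records, out):
    """Write records as a JSON array one at a time, never building the whole document first."""
    encoder = json.JSONEncoder()
    out.write("[")
    for i, record in enumerate(records):
        if i:
            out.write(",")
        # iterencode yields the encoded record in chunks, which go straight to the stream.
        for chunk in encoder.iterencode(record):
            out.write(chunk)
    out.write("]")
```

Because records can come from a generator, the producer and the writer stay in step. Only one record is in memory at a time.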
Shocked. So absolutely flabbergasted that I thought the sky might actually be falling on my head one brick at a time. The library is open source, so it only took a minute to see that it used the Composite pattern with (gasp!) System.Object properties EVERYWHERE! Now I knew that the gentleman on Microsoft’s newsgroups had not been talking nonsense. He was right: boxing and casting weren’t even factors at such small numbers of operations (well, OK, a little, but not at all the rate-determining step). I finally stumbled onto the root cause: in my original approach I was buffering the entire object before doing anything with it, when I should have been processing it one ‘node’ at a time with a stream. I had done that part right with the custom IXmlSerializable, but I was still reading things ‘DOM-like’, starting at the top and walking down whenever I wanted to access any of the values, so I didn’t see the advantage. I had pulled the wool over my own eyes, because I KNEW that boxing and casting were deadly… or so I thought. I can easily see how the added hit would be a factor if I were doing this for hundreds of thousands or even millions of operations, but for me, with relatively modest objects and collections of fewer than 10,000 items, it barely affected anything at all.
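Python's xml.etree.ElementTree shows the difference between the two reading styles. It is used here purely as an illustration, and the <items>/<item>/<value> layout is assumed. iterparse hands over each element as soon as its end tag is parsed, while parse builds the whole tree before you can touch any of it.

```python
import io
import xml.etree.ElementTree as ET


def sum_values_streaming(source):
    """Handle each <item> as soon as its end tag is parsed, then discard its contents."""
    count = total = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            count += 1
            total += int(elem.findtext("value"))
            elem.clear()  # release this item's children; we are done with them
    return count, total


def sum_values_dom(source):
    """The 'DOM-like' alternative: build the entire tree, then walk it from the top."""
    root = ET.parse(source).getroot()
    values = [int(item.findtext("value")) for item in root.iter("item")]
    return len(values), sum(values)
```

Both functions return the same answer. The streaming version can start work before the document ends and keeps memory roughly flat as collections grow.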
The moral of the story, after all is said and done, is a twofold lesson:
One: use a stream instead of a document whenever possible. One nice side effect of a stream is that it can process data before all of it has arrived: you just feed each token into the reader or writer in question as it comes in. (System.Net.Sockets works well with this.)
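The hypothetical generator below sketches processing data before it has all arrived, using Python's standard json module. It feeds text chunks into a buffer and yields each complete top-level JSON value as soon as JSONDecoder.raw_decode can parse it. It assumes the values are objects or arrays, since a bare top-level number split across two chunks would be parsed too early.

```python
import json


def iter_json_values(chunks):
    """Yield each complete top-level JSON value from an iterable of text chunks."""
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            stripped = buffer.lstrip()  # raw_decode does not skip leading whitespace
            if not stripped:
                buffer = ""
                break
            try:
                value, end = decoder.raw_decode(stripped)
            except json.JSONDecodeError:
                # The value is incomplete so far; wait for the next chunk.
                buffer = stripped
                break
            yield value
            buffer = stripped[end:]
```

Because it is a generator, the first object is handed to the caller while later chunks are still on their way.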
Two: only change one variable at a time. I threw the scientific method out the window when I changed my whole model in two major ways at once, so I couldn’t tell which change helped and which one hurt.
A little PS for you: JSON is really cool! I figured out that I can create a TCP socket and send a stream of bytes across in JSON (optionally encrypted) to any platform that lets me use sockets. Pretty much every platform has a library of some sort for turning JSON into objects, so it makes a nice, succinct, relatively unambiguous specification of an object, without some of the strange nuances that WSDL forces us to work around for interop with XML. (The classic example is .NET-to-Java apps: you can’t use unsigned types, and you have to send ints from .NET as strings, decorated in the XML and XSD as being of type integer.) Give JSON a go if you feel up to learning something new!
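Here is a minimal sketch of JSON over a socket. It uses socket.socketpair() so it runs without a server. JSON itself doesn't define message boundaries, so the sketch assumes one document per line (newline-delimited framing) and leaves encryption out. The helper names are hypothetical.

```python
import json
import socket


def send_json(sock, obj):
    """Send one JSON document plus a newline; json.dumps never emits a raw newline."""
    sock.sendall((json.dumps(obj) + "\n").encode("utf-8"))


def receive_json_lines(sock):
    """Yield each newline-terminated JSON document as it arrives, until the peer closes."""
    with sock.makefile("r", encoding="utf-8") as reader:
        for line in reader:
            if line.strip():
                yield json.loads(line)
```

Any platform with sockets and a JSON parser can speak this format by reading until a newline and parsing what it got.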
