Coming soon: RefGen, a real open source product from all my musings
August 30, 2006 on 12:16 am | In .NET Coding | No Comments
I’ve been pounding out code now at 6 hour clips twice a day for the past oh… I don’t know how long, when did I start this blog? Finally, after all of my testing and positing, I have something to show for it and to share with the world.
So many people have helped me along the way, and I can’t thank everyone enough, so I figure I’ll do the next best thing, and open source the more universally useful part of my project.
It’s called RefGen. It caches reflection data about an object and exposes DynamicMethod-based property accessors through a simple interface called the Introspector. It also features UI generation via this object mapping: you mark up your classes with a couple of simple attributes, and it generates the UI that lets you edit an instance on your webpage. You get the data into the UI by just assigning your object to it, and the data back out by reading it. I’m still working on a way to infer control styling from objects that don’t have the attribute markup, so that’s coming soon.
I’m not quite ready to post the code, but in the meantime, here’s a sample of how it works, using my beaten-to-death Person object. (First, without markup, so you can see that the Introspector works on any old class.)
public class Person
{
    string name;
    int ssn;
    DateTime dob;

    public string Name
    {
        get { return name; }
        set { name = value; }
    }
    public int SSN
    {
        get { return ssn; }
        set { ssn = value; }
    }
    public DateTime DateOfBirth
    {
        get { return dob; }
        set { dob = value; }
    }
}
Nothing special about the class, just a data holder (Domain Object).
Now if I want fast dynamic access to the properties by string name (something I need a lot for ORMappers, generating UIs and whatnot), the Introspector provides it.
Here’s a sample clip:
Person po = new Person();
po.Name = "Dave Dolan";
po.SSN = 123456789;
po.DateOfBirth = new DateTime(1912, 3, 21);

System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
for (int x = 0; x < 1000000; x++)
{
    Introspector.SetProperty(po, "SSN", x);
}
sw.Stop();
Console.Out.WriteLine("Setting the SSN property (boxing incurred) 1 million times took {0} milliseconds.", sw.ElapsedMilliseconds);
// something like 800 ms or so, depending on the machine you run it on

sw.Reset();
sw.Start();
string Foo = null;
for (int x = 0; x < 1000000; x++)
{
    Foo = Introspector.GetProperty(po, "Name");
}
sw.Stop();
// something like 400 milliseconds, again depending on your host
Console.Out.WriteLine("Getting the Name property (no boxing) 1 million times took {0} milliseconds.", sw.ElapsedMilliseconds);

// Assumes PropertyDictionary maps each property name to its Type.
foreach (KeyValuePair<string, Type> property in Introspector.PropertyDictionary(po))
{
    Console.Out.WriteLine("Property {0} is of type {1}", property.Key, property.Value.Name);
    Console.Out.WriteLine("po.{0} = {1}", property.Key, Introspector.GetProperty(po, property.Key));
}

// bonus material: Fast creation by string name of the type
// this only works once the type has been "Learned" by Introspector, which
// happens automatically when you pass it an object for the first time to either
// list its properties, get or set them. It is a ton faster than Activator.CreateInstance().
object NewPerson;
sw.Reset();
sw.Start();
for (int x = 0; x < 1000000; x++)
{
    NewPerson = Introspector.CreateObject("Person");
}
sw.Stop();
More to come. This isn’t as useful by itself unless you do a lot of ORMapping or Grokking of objects based on runtime parameters, but add UI generation and it’s money. I’ll show and tell, and hopefully post some code (it’s already working, just not presentable yet) soon.
Edit: Wow, it’s really hard to get code formatting working in wordpress…
The ultimate implementation of DynamicMethod
August 17, 2006 on 1:14 am | In Uncategorized | 1 Comment
Ok. Here.
DynamicMethod — Now as soon as I close my gaping jaw…
August 16, 2006 on 4:45 pm | In .NET Coding | No Comments
Ok, this .NET stuff actually still gets better. I’m sitting here struggling with ways to access methods and properties dynamically without using reflection every time, and I was starting to get creative. Of course I knew the right way would be to emit some IL and just suck it up already, but to set up IL emit you have to create the assembly, the module, and the app domain… blah blah… or so I thought. Then I somehow stumbled on Lightweight Code Generation. 50-100 times faster than reflection (depending on the method) and only 1.5 to 3 times slower than direct method access… yes, you read that right.
I’m not going to pretend to be an expert or anything, but I am going to pretend to be a guy who’s read a lot about it now. Here’s some links for you to do the same.
This guy in particular has the whole thing pretty much mastered. It’s the most complete example I’ve seen, and basically, there isn’t much need for anything else (unless you insist on emitting more IL than necessary because you like stack based programming for some reason.)
What I’m using this thing to do is couple it with custom code attributes to mark up shell classes and generate class wrappers that can write them to and read them from Relavance without a single bit of extra code to write. Just straight-up reflection to generate some cached IL calls that we zip through to do the writes. Basically the IL only calls other methods that I’ve already written, but it’s ‘dynamic’, so I can, say, store it in a dictionary by a string key and call it to do some mapping for output, or I can put the delegates in a list and enumerate the list, invoking each delegate, each writing one property at a time to the base. It emulates streams a bit. Sure, the delegates have the tiniest bit of overhead, but it sure beats reflection, or some more arcane representation of metadata (like XML files, where you’d have to put a path into some other config file and make sure they’re available to the library at directory X, blah blah blah).
Here’s another one: http://robgarrett.com/cs/blogs/software/archive/2005/10/12/1655.aspx
And don’t forget CodeProject (the best overall site in the world for me; the articles are usually not so in depth, but they’re almost always enough to get me started in what I need to learn or do).
Here’s a couple I found out there:
http://www.codeproject.com/csharp/FastMethodInvoker.asp
http://www.codeproject.com/useritems/Dynamic_Code_Generation.asp (herb’s picture looks like John Coltrane on the cover of Blue Train — it actually may be the album cover, I can’t quite tell.)
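The dictionary-of-delegates idea described above can be sketched roughly like this. It is only an illustration, not RefGen’s code: the class name AccessorCache and its Get method are made up for the example, it handles getters on classes only (no structs, no setters), and it isn’t thread-safe. The only real APIs used are System.Reflection and System.Reflection.Emit.DynamicMethod.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical helper for illustration; not the actual Introspector.
public static class AccessorCache
{
    // Compiled getters, keyed by "TypeFullName.PropertyName".
    private static readonly Dictionary<string, Func<object, object>> Getters =
        new Dictionary<string, Func<object, object>>();

    public static object Get(object target, string propertyName)
    {
        Type type = target.GetType();
        string key = type.FullName + "." + propertyName;

        Func<object, object> getter;
        if (!Getters.TryGetValue(key, out getter))
        {
            // Reflection and IL emit happen once per property, then get cached.
            getter = BuildGetter(type, propertyName);
            Getters[key] = getter;
        }
        return getter(target);
    }

    private static Func<object, object> BuildGetter(Type type, string propertyName)
    {
        if (type.IsValueType)
            throw new NotSupportedException("This sketch only handles classes.");

        PropertyInfo property = type.GetProperty(
            propertyName, BindingFlags.Public | BindingFlags.Instance);
        MethodInfo getMethod = property == null ? null : property.GetGetMethod();
        if (getMethod == null)
            throw new ArgumentException("No public getter named " + propertyName);

        // Signature of the generated method: object Getter(object target)
        DynamicMethod method = new DynamicMethod(
            "Get_" + propertyName, typeof(object), new Type[] { typeof(object) });

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);                        // load the target
        il.Emit(OpCodes.Castclass, type);                // object -> declaring class
        il.EmitCall(OpCodes.Callvirt, getMethod, null);  // call the property getter
        if (property.PropertyType.IsValueType)
            il.Emit(OpCodes.Box, property.PropertyType); // box ints, dates, etc.
        il.Emit(OpCodes.Ret);

        return (Func<object, object>)method.CreateDelegate(typeof(Func<object, object>));
    }
}
```

With something like this in place, AccessorCache.Get(somePerson, "Name") pays for reflection and IL generation on the first call only; every later call for the same property is a dictionary lookup plus a delegate invocation.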
Metadata, Xml, Fluidity, JSON and the meaning of life
August 5, 2006 on 2:46 am | In .NET Coding | No Comments
In my various exploits with Relavance, I’m trying just about everything I can think of to get some semblance of CRUD operations for objects, as well as a degree of interoperability with arbitrary platforms, all rolled into one. With a relational database, I could just create an XSD file describing my class, run it through XSDObjGen or more recently XSD.exe (v 2.0; 1.0 and 1.1 were for the birds!), and then use any of the ORMapping frameworks already out there. In particular I’d like to write some .NET based data provider that allows me to interoperate with Ruby on Rails, ultimately with a RESTful interface. Since all the ORMapping stuff pretty much assumes you’re using SQL, I was on my own.
The first approach:
One thing that seems to be virtually universal is the ability of a platform to somehow work with WSDL-described XML webservices. It was the first thing that crossed my mind, and of course, the first way I attempted to implement an interaction.
The Overview:
Create Concrete domain objects to hold the data exposed via public properties that can be easily remoted via an XML webservice. Create a webservice that has methods for each of the so-called CRUD operations.
How I did it:
I started by modelling my domain objects (naked classes built just to hold data, with public properties to get and set the fields) in XSD files. I’d then run them through an XSD class generator (such as XSDObjGen or XSD.exe), populate those objects, and send them across the wire, with XmlSerializer as the center of my .NET provider. I used XSLT to generate my CRUD business objects, one for each domain object. Each had the ability to Get, Put (which did either an insert or an update, depending on whether or not the ID value was present), Find, and Delete. It also exposed the same methods for the collections of each domain object. The stylesheet also generated a huge ASMX webservice with about a hundred methods exposing each feature of each business object. I had about 26 domain object types, and each had 10 - 12 CRUD-related methods exposed through the service.
The issues I ran into:
This was back before .NET 2.0 was out, and since IXmlSerializable was a bit arcane, and definitely not supported by Microsoft, I decided to use the default serializer… This is where the trouble started. It was taking aeons to retrieve a set of 1000 objects (on the order of 12 seconds). At this point I hadn’t seen a profiler before, so I had to deduce that the serialization was the bottleneck from the fact that the put and get operations were taking roughly the same amount of time for the same numbers of similar objects. So, short of doing something drastic like implementing IXmlSerializable for every object, I figured I’d see if there was anything at all I could do to speed things up another way. I rewrote all of my query logic, customizing the operations for each type of object, and in doing so sealed my fate: I couldn’t run the code generator again without some serious trouble. I now know, of course, that I could have solved the problem by generating abstract classes and simply customizing the concrete subclasses, but even though I knew perfectly well how to do inheritance, it wasn’t on my list of ‘things to do’ because I just didn’t think of it. Honestly. Never crossed my mind, not once.
What I did about it:
I scrapped the whole idea. Almost all of it. I was still going to use the webservice idea, but I needed to do something about that crappy serialization. So, along comes .NET 2.0, which, since I have an MSDN subscription, was immediately available for download. There was practically no documentation anywhere on it, but I found Christian Weyer’s blog article on IXmlSerializable (see my blogroll at right). I rewrote my code generator, and this time instead of using the XSD to create domain objects, I used a sort of rough tree structure of nodes that was something like a Composite Pattern. This meant that I could now work with only three types of domain objects, but each was able to deal with a wide range of types. I wrote out sample XML files for each, and ran them through a parser I threw together to generate an associative base that represented the composite structures. I turned my new service loose, and I was ecstatic to see access times about 10 times faster than I had before. There was still one problem though: 10 times faster was still slow. So, I set out to read every article I could on performance, most of which I only half-understood. I wrapped all of my low level calls that repeated blocks of if checks for nulls into a ‘driver library’ that did all of the bounds checking for me. That was good, and it did speed things up a little bit, because of new things like String.IsNullOrEmpty, but I was still having a great deal of difficulty with things like introducing new object structures into the system. I had to go tweak things all over the place every time I added something new. I decided that it was time to get away from all semblance of concrete structures, and move to full tree-only structures instead of my generated domain objects.
Approach Number Two: Pure Composite
Ah, fluidity, I thought I must be a genius or something. Right. Well, I tried my hand again at running with the Composite pattern. I created a composite with an ArrayList based node structure, which technically worked, and did allow for fluidity, but when I ran it with ANY value types at all, I was seeing the performance cut substantially. Well, that about sucked, so I figured I would look around and see how to handle this. I had read about boxing and casting before, and thought that, though people complained about performance, it couldn’t be all that substantial. Well, I was wrong of course. “C#, unified types! Some crock!” I cursed. How in the world would I go about doing this without having to cast everything?
What I did:
I decided to stick to the basic composite design, but instead of using the data values themselves in the nodes, I created ‘typed nodes’ like StringNode and IntegerNode, all of which implemented a common interface like INode { NodeTypeEnum GetType(); }. That way, I figured, I’d be able to get the type of the nodes in any traversal by just invoking GetType, and I’d know what I had without having to do any sort of reflection-based typeof() bore. Well, that did work, sort of. I could traverse the trees of these nodes now, but when it came time to get the data out, I couldn’t easily automate it, because I had no idea whether each node would be a reference type or a value type. I couldn’t use generics, or so I thought, because I didn’t know what the receiver would be: I can’t say X = Y without knowing what type I have to declare for X. Sure, System.Object would cover it, but I would still have to get the type and cast it EVERY TIME. That wasn’t any good either.
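To make the typed-node idea concrete, here is a rough sketch of what such a tree might look like. The names StringNode, IntegerNode, INode and NodeTypeEnum come from the description above; everything else (CompositeNode, the NodeType property used in place of a GetType() method so it doesn’t hide Object.GetType, and the NodeDump helper) is invented for the example and is not the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public enum NodeTypeEnum { String, Integer, Composite }

public interface INode
{
    // Tells a traversal what kind of node it is holding, without reflection.
    NodeTypeEnum NodeType { get; }
}

public sealed class StringNode : INode
{
    public readonly string Value;
    public StringNode(string value) { Value = value; }
    public NodeTypeEnum NodeType { get { return NodeTypeEnum.String; } }
}

public sealed class IntegerNode : INode
{
    public readonly int Value;
    public IntegerNode(int value) { Value = value; }
    public NodeTypeEnum NodeType { get { return NodeTypeEnum.Integer; } }
}

public sealed class CompositeNode : INode
{
    public readonly List<INode> Children = new List<INode>();
    public NodeTypeEnum NodeType { get { return NodeTypeEnum.Composite; } }
}

public static class NodeDump
{
    // Walking the tree is easy, but getting a value out still needs
    // a switch and a cast for every node type.
    public static string Describe(INode node)
    {
        switch (node.NodeType)
        {
            case NodeTypeEnum.String:
                return "\"" + ((StringNode)node).Value + "\"";
            case NodeTypeEnum.Integer:
                return ((IntegerNode)node).Value.ToString(CultureInfo.InvariantCulture);
            case NodeTypeEnum.Composite:
                List<string> parts = new List<string>();
                foreach (INode child in ((CompositeNode)node).Children)
                    parts.Add(Describe(child));
                return "[" + string.Join(", ", parts.ToArray()) + "]";
            default:
                throw new ArgumentException("Unknown node type");
        }
    }
}
```

The switch in Describe is exactly the sticking point: the enum says what each node is, but every consumer still has to branch on it and cast before it can touch the value.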
Approach 3: The Visitor pattern
Ok so, now I was learning a ton (from my mistakes and misconceptions constantly running me into the ground), but I was still not there yet. I hadn’t found the way to do this. Fluidity still eluded me. I thought that someone must have tried this before, so I went to the Microsoft newsgroups and asked how someone else might do it. I got one discourse about how boxing probably wasn’t the issue, and that it was really ok to cast all over… which I ignored, but I shouldn’t have, as I later found out. Then a very kind consultant from TeamB helped me out by pointing me at the Visitor pattern. I could take these typed nodes, and add the ability to Accept() a visitor at each one, and then pass it on to the children in the tree. I wrote my own attempt at it, which technically was the visitor pattern, and had a half-cocked use of C# generics, but I was stuck having to write a new type of visitor for everything.
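For readers who haven’t met the Visitor pattern, a minimal sketch of it over typed nodes might look like the following. This is a reconstruction under assumptions, not the original attempt: the Node base class, the INodeVisitor interface, and the SumVisitor example are all made up here, and only the StringNode and IntegerNode names and the Accept() idea come from the text.

```csharp
using System.Collections.Generic;

// One Visit method per node type; TResult is whatever the operation returns.
public interface INodeVisitor<TResult>
{
    TResult VisitString(StringNode node);
    TResult VisitInteger(IntegerNode node);
    TResult VisitComposite(CompositeNode node);
}

public abstract class Node
{
    // Each node hands itself to the matching Visit method, so no casts are needed.
    public abstract TResult Accept<TResult>(INodeVisitor<TResult> visitor);
}

public sealed class StringNode : Node
{
    public readonly string Value;
    public StringNode(string value) { Value = value; }
    public override TResult Accept<TResult>(INodeVisitor<TResult> visitor)
    {
        return visitor.VisitString(this);
    }
}

public sealed class IntegerNode : Node
{
    public readonly int Value;
    public IntegerNode(int value) { Value = value; }
    public override TResult Accept<TResult>(INodeVisitor<TResult> visitor)
    {
        return visitor.VisitInteger(this);
    }
}

public sealed class CompositeNode : Node
{
    public readonly List<Node> Children = new List<Node>();
    public override TResult Accept<TResult>(INodeVisitor<TResult> visitor)
    {
        return visitor.VisitComposite(this);
    }
}

// Example operation: add up every integer in the tree.
public sealed class SumVisitor : INodeVisitor<int>
{
    public int VisitString(StringNode node) { return 0; }
    public int VisitInteger(IntegerNode node) { return node.Value; }
    public int VisitComposite(CompositeNode node)
    {
        int total = 0;
        foreach (Node child in node.Children)
            total += child.Accept<int>(this); // pass the visitor on to the children
        return total;
    }
}
```

The casts are gone, but the cost shows up elsewhere: every new operation on the tree (summing, serializing, searching) is another visitor class.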
The breaking point:
I almost gave up and decided to implement the concrete objects, and manually produce the IXmlSerializable for each object, and say, “there is no good way!” Actually I had begun the process of doing so, when I remembered that JSON.org had some other way to represent objects. JavaScript Object Notation. Well, ok, last time I checked, the only interfaces to that used reflection all over, and expected that your objects already existed in concrete form. That didn’t seem too helpful at the time. But when I finally hit upon the latest version of Json.NET by Newtonsoft, I saw they had made a forward-only, stream-based reader and writer for it. Wow, I wonder if I can use that?!? So I downloaded it and immediately wrote up a quick set of test fixtures for it. Serialized 1000 objects via the writer in 14 milliseconds. Bingo! I tried deserializing it: 1000 objects in 34 ms, great! How was this voodoo happening? I was stunned.
Shocked. So absolutely flabbergasted that I thought that the sky might actually be falling on my head one brick at a time. The library is open source, so it only took a minute for me to see that it was using the composite design pattern with (gasp!) System.Object properties EVERYWHERE! I now knew that the gentleman on Microsoft’s newsgroups was not just talking nonsense. He was right: boxing and casting weren’t even factors in such small numbers of operations. Well, ok, a little, but not at all the rate-determining step. I finally stumbled onto the root cause: in my original approach, I was buffering the entire object before doing anything with it, when I should have been processing it one ‘node’ at a time with a stream. I had done this part right with the custom IXmlSerializable, but I was still reading things ‘domlike’, starting at the top and walking down, whenever I wanted to access any of the values, so I didn’t see the advantage. I had pulled the wool over my own eyes, because I KNEW that boxing and casting were deadly… or so I thought. I can easily see how, if I were doing this for hundreds of thousands or even millions of operations, the added hit would be a factor, but for me, using relatively modest sized objects and collections of fewer than 10,000 items, it barely affected anything at all.
The moral of the story, after all is said and done, is a twofold lesson:
One: use a stream instead of a document whenever possible. One nice side effect of a stream is that it can process data before it’s even all arrived. You just have to send each token as it comes in into the writer or reader in question. (System.Net.Sockets works well with this.)
Two: Only change one variable at a time. I threw the scientific method out the window when I changed my whole model in two major ways, so I wasn’t able to tell which was the good and which was the bad.
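The first lesson can be illustrated with the XML classes that ship with .NET. This is just a sketch under assumptions: the StreamVsDocument class and the little items/item format are invented for the example, and XmlDocument and XmlReader stand in for the document and stream styles (it is not the Json.NET code from the story).

```csharp
using System.IO;
using System.Xml;

public static class StreamVsDocument
{
    // Document style: the whole tree is built in memory before any value is read.
    public static int SumWithDocument(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        int total = 0;
        foreach (XmlNode item in doc.SelectNodes("/items/item"))
            total += XmlConvert.ToInt32(item.InnerText);
        return total;
    }

    // Stream style: each <item> is handled as soon as the reader reaches it,
    // so the input could just as well be arriving over a socket.
    public static int SumWithReader(TextReader input)
    {
        int total = 0;
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                {
                    // Reads the value and moves past </item>, so no extra Read() here.
                    total += reader.ReadElementContentAsInt();
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return total;
    }
}
```

Both methods return the same total for the same input; the difference is that the reader version never holds more than the current node, and it can start working before the last byte has arrived.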
A little PS for you: JSON is really cool! I figured out that I can create a TCP socket and send a stream of bytes across in JSON (optionally encrypted) to any platform that lets me do sockets. Pretty much everyone has a library of some sort for turning JSON into objects, so it makes a nice, succinct, relatively unambiguous specification of an object, without some of the strange nuances that WSDL forces us to work around for interop with XML. (Like the classic example of .NET to Java apps: you can’t use unsigned types, and you have to send ints from .NET as strings, decorated in the XML and XSD as being of type integer.) Give JSON a go if you feel up to learning something new!
