What’s with the widgets?
October 22, 2007 on 10:39 pm | In Uncategorized | No Comments
Ok, I’m sort of almost a blogger, and I write things that sometimes roughly resemble blog postings. I admit it takes a certain level of vanity to do that. I like to think I’m not being too ridiculous, and I admit that I only have a very narrow crowd of people who even care what I have to say… Most of them are people I collaborate on projects with, or people who look at it to be nice to me… but certain websites, in the endless pursuit of I’m not sure what, are getting ridiculous today. It’s vain enough to put a ‘Slashdot this!’ or ‘Digg It!’ widget on your page, but if you’re ‘sort of a big deal’ then I guess that makes sense. HOWEVER, things can quickly get out of hand, and I kid you not, I actually saw this on a real live site…

Great Idea! That’s right, for your convenience, you can digg it, facebook it, del.icio.us it, newsvine it, stumbleupon it, reddit, yahoo it, fark it, technorati it, furl it, ma.gnolia it, embed it in your site, or just spam the hell out of your closest friends. All with the click of a button. Brilliant! But still a little too much work for me.
I’m going to invent my own button. I’ll put it at the bottom of my posts and sign up for adsense… I’ll call it ‘Give Me Ad Revenue!’, and it will do all of those things with one click. It’s honest, straight to the point, and best of all, it saves you the trouble of clicking on all of the buttons individually. How Convenient!

C# Template for GoldParser Builder and the Morozov Engine
October 15, 2007 on 12:58 am | In .NET Coding | 1 Comment
I decided to take a shot at creating a template for Gold Parser Builder that implements a C# instance of a parser making use of the Morozov Engine, which is known in the Gold Parser world as the fastest of the C# engines.
Thanks to Devin Cook for making this all possible by creating Gold Parser Builder, and also to Vladimir Morozov for his contribution of the great Gold engine for us C# geeks. (A few of the structure-like features of this template were.. uh.. borrowed from the Calitha template, but nothing really ‘codey’, just ‘structurey’.)
You can get the template here, in case you’re interested: http://davedolan.com/downloads/C-SharpMorozovNet2.pgt
Updated: I forgot to add the definitions of SymbolException and RuleException to the original download (it was correct in the version that went to the list; I just got a little happy cleaning up the script and … chopped two things too many). It is FIXED now.
I’ve also added a .Net 1.x compatible version of this template: http://davedolan.com/downloads/C-SharpMorozovNet1.pgt
Don’t even ask about .Net 3, and if you were thinking ‘what? why not?’ then perhaps you should not be making parsers, even with templates.
Lock Free Coding Addendum
October 3, 2007 on 10:38 pm | In .NET Coding | 3 Comments
Ok, so something strikes me… the only thing that makes sense in multi-client lock-free coding is to have a shared, unordered data structure between all process elements. This strikes me as a little odd because you have to ensure the thread safety of this shared structure… I’m going to have to read more on this, because I’m now thoroughly confused… Maybe lock free isn’t the best methodology for client-server paradigms (that don’t broadcast data to each client simultaneously, and must maintain open ‘indexed’ TCP connections). Sure, you can safeguard a collection, like the hash structure of TCP handles… but… don’t you have to put them in critical regions first? This is what I get for learning this stuff in my “spare time”… you can see how often I get that.
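To make the worry concrete, here is a minimal sketch (in Python rather than C#, purely for brevity) of a shared table of connection handles where every access goes through a critical region. The class name `ConnectionRegistry` and the string "handles" are made up for illustration; this is the lock-based baseline the question is about, not a lock-free structure.

```python
import threading


class ConnectionRegistry:
    """Hypothetical shared table: client id -> connection handle."""

    def __init__(self):
        self._lock = threading.Lock()   # guards every access to the table
        self._handles = {}

    def register(self, client_id, handle):
        with self._lock:                # critical region
            self._handles[client_id] = handle

    def lookup(self, client_id):
        with self._lock:
            return self._handles.get(client_id)

    def remove(self, client_id):
        with self._lock:
            return self._handles.pop(client_id, None)


registry = ConnectionRegistry()


def accept_client(n):
    # Stand-in for accepting a TCP connection; the "handle" is just a string.
    registry.register(n, f"socket-{n}")


threads = [threading.Thread(target=accept_client, args=(n,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every thread that touches the table serializes on the one lock, which is exactly the cost a lock-free design is trying to avoid.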
Comment added After the Fact: I apologize for sounding like such an idiot on this subject, I’m just trying to get my head into the multi-core ready ideas, and I’m having a bit of a time at it… All of your emails and comments are very helpful. So Thanks, and I’m trying, I’m really not as dumb as my (lack of) conceptualization of Lock-Freeness would suggest.
Dirty Data?
July 23, 2007 on 1:42 pm | In General Programming | 1 Comment
I’ve heard from all sides lately about how there is a growing problem with ‘Dirty Data’…
What is Dirty Data? According to IBM, and various other folks, Dirty Data is data that is extracted from the real world that cannot be nicely integrated into relational models because it’s not organized, or doesn’t fit a particular model. Luckily for us, there are various products out there to ‘clean it’, or techniques recommended by the various database manufacturers to squeeze ‘clean’ over ‘dirty data.’
First of all, WHAT?! Do we work for computers or do they work for us? There is no such thing as dirty data, the fact of the matter is that if your system pukes because the data isn’t in the right format, then you’re using the wrong system to process it. The real world isn’t just ‘full of dirty data,’ it’s full of data, and computers are here to help us process it. It’s completely ridiculous for the database companies to say that my data is dirty.
Another sore point with me is the fact that they say that performance problems can stem from ‘unreasonable relationship tracking.’ I want to know too many facets of my data, and it’s not ‘database correct’ of me to want my data and to be able to eat it too, from every angle, in real time. Pardon me, but the fact that YOUR relational database doesn’t handle this kind of scenario well doesn’t mean that my request is unreasonable. If I want to be able to efficiently track back to the original record by any one of any number of my many to one or many to many, or one to many relationships, that’s my requirement, not your place to tell me that it’s unreasonable of me to expect that it can be done.
The problem is that relational database proponents, and in particular the big three commercial implementors, are all engaging in this ‘you can’t do that in real life’ and ‘it’s your fault our apps are slow’ propaganda, and I won’t stand for it.
Just as an example, and I’m sure it’s not the only way, I’m using Relavance, an associative model engine which has none of these limitations.
Even Relavance aside, or even the entire associative model aside, it’s still really stupid of any real computer scientist to think, and especially to decree that certain problems are just not to be solved using computers because they are not well suited to modern technology. If that were the attitude that people had throughout history, then we’d still be toggling in code on the front panels of computers the size of Olympus Mons. Microsoft may be the market leader, and Oracle and IBM may sell a lot of bits and bytes, and tow very heavy loads in the marketplace, but they are not here to tell me what is and is not possible in computer science based on their own models of the universe.
</rant>
Lock Free Coding
July 12, 2007 on 2:16 pm | In .NET Coding | 2 Comments
So I’ve been looking around. One of the things I’ve been messing with is trying to determine how to take advantage of an I/O Completion Port strategy for a client-server program.
What’s an I/O Completion Port? Well, it’s basically a queue that fires an event when something is either added to it or comes out of it. The benefit of them is that you can have code wired up through delegates (assuming we’re talking about C#) to respond in some way when a ‘message’ comes into the queue. It can eliminate the need for blocking the thread, or coding an endless loop (though under the hood I guess most every program is some kind of endless loop.)
The benefit of this is I can treat my processors as a pool of thread runners, so for example, when an event fires, I can spawn another thread (provided I’m not at the limit yet) to go handle it, and it will execute in parallel on the other core while the one that captured the input is still, well… capturing input and handing off to other threads. Actually you can have several different kinds of ‘listeners’ hooked up to a completion port each doing something when a different conditional factor is met based on the thing that’s going into or out of the queue.
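As a rough illustration of the model described above (not a real I/O Completion Port, which is a Windows kernel object), here is a small Python sketch of a queue with a pool of worker threads and per-message-kind listeners. The names `CompletionQueue`, `on`, `post` and `drain` are invented for this sketch. Note that the workers here do block on the queue; it is the code posting messages that never waits.

```python
import queue
import threading


class CompletionQueue:
    """Toy stand-in for a completion port: a queue plus a worker pool."""

    def __init__(self, workers=2):
        self._queue = queue.Queue()
        self._handlers = {}             # message kind -> list of callbacks
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def on(self, kind, handler):
        # Register a listener; do this before posting messages of that kind.
        self._handlers.setdefault(kind, []).append(handler)

    def post(self, kind, payload):
        # Returns immediately; some worker picks the message up later.
        self._queue.put((kind, payload))

    def _run(self):
        while True:
            kind, payload = self._queue.get()
            try:
                for handler in self._handlers.get(kind, []):
                    handler(payload)
            finally:
                self._queue.task_done()

    def drain(self):
        # Wait until every posted message has been handled.
        self._queue.join()


results = []
cq = CompletionQueue(workers=2)
cq.on("read", lambda data: results.append(("read", data.upper())))
cq.on("close", lambda data: results.append(("close", data)))
cq.post("read", "hello")
cq.post("close", "client-1")
cq.drain()
```

The order in which the two messages are handled is not guaranteed, since either worker may pick up either one.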
Problem… HOW do I get result messages back to clients after I’m done processing them? I can easily see how it works if your ‘result’ message is something that, say, prints something to the console… But what if it’s a TCP socket where the other endpoint is in Egypt and I somehow now have to phone home telling it that it’s ok to pick up the results… I can say, sure… the client can block, and the server can block a thread waiting on the output, but then we have less of an advantage to being able to use the whole IOCP model to begin with… So… I know I’m just missing something, and if anyone out there might have an idea of what that is, let me know.
I’ve run into things like COmega, and the CCR, and even this nice little open source gig on CodePlex, but in all of the examples, the data is like a multiple-input, single-output spigot (printing to the console or broadcasting a single message to multiple clients polling a shared memory queue). What am I missing?
In case I seem extraordinarily dense, let’s say I want to run a database server on the back end, and have my clients connect to an IOCP-based command processor, which returns data to the caller. When each client issues a query, I want to get back the data that applies to that client, but having the client keep a listener open seems odd, and having the client poll around all over ALSO seems odd. Async just doesn’t seem like a good idea when we’re trying to get data back from the operation… So then, how do I take advantage of multi-core machines if they’re never any better than lock, block and stock systems?
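One possible shape of an answer, offered only as a guess and not something from the post or its comments: let each queued request carry a reference to the connection it arrived on, so whichever worker finishes the job writes the reply back on that same connection. The sketch below uses Python's `concurrent.futures`; `FakeConnection`, `run_query` and `submit` are hypothetical stand-ins, and no real sockets or database are involved.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for an open client socket."""

    def __init__(self, client_id):
        self.client_id = client_id
        self.sent = []

    def send(self, message):
        self.sent.append(message)


def run_query(client_id, sql):
    # Stand-in for the real database work.
    return f"rows for {client_id}: {sql}"


def submit(pool, conn, sql):
    # The request keeps hold of its own connection, so the reply is
    # routed to the right caller without polling or a shared outbox.
    future = pool.submit(run_query, conn.client_id, sql)
    future.add_done_callback(lambda done: conn.send(done.result()))
    return future


connections = [FakeConnection(f"client-{n}") for n in range(3)]
with ThreadPoolExecutor(max_workers=2) as pool:
    for conn in connections:
        submit(pool, conn, "SELECT 1")
# Leaving the with-block waits for all submitted work to finish.
```

The client still has to keep its connection open to receive the reply, but no server thread sits blocked per client while the query runs.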
I must be Slow… (DLR)
June 2, 2007 on 10:34 pm | In .NET Coding | No Comments
Ok, well I was so anxious to hear about the DLR, that I forgot to look around for articles on it. I missed this one [The One True Object (Part One)] by a month.
And this one [First DLR Talk On The Web - Live (pre-recorded!)].
And, wow, I was really sleeping folks, because this one [A Dynamic Runtime DLR] is even days older than those two.
I’m not just a link poster, honest. I do a lot of writing too, it’s just that I’m most definitely not an expert on this subject, even though I’m trying here. I’m so happy stuff is finally coming out, but I just don’t have all day to play like I wish I did, so I have to rely on blogisms (which, I have to admit, coming straight from the library authors’ desks, are great), but I have a feeling they’re saving the best for a book or something. Oh yeah, and they’re probably coding. Go figure.
I’m not so sure that I would be able to work under those conditions. Picture yourself in this scenario: It’s like the world is watching with ticking stop watches, just waiting for you to stop reading that google, I mean… er… msn.. news article. “Check your email later, man, we’re waiting for the DLR with examples and doco!!!” Yeah. See?
So my hats off to Jim for taking the time to write about this. Whatever happened to that John Lam cat? Oh yeah, he’s a rock star now! (In all seriousness, I’d probably live in a hole if I were hired off to Microsoft on such a high profile project.)
In my own various attempts to write a ‘dynamic’ language of my own, I’ve always tried to do what the folks do in C++ land.. they make tagged unions or wrappers for data types and objects, and then have some sort of syntax tree for dealing with the cases of each… Even take a look at our friend TreeCC, which is, at least for its day, one of the most excellent tools for generating AST nodes en masse (though not necessarily for dynamic languages, but just for argument’s sake); it even does the tagged union thing. So naturally I hadn’t even considered the possibility that building actual CLR types (a la the one true object) in .NET meta-data and IL is the way to go… I just figured I ought to write my own ‘runtime support’, which I figured must include a type system that marshals to the real type system. Hah. This is where I learned a valuable lesson. “Look at the source to IronPython.”
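For anyone who hasn’t seen the tagged-union approach, here is a minimal sketch of it in Python (not C# or C++, just to keep it short). The names `Tag`, `Value` and `add` are invented for the illustration: every runtime value is a wrapper carrying a tag, and each operation has to switch on the tags itself, which is the homemade type system that real CLR types make unnecessary.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Tag(Enum):
    INT = auto()
    STR = auto()


@dataclass(frozen=True)
class Value:
    """A wrapped runtime value: the tag says how to read the payload."""
    tag: Tag
    payload: object


def add(left, right):
    # The interpreter dispatches on the tags by hand for every operation.
    if left.tag is Tag.INT and right.tag is Tag.INT:
        return Value(Tag.INT, left.payload + right.payload)
    if left.tag is Tag.STR and right.tag is Tag.STR:
        return Value(Tag.STR, left.payload + right.payload)
    raise TypeError(f"cannot add {left.tag.name} and {right.tag.name}")


total = add(Value(Tag.INT, 2), Value(Tag.INT, 3))
greeting = add(Value(Tag.STR, "one true "), Value(Tag.STR, "object"))
```

Every new type means another tag and another branch in every operation, which is where the appeal of emitting real types comes from.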
In particular, I’d take a look at the illustration in UserTypes.cs. The one comment that sticks out nice and blonde at a brunette convention:
“UserType represents the type of new-style Python classes (which can inherit from built-in types). ”
Who would have thought that some of the most interesting information in a comment would be the aside in parentheses?
As I said before…
May 1, 2007 on 4:01 pm | In General Programming | No Comments
I mentioned in the middle of one of my other posts that I was preparing an article on Ruby. I sort of alluded to the fact that I’d post a link, so I suppose I’ll do that now (although it’s more likely that you’ve come to this blog FROM the article than the other way around): Ruby for C# Geeks (DevX.com). It’s not really all that in-depth; it’s just meant to make C# folks aware of what else is out there.
Programmers at the old folks home in 2065
April 28, 2007 on 1:01 am | In .NET Coding | No Comments
“Back in my day… We typed in complete sentences… uphill both ways.”
The DLR for .NET?
April 25, 2007 on 1:29 pm | In .NET Coding | 1 Comment
A post on ZDNet points out that Microsoft is releasing a dynamic language layer intended to run on the CLR. With the acquisition of John Lam and Jim Hugunin, they pretty much have their own little market cornered… perhaps this is why Mr Lam wasn’t releasing any more drops of RubyCLR of late…
Very cool, and I’ll be very excited to have a look at this!
The Associative Model of Knowledge
April 10, 2007 on 1:27 am | In General Programming | 2 Comments
Forget everything you know about storing data. Particularly the bits about retrieving it after you’ve stored it. I said forget it, and I know you didn’t. So, try again, forget it.
Rather than try to start by comparing this method to what you already know how to do, I’m going to just pretend that you’ve always wondered how to store data, and never have done it before.
Applications acquire data. They also retrieve data and display it for users, or do something otherwise useful with it. Sometimes it’s the users who give the data to the application directly. Actually that happens quite a lot.
We store different bits of information in different places, and we do so for the purpose of organizing things that go together. We go to all of this bother because we’re likely to want to get them back out together. What that means to you and me as developers is that we spend quite a lot of time thinking about the kinds of things we want to get out of the system before we go putting them in. Arguably we try to think of everything we’d like to get out of it beforehand, and that’s how we’ll know we’re done with our model: when it provides us with a place and method to put everything we need to know into the storage and get it back out on demand. We’d use this information about what we need to get out later as the basis for a ‘schema’: a layout of how all the data looks when it’s just sitting around waiting for someone to get it back out again.
Great! That sounds simple enough! Ah, but life isn’t so forgiving, and neither is programming. Today we think we know what we want, and that’s it, but tomorrow, that all might change. In fact, if you ask enough pointy haired bosses about what they want from the data they’re paying you to store in a database, then you’re likely to get a few things in common, and a bunch of special requests. Some of what they want is ‘stuff’ that needs to be in there, but they also want ‘ways to organize things’, especially on the way out. Some people would call these differing requirements views, or reports.
Fine, so we can’t possibly think of all of the kinds of views and reports, so we might as well just not think about them ahead of time, right? We’ll just store everything as generally as possible while leaving some things together that obviously go together in pretty much all cases, and that way, we can let the person getting things out tell us what they want at that time, and never worry about it again. Especially not when that would push the application delivery date back! Right? WRONG!
When we store data, each thing we’d like to know about is called a Concept. And all of the different aspects of the concepts are different sorts of ‘lights in which we’d like to view these concepts,’ which we might term a Context. So in life we have concepts, or things or ideas of interest, and contexts, which are the glasses through which we view these concepts.
When we want to store a concept in our knowledge store, we put it in there, and designate it to be accessible in a particular context. Of course it’s not really friendly to the user to just pull every concept out every time we want to retrieve information, so the idea of limiting a concept to a particular context seems to make a lot of sense. We do this in our everyday conversation; sometimes even a single word can mean many things! Surely that’s an idea we’d have difficulty describing in a file… Or is it?
Lucky for us, Everyone in the Universe uses Associative Knowledgebases, and everyone knows that in an associative knowledgebase, you add concepts to the base, associated with a particular context, and from there you are able to retrieve concepts relative to the particular context of reference. You can have a single concept be available in as many contexts as are needed, and you don’t need to keep putting multiple representations of the same concept in over and over again to get the job done. Each concept has a bit of data that goes with it, and it can be associated to other concepts to represent what sorts of ideas it shares with its associates.
Now an association isn’t just a one type deal: you can associate on different ‘axes’ of association to represent different aspects of what the two concepts have in common. Who knows, two concept items may also be associated to the same context, so when you go to retrieve information relative to a context, you’d get them both anyway!
As you may have picked up, each ‘concept’ can be represented by a single ‘item’ or a set of ‘items’, which are tiny units of free-floating information. Every item is associated to at least one context, and some are associated to thousands of contexts, just depending on what you need from it. Each item is universally addressable by its own unique four-ordinal vector. Each item stores its own value, as well as the vectors of each of its associates, grouped by which axis they happen to be associated on. The items can be associated to other items, or to entire contexts; it’s completely up to us! We can associate multiple items on the same axis or split them up, and there is virtually no limit to the number of associates we can have on any particular axis. (Could be anywhere from zero to several million or even sometimes several billion other concepts.)
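To picture the item structure just described, here is a small Python sketch. It is only a guess at the shape of the data, not how Relavance (or any real engine) stores things: `Item`, `KnowledgeBase`, the axis names and the way addresses are handed out are all assumptions made for illustration, and a context is modelled as just another item.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Item:
    address: tuple      # the item's unique four-ordinal vector
    value: object       # the item's own bit of data
    # axis name -> addresses of the associates on that axis
    associates: dict = field(default_factory=lambda: defaultdict(set))


class KnowledgeBase:
    def __init__(self):
        self._items = {}
        self._next = count(1)

    def add(self, value):
        # Assumed addressing scheme: only the last ordinal varies here.
        address = (0, 0, 0, next(self._next))
        item = Item(address, value)
        self._items[address] = item
        return item

    def associate(self, a, b, axis):
        # Each side records the other's address, grouped by axis.
        a.associates[axis].add(b.address)
        b.associates[axis].add(a.address)

    def associates_of(self, item, axis):
        return {self._items[addr].value for addr in item.associates.get(axis, ())}


kb = KnowledgeBase()
fruit = kb.add("Fruit")              # a context is just another item here
apple = kb.add("Apple")
pear = kb.add("Pear")
kb.associate(apple, fruit, "in context")
kb.associate(pear, fruit, "in context")
```

An item with no associates on an axis simply has no entry for it, so nothing has to be allocated up front.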
Since we have this wildly variable method of associating one uniquely addressable item with any number of uniquely addressable items, on any number of axes, it’s lucky for us that we don’t need to know how many of what kind of associates we can have right up front. Even items in the same context can have different numbers of associates, and there is no negative consequence to the other items. It’s cool that nobody decided that we’d have to allocate null associates to represent items without associates, because when we got over a hundred, well that would just be ridiculous! We certainly can’t easily convert items into something that fits completely into an Excel spreadsheet! That’s quite alright, because we have a lot at our disposal now. (And if we REALLY like Excel spreadsheets, then we can limit the information coming out in such a way that it fits, complete with all of the totally unnatural quirkiness of any spreadsheet: some empty columns in some rows, and flat tuples that are all but meaningless without some formal explanation of what they are and what they mean. Like any Excel spreadsheet, pointy haired bosses love those.)
Of course we can predetermine some of the ways in which we’ll have to establish associations based on the context we wish to use to represent our particular concept, and some of them will be common to all items in the same context. These we’ll call ‘explicit mappings’ of information. For example if you’re adding the concept of an Apple to the Fruit context, you’ll likely want to also associate it to a member of the Plants context to indicate what it grows on (we can even say that this association exists on the ‘Fruit Grows On’ axis.) These are simple, and the rules are always the same.
We also have the ability to examine concepts we wish to associate with a context AS THEY ARE BEING ADDED TO THE BASE and apply a set of rules to them to determine if, while we’re at it, we ought to associate them with any other concepts or contexts. In this way, we can define a list of rules which dynamically determine which other contexts and concepts we would like to associate with this concept, based on, well… what it is. Contexts and concepts don’t have a set structure. They can be anything we like them to be, and represent any aspect we’d like them to. For example, sure we have a context called Fruit which is associated to all of the things that you and I call “Fruit” like Apples, Pears, and Bananas (a personal favorite of mine!) but you can also talk about a context that’s more abstract like ‘Found in Africa’. So all of the things we know about that are ‘found in Africa’ can be associated to the Found in Africa context. That means we can have some of the Fruits, and some of the Plants all be associated to the same context of ‘Found in Africa’, so when we ask the Knowledge base for what sort of concepts it knows about in the context of ‘found in Africa’ we get both some Fruits and some Plants. Isn’t data storage great!?!
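Here is a sketch of the rules-on-insert idea in Python. The class `ContextStore`, the `grows_in` fact and the sample data are all made up for illustration; the point is only that the rule runs once, when a concept is added, and files it under any extra contexts it qualifies for.

```python
from collections import defaultdict


class ContextStore:
    """Hypothetical store: context name -> names of associated concepts."""

    def __init__(self):
        self.contexts = defaultdict(set)
        self.rules = []                 # (predicate over facts, context name)

    def add_rule(self, predicate, context):
        self.rules.append((predicate, context))

    def add(self, name, context, **facts):
        # Explicit mapping: the caller says which context this belongs to.
        self.contexts[context].add(name)
        # Rule-driven mappings: decided from what the concept is.
        for predicate, extra_context in self.rules:
            if predicate(facts):
                self.contexts[extra_context].add(name)


store = ContextStore()
store.add_rule(lambda facts: "Africa" in facts.get("grows_in", ()), "Found in Africa")

# Illustrative data only.
store.add("Banana", "Fruit", grows_in=["Africa", "Asia"])
store.add("Pear", "Fruit", grows_in=["Europe"])
store.add("Baobab", "Plants", grows_in=["Africa"])
```

Asking for the ‘Found in Africa’ context now returns a Fruit and a Plant together, without either one having been filed there by hand.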
Well, pointy haired bosses are never satisfied, so when they see you can do such wonderful cartwheels with individually simple data, they still want more.
“I’d like to be able to find what sort of Fruits and Plants we can find in Africa that are legal to export to the United States.”
Just when we thought we’d thought of everything, we now have to add something else!
“Ok,” we tell the boss, “No Problem!” It’s a good thing that we don’t have to completely redesign the knowledge base just to add a new context for some concepts we already have! We can just create a new context called “Exportable To the US” and associate the proper concepts with that context as well. Great!
So, it’s also relatively simple now to construct ‘queries’ for this kind of data store. When we write the application to show the boss what sorts of Fruits and Plants found in Africa are exportable to the US, then we just have to retrieve all of the items associated with both the Exportable To the US context and the Found in Africa context, and then make sure that the results are either Fruits or Plants. It’s rather like drawing a Venn diagram. Since we used rules to make the associations on the way into the storage system, we don’t have to do anything but grab the ready and waiting list of associates we need on the way out! Sure it takes a tiny bit of effort when the concepts are inserted into the store for the rules to figure out where they are to be associated, but that all runs asynchronously, and we didn’t care about it when we added it. Only now when the boss comes to get his answers does the effort really matter! So, with almost no effort at all, we’ve given the boss what he wants. We can only hope that there aren’t too many more bosses to satisfy today, because we were supposed to go out for a drink after work. It’s a good thing everyone uses Associative Knowledgebases, or who knows how long we’d be here?!?
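The Venn diagram can be written down directly as set operations. In this Python sketch the contexts are plain sets of concept names, the `query` helper is invented for the example, and the membership data (including what is exportable) is made up purely for illustration.

```python
# Illustrative data only: context name -> concepts associated with it.
contexts = {
    "Fruit": {"Banana", "Pear", "Mango"},
    "Plants": {"Baobab", "Fern"},
    "Found in Africa": {"Banana", "Mango", "Baobab"},
    "Exportable To the US": {"Banana", "Baobab", "Pear"},
}


def query(contexts, all_of, any_of):
    # Concepts that sit in every one of the `all_of` contexts...
    required = set.intersection(*(contexts[name] for name in all_of))
    # ...and in at least one of the `any_of` contexts.
    allowed = set().union(*(contexts[name] for name in any_of))
    return required & allowed


answer = query(
    contexts,
    all_of=["Found in Africa", "Exportable To the US"],
    any_of=["Fruit", "Plants"],
)
```

Because the associations were made on the way in, the query itself is nothing more than intersecting lists that are already sitting there.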
A Post Facto: I forgot to mention, this is a description of a working system. It’s called Relavance, and it really does all of the wonderful things I just described as if they were hypothetical.
