Dirty Data?

July 23, 2007 on 1:42 pm | In General Programming | 1 Comment

I’ve heard from all sides lately about how there is a growing problem with ‘Dirty Data’…

What is Dirty Data? According to IBM and various other folks, Dirty Data is data extracted from the real world that cannot be nicely integrated into relational models because it’s not organized, or doesn’t fit a particular model. Luckily for us, there are various products out there to ‘clean it,’ and techniques recommended by the various database manufacturers to turn ‘dirty data’ into ‘clean.’

First of all, WHAT?! Do we work for computers or do they work for us? There is no such thing as dirty data. The fact of the matter is that if your system pukes because the data isn’t in the right format, then you’re using the wrong system to process it. The real world isn’t just ‘full of dirty data,’ it’s full of data, and computers are here to help us process it. It’s completely ridiculous for the database companies to say that my data is dirty.

Another sore point with me is that they say performance problems can stem from ‘unreasonable relationship tracking.’ Apparently I want to know too many facets of my data, and it’s not ‘database correct’ of me to want to have my data and eat it too, from every angle, in real time. Pardon me, but the fact that YOUR relational database doesn’t handle this kind of scenario well doesn’t mean that my request is unreasonable. If I want to be able to efficiently track back to the original record by any one of my many-to-one, many-to-many, or one-to-many relationships, that’s my requirement, and it’s not your place to tell me that it’s unreasonable to expect that it can be done.

The problem is that relational database proponents, and in particular the big three commercial implementors, are all engaging in this ‘you can’t do that in real life’ and ‘it’s your fault our apps are slow’ propaganda, and I won’t stand for it.

Just as an example, and I’m sure it’s not the only way, I’m using Relavance, an associative model engine which has none of these limitations.

Even Relavance aside, or even the entire associative model aside, it’s still really stupid of any real computer scientist to think, and especially to decree, that certain problems are just not to be solved using computers because they are not well suited to modern technology. If that were the attitude that people had throughout history, then we’d still be toggling in code on the front panels of computers the size of Olympus Mons. Microsoft may be the market leader, and Oracle and IBM may sell a lot of bits and bytes and tow very heavy loads in the marketplace, but they are not here to tell me what is and is not possible in computer science based on their own models of the universe.

</rant>

Lock Free Coding

July 12, 2007 on 2:16 pm | In .NET Coding | 2 Comments

So I’ve been looking around. One of the things I’ve been messing with is figuring out how to take advantage of an I/O Completion Port strategy for a client-server program.

What’s an I/O Completion Port? Well, it’s basically a queue that the operating system posts a notice to whenever an asynchronous I/O operation finishes. The benefit is that you can have code wired up through delegates (assuming we’re talking about C#) to respond in some way when a ‘message’ comes off the queue. It can eliminate the need for blocking a thread per operation, or coding an endless loop (though under the hood I guess most every program is some kind of endless loop).

The benefit of this is I can treat my processors as a pool of thread runners. For example, when an event fires, I can spawn another thread (provided I’m not at the limit yet) to go handle it, and it will execute in parallel on the other core while the one that captured the input is still, well… capturing input and handing off to other threads. Actually, you can have several different kinds of ‘listeners’ hooked up to a completion port, each doing something when a different condition is met based on the thing that’s going into or out of the queue.
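To make the ‘pool of thread runners’ idea concrete, here is a minimal sketch of the dispatch shape, written in Python rather than C# and using an ordinary in-process queue instead of a real completion port. The names `handle` and `run_dispatcher` are made up for illustration, and the uppercase ‘work’ is just a stand-in. One caveat: in standard CPython, threads running pure-Python CPU work do not actually execute in parallel across cores, so treat this as a picture of the hand-off, not a benchmark.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def handle(message: str) -> str:
    """Hypothetical stand-in for the real work done on a pooled thread."""
    return message.upper()


def run_dispatcher(messages, max_workers=4):
    """Pull messages off an inbox queue and hand each to a bounded pool."""
    messages = list(messages)
    inbox = queue.Queue()
    outbox = queue.Queue()  # thread-safe, so callbacks need no explicit lock

    def on_done(future):
        # Callback fired when a worker finishes (the 'delegate' in the text).
        outbox.put(future.result())

    for m in messages:
        inbox.put(m)
    inbox.put(None)  # sentinel: no more input

    # The pool caps how many handlers run at once (the thread limit).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while True:
            item = inbox.get()
            if item is None:
                break
            pool.submit(handle, item).add_done_callback(on_done)
    # Leaving the 'with' block waits for every submitted handler to finish.

    return sorted(outbox.get() for _ in messages)
```

The dispatcher loop never does the work itself; it only takes items off the queue and hands them off, which is the part of the model described above.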

Problem… HOW do I get result messages back to clients after I’m done processing them? It’s easy to see how it works if your ‘result’ is something that, say, prints to the console… But what if it’s a TCP socket where the other endpoint is in Egypt, and I somehow now have to phone home to tell it that it’s OK to pick up the results? Sure, the client can block, and the server can block a thread waiting on the output, but then we lose much of the advantage of using the whole IOCP model to begin with… So… I know I’m just missing something, and if anyone out there has an idea of what that is, let me know.

I’ve run into things like COmega, and the CCR, and even this nice little open source gig on CodePlex, but in all of the examples, the data flows like a multiple-input, single-output spigot (printing to the console, or broadcasting a single message to multiple clients polling a shared memory queue). What am I missing?

In case I seem extraordinarily dense, let’s say I want to run a database server on the back end, and have my clients connect to an IOCP-based command processor, which returns data to the caller. When each client issues a query, I want to get back the data that applies to that client, but having the client keep a listener open seems odd, and having the client poll around all over ALSO seems odd. Async just doesn’t seem like a good idea when we’re trying to get data back from the operation… So then, how do I take advantage of multi-core machines if they’re never any better than lock, block, and stock systems?
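For what it’s worth, here is one possible shape for that scenario, sketched in Python’s asyncio rather than C# (on Windows, asyncio’s default event loop is itself built on I/O completion ports). The reply simply goes back over the same connection the query arrived on, so the client neither opens a second listener nor polls, and `await` suspends the coroutine instead of blocking a thread. The line-based protocol, the `result for …` reply, and the names `handle_client`, `ask`, and `demo` are all assumptions made up for this sketch, not anything from a real database server.

```python
import asyncio


async def handle_client(reader, writer):
    """Serve one connection: read a query line, reply on the same socket."""
    while True:
        line = await reader.readline()
        if not line:  # client closed its end
            break
        query = line.decode().strip()
        # Hypothetical 'query processing': echo a canned result back.
        writer.write(f"result for {query}\n".encode())
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def ask(host, port, query):
    """Client side: send one query and await the reply on that connection."""
    reader, writer = await asyncio.open_connection(host, port)
    writer.write((query + "\n").encode())
    await writer.drain()
    reply = (await reader.readline()).decode().strip()
    writer.close()
    await writer.wait_closed()
    return reply


async def demo():
    # Port 0 asks the OS for any free port; read back the one it chose.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # Two clients in flight at once, each getting only its own answer.
        return await asyncio.gather(
            ask("127.0.0.1", port, "q1"),
            ask("127.0.0.1", port, "q2"),
        )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Each client only ever sees the answer to its own query because the server writes to the `writer` paired with the `reader` the query came in on. Whether this carries over cleanly to a C# IOCP server is an open question; the sketch only shows that waiting for a reply does not have to mean blocking a thread.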
