Dirty Data?

July 23, 2007 on 1:42 pm | In General Programming |

I’ve heard from all sides lately about how there is a growing problem with ‘Dirty Data’…

What is Dirty Data? According to IBM, and various other folks, Dirty Data is data extracted from the real world that cannot be nicely integrated into relational models because it’s not organized, or doesn’t fit a particular model. Luckily for us, there are various products out there to ‘clean’ it, along with techniques recommended by the various database manufacturers to squeeze ‘clean’ data out of ‘dirty data.’

First of all, WHAT?! Do we work for computers or do they work for us? There is no such thing as dirty data. The fact of the matter is that if your system pukes because the data isn’t in the right format, then you’re using the wrong system to process it. The real world isn’t just ‘full of dirty data,’ it’s full of data, and computers are here to help us process it. It’s completely ridiculous for the database companies to say that my data is dirty.

Another sore point with me is the claim that performance problems can stem from ‘unreasonable relationship tracking.’ Apparently I want to know too many facets of my data, and it’s not ‘database correct’ of me to want to have my data and eat it too, from every angle, in real time. Pardon me, but the fact that YOUR relational database doesn’t handle this kind of scenario well doesn’t mean that my request is unreasonable. If I want to be able to efficiently track back to the original record through any of my many-to-one, one-to-many, or many-to-many relationships, that’s my requirement, and it’s not your place to tell me that it’s unreasonable to expect that it can be done.
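To make ‘track back from any angle’ concrete, here’s a minimal Python sketch of the idea: every link between two things is recorded in both directions, so a lookup from either end is a single dictionary access. This is only an illustration under my own assumptions, not how any particular database engine does it. The class name `AssociationIndex`, the relation names, and the sample orders are all made up for the example.

```python
from collections import defaultdict


class AssociationIndex:
    """Hypothetical in-memory index of (source, relation, target) links."""

    def __init__(self):
        # (source, relation) -> set of targets
        self._forward = defaultdict(set)
        # (relation, target) -> set of sources
        self._reverse = defaultdict(set)

    def link(self, source, relation, target):
        # Store the link in both directions so neither lookup needs a scan.
        self._forward[(source, relation)].add(target)
        self._reverse[(relation, target)].add(source)

    def targets(self, source, relation):
        # Follow a relation forward; unknown keys give an empty set.
        return set(self._forward.get((source, relation), ()))

    def sources(self, relation, target):
        # Track back from a target to every record that points at it.
        return set(self._reverse.get((relation, target), ()))


# Made-up sample data: two orders, one customer, two products.
index = AssociationIndex()
index.link("order-1", "placed_by", "alice")
index.link("order-2", "placed_by", "alice")
index.link("order-1", "contains", "widget")
index.link("order-2", "contains", "widget")
index.link("order-2", "contains", "gadget")
```

With that in place, `index.sources("contains", "gadget")` walks back from a product to the orders that hold it, and `index.targets("order-2", "contains")` goes the other way. The trade-off in this sketch is that every link is stored twice.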

The problem is that relational database proponents, and in particular the big three commercial implementors, are all engaging in this ‘you can’t do that in real life’ and ‘it’s your fault our apps are slow’ propaganda, and I won’t stand for it.

Just as an example, and I’m sure it’s not the only way, I’m using Relavance, an associative model engine which has none of these limitations.

Even Relavance aside, or even the entire associative model aside, it’s still really stupid of any real computer scientist to think, and especially to decree, that certain problems are just not to be solved using computers because they are not well suited to modern technology. If that were the attitude that people had throughout history, then we’d still be toggling in code on the front panels of computers the size of Olympus Mons. Microsoft may be the market leader, and Oracle and IBM may sell a lot of bits and bytes and pull a lot of weight in the marketplace, but they are not here to tell me what is and is not possible in computer science based on their own models of the universe.

</rant>

1 Comment »

  1. Dude, I totally agree. But look at the source, IBM. I stopped going to them for information a long time ago. Either they have no collective intelligence or they just get their dumbest people to write their articles. Either way, good rant!

    Comment by slonkak — July 25, 2007 #
