The Associative Model of Knowledge

April 10, 2007 on 1:27 am | In General Programming |

Forget everything you know about storing data. Particularly the bits about retrieving it after you’ve stored it. I said forget it, and I know you didn’t. So, try again, forget it.

Rather than start by comparing this method to what you already know how to do, I’m going to just pretend that you’ve always wondered how to store data, and have never done it before.

Applications acquire data. They also retrieve data and display it for users, or do something otherwise useful with it. Sometimes it’s the users who give the data to the application directly. Actually that happens quite a lot.

We store different bits of information in different places, and we do so for the purpose of organizing things that go together. We go to all of this bother because we’re likely to want to get them back out together. What that means to you and me as developers is that we spend quite a lot of time thinking about the kinds of things we want to get out of the system before we go putting them in. Arguably we try to think of everything we’d like to get out of it beforehand, and that’s how we’ll know we’re done with our model: when it provides us with a place and a method to put everything we need to know into storage and get it back out on demand. We’d use this knowledge of what we need to get out later as the basis for a ‘schema’: a layout of how all the data looks when it’s just sitting around waiting for someone to get it back out again.

Great! That sounds simple enough! Ah, but life isn’t so forgiving, and neither is programming. Today we think we know what we want, and that’s it, but tomorrow that all might change. In fact, if you ask enough pointy haired bosses what they want from the data they’re paying you to store in a database, you’re likely to get a few things in common, and a bunch of special requests. Some of what they want is ‘stuff’ that needs to be in there, but they especially want ‘ways to organize things’ on the way out. Some people would call these differing requirements views, or reports.

Fine, so we can’t possibly think of all of the kinds of views and reports, so we might as well just not think about them ahead of time, right? We’ll just store everything as generally as possible while leaving together the things that obviously go together in pretty much all cases, and that way we can let the person getting things out tell us what they want at that time, and never worry about it again. Especially not if worrying would push the application delivery date back! Right? WRONG!

When we store data, each thing we’d like to know about is called a Concept. And all of the different aspects of the concepts are different sorts of ‘lights in which we’d like to view these concepts,’ each of which we might term a Context. So in life we have concepts, or things or ideas of interest, and contexts, which are the glasses through which we view these concepts.

When we want to store a concept in our knowledge store, we put it in there, and designate it to be accessible in a particular context. Of course it’s not really friendly to the user to just pull every concept out every time we want to retrieve information, so the idea of limiting a concept to a particular context seems to make a lot of sense. We do this in everyday conversation; sometimes even a single word can mean many things! Surely that’s an idea we’d have difficulty describing in a file… Or is it?

Lucky for us, Everyone in the Universe uses Associative Knowledgebases, and everyone knows that in an associative knowledgebase you add concepts to the base, each associated with a particular context, and from there you are able to retrieve concepts relative to a particular context of reference. You can have a single concept be available in as many contexts as are needed, and you don’t need to keep putting multiple representations of the same concept in over and over again to get the job done. Each concept has a bit of data that goes with it, and it can be associated to other concepts to represent what sorts of ideas it shares with its associates. Now an association isn’t just a one-type deal: you can associate on different ‘axes’ of association to represent different aspects of what the two concepts have in common. Who knows, two concept items may also be associated to the same context, so when you go to retrieve information relative to that context, you’d get them both anyway!

As you may have picked up, each ‘concept’ can be represented by a single ‘item’ or a set of ‘items’, which are tiny units of free-floating information. Every item is associated to at least one context, and some are associated to thousands of contexts, depending on what you need from them. Each item is universally addressable by its own unique four-ordinal vector. Each item stores its own value, as well as the vectors of each of its associates, grouped by the axis on which they are associated. Items can be associated to other items, or to entire contexts; it’s completely up to us! We can associate multiple items on the same axis or split them up, and there is virtually no limit to the number of associates we can have on any particular axis. (It could be anywhere from zero to several million, or even sometimes several billion, other concepts.)

Since we have this wildly variable method of associating one uniquely addressable item with any number of uniquely addressable items, on any number of axes, it’s lucky for us that we don’t need to know up front how many of what kind of associates we can have. Even items in the same context can have different numbers of associates, and there is no negative consequence to the other items. It’s cool that nobody decided that we’d have to allocate null associates to represent items without associates, because when we got over a hundred, well, that would just be ridiculous! We certainly can’t easily convert items into something that fits completely into an Excel spreadsheet! That’s quite alright, because we have a lot at our disposal now. (And if we REALLY like Excel spreadsheets, then we can limit the information coming out in such a way that it fits, complete with all of the totally unnatural quirkiness of any spreadsheet: some empty columns in some rows, and flat tuples that are all but meaningless without some formal explanation of what they are and what they mean. Like any Excel spreadsheet, pointy haired bosses love those.)
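
To make the shape of an item a little more concrete, here is a minimal sketch in Python. It is not how the real system is built; the `Item` class, the `associate` method, the axis names, and the made-up addresses are all assumptions for illustration. In particular, I haven’t said what the four ordinals mean, so the address here is just a tuple of four integers, and a context is modeled as just another item.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# Hypothetical address type: a "four-ordinal vector" modeled as a 4-tuple of ints.
Address = Tuple[int, int, int, int]


@dataclass
class Item:
    address: Address
    value: str
    # axis name -> addresses of the associates on that axis (grows on demand)
    associates: Dict[str, Set[Address]] = field(
        default_factory=lambda: defaultdict(set)
    )

    def associate(self, axis: str, other: "Item") -> None:
        """Record a one-way association to `other` on the given axis."""
        self.associates[axis].add(other.address)


# Assumption for this sketch: a context is represented as an ordinary item.
fruit = Item((0, 0, 0, 1), "Fruit")
apple = Item((0, 0, 1, 1), "Apple")
banana = Item((0, 0, 1, 2), "Banana")

apple.associate("in context", fruit)
banana.associate("in context", fruit)
apple.associate("similar to", banana)
```

Notice that Banana never had to reserve an empty slot for the ‘similar to’ axis. An item only carries the axes it actually uses.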

Of course we can predetermine some of the ways in which we’ll have to establish associations based on the context we wish to use to represent our particular concept, and some of them will be common to all items in the same context. These we’ll call ‘explicit mappings’ of information. For example, if you’re adding the concept of an Apple to the Fruit context, you’ll likely want to also associate it to a member of the Plants context to indicate what it grows on (we can even say that this association exists on the ‘Fruit Grows On’ axis). These mappings are simple, and the rules are always the same.
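
One way to picture an explicit mapping is as a small table that says which axes every member of a context has to fill in. The sketch below assumes exactly that; `EXPLICIT_MAPPINGS`, `add_to_context`, and the dictionary-based store are hypothetical names invented for this example, not part of any real API.

```python
# Hypothetical table of explicit mappings: context -> axes every member must fill.
EXPLICIT_MAPPINGS = {"Fruit": ["Fruit Grows On"]}


def add_to_context(store, name, context, associations):
    """Add `name` to `context`, insisting on that context's explicit mappings."""
    required = EXPLICIT_MAPPINGS.get(context, [])
    missing = [axis for axis in required if axis not in associations]
    if missing:
        raise ValueError(f"{name!r} in {context!r} needs axes: {missing}")
    store[name] = {"contexts": {context}, "axes": dict(associations)}


store = {}
# The Plants context has no explicit mappings in this sketch.
add_to_context(store, "Apple Tree", "Plants", {})
# Every Fruit must say what it grows on.
add_to_context(store, "Apple", "Fruit", {"Fruit Grows On": "Apple Tree"})
```

Trying to add a Fruit without a ‘Fruit Grows On’ associate raises an error here, which is one possible way to enforce that the rules are always the same.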

We also have the ability to examine concepts we wish to associate with a context AS THEY ARE BEING ADDED TO THE BASE and apply a set of rules to them to determine if, while we’re at it, we ought to associate them with any other concepts or contexts. In this way, we can define a list of rules which dynamically determine which other contexts and concepts this concept should be associated with, based on, well… what it is. Contexts and concepts don’t have a set structure. They can be anything we like them to be, and represent any aspect we’d like them to. For example, sure we have a context called Fruit which is associated to all of the things that you and I call “Fruit” like Apples, Pears, and Bananas (a personal favorite of mine!) but you can also talk about a context that’s more abstract, like ‘Found in Africa’. All of the things we know about that are found in Africa can be associated to the Found in Africa context. That means we can have some of the Fruits and some of the Plants all be associated to the same context of ‘Found in Africa’, so when we ask the knowledge base what sort of concepts it knows about in the context of ‘Found in Africa’ we get both some Fruits and some Plants. Isn’t data storage great!?!
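
Here is a rough sketch of what rules applied on the way in might look like. Everything in it is an assumption made for illustration: the `RULES` list, the `add_concept` function, the `regions` field, and the sample data are invented, and the rules run inline rather than however the real system schedules them.

```python
from collections import defaultdict

# context name -> names of the concepts associated with it
contexts = defaultdict(set)

# Hypothetical rules: (test on the concept's data, extra context to associate).
RULES = [
    (lambda data: "Africa" in data.get("regions", ()), "Found in Africa"),
]


def add_concept(name, data, context):
    """Associate `name` with `context`, then let the rules add more contexts."""
    contexts[context].add(name)
    for applies, extra_context in RULES:
        if applies(data):
            contexts[extra_context].add(name)


# Illustrative sample data only.
add_concept("Banana", {"regions": ["Africa", "Asia"]}, "Fruit")
add_concept("Pear", {"regions": ["Europe"]}, "Fruit")
add_concept("Baobab", {"regions": ["Africa"]}, "Plants")
```

After these three inserts, the ‘Found in Africa’ context holds one Fruit and one Plant, even though nobody asked for that association explicitly.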

Well, pointy haired bosses are never satisfied, so when they see you can do such wonderful cartwheels with individually simple data, they still want more.

“I’d like to be able to find what sort of Fruits and Plants we can find in Africa that are legal to export to the United States.”

Just when we thought we’d thought of everything, we now have to add something else!

“Ok,” we tell the boss, “No Problem!” It’s a good thing that we don’t have to completely redesign the knowledge base just to add a new context for some concepts we already have! We can just create a new context called “Exportable To the US” and associate the proper concepts with that context as well. Great!

So, it’s also relatively simple now to construct ‘queries’ for this kind of data store. When we write the application to show the boss what sorts of Fruits and Plants found in Africa are exportable to the US, we just have to retrieve all of the items associated with both the Exportable To the US context and the Found in Africa context, and then make sure that the results are either Fruits or Plants. It’s rather like drawing a Venn diagram. Since we used rules to make the associations on the way into the storage system, we don’t have to do anything but grab the ready and waiting list of associates we need on the way out! Sure, it takes a tiny bit of effort when the concepts are inserted into the store for the rules to figure out where they are to be associated, but that all runs asynchronously, and we didn’t care about it when we added them. Only now, when the boss comes to get his answers, does the effort really matter! So, with almost no effort at all, we’ve given the boss what he wants. We can only hope that there aren’t too many more bosses to satisfy today, because we were supposed to go out for a drink after work. It’s a good thing everyone uses Associative Knowledgebases, or who knows how long we’d be here?!?
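
The Venn diagram idea maps neatly onto set operations. The sketch below assumes each context is simply a set of concept names; the `query` function and the memberships are made up for illustration and say nothing about how the real system stores or retrieves associates.

```python
# Illustrative memberships only: context name -> associated concept names.
contexts = {
    "Fruit": {"Apple", "Pear", "Banana"},
    "Plants": {"Baobab", "Apple Tree"},
    "Found in Africa": {"Banana", "Baobab"},
    "Exportable To the US": {"Banana", "Apple"},
}


def query(all_of, any_of):
    """Concepts that are in every `all_of` context and at least one `any_of` context."""
    required = set.intersection(*(contexts[name] for name in all_of))
    allowed = set.union(*(contexts[name] for name in any_of))
    return required & allowed


result = query(
    all_of=["Found in Africa", "Exportable To the US"],
    any_of=["Fruit", "Plants"],
)
```

With this sample data, `result` contains only Banana: the one concept sitting in the overlap of both circles that is also a Fruit or a Plant.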

Post facto: I forgot to mention that this is a description of a working system. It’s called Relavance, and it really does all of the wonderful things I just described as if they were hypothetical.

2 Comments »

  1. Relevance isn’t the only one using associative databases. The semantic web is founded on an associative model. I’m working on an in-house associative database written in Oracle and C#. As an early adopter I can tell you that there are still a lot of things to figure out for this model to compete against relational databases.

    Comment by Darroll — January 18, 2008 #

  2. True, it’s not the only one, per se, but it’s the only one that works the way that it does. For example, in XML-based representations, identity is a lookup game. You have some attribute that you have to specify at some point in the document as an identity, and then reference it somewhere else, and in order to resolve that reference you have to traverse something: an index if you’re lucky, or the whole document if you’re not. The uniquely vector-addressable nature of items and their associates is a huge advantage because there is a notion of continuity inherent in both the conceptual and physical representations of them in the system. It’s nearly a one-to-one mapping of reality to a model, with very little impedance mismatch. I’m not discounting OWL wholesale; of course there are a ton of things that it’s great for, and the same goes for relational databases. But as for economies of scale, the associative model (both the semantic web and Relavance) has a large potential market itch that relational data warehousing has not yet begun to be able to scratch. The examples I’ve presented here are of course trivial, but if you get them, you can see how it can scale outward. More details to come shortly from me; I’m writing another monstrous post on the new system that’s about to release any day now.

    Comment by Dave — January 18, 2008 #
