I’ve been spending a lot of time thinking about distributed systems lately, what with the job and all. The problem seems to come in when I start looking for distributed database engines.
I spent a good deal of time today playing with CouchDB, and it seems relevant to my interests and all, but I’ve thus far only managed to add data to it. Tomorrow (or tonight, since I’m still waiting on the digital delivery of tonight’s episode of Lost) will be devoted to figuring out “views,” but the simple fact that they’re not readily intuitive bugs me.
Why is it that SQL (Structured Query Language, meaning the language itself, not the RDBMSes that grok it) is not considered suitable for the task of querying a distributed database? I wouldn’t normally generalize like this, but most of the options that I looked at today that were not cheap hacks of a given RDBMS used a query paradigm that was completely foreign to me. The only one that I feel I could really go with in a hit-the-ground-running manner is Hypertable and its “HQL” queries, but it’s not currently an option.
In the case of CouchDB, I can see where it might be a bit difficult to map SQL to its paradigm, but that’s mostly because of the open-ended nature of its data. The examples basically suggest that you throw all of the data for a given application into a single database, which really isn’t all that strange until you consider that there’s not really a “table” concept. Each entry in the database is a JSON document made up of any fields that you wish to define. I suppose that views might be a method by which one could emulate tables, but I’m not terribly sure just yet.
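Here’s a rough sketch of what I mean by emulating tables with views. CouchDB views are defined by a map function that emits key/value rows for each document, and real ones are normally written in JavaScript; this is only a plain-Python simulation of the idea, not CouchDB’s API. The `type` field, the sample documents, and the `users_map` and `run_view` helpers are all made up for illustration.

```python
# Hypothetical documents living side by side in one database.
# The "type" field is just a convention assumed here, not a CouchDB feature.
docs = [
    {"_id": "1", "type": "user", "name": "alice"},
    {"_id": "2", "type": "post", "title": "Hello", "author": "alice"},
    {"_id": "3", "type": "user", "name": "bob"},
]


def users_map(doc):
    """Map function: emit a (key, value) row only for 'user' documents."""
    if doc.get("type") == "user":
        yield doc["name"], doc


def run_view(map_fn, documents):
    """Apply the map function to every document and sort the rows by key."""
    rows = [row for doc in documents for row in map_fn(doc)]
    return sorted(rows, key=lambda row: row[0])


# The resulting rows behave roughly like a "users" table keyed by name.
user_rows = run_view(users_map, docs)
for key, value in user_rows:
    print(key, value["_id"])
```

If that holds up, one view per document type would give something table-shaped to point a query layer at.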
Either way, I believe that it would be exceptionally cool to actually try to do the aforementioned mapping, though not necessarily with CouchDB. I’ve been thinking of ways to accomplish this particular task a bit lately, as one of the internal systems at work is now, effectively, a big MapReduce problem (that’s the way that I’ve been working on it, anyway). Basically, I run a traditional SQL query on each of the nodes in the system, aggregating the data on the master node. That requires a bit more pure Perl hackery than I’d really like … I’m thinking about doing some SQL::Statement hacking to see if I can get it to handle the aggregation and everything transparently, so that I’m just handing queries to an object and telling it to go.
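The run-the-query-everywhere-then-aggregate pattern can be sketched in a few lines. This is a minimal Python illustration rather than the Perl at work: in-memory SQLite databases stand in for the nodes, and the `hits` table, the `make_node` and `scatter_gather` helpers, and the sample rows are all hypothetical. It also assumes the query returns (key, partial sum) pairs, which only works for aggregates that can be merged by addition, like SUM and COUNT. Something like AVG would need the sum and the count shipped back separately.

```python
import sqlite3


def make_node(rows):
    """Create an in-memory SQLite database standing in for one node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (page TEXT, n INTEGER)")
    conn.executemany("INSERT INTO hits VALUES (?, ?)", rows)
    return conn


def scatter_gather(nodes, sql):
    """Run the same SQL on every node, then add up the partial sums per key."""
    totals = {}
    for conn in nodes:
        for key, partial in conn.execute(sql):
            totals[key] = totals.get(key, 0) + partial
    return totals


# Two pretend nodes, each holding a slice of the data.
nodes = [
    make_node([("/home", 3), ("/about", 1)]),
    make_node([("/home", 2), ("/faq", 5)]),
]

# The "master" hands one query to the helper and gets merged totals back.
result = scatter_gather(nodes, "SELECT page, SUM(n) FROM hits GROUP BY page")
print(result)
```

The interesting part, and the bit I’d want SQL::Statement to help with, is working out from the query itself which merge step each aggregate needs instead of hard-coding addition.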
Barring that, I’m thinking about hacking some network capability into SQLite (thus breaking at least one of SQLite’s design goals).
Now, granted, it may seem like I’m missing the entire point of non-relational database systems. I really do get it. I’m just talking about providing a familiar interface, and that really shouldn’t be all that difficult to accomplish, in the long run.