RDBMS theory

I feel the need to checkpoint this. Life is getting confusing.

Barry Morris, CEO of NuoDB, has written a series of articles about the “Holy Grail”, which he published at the Cloud Computing Journal and somewhere within the NuoDB site.

The most important contribution that Morris makes, in my mind, is that there are four models of scale-out RDBMS: Shared Disk, Shared Nothing, Synchronous Commit and their own Durable Distributed Cache, invented (or maybe substantiated) by Jim Starkey.

Unsurprisingly, Morris’ third article, extolling the superiority of what he has to sell, does not, as far as I can see, describe how the consistency property is met. I need to re-read the MVCC part of the article. MVCC is based on a file/item append model. MVCC obviates locks (How?) and thus removes a massive part of the seriality of a DBMS, which is good because we have not only Brewer’s theorem to deal with but also Amdahl’s Law. The unanswered question, to me, is how the relevant cache partition ensures that the page copy it gets from a remote node is the most recent and does not need to be locked for update. He states that the relationships between nodes are asynchronous, so we are back to eventually consistent, it would seem.
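To make the "(How?)" a little more concrete for myself, here is a minimal single-process sketch of the append idea behind MVCC: a writer appends a new version instead of overwriting, and a reader picks the newest version that existed when its snapshot was taken, so it never takes a lock. The class name MVCCStore and its methods are my own illustrative assumptions; this is not how NuoDB or any real engine implements it, and it ignores write-write conflicts and the distributed cache question raised above.

```python
import itertools


class MVCCStore:
    """Toy multi-version store (hypothetical): writes append, reads never lock."""

    def __init__(self):
        self._versions = {}              # key -> append-only list of (commit_ts, value)
        self._clock = itertools.count(1)
        self._last_commit = 0

    def begin(self):
        # A reader's snapshot is simply the latest commit timestamp it can see.
        return self._last_commit

    def read(self, key, snapshot_ts):
        # Walk back from the newest version to the first one visible to the snapshot.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def write(self, key, value):
        # Append a new version; older versions stay readable by older snapshots.
        commit_ts = next(self._clock)
        self._versions.setdefault(key, []).append((commit_ts, value))
        self._last_commit = commit_ts
        return commit_ts


store = MVCCStore()
store.write("balance", 100)
reader_snapshot = store.begin()          # reader starts here
store.write("balance", 250)              # a later writer appends a new version
old_view = store.read("balance", reader_snapshot)
new_view = store.read("balance", store.begin())
```

The reader that started before the second write still sees its own consistent version while a newer reader sees the update; neither blocks the writer. What the sketch does not answer is my real question, which is how a node knows its copy is the most recent when the versions live on other machines.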

From Morris’ article we learn that NuoDB (like MarkLogic?), and in fact like MySQL, where Starkey worked for a while, consists of a Transaction Engine and a Storage Manager entity.

Morris mentions Google F1, which is used to support their ad keywords database. It is based on Google’s Spanner, which seems to be pretty much their answer to the CAP theorem. We’ll have to see what the latency cost is like, but being Google’s it may not be released publicly as open source.

Morris’ article does not reference Brewer’s CAP theorem. I have collected the following links tagged Brewer,

At some stage I found the proof that Brewer’s CAP conjecture really is a theorem; I think the Barnes article above references it.

Can we break Brewer’s theorem?

I need a personally accessible definition of Consistent, Available and Partition Tolerant. (The first two are easy.) That said, the Wikipedia entry, CAP Theorem, has a pretty good set of definitions:

In theoretical computer science, the CAP theorem, also known as Brewer’s theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:

  • Consistency (all nodes see the same data at the same time)
  • Availability (a guarantee that every request receives a response about whether it was successful or failed)
  • Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)

I suppose we might engineer a system so that the failing condition is so trivial that it can be ignored.

The commonest compromise is between availability and consistency, although eventual consistency is a relatively modern construction.

Shared disk clusters engineered for HA on a fail-fast-and-recover algorithm are a solution that fails the Availability requirement of the CAP theorem, although they have a zero RPO and can have relatively short RTOs.
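A toy model helps me see the compromise. The sketch below has two replicas and a flag that simulates a network partition; in "CP" mode a write during the partition is refused (consistency kept, availability lost), and in "AP" mode it is accepted locally and the replicas diverge (availability kept, consistency lost until they reconcile). The names Replica, TwoNodeStore and demo are hypothetical, invented only for this illustration, and no real product is being modelled.

```python
class Replica:
    """One copy of a single value (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.value = None


class TwoNodeStore:
    """Two replicas that either refuse writes (CP) or diverge (AP) when partitioned."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Healthy network: both copies are updated together.
            replica.value = value
            other.value = value
            return True
        if self.mode == "CP":
            # Cannot reach the other copy, so refuse rather than diverge.
            return False
        # AP: accept the write locally; the other copy is now stale.
        replica.value = value
        return True


def demo(mode):
    store = TwoNodeStore(mode)
    store.write(store.a, 1)              # replicated to both nodes
    store.partitioned = True             # the network splits
    accepted = store.write(store.a, 2)   # write arrives at node a only
    return accepted, store.a.value, store.b.value


cp_result = demo("CP")
ap_result = demo("AP")
```

In the CP run the second write is rejected and both replicas still agree; in the AP run it is accepted and the two replicas disagree until something repairs them, which is where eventual consistency comes in.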

Here’s the sponsored Bloor paper on NuoDB.

The Jim Starkey Wikipedia article references a 2012 patent for “A multi-user, elastic, on-demand, distributed relational database management system.” We’ll see. This is probably one of the patents that protect the NuoDB products.

ooOOOoo

The NHS have decided to replace Oracle with Riak for the “spine”. Riak claims partition tolerance and availability.

http://www.aerospike.com/ is another high-performance, scale-out database.

When considering XML/RDF optimised databases, I have been pointed at Virtuoso, which has a Wikipedia page here and a white papers page here.

Ubuntu 13.10

I am installing this on the Mac under VirtualBox. This snip is about Ubuntu.

The first problem is that it comes with quite a bit of crap installed, and the Unity search is overzealous and too noisy on the network.

This might be helpful: an article about the first 10 things to do. I seem to have a problem finding the dash plugins that stop the internet searches. This needs to be tuned.

I want to configure the software package manager to check daily.

I have installed Unity Tweak Tool.

I have turned the virtual desktop manager on.

I have uninstalled Ubuntu One.

RCS

Revision Control System, one of the granddaddies of them all.

I still use it because it’s simple and does not have a network interface.

Links

rcs -i creates and initialises the RCS file, rcs -l locks a revision (rcs -L is the option that turns strict locking on), and ci -l checks the file in and keeps a locked working copy. If the file is not locked, I get no prompt for a version message. Without -l (or -u), ci removes the working file from the directory once it is checked in. rlog will display the version history. So

rcs -i "$file"
ci -l "$file"

$Header$ is an omnibus keyword with most of what one needs, and $Log$ inserts the log messages.

I need to experiment with the $Revision$ keyword: can this be substituted into code?
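As a starting point for that experiment, here is a small sketch of how the keyword could be used from code. On checkout RCS expands the marker to a string such as "$Revision: 1.4 $", so a program can hold it in a string constant and pull the number out. The value 1.4 and the function name parse_revision are assumptions for illustration; I have not yet tried this against a real RCS file.

```python
import re

# Hypothetical expanded value; in a file under RCS this literal would start
# life as the bare keyword and be rewritten by co on checkout.
REVISION_KEYWORD = "$Revision: 1.4 $"


def parse_revision(keyword):
    """Return the revision number from an expanded keyword, or None if unexpanded."""
    match = re.fullmatch(r"\$Revision:\s*(\S+)\s*\$", keyword)
    return match.group(1) if match else None


version = parse_revision(REVISION_KEYWORD)

# An unexpanded keyword (file never checked out from RCS) yields None.
# Built by concatenation so RCS would not expand this test string itself.
unexpanded = parse_revision("$" + "Revision$")
```

Returning None for the unexpanded form means the same source still runs before its first check-in.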