All Your Base 2013

The second running of White October’s All Your Base conference was held in Oxford last week, boasting another great lineup of industry speakers showcasing the same fascinating mix of product specialists and project implementors that made last year such a great experience.

What a difference a year makes

It’s clear that the landscape has shifted since this time last year. NoSQL has moved from interesting novelty to useful toolkit item, and the industry is starting to focus on selecting the right tool for the job: debating the trade-offs between availability and consistency (a common theme this year) and asking whether a particular system’s inner workings suit a project’s expected usage patterns, rather than speculatively following the latest trend.

With this in mind, it wasn’t a great conference for MongoDB. Spending the past year or so lauded as the poster child for the NoSQL revolution has brought it a lot of flak - some of it on display during speaker presentations - on the occasions where it turned out not to be a good fit for the task in hand. That is the painful but sadly inevitable result of being many people’s first NoSQL experience. My take on this is that rather than anyone saying “this is terrible, don’t use it”, it’s more a case of “be aware of its strengths and weaknesses before using it in production” (with a possible side-order of “have a flexible stack and be prepared to move when things don’t work out the way you’d hoped”), which is advice that should be applied to any technology choice.

The talks, in brief

In the first of many changes from the advertised programme, Neha Narula set the ball rolling with a keynote talk focussed on caching and how the database fits into the web application stack, concluding with an insight into the latest research into “dataflow frameworks”.
Neha’s slide deck
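
To make the caching idea concrete, here is a minimal sketch of one common way a cache can sit in front of the database (often called cache-aside). It is not taken from Neha’s talk: the class name `CacheAside`, the `load_from_db` callable and the `ttl_seconds` parameter are all hypothetical, and an in-process dictionary stands in for a real cache server.

```python
import time


class CacheAside:
    """Check the cache first; on a miss, load from the database and remember it."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db   # hypothetical callable: key -> value
        self._ttl = ttl_seconds     # how long a cached value stays fresh
        self._clock = clock         # injectable so expiry can be tested
        self._entries = {}          # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]         # cache hit: the database is not touched
        value = self._load(key)     # cache miss or expired: go to the database
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key after a write so the next read reloads it."""
        self._entries.pop(key, None)
```

The trade-off to keep in mind is staleness: until an entry expires or is invalidated, readers may see an older value than the one in the database.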

This was followed by the return of Tim Moreton, Acunu’s CTO, to talk about their experiences with Cassandra, particularly with regard to how it fits in with Hailo’s stack (which also includes ElasticSearch), and of course their own analytics platform that layers data mining capabilities on top of Cassandra. For anyone considering a career change, it was noted that candidates with Cassandra experience on their CVs are vanishingly rare.
Tim’s slide deck

After the morning break, Simon Metson of Cloudant gave us a refresher course on CouchDB, a quick overview of the status of parallel projects such as PouchDB (still going) and TouchDB (apparently deprecated) and a quick look ahead to some of the new features in Couch 1.5. By the sounds of things, Couch is still hamstrung by a lack of security settings, which will rule it out of a lot of deployment scenarios.
Simon’s slide deck

Keeping things technical, Tomomi Imura spoke about IndexedDB, demonstrating how to store data in the client’s browser. I should repeat a warning from last year here: the security of data stored in this way is dubious, so be careful what you choose to store. Also covered was the unfortunate need to use shim code to maintain support for the officially deprecated WebSQL storage engine, so that data can be stored within all modern browsers, notably Safari. The good news is that all recent builds of modern browsers have support for one or the other.
Tomomi’s slide deck

Wrapping up the pre-lunch session was Andrew Godwin on how the latest Postgres features allow the use of hybrid schemas: part of the record structure is defined using the ‘json’ data type, which embeds a document store within a traditional relational record. This is a marked improvement over the earlier ‘data’ field (although it should be noted that the full json querying functionality appears to only ship with version 9.3). Also touched on was Postgres’ support for writing embedded functions in a wide variety of languages (and why you should use this power sparingly).
Andrew’s slide deck
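
As a rough illustration of the hybrid schema idea, the sketch below keeps fixed relational columns alongside one free-form document column. To stay self-contained it uses Python’s built-in SQLite module rather than Postgres, and it stores the document as plain text that is encoded and decoded in application code; in Postgres the column would instead be declared with the ‘json’ type. The `products` table, its columns and the helper functions are all hypothetical and not taken from Andrew’s talk.

```python
import json
import sqlite3


def create_schema(conn):
    # Fixed, typed columns plus one free-form document column.
    # Assumption: SQLite TEXT stands in for a Postgres json column.
    conn.execute(
        "CREATE TABLE products ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " price_pence INTEGER NOT NULL,"
        " attributes TEXT NOT NULL"
        ")"
    )


def add_product(conn, name, price_pence, attributes):
    # The relational part is stored as-is; the document part is serialised.
    cur = conn.execute(
        "INSERT INTO products (name, price_pence, attributes) VALUES (?, ?, ?)",
        (name, price_pence, json.dumps(attributes)),
    )
    return cur.lastrowid


def get_product(conn, product_id):
    row = conn.execute(
        "SELECT name, price_pence, attributes FROM products WHERE id = ?",
        (product_id,),
    ).fetchone()
    if row is None:
        return None
    name, price_pence, raw = row
    return {"name": name, "price_pence": price_pence, "attributes": json.loads(raw)}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    pid = add_product(conn, "Kettle", 2499, {"colour": "red", "watts": 3000})
    print(get_product(conn, pid))
```

The point of the split is that the columns every record shares stay queryable and constrained in the usual way, while the parts that vary from record to record live in the document.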

Straight in at the deep end after lunch with Geoff Wagstaff, CTO and co-founder at GoSquared, sharing his experiences of scaling his web-based realtime analytics system from its humble single-server LAMP stack beginnings back in 2009 to its current distributed form powered by a mix of Redis and Cassandra (via an unfortunate brush with MongoDB where problems with document structure and disc space management when writing to large records forced a hasty rethink). He covered a lot of technical ground in a short time and finished with a useful recap that included the importance of choosing the right tool for the job - being mindful of the difference between scaling reads (easy) and scaling writes (problematic) - and the need to put aside scaling concerns when building a prototype.
Geoff’s slide deck

A quick mid-session break followed in the form of a video ad for Code Club - a volunteer-driven project to teach programming to 9-11 year olds in after-school clubs - that had the audience laughing in all the right places and spontaneously applauding at the end as if the presenters had been in the room.

Continuing the lessons learned theme was Sarah Mei telling the story of how Diaspora, following the received wisdom that “social data isn’t relational”, ran into trouble by adopting MongoDB then realising that, as the feature set grew, their data structure was relational after all. This was backed up by a further example, in which a scenario used throughout as a perfect-fit use case for a document store turned out to be anything but when the client demanded a “simple” change that meant traversing the data in a way that’s better suited to a relational structure.
Sarah’s slide deck

One final break for some much needed coffee, then we returned to hear Ines Sombra, Lead Data Engineer at Engine Yard, somehow managing to cram about an hour’s worth of material into 30 minutes to take us on a whistle-stop tour of 10 data anti-patterns to avoid. The tale of someone having to explain virtualisation to an FBI agent who demanded to be told which server to pull out of a rack for evidence was a particular highlight.
Ines’ slide deck

Next up, Sean Cribbs from Basho - makers of Riak - on eventual consistency by way of Leslie Lamport’s “Safety & Liveness” principles, scalability and schemas. The talk concluded, refreshingly, with a list of use cases for which Riak would not be suitable as well as the conventional list of scenarios that could be a perfect fit. Also included was a teaser of some of the new features promised for Riak 2.0.
Sean’s slide deck

Last, but definitely not least, Tamar Bercovici from Box sharing the story of how they sharded their MySQL database without breaking the live site. Inspiring stuff.
Tamar’s slide deck
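
For anyone new to the term, sharding means splitting one logical database across several servers and routing each record to one of them by some key. The sketch below shows the simplest form of that routing, a hash of the key taken modulo the number of shards. It is a generic illustration only and makes no claim about how Box did it; the function name `shard_for` and the example keys are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index in the range 0..num_shards-1."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # A stable hash is used so the same key always lands on the same shard,
    # regardless of process or machine.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for user in ("user-1", "user-2", "user-3"):
        print(user, "-> shard", shard_for(user, 4))
```

One known weakness of plain modulo routing is that changing the number of shards moves most keys to a different shard, which is part of why doing this on a live site is hard.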

Thanks to John Wards and everyone at White October for another great conference. Hopefully see you again next year!