Last Wednesday I attended the “Cases of Data Storage & Data Management” session of the Seattle Tech Forum. We had some great speakers and much higher attendance than at recent STFs.
NuoDB sponsored the entire session and also provided our first speaker, Barry Morris, CEO and Co-Founder. His talk was titled “Establishing A Successful Relational Database Strategy for the 21st Century”. He started by describing how SQL, and relational databases in general, were the biggest inventions in database technology in the 20th century. He outlined all the ways that a set-level data store beats a record-level or document-oriented store. The SQL/RDB paradigm has brought us benefits such as ACID, and it has built up a huge amount of value in existing data, in tools, and in the large number of people trained to do useful things with SQL.
He went on to explain that SQL, as a 20th-century invention, is not well suited to 21st-century problems such as:
- Commodity data centers (lower cost, low management requirements)
- Big data
- Modern workloads
- 24×7 operation
- Geo-distribution
- Developer empowerment
According to Morris, this mismatch has led to a database crisis, spawning many bad ideas such as sharding, master/slave replication, and complicated caching schemes.
His solution is NuoDB, a database designed to supply all the powerful benefits of an RDB (such as ACID) while bringing the flexibility and low overhead of a NoSQL database. NuoDB’s organization and ability to scale arise as emergent properties of the simple, deterministic behavior of each machine, much like natural systems. With NuoDB there is apparently no central control, no master data, and no supervisory role for any machine in the database. Machines come and go as they please, and once added to a database, a machine can quickly become a useful part of it, automatically and without configuration.
Apparently NuoDB has already been in trials with customers, and goes into beta in January.
Next we heard a presentation titled “Emerging Trends in BI and Bigdata Warehousing” by Amol Shanbhag, Senior Data Warehousing Engineer at Expedia. He started with the question “how big is Big Data?”, supplying these interesting statistics:
- The Library of Congress adds 5TB a month
- The internet will move 18 exabytes per month in 2013 (I may have written this down wrong since Google is telling me we’re already at 21EB per month)
- One zettabyte is twice as big as today’s Internet
After covering some more Big Data facts and trends, Shanbhag moved on to the use of NoSQL in Big Data. He emphasized his belief that “NoSQL” should really be thought of as “Not only SQL”, since there is a need for both. He claimed that, especially for analytics, one benefit of NoSQL is a faster time to insight about your data. He also explained the difference between OLTP (MongoDB, Couch, Azure, etc.) and OLAP (Hadoop, etc.) NoSQL systems.
He then examined Hadoop in some depth, explaining its benefits and its use cases, both at Expedia and in general.
Finally we heard from Mike Miller, Chief Scientist and Co-Founder at Cloudant; Affiliate Professor of Particle Physics, UW. Miller had spoken at the November session as well. This time his talk was titled “Moving Beyond the No/New/SQL Debate: Introducing the Application Data Layer”.
He gave a great overview of Cloudant, which he said you can think of as the “Akamai of Data Content”. Cloudant provides a scalable, managed CouchDB/BigCouch database as an enterprise-grade service. The goal of Cloudant is to let you focus on your application, not on data operations.
At one point Miller also said a goal is “to get a data center on every cell tower”. That was (I think?) tongue-in-cheek, but it speaks to their desire for high performance and ubiquity, and to the strength of the NoSQL model. Since CouchDB is a NoSQL database that does not guarantee immediate consistency, such radical decentralization is actually a realistic possibility.
Miller touted the automatic sharding behavior of CouchDB and contrasted it with the pain of application-level sharding. Apparently it took Google two whole years to shard its F1 ad network data. He also gave the example of Hothead Games, who were unable to scale their game’s MySQL database when the game went viral, until they moved to Cloudant.
In the Q&A we had some great questions (which I took poor notes on). At one point it seemed like there might be a debate brewing between Morris and Miller on the pros and cons of NoSQL, but sadly it never materialized.
I would like to compliment both Morris and Miller on their presentations, which focused mostly on their own companies but somehow did not come off as sales pitches; most speakers can’t pull that off. I also enjoyed the technical depth Shanbhag was able to explore. It was a very interesting and educational evening.