We Don't Need a "Database"

Posted by yrashk

Recently I’ve been trying to formulate what StrokeDB is. Here is my summary: StrokeDB is not a database; it is a programming environment on top of Ruby (at least until we port it to other languages). And here are my thoughts on the “database” concept.

Do we really need “databases”? Well, I mean, we surely need some toolset to store and retrieve data, but who said we need it in the form of pure datasets to be stored and retrieved? Who said there should be a database server to interact with? Who said we need special domain languages designed to manipulate arbitrary data?

What we really need is a persistence-aware programming environment, don’t we? We just need to be able to store and retrieve data, no matter how its persistence is handled internally. There is nothing new about this, actually: MUMPS and GemStone/S (or even PL/SQL) have been around for decades. What we might really need is the ability to create languages specific to your application’s data domain without any hassle, since we need to manipulate the application’s data, not just any data (which is basically what you do with SQL).

It is quite popular in the Rails world to say that we need a stupid database, just a kind of storage, and let Ruby do the rest. Basically, they have a point. They use an RDBMS as a storage layer, and their database is actually smart, because the parts of data handling that really matter are implemented in Ruby. It is still usually limited by the design constraints of the RDBMS, though.

My point is that your data should be as close to your main programming environment as possible. Your structures should be as native as possible, and they should be handled within the same environment. That is reminiscent of things like PL/SQL. Basically, PL/SQL is not THAT bad, but the problem is that you usually weren’t using ONLY PL/SQL; you also had, say, some Java code to interoperate with the Oracle database.

Your application itself IS a smart database.

So I’d say we’re at the beginning of a long road “back to the future”: toward persistence-aware programming environments, not just databases.

Viva smart databases!

P.S. I am going to blog soon about data organization concepts within such environments. It’s an interesting topic, and surely a more concrete one than this :)

Comments

  1. Phil, May 15, 2008 @ 11:49 PM

    Awesome! What I really like is that StrokeDB addresses the coder’s needs, instead of just adapting current SQL/DB techniques to be easy to use in a development environment while all the underlying constraints still prevent the developer from doing what he needs.

    Here’s to a long, straight road for StrokeDB!

  2. Jamie Flournoy, May 16, 2008 @ 12:51 PM

    It sounds like you’re arguing for an object database more than a document database. A document DB is not as close to Ruby as possible.

  3. Yurii Rashkovskii, May 16, 2008 @ 12:57 PM

    Jamie,

    Well, it is all about data organization logic. For example, StrokeDB uses a model of documents and metadocuments, which we currently implement on top of Ruby. Documents are just a kind of DSL/API for Ruby, and documents plus the storage API, views, etc. make up StrokeDB, a programming environment on top of Ruby.

    StrokeDB as a concept is not limited to Ruby and its particular object model, so it could be ported to other languages without changing its model.

  4. JS, May 17, 2008 @ 11:08 PM

    This is a surprisingly ignorant blog post. The purpose of the database is for data integrity… that is to say, making sure the data is not corrupted and ensuring that queries into the data return results that can be trusted. One other major advantage of relational databases is simply that you construct one correct model for the data that’s important to you, and ALL of the applications that you build that would want to use that data look at it the same way.

    Not so with your Ruby ‘smart databases’. Each Ruby app you write requires you to rewrite all the ‘smart database’ routines. Whereas with a relational database, after you think a little about the structure your data requires and the integrity checks to apply, the checks in the database are useful to any application that puts data in or interacts with the stored data.

  5. Yurii Rashkovskii, May 18, 2008 @ 02:27 AM

    Haha, that’s really funny. I was told in a previous blog post: “Well said. However, prepare yourself for a lot more of this kind of FUD. SQL databases have owned the market for a decade, and people with SQL careers aren’t likely to cede to pluralism easily.”, and here is the confirmation :)

    By the way, “The purpose of the database is for data integrity”. Call me stupid, but I thought a database’s primary goal is data persistence. Data integrity can be guaranteed without a database.

  6. Ashley Moran, May 18, 2008 @ 02:34 AM

    I agree with the idea that databases should be smart, and I also agree with JS when he says “the purpose of the database is for data integrity”. But to me, these are the same thing. Read pretty much anything by C. J. Date and you will come to the same conclusion.

    The raw data is pretty weak on its own. What is needed is constraints (that give meaning to the data) and ways to manipulate the data. The current OO-relational mapping is weak because it’s hard to integrate the predicate logic from the database with the OO programming constructs in modern languages, especially given that the database is never running in the same process space as the business model. Trying to abstract the database has had the unfortunate effect of reducing the use of constraints (and hence removing meaning from the data), even to the point of the (shocking) belief that foreign key constraints are unnecessary.

    What I think we need is more intelligence in the databases, and an elegant way of accessing this from the application. Of course, that’s a much broader and harder problem.

  7. Yurii Rashkovskii, May 18, 2008 @ 02:40 AM

    “The current OO-relational mapping is weak because it’s hard to integrate the predicate logic from the database with the OO programming constructs in modern languages, especially given that the database is never running in the same process space as the business model.”

    That is exactly the problem StrokeDB addresses: it is a database running in the same process space, and it allows you to define constraints, but it does not impose severe constraints on you by default (such as limiting your data structure types or the flexibility of records).
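    To make “define constraints, but none by default” concrete, here is a minimal Ruby sketch, assuming a toy in-memory store. ConstrainedStore, constrain and save are hypothetical names for illustration, not StrokeDB’s API. Constraints are ordinary Ruby blocks checked in the same process before a document is accepted, and a store with no registered constraints accepts any hash.

    ```ruby
    # Hypothetical in-process store: constraints are opt-in Ruby blocks.
    class ConstrainedStore
      class ConstraintViolation < StandardError; end

      def initialize
        @docs = {}
        @constraints = [] # empty by default, so any hash is accepted
      end

      # Register a constraint as a message plus a predicate over the document.
      def constrain(message, &predicate)
        @constraints << [message, predicate]
        self
      end

      # Check every registered constraint before storing the document.
      def save(id, doc)
        @constraints.each do |message, predicate|
          raise ConstraintViolation, message unless predicate.call(doc)
        end
        @docs[id] = doc
      end

      def [](id)
        @docs[id]
      end
    end

    store = ConstrainedStore.new
    store.save("a", { "title" => 42 }) # accepted: no constraints yet
    store.constrain("title must be a String") { |d| d["title"].is_a?(String) }
    begin
      store.save("b", { "title" => 42 })
    rescue ConstrainedStore::ConstraintViolation => e
      puts "rejected: #{e.message}" # prints "rejected: title must be a String"
    end
    ```

    Note that registering a constraint later does not re-validate documents already stored, which is one trade-off of making constraints opt-in.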

  8. Yurii Rashkovskii, May 18, 2008 @ 02:43 AM

    ...and yes, luckily StrokeDB is not another ORM.

  9. Yurii Rashkovskii, May 18, 2008 @ 02:54 AM

    “What is needed is constraints (that give meaning to the data)”

    Constraints do not define the meaning of data. Constraints define constraints. The meaning of data could be defined with some sort of ontology.

  10. Jan, May 18, 2008 @ 02:56 AM

    For fun and profit, upfront consistency can go to hell: http://www.allthingsdistributed.com/2007/12/eventually_consistent.html

  11. dude, May 18, 2008 @ 04:21 AM

    dude, seriously, who cares???

  12. Kyle Lahnakoski, May 18, 2008 @ 05:12 PM

    There is a place for object databases and for the Ruby solution to persistence. When the amount of data is small, or the schema is simple, or most data is built on the fly, Ruby has a fine solution.

    The problem with these solutions becomes apparent when you have huge amounts of data, accessed by multiple languages, with a complicated schema. These are the times when you need a whole person just to maintain data consistency and to manipulate data as data in its own right, not as a peripheral of some other application. This specialized domain creates a need for a specialized language; ergo, SQL.

    Whereas program code is in the same state every time you start your program, a data store accumulates state over time. When application code branches on the data in a data store, the data is acting like code; essentially, the data store is changing the code base. As new business rules are introduced, schema refactorings are required. Refactoring a data store is more difficult and time-consuming than refactoring plain code, because by the time you have written a patch to refactor a data store, the data store has changed. Since data is code, applying a patch meant for different code can be dangerous, so you must be careful.

    The Ruby data mapping does not solve the schema refactoring problem at this time. Ruby has a fine persistence solution as long as schema refactorings can be avoided:

    1) data accumulation can be stopped while refactoring happens;
    2) the data is not used by other applications;
    3) past data does not exist, or is easy to convert.

  13. Oleg Andreev, May 19, 2008 @ 11:54 AM

    StrokeDB is not a database, it is a framework for building a database. Multiple languages (or even “services”) can access a database through the well-defined interface to the storage (read/write document by UUID) and views (“key”, “limit”, “offset”, “reverse” options).
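    A toy model of the two interfaces named above may help: a storage layer that reads and writes documents by UUID, and a view queried with key, limit, offset and reverse. This is a hypothetical in-memory sketch (TinyStore, TinyView and the exact semantics given to each option are assumptions), not StrokeDB’s implementation; a real view would maintain a persistent index rather than re-sorting on every query.

    ```ruby
    require "securerandom"

    # Hypothetical storage layer: documents are written and read by UUID.
    class TinyStore
      def initialize
        @docs = {}
      end

      # Store a document and return the UUID it was saved under.
      def write(doc, uuid = SecureRandom.uuid)
        @docs[uuid] = doc
        uuid
      end

      def read(uuid)
        @docs[uuid]
      end

      def all
        @docs.values
      end
    end

    # Hypothetical view: a block maps each document to a key; queries filter
    # and page through the documents ordered by that key. This toy version
    # re-sorts on every query instead of maintaining an index.
    class TinyView
      def initialize(store, &key_fn)
        @store = store
        @key_fn = key_fn
      end

      def find(key: nil, limit: nil, offset: 0, reverse: false)
        rows = @store.all.map { |doc| [@key_fn.call(doc), doc] }
        rows = rows.select { |k, _| k == key } unless key.nil?
        rows = rows.sort_by { |k, _| k }
        rows = rows.reverse if reverse
        rows = rows.drop(offset)
        rows = rows.first(limit) if limit
        rows.map { |_, doc| doc }
      end
    end

    store = TinyStore.new
    id = store.write({ "title" => "Hello", "year" => 2008 })
    store.write({ "title" => "World", "year" => 2007 })
    store.read(id)["title"] # => "Hello"
    by_year = TinyView.new(store) { |doc| doc["year"] }
    by_year.find(reverse: true, limit: 1).map { |doc| doc["title"] } # => ["Hello"]
    ```

    Because any client only needs “read/write by UUID” and “query a view with these options”, the same contract could be exposed to other languages or services over whatever transport is convenient.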

    You also mention “data-mapping” and “schema”. First of all, StrokeDB is not an object database in which the relations between runtime objects have to be persisted. We support a very narrow set of serializable types (JSON: strings, numbers, arrays, “datetimes”, hashtables) and relations (references to other documents). The document space is very simple to use, to understand and to implement. We don’t have to worry about universal serialization of custom data structures. There’s only one more-or-less successful example of a consistent object database: the GemStone environment for Smalltalk. But it has its own limits. Other solutions are half-assed.
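    A rough sketch of what a “narrow set of serializable types” means in practice: a predicate that accepts only the JSON-like types listed above plus references to other documents, and rejects arbitrary runtime objects. DocRef and storable? are hypothetical names; treating true/false/nil as storable and requiring String hash keys are assumptions borrowed from JSON.

    ```ruby
    # Hypothetical reference to another document, identified only by UUID.
    DocRef = Struct.new(:uuid)

    # True if the value is built only from the narrow type set described above.
    # Accepting true/false/nil and requiring String keys are assumptions taken
    # from JSON.
    def storable?(value)
      case value
      when String, Integer, Float, Time, DocRef, true, false, nil
        true
      when Array
        value.all? { |item| storable?(item) }
      when Hash
        value.all? { |key, item| key.is_a?(String) && storable?(item) }
      else
        false
      end
    end

    storable?({ "title" => "Post", "tags" => ["ruby"], "author" => DocRef.new("1234") }) # => true
    storable?({ "created_at" => Time.now })   # => true
    storable?({ "handler" => Object.new })    # => false: arbitrary objects are not stored
    ```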

    Finally, “schema refactoring” is not an issue in StrokeDB. There’s no schema at all, so no schema refactoring is necessary! The other side of it is that your code should support all the document formats: both actual and obsolete. In some circumstances it might lead to a pain and bloatware, but I believe that code refactoring is much easier, than schema refactoring for 10,000,000 of records. Sometimes people don’t do it even with an RDBMS: when there’s a huge amount of data, it might be easier to leave the data as is.
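    Here is roughly what “your code should support all the document formats: both actual and obsolete” looks like, as a minimal sketch with invented field names (a single name versus first_name/last_name): stored documents are never rewritten, and the reading code branches on the shape it finds.

    ```ruby
    # Hypothetical example: early documents stored a single "name" field; later
    # ones store "first_name" and "last_name". Nothing on disk is rewritten;
    # the reader accepts both shapes.
    def full_name(doc)
      if doc.key?("first_name")            # current format
        "#{doc["first_name"]} #{doc["last_name"]}"
      elsif doc.key?("name")               # obsolete format, still stored
        doc["name"]
      else
        raise ArgumentError, "unknown document format: #{doc.keys.inspect}"
      end
    end

    old_doc = { "name" => "Ada Lovelace" }
    new_doc = { "first_name" => "Ada", "last_name" => "Lovelace" }
    full_name(old_doc) # => "Ada Lovelace"
    full_name(new_doc) # => "Ada Lovelace"
    ```

    Every new format adds another branch to every reader like this one, which is the “pain and bloatware” side of the trade-off.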

  14. Ashley Moran, May 20, 2008 @ 04:32 AM

    “StrokeDB is not a database, it is a framework for building a database”

    Purely for comparison, this is exactly what Date says about object databases in An Introduction to Database Systems. It’s not appropriate for me to comment on StrokeDB because I haven’t used it, but this makes it sound like a “dynamic language” version of the previous generation of object databases.

    As for people who complain about schema refactorings, or avoid doing them, they are just plain lazy and should be banned from programming :)

  15. Kyle Lahnakoski, May 23, 2008 @ 03:37 PM

    “Finally, ‘schema refactoring’ is not an issue in StrokeDB. There’s no schema at all, so no schema refactoring is necessary! The other side of it is that your code should support all the document formats: both actual and obsolete. In some circumstances it might lead to a pain and bloatware, but I believe that code refactoring is much easier, than schema refactoring for 10,000,000 of records.”

    First, to say there is “no schema at all” is misleading. Your data has structure, and it obeys relations, even if they are not explicit in code. Your data has an implicit schema, and that schema will change over the life of the product.

    Second, you are right to point out that you may perform the schema conversion at runtime (aka “support old formats”), leaving the old data unchanged. This is effectively a runtime patch that converts old data to the new schema, and this patch must be maintained as the application progresses. For example, suppose your app is old enough to have 4 versions: A, B, C and D. Here is the work you perform:

    Write app A.
    Write app B; translate schema A to app B.
    Write app C; translate schema A to app C, translate schema B to app C.
    Write app D; translate schema A to app D, translate schema B to app D, translate schema C to app D.

    That is O(n^2) work, and it can quickly become unmaintainable.
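    Counting the translators in this first scheme, as listed above: the k-th app version must translate each of the k-1 earlier schemas, so for n versions the total is

    $$\sum_{k=1}^{n} (k-1) = \frac{n(n-1)}{2} = O(n^2)$$

    For the four versions A to D this gives 0 + 1 + 2 + 3 = 6 translators, matching the list.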

    Compare this to:

    Write app A.
    Write app B; translate schema A to schema B.
    Write app C; translate schema B to schema C.
    Write app D; translate schema C to schema D.

    That is O(n) work. The patches in this latter scheme can be applied either at runtime or once. But it is best if the refactoring is done in a domain-specific language.
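    The chained scheme can be sketched as one upgrader per version step, applied in order until a document reaches the current version, so adding a version means writing exactly one new upgrader (n-1 in total for n versions). The sketch below is hypothetical: versions A to D are numbered 1 to 4, the field changes are invented, and an unversioned document is assumed to be version 1. The same upgrade function could be called lazily at read time (the runtime patch) or once over every stored document (the one-time refactoring).

    ```ruby
    # Hypothetical chained upgraders: key v upgrades a document from v to v + 1.
    UPGRADERS = {
      # A -> B: split "name" into first/last name (invented example change)
      1 => lambda { |d|
        first, last = d["name"].split(" ", 2)
        d.reject { |k, _| k == "name" }.merge("first_name" => first, "last_name" => last)
      },
      # B -> C: replace a single "email" with an "emails" list
      2 => lambda { |d| d.reject { |k, _| k == "email" }.merge("emails" => [d["email"]].compact) },
      # C -> D: add an "active" flag, keeping any existing value
      3 => lambda { |d| { "active" => true }.merge(d) },
    }.freeze

    CURRENT_VERSION = 4

    # Walk a document forward one step at a time; the input hash is not mutated.
    def upgrade(doc)
      version = doc.fetch("version", 1)
      while version < CURRENT_VERSION
        doc = UPGRADERS.fetch(version).call(doc)
        version += 1
      end
      doc.merge("version" => CURRENT_VERSION)
    end

    old = { "version" => 1, "name" => "Ada Lovelace", "email" => "ada@example.com" }
    upgraded = upgrade(old)
    upgraded["version"] # => 4
    upgraded["emails"]  # => ["ada@example.com"]
    ```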

    I am arguing that there is a need for a simple language to perform schema translations; SQL is good enough.

    (BTW, 10,000,000 records is not a lot, especially with one-time schema refactorings. You do not even need to bother optimizing at such low numbers; the machine will do it for you.)

  16. g-man, May 26, 2008 @ 01:56 PM

    Hi Yurii,

    I really enjoy your explorations into all kinds of languages, even Erlang!

    My questions for you:

    Why Ruby? I agree Ruby is fun to use, and its syntax is very close to natural language (especially for English speakers), but I have found Python to be cleaner and more ‘programmerly’ (and I really like the indentation blocks).

    Why JSON? From what I have read, YAML is better in many ways (no bloody braces), and it is actually a superset of JSON.

    Best to you in your development of StrokeDB!

  17. Yurii Rashkovskii, May 26, 2008 @ 11:23 PM

    g-man,

    1. Ruby, because it allows me to prototype fast and define DSLs. I’ve been programming in Python for a couple of years, but it hardly allows me to DSLize my code.

    2. Well, in fact, StrokeDB isn’t JSON-based. It is just easier to explain its types as JSON+, since the datatypes StrokeDB supports are basically the ones JSON supports plus a few more, like document references, Time, Range and Regex. So JSON is just an example of the supported datatypes, nothing more; we don’t store data in JSON.
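    As an illustration of the kind of DSL-friendliness meant in point 1, here is a minimal Ruby sketch using instance_eval, a common technique for Ruby DSLs: the block is evaluated with a builder object as self, so declarations like `has :title, String` read without an explicit receiver. The meta/has vocabulary and the MetaBuilder class are hypothetical, not StrokeDB’s actual syntax.

    ```ruby
    # Hypothetical builder that records the fields a kind of document has.
    class MetaBuilder
      attr_reader :name, :fields

      def initialize(name)
        @name = name
        @fields = {}
      end

      # `has :title, String` records a field and its expected class.
      def has(field, type = Object)
        @fields[field.to_s] = type
      end

      # Check a document hash against the recorded fields.
      def valid?(doc)
        @fields.all? { |field, type| doc[field].is_a?(type) }
      end
    end

    # The block runs with the builder as self, so `has` needs no receiver.
    def meta(name, &block)
      builder = MetaBuilder.new(name)
      builder.instance_eval(&block)
      builder
    end

    Article = meta("Article") do
      has :title, String
      has :published_at, Time
    end

    Article.valid?({ "title" => "Hi", "published_at" => Time.now }) # => true
    Article.valid?({ "title" => 42 })                                # => false
    ```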