StrokeDB goes public

Posted by yrashk

For the past two weeks, Oleg Andreev and I have spent most of our time working on something we really enjoyed: the StrokeDB project.

What is it?

StrokeDB is a lightweight take on the document-oriented database, currently implemented in Ruby. The concept is quite simple:

  • each document is uniquely identified by a UUID
  • each document has a set of slots, which are key/value pairs where the key is a string and the value is a simple data structure (boolean, number, string, array, hash, as in JSON)
  • each time you update a document, its version changes. The version is essentially a hash of the document content.
  • a reference to the previous version is automatically maintained by StrokeDB
  • each document may reference one or more “meta documents”, which are documents that declaratively describe the nature of a particular document
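
The points above can be sketched in plain Ruby. This is only an illustration of the idea, not StrokeDB’s actual code: the class name SketchDocument, the use of SHA-256, and the JSON serialization with sorted keys are all assumptions made for this example.

```ruby
require 'securerandom'
require 'digest/sha2'
require 'json'

# Hypothetical illustration only; not one of StrokeDB's classes.
class SketchDocument
  attr_reader :uuid, :slots, :version, :previous_version

  def initialize(slots = {})
    @uuid = SecureRandom.uuid            # unique identity of the document
    @slots = {}
    @version = nil
    @previous_version = nil
    slots.each { |key, value| @slots[key.to_s] = value }
    rehash!
  end

  # Read a slot by name.
  def [](key)
    @slots[key.to_s]
  end

  # Update a slot; the current version becomes the previous version.
  def []=(key, value)
    @slots[key.to_s] = value
    @previous_version = @version
    rehash!
  end

  private

  # Assumption: the version is a SHA-256 digest of the slots,
  # serialized as JSON with keys in sorted order.
  def rehash!
    @version = Digest::SHA256.hexdigest(JSON.generate(@slots.sort.to_h))
  end
end

doc = SketchDocument.new('color' => 'green', 'weight' => '3oz')
first_version = doc.version
doc['color'] = 'red'
puts doc.previous_version == first_version
```

Because the version here depends only on content, two documents with identical slots get the same version even though their UUIDs differ.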

One of the motivations for StrokeDB was my desire to decentralize some databases. Today databases are mostly centralized: with the SaaS products you use, you host your data in some company’s data center. I believe that in some cases this is not the right way to manage your data. Centralization puts your data security at risk and requires the provider’s database software to be extremely fast (because a lot of clients are working with their data at once), among other things. What I really want is to have my data right where I am working with it (i.e. on my laptop), and to be able to share it with other parties securely, back it up, and so on.

So, yes, I just want to return some data to the client’s computer.

That’s how I came to StrokeDB, which was greatly inspired by Git and my previous experiments in metaframe databases.

Why another document database?

Why not CouchDB/ThruDB/SimpleDB? Well, we had a number of reasons to launch our own project:

  • We want it to be really lightweight and, essentially, embeddable. That is how it is implemented now: it is just a Ruby library.
  • We want to work around the natural limitations of the DBs mentioned above. CouchDB does not support code injection to the database core, indexes in particular (like in PostgreSQL). SimpleDB is hosted elsewhere, supports only very primitive queries, and is not extensible. ThruDB supports only a keyword-based search index (no special indexes). Also, its partitioning and distribution are done via SimpleDB.
  • We want to build the system on top of the concept of asynchronous operation. We do not rely on locking or on synchronous conflict resolution (aka optimistic locking). A well-designed asynchronous workflow enables several useful features together: unlimited data distribution, offline work, replication-based load balancing, data consistency, availability and fast access.

Metadocuments?

Here is a simple example of metadocument usage. Imagine you have a document that represents some concrete apple:


some_apple: 
        weight: 3oz 
        color: green 
        price: $3 

It could have three metadocuments that “describe it”: Apple, Fruit and Product:


some_apple: 
        __meta__: [Apple, Fruit, Product] 
        weight: 3oz 
        color: green 
        price: $3 

When this document is loaded, the Ruby object will be extended with three modules (Apple, Fruit and Product).

For example, suppose you have them defined as:


Apple = Meta.new
Fruit = Meta.new do 
        def green? 
                color == 'green' 
        end 
end 
Product = Meta.new do 
        def sell! 
                # ... 
        end 
end 

So when you load that some_apple document (by finding it with slot-based search, or by its UUID), you will have an object that also responds to #green? and #sell! methods.

It will also respond positively to #is_a?(Apple), #is_a?(Fruit) and #is_a?(Product).
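
The mechanism can be imitated with nothing but standard Ruby. The sketch below is not how StrokeDB itself is written: SketchMeta and LoadedDocument are hypothetical names, and the body of sell! is a placeholder. It only shows that extending an object with modules gives it both the extra methods and positive #is_a? answers.

```ruby
# Hypothetical stand-in for a meta: a Module subclass whose block
# defines the methods that documents of this kind should gain.
class SketchMeta < Module
end

Apple = SketchMeta.new
Fruit = SketchMeta.new do
  def green?
    color == 'green'
  end
end
Product = SketchMeta.new do
  def sell!
    :sold # placeholder body, for illustration only
  end
end

# Hypothetical loaded document: exposes each slot as a reader method
# and extends itself with every meta it references.
class LoadedDocument
  def initialize(slots, metas)
    slots.each { |name, value| define_singleton_method(name) { value } }
    metas.each { |meta| extend(meta) }
  end
end

some_apple = LoadedDocument.new(
  { 'weight' => '3oz', 'color' => 'green', 'price' => '$3' },
  [Apple, Fruit, Product]
)

puts some_apple.green?          # true
puts some_apple.is_a?(Fruit)    # true
puts some_apple.respond_to?(:sell!) # true
```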

Some examples?

Here you go:


config = StrokeDB::Config.new(true)

config.add_storage :mem, :memory_chunk
config.add_storage :fs, :file_chunk, 'test/storages/test'

config.chain :mem, :fs
config[:mem].authoritative_source = config[:fs]

config.add_storage :index_storage, :inverted_list_file, 'test/storages/index'
config.add_index :default, :inverted_list, :index_storage

config.add_store :default, :skiplist, :mem, :cut_level => 4

User = StrokeDB::Meta.new
# Look the user up by meta and email; create the user if nothing matches.
u = config.indexes[:default].find(:__meta__ => User.document, :email => "someemail@gmail.com").first
if u
  puts "We've found him!"
else
  puts "User not found, creating new user"
  u = User.new :email => "someemail@gmail.com"
  u.save!
end
puts u

config[:mem].sync_chained_storages!
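
To give a feel for what a slot-based find over an inverted list involves, here is a minimal in-memory sketch. It is not StrokeDB’s index implementation: SketchInvertedIndex and its insert and find methods are hypothetical names, and the UUID strings are made up. The idea is simply to keep, for every slot/value pair, the list of documents containing it, and to intersect those lists at query time.

```ruby
require 'set'

# Hypothetical sketch of an inverted-list index; not StrokeDB's implementation.
class SketchInvertedIndex
  def initialize
    # Maps each [slot, value] pair to the set of document UUIDs containing it.
    @lists = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Register every slot/value pair of a document under its UUID.
  def insert(uuid, slots)
    slots.each { |slot, value| @lists[[slot, value]] << uuid }
  end

  # Return the UUIDs of documents that match every slot/value pair in the query.
  def find(query)
    return [] if query.empty?
    sets = query.map { |slot, value| @lists.fetch([slot, value], Set.new) }
    sets.reduce(:&).to_a
  end
end

index = SketchInvertedIndex.new
index.insert('uuid-1', :__meta__ => 'User', :email => 'someemail@gmail.com')
index.insert('uuid-2', :__meta__ => 'User', :email => 'other@example.com')

p index.find(:__meta__ => 'User', :email => 'someemail@gmail.com') # ["uuid-1"]
```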

What are we still missing?

A lot:

  • Transactions (though we have some building blocks ready to build them)
  • Replication (but again, we have building blocks for streaming replication already)
  • Efficient indexes
  • Nice API (time cures this disease!)

But hey, it was only two weeks of hacking — so stuff is definitely coming.

Questions? Ideas?

Join our mailing list

Comments

  1. Ladislav Martincik, February 03, 2008 @ 11:47 PM

    Very interesting. Keep working on it.

  2. Noah Slater, February 04, 2008 @ 04:18 AM

    FWIW, CouchDB is fully indexable in the form of JavaScript functions:

    http://www.couchdbwiki.com/index.php?title=Views

    Other than that, it’s nice to see some healthy competition in this arena. ;)

  3. Rob, February 04, 2008 @ 04:42 AM

    Hey Yurii!

    How well does this database scale?

    Are you going to spend more time working on it?

    Where do you want to get it?

    And are your strengths that couchdb does not have?

    Let me know, would greatly appreciate your comments on this. Rob

  4. Jake Luciani, February 04, 2008 @ 05:17 AM

    Hi,

    Interesting project. Just for clarification, Thrudb does support full indexes via clucene and does not rely on simpledb.

    Jake

  5. Yurii Rashkovskii, February 04, 2008 @ 07:05 AM

    Noah,

    Of course I know about JavaScript views in CouchDB. Maybe I just expressed my thoughts imprecisely. By “CouchDB does not support code injection to the database core” I meant that CouchDB itself is written in Erlang and you can inject only JavaScript, and with code injection in JavaScript you are not in full control of what happens. For example, in StrokeDB you can easily add a new special kind of index in Ruby that uses some low-level facilities. Of course, this approach has its own pros and cons.

    And yes, some competition is always nice :)

  6. Yurii Rashkovskii, February 04, 2008 @ 07:31 AM

    Rob,

    Well, StrokeDB is designed to scale by launching any number of instances that can exchange streaming replicas.

    And yes, I am definitely going to keep working on StrokeDB: I am going to use it for the next version of the Issues Done tools.

    Where I want to get it? Well, it is a big question. Somewhere where it will rock :)

    “And are your strengths that couchdb does not have?” — what do you mean by this?

  7. Marius Mathiesen, February 04, 2008 @ 10:06 AM

    Great work! The meta document stuff looks really interesting. Can regular documents be used as meta documents (like prototypes or templates)?

    I’ll definitely have to try this out!

  8. Yurii Rashkovskii, February 04, 2008 @ 11:48 AM

    Marius,

    Yes, any document that has the “Meta” meta (and you can have multiple metas for any document) can be used as a metadocument.