Oleg Andreev has published a nice writeup about the networking/collaboration aspect of the StrokeDB vision.
Top 10 Reasons to Avoid Document Databases FUD
This article is written in response to “Top 10 Reasons to Avoid the SimpleDB Hype”.
First of all, I’d like to note that the answers below are not about SimpleDB specifically; they are meant to counter FUD about document-based databases in general.
1. Data integrity is not guaranteed.
This could be the case with SimpleDB, but overall nothing prevents document databases from managing data integrity very well.
Regarding constraints, nothing prevents you from defining validations in a document or in its related “meta” document. This is pretty much how StrokeDB works: you define validations within the meta document and they keep your documents valid.
More interesting are the concerns about conflicts. I’d say that this problem is hardly addressed in the common RDBMS approach. All you usually get is either user A’s or user B’s most recent update; there seems to be no easy way to resolve conflicts gracefully. On the contrary, since the document database approach is rather novel, there is certainly enough room to adopt ways of dealing with conflicts, for example with different and configurable algorithms, like a slot-by-slot three-way merge, or even special programmer-defined algorithms. I can hardly imagine how to do this sort of thing with a traditional RDBMS in a relatively easy manner.
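To make the slot-by-slot idea more concrete, here is a minimal sketch in plain Ruby. It is not StrokeDB code: merge3, SlotConflict and the sample slots are made up for illustration, documents are modelled as plain Hashes, and a slot holding nil is treated the same as a missing slot.

```ruby
# Hypothetical helper, not StrokeDB's actual merge algorithm.
class SlotConflict < StandardError; end

# Merge two edited copies of a document (plain Hashes of slots) against
# their common ancestor, one slot at a time.
def merge3(base, ours, theirs)
  merged = {}
  (base.keys | ours.keys | theirs.keys).each do |slot|
    b, o, t = base[slot], ours[slot], theirs[slot]
    value =
      if o == t then o        # both sides agree (or neither changed)
      elsif o == b then t     # only "theirs" changed this slot
      elsif t == b then o     # only "ours" changed this slot
      else
        # Both sides changed the slot differently: ask the caller's block,
        # or fail if no resolver block was given.
        raise SlotConflict, "conflict in slot #{slot.inspect}" unless block_given?
        yield(slot, b, o, t)
      end
    merged[slot] = value unless value.nil?
  end
  merged
end

base   = { title: "Draft", body: "Hello" }
ours   = { title: "Draft", body: "Hello", tags: ["a"] }
theirs = { title: "Final", body: "Hello" }

# Non-overlapping changes merge cleanly: the new title and the new tags survive.
p merge3(base, ours, theirs)
```

The optional block is where a programmer-defined strategy would plug in, for example "latest timestamp wins" for one slot and "concatenate" for another.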
2. Inconsistency will provide a terrible user experience.
First of all, it should be noted that the described inconsistencies are also quite possible with distributed RDBMS setups: they too are subject to a certain lag before data is propagated to the replicas.
The actual problem is not with lag — it is more about leaving documents in a consistent state.
This problem could be easily addressed in any kind of database, either relational or document-based.
3. Aggregate operations will require more coding.
Again, while this seems to be true for SimpleDB, other document-based databases address this problem pretty well with the Views approach (CouchDB, StrokeDB [Views are WIP]), so you can define any kind of aggregation, even ones that are simply not supported by an RDBMS.
4. Complicated reports, and ad hoc queries, will require a lot more coding.
I’d refer to the Views approach once again: it is quite a nice way to produce complicated reports as quickly as well-known RDBMS indexes do.
“Views” could be seen as subroutines with a special, well-defined API, and we can use these subroutines to index specific “queries” even at runtime. That’s pretty interesting.
5. Aggregate operations will be much slower if you don’t use an RDBMS.
This is a dubious statement. First, for the majority of the queries speed is defined by the speed of the index (all that B+ trees stuff). Document-oriented database views are indexed the very same way.
Speaking of those RDBMS “rows” and objects I wouldn’t say they are much different. An Object with key/value pairs slots is definitely a “row” in that sense. So what’s so different about them?
On the other hand, a “real” relational database should actually use aggregating operations (joins) far more frequently than a typical document database. A relational database is basically about storing short “facts” with relations between them and using lots of join operations to aggregate synthetic data. That wouldn’t be efficient or easy enough to program, though, which is why most relational databases in the “real world” are organized as fairly wide tables.
And, finally, well-done DODBs can offer a nice Map-Reduce API to build and incrementally update very complex aggregations.
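As a rough illustration of what “incrementally update” can mean, the following plain-Ruby sketch folds each new document into an existing result instead of re-scanning everything. IncrementalView and its methods are hypothetical names, not the CouchDB or StrokeDB API, and the sketch only handles newly added documents (updates and deletions would need extra bookkeeping).

```ruby
# Hypothetical sketch of an incrementally maintained aggregation.
class IncrementalView
  def initialize(map:, reduce:)
    @map = map          # doc -> [key, value], or nil to skip the document
    @reduce = reduce    # (accumulated, value) -> new accumulated value
    @results = {}       # key -> reduced value so far
  end

  # Fold one newly stored document into the existing results.
  def add(doc)
    key, value = @map.call(doc)
    return if key.nil?
    @results[key] = @results.key?(key) ? @reduce.call(@results[key], value) : value
  end

  def [](key)
    @results[key]
  end
end

# Total order amount per customer.
totals = IncrementalView.new(
  map:    ->(doc) { [doc[:customer], doc[:amount]] if doc[:type] == "order" },
  reduce: ->(sum, amount) { sum + amount }
)

totals.add(type: "order", customer: "alice", amount: 10)
totals.add(type: "order", customer: "bob",   amount: 5)
totals.add(type: "note",  customer: "alice", text: "call back")
totals.add(type: "order", customer: "alice", amount: 7)

puts totals["alice"]  # 17
puts totals["bob"]    # 5
```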
6. Data import, export, and backup will be slow and difficult.
“There are no such tools for key-value data stores, because these products are so new.”
Is lack of maturity a good reason to blame new technologies?
SimpleDB implementation in particular might have its own flaws in this area — but nothing prevents it from improving things in theory and practice.
7. SimpleDB isn’t that fast.
Since in this post I am talking about document databases in general, I’ll skip the “internet latency” issues. They are rather irrelevant here.
8. Relational databases are scalable, even with massive data sets.
The main argument here is that “those guys do scale relational databases, so they are scalable”. True. They are scalable. But at what cost? “Those guys” were able to do a lot of great stuff by using manpower decades ago, before letting machinery do it. But is that a good excuse to manufacture goods without machinery these days, just because it is possible? I doubt it. Throwing manpower at a problem is not always the best approach.
And… you said “relational”? Facebook and others do a lot of denormalization and they don’t ever use JOIN; they’d rather make several consecutive requests and build intermediate results on a webserver (when you have 20 times more webservers than DBs, it’s obviously good to move some load there). They treat good old MySQL as object storage with very fast B+ tree indexes. In the end, the resulting database is not a relational one. A thousand MySQL instances is just a distributed object storage with simple fast indexes and a bunch of hand-written code in php/ruby/python/whatever around it.
9. Super-scalability is overrated. Slowing the pace of your product development is even worse.
The super-scalability issue is not really overrated. The problem with the approach of “why not wait and address super-scalability once you’ve created a super product” is that once you address super-scalability, it will be quite a different product.
The issue with scalability these days is that less scalable applications are quite different from the ones that are hugely scalable, and that is why writing a scalable application from scratch is definitely a waste of time and money.
But what if scaling from an SQLite-like backend to two datacenters were quite painless and did not require you to rethink database interactions in your application? With the right database API design it is quite possible. The BigTable, Amazon Dynamo, CouchDB and StrokeDB approaches are all about addressing this need.
10. SimpleDB is useful, but only in certain contexts.
The same can be said for relational databases. In the real world, data is not really well structured; it is rather versatile and its representation depends on the point of view. This problem is very well addressed by document databases (and StrokeDB in particular was created in an attempt to solve it).
“Amazon SimpleDB, Apache CouchDB, and the Google Datastore API aren’t bad products. But we do them a disservice when we construe them to be replacements for general-purpose databases. Used carefully, they can help your organization. But used indiscriminately, you’ll create a lot more work for your programmers and you’ll make your application perform even worse”
Relational databases are not bad products either. Used carefully, they can help your organization. But used indiscriminately, you’ll create a lot more work for your programmers and you’ll make your application development even more complex.
Ruby module exclusion
Maybe I am missing something (since I am not an MRI hacker), but why is there no module exclusion functionality? That is, you can extend an object with a module, but you can’t reverse this operation. Is there any specific reason for this decision, do you know?
I’ve just sketched a simplistic implementation of this kind of functionality and I’m really wondering why there is nothing like it in MRI.
Again, am I missing something?
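For readers who want to experiment, here is one possible pure-Ruby workaround. It is not the implementation mentioned above, and without_module is a hypothetical helper: instead of removing a module in place, it builds a copy of the object that is re-extended with everything except the unwanted module. Singleton methods defined directly on the object are lost in the copy.

```ruby
# Hypothetical helper (not part of Ruby or StrokeDB): returns a copy of obj
# extended with every module obj was extended with, except mod.
def without_module(obj, mod)
  # Modules mixed into the singleton class but not into the regular class
  # are exactly the ones that were added with #extend.
  extended = obj.singleton_class.included_modules - obj.class.included_modules
  copy = obj.dup  # #dup keeps instance variables but drops the singleton class
  # included_modules lists the most recently added module first, so re-extend
  # in reverse to keep the original method lookup order.
  (extended - [mod]).reverse_each { |m| copy.extend(m) }
  copy
end

module Buyer
  def buy!; "bought"; end
end

module Seller
  def sell!; "sold"; end
end

user = Object.new
user.extend(Buyer)
user.extend(Seller)

plain_buyer = without_module(user, Seller)
puts plain_buyer.respond_to?(:buy!)   # true
puts plain_buyer.respond_to?(:sell!)  # false
puts user.respond_to?(:sell!)         # true, the original is untouched
```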
We've launched strokedb.com
The first version of StrokeDB’s site has been launched: http://strokedb.com/
StrokeDB's experimental composite metas syntax
I had been playing with this idea in my mind for a week or so and finally decided to put it into code. The basic idea is that since any document can have multiple metas, why not improve the API for this? Before the latest commits, you needed to do things like:
u = User.new :name => "Yurii"
u.metas << Buyer
u.metas << Seller
u.save!
It isn’t really nice. So what I have done just a couple of minutes ago is a special syntax for composite metas. You can simply use Meta#+ to add meta to meta:
User = Meta.new
# ==> User
Buyer = Meta.new
# ==> Buyer
Seller = Meta.new
# ==> Seller
(User+Buyer+Seller).create! :name => "Yurii"
# ==> #<User,Buyer,Seller __version__: 5bf2..., name: "Yurii", uuid: "32ae72c8-a1da-4ead-8bfb-d2aa65e727f3">
where the created document will actually have three metas:
_[:__meta__]
# ==> [#<StrokeDB::Meta __version__: b6a0..., name: "User", uuid: "292b3226-8c69-4d21-bb2c-d4d8cd924bf2">, #<StrokeDB::Meta __version__: d223..., name: "Buyer", uuid: "89d4c62a-b935-42c6-b680-78e1de0b7d9e">, #<StrokeDB::Meta __version__: 2291..., name: "Seller", uuid: "52babc48-5bda-4ea1-a4c3-1c931185c290">]
Also (User+Buyer+Seller).find will work just as expected:
# ==> [#<User,Buyer,Seller __version__: 5bf2..., name: "Yurii", uuid: "32ae72c8-a1da-4ead-8bfb-d2aa65e727f3">]
(please note that composition order is important: it matters not only in #find, it also defines what the combined meta document will look like [Document#meta]).
Please note that this stuff is not fully complete yet, but as always, it is already fun to play with.
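If the composition idea seems abstract, this toy model in plain Ruby shows just the “+” part. SimpleMeta is a hypothetical stand-in and does not reflect how StrokeDB’s Meta is implemented; a “document” here is only a Hash that remembers its metas in order.

```ruby
# Hypothetical, much simplified stand-in for composite metas.
class SimpleMeta
  attr_reader :names

  def initialize(*names)
    @names = names
  end

  # Combining metas yields a new composite and keeps the order of operands.
  def +(other)
    SimpleMeta.new(*(names + other.names))
  end

  # Build a "document": the given slots plus the ordered list of metas.
  def create(slots = {})
    slots.merge(__meta__: names)
  end
end

user   = SimpleMeta.new("User")
buyer  = SimpleMeta.new("Buyer")
seller = SimpleMeta.new("Seller")

doc = (user + buyer + seller).create(name: "Yurii")
p doc[:__meta__]   # the three meta names, in composition order
```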
StrokeDB's early implementation of inter-store sync
This morning I finalized an early implementation of inter-store synchronization. Please note that it is definitely not mature; the API and details might change over time.
So, what is this for? Synchronization is a way to replicate documents between stores back and forth, while preserving the whole history of document changes.
Here is an example:
# Main store is default store
store = StrokeDB.default_store
# Let's create another store
another_store = StrokeDB::Config.build(:base_path => 'tmp_db').stores[:default]
# Here we're creating some document in another_store
doc = Document.create!(another_store, :hello => 'world')
# Updating it
doc.test = 'passed'
doc.save!
# Syncing it to store
# doc.__versions__.all.reverse passes all versions of doc in reverse order (after this reversal the latest version is at the end of the list). Order is important
store.sync!(doc.__versions__.all.reverse,another_store.timestamp)
# updating document at store
doc_at_store = store.find(doc.uuid)
doc_at_store.ok = true
doc_at_store.save!
# Syncing it back to another_store
another_store.sync!(doc_at_store.__versions__.all.reverse,store.timestamp)
# Now, let's reload document at another store:
doc.reload
# ==> #<Doc __previous_version__: fe9e..., __version__: bdf6..., hello: "world", ok: true, test: "passed", uuid: "e70aca78-69d3-4e2c-920c-813f2c17be75">
# as you can see, it got 'ok' slot with value of true
Voila!
Synchronization will work just fine if you have a fast-forward situation and will raise ConflictCondition if you have a conflict. Oleg is working on a merge3 algorithm as one of the ways to resolve ConflictConditions; I am also thinking about one or two simple scenarios for conflict resolution.
Anyway, as I’ve said above, its API and the way it works aren’t mature yet, so things might change.
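The difference between a fast-forward and a conflict can be modelled in a few lines of plain Ruby. This is only a sketch under simplifying assumptions: TinyStore and ConflictError are hypothetical names, a document’s history is a flat list of version ids (oldest first), and nothing here is taken from StrokeDB’s sync! implementation.

```ruby
# Hypothetical model of the fast-forward check.
class ConflictError < StandardError; end

# A store maps a document id to its list of version ids, oldest first.
class TinyStore
  def initialize
    @histories = Hash.new { |hash, id| hash[id] = [] }
  end

  def history(id)
    @histories[id].dup
  end

  def commit(id, version)
    @histories[id] << version
  end

  # Accept an incoming history only if it extends what we already have.
  def sync!(id, incoming)
    local = @histories[id]
    if incoming.first(local.size) == local
      @histories[id] = incoming.dup          # fast-forward (or nothing new)
    elsif local.first(incoming.size) != incoming
      raise ConflictError, "histories of #{id} have diverged"
    end
    # Otherwise the incoming history is a prefix of ours: nothing to do.
  end
end

a = TinyStore.new
b = TinyStore.new

a.commit("doc", "v1")
b.sync!("doc", a.history("doc"))   # b had nothing, so this is a fast-forward
b.commit("doc", "v2")
a.sync!("doc", b.history("doc"))   # a is strictly behind b: fast-forward

a.commit("doc", "v3a")             # now both stores change the document
b.commit("doc", "v3b")
begin
  a.sync!("doc", b.history("doc"))
rescue ConflictError => e
  puts "conflict: #{e.message}"
end
```

A merge strategy would take over at the point where this sketch raises.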
Poor man's web application with StrokeDB and Merb
Yesterday I finally got to play with StrokeDB as a database “server” for a web application (currently using Merb, but basically the following scenario should work for any Ruby-based framework).
So, while we’re going to create a nice realtime replication subsystem that will allow building really scalable applications with StrokeDB, there is definitely a need to play with StrokeDB-powered web applications right away, even if they will not be as scalable and fast as we want them to be. That’s why I’ve tried to create a simplistic way to use StrokeDB from a few Merb application instances.
The basic idea is that all processes that want to access a StrokeDB database need to access a single “server” store over a network connection. That’s why I’ve created the RemoteStore concept, which allows accessing stores over DRb (more protocols could be added later). Since stores (SkiplistStore, to be specific) aren’t really thread-safe, the RemoteStore server actually serves one client at a time. Unless you’re going to invoke long-running operations, it should be fast enough to play with this stuff.
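The “one client at a time” part can be pictured with a small plain-Ruby sketch. SerializedStore is a hypothetical wrapper, not RemoteStore itself, and the network layer is left out entirely: the point is only that a Mutex lets a store that is not thread-safe be shared by several callers.

```ruby
# Hypothetical wrapper around a store that is not thread-safe by itself.
class SerializedStore
  def initialize(store)
    @store = store
    @lock = Mutex.new
  end

  def save(key, doc)
    @lock.synchronize { @store[key] = doc }
  end

  def find(key)
    @lock.synchronize { @store[key] }
  end

  def size
    @lock.synchronize { @store.size }
  end
end

store = SerializedStore.new({})

# Several "clients" writing concurrently; each call waits for its turn.
threads = 4.times.map do |client|
  Thread.new do
    25.times { |i| store.save("client#{client}-doc#{i}", { n: i }) }
  end
end
threads.each(&:join)

puts store.size  # 100
```

The trade-off is the one described above: a single long-running call holds the lock and makes everybody else wait.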
So, here we go. Let’s assume you have a Merb skeleton application (I am using merb-0.5.3 at the moment). All you need to do is:
- Update your config/merb_init.rb to contain something like
puts "Loading StrokeDB..."
$:.unshift File.dirname(__FILE__) + '/../../strokedb-ruby/' # replace with your StrokeDB path
require 'strokedb'
StrokeDB.use_global_default_config!
print "Configuring StrokeDB..."
if ARGV.include?('--strokedb-server')
  print 'configuring server store...'
  StrokeDB::Config.build :base_path => File.dirname(__FILE__) + "/../db", :default => true
  STROKEDB_THREAD = StrokeDB.default_store.remote_server('druby://localhost:9999').start
else
  StrokeDB.default_store = StrokeDB::RemoteStore::DRb::Client.new('druby://localhost:9999')
end
puts "done."
- define some metas in app/models, like app/models/user.rb:
User = Meta.new
- run StrokeDB server:
$ merb -r "STROKEDB_THREAD.join" --strokedb-server
- enjoy StrokeDB-powered application!
For example, you can use it in your controllers, or simply play within merb console (merb -i)
Of course, this is definitely a poor man’s method and we’re working on better ways to build StrokeDB-powered applications, but something is better than nothing :)
If you have any problems with the above scenario, let me know.
StrokeDB's first validation
Folks,
this morning I crafted the first validation primitive for StrokeDB: validates_presence_of.
Here are a few examples of it:
User = StrokeDB::Meta.new do
validates_presence_of :login
end
# ==> User
User.create! rescue $!.message
# ==> "User's login should be present on save"
User = StrokeDB::Meta.new do
validates_presence_of :login, :on => :create # or :save or :update
end
User = StrokeDB::Meta.new do
validates_presence_of :login, :message => '#{meta} should always have #{slotname} to be able to log in'
end
# ==> User
User.create! rescue $!.message
# ==> "User should always have login to be able to log in"
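Note that the custom message above is a single-quoted string, so its placeholders are not interpolated by Ruby when the meta is defined. How StrokeDB fills them in is not shown in this post; the plain-Ruby sketch below is one hypothetical way to do it, with TinyMeta and ValidationError as made-up names and the :on option left out.

```ruby
# Hypothetical re-implementation for illustration only.
class ValidationError < StandardError; end

class TinyMeta
  DEFAULT_MESSAGE = '#{meta}\'s #{slotname} should be present on save'

  def initialize(name)
    @name = name
    @required = {}   # slot name -> message template
  end

  def validates_presence_of(slot, message: DEFAULT_MESSAGE)
    @required[slot] = message
  end

  def create!(slots = {})
    @required.each do |slot, template|
      next unless slots[slot].nil?
      # The placeholders are still literal text here, so they can be
      # substituted at validation time.
      raise ValidationError,
            template.gsub('#{meta}', @name).gsub('#{slotname}', slot.to_s)
    end
    slots
  end
end

user = TinyMeta.new("User")
user.validates_presence_of :login

begin
  user.create!
rescue ValidationError => e
  puts e.message   # User's login should be present on save
end
```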
More validations to appear soon.
Enjoy!
Experimental has_many support
I am happy to announce that StrokeDB recently got experimental has_many support.
Now you can easily collect documents that refer to your document by defining has_many for your meta:
(All examples shown were run in test/console)
Playlist = Meta.new do
has_many :songs
end
==>Playlist
Song = Meta.new
==>Song
playlist = Playlist.create!(:name => "My playlist")
==>#<Playlist name: "My playlist", __version__: 330f...>
song = Song.create!(:name => "My song", :playlist => playlist)
==>#<Song name: "My song", __version__: 530f..., playlist: #<Playlist name: "My playlist", __version__: 330f...>>
playlist.songs
==>[#<Song name: "My song", __version__: 530f..., playlist: #<Playlist name: "My playlist", __version__: 330f...>>]
So, here we define has_many :songs, which inflates :songs to the Song meta and uses Song’s :playlist slot as a reference to Playlist. Like ActiveRecord, this definition relies on convention over configuration. And again, like in ActiveRecord, you can tweak it:
Playlist = Meta.new do
has_many :all_songs, :through => :songs, :foreign_reference => :belongs_to_playlist
end
==>Playlist
Song = Meta.new
==>Song
playlist = Playlist.create!(:name => "My playlist")
==>#<Playlist name: "My playlist", __version__: 3245...>
song = Song.create!(:name => "My song", :belongs_to_playlist => playlist)
==>#<Song name: "My song", __version__: 5245..., belongs_to_playlist: #<Playlist name: "My playlist", __version__: 3245...>>
playlist.all_songs
==>[#<Song name: "My song", __version__: 5245..., belongs_to_playlist: #<Playlist name: "My playlist", __version__: 3245...>>]
Here we use the :through option, which tells StrokeDB to use the Song meta to find documents, while :foreign_reference specifies the name of Song’s slot that references Playlist. You can also add some conditions to has_many:
Playlist = Meta.new do
has_many :rock_songs, :through => :songs, :foreign_reference => :belongs_to_playlist, :conditions => {:genre => "Rock"}
end
==>Playlist
Song = Meta.new
==>Song
playlist = Playlist.create!(:name => "My playlist")
==>#<Playlist name: "My playlist", __version__: 3cd6...>
rock_song = Song.create!(:name => "My song", :belongs_to_playlist => playlist, :genre => "Rock")
==>#<Song name: "My song", __version__: 5cd6..., belongs_to_playlist: #<Playlist name: "My playlist", __version__: 3cd6...>, genre: "Rock">
pop_song = Song.create!(:name => "My song 2", :belongs_to_playlist => playlist, :genre => "Pop")
==>#<Song name: "My song 2", __version__: 6cd6..., belongs_to_playlist: #<Playlist name: "My playlist", __version__: 3cd6...>, genre: "Pop">
playlist.rock_songs
==>[#<Song name: "My song", __version__: 5cd6..., belongs_to_playlist: #<Playlist name: "My playlist", __version__: 3cd6...>, genre: "Rock">]
Isn’t it nice? But let’s go further. What if you want to know all the authors of the music in your playlist? That’s quite simple!
Playlist = Meta.new do
has_many :authors, :through => [:songs, :author]
end
==>Playlist
Song = Meta.new
==>Song
playlist = Playlist.create!(:name => "My playlist")
==>#<Playlist name: "My playlist", __version__: 3903...>
song = Song.create!(:name => "My song", :playlist => playlist, :author => "John Doe")
==>#<Song name: "My song", __version__: 5903..., author: "John Doe", playlist: #<Playlist name: "My playlist", __version__: 3903...>>
playlist.authors
==>["John Doe"]
or
Playlist = Meta.new do
has_many :authors, :through => [:songs, :author]
end
==>Playlist
Song = Meta.new
==>Song
Author = Meta.new
==>Author
playlist = Playlist.create!(:name => "My playlist")
==>#<Playlist name: "My playlist", __version__: 3b19...>
author = Author.create!(:name => "John Doe")
==>#<Author name: "John Doe", __version__: 5b19...>
song = Song.create!(:name => "My song", :playlist => playlist, :author => author)
==>#<Song name: "My song", __version__: 7b19..., author: #<Author name: "John Doe", __version__: 5b19...>, playlist: #<Playlist name: "My playlist", __version__: 3b19...>>
playlist.authors
==>[#<Author name: "John Doe", __version__: 5b19...>]
So in these examples has_many fetches all Songs and collects their :author slots.
So here it is. The current has_many implementation is quite experimental and might change later (for example, we’re still thinking about improving :conditions for the :through => [...] case, since :conditions currently apply only to Songs). And most probably it has some bugs :)
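For those curious what such a lookup boils down to, here is an in-memory model in plain Ruby. It is a sketch under heavy assumptions: documents are Hashes kept in one array, and the create and has_many helpers below are hypothetical, unrelated to StrokeDB’s real implementation or its indexes.

```ruby
# Hypothetical in-memory model of the lookup behind has_many.
DOCS = []

def create(meta, slots)
  doc = slots.merge(meta: meta)
  DOCS << doc
  doc
end

# Find documents of the given meta whose foreign_reference slot points at
# owner and whose slots match all of the conditions.
def has_many(owner, meta, foreign_reference:, conditions: {})
  DOCS.select do |doc|
    doc[:meta] == meta &&
      doc[foreign_reference].equal?(owner) &&
      conditions.all? { |slot, value| doc[slot] == value }
  end
end

playlist = create(:playlist, name: "My playlist")
create(:song, name: "My song",   playlist: playlist, genre: "Rock", author: "John Doe")
create(:song, name: "My song 2", playlist: playlist, genre: "Pop",  author: "Jane Roe")

songs      = has_many(playlist, :song, foreign_reference: :playlist)
rock_songs = has_many(playlist, :song, foreign_reference: :playlist,
                      conditions: { genre: "Rock" })
# Roughly the :through => [:songs, :author] case: collect one slot of each song.
authors = songs.map { |song| song[:author] }

puts songs.size                        # 2
puts rock_songs.map { |s| s[:name] }   # My song
p authors
```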
How to play with StrokeDB easily
As I mentioned in the previous post about StrokeDB, we’ve recently added a simplistic test console to StrokeDB.
Here are instructions on how to use it.
First, you’ll need StrokeDB :) You can get it easily with Git:
$ git clone git://gitorious.org/strokedb/mainline.git strokedb
Then change your working directory and start the console:
$ cd strokedb/strokedb-ruby
$ ./test/console --prompt xmp
(--prompt xmp isn’t obligatory)
Now you can play with it:
$ ./test/console --prompt xmp
StrokeDB 0.0.1 Console
Type 'h' for help
User = Meta.new # here we define User
==>{User meta module}
User.create! :login => "test", :email => "test@foobar.com"
==>#<{User} __version__: 3c3c..., login: "test", email: "test@foobar.com">
save!
==>true
quit
$ ./test/console --prompt xmp
StrokeDB 0.0.1 Console
Type 'h' for help
User = Meta.new # here we define User
==>{User meta module}
User.find(:login => "test") # let's find Users by login
==>[#<{User} __version__: 3c3c..., login: "test", email: "test@foobar.com">]
u = _.first
==>#<{User} __version__: 3c3c..., login: "test", email: "test@foobar.com">
u.name = "John Doe" # update user's name
==>"John Doe"
u.save! # save user
==>#<{User} name: "John Doe", __version__: 1e50..., __previous_version__: 3c3c..., login: "test", email: "test@foobar.com">
u.name # get name
==>"John Doe"
u.email
==>"test@foobar.com" # get email
clear! # wipe out database
==>true
User.find(:login => "test")
==>[] # nobody found!
quit
You can also type ‘h’ to get some help.
Hope you’ll enjoy.
StrokeDB short intro
I’ve decided to write a short introduction to StrokeDB usage.
Disclaimer: StrokeDB is still pretty young (less than one month of development), so I can’t promise that the API shown will remain the same forever. In fact, some portions of it will definitely change.
First, let’s load StrokeDB and initialize it:
require "strokedb"
StrokeDB::Config.build :default => true, :base_path => 'some_test'
I won’t go into the guts of the config builder (that’s a bit complicated for those who are new to StrokeDB, but I’ll probably post some material about it later).
So effectively, we now have a database initialized, and its base path for file storage is some_test/
Now, let’s get to the fun part. I’ll define several metas.
User:
User = StrokeDB::Meta.new do
  def to_s
    self[:name]
  end
end
Unlike ActiveRecord, StrokeDB uses a kind of mixin model. Each document can have any number of metadocuments it refers to. As I described previously, metadocuments are documents that describe a document’s essence, as well as Ruby modules that extend the Document’s behavior.
In the above code, I am defining the User meta, which has only one method, #to_s, which renders the ‘name’ slot.
Now, a slightly more complex example:
Buyer = StrokeDB::Meta.new do
  on_initialization do |buyer|
    unless buyer[:balance]
      puts "Providing $100 to #{buyer}, since he is a new buyer"
      buyer.balance = 100
    end
    unless buyer[:products_bought]
      buyer.products_bought = []
    end
  end
  after_save do |buyer|
    puts "Now #{buyer} has #{buyer.products_bought.empty? ? 'nothing' : buyer.products_bought.map(&:name).to_sentence} (and his balance is $#{buyer.balance})"
  end
  def buy!(product)
    puts "#{self} is buying #{product}"
    product.checkout!
    self.products_bought << product
    self.balance -= product.price
    save!
  end
end
Buyer is another metadocument that defines Buyer-specific functionality. Besides the #buy! method, it defines two callbacks: on_initialization and after_save. Their names are pretty self-descriptive (I hope).
And here is the last metadocument:
Product = StrokeDB::Meta.new do
  after_save do |product|
    puts "#{product.quantity} items of #{product} left"
  end
  def to_s
    "'#{name}' for $#{price}"
  end
  def checkout!
    self.quantity -= 1
    save!
  end
end
Nothing really new compared to Buyer. So let’s go further.
u = User.new(:name => "Yurii")
u.metas << Buyer
u.save!
Here I create a document with the ‘User’ meta and add the ‘Buyer’ meta to it, so it is both a User and a Buyer at the same time!
apple = Product.create!(:name => "green apple", :price => 2,:quantity => 100)
pizza = Product.create!(:name => "big pizza", :price => 15,:quantity => 5)
u.buy!(apple)
u.buy!(pizza)
In the above lines I create two products, apple and pizza, and use Buyer’s #buy! method to purchase them.
Here is the output of this test code:
Providing $100 to Yurii, since he is a new buyer
Now Yurii has nothing (and his balance is $100)
100 items of 'green apple' for $2 left
5 items of 'big pizza' for $15 left
Yurii is buying 'green apple' for $2
99 items of 'green apple' for $2 left
Now Yurii has green apple (and his balance is $98)
Yurii is buying 'big pizza' for $15
4 items of 'big pizza' for $15 left
Now Yurii has green apple and big pizza (and his balance is $83)
Here we are, everything works!
Also, if you inspect the user’s document, you’ll see the following:
#<{User,Buyer} name: "Yurii", __version__: c33e..., products_bought: [#<{Product} name: "green apple", __version__: 933e..., price: 2, quantity: 99, __previous_version__: 733e...>, #<{Product} name: "big pizza", __version__: b33e..., price: 15, quantity: 4, __previous_version__: 833e...>], __previous_version__: a33e..., balance: 83>
Notice that at the beginning it is shown as {User,Buyer}: both metas are displayed. This way you can easily understand what this document actually is.
OK, I hope that’s enough for a start. You can see the complete test source code at Gitorious.
I would also like to mention that we’ve recently added a test console.
Join our mailing list or get the source code.
P.S. I need to warn you: StrokeDB is quite immature, definitely has bugs and will definitely evolve. Period.
StrokeDB persistable incremental views
This weekend StrokeDB got so-called “persistable incremental views”. What is this?
Well, let’s start with the View concept. It is basically a map-reduce filter with map and reduce functions defined in Ruby.
By default, it maps all documents and lets you reduce them (let’s say we want to find users with age > 21):
my_view = View.create!(:name => "my view").reduce_with {|doc| doc.is_a?(User) && doc.age > 21 }
Or you can specify your own map block (if you need to create a new set of documents to be reduced):
my_view = View.create!(:name => "my view").map_with do |doc|
new_doc = Document.create!(:doc => doc)
end.reduce_with {|doc| doc.doc.is_a?(User) && doc.doc.age > 21 }
To get results, simply use
my_view.emit.to_a # or my_view.emit.documents, that's the same
Okay, that’s simple. We map documents to documents and then reduce them using some criteria. I would also like to mention that Views can be argument-polymorphic. If you define your map and reduce blocks with more than one argument, you can emit results using some parameters:
my_view = View.create!(:name => "my view").reduce_with {|doc,age| doc.is_a?(User) && doc.age > age }
my_view.emit(21).to_a
I think that’s simple and nice :)
Now incremental views come in. When you call my_view.emit, the View emits the first “view cut”, which is a set of documents map/reduced for the whole database. Now you can use this view cut to get new view updates:
first_cut = my_view.emit
# ... work with database, add some new documents, update old documents
next_cut = first_cut.emit
The next_cut view cut will contain only newly created or updated documents, so you get updates incrementally.
Now, what about the persistence mentioned above? That’s really simple: View and ViewCut are documents themselves, so you can easily save them and reuse them later!
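The cut mechanism can be approximated in plain Ruby as follows. This is a sketch, not StrokeDB code: TinyDB and Cut are hypothetical classes, a simple counter stands in for whatever StrokeDB uses to order changes, and the “reduce” step is just a filter, as in the examples above.

```ruby
# Hypothetical sketch of incremental "view cuts".
class TinyDB
  attr_reader :clock

  def initialize
    @docs = {}    # id -> [slots, clock value of the last change]
    @clock = 0
  end

  def save(id, slots)
    @clock += 1
    @docs[id] = [slots, @clock]
  end

  # Documents changed after the given clock value.
  def changed_since(time)
    @docs.values.select { |_, changed_at| changed_at > time }.map(&:first)
  end
end

class Cut
  attr_reader :documents

  def initialize(db, filter, since)
    @db, @filter = db, filter
    @taken_at = db.clock
    @documents = db.changed_since(since).select(&filter)
  end

  # The next cut only looks at what changed after this one was taken.
  def emit
    Cut.new(@db, @filter, @taken_at)
  end
end

db = TinyDB.new
adults = ->(doc) { doc[:age] > 21 }

db.save("u1", { name: "Ann", age: 30 })
db.save("u2", { name: "Bob", age: 18 })

first_cut = Cut.new(db, adults, 0)   # plays the role of my_view.emit
puts first_cut.documents.size        # 1

db.save("u3", { name: "Cid", age: 40 })
db.save("u2", { name: "Bob", age: 22 })

next_cut = first_cut.emit
puts next_cut.documents.map { |d| d[:name] }.sort.join(", ")   # Bob, Cid
```

Because a cut only needs to remember when it was taken, storing it as an ordinary document is enough to resume from it later.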
P.S. Currently Views are pretty slow, but hopefully that will change.
P.P.S. Incremental views are really, really young in StrokeDB, so I can’t promise that they are bug-free. Also, the API isn’t stable by any means (yet!).
nyc << yrashk
By the way, I will be in New York City (and maybe some other cities) on Feb 17-26. I’d be happy to talk about fun things (document databases, Ruby on Rails, etc.), drink some tea, etc. with somebody who enjoys this stuff too!
StrokeDB goes public
For the past two weeks Oleg Andreev and I have spent most of our time working on something we really enjoyed a lot: the StrokeDB project.
What is it?
StrokeDB is a lightweight approach to a document-oriented database, currently implemented in Ruby. The concept is pretty simple:
- each document is uniquely identified by UUID
- each document has a set of slots, which are basically key/value pairs, where key is a string and value is a simplistic data structure (boolean, number, string, array, hash — like in JSON)
- each time you update a document, its version is updated. The version is basically a hash function of the document’s content.
- reference to previous version is automatically maintained by StrokeDB
- each document may reference 1+ “meta documents”, which are the documents that declaratively describe an essence of a particular document
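The list above can be condensed into a small plain-Ruby model. It is only a sketch: TinyDocument is a hypothetical class, and hashing the sorted slots as JSON with SHA-256 is an assumption made for illustration, not a description of how StrokeDB computes versions.

```ruby
require "securerandom"
require "digest"
require "json"

# Hypothetical, simplified document model.
class TinyDocument
  attr_reader :uuid, :slots, :version, :previous_version

  def initialize(slots = {})
    @uuid = SecureRandom.uuid          # unique identity of the document
    @slots = slots                     # key/value pairs, JSON-like values
    @previous_version = nil
    @version = compute_version
  end

  def update(changes)
    @slots = @slots.merge(changes)
    @previous_version = @version       # keep a link to the old version
    @version = compute_version
    self
  end

  private

  # The version is a hash of the content, so equal content gives equal versions.
  def compute_version
    Digest::SHA256.hexdigest(JSON.generate(@slots.sort_by { |key, _| key.to_s }))
  end
end

doc = TinyDocument.new("name" => "green apple", "price" => 2)
first = doc.version
doc.update("price" => 3)

puts doc.previous_version == first   # true
puts doc.version == first            # false
```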
One of the motivations for StrokeDB was my desire to decentralize some databases. Currently databases are pretty much centralized: with the SaaS you use, you basically host your data at some company’s data center. I believe that in some cases this is not a proper way of managing your data. Due to centralization you put your data security at risk, you need their database software to be really fast (because there are a lot of clients working with their data), etc. What I really want is to have my data right where I am working with it (i.e. on my laptop), and to be able to share it with other parties in a secure way, back it up, etc.
So, yes, I just want to return some data to the client’s computer.
That’s how I came to StrokeDB, which was greatly inspired by Git and my previous experiments in metaframe databases.
Why another document database?
Why not CouchDB/ThruDB/SimpleDB? Well, we had a number of reasons to launch our own project:
- We want it to be really lightweight and, basically, embeddable. That’s how it is implemented now: it is just a Ruby library.
- We want to work around the natural limitations of the mentioned DBs. CouchDB does not support code injection into the database core, indexes in particular (like in PostgreSQL). SimpleDB is hosted elsewhere, supports only very primitive queries and is not extensible. ThruDB supports only a keyword-based search index (no special indexes). Also, partitioning and distribution is done via SimpleDB.
- We want to build a system on top of the concept of asynchronous operation. We do not rely on locking or synchronous conflict resolution (aka optimistic locking). A well-designed asynchronous workflow leads to several useful features: unlimited data distribution, offline work, replication-based load balancing, data consistency, availability and fast access, all together.
Metadocuments?
Here is a simple example of metadocument usage. Imagine you have a document that represents some concrete apple:
some_apple:
  weight: 3oz
  color: green
  price: $3
It could have three metadocuments that “describe it”: Apple, Fruit and Product:
some_apple:
  __meta__: [Apple, Fruit, Product]
  weight: 3oz
  color: green
  price: $3
When this document is loaded, the Ruby object will be extended by three modules (Apple, Fruit and Product).
For example, say you have them defined as
Apple = Meta.new
Fruit = Meta.new do
  def green?
    color == 'green'
  end
end
Product = Meta.new do
  def sell!
    # ...
  end
end
So when you load that some_apple document (by finding it with a slot-based search, or by its UUID), you will have an object that also responds to the #green? and #sell! methods.
It will also respond positively to #is_a?(Apple), #is_a?(Fruit) and #is_a?(Product).
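The underlying Ruby mechanism is Object#extend, which can be tried without StrokeDB. In the sketch below, load_document and the METAS lookup table are hypothetical, the metas are plain modules rather than Meta documents, and the “document” is an ordinary Hash.

```ruby
# Hypothetical loader showing how Object#extend gives one loaded document
# the behaviour of its metas.
module Apple; end

module Fruit
  def green?
    self[:color] == "green"
  end
end

module Product
  def sell!
    self[:sold] = true
  end
end

METAS = { "Apple" => Apple, "Fruit" => Fruit, "Product" => Product }

def load_document(raw)
  doc = raw.dup
  # Extend this one object (not its class) with each referenced meta module.
  doc[:__meta__].each { |name| doc.extend(METAS.fetch(name)) }
  doc
end

some_apple = load_document(
  __meta__: %w[Apple Fruit Product], weight: "3oz", color: "green", price: "$3"
)

puts some_apple.green?          # true
puts some_apple.is_a?(Fruit)    # true
puts({}.is_a?(Fruit))           # false, other hashes are unaffected
```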
Some examples?
Here you go:
config = StrokeDB::Config.new(true)
config.add_storage :mem, :memory_chunk
config.add_storage :fs, :file_chunk, 'test/storages/test'
config.chain :mem, :fs
config[:mem].authoritative_source = config[:fs]
config.add_storage :index_storage, :inverted_list_file, 'test/storages/index'
config.add_index :default, :inverted_list, :index_storage
config.add_store :default, :skiplist, :mem, :cut_level => 4
User = StrokeDB::Meta.new
unless u = config.indexes[:default].find(:__meta__ => User.document, :email => "someemail@gmail.com").first
  puts "User not found, creating new user"
  u = User.new :email => "someemail@gmail.com"
  u.save!
else
  puts "We've found him!"
end
puts u
config[:mem].sync_chained_storages!
What do we still miss?
A lot:
- Transactions (though we have some building blocks ready to build them)
- Replication (but again, we have building blocks for streaming replication already)
- Efficient indexes
- Nice API (time cures this disease!)
But hey, it was only two weeks of hacking — so stuff is definitely coming.
Questions? Ideas?
Join our mailing list
Lilu updates
Lilu got another public git repo
I’ve uploaded some slides about the recent Lilu updates I talked about in Oslo a few days ago. I put them together on the train heading to Oslo, just a few hours before the actual presentation, so they are far from perfect :)
Have a good weekend!




