Experimental has_many support

Posted by yrashk

I am happy to announce that StrokeDB recently got experimental has_many support.

Now you can easily collect documents that refer to your document by defining has_many for your meta:

(All examples shown were run in test/console.)

 
Playlist = Meta.new do
  has_many :songs
end
    ==>Playlist
Song = Meta.new
    ==>Song
playlist = Playlist.create!(:name => "My playlist")
    ==>#<Playlist name: "My playlist", __version__: 330f...>
song = Song.create!(:name => "My song", :playlist => playlist)
    ==>#<Song name: "My song", __version__: 530f..., playlist: #<Playlist name: "My playlist", __version__: 330f...>>
playlist.songs
    ==>[#<Song name: "My song", __version__: 530f..., playlist: #<Playlist name: "My playlist", __version__: 330f...>>]
 

So, here we define has_many :songs, which inflates :songs to the Song meta and uses Song’s :playlist slot as the reference to Playlist. As in ActiveRecord, this definition follows convention over configuration. And again, as in ActiveRecord, you can tweak it:

 
Playlist = Meta.new do
  has_many :all_songs, :through => :songs, :foreign_reference => :belongs_to_playlist
end
    ==>Playlist
Song = Meta.new
    ==>Song
playlist = Playlist.create!(:name => "My playlist")
    ==>#<Playlist name: "My playlist", __version__: 3245...>
song = Song.create!(:name => "My song", :belongs_to_playlist => playlist)
    ==>#<Song name: "My song", __version__: 5245..., belongs_to_playlist: #<Playlist name: "My playlist", __version__: 3245...>>
playlist.all_songs
    ==>[#<Song name: "My song", __version__: 5245..., belongs_to_playlist: #<Playlist name: "My playlist", __version__: 3245...>>]
 

Here we use the :through option, which tells StrokeDB to use the Song meta to find documents, while :foreign_reference specifies the name of Song’s slot that references Playlist. You can also add conditions to has_many:


Playlist = Meta.new do
  has_many :rock_songs, :through => :songs, :foreign_reference => :belongs_to_playlist, :conditions => {:genre => "Rock"}
end
    ==>Playlist
Song = Meta.new
    ==>Song
playlist = Playlist.create!(:name => "My playlist")
    ==>#<Playlist name: "My playlist", __version__: 3cd6...>
rock_song = Song.create!(:name => "My song", :belongs_to_playlist => playlist, :genre => "Rock")
    ==>#<Song name: "My song", __version__: 5cd6..., belongs_to_playlist: #<Playlist name: "My playlist", __version__: 3cd6...>, genre: "Rock">
pop_song = Song.create!(:name => "My song 2", :belongs_to_playlist => playlist, :genre => "Pop")
    ==>#<Song name: "My song 2", __version__: 6cd6..., belongs_to_playlist: #<Playlist name: "My playlist", __version__: 3cd6...>, genre: "Pop">
playlist.rock_songs
    ==>[#<Song name: "My song", __version__: 5cd6..., belongs_to_playlist: #<Playlist name: "My playlist", __version__: 3cd6...>, genre: "Rock">]
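
To make the conventions concrete, here is a minimal plain-Ruby sketch of how such a lookup could be resolved. It is not StrokeDB's implementation: documents are ordinary hashes, and DOCS, resolve_has_many and fetch_has_many are hypothetical names invented for this illustration.

```ruby
# Toy in-memory model, NOT StrokeDB code. Every document is a plain hash
# and its :meta slot names the kind of document.
DOCS = []

# Hypothetical helper: derive the defaults described above.
#   :songs on Playlist -> look at "Song" documents, reference slot :playlist
def resolve_has_many(owner_meta, name, options = {})
  through = options.fetch(:through, name)
  {
    :target     => through.to_s.sub(/s\z/, "").capitalize, # naive singularize
    :reference  => options.fetch(:foreign_reference, owner_meta.downcase.to_sym),
    :conditions => options.fetch(:conditions, {})
  }
end

# Hypothetical helper: select the documents that match a resolved spec.
def fetch_has_many(owner, spec)
  DOCS.select do |doc|
    doc[:meta] == spec[:target] &&
      doc[spec[:reference]].equal?(owner) &&
      spec[:conditions].all? { |slot, value| doc[slot] == value }
  end
end

playlist = { :meta => "Playlist", :name => "My playlist" }
rock = { :meta => "Song", :name => "My song",
         :belongs_to_playlist => playlist, :genre => "Rock" }
pop  = { :meta => "Song", :name => "My song 2",
         :belongs_to_playlist => playlist, :genre => "Pop" }
DOCS << playlist << rock << pop

spec = resolve_has_many("Playlist", :rock_songs,
                        { :through => :songs,
                          :foreign_reference => :belongs_to_playlist,
                          :conditions => { :genre => "Rock" } })
rock_songs = fetch_has_many(playlist, spec)
puts rock_songs.map { |song| song[:name] }.inspect
```

With no options at all, resolve_has_many("Playlist", :songs) falls back to the Song meta and the :playlist slot, which is the convention used in the very first example.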

Isn’t it nice? But let’s go further. What if you want to know all the authors of the music in your playlist? That’s quite simple!

 
Playlist = Meta.new do
  has_many :authors, :through => [:songs, :author]
end
    ==>Playlist
Song = Meta.new
    ==>Song
playlist = Playlist.create!(:name => "My playlist")
    ==>#<Playlist name: "My playlist", __version__: 3903...>
song = Song.create!(:name => "My song", :playlist => playlist, :author => "John Doe")
    ==>#<Song name: "My song", __version__: 5903..., author: "John Doe", playlist: #<Playlist name: "My playlist", __version__: 3903...>>
playlist.authors
    ==>["John Doe"]

or

 
Playlist = Meta.new do
  has_many :authors, :through => [:songs, :author]
end
    ==>Playlist
Song = Meta.new
    ==>Song
Author = Meta.new
    ==>Author
playlist = Playlist.create!(:name => "My playlist")
    ==>#<Playlist name: "My playlist", __version__: 3b19...>
author = Author.create!(:name => "John Doe")
    ==>#<Author name: "John Doe", __version__: 5b19...>
song = Song.create!(:name => "My song", :playlist => playlist, :author => author)
    ==>#<Song name: "My song", __version__: 7b19..., author: #<Author name: "John Doe", __version__: 5b19...>, playlist: #<Playlist name: "My playlist", __version__: 3b19...>>
playlist.authors
    ==>[#<Author name: "John Doe", __version__: 5b19...>]

In both of these examples, has_many fetches all Songs and collects their :author slots.
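
The two-step lookup can be pictured with a small plain-Ruby sketch. This is only an illustration with hashes standing in for documents; collect_through is a hypothetical helper, not part of StrokeDB.

```ruby
# Toy sketch, NOT StrokeDB code: documents are plain hashes.
playlist = { :name => "My playlist" }
other    = { :name => "Other playlist" }
songs = [
  { :name => "My song",    :playlist => playlist, :author => "John Doe" },
  { :name => "Other song", :playlist => other,    :author => "Jane Roe" }
]

# Hypothetical helper mirroring :through => [:songs, :author]:
# step 1 keeps the songs that reference the owner,
# step 2 reads the requested slot from each of them.
def collect_through(owner, candidates, reference_slot, target_slot)
  candidates
    .select { |doc| doc[reference_slot].equal?(owner) }
    .map    { |doc| doc[target_slot] }
end

authors = collect_through(playlist, songs, :playlist, :author)
puts authors.inspect
```

Because the second step only reads a slot, it does not matter whether :author holds a string or a reference to another document.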

So here it is. The current has_many implementation is quite experimental and might change later (for example, we’re still thinking about improving :conditions for the :through => [...] case, since :conditions currently apply only to Songs). And most probably it has some bugs :)

How to play with StrokeDB easily

Posted by yrashk

As I mentioned in the previous post about StrokeDB, we recently added a simple test console to StrokeDB.

Here is how to use it.

First, you’ll need StrokeDB :) You can get it easily with Git:


$ git clone git://gitorious.org/strokedb/mainline.git strokedb

Then change to the project directory and start the console:


$ cd strokedb/strokedb-ruby
$ ./test/console --prompt xmp

(--prompt xmp is optional)

Now you can play with it:


$ ./test/console --prompt xmp
StrokeDB 0.0.1 Console
Type 'h' for help
User = Meta.new  # here we define User
    ==>{User meta module}
User.create! :login => "test", :email => "test@foobar.com" 
    ==>#<{User} __version__: 3c3c..., login: "test", email: "test@foobar.com">
save!
    ==>true
quit
$ ./test/console --prompt xmp
StrokeDB 0.0.1 Console
Type 'h' for help
User = Meta.new  # here we define User
    ==>{User meta module}
User.find(:login => "test") # let's find Users by login
    ==>[#<{User} __version__: 3c3c..., login: "test", email: "test@foobar.com">]
u = _.first
    ==>#<{User} __version__: 3c3c..., login: "test", email: "test@foobar.com">
u.name = "John Doe" # update user's name
    ==>"John Doe" 
u.save! # save user
    ==>#<{User} name: "John Doe", __version__: 1e50..., __previous_version__: 3c3c..., login: "test", email: "test@foobar.com">
u.name # get name
    ==>"John Doe" 
u.email # get email
    ==>"test@foobar.com"
clear! # wipe out database
    ==>true
User.find(:login => "test")
    ==>[] # nobody found!
quit

You can also type ‘h’ to get some help.

Hope you enjoy it.

StrokeDB short intro

Posted by yrashk

I’ve decided to write a short introduction to StrokeDB usage.

Disclaimer: StrokeDB is still pretty young (less than one month of development), so I can’t promise that the API shown will stay the same forever. In fact, some portions of it will definitely change.

First, let’s load StrokeDB and initialize it:


require "strokedb" 
StrokeDB::Config.build :default => true, :base_path => 'some_test'

I won’t go into the guts of the config builder (that’s a bit complicated for those who are new to StrokeDB, but I’ll probably post some material about it later).

So effectively, we now have an initialized database, and its base path for file storage is some_test/.

Now let’s have some fun. I’ll define several metas.

User:


User = StrokeDB::Meta.new do
  def to_s
    self[:name]
  end
end

Unlike ActiveRecord, StrokeDB uses a kind of mixin model. Each document can refer to any number of metadocuments. As I described previously, metadocuments are documents that describe a document’s essence, and they are also Ruby modules that extend the Document’s behavior.

In the above code, I define the User meta, which has only one method, #to_s, which renders the ‘name’ slot.

Now, a little bit more complex example:


Buyer = StrokeDB::Meta.new do

  on_initialization do |buyer|
    unless buyer[:balance]
      puts "Providing $100 to #{buyer}, since he is a new buyer" 
      buyer.balance = 100 
    end
    unless buyer[:products_bought]
      buyer.products_bought = []
    end
  end

  after_save do |buyer|
    puts "Now #{buyer} has #{buyer.products_bought.empty? ? 'nothing' : buyer.products_bought.map(&:name).to_sentence} (and his balance is $#{buyer.balance})" 
  end

  def buy!(product)
    puts "#{self} is buying #{product}" 
    product.checkout!
    self.products_bought << product
    self.balance -= product.price
    save!
  end
end

Buyer is another metadocument that defines Buyer-specific functionality. Besides the #buy! method, it defines two callbacks: on_initialization and after_save. Their names are pretty self-descriptive (I hope).
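
If the callback idea is new to you, here is a rough plain-Ruby sketch of how blocks registered this way could be stored and run. It is only an illustration: ToyMeta and its methods are hypothetical and say nothing about how StrokeDB's Meta actually works.

```ruby
# Toy sketch of callback registration, NOT StrokeDB's Meta implementation.
class ToyMeta
  def initialize(&definition)
    @callbacks = Hash.new { |hash, event| hash[event] = [] }
    instance_eval(&definition) if definition
  end

  # Both registration methods just remember the block for later.
  def on_initialization(&block)
    @callbacks[:on_initialization] << block
  end

  def after_save(&block)
    @callbacks[:after_save] << block
  end

  # Builds a "document" (a plain hash here) and runs on_initialization.
  def new(slots = {})
    run(:on_initialization, slots.dup)
  end

  # Pretends to save the document, then runs after_save.
  def save!(doc)
    run(:after_save, doc)
  end

  private

  def run(event, doc)
    @callbacks[event].each { |callback| callback.call(doc) }
    doc
  end
end

log = []
buyer_meta = ToyMeta.new do
  on_initialization { |buyer| buyer[:balance] ||= 100 }
  after_save        { |buyer| log << "balance is $#{buyer[:balance]}" }
end

buyer = buyer_meta.new({ :name => "Yurii" })
buyer_meta.save!(buyer)
puts buyer[:balance]
puts log.inspect
```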

And here is the last metadocument:


Product = StrokeDB::Meta.new do
  after_save do |product|
    puts "#{product.quantity} items of #{product} left" 
  end

  def to_s
    "'#{name}' for $#{price}" 
  end

  def checkout!
    self.quantity -= 1
    save!
  end
end

Nothing really new compared to Buyer. So let’s go further.


u = User.new(:name => "Yurii")
u.metas << Buyer
u.save!

Here I create a document with meta ‘User’ and add ‘Buyer’ meta to it, so it is both User and Buyer at the same time!


apple = Product.create!(:name => "green apple", :price => 2, :quantity => 100)
pizza = Product.create!(:name => "big pizza", :price => 15, :quantity => 5)
u.buy!(apple)
u.buy!(pizza)

In the above lines I create two products, apple and pizza, and use Buyer’s #buy! method to purchase them.

Here is the output of this test code:


Providing $100 to Yurii, since he is a new buyer
Now Yurii has nothing (and his balance is $100)
100 items of 'green apple' for $2 left
5 items of 'big pizza' for $15 left
Yurii is buying 'green apple' for $2
99 items of 'green apple' for $2 left
Now Yurii has green apple (and his balance is $98)
Yurii is buying 'big pizza' for $15
4 items of 'big pizza' for $15 left
Now Yurii has green apple and big pizza (and his balance is $83)

Here we are, everything works!

Also, if you inspect the user’s document, you’ll see the following:


#<{User,Buyer} name: "Yurii", __version__: c33e..., products_bought: [#<{Product} name: "green apple", __version__: 933e..., price: 2, quantity: 99, __previous_version__: 733e...>, #<{Product} name: "big pizza", __version__: b33e..., price: 15, quantity: 4, __previous_version__: 833e...>], __previous_version__: a33e..., balance: 83>


Notice that it begins with {User,Buyer}: both metas are displayed. This way you can easily tell what this document actually is.

OK, I hope that’s enough for a start. You can see the complete test source code at Gitorious.

I would also like to mention that we recently added a test console.

Join our mailing list or get source code

P.S. I need to warn you: StrokeDB is quite immature, definitely has bugs and definitely will evolve. Period.

StrokeDB persistable incremental views

Posted by yrashk

This weekend StrokeDB got so-called “persistable incremental views”. What are they?

Well, let’s start with the View concept. It is basically a map-reduce filter with map and reduce functions defined in Ruby.

By default, it maps all documents and lets you reduce them (let’s say we want to find users with age > 21):

 
   my_view = View.create!(:name => "my view").reduce_with {|doc| doc.is_a?(User) && doc.age > 21 }
 

Or you can specify your own map block (if you need to create a new set of documents to be reduced):

 
   my_view = View.create!(:name => "my view").map_with do |doc|
                     new_doc = Document.create!(:doc => doc)
   end.reduce_with {|doc| doc.doc.is_a?(User) && doc.doc.age > 21 }
 

To get results, simply use

 
  my_view.emit.to_a # or my_view.emit.documents, that's the same
 

Okay, that’s simple. We map documents to documents and then reduce them using some criteria. I would also like to mention that Views can be argument-polymorphic. If you define your map and reduce blocks with more than one argument, you can emit results using some parameters:

 
   my_view = View.create!(:name => "my view").reduce_with {|doc,age| doc.is_a?(User) && doc.age > age }
   my_view.emit(21).to_a
 

I think that’s simple and nice :)

Now incremental views come in. When you call my_view.emit, the View emits the first “view cut”, which is the set of documents map/reduced over the whole database. You can then use this view cut to get new view updates:
   
     first_cut = my_view.emit 
     # ... work with database, add some new documents, update old documents
     next_cut = first_cut.emit
   

The next_cut view cut will contain only newly created or updated documents, so you get updates incrementally.
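
To show the idea without StrokeDB itself, here is a small plain-Ruby sketch. It assumes the store is an append-only log, so a cut only has to remember how far it has read. ToyStore, ToyView and ToyCut are hypothetical names, and the real View and ViewCut may work quite differently.

```ruby
# Toy sketch, NOT StrokeDB's View/ViewCut implementation.
# The "database" is an append-only log of [sequence, document] pairs;
# an update is modelled as saving a new version of the document.
class ToyStore
  attr_reader :log

  def initialize
    @log = []
  end

  def save(doc)
    @log << [@log.size + 1, doc]
    doc
  end
end

class ToyView
  def initialize(store, &reduce)
    @store  = store
    @reduce = reduce
  end

  # Emits a cut covering everything saved after position `since`.
  def emit(since = 0)
    fresh = @store.log.select { |sequence, _doc| sequence > since }
    docs  = fresh.map { |_sequence, doc| doc }.select(&@reduce)
    ToyCut.new(self, @store.log.size, docs)
  end
end

class ToyCut
  attr_reader :documents

  def initialize(view, position, documents)
    @view      = view
    @position  = position
    @documents = documents
  end

  # The next cut contains only what was saved since this one.
  def emit
    @view.emit(@position)
  end
end

store = ToyStore.new
store.save({ :name => "Ann", :age => 30 })
store.save({ :name => "Bob", :age => 18 })

adults    = ToyView.new(store) { |doc| doc[:age] > 21 }
first_cut = adults.emit
store.save({ :name => "Cid", :age => 40 })
next_cut  = first_cut.emit

puts first_cut.documents.map { |doc| doc[:name] }.inspect
puts next_cut.documents.map { |doc| doc[:name] }.inspect
```

The first line printed lists only Ann, and the second lists only Cid, because the second cut starts where the first one stopped.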

Now, what about the persistence mentioned above? That’s really simple: View and ViewCut are documents themselves, so you can easily save them and reuse them later!

P.S. Currently Views are pretty slow, but hopefully that will change.

P.P.S. Incremental views are really, really young in StrokeDB, so I can’t promise that they are bug-free. Also, the API isn’t stable by any means (yet!).

Get StrokeDB

nyc << yrashk

Posted by yrashk

By the way, I will be in New York City (and maybe some other cities) on Feb 17-26. I’d be happy to talk about fun things (document databases, Ruby on Rails, etc.), drink some tea, etc. with anybody who enjoys this stuff too!

StrokeDB goes public

Posted by yrashk

For the past two weeks Oleg Andreev and I have spent most of our time working on something we really enjoy: the StrokeDB project.

What is it?

StrokeDB is a lightweight approach to a document-oriented database, currently implemented in Ruby. The concept is pretty simple:

  • each document is uniquely identified by UUID
  • each document has a set of slots, which are basically key/value pairs, where the key is a string and the value is a simple data structure (boolean, number, string, array, hash, as in JSON)
  • each time you update a document, its version is updated. A version is basically a hash of the document’s content.
  • a reference to the previous version is automatically maintained by StrokeDB
  • each document may reference 1+ “meta documents”, which are the documents that declaratively describe an essence of a particular document
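
The first four points can be sketched in a few lines of plain Ruby. This is only an illustration: ToyDocument is a hypothetical class, and hashing the sorted slots with SHA-256 is an assumption made for the example, not a description of how StrokeDB computes versions.

```ruby
require "digest"
require "json"
require "securerandom"

# Toy sketch, NOT StrokeDB's Document class.
class ToyDocument
  attr_reader :uuid, :slots, :version, :previous_version

  def initialize(slots)
    @uuid             = SecureRandom.uuid # unique identity of the document
    @slots            = slots             # string keys, JSON-like values
    @previous_version = nil
    @version          = compute_version
  end

  # Changing slots produces a new version and remembers the old one.
  def update(changes)
    @previous_version = @version
    @slots            = @slots.merge(changes)
    @version          = compute_version
    self
  end

  private

  # Assumption: the version is a SHA-256 digest of the sorted slots.
  def compute_version
    Digest::SHA256.hexdigest(JSON.generate(@slots.sort_by { |key, _value| key }))
  end
end

doc   = ToyDocument.new({ "name" => "green apple", "price" => 2 })
first = doc.version
doc.update({ "price" => 3 })

puts doc.previous_version == first
puts doc.version == first
```

After the update, previous_version equals the first version and version differs from it, so the document carries its own history link.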

One of the motivations for StrokeDB was my desire to decentralize some databases. Currently databases are pretty much centralized: with the SaaS you use, you basically host your data at some company’s data center. I believe that in some cases this is not the proper way to manage your data. Due to centralization you put your data security at risk, you need their database software to be really fast (because there are a lot of clients working with their data), etc. But what I really want is to have my data right where I am working with it (i.e. on my laptop), and to be able to share it with other parties in a secure way, back it up, etc.

So, yes, I just want to return some data to the client’s computer.

That’s how I came to StrokeDB, which was greatly inspired by Git and my previous experiments in metaframe databases.

Why another document database?

Why not CouchDB/ThruDB/SimpleDB? Well, we had a number of reasons to launch our own project:

  • We want it to be really lightweight and, basically, embeddable. That’s how it is implemented now: it is just a Ruby library.
  • We want to work around the natural limitations of the mentioned DBs. CouchDB does not support code injection into the database core, indexes in particular (like in PostgreSQL). SimpleDB is hosted elsewhere, supports only very primitive queries and is not extensible. ThruDB supports only a keyword-based search index (no special indexes). Also, partitioning and distribution is done via SimpleDB.
  • We want to build a system on top of the concept of asynchronous operation. We do not rely on locking or synchronous conflict resolution (aka optimistic locking). A well-designed asynchronous workflow leads to several useful features: unlimited data distribution, offline work, replication-based load balancing, data consistency, availability and fast access, all together.

Metadocuments?

Here is a simple example of metadocument usage. Imagine you have a document that represents some concrete apple:


some_apple: 
        weight: 3oz 
        color: green 
        price: $3 

It could have three metadocuments that “describe it”: Apple, Fruit and Product:


some_apple: 
        __meta__: [Apple, Fruit, Product] 
        weight: 3oz 
        color: green 
        price: $3 

When this document is loaded, the Ruby object is extended by three modules (Apple, Fruit and Product).

For example, suppose you have them defined as


Apple = Meta.new
Fruit = Meta.new do 
        def green? 
                color == 'green' 
        end 
end 
Product = Meta.new do 
        def sell! 
                # ... 
        end 
end 

So when you load that some_apple document (by finding it with slot-based search, or by its UUID), you will have an object that also responds to #green? and #sell! methods.

It will also respond positively to #is_a?(Apple), #is_a?(Fruit) and #is_a?(Product).
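
The same effect can be reproduced with nothing but Ruby's Object#extend, which is a handy way to see what is going on. The sketch below uses plain modules and a hash instead of Meta.new and a real document; load_document is a hypothetical helper written for this illustration.

```ruby
# Toy sketch with plain Ruby modules; StrokeDB's Meta.new is NOT used here.
module Apple; end

module Fruit
  def green?
    self[:color] == "green"
  end
end

module Product
  def sell!
    self[:sold] = true
  end
end

# Hypothetical loader: extend one object with every module it lists.
def load_document(slots)
  doc = slots.dup
  slots.fetch(:__meta__, []).each { |meta| doc.extend(meta) }
  doc
end

some_apple = load_document({ :__meta__ => [Apple, Fruit, Product],
                             :weight => "3oz", :color => "green", :price => "$3" })

puts some_apple.green?
puts some_apple.is_a?(Fruit)
puts some_apple.respond_to?(:sell!)
```

Only this one object gains the methods. Another hash loaded without a :__meta__ slot stays an ordinary hash.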

Some examples?

Here you go:


config = StrokeDB::Config.new(true)

config.add_storage :mem, :memory_chunk
config.add_storage :fs, :file_chunk, 'test/storages/test'

config.chain :mem, :fs
config[:mem].authoritative_source = config[:fs]

config.add_storage :index_storage, :inverted_list_file, 'test/storages/index'
config.add_index :default, :inverted_list, :index_storage

config.add_store :default, :skiplist, :mem, :cut_level => 4

User = StrokeDB::Meta.new
unless u = config.indexes[:default].find(:__meta__ => User.document, :email => "someemail@gmail.com").first
  puts "User not found, creating new user" 
  u = User.new :email => "someemail@gmail.com" 
  u.save!
else
  puts "We've found him!" 
end
puts u

config[:mem].sync_chained_storages!

What do we still miss?

A lot:

  • Transactions (though we have some building blocks ready to build them)
  • Replication (but again, we have building blocks for streaming replication already)
  • Efficient indexes
  • Nice API (time cures this disease!)

But hey, it was only two weeks of hacking — so stuff is definitely coming.

Questions? Ideas?

Join our mailing list