How do you write a fast program? Easy: it should do only what is absolutely necessary, in the smallest possible number of steps. Isn't that the ultimate laziness?

Lazy == Fast !

With this idea in mind we looked for ways to do less work in our query processor, and to let your application tell db4o to do less work. The solution we came up with is simple and straightforward: instead of fully processing queries upon calls to Query#execute(), we only choose the best index and create an iterator against it. All the rest of the query processing is done object by object, while your application iterates through the ObjectSet and calls #next()/#MoveNext().
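To make the idea concrete, here is a minimal sketch in plain Java. It is not db4o's query processor: LazyFilterIterator, LazyFilterDemo and the "even numbers" constraint are made up for illustration, with a simple number range standing in for the index iterator. The point is only that the constraint is evaluated when the caller asks for the next object, and not before.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;
import java.util.stream.IntStream;

/** Illustrative only: evaluates a constraint one candidate at a time, on demand. */
final class LazyFilterIterator<T> implements Iterator<T> {
    private final Iterator<T> candidates;  // stands in for the iterator over the chosen index
    private final Predicate<T> constraint; // stands in for the rest of the query
    private T pending;                     // next match, if one has been found already
    private boolean hasPending;
    private int evaluated;                 // number of candidates checked so far

    LazyFilterIterator(Iterator<T> candidates, Predicate<T> constraint) {
        this.candidates = candidates;
        this.constraint = constraint;
    }

    @Override
    public boolean hasNext() {
        // Advance only until the next match is found, never further.
        while (!hasPending && candidates.hasNext()) {
            T candidate = candidates.next();
            evaluated++;
            if (constraint.test(candidate)) {
                pending = candidate;
                hasPending = true;
            }
        }
        return hasPending;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = pending;
        pending = null;
        hasPending = false;
        return result;
    }

    int evaluatedCount() {
        return evaluated;
    }
}

final class LazyFilterDemo {
    /** Fetches at most `limit` even numbers and returns how many candidates were checked. */
    static int candidatesCheckedFor(int candidateCount, int limit) {
        LazyFilterIterator<Integer> results = new LazyFilterIterator<>(
                IntStream.range(0, candidateCount).boxed().iterator(),
                n -> n % 2 == 0);
        List<Integer> page = new ArrayList<>();
        while (page.size() < limit && results.hasNext()) {
            page.add(results.next());
        }
        return results.evaluatedCount();
    }

    public static void main(String[] args) {
        // 50 results out of a million candidates: only candidates 0..98 are ever checked.
        System.out.println(candidatesCheckedFor(1_000_000, 50)); // prints 99
    }
}
```

In this toy setup, asking for 50 results touches 99 candidates and leaves the remaining ones unevaluated.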

This new feature is complete and in SVN, and we are very happy with the excellent benchmark results we are measuring. Lazy queries turn out to have even more advantages than we had anticipated:

(1) Lazy queries are extremely fast for partial resultsets.

When you run a query you are sometimes not interested in iterating through the entire result ObjectSet. Sometimes you just want one good result, or perhaps only 50 good results to display in a GUI. In this case you do not want the entire query to run to completion before you get the first result back. You are perfectly OK with getting the first 50 results and stopping the query there.

 

(2) No more long-running queries

Have you ever experienced the phenomenon that you fire a query and get a blocked server for seconds or even minutes? With lazy queries it just can't happen. Your application gets back control after every single object or after a configured number of objects. Your application can decide how much CPU power it wants to use for a specific query. If a query does not perform well enough because of a user entry that returns millions of results, you can time query processing out after any number of objects, whenever you want.
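Because control returns to the application between objects, a budget can be enforced by the calling code itself. The following is a sketch only, written against a plain java.util.Iterator and not the db4o API; QueryBudget, drain and both limits are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

final class QueryBudget {
    /**
     * Reads results until the iterator is exhausted, maxObjects have been read,
     * or maxNanos have elapsed, whichever comes first.
     */
    static <T> List<T> drain(Iterator<T> results, int maxObjects, long maxNanos) {
        List<T> out = new ArrayList<>();
        long start = System.nanoTime();
        while (out.size() < maxObjects
                && System.nanoTime() - start < maxNanos
                && results.hasNext()) {
            out.add(results.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // An endless iterator stands in for a query with millions of results.
        Iterator<Integer> endless = Stream.iterate(0, i -> i + 1).iterator();
        System.out.println(drain(endless, 5, Long.MAX_VALUE)); // prints [0, 1, 2, 3, 4]
    }
}
```

Note that in this sketch the time limit is only checked between objects, so a single step that has to skip many non-matching candidates can still overrun it.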

 

(3) Zero memory consumption

Our lazy queries do not need an intermediate representation as a set of IDs. We work exclusively with iterators. With this approach a lazy query ObjectSet does not have to cache a single object or ID. The memory consumption for a query is practically zero, no matter how large the resultset is going to be.

 

(4) Parallel query execution and query result processing

Lazy queries start delivering query results before the entire query has completed. That's cool! You can start using these results already, while another thread completes the rest of the query.
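One way an application might arrange this is a small producer/consumer pipeline: one thread keeps pulling from the result iterator while another already processes what has arrived. The sketch below uses only the Java standard library; PipelinedResults, the END marker and the queue size of 16 are assumptions made for the example and are not part of db4o.

```java
import java.util.Iterator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.stream.IntStream;

final class PipelinedResults {
    private static final int END = Integer.MIN_VALUE; // hypothetical end-of-results marker

    /** Sums results on the calling thread while a second thread keeps pulling from the iterator. */
    static long sumWhileFetching(Iterator<Integer> lazyResults) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(16);
        Thread fetcher = new Thread(() -> {
            try {
                while (lazyResults.hasNext()) {
                    queue.put(lazyResults.next());
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fetcher.start();
        long sum = 0;
        try {
            // Processing starts as soon as the first result is in the queue.
            for (int value = queue.take(); value != END; value = queue.take()) {
                sum += value;
            }
            fetcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming results", e);
        }
        return sum;
    }

    public static void main(String[] args) {
        Iterator<Integer> results = IntStream.rangeClosed(1, 1000).boxed().iterator();
        System.out.println(sumWhileFetching(results)); // prints 500500
    }
}
```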

 

So much for theory. Here are some concrete results from running our adapted Poleposition benchmark suite with db4o 5.7 and the db4o 6.0 preview in IMMEDIATE, SNAPSHOT and LAZY query mode:

http://www.db4o.com/downloads/Pp60.pdf

All results look very good.

Our marketing department loves to publish concrete "factor x" improvements. Please take a look at Indianapolis#getOneFromBigRangeQuery(); this one is a new high score. For this specific use case db4o performance has improved from 140166 milliseconds to 6 milliseconds, a factor of 140166 / 6 = 23361. Let's convert the ratio: something that used to take about six and a half hours (23361 seconds) is now done in a single second.

(I can already see the headline when marketing releases this to the press: db4o v6 is now 23361 x faster.)

 

To use this new feature you only need to know one single configuration method:

Db4o.configure().queries().evaluationMode(QueryEvaluationMode mode);

(Our .NET team will surprise you with what they make of the above Java syntax. More news to come soon.)

 

You can pass three different constants to this method:

QueryEvaluationMode.IMMEDIATE

QueryEvaluationMode.SNAPSHOT

QueryEvaluationMode.LAZY

 

The behaviour can also be configured directly for an ObjectContainer, by calling

ObjectContainer#ext().configure().queries().evaluationMode()

Since the client configuration determines the mode to be used in a Client/Server setup, every single query can be configured individually.
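Put together, usage might look roughly like the fragment below. Treat it as a sketch under assumptions: it needs the db4o 6.0 jar on the classpath, it assumes QueryEvaluationMode lives in the com.db4o.config package, and the file name "pilots.db4o" and the Pilot class are hypothetical.

```java
import com.db4o.Db4o;
import com.db4o.ObjectContainer;
import com.db4o.ObjectSet;
import com.db4o.config.QueryEvaluationMode; // package name is an assumption
import com.db4o.query.Query;

public class LazyQueryExample {
    /** Hypothetical persistent class, used only for this illustration. */
    public static class Pilot {
        String name;
    }

    public static void main(String[] args) {
        // Global setting, applied to containers opened afterwards.
        Db4o.configure().queries().evaluationMode(QueryEvaluationMode.LAZY);

        ObjectContainer db = Db4o.openFile("pilots.db4o"); // hypothetical file name
        try {
            Query query = db.query();
            query.constrain(Pilot.class);
            ObjectSet results = query.execute();

            // Stop after 50 objects; the rest of the query is never evaluated.
            int shown = 0;
            while (shown < 50 && results.hasNext()) {
                System.out.println(results.next());
                shown++;
            }
        } finally {
            db.close();
        }
    }
}
```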

For a more detailed explanation and a differentiation between LAZY mode and SNAPSHOT mode please refer to the API documentation for QueryConfiguration#evaluationMode().

 

There are some points to be aware of when you use lazy queries:

- ObjectSet#size() is a very expensive method for lazy queries. It forces the query to be fully evaluated.

- LAZY mode is not fully compatible with concurrent Client/Server updates. For now we recommend SNAPSHOT mode for Client/Server if other transactions can possibly modify candidate objects while a lazy query is processing. Another option is to use a system of semaphores, for instance to prevent updates while lazy queries are being processed.

- If you iterate through a lazy ObjectSet and update or delete the returned objects, you may still influence the result of the query, because it has not been fully evaluated yet. If you use lazy queries and want to update or delete objects, it can make sense to put all objects that you want to touch into your own list first, just as you would to prevent concurrent modifications in collections.
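The "collect first, modify afterwards" advice from the last point can be illustrated with ordinary Java collections. This is only an analogy and does not use db4o: CollectThenModify, collect and deleteShortNames are hypothetical names, and a lazily filtered stream over a list stands in for a lazy ObjectSet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class CollectThenModify {
    /** Copies everything the iterator yields into a private list, finishing the iteration. */
    static <T> List<T> collect(Iterator<T> results) {
        List<T> copy = new ArrayList<>();
        while (results.hasNext()) {
            copy.add(results.next());
        }
        return copy;
    }

    /** Removes all names of up to three letters: iterate first, delete afterwards. */
    static List<String> deleteShortNames(List<String> store) {
        List<String> toDelete = collect(
                store.stream().filter(name -> name.length() <= 3).iterator());
        // The iteration is over, so deleting can no longer disturb it.
        store.removeAll(toDelete);
        return store;
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>(Arrays.asList("Ann", "Michael", "Bob", "Rubens"));
        System.out.println(deleteShortNames(store)); // prints [Michael, Rubens]
    }
}
```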

Because of the above issues we decided to make the behaviour configurable, so all our users can get what they really need. For now the default setting continues to be IMMEDIATE mode, so the default behaviour is 100% consistent with previous versions.

 

From looking at our benchmark results we expect that many users will experience huge performance improvements. Please let us know how lazy queries work for you!

The feature is in SVN and will be made available in a development build next Tuesday, November 14.