Wooohooo! Our new BTree Field Index Query Processor is now available from our SVN for testing! This big step forward will be released as "Development Build 5.7" in the coming week.
We have been using the Poleposition benchmark to check our progress. The special tests in our own "Indianapolis" circuit for polepos (available from our SVN, project: "db4opolepos") have been great to visualize the effect of our daily work and to motivate us to beat all previous db4o versions by miles.
Feel free to test for youself. In case you want to do that, make sure to get the most recent polepos code from the SourceForge SVN location. Here is the polepos SVN connect string:
https://svn.sourceforge.net/svnroot/polepos
For your convenience I have put our Indianapolis results online here:
http://www.db4o.com/downloads/PpIndianapolis.pdf
Some interpretations:
- queryRange - We have always been good at range queries. Both the old and the new processor are specifically optimized for them.
- query5Links - The old processor used a limit for deep (multi-descend) constraints. Because the new algorithm to go from leaves to roots is so much more efficient we could happily drop this limit. The changes are dramatic.
- queryPreferShortPath - This is a test to see how good the query processor is to choose the right index. The new processor simply starts working from the best leaf. Surprisingly this seems to work just as good or better than the old processor, where there was quite a bit of thought put into continously looking for the best strategy while executing the query.
- queryOr - This is a really big one. With an ORed query against 100,000 objects we improved by 200x .
- queryOrRange - The improvement for ORed ranges is just as dramatic. It is at least 100x for 100,000 objects.
- queryNotGreater - This has always been good. We just wanted to make sure not to loose any speed.
- queryNotRange - As you can see this case still needs some improvement...
- queryOrTwoLevels - as well as this one.
- queryByChildIdentity - This is a case that the new processor apparently does not cover yet. I recall that we had this case running fast already, but we may have lost it when we made all tests pass in our last sessions, avoiding false positives at all cost.
- addSingleObjectAndCommit - This is actually one of the most important cases we wrote new BTrees for: We intend commit time to be improved, because only some parts of the index need to be rewritten on commit. The output of this graphic looks incorrect: We get better performance results with more objects. That looks strange. The test could accidentally measure a side effect for garbage collection for the first result. We will retest this one with more input parameters.
The huge progress on ANDs and ORs has been achieved by writing a completely new BTree implementation. These BTrees have already been deployed in db4o versions since 5.4 where they are used for class indexes. A sidenote: You can try to tune BTrees for your specific usecase by changing the following two configuration switches.
Db4o.configure().bTreeCacheHeight(height);
Db4o.configure().bTreeNodeSize(size);
In the last weeks and months we wrote a completely new query processor that operates on these new BTrees. You can find the sourcecode in com.db4o.inside.btree and com.db4o.inside.btree.algebra.
In the new code we work wth "pointers into BTrees" (a node and an index of an element in a node) as long as possible, avoiding state, reading and memory consumption as much and as long as possible. Two pointers make up a "BTreeRange". Our BTree algebra has code to create unions (OR) and intersections (AND) of ranges.
While we wrote the new code, we developed a great new idea: We see potential to get the in-memory representations of query results very close to zero by working even lazier than we are now. This feature certainly is a candidate to go into our near-term roadmap, feel free to vote for it in Jira.
Enjoy testing new BTree Field Indexes with your application! They should dramatically improve the performance of queries with #and()s and #or()s on single field constraints. Please let us know about your results!
This is not a production release yet. Some performance cases and some memory consumption cases are still work-in-progress, as our Indianapolis Poleposition graphics show.