Tweaking the configuration options of any sufficiently complex library or application can easily become indistinguishable from black magic for the "uninitiated". When you turn the knob for a single setting, you usually can't expect linear changes in runtime behavior: in most cases there is a variety of boundary conditions and side effects to take into account. To avoid random probing through an abundance of seemingly erratic changes in behavior, you need to know which internals are affected by a given configuration setting, and to monitor the effects of changes to this setting experimentally to verify your expectations. In this post, we'd like to walk through this using a simple example involving two features that have recently been introduced to db4o: Deep Prefetching for C/S mode and runtime statistics. (Even if you don't use db4o in C/S mode, it may still be worthwhile to keep reading, since some of the observations apply to db4o configuration optimization in general, and indeed to performance tweaking in general.)

As a sample application, we're using a rather minimalist and artificial "task manager" application. A task simply consists of a name and a (potentially empty) collection of subtasks.

We recommend downloading the full sample application project attached to this post and referring to the sources as you read along.

The application simply consists of a client requesting full task object graphs from a server hosting a pre-created database. To simplify matters even further, all root tasks in the database have the same size, i.e. they all represent complete n-ary trees of depth d, n and d being configurable for different runs.
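To make the object model concrete, here is a minimal sketch of what such a task class and a complete n-ary tree builder might look like. The class and method names are our own illustration, not taken from the attached project, and we assume that "depth d" counts the root as level 1, which matches the GRAPH_SIZE formula given later in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the sample's task class: a name plus subtasks.
class Task {
    final String name;
    final List<Task> subtasks = new ArrayList<>();

    Task(String name) {
        this.name = name;
    }

    // Builds a complete n-ary tree with `depth` levels (the root is level 1).
    static Task completeTree(String name, int numSubtasks, int depth) {
        Task task = new Task(name);
        if (depth > 1) {
            for (int i = 0; i < numSubtasks; i++) {
                task.subtasks.add(completeTree(name + "." + i, numSubtasks, depth - 1));
            }
        }
        return task;
    }

    // Counts this task and all of its descendants.
    int size() {
        int count = 1;
        for (Task sub : subtasks) {
            count += sub.size();
        }
        return count;
    }

    public static void main(String[] args) {
        Task root = completeTree("root", 5, 5);
        System.out.println(root.size()); // prints 781
    }
}
```

Every root task built this way has the same size, which is what allows exact prefetch settings to be computed up front.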

The relevant application settings are:

  • NUM_ROOT_TASKS: The number of top level tasks in the database
  • NUM_SUBTASKS: The number of subtasks each task owns
  • TASK_DEPTH: The depth of a root task object graph
  • RESULT_SIZE: The number of root task graphs retrieved in a single query

We'll compare three different combinations of settings for the configuration options relevant for deep prefetching:

  • No prefetching at all (no_prefetch)
  • Prefetching the full graph for a single root task (single_prefetch)
  • Prefetching the full result set (full_prefetch)

Due to the artificial nature of our setup, we can give accurate numbers for each of these settings - in real world scenarios, these will have to be replaced by heuristic approximations. Here are the configuration settings used:

                   prefetchDepth    prefetchObjectCount   prefetchCacheSize
  no_prefetch      0                0                     0
  single_prefetch  GRAPH_SIZE * 2   1                     GRAPH_SIZE * 2
  full_prefetch    GRAPH_SIZE * 2   RESULT_SIZE           RESULT_SIZE * GRAPH_SIZE * 2

GRAPH_SIZE is the total task count per root graph, i.e. (1 - NUM_SUBTASKS^TASK_DEPTH) / (1 - NUM_SUBTASKS). The number of objects as seen by db4o is twice this number, since there is one additional ArrayList per task to keep the subtasks.
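The closed form above is the sum of a geometric series over the tree levels. As a quick sanity check, the following sketch computes it both ways and doubles the result to get the object count as seen by db4o. The class and method names are ours, chosen for illustration only, and the closed form assumes NUM_SUBTASKS > 1.

```java
class GraphSize {
    // Closed form from the text: (1 - n^d) / (1 - n). Requires n > 1.
    static long closedForm(int numSubtasks, int taskDepth) {
        long power = 1;
        for (int i = 0; i < taskDepth; i++) {
            power *= numSubtasks;
        }
        return (1 - power) / (1 - numSubtasks);
    }

    // The same quantity as an explicit sum over levels: n^0 + n^1 + ... + n^(d-1).
    static long levelSum(int numSubtasks, int taskDepth) {
        long total = 0;
        long tasksOnLevel = 1;
        for (int level = 0; level < taskDepth; level++) {
            total += tasksOnLevel;
            tasksOnLevel *= numSubtasks;
        }
        return total;
    }

    // One Task plus one ArrayList per task, as described in the text.
    static long storedObjects(int numSubtasks, int taskDepth) {
        return 2 * closedForm(numSubtasks, taskDepth);
    }

    public static void main(String[] args) {
        System.out.println(closedForm(3, 4));    // prints 40
        System.out.println(levelSum(3, 4));      // prints 40
        System.out.println(storedObjects(3, 4)); // prints 80
    }
}
```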

For our scenario, we'll use the following application settings:

  • NUM_ROOT_TASKS = 500
  • NUM_SUBTASKS = 5
  • TASK_DEPTH = 5
  • RESULT_SIZE = 10

For these settings, GRAPH_SIZE is 781, so the total number of objects retrieved in a single query is 10 * 781 * 2 = 15620.
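For reference, the concrete values of the three setups for this scenario can be derived as in the sketch below. The class is a plain value holder of our own invention and does not call the db4o API; the field names merely mirror the option names used in this post. The prefetch depth of 10 is the value reported in the run output further down, and reading it as TASK_DEPTH * 2 (one Task level plus one ArrayList level per tree level) is our assumption.

```java
class PrefetchSettings {
    final String name;
    final int prefetchDepth;
    final int prefetchObjectCount;
    final int prefetchCacheSize;

    PrefetchSettings(String name, int depth, int objectCount, int cacheSize) {
        this.name = name;
        this.prefetchDepth = depth;
        this.prefetchObjectCount = objectCount;
        this.prefetchCacheSize = cacheSize;
    }

    // Derives the three setups compared in this post for one scenario.
    static PrefetchSettings[] forScenario(int taskDepth, int graphSize, int resultSize) {
        int objectsPerGraph = graphSize * 2; // one Task plus one ArrayList per task
        int depth = taskDepth * 2;           // assumption: matches the depth of 10 in the run output
        return new PrefetchSettings[] {
            new PrefetchSettings("no_prefetch", 0, 0, 0),
            new PrefetchSettings("single_prefetch", depth, 1, objectsPerGraph),
            new PrefetchSettings("full_prefetch", depth, resultSize, resultSize * objectsPerGraph),
        };
    }

    public static void main(String[] args) {
        for (PrefetchSettings s : forScenario(5, 781, 10)) {
            System.out.println(s.name + ": depth=" + s.prefetchDepth
                    + ", objectCount=" + s.prefetchObjectCount
                    + ", cacheSize=" + s.prefetchCacheSize);
        }
    }
}
```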

Running the application in no_prefetch mode with timed readings yields the following results:

Prefetch depth: 0
Prefetch object count: 0
Prefetch cache size: 0
AVG QUERY TIME: 1805.30 ms
AVG ACTIVATION TIME: 9844.80 ms
TOTAL TIME: 582616 ms

We certainly do hope there's room for improvement here, so let's take a peek at client networking activity through our JMX runtime statistics:

We can see that there's a lot of message exchange going on between client and server. With no prefetching enabled, the client will request the data for each single object separately during activation.

Switching to full_prefetch, we get the following results:

Prefetch depth: 10
Prefetch object count: 10
Prefetch cache size: 15620
AVG QUERY TIME: 5204.46 ms
AVG ACTIVATION TIME: 1011.90 ms
TOTAL TIME: 310848 ms

Let's take a look at the networking statistics for this scenario:

With this setup, the complete query result (15620 objects) is transferred to the client in one single message. Accordingly, we see a significant drop in the number of messages sent by the client: instead of a continuous stream of single object data requests, we are now left with the query request messages exclusively. Note that query execution takes significantly longer than in the no_prefetch setup, since the actual network transfer is now measured as part of this reading. Activation is much faster, because it now happens completely on the client, using the client-side slot cache, without any intermediate message exchange.

Nevertheless, there doesn't seem to be a steady flow of incoming bytes. Instead, we find a sequence of high activity peaks followed by idle periods. We suspect overhead incurred by the huge network packet size as well as by client-side cache maintenance. Let's try the middle ground and reduce prefetching to what's needed to transfer a single root task graph each time ObjectSet#next() is called:

Prefetch depth: 10
Prefetch object count: 1
Prefetch cache size: 1562
AVG QUERY TIME: 1795.88 ms
AVG ACTIVATION TIME: 539.92 ms
TOTAL TIME: 116819 ms

Cool, this setup gave us the best results. Total time improved about 5x over the no_prefetch setup, and about 2.7x over full_prefetch. For completeness, let's take a peek at network activity for this scenario:
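The speedup factors follow directly from the TOTAL TIME readings of the three runs. A small sketch of the arithmetic, with class and method names of our own choosing:

```java
import java.util.Locale;

class Speedup {
    // Ratio of a baseline total time to an improved total time.
    static double ratio(long baselineMs, long improvedMs) {
        return (double) baselineMs / improvedMs;
    }

    public static void main(String[] args) {
        long noPrefetchMs = 582616;     // TOTAL TIME, no prefetching
        long fullPrefetchMs = 310848;   // TOTAL TIME, full result set prefetched
        long singlePrefetchMs = 116819; // TOTAL TIME, one root graph per fetch

        System.out.println(String.format(Locale.ROOT, "%.2f", ratio(noPrefetchMs, singlePrefetchMs)));   // prints 4.99
        System.out.println(String.format(Locale.ROOT, "%.2f", ratio(fullPrefetchMs, singlePrefetchMs))); // prints 2.66
    }
}
```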

We find slightly increased message exchange (because now we get one request per result set element graph) and a much steadier flow of incoming bytes.
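A rough back-of-the-envelope model of client requests per query helps explain these observations. This is our own simplification based on the descriptions above, not a measurement and not a statement about db4o's actual wire protocol: we assume one query message per query, one request per object without prefetching, and one additional request per root task graph when prefetching a single graph at a time.

```java
class MessageModel {
    // Assumed: one query message, then one request per object during activation.
    static long noPrefetch(int resultSize, int objectsPerGraph) {
        return 1 + (long) resultSize * objectsPerGraph;
    }

    // Assumed: the whole result arrives with the query response.
    static long fullPrefetch() {
        return 1;
    }

    // Assumed: one query message, then one request per root task graph.
    static long singlePrefetch(int resultSize) {
        return 1 + resultSize;
    }

    public static void main(String[] args) {
        int resultSize = 10;
        int objectsPerGraph = 781 * 2;
        System.out.println(noPrefetch(resultSize, objectsPerGraph)); // prints 15621
        System.out.println(fullPrefetch());                          // prints 1
        System.out.println(singlePrefetch(resultSize));              // prints 11
    }
}
```

Under these assumptions, single_prefetch costs only a handful of extra requests compared to full_prefetch while keeping each response down to a single graph.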

Some thoughts to take away from this experiment:

  • The best prefetching settings depend heavily on your object model and the characteristics of your queries. A good rule of thumb is to provide enough cache to accommodate a single result set entry subgraph.
  • Try to approximate the "sweet spot" incrementally and verify your assumptions by monitoring the changes in behavior.
  • Results may differ depending on network quality - the figures we present here have been gathered from an intranet setup. The slower your network is, the more you stand to gain from reduced message exchange.