Have you ever wondered how exactly the space inside your db4o database file is used? You can now take a peek under the hood, using a newly introduced command line tool that gives you a statistical overview of how the bytes are distributed over the major components of the db4o marshalling format. Currently this tool is rather minimalist and experimental, but it will certainly become more expressive and detailed in the future.

The main entry point is

com.db4o.filestats.FileUsageStatsCollector

in the db4oj.optional module. You can use it as a command line tool, passing the path to the database as the single argument:

java -classpath <db4o classpath> com.db4o.filestats.FileUsageStatsCollector <path to db>

Alternatively, you can use it from inside your application to obtain a FileUsageStats instance:

FileUsageStats stats = FileUsageStatsCollector.runStats(dbPath);

FileUsageStats#toString() will yield a formatted summary of the results, just as it is printed to stdout in command line mode. You can also access the individual data points via getters; please check the API documentation for details.

We will exercise the tool using a tiny example database. Let's assume the following object model...

public class Item {
  private int _id;
  private String _name;

  public Item(int id, String name) {
    _id = id;
    _name = name;
  }
}

public class Holder {
  public List<Item> _items;

  public Holder(List<Item> items) {
    _items = items;
  }
}

...and populate a database from this model:

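// Assumed here: config is an embedded configuration obtained from Db4oEmbedded.
EmbeddedConfiguration config = Db4oEmbedded.newConfiguration();
config.common().objectClass(Item.class).objectField("_name").indexed(true);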
config.file().generateUUIDs(ConfigScope.GLOBALLY);
config.file().generateVersionNumbers(ConfigScope.GLOBALLY);
[...]
for (int idx = 0; idx < NUM_HOLDERS; idx++) {
  ArrayList<Item> items = new ArrayList<Item>();
  for (int itemIdx = 0; itemIdx < idx; itemIdx++) {
    items.add(new Item(itemIdx, idx + "/" + itemIdx));
  }
  db.store(new Holder(items));
}

Running the tool against this database will yield output like this:

com.db4o.StaticClass
               Slots:            0
         Class index:           30
       Field indices:            0
               Total:           30
com.db4o.StaticField
               Slots:            0
         Class index:           30
       Field indices:            0
               Total:           30
com.db4o.ext.Db4oDatabase
               Slots:           70
         Class index:           38
       Field indices:            0
               Total:          108
com.db4o.samples.filestats.Holder
               Slots:         4600
         Class index:          430
       Field indices:            0
               Total:         5030
com.db4o.samples.filestats.Item
               Slots:       295020
         Class index:        21055
       Field indices:        61047
               Total:       377122
java.util.AbstractCollection
               Slots:            0
         Class index:          444
       Field indices:            0
               Total:          444
java.util.AbstractList
               Slots:            0
         Class index:          448
       Field indices:            0
               Total:          448
java.util.ArrayList
               Slots:        66800
         Class index:          434
       Field indices:            0
               Total:        67234

         File header:          242
           Freespace:        67011
           ID system:        66520
      Class metadata:         1237
     Freespace usage:            0
          UUID usage:        63517

               Total:       648973
         Unaccounted:            0
                File:       648973

A few words of explanation...

  • In a (very tiny) nutshell, db4o views the database file as a conglomerate of slots, a slot just being a segment of the file with a start address and a length. A slot may contain a marshalled object, a BTree page, a class metadata definition, etc. References between slots are represented by logical IDs. The ID system maintains a mapping from logical IDs to slots. The freespace manager keeps track of "abandoned" slots that may be re-used (in preference to growing the database file).
  • All numbers are given in bytes.
  • The tool lists the stats for all classes found in the database, followed by the stats for "global" items that are not related to a single, specific class.
  • To achieve "full coverage" (i.e. no unaccounted bytes), classes internal to db4o (such as Db4oDatabase) are listed as well.
  • For each class, the output differentiates between the space used for actual object instance slots, for the class instance index, and for field indices (if there are any).
  • The "global" compartment covers the file header, reusable freespace, the ID system, the class metadata repository (including the actual class metadata definitions), the space used by the freespace system itself (which will be 0 in the case of the default in-memory freespace manager), and the UUID index (if it exists).
  • Finally, the tool prints the aggregated sum of all entries, along with the actual file size. The difference between the two ("Unaccounted") should be 0. (However, since this tool is still a work in progress, some of the more exotic configurations may not be covered yet.)
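
To make the accounting in the last bullet concrete, here is a minimal sketch that simply re-adds the figures from the sample output above. It does not use db4o at all: the class name UsageAccounting and its helper methods are made up for this illustration, and only the numbers are taken from the output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: re-adds the numbers from the sample output.
// UsageAccounting is a made-up name, not a db4o class.
final class UsageAccounting {

    // Per-class bytes as { slots, class index, field indices }.
    static Map<String, long[]> perClass() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        stats.put("com.db4o.StaticClass", new long[] {0, 30, 0});
        stats.put("com.db4o.StaticField", new long[] {0, 30, 0});
        stats.put("com.db4o.ext.Db4oDatabase", new long[] {70, 38, 0});
        stats.put("com.db4o.samples.filestats.Holder", new long[] {4600, 430, 0});
        stats.put("com.db4o.samples.filestats.Item", new long[] {295020, 21055, 61047});
        stats.put("java.util.AbstractCollection", new long[] {0, 444, 0});
        stats.put("java.util.AbstractList", new long[] {0, 448, 0});
        stats.put("java.util.ArrayList", new long[] {66800, 434, 0});
        return stats;
    }

    // "Global" bytes in output order: file header, freespace, ID system,
    // class metadata, freespace usage, UUID usage.
    static long[] globals() {
        return new long[] {242, 67011, 66520, 1237, 0, 63517};
    }

    static long sum(long[] values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // Sum of the "Total" lines of all classes.
    static long classTotal() {
        long total = 0;
        for (long[] row : perClass().values()) {
            total += sum(row);
        }
        return total;
    }

    static long globalTotal() {
        return sum(globals());
    }

    // Bytes of the file that no entry accounts for.
    static long unaccounted(long fileSize) {
        return fileSize - (classTotal() + globalTotal());
    }

    public static void main(String[] args) {
        long fileSize = 648973; // "File" line of the sample output
        System.out.println("Classes:     " + classTotal());
        System.out.println("Global:      " + globalTotal());
        System.out.println("Unaccounted: " + unaccounted(fileSize));
    }
}
```

For the sample database this prints 450446 bytes for the classes, 198527 bytes for the global items, and 0 unaccounted bytes, which matches the Total, Unaccounted and File lines above.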

A sibling to this tool is the consistency checker. If you suspect that your database file is corrupted (which should be very rare), you can run a quick check using

java -classpath <db4o classpath> com.db4o.consistency.ConsistencyChecker <path to db>

Running this against the database we created earlier should just yield

no inconsistencies detected

Just for the purpose of demonstration, let's introduce an inconsistency by forcibly freeing the slot of an existing object instance. (This is only possible using db4o's internal API. I know this is stating the obvious, but just to be sure: Please don't try this at home!)

int id = (int) localDb.getID(localDb.query(Holder.class).get(0));
Slot slot = localDb.idSystem().committedSlot(id);
localDb.freespaceManager().free(slot);

Re-running the consistency checker yields:

INCONSISTENCIES DETECTED
1 overlaps
0 bogus slots
0 invalid class ids
0 invalid field index entries
(slot lengths are non-blocked)
OVERLAPS
Pair.of([A:662,L:26](Freespace), [A:662,L:26](IdSystem))

  • In the overlaps section, all "collisions" within the ID system and the freespace system are reported. This includes collisions within one system as well as collisions between the two systems. In our case, we find the latter: The same slot is registered with the ID system as well as the freespace system, which is clearly a recipe for disaster.
  • The bogus slots section will contain slots with invalid (i.e. negative) address values.
  • Invalid class IDs will list any "dangling" references from the class indices, i.e. object instance IDs that have not been registered with the ID system.
  • Similarly, invalid field index entries will be reported if a field index refers to a parent object ID that is unknown to the ID system.
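
The overlap and bogus slot checks can be illustrated with a small standalone sketch. This is not the ConsistencyChecker implementation and uses no db4o classes: OverlapSketch, its Slot class and the sort-and-sweep approach are assumptions made up for this example. Only the slot [A:662,L:26] is taken from the output above; the other slot is invented sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not db4o code: finds overlapping and bogus slots.
final class OverlapSketch {

    // A file segment with a start address and a length, tagged with the
    // subsystem (ID system or freespace) that registered it.
    static final class Slot {
        final int address;
        final int length;
        final String owner;

        Slot(int address, int length, String owner) {
            this.address = address;
            this.length = length;
            this.owner = owner;
        }

        int end() {
            return address + length;
        }

        @Override
        public String toString() {
            return "[A:" + address + ",L:" + length + "](" + owner + ")";
        }
    }

    // Slots with an invalid (negative) address.
    static List<Slot> bogus(List<Slot> slots) {
        List<Slot> result = new ArrayList<>();
        for (Slot s : slots) {
            if (s.address < 0) {
                result.add(s);
            }
        }
        return result;
    }

    // Sorts by address and sweeps once, pairing each slot that starts before
    // the end of the furthest-reaching earlier slot with that slot. This finds
    // every slot involved in a collision, but not every colliding pair.
    static List<Slot[]> overlaps(List<Slot> slots) {
        List<Slot> sorted = new ArrayList<>(slots);
        sorted.sort(Comparator.comparingInt(s -> s.address));
        List<Slot[]> result = new ArrayList<>();
        Slot reach = null;
        for (Slot s : sorted) {
            if (reach != null && s.address < reach.end()) {
                result.add(new Slot[] {reach, s});
            }
            if (reach == null || s.end() > reach.end()) {
                reach = s;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Slot> slots = Arrays.asList(
            new Slot(662, 26, "Freespace"), // from the checker output above
            new Slot(662, 26, "IdSystem"),  // same slot, other subsystem
            new Slot(700, 40, "IdSystem")); // invented, does not collide
        for (Slot[] pair : overlaps(slots)) {
            System.out.println(pair[0] + " overlaps " + pair[1]);
        }
        System.out.println(bogus(slots).size() + " bogus slots");
    }
}
```

With the sample data this reports a single collision between the freespace entry and the ID system entry for the slot at address 662, and no bogus slots.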

Note that neither tool handles legacy databases, in particular those created prior to the introduction of the BTree-based ID system!

These tools have already proven quite valuable in debugging sessions. The consistency checker has also been made part of our continuous build: we have included a run of our JDK5 test suite against a fixture that injects a check/defrag/check sequence whenever a database is opened/reopened/closed during the tests. We will certainly continue to improve and extend both tools.

Usage stats and consistency checks can already be used programmatically on the .NET side, but they still have to be integrated into Db4oTool for command line usage.

If you have suggestions or comments, please let us know.