
LevelDB

LevelDB is a key-value database created by Google. It maps string keys to string values, both of which are stored as byte arrays. It doesn’t have a command line interface, and it is not relational like SQL databases. Its goal is to be simple and provide fast, sequential reads over the data.

LevelDB is used in the Chrome browser and in Chrome-based apps and extensions. While there are programming libraries (e.g., Python’s leveldb and Go’s goleveldb) and forensic tools (e.g., Hindsight) that can access and read the databases, relying on them alone can leave a lot of information behind.

The database uses Snappy compression, so keyword searches may not produce matches unless your tool set first decompresses the data. However, the log files, which contain uncommitted data, are not compressed, and it was keyword hits in these files that first brought the database to my attention.
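Since the log files are uncompressed, a raw byte search over them is enough to reproduce that kind of keyword hit. The following is a minimal sketch, not a forensic tool: the function name `find_keyword_in_logs` is hypothetical, and it assumes the log files sit directly in the database directory with a `.log` extension.

```python
from pathlib import Path


def find_keyword_in_logs(db_dir, keyword: bytes):
    """Return (file name, byte offset) for each raw hit in the *.log files."""
    hits = []
    for path in sorted(Path(db_dir).glob("*.log")):
        data = path.read_bytes()  # read only; the database is never opened
        offset = data.find(keyword)
        while offset != -1:
            hits.append((path.name, offset))
            offset = data.find(keyword, offset + 1)
    return hits
```

The keyword is passed as bytes so the caller decides the encoding to search for. A value that happens to be split across two log records will not match as one contiguous string, so a miss here is not proof of absence.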

This explanation is primarily designed to help you recognize a LevelDB implementation and to explain what you can expect to find in its various files.

Structure

The database is not composed of one file, but many. They work together to provide fault tolerance and create an ordered structure.

Analysis

From the structure, one can see that the log files contain recent database changes and the sorted table files contain previous commits to the database. Opening the database programmatically will cause changes to these files and you may miss valuable data.

I recently found Discord chat messages in a LevelDB log file, but upon opening the database, the chat content wasn’t present. The log showed the data being added and then deleted before it was ever committed to the sorted tables. Had I only studied the opened database, I would have missed the chat.

That said, there is always value in seeing the data as intended. Just make sure you are operating on a copy of the data.
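One way to make such a copy and confirm it matches the original is sketched below. The helper names `hash_tree` and `make_working_copy` are hypothetical, and SHA-256 is just an assumed choice of hash. The sketch reads whole files into memory, which is fine for typical LevelDB folders but not for very large ones.

```python
import hashlib
import shutil
from pathlib import Path


def hash_tree(root):
    """Map each file's relative path to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def make_working_copy(source, dest):
    """Copy the database folder and verify the copy before anything opens it."""
    before = hash_tree(source)
    shutil.copytree(source, dest)  # dest must not already exist
    after = hash_tree(dest)
    if before != after:
        raise RuntimeError("working copy does not match the source")
    return after
```

Keeping the returned hashes also makes it easy to show later which files changed once the working copy has been opened.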

Examining Individual Database Files vs Opening the Database

To collect the most data from a LevelDB, the individual components need to be read and then considered as a whole. A read of the Info logs will quickly demonstrate how the database is recording and deleting values on an ongoing basis, but it is doing so by writing the changes (additions and deletions) to the log files before committing the changes to the sorted tables. However, opening the database merges

Log File Analysis

Log files are not compressed and can be interpreted in a hex editor. The file format is partially documented by Google.

Generally, the data is recorded in the format of:

The documentation clearly defines the record header format, but not the data format. I have observed a pattern of start-byte (\x01), length, key content, length, value content, but it is a little more complex.
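The observed pattern can be sketched in Python. The record header layout (4-byte checksum, 2-byte length, 1-byte type, inside 32 KiB blocks) follows the documented log format. The payload layout is an assumption taken from the write batch code in the LevelDB source (`write_batch.cc`): an 8-byte sequence number, a 4-byte count, then one entry per operation, where the leading byte is `\x01` for a stored value and `\x00` for a deletion and each length is a varint. All function names are hypothetical, and checksums are not verified.

```python
import struct

BLOCK_SIZE = 32768   # log files are divided into 32 KiB blocks
HEADER_SIZE = 7      # checksum (4) + length (2) + type (1)
FULL, FIRST, MIDDLE, LAST = 1, 2, 3, 4


def read_physical_records(data):
    """Yield (type, payload) for each record fragment in a log file."""
    pos = 0
    while pos + HEADER_SIZE <= len(data):
        left_in_block = BLOCK_SIZE - (pos % BLOCK_SIZE)
        if left_in_block < HEADER_SIZE:
            pos += left_in_block          # block trailer padding
            continue
        _crc, length, rtype = struct.unpack_from("<IHB", data, pos)
        if rtype not in (FULL, FIRST, MIDDLE, LAST):
            pos += left_in_block          # unknown or zeroed header: skip block
            continue
        pos += HEADER_SIZE
        yield rtype, data[pos:pos + length]
        pos += length


def read_logical_records(data):
    """Reassemble fragments (FIRST, MIDDLE, LAST) into whole payloads."""
    buf = b""
    for rtype, payload in read_physical_records(data):
        if rtype == FULL:
            yield payload
        elif rtype == FIRST:
            buf = payload
        elif rtype == MIDDLE:
            buf += payload
        else:                             # LAST
            yield buf + payload
            buf = b""


def read_varint32(buf, pos):
    """Decode a little-endian base-128 varint; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_batch(batch):
    """Yield (sequence, tag, key, value) per operation; value is None for deletes."""
    seq, count = struct.unpack_from("<QI", batch, 0)
    pos = 12
    for i in range(count):
        tag = batch[pos]
        pos += 1
        klen, pos = read_varint32(batch, pos)
        key = batch[pos:pos + klen]
        pos += klen
        value = None
        if tag == 1:                      # assumed: 1 = value, 0 = deletion
            vlen, pos = read_varint32(batch, pos)
            value = batch[pos:pos + vlen]
            pos += vlen
        yield seq + i, tag, key, value
```

Under these assumptions, a key that was added and later deleted shows up twice: once with tag 1 and its value, and once with tag 0 and no value. A damaged or partially overwritten log can break the assumed layout, so the output should be checked against a hex view.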

I am presently attempting to create a Python tool to extract the contents of a log file.

Sorted Table Analysis

Because the sorted tables are compressed, keyword searches against them may not get hits. Unless your tool set decompresses the tables first, you might be missing data. The file format is documented by Google.
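Before decompressing anything, it helps to confirm that a file really is a sorted table and whether a given block is compressed. The sketch below is based on the published table format and the LevelDB source, and it makes three assumptions: the file ends in a 48-byte footer whose last 8 bytes are the magic number 0xdb4775248b80fb57 (little-endian), the footer starts with two varint-encoded (offset, size) block handles, and each block is followed by a one-byte compression type (0 for none, 1 for Snappy). The function names are hypothetical, and no decompression is attempted.

```python
import struct

TABLE_MAGIC = 0xDB4775248B80FB57  # magic number constant from the LevelDB source
FOOTER_SIZE = 48


def read_varint(buf, pos):
    """Decode a little-endian base-128 varint; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_footer(data):
    """Return ((meta_offset, meta_size), (index_offset, index_size)) or None."""
    if len(data) < FOOTER_SIZE:
        return None
    footer = data[-FOOTER_SIZE:]
    (magic,) = struct.unpack_from("<Q", footer, 40)
    if magic != TABLE_MAGIC:
        return None                       # not a sorted table, or truncated
    pos = 0
    handles = []
    for _ in range(2):                    # metaindex handle, then index handle
        offset, pos = read_varint(footer, pos)
        size, pos = read_varint(footer, pos)
        handles.append((offset, size))
    return tuple(handles)


def block_compression(data, handle):
    """Return the compression type byte stored right after a block."""
    offset, size = handle
    return data[offset + size]
```

If `block_compression` reports 1 for a block, a plain keyword search over that region is unlikely to match, which is consistent with the missed hits described above.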

I plan to follow the log file parsing tool with a sorted table parsing tool.