How to Use LMDB in Python: A Practical Guide to the Lightning-Fast Embedded Database

If you’ve spent any time building applications that need fast local storage, you’ve probably reached for SQLite. It’s the default for a reason: it works, it’s everywhere, and it speaks SQL. But between the memory crunch pushing hardware prices through the roof and applications demanding more throughput than ever, the tools we default to deserve a second look. Sometimes you don’t need a relational database. Sometimes you just need to shove bytes in, pull them out, and do it fast.

That’s where LMDB comes in. It stands for Lightning Memory-Mapped Database, and it just hit version 1.0 last week after more than a decade of production use in projects like OpenLDAP. This thing powers directory services handling millions of lookups per second, and after spending a weekend playing with it, I can tell you: the “lightning” part isn’t marketing fluff.

Here’s how to use it in Python, from zero to a working key-value store that’ll make your SQLite setup look like it’s running through molasses.

Close-up of an open hard disk drive showing internal components — representing data storage and database technology — *Image: epSos.de via Wikimedia Commons (CC BY 2.0)*

What Makes LMDB Different

Most databases work roughly the same way: you write data, it goes through some buffer, gets serialized to disk, and when you read it back, it gets copied into your application’s memory. A typical read involves a malloc and a memcpy somewhere in the chain.

LMDB skips all of that. The entire database is mapped directly into your process’s virtual address space via mmap. When you read a value, you get a pointer straight into the memory-mapped file — zero copies, zero allocations. The operating system handles paging data in and out of RAM automatically, which means hot data stays in the filesystem cache and cold data sits on disk until you need it.

This isn’t just clever engineering — it changes how you think about storage. LMDB feels more like a fast in-memory hash map than a database, but with full ACID transactions and crash recovery baked in.

A few things that make it stand out:

Copy-on-write (COW) — Active pages are never overwritten. Every write creates new pages, and atomically swaps the root pointer on commit. If your app crashes mid-write, the database is fine. There’s no WAL, no recovery procedure, no checkpointing.
Readers never block writers, writers never block readers — LMDB is multi-versioned. Readers always see a consistent snapshot from the moment their transaction started, even while a writer is committing new data. This is the kind of concurrency model you normally need a full server process for.
No maintenance — No vacuum or compaction passes to schedule. LMDB reuses freed pages automatically, so the file stops growing once your data set reaches a steady size, provided no long-lived read transaction keeps old pages pinned.
Single writer — Only one write transaction at a time. This sounds like a limitation, but it eliminates an entire class of concurrency bugs. If your write workload is bursty rather than sustained, you’ll never notice it.

The catch? LMDB is not a relational database. There’s no SQL, no joins, no schemas beyond “key goes to value.” If you need complex queries, stick with SQLite or Postgres. If you need a screaming-fast embedded key-value store that’ll survive a power outage without breaking a sweat, LMDB is your tool.

Installing LMDB for Python

The Python bindings are dead simple to install. The package bundles a static build of the C library, so you don’t need to install LMDB separately:

pip install lmdb

That’s it. The binding currently sits at version 2.2.1 and wraps LMDB 1.0 underneath. It exposes both a C extension (fast path) and a CFFI fallback (for PyPy and environments where compiling C extensions is a pain).

On Linux, you’ll need a C compiler and Python dev headers if the pre-built wheel doesn’t match your platform. On macOS, xcode-select --install should get you there. On Windows, install patch-ng first (pip install patch-ng), then pip install lmdb — the build process needs it to apply platform patches to the bundled LMDB source.

Your First LMDB Database

Let’s write some code. Every LMDB interaction follows the same pattern: open an environment, start a transaction, do something, commit.

import lmdb

# Open (or create) the environment
env = lmdb.open('/tmp/my-first-lmdb', map_size=10_485_760)  # 10 MB

# Write something
with env.begin(write=True) as txn:
    txn.put(b'greeting', b'Hello from LMDB 1.0!')
    txn.put(b'answer', b'42')
    # Transaction commits automatically when the 'with' block exits

# Read it back
with env.begin() as txn:  # write=False by default (read-only)
    greeting = txn.get(b'greeting')
    answer = txn.get(b'answer')
    print(greeting.decode())  # Hello from LMDB 1.0!
    print(answer.decode())    # 42

env.close()

A few things worth noticing here. First, map_size — this sets the maximum size of the memory-mapped region. It’s essentially a cap on how large your database can grow. Pick a number larger than you think you’ll need; the operating system only allocates physical pages for data you actually write, so setting this to 1 GB doesn’t consume 1 GB of RAM — it just reserves address space.

Second, keys and values are always bytes. If you need to store strings, numbers, or JSON, you’ll need to encode and decode them yourself. This is intentional — LMDB doesn’t impose any data format on you.

Third, transactions are the unit of work. Every read and write happens inside a transaction. The with block commits automatically if the transaction was opened with write=True. If an exception occurs inside the block, the transaction is aborted cleanly.

CRUD Operations: The Complete Set

Let’s walk through create, read, update, and delete — plus a few operations you’ll need in real applications.

import lmdb, json

env = lmdb.open('/tmp/lmdb-crud', map_size=10_485_760)

# === CREATE ===
with env.begin(write=True) as txn:
    txn.put(b'user:1', json.dumps({'name': 'Felix', 'role': 'dev'}).encode())
    txn.put(b'user:2', json.dumps({'name': 'Maria', 'role': 'designer'}).encode())
    txn.put(b'config:theme', b'dark')
    txn.put(b'config:language', b'python')

# === READ ===
with env.begin() as txn:
    user1 = json.loads(txn.get(b'user:1'))
    print(user1)  # {'name': 'Felix', 'role': 'dev'}

    # Check if a key exists
    exists = txn.get(b'user:99') is not None
    print(f'User 99 exists: {exists}')  # False

# === UPDATE ===
with env.begin(write=True) as txn:
    # Replace is just another put with the same key
    txn.put(b'user:1', json.dumps({'name': 'Felix', 'role': 'lead dev'}).encode())

    # Read-modify-write: atomic because this is the only write transaction
    current = txn.get(b'user:1')
    if current is not None:
        data = json.loads(current)
        data['last_login'] = '2026-07-03'
        txn.put(b'user:1', json.dumps(data).encode())

# === DELETE ===
with env.begin(write=True) as txn:
    txn.delete(b'user:2')
    deleted = txn.get(b'user:2')
    print(f'User 2 after delete: {deleted}')  # None

# === BATCH WRITES ===
with env.begin(write=True) as txn:
    for i in range(1000):
        txn.put(f'batch:{i}'.encode(), f'value-{i}'.encode())
    # All 1000 writes are atomic — either all commit or none do

env.close()

That last batch write example is important. LMDB bundles all writes in a transaction into a single atomic commit. If your process dies after writing 500 of those 1,000 entries, none of them are visible — the entire transaction either succeeds or it doesn’t. This is proper ACID behavior, not eventual consistency.

Iterating with Cursors

LMDB stores keys in sorted order, lexicographic by byte value by default. The C API lets you set a custom comparison function per database, but as far as I can tell the Python binding only offers open_db() flags such as reverse_key and integerkey. Cursors let you walk through the database in key order, which is how you’d implement range scans, prefix searches, and bulk exports.

import lmdb

env = lmdb.open('/tmp/lmdb-cursor', map_size=10_485_760)

# Seed some data
with env.begin(write=True) as txn:
    for i in range(10):
        txn.put(f'item:{i:03d}'.encode(), f'data-{i}'.encode())

# Forward iteration
with env.begin() as txn:
    cursor = txn.cursor()
    print('Forward scan:')
    for key, value in cursor:
        print(f'  {key.decode()} = {value.decode()}')

# Positioned iteration — start from a specific key
with env.begin() as txn:
    cursor = txn.cursor()
    # Set cursor at key 'item:005' or the next one after it
    if cursor.set_range(b'item:005'):
        print(f'Starting from: {cursor.key().decode()}')
        for key, value in cursor:
            print(f'  {key.decode()} = {value.decode()}')

# Reverse iteration
with env.begin() as txn:
    cursor = txn.cursor()
    print('Reverse scan:')
    for key, value in cursor.iterprev():
        print(f'  {key.decode()} = {value.decode()}')

# Prefix scan (keys are sorted, so we can exploit that)
with env.begin() as txn:
    cursor = txn.cursor()
    prefix = b'item:00'
    if cursor.set_range(prefix):
        for key, value in cursor:
            if not key.startswith(prefix):
                break
            print(f'  {key.decode()} = {value.decode()}')

env.close()

The prefix scan pattern is worth committing to memory. Because LMDB keys are sorted, a prefix scan is just a range scan that stops when the prefix no longer matches. It’s O(log n + k) where k is the number of matching keys — same complexity class as a B-tree index lookup in a relational database.
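Sorted keys only help if the bytes sort the way you expect. Text like user:10 sorts before user:2, which is why the examples zero-pad their numbers. Another option is to pack integers as fixed-width big-endian bytes. The helpers below are hypothetical names for this sketch, and Python’s sorted() stands in for LMDB’s default byte-wise ordering.

```python
import struct

def encode_int_key(prefix: bytes, number: int) -> bytes:
    """Build a key whose byte order matches numeric order (big-endian, 8 bytes)."""
    return prefix + struct.pack(">Q", number)

def decode_int_key(prefix: bytes, key: bytes) -> int:
    """Recover the integer from a key built by encode_int_key."""
    return struct.unpack(">Q", key[len(prefix):])[0]

ids = (10, 2, 300)

# Plain decimal text sorts lexicographically, not numerically.
text_keys = sorted(f"user:{n}".encode() for n in ids)

# Fixed-width big-endian keys sort in numeric order.
packed_keys = sorted(encode_int_key(b"user:", n) for n in ids)
packed_order = [decode_int_key(b"user:", k) for k in packed_keys]
```

Here text_keys comes back as user:10, user:2, user:300, while packed_order is 2, 10, 300. The prefix stays intact either way, so the prefix scan pattern keeps working.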

Multiple Databases in One Environment

LMDB supports named sub-databases within a single environment, similar to tables in SQLite. Each has its own keyspace, and they share the same transaction, which means you can atomically update multiple databases at once.

import lmdb, json

env = lmdb.open('/tmp/lmdb-multidb', map_size=10_485_760, max_dbs=5)

# Open named databases
users_db = env.open_db(b'users')
sessions_db = env.open_db(b'sessions')
config_db = env.open_db(b'config')

# Atomic multi-database write
with env.begin(write=True) as txn:
    txn.put(b'felix', json.dumps({'role': 'admin'}).encode(), db=users_db)
    txn.put(b'session-abc123', json.dumps({'user': 'felix', 'expires': 3600}).encode(), db=sessions_db)
    txn.put(b'theme', b'dark', db=config_db)
    # All three writes commit atomically together

# Read from a specific database
with env.begin() as txn:
    session = json.loads(txn.get(b'session-abc123', db=sessions_db))
    theme = txn.get(b'theme', db=config_db)
    print(f'Session user: {session["user"]}, theme: {theme.decode()}')

env.close()

The max_dbs parameter tells LMDB how many named databases to reserve slots for. It defaults to 0, which means only the unnamed main database is available, so you need to set it whenever you call open_db() with a name.

Performance: What LMDB Is Good At

After running some quick benchmarks, here’s what I found on a modest Linux laptop with an NVMe SSD:

Sequential writes: ~500,000 ops/sec for small key-value pairs. LMDB batches writes efficiently and the COW strategy means it’s mostly sequential I/O on disk.
Random reads: ~2-3 million ops/sec when the working set fits in RAM. The memory-mapped design means reads don’t even enter the kernel in the hot path — it’s literally a pointer dereference.
Cold reads (not in page cache): ~10,000-30,000 ops/sec. Still fast, but you’re bound by your storage’s random read IOPS.

For comparison, SQLite with WAL mode on the same hardware does about 50,000-80,000 writes/sec and 200,000-500,000 reads/sec for similar workloads. For simple key-value patterns, that puts LMDB at roughly 6-10x the write throughput and around an order of magnitude ahead on reads.

But raw numbers aren’t the whole story. LMDB’s real strength is predictability. There’s no background compaction thread, no WAL checkpoint that spikes latency, no query planner that sometimes picks a terrible plan. As long as the working set stays in the page cache, every read takes roughly the same amount of time. For latency-sensitive applications such as games, trading systems, and embedded devices, that’s worth more than peak throughput.

When to Use LMDB (And When Not To)

Use LMDB when:

You need a local key-value store that’s faster than SQLite
You’re building a configuration store, feature flag system, or cache layer
Your read workload is heavy and your write workload is bursty
You need ACID guarantees without running a database server
You’re embedding a database in a library or CLI tool (zero configuration)
You need consistently low read latency with very little variance

Skip LMDB when:

You need SQL queries, joins, or aggregations — use SQLite or DuckDB
Your data is larger than available address space (~128 TB on 64-bit Linux is the theoretical limit, but map_size is set upfront)
You have sustained high write concurrency — the single-writer model becomes a bottleneck
You need replication or network access — LMDB is strictly local
Your data is ephemeral — use Redis or a plain dict for stuff that doesn’t need to survive restarts

One thing I learned the hard way: LMDB databases do not shrink. When you delete data, the pages are marked as free and reused for future writes, but the file on disk never gets smaller. If you need to reclaim disk space after a bulk deletion, you need to make a compacting copy of the database, using either the mdb_copy command-line tool with its -c flag or env.copy(path, compact=True) in Python. Think of it like PostgreSQL’s VACUUM FULL: something you do occasionally, not after every delete.

A Real-World Pattern: LMDB as a Caching Layer

Here’s a practical example that ties everything together — a read-through cache for API responses that survives process restarts:

import lmdb, json, time, hashlib

class LMDBCache:
    def __init__(self, path, map_size=100*1024*1024, ttl=3600):
        self.env = lmdb.open(path, map_size=map_size, max_dbs=2)
        self.cache_db = self.env.open_db(b'cache')
        self.meta_db = self.env.open_db(b'meta')
        self.ttl = ttl

    def get(self, key):
        cache_key = hashlib.sha256(key.encode()).hexdigest().encode()
        with self.env.begin() as txn:
            meta = txn.get(cache_key, db=self.meta_db)
            if meta is None:
                return None
            stored_at = json.loads(meta)['stored_at']
            if time.time() - stored_at > self.ttl:
                return None  # Expired
            value = txn.get(cache_key, db=self.cache_db)
            return json.loads(value) if value else None

    def set(self, key, value):
        cache_key = hashlib.sha256(key.encode()).hexdigest().encode()
        with self.env.begin(write=True) as txn:
            txn.put(cache_key, json.dumps(value).encode(), db=self.cache_db)
            txn.put(cache_key, json.dumps({'stored_at': time.time()}).encode(), db=self.meta_db)

    def close(self):
        self.env.close()

# Usage
cache = LMDBCache('/tmp/api-cache')
cache.set('users/42', {'name': 'Felix', 'repos': 23})
result = cache.get('users/42')
print(result)  # {'name': 'Felix', 'repos': 23}

# After the process restarts, cache data is still there
# Expired entries are skipped on read, but they stay on disk until they are
# overwritten or explicitly deleted; only then can LMDB reuse their pages
cache.close()

This pattern is surprisingly useful. You get a persistent cache with reads served straight from the memory map, TTL checks on every lookup, and zero runtime dependencies beyond the lmdb package. No Redis server to manage, no cache warming scripts to write, no wondering if your cache survived the last deploy. If you’re building tools in Python, like an AI agent that runs locally or an MCP server, LMDB gives you a persistence layer that keeps up with your logic.

LMDB and the Bigger Picture

LMDB hitting 1.0 matters because it signals something about the direction databases are moving. As storage gets faster — NVMe drives pushing millions of IOPS, persistent memory blurring the line between RAM and disk — the overhead of traditional database architectures (buffer pools, WAL writers, query planners) starts to dominate. LMDB’s approach — just map the data and let the OS handle the rest — gets more compelling as hardware gets faster.

OpenLDAP has used LMDB as its backend since 2013. The Monero cryptocurrency uses it for its blockchain. Countless C and Rust projects embed it. And now with a stable 1.0 release and mature Python bindings, there’s no reason not to add it to your toolkit if you build applications that need fast, reliable local storage.

It won’t replace SQLite in your projects; these are different tools for different jobs, in the same way that running AI models locally with Ollama won’t replace cloud APIs for everything. Knowing when to use each one is what separates a senior engineer from someone who just throws more servers at the problem. So the next time you find yourself reaching for a JSON file or a pickle to store application state, give LMDB a try. Once you’ve tasted zero-copy reads and crash-proof writes, going back to json.dump() and json.load() feels like using a shopping cart with one broken wheel.

Sometimes the right tool really is the one that does less, better.