Indexing

While Bundlebase can query any attached data, the base formats are not always the most efficient to query.

Creating indexes on the columns you filter on most often speeds up those queries.

BTree Indexes

BTree indexes accelerate exact-match lookups and range queries. Use create_index() with index_type="btree".

Python (async):
await bundle.create_index("email", index_type="btree")

Python (sync):
bundle.create_index("email", index_type="btree")

SQL:
CREATE BTREE INDEX ON email
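
To build intuition for why a sorted index helps both exact-match and range lookups, here is a minimal Python sketch. It is not Bundlebase's implementation: a sorted list searched with the standard bisect module stands in for a B-tree, and the sample emails and the exact_match and range_query helpers are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical column values; the list position plays the role of a row id.
rows = ["dana@example.com", "alice@example.com", "carol@example.com", "bob@example.com"]

# A sorted list of (key, row_id) pairs stands in for the B-tree.
index = sorted((email, row_id) for row_id, email in enumerate(rows))
keys = [email for email, _ in index]


def exact_match(value):
    """Return row ids whose key equals value, found by binary search."""
    lo = bisect_left(keys, value)
    hi = bisect_right(keys, value)
    return [row_id for _, row_id in index[lo:hi]]


def range_query(low, high):
    """Return row ids whose key lies in the inclusive range [low, high]."""
    lo = bisect_left(keys, low)
    hi = bisect_right(keys, high)
    return [row_id for _, row_id in index[lo:hi]]


print(exact_match("bob@example.com"))  # [3]
print(range_query("b", "d"))           # [3, 2]
```

Both lookups touch only the matching slice of the sorted keys instead of scanning every row, which is the property an ordered index provides.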

Note

A new index is not persisted until you commit. If you reopen the bundle without committing, the index will not be used.

Text Indexes

Text indexes enable full-text search with BM25 ranking. Use create_index() with index_type="text" and one or more columns. If you do not provide a name, the index is auto-named idx_{columns}, with the column names joined by underscores (for example, idx_title_description).

Python (async):
# Single-column text index (auto-named "idx_description")
await bundle.create_index("description", "text")

# Multi-column text index (auto-named "idx_title_description")
await bundle.create_index(["title", "description"], "text")

# With explicit name and tokenizer
await bundle.create_index("content", "text", name="content_search", args={"tokenizer": "en_stem"})

Python (sync):
bundle.create_index("description", "text")
bundle.create_index(["title", "description"], "text")
bundle.create_index("content", "text", name="content_search", args={"tokenizer": "en_stem"})
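
The idx_{columns} naming pattern can be reproduced with a few lines of Python, which may help when you need to predict an index name in a script. The auto_index_name helper below is hypothetical and only mirrors the two documented examples; it is not part of the Bundlebase API.

```python
def auto_index_name(columns):
    """Hypothetical helper: build a name following the idx_{columns} pattern."""
    # Accept either a single column name or a list of column names.
    if isinstance(columns, str):
        columns = [columns]
    return "idx_" + "_".join(columns)


print(auto_index_name("description"))             # idx_description
print(auto_index_name(["title", "description"]))  # idx_title_description
```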

For more details on querying and tokenizers, see Text Search.
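
For readers unfamiliar with BM25, the following is a minimal, self-contained Python sketch of the scoring formula. It makes several assumptions: a lowercase whitespace tokenizer, the commonly used defaults k1=1.2 and b=0.75, and the Lucene-style IDF variant. The parameters and tokenization Bundlebase actually uses are not stated here, and the sample documents are made up.

```python
import math
from collections import Counter

# Hypothetical documents keyed by id.
docs = {
    "a": "fast index for fast lookups",
    "b": "full text search with ranking",
    "c": "index maintenance and text search tips for large bundles",
}


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
avg_len = sum(len(t) for t in tokens.values()) / len(tokens)


def bm25(query, k1=1.2, b=0.75):
    """Score every document against the query and return (doc_id, score), best first."""
    n_docs = len(tokens)
    scores = {}
    for doc_id, doc_tokens in tokens.items():
        tf = Counter(doc_tokens)
        score = 0.0
        for term in tokenize(query):
            # Document frequency: how many documents contain the term.
            df = sum(1 for t in tokens.values() if term in t)
            if df == 0 or tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization penalizes longer-than-average documents.
            norm = tf[term] + k1 * (1 - b + b * len(doc_tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = bm25("text search")
print([doc_id for doc_id, _ in ranking])  # ['b', 'c', 'a']
```

Documents "b" and "c" both contain each query term once, but "b" ranks higher because it is shorter; "a" contains neither term and scores zero.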

Drop Index

Remove an index from a column.

Python (async):
await bundle.drop_index("email")

Python (sync):
bundle.drop_index("email")

SQL:
DROP INDEX email

Rebuild Index

Rebuild an existing index on a column. This is useful if the index has become stale or corrupted.

Python (async):
await bundle.rebuild_index("email")

Python (sync):
bundle.rebuild_index("email")

SQL:
REBUILD INDEX ON email

Automatic Reindexing

Indexes stay in sync automatically. After every ATTACH, REPLACE, or FETCH, Bundlebase includes an index rebuild for the new or replaced blocks in the same change, so queries can use the index on freshly attached data immediately. No manual REINDEX step is needed.

To opt out (for example, when bulk-loading many files and you would rather rebuild once at the end), pass NO INDEX:

ATTACH 'jan.parquet' NO INDEX;
ATTACH 'feb.parquet' NO INDEX;
FETCH base ADD NO INDEX;
REINDEX;   -- run once when you're ready

Old index files are kept on disk and stay referenced by older commits, so opening the bundle pinned to a previous version still finds the matching index data.

Reindex

Create index files for any blocks that are missing them. This checks existing indexes and avoids redundant work, so you typically only need to call it explicitly after using NO INDEX.

Python (async):
await bundle.reindex()

Python (sync):
bundle.reindex()

SQL:
REINDEX
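
The "only fill in what is missing" behavior can be modeled in a few lines of Python. This is a conceptual sketch only: the blocks dictionary and the attach and reindex functions below are hypothetical stand-ins, not Bundlebase APIs, and real index files are replaced by a boolean flag.

```python
# Hypothetical in-memory model: block name -> whether an index file exists for it.
blocks = {}


def attach(name, index=True):
    """Register a block; index=False models ATTACH ... NO INDEX."""
    blocks[name] = index


def reindex():
    """Build index files only for blocks that lack one; return the blocks touched."""
    missing = [name for name, indexed in blocks.items() if not indexed]
    for name in missing:
        blocks[name] = True
    return missing


attach("jan.parquet", index=False)
attach("feb.parquet", index=False)
attach("mar.parquet")  # indexed as part of the attach

first = reindex()
second = reindex()
print(first)   # ['jan.parquet', 'feb.parquet']
print(second)  # []
```

The second call does nothing because every block already has its index, which is why an explicit REINDEX is typically only needed after NO INDEX was used.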