Text Search¶
Bundlebase includes full-text search powered by Tantivy with BM25 relevance ranking. Text indexes can span one or more columns, and search results include a _score column for ranking.
Creating a Text Index¶
Use create_index() with one or more columns and index_type="text". You can optionally provide a name for the index (used when querying). If no name is provided, indexes are auto-named as idx_{columns joined by _}.
# Single-column text index (auto-named "idx_description")
await bundle.create_index("description", "text")
# Multi-column text index (auto-named "idx_title_description")
await bundle.create_index(["title", "description"], "text")
# With an explicit name
await bundle.create_index(["title", "description"], "text", name="product_search")
# With a custom tokenizer
await bundle.create_index("content", "text", args={"tokenizer": "en_stem"})
await bundle.commit("Created text indexes")
Note
Until you commit, the index will not be used when the bundle is reopened.
Multi-Column Indexes¶
A multi-column index stores all specified columns in a single search index. This lets you search across columns in a single query using field-specific syntax, and get a unified relevance score.
Querying¶
Use the search() table function to perform full-text search. It replaces FROM bundle in your SQL. You can call it with one or two arguments:
search('query')— When only one text index exists, you can omit the index name.search('index_name', 'query')— Specify which index to search when multiple exist.
Results include all columns from the bundle plus a _score column with the BM25 relevance score.
-- Basic search (when only one text index exists)
SELECT * FROM search('machine learning')
-- Specify index name
SELECT * FROM search('desc_search', 'machine learning')
-- Order by relevance
SELECT title, description, _score
FROM search('product_search', 'machine learning')
ORDER BY _score DESC
LIMIT 10
-- Additional WHERE filters on top of search results
SELECT * FROM search('product_search', 'machine learning')
WHERE category = 'AI'
Query Syntax¶
The query string uses Tantivy query syntax:
| Syntax | Example | Description |
|---|---|---|
| Terms | machine learning |
Matches rows containing any of the words (OR by default) |
| Required terms | +machine +learning |
Matches rows containing both words |
| Phrases | "machine learning" |
Matches the exact phrase |
| Field-specific | title:learning |
Searches only the specified column |
| Combined fields | +title:machine +description:learning |
Different terms in different columns |
Field-Specific Queries¶
With multi-column indexes, you can target specific columns using the column:term syntax. The column names match the columns specified when creating the index.
-- Search only the title column
SELECT * FROM search('product_search', 'title:learning')
-- Require matches in both title and description
SELECT * FROM search('product_search', '+title:machine +description:learning')
When no field is specified, all columns in the index are searched.
Result Limit¶
By default, search() returns up to 10,000 results when no SQL LIMIT clause is specified. If your query matches more documents than this, results are truncated to the top 10,000 by relevance score. Always use LIMIT for best performance and predictable results:
The Score Column¶
Every search() query returns a _score column (Float64) with the BM25 relevance score. Higher scores indicate better matches. Use ORDER BY _score DESC to rank results by relevance.
SELECT title, _score
FROM search('product_search', 'machine learning')
ORDER BY _score DESC
LIMIT 10
Tokenizers¶
When creating a text index, you can specify a tokenizer to control how text is split and normalized. The default is simple.
| Tokenizer | Aliases | Description |
|---|---|---|
simple |
Splits on whitespace and lowercases. Good general-purpose default. | |
raw |
No tokenization — treats the entire field value as a single token. Use for exact-match fields like IDs or codes. | |
en_stem |
english_stem, english |
English stemming with stop word removal. Matches word variants (e.g., "running" matches "run"). |
de_stem |
german_stem, german |
German stemming with stop word removal. |
fr_stem |
french_stem, french |
French stemming with stop word removal. |
es_stem |
spanish_stem, spanish |
Spanish stemming with stop word removal. |
it_stem |
italian_stem, italian |
Italian stemming with stop word removal. |
pt_stem |
portuguese_stem, portuguese |
Portuguese stemming with stop word removal. |
nl_stem |
dutch_stem, dutch |
Dutch stemming with stop word removal. |
sv_stem |
swedish_stem, swedish |
Swedish stemming with stop word removal. |
no_stem |
norwegian_stem, norwegian |
Norwegian stemming with stop word removal. |
da_stem |
danish_stem, danish |
Danish stemming with stop word removal. |
fi_stem |
finnish_stem, finnish |
Finnish stemming with stop word removal. |
ru_stem |
russian_stem, russian |
Russian stemming with stop word removal. |