# LLM Setup
GenDB uses an LLM to analyze your database schema and generate realistic, semantically coherent data for it. This page covers how to configure different LLM providers.
## How GenDB Uses LLMs
The LLM generates all data values directly. GenDB sends your schema (table names, column names, data types, constraints, foreign key relationships) to the LLM, which returns realistic, semantically coherent data as JSON. Data is generated in batches per table (up to 50 rows per LLM call), with larger tables chunked automatically.
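For illustration, a batch of generated rows for a hypothetical `users` table might look like the following. The exact request and response format is internal to GenDB, and the column names here are invented:

```json
[
  {"id": 1, "full_name": "Maria Chen", "email": "maria.chen@example.com", "status": "active"},
  {"id": 2, "full_name": "James Okafor", "email": "james.okafor@example.com", "status": "pending"}
]
```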
## Ollama (Local)
Ollama is the default provider and runs entirely on your machine.
### Setup
- Install Ollama: ollama.com
- Pull a model (see the tip below for recommendations).
- GenDB uses Ollama by default; no configuration changes are needed.
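Pulling a model amounts to, for example, `ollama pull qwen2.5:7b` (model tag assumed from Ollama's model library). If you want to pin the provider and model explicitly, a minimal sketch of the corresponding `gendb.yaml` entry, assuming the same keys as the custom-provider example later on this page:

```yaml
llm:
  provider: ollama
  model: qwen2.5:7b
```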
> **Tip:** Qwen 2.5 7B is the recommended default: it produces reliable structured JSON output and runs well on machines with 8GB+ RAM. For constrained hardware (4GB RAM), use llama3.2 (3B parameters) instead, but expect lower JSON reliability.
## OpenAI

### Setup
- Get an API key from platform.openai.com
- Configure the provider in `gendb.yaml`.
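A minimal sketch of the OpenAI configuration, assuming the same keys as the custom-provider example on this page:

```yaml
llm:
  provider: openai
  model: gpt-4o-mini
  api_key: "sk-..."  # your OpenAI API key
```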
> **Tip:** `gpt-4o-mini` is recommended for a good balance of quality and cost. Generation uses one API call per 50-row batch (see the cost table below), so a typical table costs only a few cents.
## Custom / Self-Hosted
Any OpenAI-compatible API endpoint works — including vLLM, text-generation-inference, DeepSeek, and other providers.
```yaml
llm:
  provider: custom
  model: deepseek-coder
  base_url: http://localhost:8000/v1
  api_key: ""  # if required by your endpoint
```
## Column Instructions
You can provide per-column instructions in gendb.yaml to guide the LLM:
```yaml
generation:
  tables:
    users:
      columns:
        bio:
          prompt: "Write a short professional bio"
        status:
          generator: one_of
          values: [active, inactive, pending]
```
These instructions are included in the LLM prompt alongside the schema context.
## Cost Profile
| Operation | LLM calls | Typical cost (OpenAI) |
|---|---|---|
| Data generation | 1 per 50 rows per table | ~$0.01-0.05 per table |
With Ollama, all operations are free and local.
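The batching above implies a simple call-count estimate. A minimal sketch (this helper is hypothetical, not part of GenDB), assuming the 50-rows-per-call batching described earlier:

```python
import math

def llm_calls(rows: int, batch_size: int = 50) -> int:
    """Number of LLM calls needed to generate `rows` rows at `batch_size` rows per call."""
    return math.ceil(rows / batch_size)

print(llm_calls(1_000))   # 20 calls
print(llm_calls(10_000))  # 200 calls
```

Multiply the call count by your provider's per-call cost for a rough per-table estimate.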