Configuration¶
Bundlebase uses configuration to control how it connects to cloud storage providers, remote servers, and external services.
Priority Order¶
Configuration values are resolved with the following priority (highest wins):
| Source | Description | Priority |
|---|---|---|
| Runtime config | Set via SET CONFIG or set_config() during the session | Highest |
| Passed config | Dict passed to create()/open() | High |
| Environment variables | BB_* env vars set at runtime | Medium |
| Stored config | Persisted in the bundle via SAVE CONFIG | Lowest |
This means a value set via SET CONFIG during a session always overrides the same key from any other source.
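The precedence rule can be pictured as a layered lookup. The following is a minimal sketch in plain Python, not Bundlebase's actual implementation; the helper name `resolve_config` and the flat `dict` inputs are assumptions made purely for illustration.

```python
from collections import ChainMap


def resolve_config(runtime, passed, env, stored):
    """Merge four config layers; earlier arguments win on key conflicts.

    Hypothetical helper for illustration only, not part of the bundlebase API.
    """
    # ChainMap searches its mappings left to right, so the first layer
    # that defines a key supplies the value.
    return dict(ChainMap(runtime, passed, env, stored))


effective = resolve_config(
    runtime={"region": "eu-west-1"},                         # SET CONFIG / set_config()
    passed={"region": "us-west-2"},                          # dict passed to create()/open()
    env={"region": "us-east-1", "endpoint": "http://localhost:9000"},  # BB_* variables
    stored={"region": "ap-south-1", "allow_http": "true"},   # SAVE CONFIG
)
print(effective["region"])
```

Here `region` comes from the runtime layer because every source defines it, while `endpoint` and `allow_http` fall through to the only layers that set them.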
Configuration Methods¶
Runtime Config — SET CONFIG (Highest Priority)¶
Use SET CONFIG or set_config() to set a configuration value for the current session only. This is the highest-priority config source. The value is not persisted and is lost when the session ends.
Passed Config (High Priority)¶
Pass a config dict to create() or open(). These values take effect for the current session only and are not persisted.
```python
import bundlebase as bb

# Scoped to a specific provider
bundle = await bb.create("my/data", config={
    "s3": {
        "region": "us-west-2",
        "access_key_id": "AKIA...",
        "secret_access_key": "secret...",
    }
})

# Multiple scopes with override
bundle = await bb.create("my/data", config={
    "s3": {"region": "us-west-2"},  # default for S3
    "s3/prod-bucket": {             # override for a specific bucket
        "endpoint": "http://localhost:9000",
    }
})
```
Environment Variables (Medium Priority)¶
Set BB_* environment variables. These apply to all bundles in the process.
```bash
# Scoped (applies to a specific config scope)
export BB_S3_REGION=us-east-1
export BB_S3__MY_BUCKET__ENDPOINT=http://localhost:9000
```
Stored Config (Lowest Priority)¶
Use SAVE CONFIG to persist configuration in the bundle manifest. These values are saved when you commit and apply every time the bundle is opened.
Scope Format¶
Scopes are /-separated paths that identify which config values apply to which storage locations. Each scope matches its own name and any child paths.
| Scope | Meaning |
|---|---|
| `system` | Bundlebase-level settings (e.g., max_memory, catalog_name) |
| `s3` | Matches all S3 URLs |
| `s3/my-bucket` | Matches s3://my-bucket and anything under it |
| `s3/my-bucket/subfolder` | Matches s3://my-bucket/subfolder and below |
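
To make the "own name and any child paths" rule concrete, here is a small sketch of how a scope could be tested against a URL. The helper names `url_to_scope_path` and `scope_matches` are hypothetical, and comparing whole path segments (rather than raw string prefixes) is an assumption about the intended behavior, not something this page specifies.

```python
from urllib.parse import urlsplit


def url_to_scope_path(url: str) -> str:
    """Turn 's3://my-bucket/a/b.csv' into 's3/my-bucket/a/b.csv'."""
    parts = urlsplit(url)
    segments = [parts.scheme, parts.netloc, *parts.path.split("/")]
    # Drop empty pieces produced by leading or trailing slashes.
    return "/".join(s for s in segments if s)


def scope_matches(scope: str, url: str) -> bool:
    """True if `scope` equals the URL's scope path or is one of its parents."""
    scope_segments = scope.strip("/").split("/")
    url_segments = url_to_scope_path(url).split("/")
    # Compare whole segments so 's3/my-bucket' does not match 's3/my-bucket-2'.
    return url_segments[: len(scope_segments)] == scope_segments


print(scope_matches("s3/my-bucket", "s3://my-bucket/subfolder/data.csv"))
print(scope_matches("s3/my-bucket/subfolder", "s3://my-bucket/other/data.csv"))
```

The first call matches because the URL sits under `s3/my-bucket`; the second does not because the URL is outside `subfolder`.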
Config Key Patterns¶
Every config source lets you attach a key to a specific scope. The syntax varies by source:
| Pattern | Runtime Config | Passed Config | Environment Variable | Stored Config |
|---|---|---|---|---|
| Provider default | `SET CONFIG key = 'val' FOR 's3'` | `{"s3": {"key": "val"}}` | `BB_S3_KEY=val` | `SAVE CONFIG key = 'val' FOR 's3'` |
| URL-scoped | `SET CONFIG key = 'val' FOR 's3/bucket'` | `{"s3/bucket": {"key": "val"}}` | `BB_S3__BUCKET__KEY=val` | `SAVE CONFIG key = 'val' FOR 's3/bucket'` |
Environment Variable Scoping¶
Environment variables use a single underscore (_) to separate the scope name from the config key. Double-underscore (__) is reserved for encoding sub-path separators within the scope.
After stripping the BB_ prefix:
- No `__`: the first `_` separates the scope name from the key (e.g., `BB_S3_REGION` → scope `s3`, key `region`)
- With `__`: split on `__`; the first part is the scope name, the last part is the key, and any middle parts are scope sub-path segments (e.g., `BB_S3__MY_BUCKET__KEY` → scope `s3/my_bucket`, key `key`)
| Environment Variable | Scope | Key |
|---|---|---|
| `BB_S3_REGION=us-west-2` | `s3` | `region` |
| `BB_S3_ACCESS_KEY_ID=AKIA...` | `s3` | `access_key_id` |
| `BB_S3__MY_BUCKET__KEY=val` | `s3/my_bucket` | `key` |
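
The parsing rules above can be sketched as a short function. This is an illustrative reimplementation rather than Bundlebase's own parser: the name `parse_bb_env_var` is hypothetical, and lowercasing the result is an assumption inferred from the lowercase scopes and keys shown in the table.

```python
def parse_bb_env_var(name: str) -> tuple:
    """Split a BB_* variable name into (scope, key).

    Hypothetical helper for illustration; not part of the bundlebase API.
    """
    prefix = "BB_"
    if not name.startswith(prefix):
        raise ValueError(f"not a BB_* variable: {name!r}")
    # Assumption: scopes and keys are lowercased, as in the table above.
    rest = name[len(prefix):].lower()

    if "__" in rest:
        # Double underscores delimit scope name, sub-path segments, and key.
        parts = rest.split("__")
        return "/".join(parts[:-1]), parts[-1]

    # No double underscore: the first single underscore ends the scope name.
    scope, sep, key = rest.partition("_")
    if not sep or not key:
        raise ValueError(f"missing config key in {name!r}")
    return scope, key


for var in ("BB_S3_REGION", "BB_S3_ACCESS_KEY_ID", "BB_S3__MY_BUCKET__KEY"):
    print(var, "->", parse_bb_env_var(var))
```

Because only the first single underscore is treated as a separator, multi-word keys such as `access_key_id` stay intact.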
Scope Resolution¶
When a URL is accessed, config is resolved using longest-prefix matching:
- Start with the provider's default scope (e.g., `s3`)
- Apply the longest matching scope-prefix override
For example, if you have config for both s3 and s3/my-bucket/subfolder, a request for s3://my-bucket/subfolder/data.csv uses the more specific config.
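
The two resolution steps can be sketched as follows. This is a simplified model under stated assumptions, not the library's implementation: `resolve_for_url` is a hypothetical name, config is modeled as a plain dict keyed by scope, and only the single longest matching override is layered on top of the provider default (whether intermediate scopes also contribute is not stated here).

```python
from urllib.parse import urlsplit


def resolve_for_url(url: str, scoped_config: dict) -> dict:
    """Resolve config for `url` from a {scope: {key: value}} mapping."""
    parts = urlsplit(url)
    segments = [s for s in (parts.scheme, parts.netloc, *parts.path.split("/")) if s]
    if not segments:
        raise ValueError(f"cannot derive a scope from {url!r}")

    # Step 1: start with the provider's default scope (e.g. "s3").
    resolved = dict(scoped_config.get(segments[0], {}))

    # Step 2: walk from the most specific candidate scope to the least
    # specific and apply the first (that is, longest) one that is configured.
    for length in range(len(segments), 1, -1):
        candidate = "/".join(segments[:length])
        if candidate in scoped_config:
            resolved.update(scoped_config[candidate])
            break
    return resolved


config = {
    "s3": {"region": "us-west-2"},
    "s3/my-bucket/subfolder": {"endpoint": "http://localhost:9000"},
}
print(resolve_for_url("s3://my-bucket/subfolder/data.csv", config))
print(resolve_for_url("s3://my-bucket/other.csv", config))
```

The first URL picks up both the provider default and the `subfolder` override; the second falls back to the provider default alone.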
Provider-Specific Keys¶
Each storage provider accepts specific configuration keys. Global defaults are not validated, but keys scoped to a URL prefix are checked against the provider's allowed keys.
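
As a rough illustration of that validation rule, the sketch below checks keys only when the scope includes a URL prefix. It assumes that "global defaults" means provider-level scopes such as `ftp` with no sub-path, which is one reading of the sentence above; the function `unknown_keys` and the `ALLOWED_KEYS` mapping are hypothetical, with key sets copied from the FTP, SFTP, and Kaggle tables in this section.

```python
# Allowed keys copied from the FTP, SFTP, and Kaggle tables in this section.
ALLOWED_KEYS = {
    "ftp": {"username", "password"},
    "sftp": {"key_path"},
    "kaggle": {"base_url", "username", "key"},
}


def unknown_keys(scope: str, values: dict) -> set:
    """Return the keys in `values` that the scope's provider does not accept."""
    provider, _, sub_path = scope.partition("/")
    if not sub_path:
        # Assumed reading: provider-level defaults are not validated.
        return set()
    if provider not in ALLOWED_KEYS:
        # Providers outside this sketch's table are left unchecked.
        return set()
    return set(values) - ALLOWED_KEYS[provider]


print(unknown_keys("ftp", {"bogus": "x"}))
print(unknown_keys("ftp/example.com/data", {"username": "u", "bogus": "x"}))
```

Only the second call reports `bogus`, because that scope is tied to a URL prefix.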
S3 (s3://)¶
| Key | Description |
|---|---|
| `region` | AWS region (e.g., us-west-2) |
| `access_key_id` | AWS access key ID |
| `secret_access_key` | AWS secret access key |
| `session_token` | AWS session token (temporary credentials) |
| `endpoint` | Custom endpoint URL (for S3-compatible services like MinIO) |
| `bucket` | Bucket name |
| `allow_http` | Allow HTTP (non-HTTPS) connections (true/false) |
| `skip_signature` | Skip request signing (true/false) |
| `virtual_hosted_style_request` | Use virtual hosted-style requests (true/false) |
| `token` | Authentication token |
| `imdsv1_fallback` | Allow IMDSv1 fallback (true/false) |
| `metadata_endpoint` | Custom metadata endpoint |
| `container_credentials_relative_uri` | ECS container credentials URI |
| `unsigned_payload` | Send unsigned payloads (true/false) |
| `checksum_algorithm` | Checksum algorithm to use |
| `copy_if_not_exists` | Copy-if-not-exists behavior |
| `conditional_put` | Conditional put behavior |
Google Cloud Storage (gs://)¶
| Key | Description |
|---|---|
| `service_account_key` | JSON service account key (inline) |
| `service_account_path` | Path to service account key file |
| `bucket` | Bucket name |
| `application_credentials` | Application default credentials path |
Azure Blob Storage (azure://)¶
| Key | Description |
|---|---|
| `account` | Storage account name |
| `access_key` | Storage account access key |
| `container` | Container name |
| `sas_token` | Shared access signature token |
| `bearer_token` | Bearer token |
| `client_id` | Service principal client ID |
| `client_secret` | Service principal client secret |
| `tenant_id` | Azure AD tenant ID |
| `authority_host` | Azure AD authority host |
| `use_emulator` | Use Azurite emulator (true/false) |
FTP (ftp://)¶
| Key | Description |
|---|---|
| `username` | FTP username |
| `password` | FTP password |
SFTP (sftp://)¶
| Key | Description |
|---|---|
| `key_path` | Path to SSH private key file |
Kaggle (kaggle://)¶
| Key | Description |
|---|---|
| `base_url` | Kaggle API base URL (default: https://www.kaggle.com) |
| `username` | Kaggle username |
| `key` | Kaggle API key |
System (system)¶
| Key | Description |
|---|---|
| `max_memory` | Maximum memory for query execution |
| `catalog_name` | Name of the DataFusion catalog |