Bundlebase 0.7.0
This release adds custom connectors, user-defined functions, and column operations. The big theme: you can now extend Bundlebase with your own code.
New Features
Custom Connectors
You can now write your own data connectors in Python, Go, Java, Rust, or anything that runs in Docker. Connectors define how to connect to a data source — you import a connector, create source instances from it, then fetch data.
# Import a connector (persisted into the bundle)
bundle.import_connector('acme.weather', 'ipc::./my_connector')
# Create a source instance from it
bundle.create_source('acme.weather', {'region': 'us-east'})
# Fetch the data
bundle.fetch("base", "add")
Multiple runtimes are available depending on your needs:
- python: in-process, zero-copy Arrow transfer (use import_temp_connector, since Python code can't be serialized into the bundle)
- ffi: load a compiled shared library, also zero-copy
- ipc: run any executable as a subprocess
- java: run a JAR file
- docker: run a container image
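The import strings in these examples ('ipc::./my_connector', 'ipc::python:my_functions.py') appear to put the runtime name before a `::` separator. Here's a small sketch of how such a string could be split and checked against the runtime list above. The `split_spec` helper is hypothetical and the `runtime::target` shape is an assumption read off the examples; this is not how Bundlebase parses them internally.

```python
# Runtime names taken from the list above.
RUNTIMES = {"python", "ffi", "ipc", "java", "docker"}


def split_spec(spec: str) -> tuple[str, str]:
    """Split a 'runtime::target' string (assumed format) into its two parts."""
    runtime, sep, target = spec.partition("::")
    if not sep or not target:
        raise ValueError(f"expected 'runtime::target', got {spec!r}")
    if runtime not in RUNTIMES:
        raise ValueError(f"unknown runtime {runtime!r}")
    return runtime, target


if __name__ == "__main__":
    print(split_spec("ipc::./my_connector"))
    print(split_spec("ipc::python:my_functions.py"))
```

Only the first `::` is treated as the separator, so a target that contains colons of its own stays intact.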
SDKs for Python, Go, Java, and Rust handle the IPC protocol for you. Here's a complete Python connector:
from bundlebase_sdk import Connector, Location, serve
import pyarrow as pa

class WeatherConnector(Connector):
    def discover(self, attached_locations, **kwargs):
        return [Location("forecast.parquet", format="parquet", version="v1")]

    def data(self, location, **kwargs):
        return pa.table({"city": ["NYC", "LA"], "temp_f": [45, 72]})

if __name__ == "__main__":
    serve(WeatherConnector())
To clarify terminology: connectors are the new concept here — they define how to connect to data. Sources are instances created from connectors (or from built-in connectors like remote_dir and kaggle, which have always been there). This isn't a rename.
See the custom connectors guide for the full details.
User-Defined Functions
Extend Bundlebase's SQL with your own scalar and aggregate functions. Same runtime options as connectors — python, ffi, ipc, java, docker.
# Import and use a function — types auto-detected from bundlebase_metadata()
bundle.import_temp_function("tools.double_val", "ipc::python:my_functions.py")
bundle.query("SELECT tools.double_val(amount) FROM bundle")
-- Import all functions from a module, types auto-detected
IMPORT TEMP FUNCTION tools.* FROM 'ipc::python:my_functions.py'
Aggregate UDFs work too — implement create_state, accumulate, merge, and evaluate, and you can use them with GROUP BY like any built-in aggregate.
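To make the four-method shape concrete, here's a rough sketch of a mean aggregate in plain Python. It deliberately doesn't import the SDK: the method signatures, the `MeanState` class, and the `run_grouped` driver are all assumptions for illustration, so check the functions guide for the real base class and argument types. The driver only mimics what a GROUP BY would do, building partial states per partition and then merging them.

```python
from dataclasses import dataclass


@dataclass
class MeanState:
    """Partial result for one group: running total and row count."""
    total: float = 0.0
    count: int = 0


class MeanAggregate:
    """Hypothetical aggregate UDF; the signatures here are assumptions."""

    def create_state(self) -> MeanState:
        # Start an empty state for a new group.
        return MeanState()

    def accumulate(self, state: MeanState, value) -> MeanState:
        # Fold one input value into the state, ignoring NULLs.
        if value is not None:
            state.total += value
            state.count += 1
        return state

    def merge(self, left: MeanState, right: MeanState) -> MeanState:
        # Combine two partial states computed on different partitions.
        return MeanState(left.total + right.total, left.count + right.count)

    def evaluate(self, state: MeanState):
        # Produce the final value; None when the group had no non-NULL rows.
        return state.total / state.count if state.count else None


def run_grouped(agg, partitions):
    """Simulate GROUP BY over several partitions of (key, value) rows."""
    merged = {}
    for rows in partitions:
        partial = {}
        for key, value in rows:
            state = partial.get(key)
            if state is None:
                state = agg.create_state()
            partial[key] = agg.accumulate(state, value)
        for key, state in partial.items():
            merged[key] = agg.merge(merged[key], state) if key in merged else state
    return {key: agg.evaluate(state) for key, state in merged.items()}


if __name__ == "__main__":
    parts = [[("a", 1.0), ("b", 4.0)], [("a", 3.0), ("b", None)]]
    print(run_grouped(MeanAggregate(), parts))
```

The reason merge exists at all is that partial states may be built independently and combined later, which is why the state carries a total and a count rather than a running mean.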
See the functions guide.
bundlebase init-sdk
Writing the IPC boilerplate for a new connector or function is tedious, so there's now a scaffolding command. It generates a working project with example code, build files, and a README:
That gives you:
The generated connector is runnable out of the box — pip install -e . and python connector.py gives you a working IPC server you can point import_connector at.
Supported languages: Python, Go, Java, and Rust. Each generates idiomatic build files (pyproject.toml, go.mod, pom.xml, Cargo.toml).
The --type flag controls what gets scaffolded:
- --type connector: connector only
- --type function: function provider with example scalar and aggregate UDFs
- --type both: connector + functions in one project
# Rust function provider
bundlebase init-sdk rust my_functions --type function
# Go project with both
bundlebase init-sdk go my_project --type both
Managing Functions and Connectors
Imported functions and connectors can be renamed without re-importing them:
bundle.rename_function("acme.double_val", "acme.double_val_v2")
bundle.rename_connector("acme.weather", "acme.weather_v2")
Or via SQL:
RENAME FUNCTION acme.double_val TO acme.double_val_v2
RENAME CONNECTOR acme.weather TO acme.weather_v2
Renaming works for both committed and temporary functions and connectors.
You can remove connectors and sources you no longer need:
Also available as bundle.drop_connector() in Python.
External Code Security
Running custom connectors and functions means executing external code, so there's a new allow_external_code config setting that defaults to false. You need to opt in:
Column Operations
Three new operations for wrangling messy columns:
- standardize_column_names(): normalizes column names to lowercase, underscore-separated identifiers. "Customer Id" becomes customer_id, "Phone 1" becomes phone_1.
- add_column(name, expr): creates a computed column from a SQL expression, which can also call your custom functions.
- cast_column(name, type): changes a column's data type, with optional regex cleaning to strip junk before conversion. Type names are now case-insensitive, and common aliases work (int for Int32, etc.).
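If it helps to picture the name normalization and the "clean, then cast" idea, here's a plain-Python approximation. Both helpers are hypothetical and the exact rules are my assumptions (for instance, how runs of punctuation or unparseable values are treated), so the real operations may behave differently on edge cases.

```python
import re


def standardize_name(name: str) -> str:
    """Approximate the rule: lowercase, underscore-separated identifier."""
    # Collapse each run of non-alphanumeric characters into one underscore.
    collapsed = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    # Drop any leading or trailing underscore, then lowercase.
    return collapsed.strip("_").lower()


def clean_and_cast_int(raw: str, pattern: str = r"[^0-9-]"):
    """Strip characters matching `pattern`, then try to convert to int."""
    cleaned = re.sub(pattern, "", raw)
    try:
        return int(cleaned)
    except ValueError:
        # Nothing parseable was left after cleaning (assumed to map to NULL).
        return None


if __name__ == "__main__":
    for col in ["Customer Id", "Phone 1"]:
        print(col, "->", standardize_name(col))
    for value in ["$1,200", " 42 units", "n/a"]:
        print(repr(value), "->", clean_and_cast_int(value))
```

The two names from the list above come out as customer_id and phone_1 under this rule.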
Give the connector and function system a try — I'm curious how people end up using it. Let me know if you run into issues.