Bundlebase 0.7.0

This release adds custom connectors, user-defined functions, and column operations. The big theme: you can now extend Bundlebase with your own code.

New Features

Custom Connectors

You can now write your own data connectors in Python, Go, Java, Rust, or anything that runs in Docker. Connectors define how to connect to a data source — you import a connector, create source instances from it, then fetch data.

# Import a connector (persisted into the bundle)
bundle.import_connector('acme.weather', 'ipc::./my_connector')

# Create a source instance from it
bundle.create_source('acme.weather', {'region': 'us-east'})

# Fetch the data
bundle.fetch("base", "add")

Multiple runtimes are available depending on your needs:

python — in-process, zero-copy Arrow transfer (use import_temp_connector since Python code can't be serialized into the bundle)
ffi — load a compiled shared library, also zero-copy
ipc — run any executable as a subprocess
java — run a JAR file
docker — run a container image

SDKs for Python, Go, Java, and Rust handle the IPC protocol for you. Here's a complete Python connector:

from bundlebase_sdk import Connector, Location, serve
import pyarrow as pa

class WeatherConnector(Connector):
    def discover(self, attached_locations, **kwargs):
        return [Location("forecast.parquet", format="parquet", version="v1")]

    def data(self, location, **kwargs):
        return pa.table({"city": ["NYC", "LA"], "temp_f": [45, 72]})

if __name__ == "__main__":
    serve(WeatherConnector())

To clarify terminology: connectors are the new concept here — they define how to connect to data. Sources are instances created from connectors (or from built-in connectors like remote_dir and kaggle, which have always been there). This isn't a rename.
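
The connector/source distinction can be pictured with a toy model in plain Python. Everything here (ToyBundle, ConnectorDef, SourceInstance) is a hypothetical stand-in written for illustration; it only mirrors the shape of the import_connector and create_source calls shown earlier and says nothing about how Bundlebase stores them internally.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectorDef:
    """Toy stand-in for an imported connector: a name plus how to run it."""
    name: str
    spec: str


@dataclass
class SourceInstance:
    """Toy stand-in for a source: one connector plus concrete arguments."""
    connector: ConnectorDef
    args: dict


@dataclass
class ToyBundle:
    """Hypothetical container, not the real Bundlebase bundle object."""
    connectors: dict = field(default_factory=dict)
    sources: list = field(default_factory=list)

    def import_connector(self, name, spec):
        # Register the definition once under its name.
        self.connectors[name] = ConnectorDef(name, spec)

    def create_source(self, name, args):
        # A source can only be created from a connector that is already known.
        source = SourceInstance(self.connectors[name], dict(args))
        self.sources.append(source)
        return source


bundle = ToyBundle()
bundle.import_connector("acme.weather", "ipc::./my_connector")
east = bundle.create_source("acme.weather", {"region": "us-east"})
west = bundle.create_source("acme.weather", {"region": "us-west"})
print(len(bundle.connectors), len(bundle.sources))
```

One connector definition backs two sources here, which is the relationship the terminology describes: the connector says how to reach the data, and each source pins it to specific arguments.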

See the custom connectors guide for the full details.

User-Defined Functions

Extend Bundlebase's SQL with your own scalar and aggregate functions. Same runtime options as connectors — python, ffi, ipc, java, docker.

# Import and use a function — types auto-detected from bundlebase_metadata()
bundle.import_temp_function("tools.double_val", "ipc::python:my_functions.py")
bundle.query("SELECT tools.double_val(amount) FROM bundle")

Or via SQL:

-- Import all functions from a module, types auto-detected
IMPORT TEMP FUNCTION tools.* FROM 'ipc::python:my_functions.py'

Aggregate UDFs work too — implement create_state, accumulate, merge, and evaluate, and you can use them with GROUP BY like any built-in aggregate.
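
To make the four-method contract concrete, here is a minimal sketch of a mean aggregate in plain Python. The method names come from the paragraph above; the signatures, the MeanAggregate class, and the run_grouped driver are assumptions made for illustration, not the SDK's actual interface. The driver imitates what an engine might do for GROUP BY: accumulate each partition on its own, merge the partial states, then evaluate.

```python
from collections import defaultdict


class MeanAggregate:
    """Hypothetical aggregate: arithmetic mean of a numeric column."""

    def create_state(self):
        # State is a running sum and a count of non-NULL values.
        return {"sum": 0.0, "count": 0}

    def accumulate(self, state, value):
        # Fold one input value into the state; NULLs (None) are skipped.
        if value is not None:
            state["sum"] += value
            state["count"] += 1
        return state

    def merge(self, left, right):
        # Combine two partial states computed on different partitions.
        return {
            "sum": left["sum"] + right["sum"],
            "count": left["count"] + right["count"],
        }

    def evaluate(self, state):
        # Produce the final value; a group with no values yields None.
        if state["count"] == 0:
            return None
        return state["sum"] / state["count"]


def run_grouped(agg, partitions):
    """Toy driver: `partitions` is a list of lists of (key, value) rows."""
    merged = {}
    for rows in partitions:
        partial = defaultdict(agg.create_state)
        for key, value in rows:
            partial[key] = agg.accumulate(partial[key], value)
        for key, state in partial.items():
            merged[key] = agg.merge(merged[key], state) if key in merged else state
    return {key: agg.evaluate(state) for key, state in merged.items()}


partitions = [
    [("NYC", 40), ("LA", 70)],
    [("NYC", 50), ("LA", None)],
]
result = run_grouped(MeanAggregate(), partitions)
print(result)
```

Keeping the sum and count in the state, rather than a running mean, is what makes merge a simple addition of partial results.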

See the functions guide.

`bundlebase init-sdk`

Writing the IPC boilerplate for a new connector or function is tedious, so there's now a scaffolding command. It generates a working project with example code, build files, and a README:

bundlebase init-sdk python my_weather --type connector

That gives you:

my_weather/
├── pyproject.toml
├── connector.py      # Working example connector
└── README.md

The generated connector is runnable out of the box: run pip install -e . and then python connector.py to get a working IPC server that you can point import_connector at.

Supported languages: Python, Go, Java, and Rust. Each generates idiomatic build files (pyproject.toml, go.mod, pom.xml, Cargo.toml).

The --type flag controls what gets scaffolded:

--type connector — connector only
--type function — function provider with example scalar and aggregate UDFs
--type both — connector + functions in one project

# Rust function provider
bundlebase init-sdk rust my_functions --type function

# Go project with both
bundlebase init-sdk go my_project --type both

Managing Functions and Connectors

Imported functions and connectors can be renamed without re-importing them:

bundle.rename_function("acme.double_val", "acme.double_val_v2")
bundle.rename_connector("acme.weather", "acme.weather_v2")

Or via SQL:

RENAME FUNCTION acme.double_val TO acme.double_val_v2
RENAME CONNECTOR acme.weather TO acme.weather_v2

Renaming works for both committed and temporary functions and connectors.

You can remove connectors and sources you no longer need:

DROP CONNECTOR acme.weather
DROP SOURCE acme.weather

Also available as bundle.drop_connector() in Python.

External Code Security

Running custom connectors and functions means executing external code, so there's a new allow_external_code config setting that defaults to false. You need to opt in:

config = {"system": {"allow_external_code": "true"}}
bundle = bb.create("my/data", config=config)
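
If your own tooling wraps Bundlebase, it can check the same opt-in before it tries to load anything. The sketch below is a hypothetical pre-flight helper, not Bundlebase's implementation: external_code_allowed and import_connector_guarded are made-up names, and the only details taken from the notes above are the system.allow_external_code key, its string value "true", and the default of false.

```python
def external_code_allowed(config):
    """Return True only when the config explicitly opts in."""
    system = (config or {}).get("system", {})
    # A missing key falls back to "false", mirroring the documented default.
    value = str(system.get("allow_external_code", "false"))
    return value.strip().lower() == "true"


def import_connector_guarded(config, name, spec):
    """Hypothetical wrapper: refuse to load external code unless enabled."""
    if not external_code_allowed(config):
        raise PermissionError(
            f"cannot import {name!r}: set system.allow_external_code to 'true'"
        )
    # Placeholder for the real import call; this sketch only returns its inputs.
    return (name, spec)


print(external_code_allowed({}))
print(external_code_allowed({"system": {"allow_external_code": "true"}}))
```

The point of the default-deny shape is that forgetting the setting fails closed, with an error that names the key to change.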

Column Operations

Three new operations for wrangling messy columns:

standardize_column_names() — normalizes column names to lowercase, underscore-separated identifiers. "Customer Id" becomes customer_id, "Phone 1" becomes phone_1.
```
bundle.standardize_column_names()
```
add_column(name, expr) — creates a computed column from a SQL expression, including expressions that call your custom functions.
```
bundle.add_column("full_name", "first_name || ' ' || last_name")
```
cast_column(name, type) — changes a column's data type, with optional regex cleaning to strip junk before conversion. Type names are now case-insensitive, and common aliases work (int for Int32, etc.).
```
bundle.cast_column("price", "integer", clean="[^0-9]")
```
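
For a feel of what the name normalization and regex cleaning do to individual values, here is a rough approximation in plain Python. Both helpers (standardize_name, clean_and_cast_int) are hypothetical and written only to reproduce the examples above; how Bundlebase treats edge cases such as camelCase names, empty strings, or values that fail to convert is not covered here.

```python
import re


def standardize_name(name):
    """Lowercase a column name and join its words with underscores."""
    # Replace every run of non-alphanumeric characters with one underscore.
    collapsed = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    return collapsed.strip("_").lower()


def clean_and_cast_int(value, clean):
    """Remove everything matching the `clean` regex, then convert to int."""
    stripped = re.sub(clean, "", value)
    # In this sketch, nothing left after cleaning is treated as missing.
    return int(stripped) if stripped else None


print(standardize_name("Customer Id"))
print(standardize_name("Phone 1"))
print(clean_and_cast_int("$1,250", "[^0-9]"))
```

Note that a pattern like "[^0-9]" also removes decimal points and minus signs, so in this sketch "12.50" would turn into 1250. Pick the cleaning pattern with the target type in mind.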

pip install bundlebase==0.7.0

Give the connector and function system a try — I'm curious how people end up using it. Let me know if you run into issues.