Python Quick Start

Choose your API style

Bundlebase has two Python API styles:

  • Sync (bundlebase.sync) — for scripts and Jupyter notebooks. No await needed.
  • Async (bundlebase) — for concurrent operations and production code.

Sync:

import bundlebase.sync as bb

c = bb.create("s3://mybucket/path")
c.attach("data.parquet")
df = c.to_pandas()

Async:

import bundlebase as bb

c = await bb.create("s3://mybucket/path")
await c.attach("data.parquet")
df = await c.to_pandas()
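
The async snippets on this page use a bare await, which works in a notebook or an asyncio REPL but not at the top level of a plain .py script. One common way to run them from a script is to wrap the calls in a coroutine and hand it to asyncio.run(). The sketch below shows only that wrapping pattern: fake_create and fake_attach are hypothetical stand-ins, not Bundlebase functions, so the example runs without any storage or credentials.

```python
import asyncio


# Hypothetical stand-ins for awaitable library calls such as
# `await bb.create(...)`. They only simulate I/O so the sketch runs anywhere.
async def fake_create(path):
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {"path": path, "attached": []}


async def fake_attach(bundle, source):
    await asyncio.sleep(0)
    bundle["attached"].append(source)
    return bundle


async def main():
    # Inside a coroutine, `await` is legal; at module top level it is not.
    bundle = await fake_create("s3://mybucket/path")
    await fake_attach(bundle, "data.parquet")
    return bundle


if __name__ == "__main__":
    # asyncio.run() starts an event loop, runs main() to completion,
    # then closes the loop.
    print(asyncio.run(main()))
```

In a real script, the body of main() would hold the awaited Bundlebase calls shown in the Async example above.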

Create a bundle

The path can be a local filepath or a remote URL (S3, Azure, GCS):

Sync:

import bundlebase.sync as bb

c = bb.create("s3://mybucket/sales-q1")

Async:

import bundlebase as bb

c = await bb.create("s3://mybucket/sales-q1")

Attach data

Parquet, CSV, and JSON are all supported. Attaching multiple files unions them together, even across formats. Paths can be relative to the bundle or absolute URLs.

Sync:

c.attach("local_data.parquet")
c.attach("s3://other_bucket/more_data.csv")
c.attach("https://example.com/additional.json")

Async:

await c.attach("local_data.parquet")
await c.attach("s3://other_bucket/more_data.csv")
await c.attach("https://example.com/additional.json")

Note

CSV columns are imported as text. Use cast_column() to convert to integer, float, etc. See Column Types for details.
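
To see why the cast matters, here is a small illustration using Python's standard csv module rather than Bundlebase itself. It shows the general behavior of text-typed columns: numeric-looking values compare and sort as strings until they are converted. The sample data is made up, and the arguments that cast_column() takes are not shown here; see Column Types for those.

```python
import csv
import io

# Made-up sample data; any CSV reader hands back text unless told otherwise.
raw = "name,age\nAda,36\nLin,9\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Every value arrives as a string, including the numeric-looking ones.
first_age = rows[0]["age"]
print(type(first_age).__name__, repr(first_age))

# Sorting text is lexicographic, so "36" sorts before "9".
text_sorted = sorted(r["age"] for r in rows)

# After converting to integers, the order is numeric.
int_sorted = sorted(int(r["age"]) for r in rows)

print(text_sorted)
print(int_sorted)
```

The same pitfall applies to filters such as age >= 18: on a text column the comparison may fail or behave lexicographically, so casting before filtering is the safer order.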

Transform

Sync:

c.filter("age >= 18")
c.drop_column("ssn")
c.rename_column("fname", "first_name")

Async:

await c.filter("age >= 18")
await c.drop_column("ssn")
await c.rename_column("fname", "first_name")

Commit

Sync:

c.commit("Initial commit")

# Anyone with the path can open it
c = bb.open("s3://mybucket/sales-q1")

Async:

await c.commit("Initial commit")

c = await bb.open("s3://mybucket/sales-q1")

Query with SQL

Queries support the full Apache DataFusion SQL syntax. The bundle's data is available as the table named bundle:

Sync:

rs = c.query("SELECT * FROM bundle WHERE revenue > 100")
df = rs.to_polars()

Async:

rs = await c.query("SELECT * FROM bundle WHERE revenue > 100")
df = await rs.to_polars()
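
Because the query string is ordinary SQL, aggregations and ordering can go in the same place as the simple filter above. The sketch below runs such a query against an in-memory SQLite table named bundle, purely so the example is runnable without Bundlebase. SQLite is a stand-in, not the engine Bundlebase uses, and its dialect differs from DataFusion's in places. The region column and the sample rows are assumptions for illustration.

```python
import sqlite3

# In-memory stand-in table named `bundle`; columns and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bundle (region TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO bundle VALUES (?, ?)",
    [("east", 250.0), ("west", 80.0), ("east", 120.0)],
)

# Filter first, then aggregate per region and sort by the total.
query = """
    SELECT region, COUNT(*) AS orders, SUM(revenue) AS total
    FROM bundle
    WHERE revenue > 100
    GROUP BY region
    ORDER BY total DESC
"""
result = conn.execute(query).fetchall()
print(result)  # [('east', 2, 370.0)]
conn.close()
```

A query of this shape would be passed as the string argument to c.query() in place of the SELECT * example, assuming the bundle has matching columns.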

Export

Sync:

df = c.to_pandas()
df = c.to_polars()
arrays = c.to_numpy()

Async:

df = await c.to_pandas()
df = await c.to_polars()
arrays = await c.to_numpy()

Method chaining

All mutation methods return self, so calls can be chained. In the async API, a single await covers the whole chain:

Sync:

import bundlebase.sync as bb

c = (bb.create("s3://mybucket/sales-q1")
    .attach("data.parquet")
    .filter("active = true")
    .drop_column("temp_field")
    .commit("Initial commit"))

Async:

import bundlebase as bb

c = await (bb.create("s3://mybucket/sales-q1")
    .attach("data.parquet")
    .filter("active = true")
    .drop_column("temp_field")
    .commit("Initial commit"))
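
For readers new to this style, the following toy class sketches how returning self makes chaining possible. It is not Bundlebase's implementation: Pipeline is a hypothetical class that only records the steps applied to it, with method names borrowed from the examples above.

```python
class Pipeline:
    """Toy stand-in for a bundle; it only records the steps applied to it."""

    def __init__(self, path):
        self.path = path
        self.steps = []

    def attach(self, source):
        self.steps.append(("attach", source))
        return self  # returning self is what lets the next call chain on

    def filter(self, predicate):
        self.steps.append(("filter", predicate))
        return self

    def commit(self, message):
        self.steps.append(("commit", message))
        return self


start = Pipeline("s3://mybucket/sales-q1")
chained = (start
    .attach("data.parquet")
    .filter("active = true")
    .commit("Initial commit"))

# The chain hands back the same object it started from.
print(chained is start)
print(chained.steps)
```

Because each method mutates the object and returns it, the chained form and the one-call-per-line form shown in earlier sections are equivalent in this sketch.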

Next steps