Python API Reference

Bundlebase provides both an async and a sync Python API. The examples below show the same operations in each style.

API Styles

Choosing an API Style

Use the Async API when:

  • Building production applications
  • Running concurrent operations
  • Working with other async libraries
  • Needing fine-grained control over async execution

import bundlebase

c = await (bundlebase.create()
    .attach("data.parquet")
    .filter("active = true"))

df = await c.to_pandas()

Use the Sync API when:

  • Writing simple scripts
  • Working in Jupyter notebooks
  • Preferring a synchronous code style
  • Not needing concurrent operations

import bundlebase.sync as dc

c = (dc.create()
    .attach("data.parquet")
    .filter("active = true"))

df = c.to_pandas()
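
The "running concurrent operations" case for the async API can be sketched with `asyncio.gather`. This is a minimal illustration, not Bundlebase code: `load_active_rows` is a hypothetical stand-in coroutine that only simulates I/O, and in a real program its body would be an async Bundlebase chain like the one shown earlier.

```python
import asyncio


async def load_active_rows(path: str) -> str:
    # Hypothetical stand-in for an async pipeline (create/attach/filter/to_pandas).
    # It only sleeps briefly to simulate I/O and returns a label.
    await asyncio.sleep(0.01)
    return f"rows from {path}"


async def load_all(paths: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and returns their results
    # in the same order as the inputs.
    return await asyncio.gather(*(load_active_rows(p) for p in paths))


results = asyncio.run(load_all(["a.parquet", "b.parquet", "c.parquet"]))
print(results)
```

With the sync style, the same three loads would run one after another, which is usually fine for scripts and notebooks.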

Common Patterns

Method Chaining

All mutation methods return self for fluent chaining:

# Async
c = await (bundlebase.create()
    .attach("data.parquet")
    .filter("age >= 18")
    .drop_column("ssn")
    .rename_column("fname", "first_name"))

# Sync (dc is bundlebase.sync, as in `import bundlebase.sync as dc`)
c = (dc.create()
    .attach("data.parquet")
    .filter("age >= 18")
    .drop_column("ssn")
    .rename_column("fname", "first_name"))
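
To show how a chain whose methods return self can still be awaited as a whole, here is a toy sketch of the general pattern. It is an assumption about how such an API could be built, not Bundlebase's actual implementation; `LazyPipeline` and its methods are hypothetical names.

```python
import asyncio


class LazyPipeline:
    """Toy builder: records steps and only 'runs' them when awaited."""

    def __init__(self) -> None:
        self.steps: list[str] = []

    def attach(self, path: str) -> "LazyPipeline":
        self.steps.append(f"attach({path})")
        return self  # returning self is what enables chaining

    def filter(self, expr: str) -> "LazyPipeline":
        self.steps.append(f"filter({expr})")
        return self

    async def _run(self) -> "LazyPipeline":
        # A real implementation would execute the queued steps here.
        await asyncio.sleep(0)
        return self

    def __await__(self):
        # Makes the builder itself awaitable, so `await (chain)` works.
        return self._run().__await__()


async def main() -> list[str]:
    p = await (LazyPipeline()
        .attach("data.parquet")
        .filter("age >= 18"))
    return p.steps


steps = asyncio.run(main())
print(steps)
```

The chained calls themselves are plain synchronous method calls. Only the final `await` hands control to the event loop.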

Error Handling

Handle errors with standard Python try/except:

try:
    c = await bundlebase.create()
    c = await c.attach("nonexistent.parquet")
except ValueError as e:
    print(f"Failed to load: {e}")
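
The same try/except pattern extends to trying several candidate files in turn. The sketch below assumes, as in the example above, that a failed attach raises `ValueError`. `attach` here is a hypothetical stand-in, not the Bundlebase method, and `attach_first_available` is an illustrative helper.

```python
import asyncio


async def attach(path: str) -> str:
    # Hypothetical stand-in for an attach call: only "backup.parquet" exists.
    await asyncio.sleep(0)
    if path != "backup.parquet":
        raise ValueError(f"cannot read {path}")
    return path


async def attach_first_available(paths: list[str]) -> tuple[str, list[str]]:
    # Try each path in order; collect error messages from the failures.
    errors: list[str] = []
    for path in paths:
        try:
            return await attach(path), errors
        except ValueError as e:
            errors.append(str(e))
    # Nothing worked: surface every failure in one exception.
    raise ValueError("; ".join(errors))


loaded, errors = asyncio.run(
    attach_first_available(["nonexistent.parquet", "backup.parquet"])
)
print(loaded, errors)
```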

Streaming Large Datasets

Use streaming for datasets larger than RAM:

import bundlebase

c = await bundlebase.open("huge_dataset.parquet")

async for batch in bundlebase.stream_batches(c):
    # Each batch is roughly 100 MB; process() is a placeholder for your own handler
    process(batch)
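
Streaming keeps only one batch in memory at a time, so any result has to be accumulated incrementally. The sketch below shows that pattern with a hypothetical stand-in generator, `fake_batches`, which yields small lists of integers rather than real record batches. It does not call Bundlebase.

```python
import asyncio
from typing import AsyncIterator


async def fake_batches(n_batches: int, batch_size: int) -> AsyncIterator[list[int]]:
    # Hypothetical stand-in for a batch stream: yields consecutive integers
    # in chunks of batch_size.
    for i in range(n_batches):
        await asyncio.sleep(0)
        yield list(range(i * batch_size, (i + 1) * batch_size))


async def streaming_sum(batches: AsyncIterator[list[int]]) -> tuple[int, int]:
    # Accumulate a running total and row count; each batch can be
    # garbage-collected once the loop moves on to the next one.
    total = 0
    count = 0
    async for batch in batches:
        total += sum(batch)
        count += len(batch)
    return total, count


total, count = asyncio.run(streaming_sum(fake_batches(4, 3)))
print(total, count)
```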

Type Hints

Bundlebase includes comprehensive type hints for IDE support:

import bundlebase
from bundlebase import PyBundle, PyBundleBuilder
import pandas as pd

async def process_data(path: str) -> pd.DataFrame:
    """Type-checked function using bundlebase."""
    c: PyBundleBuilder = await bundlebase.create()
    c = await c.attach(path)
    c = await c.filter("active = true")
    df: pd.DataFrame = await c.to_pandas()
    return df

Next Steps