Python API Reference

Bundlebase provides both async and sync Python APIs for maximum flexibility.

API Styles

Async API - Modern async/await interface for concurrent operations
Sync API - Synchronous interface for scripts and notebooks
Conversion - Export to pandas, polars, numpy, dict
Progress Tracking - Monitor long-running operations
Operation Chains - Fluent method chaining

Creating Bundles

create() - Create a new bundle
open() - Open an existing bundle

Core Classes

PyBundle - Read-only bundle
PyBundleBuilder - Mutable bundle
PyBundleStatus - Status information
PyChange - Change tracking

Utilities

stream_batches() - Stream data efficiently
set_rust_log_level() - Configure Rust logging

Choosing an API Style

Use the Async API when:

Building production applications
Running concurrent operations
Working with other async libraries
Needing fine-grained control over async execution

import bundlebase

c = await (bundlebase.create()
    .attach("data.parquet")
    .filter("active = true"))

df = await c.to_pandas()
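
The snippet above relies on top-level `await`, which Python accepts only in environments such as Jupyter or the `python -m asyncio` REPL. In a plain script the calls need to live in a coroutine driven by `asyncio.run()`, which is also where running several loads concurrently becomes possible. The sketch below is one way to arrange that, assuming only the `create()`, `attach()`, `filter()` and `to_pandas()` calls shown above. `load_active` and `load_many` are hypothetical helper names, and the `create` factory is passed in as a parameter so the helpers depend on nothing beyond those calls. Whether the loads actually overlap depends on how Bundlebase schedules work internally, which is not described here.

```python
import asyncio


async def load_active(create, path):
    """Build the chain from the example above for one file and return its DataFrame."""
    c = await (create()
        .attach(path)
        .filter("active = true"))
    return await c.to_pandas()


async def load_many(create, paths):
    """Load several files concurrently; results come back in the order of `paths`."""
    return await asyncio.gather(*(load_active(create, p) for p in paths))


# In a script (assumes bundlebase is installed and the files exist):
#
#     import bundlebase
#     frames = asyncio.run(load_many(bundlebase.create, ["a.parquet", "b.parquet"]))
```

`asyncio.gather()` returns a list, so `frames[0]` corresponds to the first path regardless of which load finishes first.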

Use the Sync API when:

Writing simple scripts
Working in Jupyter notebooks
Preferring a synchronous code style
Not needing concurrent operations

import bundlebase.sync as dc

c = (dc.create()
    .attach("data.parquet")
    .filter("active = true"))

df = c.to_pandas()

Common Patterns

Method Chaining

All mutation methods return self for fluent chaining:

import bundlebase
import bundlebase.sync as dc

# Async
c = await (bundlebase.create()
    .attach("data.parquet")
    .filter("age >= 18")
    .drop_column("ssn")
    .rename_column("fname", "first_name"))

# Sync
c = (dc.create()
    .attach("data.parquet")
    .filter("age >= 18")
    .drop_column("ssn")
    .rename_column("fname", "first_name"))
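
How a chain of plain method calls can end in a single `await` is not spelled out here. One common way to get that shape in Python is for each method to queue an operation and return `self`, with `__await__` running the queue. The sketch below illustrates that pattern only; it is not Bundlebase's actual implementation. `ChainSketch` is a hypothetical class, and its `filter()` takes a Python callable where the real API takes an expression string.

```python
import asyncio


class ChainSketch:
    """Toy fluent builder: methods queue work and return self; awaiting runs the queue."""

    def __init__(self, rows):
        self.rows = list(rows)
        self._pending = []  # queued (function, args) pairs, run in order on await

    def filter(self, predicate):
        self._pending.append((self._do_filter, (predicate,)))
        return self  # returning self is what makes chaining possible

    def drop_column(self, name):
        self._pending.append((self._do_drop, (name,)))
        return self

    async def _do_filter(self, predicate):
        self.rows = [r for r in self.rows if predicate(r)]

    async def _do_drop(self, name):
        self.rows = [{k: v for k, v in r.items() if k != name} for r in self.rows]

    async def _run(self):
        # Take the queue, clear it, then apply each operation in order.
        pending, self._pending = self._pending, []
        for func, args in pending:
            await func(*args)
        return self

    def __await__(self):
        # Lets callers write `await builder.filter(...).drop_column(...)`.
        return self._run().__await__()


async def demo():
    people = [{"age": 30, "ssn": "x"}, {"age": 12, "ssn": "y"}]
    c = await ChainSketch(people).filter(lambda r: r["age"] >= 18).drop_column("ssn")
    return c.rows


print(asyncio.run(demo()))  # [{'age': 30}]
```

A sync wrapper over the same design could run the queue eagerly at the end of the chain, which would match the way the sync example above needs no `await`.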

Error Handling

Handle errors with standard Python try/except:

try:
    c = await bundlebase.create()
    c = await c.attach("nonexistent.parquet")
except ValueError as e:
    print(f"Failed to load: {e}")
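
When several files are attached in one go, it can be useful to keep the ones that load and report the ones that do not, instead of aborting at the first failure. The following is a minimal sketch, assuming that `attach()` raises `ValueError` and returns the builder when awaited, as in the example above. `attach_all` is a hypothetical helper, not part of Bundlebase, and whether a failed `attach()` leaves the builder unchanged is also an assumption.

```python
async def attach_all(builder, paths):
    """Attach each path in turn; return the builder and a list of (path, message) failures."""
    failures = []
    for path in paths:
        try:
            builder = await builder.attach(path)
        except ValueError as e:  # exception type taken from the example above
            failures.append((path, str(e)))
    return builder, failures


# Possible usage inside a coroutine (untested against Bundlebase itself):
#
#     c = await bundlebase.create()
#     c, failed = await attach_all(c, ["jan.parquet", "feb.parquet"])
#     for path, message in failed:
#         print(f"Skipped {path}: {message}")
```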

Streaming Large Datasets

Use streaming for datasets larger than RAM:

import bundlebase

c = await bundlebase.open("huge_dataset.parquet")

async for batch in bundlebase.stream_batches(c):
    # Process batch (~100MB)
    process(batch)
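
Streaming keeps memory bounded only if the per-batch work does not hold on to the batches themselves. The sketch below folds each batch into a running total and then lets it go. It assumes each batch exposes a `num_rows` attribute, as a `pyarrow.RecordBatch` does; the batch type is not stated here. `count_rows` is a hypothetical helper that accepts any async iterator.

```python
async def count_rows(batches):
    """Sum row counts across an async stream without retaining any batch."""
    total = 0
    async for batch in batches:
        total += batch.num_rows  # assumption: batches are Arrow-style record batches
    return total


# With Bundlebase this would be called as (untested, based on the example above):
#
#     total = await count_rows(bundlebase.stream_batches(c))
```

The same shape works for any reduction that can be updated one batch at a time, such as a running sum or a set of distinct keys.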

Type Hints

Bundlebase includes comprehensive type hints for IDE support:

import bundlebase  # needed for the bundlebase.create() call below
from bundlebase import PyBundle, PyBundleBuilder
import pandas as pd

async def process_data(path: str) -> pd.DataFrame:
    """Type-checked function using bundlebase."""
    c: PyBundleBuilder = await bundlebase.create()
    c = await c.attach(path)
    c = await c.filter("active = true")
    df: pd.DataFrame = await c.to_pandas()
    return df

Next Steps

Async API Reference - Complete async API documentation
Sync API Reference - Complete sync API documentation
Examples - Practical code examples
Guides - Deep dives into advanced topics