Python API Reference¶
Bundlebase provides both async and sync Python APIs for maximum flexibility.
API Styles¶
- Async API - Modern async/await interface for concurrent operations
- Sync API - Synchronous interface for scripts and notebooks
- Conversion - Export to pandas, polars, numpy, dict
- Progress Tracking - Monitor long-running operations
- Operation Chains - Fluent method chaining
Quick Navigation¶
Creating Bundles¶
Core Classes¶
- PyBundle - Read-only bundle
- PyBundleBuilder - Mutable bundle
- PyBundleStatus - Status information
- PyChange - Change tracking
Utilities¶
- stream_batches() - Stream data efficiently
- set_rust_log_level() - Configure Rust logging
Choosing an API Style¶
Use the Async API when:¶
- Building production applications
- Running concurrent operations
- Working with other async libraries
- Needing fine-grained control over async execution
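import bundlebase

# Build the bundle in one awaited chain, then export it to pandas.
# Top-level await works in notebooks and async REPLs; in a script, run this inside an async function.
c = await (bundlebase.create()
           .attach("data.parquet")
           .filter("active = true"))
df = await c.to_pandas()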
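To make the "concurrent operations" case concrete, here is a minimal sketch using only the standard library. `load_row_count` is a hypothetical stand-in for any awaitable Bundlebase operation (it is not part of the library); the point is the shape of the code: wrap the work in an `async def`, fan it out with `asyncio.gather`, and start everything from a script with `asyncio.run`.

```python
import asyncio


async def load_row_count(path: str) -> int:
    """Hypothetical stand-in for an awaitable bundle operation."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return len(path)  # placeholder result so the sketch runs anywhere


async def main() -> list[int]:
    paths = ["a.parquet", "bb.parquet", "ccc.parquet"]
    # gather() runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*(load_row_count(p) for p in paths))


# asyncio.run() is the entry point for plain scripts, where top-level await is unavailable
counts = asyncio.run(main())
print(counts)
```

In a real application, the body of `load_row_count` would presumably be replaced by awaited Bundlebase calls such as the chain shown above.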
Use the Sync API when:¶
- Writing simple scripts
- Working in Jupyter notebooks
- Preferring a synchronous code style
- Not needing concurrent operations
import bundlebase.sync as dc

# Same chain as the async version, with no await needed
c = (dc.create()
     .attach("data.parquet")
     .filter("active = true"))
df = c.to_pandas()
Common Patterns¶
Method Chaining¶
All mutation methods return self for fluent chaining:
# Async
c = await (bundlebase.create()
           .attach("data.parquet")
           .filter("age >= 18")
           .drop_column("ssn")
           .rename_column("fname", "first_name"))

# Sync (dc is bundlebase.sync)
c = (dc.create()
     .attach("data.parquet")
     .filter("age >= 18")
     .drop_column("ssn")
     .rename_column("fname", "first_name"))
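The pattern behind these chains can be illustrated with a toy class. This is a sketch of the general technique only, not a description of Bundlebase's internals: `ChainSketch` and its recorded `ops` list are hypothetical. Each method returns `self`, which is what lets calls be strung together, and `__await__` shows one way a whole chain can be resolved by a single `await`.

```python
import asyncio


class ChainSketch:
    """Toy builder: records operations and returns self from every method."""

    def __init__(self) -> None:
        self.ops: list[tuple[str, str]] = []

    def attach(self, path: str) -> "ChainSketch":
        self.ops.append(("attach", path))
        return self  # returning self enables .attach(...).filter(...)

    def filter(self, expr: str) -> "ChainSketch":
        self.ops.append(("filter", expr))
        return self

    def drop_column(self, name: str) -> "ChainSketch":
        self.ops.append(("drop_column", name))
        return self

    def __await__(self):
        # Makes the object itself awaitable, so one await covers the whole chain
        async def _apply() -> "ChainSketch":
            await asyncio.sleep(0)  # placeholder for real asynchronous work
            return self

        return _apply().__await__()


async def main() -> list[tuple[str, str]]:
    c = await (ChainSketch()
               .attach("data.parquet")
               .filter("age >= 18")
               .drop_column("ssn"))
    return c.ops


ops = asyncio.run(main())
print(ops)
```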
Error Handling¶
Handle errors with standard Python try/except:
import bundlebase

try:
    c = await bundlebase.create()
    c = await c.attach("nonexistent.parquet")
except ValueError as e:
    print(f"Failed to load: {e}")
Streaming Large Datasets¶
Use streaming for datasets larger than RAM:
import bundlebase

c = await bundlebase.open("huge_dataset.parquet")
async for batch in bundlebase.stream_batches(c):
    # Process one batch at a time (~100MB each); process() is a placeholder for your own handler
    process(batch)
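The reason streaming keeps memory bounded is that each batch is folded into a small running state and then discarded. A minimal sketch of that idea follows; `fake_batches` is a hypothetical stand-in for a batch stream (plain Python lists instead of real record batches), so the example runs without any data files.

```python
import asyncio
from typing import AsyncIterator


async def fake_batches(total_rows: int, batch_size: int) -> AsyncIterator[list[int]]:
    """Hypothetical stand-in for a batch stream; yields lists of row values."""
    for start in range(0, total_rows, batch_size):
        await asyncio.sleep(0)  # yield control, as a real I/O-bound stream would
        yield list(range(start, min(start + batch_size, total_rows)))


async def running_mean(batches: AsyncIterator[list[int]]) -> float:
    """Compute a mean while holding only one batch in memory at a time."""
    count = 0
    total = 0
    async for batch in batches:
        count += len(batch)
        total += sum(batch)
    return total / count if count else 0.0


mean = asyncio.run(running_mean(fake_batches(total_rows=10, batch_size=4)))
print(mean)
```

The same accumulate-and-discard structure should apply to any aggregate that can be updated incrementally, such as counts, sums, minima, and maxima.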
Type Hints¶
Bundlebase includes comprehensive type hints for IDE support:
import bundlebase
from bundlebase import PyBundle, PyBundleBuilder
import pandas as pd

async def process_data(path: str) -> pd.DataFrame:
    """Type-checked function using bundlebase."""
    c: PyBundleBuilder = await bundlebase.create()
    c = await c.attach(path)
    c = await c.filter("active = true")
    df: pd.DataFrame = await c.to_pandas()
    return df
Next Steps¶
- Async API Reference - Complete async API documentation
- Sync API Reference - Complete sync API documentation
- Examples - Practical code examples
- Guides - Deep dives into advanced topics