Attaching Data

Data is added to the bundle via the .attach() method.

Basic Usage

import bundlebase as bb

bundle = await bb.create("my/data")
await bundle.attach("customers.csv")
import bundlebase.sync as bb

bundle = bb.create("my/data")
bundle.attach("customers.csv")
ATTACH 'customers.csv'

Attaching to a Join Pack

By default, data attaches to the base pack. Use the pack parameter to attach data to a joined pack instead.

await bundle.join("orders", on="customer_id = orders.id")
await bundle.attach("orders.parquet", pack="orders")
bundle.join("orders", on="customer_id = orders.id")
bundle.attach("orders.parquet", pack="orders")
ATTACH 'orders.parquet' TO orders

Path Resolution

The attach() method handles paths flexibly:

  • Paths can be any supported URL; the data is read from that location.
  • Relative paths resolve against the data_dir, but may not use .. to escape to a parent directory.
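To illustrate the second rule, a resolver along these lines would accept paths inside the data_dir and reject .. escapes. This is a hypothetical sketch, not bundlebase's actual implementation:

```python
from pathlib import Path

def resolve_data_path(data_dir: str, path: str) -> Path:
    """Resolve `path` relative to `data_dir`, rejecting parent-dir escapes.

    Illustrative sketch only -- not bundlebase's actual resolver.
    """
    base = Path(data_dir).resolve()
    candidate = (base / path).resolve()
    # A path that resolves outside data_dir (e.g. via "..") is rejected.
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes data_dir: {path}")
    return candidate
```

Absolute URLs would bypass this check entirely and be read directly.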

Attaching From Another Bundle

You can attach the query output of another committed bundle using a bundle:// URL. This reads the target bundle's full query output — including any filters, column operations, and joins that have been applied.

For filesystem bundles, use bundle:// followed by the path:

ATTACH 'bundle:///path/to/other/bundle'
await bundle.attach("bundle:///path/to/other/bundle")
bundle.attach("bundle:///path/to/other/bundle")

For remote bundles (S3, etc.), use the compound scheme bundle+<scheme>://:

ATTACH 'bundle+s3://bucket/path/to/bundle'
await bundle.attach("bundle+s3://bucket/path/to/bundle")
bundle.attach("bundle+s3://bucket/path/to/bundle")
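The compound scheme can be read as the bundle marker plus the underlying transport scheme. A hypothetical parse (not bundlebase's code) might decompose the URL like this:

```python
from urllib.parse import urlsplit

def split_bundle_url(url: str) -> tuple[str, str]:
    """Split a bundle URL into (transport_scheme, location).

    Illustrative only:
      'bundle+s3://bucket/path' -> ('s3', 'bucket/path')
      'bundle:///path'          -> ('file', '/path')
    """
    parts = urlsplit(url)
    if parts.scheme == "bundle":
        # Plain bundle:// means a local filesystem path.
        return "file", parts.path
    if parts.scheme.startswith("bundle+"):
        transport = parts.scheme.split("+", 1)[1]
        return transport, f"{parts.netloc}{parts.path}"
    raise ValueError(f"not a bundle URL: {url}")
```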

Note

The target bundle must be committed. The attached data reflects the target's full query output at read time.

Supported Formats

  • CSV (.csv)
  • TSV (.tsv) — tab-separated values
  • JSON Lines (.json, .jsonl)
  • Parquet (.parquet)

Note

Only JSON Lines format (one JSON object per line) can be directly attached. For arbitrary JSON files — including API responses with wrapper objects, nested structures, or JSON arrays — use a connector (CREATE SOURCE USING http, remote_dir, etc.) with json_record_path. The connector transforms and copies the data into the bundle as Parquet. See Sources: JSON Options.
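To make the distinction concrete, the sketch below reshapes a wrapped API response into JSON Lines. A connector's json_record_path option plays a similar role, selecting the array of records inside the wrapper; the function and sample payload here are hypothetical:

```python
import json

def to_json_lines(payload: str, record_path: str) -> str:
    """Convert a JSON document with a wrapper object into JSON Lines.

    `record_path` is a dotted path to the list of records, e.g. "data.items".
    Illustrative sketch of what a json_record_path-style option selects.
    """
    node = json.loads(payload)
    for key in record_path.split("."):
        node = node[key]
    # Emit one JSON object per line -- the shape attach() accepts directly.
    return "\n".join(json.dumps(rec) for rec in node)

api_response = '{"data": {"items": [{"id": 1}, {"id": 2}]}}'
print(to_json_lines(api_response, "data.items"))
```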

Column Types

CSV and TSV files are imported with all columns as text (Utf8). Because these are text-based formats, type inference from sampled rows is unreliable — a column that looks numeric in the first 100 rows might contain non-numeric values later. By defaulting to text, bundlebase avoids silent data corruption.
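The failure mode looks like this in plain Python: a column that samples as numeric can still contain non-numeric values further down, so inferring types from the first rows is unsafe (hypothetical data):

```python
import csv
import io

# Hypothetical CSV: the revenue column looks numeric until the last row.
raw = "revenue\n100\n250\nN/A\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# All values arrive as text. A naive int() cast based on sampling the
# first two rows would succeed there and fail only when "N/A" appears.
assert all(isinstance(r["revenue"], str) for r in rows)
assert rows[2]["revenue"] == "N/A"
```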

JSON files retain their native types (string, number, boolean) since the JSON format encodes types directly in the data.

Parquet files retain their native types since the schema is embedded in the file.

To convert text columns to specific types after attaching CSV data, use cast_column:

await bundle.attach("sales.csv")
await bundle.cast_column("revenue", "float")
await bundle.cast_column("quantity", "integer")
bundle.attach("sales.csv")
bundle.cast_column("revenue", "float")
bundle.cast_column("quantity", "integer")
ATTACH 'sales.csv'
CAST COLUMN revenue TO float
CAST COLUMN quantity TO integer

See Cast Column for more details.

Detaching Data

Use detach_block() to remove a previously attached block, identified by its location.

await bundle.detach_block("customers.csv")
bundle.detach_block("customers.csv")
DETACH 'customers.csv'

Replacing Data

Use replace_block() to swap the location a block's data is read from without changing the block's identity.

await bundle.replace_block("old_data.csv", "new_data.csv")
bundle.replace_block("old_data.csv", "new_data.csv")
REPLACE 'old_data.csv' WITH 'new_data.csv'