Persistent Rules
always_delete and always_update store cleanup rules in the bundle itself. Once defined, they run automatically on every future attach or fetch, no matter who runs the pipeline or which tool they use.
always_delete
Remove rows that should never be in the bundle, regardless of what the source sends.
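As a rough illustration of what a delete rule's clause does to incoming rows, the sketch below uses Python's built-in sqlite3 as a stand-in for the bundle. It is not bundlebase itself: the `events` table, its columns, and the sample rows are hypothetical, and it assumes the rule clauses follow ordinary SQL `WHERE` semantics.

```python
import sqlite3

# Stand-in for rows arriving from a source; table and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "click"), (2, "internal_test"), (None, "click"), (3, "purchase")],
)

# The same kind of clause one would pass to always_delete(...).
delete_rules = [
    "WHERE event_type = 'internal_test'",
    "WHERE user_id IS NULL",
]
for clause in delete_rules:
    conn.execute(f"DELETE FROM events {clause}")

remaining = conn.execute(
    "SELECT user_id, event_type FROM events ORDER BY user_id"
).fetchall()
print(remaining)  # [(1, 'click'), (3, 'purchase')]
```

Only the rows that match none of the delete clauses survive.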
With the rule defined, every future attach applies it automatically, so nobody has to remember to add the filter.
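The idea of a rule that travels with the bundle can be sketched with a small in-memory model. `ToyBundle` is hypothetical and is not how bundlebase is implemented. For brevity its `always_delete` takes a Python predicate rather than a SQL clause.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict
Predicate = Callable[[Row], bool]


@dataclass
class ToyBundle:
    """Hypothetical in-memory model; not the bundlebase implementation."""

    delete_rules: list[Predicate] = field(default_factory=list)
    rows: list[Row] = field(default_factory=list)

    def always_delete(self, predicate: Predicate) -> None:
        # The rule is stored on the bundle, not in the calling script.
        self.delete_rules.append(predicate)

    def attach(self, incoming: list[Row]) -> None:
        # Every attach runs the stored rules, so callers cannot forget them.
        for row in incoming:
            if not any(rule(row) for rule in self.delete_rules):
                self.rows.append(row)


bundle = ToyBundle()
bundle.always_delete(lambda r: r["event_type"] == "internal_test")

# Two separate attaches, neither of which mentions the filter.
bundle.attach([{"event_type": "click"}, {"event_type": "internal_test"}])
bundle.attach([{"event_type": "internal_test"}, {"event_type": "purchase"}])

kept = [r["event_type"] for r in bundle.rows]
print(kept)  # ['click', 'purchase']
```

The filter is defined once and then applied by `attach` itself, which is the property the persistent rules provide.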
always_update
Normalize or transform column values on every incoming record.
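To show what an update rule's clause does to each record, here is a minimal sketch that again uses sqlite3 as a stand-in. The table, columns, and rows are hypothetical, and the sketch assumes the clause behaves like the `SET` part of a standard SQL `UPDATE`.

```python
import sqlite3

# Stand-in for incoming records with inconsistent casing (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_type TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("Click", "us-east"), ("PURCHASE", "Eu-West")],
)

# The same kind of clause one would pass to always_update(...).
update_rules = [
    "SET event_type = LOWER(event_type)",
    "SET region = UPPER(region)",
]
for clause in update_rules:
    conn.execute(f"UPDATE events {clause}")

normalized = conn.execute(
    "SELECT event_type, region FROM events ORDER BY event_type"
).fetchall()
print(normalized)  # [('click', 'US-EAST'), ('purchase', 'EU-WEST')]
```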
Combining delete and update rules
Rules are applied in definition order. In the example below, that means the deletes run first, then the updates:
```python
import bundlebase.sync as bb

bundle = bb.create("s3://analytics/events")

# Remove records that should never be ingested
bundle.always_delete("WHERE event_type = 'internal_test'")
bundle.always_delete("WHERE user_id IS NULL")

# Normalize what remains
bundle.always_update("SET event_type = LOWER(event_type)")
bundle.always_update("SET region = UPPER(region)")
bundle.always_update("SET currency = UPPER(currency)")

bundle.commit("Initial setup with cleanup rules")
```
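Because rules run in order, a delete that matches on a column sees that column's value before any later update normalizes it. The sketch below illustrates this with sqlite3 as a stand-in. It assumes case-sensitive string comparison, which is SQLite's default, and the engine behind a bundle may behave differently. The `run` helper and the sample rows are hypothetical.

```python
import sqlite3


def run(statements):
    """Apply statements in order to a fresh hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_type TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?)", [("Click",), ("INTERNAL_TEST",)]
    )
    for statement in statements:
        conn.execute(statement)
    rows = conn.execute("SELECT event_type FROM events ORDER BY event_type")
    return [row[0] for row in rows]


delete = "DELETE FROM events WHERE event_type = 'internal_test'"
update = "UPDATE events SET event_type = LOWER(event_type)"

delete_first = run([delete, update])
update_first = run([update, delete])

print(delete_first)  # ['click', 'internal_test']
print(update_first)  # ['click']
```

With the delete first, the upper-case test record does not match the lower-case literal and survives, so a delete rule may need to account for un-normalized values.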
Viewing defined rules
Why this matters
Without persistent rules, every engineer who runs the pipeline has to remember to apply the same filters. When someone forgets, or writes a new pipeline from scratch, dirty data gets in. Persistent rules make the cleanup part of the bundle definition rather than the pipeline script.