Delta Lake Format in Automatic Exports
Chargebee supports data export in Delta Lake format, which provides reliable, ACID-compliant updates, efficient versioning, and seamless integration with modern data lakes and analytics platforms.
Advantages of Delta Lake Format
While JSON and Parquet are widely used formats, Delta Lake offers several practical advantages:
- Reliable updates: Unlike Parquet or JSON, Delta Lake supports updates and deletes natively. This ensures that exported data always reflects the latest state without manual intervention.
- Schema evolution: When your data structure changes (e.g., new fields), Delta Lake handles this smoothly, triggering a full export to ensure your analytics stay in sync.
- Audit and historical views: Delta Lake supports time travel, letting your team query past versions of your data for audit, reporting, or rollback needs.
- Easier maintenance: Vacuuming and optimization are managed automatically by Chargebee, eliminating the need for manual tuning.
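The time-travel behavior described above can be illustrated with a toy sketch (plain Python, not Delta Lake internals): each commit produces a new immutable table version, and readers can query either the latest state or any earlier version.

```python
# Toy illustration of the time-travel idea: every commit appends a new
# snapshot, and the version number is simply the snapshot's index.
history = []  # list of snapshots; index == version

def commit(snapshot):
    """Record a new table version and return its version number."""
    history.append(dict(snapshot))
    return len(history) - 1

def read(version=None):
    """Read the latest version, or a specific past version if given."""
    return history[-1] if version is None else history[version]

v0 = commit({"inv_1": 100})   # version 0
v1 = commit({"inv_1": 120})   # version 1 overwrites the amount

assert read() == {"inv_1": 120}            # latest state
assert read(version=v0) == {"inv_1": 100}  # audit the earlier state
```

In real Delta Lake usage, the equivalent is a `VERSION AS OF` / `TIMESTAMP AS OF` query against the table, subject to the retention windows listed below.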
Note
Delta Lake format is supported only in environments compatible with the Delta Lake open standard, such as Databricks Unity Catalog, AWS Glue, Hive Metastore, and Apache Spark.
Format Specifications
The following table lists the Delta Lake format specifications:
| Parameter | Value |
|---|---|
| Delta Version | 3.3.1 |
| Block Size | 128 MB |
| Log Retention | 72 hours (default) |
| File Retention | 72 hours (default) |
| Compression | Snappy (default) |
Metadata and Change Tracking
Each record in the Delta export includes key metadata fields to support ingestion, auditability, and Change Data Capture (CDC) pipelines:
| Field | Description |
|---|---|
| change_type | Indicates the type of change made to the record (for example, `read` for rows written as part of a full snapshot export). |
| record_extracted_at | Timestamp when the updated data became available in Chargebee’s data lakehouse. |
| record_exported_at | Timestamp when the data was exported to the configured destination. |
When the export configuration is modified (e.g., columns added), a full export is automatically triggered. All records in this batch will have `change_type = read` to denote a snapshot. The first sync for every table will also be a full export.
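A downstream consumer can use these metadata fields to keep a local copy in sync. The sketch below assumes a record key field named `id`; beyond the documented `read` snapshot value, the `update` and `delete` change types are illustrative assumptions for incremental batches, so adapt them to the values your exports actually contain.

```python
# Minimal sketch of applying a CDC batch to a local keyed table.
# "read" (full snapshot row) is documented; "update"/"delete" are
# assumed values shown only to illustrate the consumption pattern.
def apply_batch(table, batch):
    for rec in batch:
        ct = rec["change_type"]
        key = rec["id"]
        if ct in ("read", "update"):   # snapshot row or incremental change: upsert
            table[key] = rec
        elif ct == "delete":           # assumed tombstone: drop the record
            table.pop(key, None)
    return table

state = {}
# full snapshot batch
apply_batch(state, [{"id": "cus_1", "change_type": "read", "plan": "basic"}])
# later incremental batch
apply_batch(state, [
    {"id": "cus_1", "change_type": "update", "plan": "pro"},
    {"id": "cus_2", "change_type": "delete"},
])
assert state == {"cus_1": {"id": "cus_1", "change_type": "update", "plan": "pro"}}
```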
Creating Delta Tables from Exported Data
Chargebee writes Delta-formatted data directly to your configured cloud storage. You can register and query the exported data using Delta Lake-compatible tools.
Using AWS Glue / Hive Metastore / Databricks Unity Catalog
```sql
CREATE TABLE my_table
USING DELTA
LOCATION 's3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>';
```
Using Apache Spark without external catalog
```sql
SELECT * FROM delta.`s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>`;
```
Using Spark DataFrame API
```python
df = spark.read.format("delta").load("s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>")
df.show()
```
Export File Structure
Exported Delta Lake datasets follow the standard Delta Lake directory structure:
```
s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>/
├── _delta_log/
│   ├── 00000000000000000000.json
│   ├── 00000000000000000001.json
├── part-00000-...snappy.parquet
├── part-00001-...snappy.parquet
```
A corresponding schema file is available at:
```
s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/schema/<dataset name>.json
```
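Because commit files in `_delta_log/` are named by zero-padded version number, the current table version can be derived from a directory listing alone. A minimal sketch in plain Python, using a throwaway local directory to stand in for the S3 layout shown above:

```python
import os
import tempfile

def latest_delta_version(table_dir):
    """Return the highest commit version recorded in _delta_log/<NNNN>.json."""
    log_dir = os.path.join(table_dir, "_delta_log")
    versions = [
        int(name[:-5])                      # strip ".json", parse the version
        for name in os.listdir(log_dir)
        if name.endswith(".json") and name[:-5].isdigit()
    ]
    return max(versions)

# Demo: build a local directory mimicking the export layout.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "_delta_log"))
for v in (0, 1):
    open(os.path.join(root, "_delta_log", f"{v:020d}.json"), "w").close()

assert latest_delta_version(root) == 1
```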
Automated Maintenance
Chargebee handles routine Delta Lake maintenance automatically, including:
- Compaction: Periodically merges small files to optimize read performance.
- Vacuum: Cleans obsolete data and metadata in line with Delta Lake retention policies.