Delta Lake Format in Automatic Exports

Automatic Exports is in early access. Request access to enable this feature for your site.

Chargebee supports data export in Delta Lake format, which provides reliable, ACID-compliant updates, efficient versioning, and seamless integration with modern data lakes and analytics platforms.

Advantages of Delta Lake Format

While JSON and Parquet are widely used formats, Delta Lake offers several practical advantages for you:

  • Reliable updates: Unlike Parquet or JSON, Delta Lake supports updates and deletes natively. This ensures that exported data always reflects the latest state without manual intervention.
  • Schema evolution: When your data structure changes (e.g., new fields), Delta Lake handles this smoothly, triggering a full export to ensure your analytics stay in sync.
  • Audit and historical views: Delta Lake supports time travel, letting your team query past versions of your data for audit, reporting, or rollback needs (see the query sketch after this list).
  • Easier maintenance: Vacuuming and optimization are managed automatically by Chargebee, eliminating the need for manual tuning.
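
For example, here is a minimal PySpark time-travel sketch. It assumes a SparkSession named spark is available and that version 3 is a hypothetical earlier version of the exported dataset:

# Read an earlier version of the exported dataset for audit or rollback.
path = "s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>"
snapshot = spark.read.format("delta").option("versionAsOf", 3).load(path)
snapshot.show()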

Note:

Delta Lake format is supported only in environments compatible with the Delta Lake open standard, such as Databricks Unity Catalog, AWS Glue, Hive Metastore, and Apache Spark.

Format Specifications

The following table lists the Delta Lake format specifications:

Parameter          Value
Delta Version      3.3.1
Block Size         128 MB
Log Retention      72 hours (default)
File Retention     72 hours (default)
Compression        Snappy (default)

Metadata and Change Tracking

Each record in the Delta export includes key metadata fields to support ingestion, auditability, and Change Data Capture (CDC) pipelines:

change_type: Indicates the type of change:
  • read → Full snapshot
  • create → New record
  • update → Updated record
  • delete → Deleted record

record_extracted_at: Timestamp when the updated data became available in Chargebee’s data lakehouse.

record_exported_at: Timestamp when the data was exported to the configured destination.

When the export configuration is modified (e.g., columns added), a full export is automatically triggered. All records in this batch will have change_type = read to denote a snapshot. The first sync for every table will also be a full export.
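
As a sketch of how change_type can drive a downstream CDC pipeline, the following PySpark snippet merges an exported batch into a hypothetical target table. It assumes a SparkSession named spark, the delta-spark package, a key column named id, a registered table analytics.my_table, and at most one change per key in the batch; adjust these to your own schema and catalog.

from delta.tables import DeltaTable

# Read the exported Delta dataset written by Chargebee.
changes = spark.read.format("delta").load(
    "s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>"
)

# Hypothetical downstream table keyed on "id".
target = DeltaTable.forName(spark, "analytics.my_table")

(target.alias("t")
    .merge(changes.alias("c"), "t.id = c.id")
    .whenMatchedDelete(condition="c.change_type = 'delete'")
    .whenMatchedUpdateAll(condition="c.change_type IN ('read', 'update')")
    .whenNotMatchedInsertAll(condition="c.change_type != 'delete'")
    .execute())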

Creating Delta Tables from Exported Data

Chargebee writes Delta-formatted data directly to your configured cloud storage. You can register and query the exported data using Delta Lake-compatible tools.

Using AWS Glue / Hive Metastore / Databricks Unity Catalog

CREATE TABLE my_table
USING DELTA
LOCATION 's3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>';

Using Apache Spark without an external catalog

SELECT * FROM delta.`s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>`;

Using Spark DataFrame API

df = spark.read.format("delta").load("s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>")
df.show()

Export File Structure

Exported Delta Lake datasets follow the standard Delta Lake directory structure:

s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>/
├── _delta_log/
│   ├── 00000000000000000000.json
│   ├── 00000000000000000001.json
├── part-00000-...snappy.parquet
├── part-00001-...snappy.parquet

A corresponding schema file is available at:

s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/schema/<dataset name>.json
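
A minimal sketch for inspecting an exported dataset, assuming a SparkSession named spark; the exact layout of the schema JSON file is not described here, so it is simply read back as plain text:

data_path = "s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>"
schema_path = "s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/schema/<dataset name>.json"

# Print the schema as Spark reads it from the Delta table itself.
spark.read.format("delta").load(data_path).printSchema()

# Fetch the accompanying schema file as raw text for reference.
for row in spark.read.text(schema_path).collect():
    print(row.value)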

Automated Maintenance

Chargebee handles routine Delta Lake maintenance automatically, including:

  • Compaction: Periodically merges small files to optimize read performance.
  • Vacuum: Cleans obsolete data and metadata in line with Delta Lake retention policies.
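
A minimal sketch for checking which table versions are still available for time travel within the retention window (72 hours by default), assuming a SparkSession named spark and the delta-spark package:

from delta.tables import DeltaTable

dt = DeltaTable.forPath(
    spark, "s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>"
)

# List commit history; versions older than the retention window may already
# have been vacuumed by Chargebee.
dt.history().select("version", "timestamp", "operation").show(truncate=False)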
