Automatic Exports is in early access. Request access to enable this feature for your site.
Chargebee supports data export in Delta Lake format, which provides reliable, ACID-compliant updates, efficient versioning, and seamless integration with modern data lakes and analytics platforms.
While JSON and Parquet are widely used export formats, Delta Lake offers practical advantages over both: transactional (ACID) updates, built-in versioning, and native compatibility with lakehouse tooling.
Note: Delta Lake format is supported only in environments compatible with the Delta Lake open standard, such as Databricks Unity Catalog, AWS Glue, Hive Metastore, and Apache Spark.
The following table lists the Delta Lake format specifications:
| Parameter | Value |
| --- | --- |
| Delta Version | 3.3.1 |
| Block Size | 128 MB |
| Log Retention | 72 hours (default) |
| File Retention | 72 hours (default) |
| Compression | Snappy (default) |
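Because both log and file retention default to 72 hours, any consumer that reprocesses older table versions must do so within that window. The following is a minimal plain-Python sketch of checking whether a version is still within the retention window; the timestamps are illustrative, not real export data:

```python
from datetime import datetime, timedelta, timezone

# Default retention from the specification table above.
RETENTION = timedelta(hours=72)

def within_retention(version_timestamp: datetime, now: datetime) -> bool:
    """Return True if a version committed at version_timestamp is still
    expected to be readable under the 72-hour retention default."""
    return now - version_timestamp <= RETENTION

# Illustrative timestamps (assumptions, not real export data).
now = datetime(2024, 5, 4, 12, 0, tzinfo=timezone.utc)
recent = datetime(2024, 5, 3, 12, 0, tzinfo=timezone.utc)  # 24 hours old
stale = datetime(2024, 4, 30, 12, 0, tzinfo=timezone.utc)  # ~96 hours old

print(within_retention(recent, now))  # True
print(within_retention(stale, now))   # False
```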
Each record in the Delta export includes key metadata fields to support ingestion, auditability, and Change Data Capture (CDC) pipelines:
| Field | Description |
| --- | --- |
| change_type | Indicates the type of change; for example, `read` denotes a full-snapshot record. |
| record_extracted_at | Timestamp when the updated data became available in Chargebee’s data lakehouse. |
| record_exported_at | Timestamp when the data was exported to the configured destination. |
When the export configuration is modified (for example, when columns are added), a full export is automatically triggered. All records in that batch have `change_type = read` to denote a snapshot. The first sync for every table is also a full export.
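To illustrate how a downstream CDC consumer might use `change_type`, here is a minimal plain-Python sketch that separates full-snapshot rows from incremental changes. The sample records and the non-`read` change type are illustrative assumptions, not a documented list of values:

```python
# Sample exported records; field values here are illustrative assumptions.
records = [
    {"id": "cus_1", "change_type": "read"},    # full-snapshot row
    {"id": "cus_2", "change_type": "read"},    # full-snapshot row
    {"id": "cus_3", "change_type": "update"},  # assumed incremental change type
]

# change_type == "read" marks snapshot rows; anything else is treated
# here as an incremental change to apply on top of the snapshot.
snapshot = [r for r in records if r["change_type"] == "read"]
incremental = [r for r in records if r["change_type"] != "read"]

print(len(snapshot), len(incremental))  # 2 1
```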
Chargebee writes Delta-formatted data directly to your configured cloud storage. You can register and query the exported data using Delta Lake-compatible tools.
Register the exported data as a table (Spark SQL):

```sql
CREATE TABLE my_table
USING DELTA
LOCATION 's3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>';
```

Or query the path directly (note the backticks around the path):

```sql
SELECT * FROM delta.`s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>`;
```

Or load it with PySpark:

```python
df = spark.read.format("delta").load("s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>")
df.show()
```
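Since each example above uses the same path layout, a small helper can assemble it once. This is a sketch; the bucket, subfolder, domain, and dataset values are placeholders you supply from your own configuration:

```python
def delta_data_path(bucket: str, subfolder: str, domain: str, dataset: str) -> str:
    """Build the S3 location of an exported dataset, following the
    <bucket>/<subfolder>/<domain>/deltalake/data/<dataset> layout."""
    return f"s3://{bucket}/{subfolder}/{domain}/deltalake/data/{dataset}"

# Example with placeholder values (assumptions):
path = delta_data_path("my-bucket", "exports", "acme", "customers")
print(path)  # s3://my-bucket/exports/acme/deltalake/data/customers
```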
Exported Delta Lake datasets follow the standard Delta Lake directory structure:
```
s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>/
├── _delta_log/
│   ├── 00000000000000000000.json
│   ├── 00000000000000000001.json
├── part-00000-...snappy.parquet
├── part-00001-...snappy.parquet
```
A corresponding schema file is available at:
```
s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/schema/<dataset name>.json
```
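The exact structure of the schema JSON is not documented here. Assuming it resembles a standard Spark/Delta struct schema (the document below is a hypothetical example, not the real Chargebee format), you could extract field names like this:

```python
import json

# Hypothetical schema document modeled on a Spark StructType JSON;
# the real Chargebee schema file may differ.
schema_json = """
{
  "type": "struct",
  "fields": [
    {"name": "id", "type": "string", "nullable": false, "metadata": {}},
    {"name": "change_type", "type": "string", "nullable": true, "metadata": {}}
  ]
}
"""

schema = json.loads(schema_json)
field_names = [f["name"] for f in schema["fields"]]
print(field_names)  # ['id', 'change_type']
```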
Chargebee handles routine Delta Lake maintenance automatically, including transaction log cleanup, removal of expired data files, and file compaction, according to the retention settings listed above.