Delta Lake Format in Automatic Exports
Chargebee supports data export in Delta Lake format, which provides reliable, ACID-compliant updates, efficient versioning, and seamless integration with modern data lakes and analytics platforms.
Advantages of Delta Lake Format
While JSON and Parquet are widely used formats, Delta Lake offers several practical advantages:
- Reliable updates: Unlike Parquet or JSON, Delta Lake supports updates and deletes natively. This ensures that exported data always reflects the latest state without manual intervention.
- Schema evolution: When your data structure changes (e.g., new fields), Delta Lake handles this smoothly, triggering a full export to ensure your analytics stay in sync.
- Audit and historical views: Delta Lake supports time travel, letting your team query past versions of your data for audit, reporting, or rollback needs.
- Easier maintenance: Vacuuming and optimization are managed automatically by Chargebee, eliminating the need for manual tuning.
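The time-travel behavior described above can be illustrated with a toy sketch (plain Python, not Delta Lake internals): each commit produces a new immutable table version, and readers can query either the latest state or any earlier version.

```python
# Toy illustration of the time-travel idea: every commit appends a new
# snapshot, and the version number is simply the snapshot's index.
history = []  # list of snapshots; index == version

def commit(snapshot):
    """Record a new table version and return its version number."""
    history.append(dict(snapshot))
    return len(history) - 1

def read(version=None):
    """Read the latest version, or a specific past version if given."""
    return history[-1] if version is None else history[version]

v0 = commit({"inv_1": 100})   # version 0
v1 = commit({"inv_1": 120})   # version 1 overwrites the amount

assert read() == {"inv_1": 120}            # latest state
assert read(version=v0) == {"inv_1": 100}  # audit the earlier state
```

In real Delta Lake usage, the equivalent is a `VERSION AS OF` / `TIMESTAMP AS OF` query against the table, subject to the retention windows listed below.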
Note
Delta Lake format is supported only in environments compatible with the Delta Lake open standard, such as Databricks Unity Catalog, AWS Glue, Hive Metastore, and Apache Spark.
Format Specifications
The following table lists the Delta Lake format specifications:
| Parameter | Value |
|---|---|
| Delta Version | 3.3.1 |
| Block Size | 128 MB |
| Log Retention | 72 hours (default) |
| File Retention | 72 hours (default) |
| Compression | Snappy (default) |
Metadata and Change Tracking
Each record in the Delta export includes key metadata fields to support ingestion, auditability, and Change Data Capture (CDC) pipelines:
| Field | Description |
|---|---|
| change_type | Indicates the type of change made to the record (for example, `read` for rows written as part of a full snapshot export). |
| record_extracted_at | Timestamp when the updated data became available in Chargebee’s data lakehouse. |
| record_exported_at | Timestamp when the data was exported to the configured destination. |
When the export configuration is modified (e.g., columns added), a full export is automatically triggered. All records in this batch will have `change_type = read` to denote a snapshot. The first sync for every table will also be a full export.
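A downstream consumer can use these metadata fields to keep a local copy in sync. The sketch below assumes a record key field named `id`; beyond the documented `read` snapshot value, the `update` and `delete` change types are illustrative assumptions for incremental batches, so adapt them to the values your exports actually contain.

```python
# Minimal sketch of applying a CDC batch to a local keyed table.
# "read" (full snapshot row) is documented; "update"/"delete" are
# assumed values shown only to illustrate the consumption pattern.
def apply_batch(table, batch):
    for rec in batch:
        ct = rec["change_type"]
        key = rec["id"]
        if ct in ("read", "update"):   # snapshot row or incremental change: upsert
            table[key] = rec
        elif ct == "delete":           # assumed tombstone: drop the record
            table.pop(key, None)
    return table

state = {}
# full snapshot batch
apply_batch(state, [{"id": "cus_1", "change_type": "read", "plan": "basic"}])
# later incremental batch
apply_batch(state, [
    {"id": "cus_1", "change_type": "update", "plan": "pro"},
    {"id": "cus_2", "change_type": "delete"},
])
assert state == {"cus_1": {"id": "cus_1", "change_type": "update", "plan": "pro"}}
```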
Creating Delta Tables from Exported Data
Chargebee writes Delta-formatted data directly to your configured cloud storage. You can register and query the exported data using Delta Lake-compatible tools.
Using AWS Glue / Hive Metastore / Databricks Unity Catalog
```sql
CREATE TABLE my_table
USING DELTA
LOCATION 's3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>';
```
Using Apache Spark without external catalog
```sql
SELECT * FROM delta.`s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>`;
```
Using Spark DataFrame API
```python
df = spark.read.format("delta").load("s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>")
df.show()
```
Export File Structure
Exported Delta Lake datasets follow the standard Delta Lake directory structure:
```
s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/data/<dataset name>/
├── _delta_log/
│   ├── 00000000000000000000.json
│   ├── 00000000000000000001.json
├── part-00000-...snappy.parquet
├── part-00001-...snappy.parquet
```
A corresponding schema file is available at:
```
s3://<your-bucket>/<your-subfolder>/<your-domain>/deltalake/schema/<dataset name>.json
```
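Because commit files in `_delta_log/` are named by zero-padded version number, the current table version can be derived from a directory listing alone. A minimal sketch in plain Python, using a throwaway local directory to stand in for the S3 layout shown above:

```python
import os
import tempfile

def latest_delta_version(table_dir):
    """Return the highest commit version recorded in _delta_log/<NNNN>.json."""
    log_dir = os.path.join(table_dir, "_delta_log")
    versions = [
        int(name[:-5])                      # strip ".json", parse the version
        for name in os.listdir(log_dir)
        if name.endswith(".json") and name[:-5].isdigit()
    ]
    return max(versions)

# Demo: build a local directory mimicking the export layout.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "_delta_log"))
for v in (0, 1):
    open(os.path.join(root, "_delta_log", f"{v:020d}.json"), "w").close()

assert latest_delta_version(root) == 1
```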
Automated Maintenance
Chargebee handles routine Delta Lake maintenance automatically, including:
- Compaction: Periodically merges small files to optimize read performance.
- Vacuum: Cleans obsolete data and metadata in line with Delta Lake retention policies.