Earlier this week, we were proud to share that Chargebee has been named a Leader in the 2025 Gartner® Magic Quadrant™ for Recurring Billing Applications. 1

We believe this recognition reflects our dedication to helping companies monetize innovation with speed, accuracy, and scale.

As AI seeps into every product, companies are experimenting with new ways to monetize product usage: workflow- or action-based models for autonomous agents, outcome-based pricing, credit packs for generative AI, and hybrid structures that blend usage with subscriptions. The shift is clear—seat-based pricing is under pressure, AI features carry real costs, and customers now expect to pay for value delivered, not just access granted.

For product and engineering leaders, this isn’t just a pricing evolution—it’s a systems challenge. Every AI feature introduces volatile unit economics, dynamic usage patterns, and the need for real-time alignment between product, finance, and infrastructure. Modern billing systems can no longer stop at processing revenue; they must help teams interpret the revenue signals hidden within millions of usage events and act on them instantly. That’s exactly what we set out to solve at Chargebee.

We are proud to share that during the Gartner demo, Chargebee processed 1 million usage events in just 25 seconds—the fastest in the category. 1 More than a benchmark, it’s proof of an architecture built to turn complex usage data into real-time monetization at enterprise scale, so product and engineering teams can focus on what moves the business forward, not on maintaining billing infrastructure.

Because in the end, great billing isn’t about what it processes; it’s about what it makes possible.

Why Fast Aggregation and Rating Matter Now

Unlike traditional SaaS features with predictable margins, AI capabilities behave like microproducts, each with its own unit economics. A chat completion may cost fractions of a cent, while code or image generation can cost dollars. For startups, every interaction carries an unpredictable cost. For incumbents, layering AI features onto existing products is like launching new business lines overnight.

Without near real-time aggregation and rating:

  • Customers face bill shock. A startup wakes up to a $50,000 invoice for AI API calls they made a week ago. They do not understand what drove that bill. Trust evaporates.
  • Finance teams can’t enforce spend controls. You’ve set a $10,000 monthly threshold for a new customer. They blow through it on day two, but you don’t know until day five. You’ve now extended $15,000 in credit to an unvetted account.

Fast aggregation and rating enable companies to tie usage to cost and surface signals on time, whether that’s enforcing controls, resolving disputes, or seizing upsell opportunities.

Designing for the Long Runway

Chargebee powers usage-based billing for 500+ businesses, ranging from early-stage startups to Forbes’ Top 50 AI companies such as DeepL and Lambda. That breadth of customers shaped how we built our usage rating engine.

Some vendors optimize exclusively for high-throughput infrastructure companies. They can handle extreme scale, but with steep learning curves that make them impractical for most businesses. Others prioritize simplicity for basic metering, but leave you to stitch together separate systems for invoicing, revenue recognition, and payment collection. Still others handle traditional subscription billing well, but force you to code usage logic yourself.

We needed something different: a system that could serve a startup on day one and scale with them to enterprise volumes without requiring a platform migration. The same architecture that handles 10,000 events per day for a SaaS company must handle billions of events per day for an AI infrastructure company, with complete billing, invoicing, and revenue recognition built in.

Thus began our journey to convert product usage into a signal for real-time business decisions.

The Principles That Guided Our Build

We designed around three core principles:

1. Predictable latency beats absolute speed.

It’s better to consistently process usage in 30 seconds than to average 15 seconds but occasionally spike to 10 minutes. Predictability lets you set reliable SLAs with your customers and lets engineering reason about system behavior.

2. Accuracy is non-negotiable.

In billing, you can’t be “eventually accurate.” Every event is money. Miss one, and you’ve leaked revenue. Double-count one, and you’ve broken customer trust. Our deduplication strategy had to be airtight at scale.

3. Handle the messy edges, not just the happy path.

A rating engine that works only on perfect data is useless. Real life means events arriving six hours late because a Kafka cluster is down, or events with missing fields due to instrumentation changes.

The Architecture Behind Speed and Scale

Building for speed meant solving two challenges: getting usage events into the system quickly, and transforming them into accurate data ready for billing. 

We built two ingestion pathways:

  • API ingestion for real-time, event-by-event streaming—ideal for businesses that need immediate visibility or feedback from the Chargebee ecosystem.
  • Batch ingestion for periodic usage uploads, offered in two modes:
    • File-based (Push): You upload usage files on your schedule—daily, weekly, or as needed.
    • S3-based (Pull): Chargebee retrieves files directly from your S3 storage, reducing integration effort and making ingestion frictionless.

Each pathway has different performance characteristics, but all three (streaming API, file push, and S3 pull) face the same downstream challenge: turning millions of raw events into accurate, aggregated usage data ready for rating.

Solving Speed Without Sacrificing Accuracy

The core tension in building a fast usage pipeline is this: single-record operations don’t scale, but batch operations risk accuracy.

Problem 1: The Database Bottleneck

When usage events arrive one at a time and you insert them individually, your database becomes the bottleneck. Even at a modest scale—say, 10,000 events per second—you’re hammering it with 10,000 separate write operations. Transactional databases are designed to treat a group of operations as a single unit of work, ensuring data is never left in a corrupted state. But that robustness comes at a cost: they can’t sustain concurrent writes at this scale. The result: increasing latency and ballooning infrastructure costs.

Our solution: decouple ingestion from processing and move to batched writes.

Instead of writing each event transactionally, we introduced a queue between the customer-facing API and the database. The API performs only lightweight validation and acknowledgment—responding in milliseconds, even at scale—and then pushes events into the queue. The queued events are accumulated and optimally written to storage in batches, drastically reducing write pressure on the database.

This design not only removed the database bottleneck but also had a powerful side effect: customers no longer wait for database operations to complete. The ingestion API remains consistently fast, while the backend ensures data integrity and high throughput.

By combining asynchronous queuing with batched writes, we achieved predictable latency and scalability, processing hundreds of thousands of events per second through our streaming API without sacrificing accuracy or reliability.
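The decoupling described above can be sketched in a few lines. This is a minimal single-process illustration, not Chargebee’s implementation: the real system uses a durable distributed queue and a database, which the in-memory `queue.Queue` and `stored_batches` list stand in for here.

```python
import queue

BATCH_SIZE = 500

event_queue = queue.Queue()   # buffer between the fast API path and storage
stored_batches = []           # stand-in for the real database

def ingest(event):
    """API path: validate cheaply, enqueue, and acknowledge immediately."""
    if "id" not in event:
        raise ValueError("missing event id")
    event_queue.put(event)

def flush():
    """Background path: drain the queue and issue one bulk write, not N single writes."""
    batch = []
    while not event_queue.empty() and len(batch) < BATCH_SIZE:
        batch.append(event_queue.get())
    if batch:
        stored_batches.append(batch)   # one write for the whole batch
    return len(batch)

# 1,000 ingest calls become just 2 storage writes
for i in range(1000):
    ingest({"id": i, "qty": 1})
while flush():
    pass
```

The key property is that `ingest` never waits on storage: the caller’s latency is bounded by validation and the enqueue, while write amplification on the database drops by a factor of the batch size.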

Problem 2: The Storage and Deduplication Challenge

Once we committed to bulk writes, we faced a fundamental tradeoff:

Transactional databases enforce uniqueness constraints natively, making deduplication trivial. But they can’t write at the speeds required for millions of events without prohibitive costs.

Columnar databases offer extremely fast writes and built-in aggregation capabilities. They’re designed for time-series data, where you ingest at scale and run analytical queries across time windows. But they’re optimized for append-only workloads where duplicates don’t matter much; if you log the same server metric twice, nothing breaks.

In billing, duplicates are catastrophic. If a retry causes the same API call to be submitted twice and we don’t catch it, your customer gets billed twice. Conversely, if we’re too aggressive and drop legitimate events, we lose revenue.

We needed the write throughput of a columnar system with the accuracy guarantees of a transactional system.

Our solution combines the strengths of both approaches. We built a lightweight deduplication layer using a composite primary key (subscription ID, timestamp, and deduplication ID) to ensure every event is unique before reaching our main storage. This happens during the bulk write process, not as an afterthought. 
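A minimal sketch of that idea, assuming the composite key named above (subscription ID, timestamp, deduplication ID); the in-memory `seen` set stands in for whatever the real system uses to track keys at scale:

```python
def dedup_key(event):
    # Composite key: subscription ID, timestamp, and deduplication ID
    return (event["subscription_id"], event["timestamp"], event["dedup_id"])

def bulk_write(events, seen, store):
    """Drop duplicates during the bulk write itself, not as an afterthought."""
    unique = []
    for e in events:
        k = dedup_key(e)
        if k in seen:
            continue          # a retry resubmitted this event; bill it once
        seen.add(k)
        unique.append(e)
    store.extend(unique)
    return len(unique)

store, seen = [], set()
batch = [
    {"subscription_id": "sub_1", "timestamp": 1700000000, "dedup_id": "e1", "qty": 3},
    {"subscription_id": "sub_1", "timestamp": 1700000000, "dedup_id": "e1", "qty": 3},  # retry
    {"subscription_id": "sub_1", "timestamp": 1700000060, "dedup_id": "e2", "qty": 1},
]
bulk_write(batch, seen, store)
```

Because uniqueness is enforced before the batch reaches columnar storage, the fast append-only write path never sees a duplicate, and no after-the-fact cleanup query is needed.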

Our first attempt used the database’s built-in asynchronous deduplication; but because it runs after ingestion, it could introduce inaccuracies at the aggregated level. We then tried custom approaches—time-based bucketing, key-based deduplication with custom queries, and delayed aggregation. Each helped reduce duplicate events, but none scaled as efficiently as the rest of the pipeline.

Each attempt taught us something, but the fundamental problem remained: you can’t bolt reliable deduplication onto a system not designed for it.

These choices gave us both speed and accuracy. We were able to process 200K+ events per second with zero duplicate events.

Problem 3: Scaling Batch Ingestion

Some merchants prefer to keep raw usage data in their own storage as a system of record, and have a third party retrieve it for billing. Others prioritize ease of integration and reliability over maintaining near-real-time data streams with complex retry and error handling. For both cases, Chargebee supports batch ingestion through files, offered in two modes: direct file uploads, or S3 pull, where Chargebee retrieves data straight from a merchant’s storage. This design keeps integration simple while scaling efficiently as data volumes grow.

Processing large files at scale brings its own challenges. To prevent noisy-neighbor effects in a multi-tenant environment, we use tenant-isolated parallel batch processing, so workloads execute concurrently without cross-tenant interference and throughput remains consistent.

Failed records are captured and surfaced (via webhook or uploaded back to the merchant’s bucket), making it easy for teams to review, correct, and reprocess data without manual debugging.
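The tenant-isolation pattern can be illustrated with a small sketch: each tenant’s file is processed by its own task, and malformed records are collected rather than dropped. The worker pool, file contents, and validation rule are all illustrative, not Chargebee internals.

```python
from concurrent.futures import ThreadPoolExecutor

def process_tenant(tenant, records):
    """Process one tenant's batch in isolation; capture failures instead of dropping them."""
    ok, failed = [], []
    for r in records:
        if "qty" in r:
            ok.append(r)
        else:
            failed.append(r)   # surfaced back to the merchant for correction
    return tenant, ok, failed

files = {
    "tenant_a": [{"qty": 1}, {"qty": 2}],
    "tenant_b": [{"qty": 5}, {"bad": True}],   # one malformed record
}

results = {}
with ThreadPoolExecutor(max_workers=len(files)) as pool:
    for tenant, ok, failed in pool.map(lambda kv: process_tenant(*kv), files.items()):
        results[tenant] = {"ok": ok, "failed": failed}
```

Because each tenant runs in its own task with its own record lists, a slow or error-heavy upload from one tenant cannot stall or corrupt another’s batch.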

Problem 4: Usage Metering at Scale

Handling raw events at query time doesn’t scale. Imagine totaling 10 million API calls at the time of billing. Scanning millions of rows would grind billing to a halt.

As events arrive, we bucket them into minute-level summaries. This finer granularity keeps aggregation windows small enough for near-real-time updates while maintaining predictable query performance. At billing time, instead of processing millions of raw rows, the system queries thousands of pre-aggregated buckets.

This approach not only reduces billing time from minutes to milliseconds, but it also ensures query reliability, as the number of buckets per billing period remains relatively constant, with a clear upper limit. That consistency makes performance predictable, regardless of whether a customer generates 1,000 usage events or 10 million.
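The bucketing scheme above can be sketched as follows. This is a simplified model, assuming integer-second timestamps and a single metric per subscription; the real engine handles many metrics and persistent storage.

```python
from collections import defaultdict

buckets = defaultdict(int)   # (subscription_id, minute) -> aggregated quantity

def record(event):
    """Fold each raw event into its minute-level summary as it arrives."""
    minute = event["timestamp"] // 60 * 60   # truncate to the minute boundary
    buckets[(event["subscription_id"], minute)] += event["qty"]

def usage_for_period(sub_id, start, end):
    """Billing reads a bounded number of buckets, never millions of raw rows."""
    return sum(q for (s, m), q in buckets.items() if s == sub_id and start <= m < end)

# 300 raw events spread over 10 minutes collapse into 10 buckets
for t in range(0, 600, 2):
    record({"subscription_id": "sub_1", "timestamp": t, "qty": 1})
```

The number of buckets per billing period is bounded by the period length in minutes, not by event volume, which is what makes query time predictable whether a customer sends 1,000 events or 10 million.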

Solving the Messy Realities of Billing

Fast pipelines are only helpful if they handle real-world edge cases:

1. Late-Arriving Events: Say your upstream system goes down for six hours. When it comes back online, it replays all queued usage. Those events are timestamped in the past—they belong to yesterday’s billing window. If you’ve already closed and invoiced that period, what do you do?

Chargebee offers configurable grace periods. Events that land within the grace period get added to the correct invoice, so finance doesn’t have to issue manual corrections, and customers see a consistent, trustworthy bill.
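The routing decision for a late event reduces to comparing two timestamps against the period boundary and the grace window. A minimal sketch, with illustrative names and a hypothetical six-hour grace period, not Chargebee’s actual configuration:

```python
def assign_period(event_ts, arrival_ts, period_end, grace_seconds):
    """Decide which billing period a (possibly late) event lands on."""
    if event_ts >= period_end:
        return "next"      # genuinely belongs to the new period
    if arrival_ts <= period_end + grace_seconds:
        return "closed"    # late but inside grace: lands on the correct invoice
    return "next"          # too late: billed in the next cycle instead

PERIOD_END = 1_700_000_000
GRACE = 6 * 3600           # hypothetical six-hour grace window
```

For example, an event timestamped just before `PERIOD_END` that arrives an hour after the period closed still lands on the closed period’s invoice, so finance never issues a manual correction for it.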

2. Customer Disputes: In legacy setups, when a customer disputes a charge, support teams have to reconstruct usage from scattered logs. “You charged me $5,000 for compute hours, but I only ran three jobs” turns into a multi-day investigation involving engineering.

Because Chargebee stores raw usage events alongside aggregates, you can drill down to the event level, filter by subscription, time range, and usage type, and pull exact records. Disputes can be debugged in minutes, not days.

3. Error Handling: Not all usage events are valid. Some fail schema validation, while others may be missing mandatory fields such as the subscription ID, deduplication ID, or timestamp. In some cases, the same event may be submitted more than once, resulting in duplicate records.

By surfacing failed events explicitly rather than dropping them silently, Chargebee makes debugging transparent and maintains a predictable ingestion pipeline, even under imperfect real-world data.
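The validation step described above can be sketched as a function that returns an explicit reason for every rejection. Field names and reason strings here are illustrative, not Chargebee’s actual schema:

```python
REQUIRED = ("subscription_id", "dedup_id", "timestamp")

def validate(event, seen):
    """Return None if the event is valid, else a reason string to surface to the merchant."""
    for field in REQUIRED:
        if field not in event:
            return f"missing field: {field}"
    key = (event["subscription_id"], event["timestamp"], event["dedup_id"])
    if key in seen:
        return "duplicate event"
    seen.add(key)
    return None

def ingest(events):
    accepted, rejected = [], []
    seen = set()
    for e in events:
        reason = validate(e, seen)
        if reason is None:
            accepted.append(e)
        else:
            rejected.append({"event": e, "reason": reason})   # surfaced, never dropped silently
    return accepted, rejected

accepted, rejected = ingest([
    {"subscription_id": "s1", "dedup_id": "a", "timestamp": 1},
    {"subscription_id": "s1", "timestamp": 2},                  # missing dedup_id
    {"subscription_id": "s1", "dedup_id": "a", "timestamp": 1}, # duplicate
])
```

Attaching a reason to every rejected record is what turns a silent data loss into an actionable report the merchant can correct and resubmit.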

What This Unlocks for You

1. A Reliable Foundation That Scales

What sets Chargebee apart isn’t just that it can process hundreds of thousands of usage events per second—it’s that it does so predictably, transparently, and without friction as you grow. The same system that supports your first 10,000 events powers your next 10 million with identical guarantees on accuracy, latency, and auditability. No migrations. No re-architecture.

But scale alone isn’t the story. It’s about what this foundation enables. When your billing infrastructure is this reliable, product teams can experiment freely with new pricing models. Finance can trust the numbers flowing in from real-time usage data. Engineering doesn’t need to maintain one-off billing scripts or debug late-arriving events at 2 a.m.

We’ve built Chargebee’s usage engine so that your teams can spend less time maintaining systems and more time unlocking new monetization opportunities, whether that’s launching AI features as add-ons, introducing hybrid pricing, or turning usage data into actionable business insights. 

2. Near Real-Time Usage Insights

Fast aggregation turns usage data into a live signal, not a historical artifact. We provide visibility into metered features at two levels:

At the feature level, product teams can see adoption patterns across their customer base. Is your new AI code completion feature getting traction? Can you start monetizing it? This helps prioritize the roadmap and business decisions.

At the subscription level, you can see allocated versus actual usage for individual customers. If a customer is approaching their limit mid-cycle, that’s actionable. GTM teams can reach out proactively about upgrading before they’re blocked, turning usage thresholds into upsell signals rather than friction points.

These insights are updated within minutes of usage, not days later. That’s the difference between reacting to what happened last week and acting on what’s happening right now.

3. Hybrid Pricing Without Custom Code

The most common request we hear: “We want to charge a base subscription fee, include X units of usage, then charge per unit beyond that, with tiered discounts at higher volumes.”

That sentence describes four different pricing concepts—subscriptions, entitlements, usage rating, and tiered pricing—that most billing systems force you to implement separately. Some vendors handle subscriptions well, but require you to code your own usage logic. Others handle metering, but stop short of invoicing and revenue recognition.

Chargebee’s usage ingestion pipeline integrates natively with our subscription platform. You define your hybrid pricing model once: base plan + included usage + overage tiers. The system handles the rest: no custom code, no fragile integrations, no manual reconciliation.
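To make the hybrid model concrete, here is how such a price resolves for one billing period. The numbers and the function are illustrative arithmetic, not Chargebee configuration or API:

```python
def hybrid_charge(units, base_fee, included, tiers):
    """Base subscription + included usage + tiered overage.

    tiers: list of (upper_bound_of_overage_units, rate); None bound = unbounded tier."""
    overage = max(0, units - included)
    charge, prev = base_fee, 0
    for bound, rate in tiers:
        span = (overage if bound is None else min(overage, bound)) - prev
        if span <= 0:
            break
        charge += span * rate   # each tier prices only the units falling inside it
        prev += span
    return charge

# $99 base, 1,000 units included, first 4,000 overage units at $0.05, beyond that $0.03
price = hybrid_charge(6_000, 99.0, 1_000, [(4_000, 0.05), (None, 0.03)])
```

With 6,000 units, 5,000 are overage: 4,000 at $0.05 plus 1,000 at $0.03, so the invoice totals $99 + $200 + $30 = $329. Expressing all four concepts in one rating pass is what removes the custom glue code.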

This is possible because our usage-based billing architecture is built for pricing agility: modular layers let business teams configure pricing rules without engineering involvement.

The Bigger Picture

Billing has outgrown its reputation as “back-office plumbing.” In a world where features carry real costs, pricing experiments move quickly, and customer trust depends on transparency, billing speed has become a factor in product speed.

The horizon is much bigger than automating invoices. It’s about enabling monetization agility, surfacing insights in real-time, and giving businesses the confidence to scale without friction.

At Chargebee, we’ve built the foundation for this future. If you’d like to explore how it applies to your business, let’s chat.

Get a Demo

1 Source: Gartner, “Magic Quadrant for Recurring Billing Applications,” Mark Lewis, Robert Anderson, October 13, 2025.

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and MAGIC QUADRANT is a registered trademark of Gartner, Inc. and/or its affiliates and are used herein with permission. All rights reserved.

Gartner does not endorse any vendor, product or service depicted in our research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.