If you've shipped a payment API and not lost sleep over an idempotency edge case, you haven't shipped enough payment volume yet.
This is the playbook we wish we'd had. It's the result of watching enough double-charge incident reviews to know which patterns survive contact with reality and which ones don't.
What "idempotent" actually means
A formal definition: an operation is idempotent if performing it N times yields the same observable state as performing it once. For payments:
- `POST /orders` with the same `idempotency_key` → at most one order created, regardless of how many times the request is sent
- `POST /payouts` with the same `idempotency_key` → at most one payout dispatched
- `PUT /orders/{id}` is naturally idempotent (assuming deterministic state derivation)
Important: idempotency is about the side effect, not the response. A second call with the same key should ideally return the same response, but the binding contract is that the system state mutates at most once.
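The "same observable state" property is easiest to see with a naturally idempotent operation like `PUT`. A toy in-memory sketch (the store and `put_order` are illustrative, not part of any real API):

```python
# A toy in-memory store: applying the same PUT twice leaves the same
# observable state as applying it once — the definition of idempotence.
store = {}

def put_order(order_id, data):
    store[order_id] = data  # deterministic overwrite: naturally idempotent
    return store[order_id]

put_order("ord_1", {"amount": 500})
put_order("ord_1", {"amount": 500})  # second call changes nothing
assert store == {"ord_1": {"amount": 500}}
```

`POST` has no such natural property, which is why it needs an explicit idempotency key.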
Why every payment API needs it
Five real failure modes that idempotency keys defend against:
- Network retry on the client. Connection drops mid-request; client retries; without dedup the operation runs twice.
- Edge proxy retry. CDN, load balancer, or service mesh retries an "errored" request that actually succeeded server-side.
- Multi-tenant queue retry. A worker crashes after creating the order but before acking the message; another worker picks it up.
- Webhook retry storms. When you retry webhooks (and you should), the receiver needs to dedup by event_id.
- Operator double-click. A merchant clicks "Pay out USD 50K" twice in your dashboard.
In our production telemetry, retry-induced duplicate attempts happen on roughly 1 in 200 payment calls. Without idempotency keys, that would be a 0.5% baseline double-execution rate. With well-designed keys, it's effectively zero.
The naïve approach (and why it fails)
The instinct is something like:
```python
def create_order(req):
    key = req.idempotency_key
    existing = db.orders.find_one({"idempotency_key": key})
    if existing:
        return existing
    order = create_order_in_db(req)
    return order
```
This breaks on race conditions. Two concurrent requests both pass the find_one check before either commits — both create orders.
You need atomicity. Either a database-level unique constraint or a transactional check-and-create.
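To make the race concrete, here's a minimal sketch of the unique-constraint approach, using Python's stdlib `sqlite3` as a stand-in for PostgreSQL (in Postgres you'd use `INSERT ... ON CONFLICT DO NOTHING`; `INSERT OR IGNORE` is the SQLite equivalent — the table and column names are illustrative):

```python
# Sketch: a DB unique constraint as the atomic check-and-create.
# sqlite3 stands in for PostgreSQL; INSERT OR IGNORE mirrors
# INSERT ... ON CONFLICT DO NOTHING.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE orders (
        idempotency_key TEXT PRIMARY KEY,  -- the DB-level lock
        amount INTEGER
    )
""")

def create_order(key, amount):
    # Atomic: of two racing calls with the same key, exactly one inserts.
    cur = db.execute(
        "INSERT OR IGNORE INTO orders (idempotency_key, amount) VALUES (?, ?)",
        (key, amount),
    )
    created = cur.rowcount == 1  # 0 means another call already won the race
    row = db.execute(
        "SELECT amount FROM orders WHERE idempotency_key = ?", (key,)
    ).fetchone()
    return created, row

created1, row1 = create_order("key-1", 500)
created2, row2 = create_order("key-1", 500)  # retry: no second insert
assert created1 and not created2
assert row1 == (500,) and row2 == (500,)
```

Because the uniqueness check happens inside the database's write path, there is no window between "check" and "create" for a concurrent request to slip through.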
The pattern that works
Here's the canonical pattern, distilled to the essentials:
```python
def create_order(req):
    key = require_idempotency_key(req)
    request_hash = sha256(canonical_form(req.body))

    # Step 1: insert the idempotency record OR find existing
    inserted = db.idempotency.insert_or_get(
        key=key,
        request_hash=request_hash,
        endpoint="POST /orders",
        status="in_progress",
        merchant_id=req.merchant_id,
        ttl_seconds=86400,  # 24h retention
    )

    # Step 2a: re-request with the SAME body — return cached response
    if inserted.was_existing and inserted.request_hash == request_hash:
        if inserted.status == "completed":
            return cached_response(inserted.response_id)
        if inserted.status == "in_progress":
            # Caller is retrying while we're still processing the original.
            # Return 409 Conflict with a Retry-After header.
            return conflict_response(retry_after=2)

    # Step 2b: re-request with a DIFFERENT body — semantic mismatch
    if inserted.was_existing and inserted.request_hash != request_hash:
        return error_response(
            422,
            "Idempotency-Key reused with different request body",
        )

    # Step 3: do the actual work
    try:
        order = create_order_in_db(req, idempotency_key=key)
        response = serialize(order)
        db.idempotency.update(
            key=key,
            status="completed",
            response_id=response.id,
        )
        return response
    except Exception as e:
        db.idempotency.update(key=key, status="failed", error=str(e))
        raise
```
Five things that matter:
1. Hash the request body. A client must not be able to "reuse" a key with different parameters. Hash a canonical form of the body (sorted keys, no whitespace) and store it. On retry, re-hash and compare.
2. Three states, not two. in_progress, completed, failed. The middle state lets you tell a retrying caller "I'm still working on the original request — back off and retry" instead of either making them wait or letting them race.
3. Scope by merchant. A leaked or guessed idempotency key should never let attacker A interfere with merchant B's transactions. The unique constraint should be (merchant_id, idempotency_key), not idempotency_key alone.
4. TTL the records. 24 hours is the sweet spot for most payment scenarios: Stripe uses 24h, we use 24h, and it's the de facto consensus.
5. The unique constraint is the real lock. Don't rely on application-level "check then write." Use DB-level uniqueness. PostgreSQL's INSERT ... ON CONFLICT DO NOTHING RETURNING * is the ideal primitive.
We see this anti-pattern often: developers reach for Redis because it's "fast." Two problems: (1) most production Redis setups don't have durable persistence by default, so a node restart loses keys; (2) Redis-only uniqueness has subtle race windows. Use your transactional database for idempotency. If volume warrants caching, layer Redis on top of the durable record, not in place of it.
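Point 1's canonical form deserves a concrete sketch. A minimal version, assuming JSON request bodies (the helper names are ours, not a library API):

```python
# Sketch of request-body hashing: a canonical JSON form (sorted keys,
# no extra whitespace) so semantically equal bodies hash identically
# regardless of key order or client-side formatting.
import hashlib
import json

def canonical_form(body: dict) -> bytes:
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()

def request_hash(body: dict) -> str:
    return hashlib.sha256(canonical_form(body)).hexdigest()

a = request_hash({"amount": 500, "currency": "USD"})
b = request_hash({"currency": "USD", "amount": 500})  # same body, reordered
c = request_hash({"amount": 501, "currency": "USD"})  # different amount
assert a == b
assert a != c
```

Note that this treats key order as insignificant but treats any value change — even a whitespace-insignificant one inside a string — as a different request, which is the conservative behavior you want for payments.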
Where this gets interesting: distributed payouts
The single-table pattern above works for "create order." It gets harder when the operation triggers a downstream side effect — say, dispatching to a banking partner.
Consider: you successfully create the payout record, then call partner API to actually move the money. Partner response times out. Did the partner receive your request? You don't know.
Naive retry: re-call the partner. Maybe move the money twice.
The correct pattern uses end-to-end idempotency: pass the same idempotency token (or a deterministic derivative) to the downstream system, and rely on their idempotency.
```python
def execute_payout(payout_id, idempotency_key):
    payout = db.payouts.find_one({"id": payout_id})
    # Generate a deterministic downstream key
    downstream_key = f"kxp-{payout_id}-{idempotency_key}"
    response = partner_api.create_transfer(
        body=payout.to_partner_format(),
        headers={"X-Idempotency-Key": downstream_key},
    )
    return response
```
Partners with mature APIs (Stripe, Wise, Currencycloud, the major card schemes) will dedup on this. Partners without mature APIs are a liability — work with them at your own risk, and absolutely build a reconciliation job that runs the next day to detect and reverse duplicates.
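The reconciliation job's core check can be sketched simply: group the partner's reported transfers by our downstream key and flag any key that appears more than once. (The record shape here is a hypothetical; real reconciliation also matches amounts, currencies, and statuses.)

```python
# Sketch of a next-day reconciliation check: a downstream idempotency
# key appearing on two partner transfers means a duplicate dispatch.
from collections import Counter

def find_duplicate_transfers(partner_transfers):
    """partner_transfers: iterable of dicts with an 'idempotency_key' field."""
    counts = Counter(t["idempotency_key"] for t in partner_transfers)
    return {key for key, n in counts.items() if n > 1}

transfers = [
    {"id": "tr_1", "idempotency_key": "kxp-42-abc"},
    {"id": "tr_2", "idempotency_key": "kxp-42-abc"},  # duplicate dispatch
    {"id": "tr_3", "idempotency_key": "kxp-43-def"},
]
assert find_duplicate_transfers(transfers) == {"kxp-42-abc"}
```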
The webhook side: idempotent consumers
Same idea, mirror image. When you publish webhooks, every event has a stable event_id. Receivers dedup on event_id. This is critical because any sane webhook publisher will retry on failure, often aggressively.
In Kaadxpay's case, every webhook carries:
- `event_id` — globally unique UUID
- `event_type` — e.g. `order.captured`
- `delivery_id` — unique per delivery attempt (so receivers can distinguish retries)
- `signature` — HMAC over the body
- `delivered_at` — server-side dispatch timestamp
Receivers should dedup on event_id and ignore delivery_id for state changes (just for diagnostics).
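A minimal idempotent consumer, sketched with an in-memory set (in production the dedup set should be a unique-constrained table, for the same durability reasons discussed above):

```python
# Sketch of an idempotent webhook consumer: state changes are keyed by
# event_id, so a redelivery (new delivery_id, same event_id) is a no-op.
processed = set()  # production: a unique-constrained DB table, not memory

def handle_webhook(event):
    if event["event_id"] in processed:
        return "duplicate_ignored"  # retry of an event we already applied
    processed.add(event["event_id"])
    # ... apply the state change for event["event_type"] here ...
    return "processed"

evt = {"event_id": "evt_1", "event_type": "order.captured", "delivery_id": "dlv_1"}
retry = {**evt, "delivery_id": "dlv_2"}  # redelivery: same event, new attempt
assert handle_webhook(evt) == "processed"
assert handle_webhook(retry) == "duplicate_ignored"
```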
Edge cases to test
If you're building this, run these test scenarios. They catch real bugs:
- Same key, identical body, sent twice in quick succession → second returns 200 with same response or 409 if first still pending
- Same key, different body → 422 with clear error message
- Same key, different merchant → both succeed (correct scoping)
- Same key after TTL expiry → second is treated as a new request (not an error)
- Process crash mid-execution, retry after restart → idempotency record left in `in_progress`, retry returns 409, your background worker eventually marks it `failed` based on absence of completion within TTL
- DB primary failover during execution → exactly-once semantics preserved if you use transactional updates
- Network partition between you and downstream partner → reconciliation job catches and resolves any duplicates
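The first four scenarios can be exercised against a tiny in-memory model of the key rules. Everything here (the `records` store, `create_order`, the response shapes) is illustrative, not a real client:

```python
# Minimal in-memory model of the idempotency rules, enough to exercise
# the same-body, different-body, and merchant-scoping test scenarios.
import hashlib
import json

records = {}  # (merchant_id, key) -> {"hash": ..., "response": ...}

def body_hash(body):
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def create_order(merchant_id, key, body):
    h = body_hash(body)
    rec = records.get((merchant_id, key))
    if rec is None:
        response = (200, {"status": "created", "amount": body["amount"]})
        records[(merchant_id, key)] = {"hash": h, "response": response}
        return response
    if rec["hash"] != h:
        return (422, {"error": "Idempotency-Key reused with different request body"})
    return rec["response"]  # identical retry: cached response

assert create_order("m1", "k1", {"amount": 500})[0] == 200
# Same key, identical body → same cached response
assert create_order("m1", "k1", {"amount": 500}) == create_order("m1", "k1", {"amount": 500})
# Same key, different body → 422
assert create_order("m1", "k1", {"amount": 501})[0] == 422
# Same key, different merchant → both succeed (correct scoping)
assert create_order("m2", "k1", {"amount": 500})[0] == 200
```

The in-memory model deliberately omits the `in_progress` state and TTL expiry; those scenarios need concurrency and a clock, so test them against your real store.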
What this looks like at scale
For perspective, our idempotency store at Kaadxpay is a plain PostgreSQL table. With proper indexing, PostgreSQL handles this workload without breaking a sweat well past 10M operations/day. You don't need a special-purpose datastore until you're operating at a different scale than us.
Cleanup and operational tips
- Background sweep. Have a worker that marks `in_progress` records older than X minutes as `failed`. Catches stuck records from process crashes.
- Retain `completed` and `failed` records for the full TTL. Don't garbage-collect aggressively — debug visibility into "why did the second call return what it did?" is invaluable in incident response.
- Surface idempotency state in your dashboard. When merchants ask "did my request go through?", giving them the idempotency record's state is faster than human investigation.
- Add structured logs. Every idempotency match (cache hit) and conflict (422) should log with context. This is your single best data source for client-side bug patterns.
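The background sweep is simple enough to sketch in full. The record shape, field names, and 15-minute cutoff are assumptions for illustration:

```python
# Sketch of the background sweep: mark in_progress records older than a
# cutoff as failed so a later retry with the same key can proceed.
from datetime import datetime, timedelta, timezone

def sweep_stuck_records(records, max_age=timedelta(minutes=15), now=None):
    """records: iterable of dicts with 'status' and 'started_at' fields."""
    now = now or datetime.now(timezone.utc)
    swept = 0
    for rec in records:
        if rec["status"] == "in_progress" and now - rec["started_at"] > max_age:
            rec["status"] = "failed"  # crashed mid-execution; safe to retry
            swept += 1
    return swept

now = datetime.now(timezone.utc)
records = [
    {"status": "in_progress", "started_at": now - timedelta(hours=1)},  # stuck
    {"status": "completed", "started_at": now - timedelta(hours=1)},    # fine
    {"status": "in_progress", "started_at": now},                       # active
]
assert sweep_stuck_records(records) == 1
assert records[0]["status"] == "failed"
```

In production this runs as a periodic job with the same `UPDATE ... WHERE status = 'in_progress' AND started_at < cutoff` expressed in SQL, so the sweep itself is a single atomic statement.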
TL;DR for engineers
If you're new to payment APIs and just want the survival kit:
- Make every state-changing endpoint require an `Idempotency-Key` header for POST and DELETE
- Hash the request body, store it alongside the key
- Use three states: `in_progress`, `completed`, `failed`
- Scope keys by merchant ID
- TTL records for 24 hours
- Pass your idempotency keys through to downstream systems
- Test the edge cases above
- Log everything
The teams that get this right ship payments without double-charge incidents. The teams that don't, eventually do — usually at the worst possible moment.
If you're building on Kaadxpay or evaluating us, you can see the production version of this design in our API reference.
Posts from the Kaadxpay engineering team covering API design, webhook reliability, reconciliation patterns, and the practical realities of running a cross-border payment platform.