Summary
On September 18, 2025, starting at 09:00 UTC, our monitoring systems detected abnormal behavior in the API's credit management logic. About 7% of customer requests incorrectly received an "insufficient credits" error even though the customers' balances were sufficient.
A rollback was initiated at 09:30 UTC, and the incident was fully resolved by 09:50 UTC.
Impact
- Duration: 09:00 – 09:50 UTC
- Affected users: ~7% of customers
- Impact: Valid requests were rejected with "insufficient credits" errors, leading to degraded service availability.
Root Cause
The issue was caused by an optimization to the credit validation logic. The update added a conditional path that misclassified certain valid balances as insufficient, so requests that should have succeeded were denied.
As a result:
- Customers with sufficient credits intermittently received errors.
- Monitoring alerts correctly detected anomalies in request success rate.
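The class of bug described above can be illustrated with a small differential check. This is a hypothetical sketch, not the code involved in the incident: both validator functions, the assumed rule that a balance is sufficient when it is greater than or equal to the request cost, and the planted strict-comparison bug are all assumptions made for illustration. It compares an "optimized" validator against a plain reference rule across balances near the threshold and reports any disagreement.

```python
from decimal import Decimal


def has_sufficient_credits_reference(balance: Decimal, cost: Decimal) -> bool:
    """Reference rule (assumed): the balance must cover the request cost."""
    return balance >= cost


def has_sufficient_credits_optimized(balance: Decimal, cost: Decimal) -> bool:
    """Hypothetical optimized variant with a deliberately planted bug.

    The fast path uses a strict comparison, so a balance exactly equal
    to the cost is wrongly treated as insufficient.
    """
    if balance > cost:
        return True
    return False


def find_misclassified(costs, offsets):
    """Return (balance, cost) pairs where the two validators disagree."""
    mismatches = []
    for cost in costs:
        for offset in offsets:
            balance = cost + offset
            if balance < 0:
                continue  # negative balances are out of scope for this sketch
            expected = has_sufficient_credits_reference(balance, cost)
            actual = has_sufficient_credits_optimized(balance, cost)
            if expected != actual:
                mismatches.append((balance, cost))
    return mismatches


if __name__ == "__main__":
    costs = [Decimal("1"), Decimal("10")]
    offsets = [Decimal("-0.01"), Decimal("0"), Decimal("0.01")]
    for balance, cost in find_misclassified(costs, offsets):
        print(f"misclassified: balance={balance} cost={cost}")
```

Running a check like this in CI against the previous stable validator would flag a disagreement before deployment, provided the sampled balances include the boundary values.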
Resolution
- 09:15 UTC: On-call engineers were paged after monitoring flagged increased error rates.
- 09:30 UTC: The team traced the issue to a recent change in the credit management logic and initiated a rollback to the last stable version.
- 09:50 UTC: All systems were stable, and error rates returned to normal levels.
Preventive Measures
To reduce the likelihood of similar incidents:
- ✅ Stricter Code Reviews: Credit management logic will undergo enhanced peer review and validation before deployment.
- ✅ Pre-Deployment Testing: Expanded test coverage for credit scenarios, including edge cases for balances near thresholds.
- ✅ Canary Releases: Future changes to critical billing or credit systems will be deployed using gradual rollout strategies with automatic rollback triggers.
- ✅ Improved Monitoring: Specific metrics and alarms for anomalous "insufficient credits" responses have been added.
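As a rough sketch of how an automatic rollback trigger could be tied to the new "insufficient credits" metric, the example below tracks the share of recent responses rejected for insufficient credits and signals a rollback once that share exceeds a limit. The class name, window size, threshold, and minimum sample count are all hypothetical values chosen for illustration; they do not describe our production configuration.

```python
from collections import deque


class InsufficientCreditsMonitor:
    """Sliding-window monitor for "insufficient credits" rejections.

    All defaults are illustrative assumptions, not production settings.
    """

    def __init__(self, window_size: int = 1000, max_rate: float = 0.02,
                 min_samples: int = 100) -> None:
        # Each entry is True when the response was an "insufficient credits" error.
        self.window = deque(maxlen=window_size)
        self.max_rate = max_rate
        self.min_samples = min_samples

    def record(self, rejected_for_credits: bool) -> None:
        """Record one response; the oldest entry is dropped when the window is full."""
        self.window.append(rejected_for_credits)

    def rate(self) -> float:
        """Fraction of responses in the window rejected for insufficient credits."""
        if not self.window:
            return 0.0
        return sum(self.window) / len(self.window)

    def should_roll_back(self) -> bool:
        """Signal a rollback only once enough samples have been observed."""
        if len(self.window) < self.min_samples:
            return False
        return self.rate() > self.max_rate


if __name__ == "__main__":
    monitor = InsufficientCreditsMonitor(window_size=100, max_rate=0.05,
                                         min_samples=10)
    # Simulate a canary where 7 of 100 responses are rejected for credits.
    for i in range(100):
        monitor.record(i % 100 < 7)
    print(monitor.rate(), monitor.should_roll_back())
```

The minimum sample count keeps a handful of early rejections from rolling back a canary that has barely received traffic; in practice the threshold would be set relative to the normal baseline rate of legitimate "insufficient credits" responses.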