Building High-Concurrency Backend Systems: Lessons From OrdersPilot at 10K Orders/Day
Every backend engineer hits a wall. Your Node.js server handles 100 requests per second just fine. Then you hit 500. The server starts sweating. The event loop backs up and pending requests pile up. Response times creep from 50ms to 500ms. And suddenly you're debugging race conditions at 2 AM because your order database got corrupted by concurrent writes.
This is the moment most developers think they need to rewrite everything in Go or Java. They don't. I built **OrdersPilot** on Node.js + MongoDB + Redis, handling over 10,000 orders daily from 50+ Indian D2C Shopify brands, with sub-100ms response times even during peak traffic. The secret isn't the language—it's understanding where concurrency actually breaks and fixing it before it scales.
The Problem
When OrdersPilot first went live with real customers, I thought the architecture was solid. Node.js, Express, MongoDB with proper indexing, Redis caching layer. What could go wrong?
Everything. Within the first week, a brand hit 2,000 orders in a single day during a flash sale. The dashboard froze. The calling team couldn't confirm orders. The logistics integration timed out. The database showed 47 duplicate orders for the same customer. The calling team's WhatsApp messages were firing twice. And somehow, three orders were simultaneously marked "confirmed," "pending," and "shipped."
The issue wasn't Node.js's single-threaded nature (that's actually an advantage). It was that I'd optimized for correctness at low volume, not safety at high concurrency. Every assumption I made about request ordering broke the second Shopify fired five webhook events simultaneously.
The Solution
High-concurrency systems require three layers of defense: **event serialization, transactional integrity, and rate-aware workload distribution**.
01. Event Serialization via Queue Systems
When Shopify fires webhooks, it doesn't guarantee ordering. An "order.updated" event might arrive before "order.created." In a traditional REST handler, this causes state corruption. I switched to **BullMQ** (a Redis-backed job queue) with per-order-ID serialization. Every Shopify event becomes a job keyed by its order's ID; events for the same order are processed sequentially, while events for different orders run in parallel. This single change eliminated 90% of the race conditions.
```js
// Pseudo-code: serialized webhook processing with BullMQ
const { Queue, Worker } = require('bullmq');

const orderQueue = new Queue('order-events', { connection: redis });

app.post('/webhooks/order', async (req, res) => {
  const { id: orderId, event } = req.body;
  await orderQueue.add(
    `order-${orderId}`,
    { orderId, event, timestamp: Date.now() },
    // A deterministic job ID doubles as a deduplication key:
    // a redelivered webhook with the same ID is silently dropped
    { jobId: `${orderId}-${event}` }
  );
  res.sendStatus(202);
});

// A Worker drains the queue; per-order locking inside the handler
// keeps events for the same order sequential while events for
// different orders are processed in parallel
new Worker('order-events', async (job) => {
  const { orderId, event } = job.data;
  await handleOrderEvent(orderId, event); // serialized per order ID
}, { connection: redis });
```
02. Transactional Database Writes
MongoDB guarantees atomicity only at the single-document level, not row-level locking across documents like PostgreSQL. Without transactions, a concurrent read-modify-write on the same order document silently overwrites the other writer's changes, and a crash mid-operation leaves multi-document updates half-applied. I switched to MongoDB sessions and transactions for all multi-document operations. Any operation that touches inventory, order state, and logistics records happens atomically: all of it commits or all of it rolls back.
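As a minimal sketch of what this looks like with the MongoDB Node.js driver: the function below assumes a connected `MongoClient`, and the database name (`orderspilot`) and collection names (`orders`, `inventory`) are hypothetical, not taken from the actual codebase.

```javascript
// Sketch: atomic order confirmation across two collections.
// `client` is a connected MongoClient; names are illustrative.
async function confirmOrder(client, orderId, sku) {
  const session = client.startSession();
  try {
    // withTransaction retries on transient errors and commits on success;
    // both writes below commit together or not at all
    await session.withTransaction(async () => {
      const db = client.db('orderspilot');
      await db.collection('orders').updateOne(
        { _id: orderId, status: 'pending' },
        { $set: { status: 'confirmed', confirmedAt: new Date() } },
        { session }
      );
      await db.collection('inventory').updateOne(
        { sku },
        { $inc: { reserved: 1 } },
        { session }
      );
    });
  } finally {
    await session.endSession();
  }
}
```

Passing the `session` into every write inside the callback is what binds the writes to the transaction; a write that omits it escapes the rollback.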
03. Rate-Aware Workload Distribution
The third layer is knowing your system's true capacity and respecting it. I implemented adaptive backpressure: when queue depth exceeds a threshold, incoming requests get rejected with a 429 (Too Many Requests) rather than being queued indefinitely. This forces clients (Shopify, our calling team UI) to retry with exponential backoff, preventing cascading failure.
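The backpressure check can be sketched as a small Express middleware. This is an illustrative version, not the production code: the depth source is injected (in practice something like BullMQ's `getWaitingCount()`), and the 5,000-job threshold is an assumed value.

```javascript
// Sketch: reject requests with 429 when the queue is saturated.
// `getDepth` is any async function returning current queue depth.
const MAX_QUEUE_DEPTH = 5000; // illustrative threshold

function backpressure(getDepth, limit = MAX_QUEUE_DEPTH) {
  return async (req, res, next) => {
    const depth = await getDepth();
    if (depth > limit) {
      // Tell clients to back off and retry instead of piling on
      res.set('Retry-After', '5');
      return res.status(429).json({ error: 'queue saturated, retry later' });
    }
    next();
  };
}

// Usage (assuming a BullMQ queue):
// app.post('/webhooks/order',
//   backpressure(() => orderQueue.getWaitingCount()),
//   webhookHandler);
```

Rejecting early keeps the queue bounded, which in turn keeps worst-case processing latency bounded; an unbounded queue just converts overload into stale work.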
Engineering Challenges
The Webhook Explosion
During a Black Friday sale, a single brand received 8,000 orders in 90 minutes. Shopify fired 24,000 webhook events. Without serialization, the database was trying to execute 24,000 concurrent writes on 8,000 documents. The solution was twofold: tune BullMQ worker concurrency to match the database's real throughput, and deduplicate webhooks so redelivered events became no-ops.
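One common dedup pattern, sketched below under assumptions rather than taken from the actual codebase: record each webhook's unique ID in Redis with `SET ... NX` and skip anything already seen. The `ioredis`-style call signature and the `wh:` key prefix are assumptions; Shopify does send a unique `X-Shopify-Webhook-Id` header that can serve as the ID.

```javascript
// Sketch: idempotent webhook intake via a Redis SET NX marker.
// `redis` is assumed to be an ioredis-style client.
async function isDuplicate(redis, webhookId, ttlSeconds = 86400) {
  // SET key value EX ttl NX returns 'OK' on first write,
  // null if the key already exists (i.e. we've seen this webhook)
  const result = await redis.set(`wh:${webhookId}`, '1', 'EX', ttlSeconds, 'NX');
  return result === null;
}
```

The TTL keeps the marker set from growing forever; it only needs to outlive the sender's retry window.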
The Cascading Timeout
With 500 concurrent dashboard users, that's on the order of 500 database queries per second, and slow queries cascaded into request timeouts. I solved this with a three-tier caching strategy: a Redis cache with a 10-minute TTL, optimized database indexes, and query result caching. Dashboard response times dropped to 80ms.
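The Redis tier follows the standard cache-aside pattern, sketched below. The function name, key scheme, and loader are illustrative assumptions; the 10-minute TTL matches the text.

```javascript
// Sketch: cache-aside read for dashboard data.
// `redis` is an ioredis-style client; `loadFromDb` runs the
// (indexed) MongoDB query on a cache miss.
async function getDashboardStats(redis, brandId, loadFromDb) {
  const key = `dash:${brandId}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached); // cache hit: no DB query

  const fresh = await loadFromDb(brandId); // cache miss: one DB query
  await redis.set(key, JSON.stringify(fresh), 'EX', 600); // 10-min TTL
  return fresh;
}
```

With this in place, 500 users polling the same brand's dashboard collapse into roughly one database query per cache window instead of 500 per second.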
Lessons Learned
Concurrency Bugs Surface at Scale, Not in Development
My local testing with 100 orders looked bulletproof. The first real customer with 5,000 orders exposed five separate race conditions. This taught me to think about concurrency from day one.
Observability Is Non-Negotiable
Without proper logging and monitoring, debugging concurrency issues is impossible. I implemented structured logging with request tracing, database slow-query logs, and Redis queue depth monitoring.
Your Database Is Your Bottleneck
Node.js can handle I/O beautifully, but MongoDB has limits. Understanding your database's actual throughput and designing your queue concurrency to match it prevents cascading failures.
Building for 10K+ daily orders taught me that scale isn't about architectural rewrites—it's about respecting concurrency, respecting your database's limits, and building observability in from the start.