
The Engineering That Made 60,000 Emergency Requests Possible

· 5 min read

In emergency assistance, latency is more than a metric: it is a liability.

We were building a multi-agent AI system for roadside assistance. Voice agents, supervisor agents, dispatch agents - all coordinating in real time to help stranded drivers.

On paper, the initial architecture looked clean. Each agent had its own database connection, authentication flow, and lifecycle. Clear separation of concerns. Easy to reason about and easy to ship.

But the math didn’t hold for a system designed to serve 60,000 daily users.

Where the Math Breaks

Before a single byte of useful data moves, every database connection pays a fixed cost:

TCP Handshake: ~30ms
TLS Negotiation: ~50-100ms
Auth & Session Setup: ~30-50ms
Total overhead: ~150-200ms per connection.

In our initial design, a single user emergency could trigger 20 sequential agent actions. If each agent opened its own connection, that’s 20 × 150-200ms, or 3-4 seconds of pure network overhead per user. In emergency assistance, 4 seconds is an eternity.
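The arithmetic behind those figures can be checked with a short sketch. The constants come from the estimates above; the function name and the single-connection comparison are illustrative assumptions, not part of the original system.

```python
# Setup cost per connection in milliseconds (low, high), as quoted above.
TOTAL_OVERHEAD_MS = (150, 200)
ACTIONS_PER_EMERGENCY = 20   # sequential agent actions per user emergency
DAILY_USERS = 60_000         # target daily load


def setup_overhead_ms(connections: int) -> tuple[int, int]:
    """Return the (low, high) setup overhead when connections are opened one after another."""
    low, high = TOTAL_OVERHEAD_MS
    return connections * low, connections * high


# One connection per agent action versus one connection for the whole workflow.
isolated = setup_overhead_ms(ACTIONS_PER_EMERGENCY)
shared = setup_overhead_ms(1)
daily_connection_attempts = DAILY_USERS * ACTIONS_PER_EMERGENCY

print(f"isolated: {isolated[0]}-{isolated[1]} ms per emergency")
print(f"shared:   {shared[0]}-{shared[1]} ms per emergency")
print(f"daily connection attempts (isolated): {daily_connection_attempts:,}")
```

The isolated case pays the handshake 20 times per emergency, while a shared connection pays it once.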

At full theoretical load, 60,000 users × 20 actions comes to over a million connection attempts daily, which would push memory requirements into terabyte scale.

Even after accounting for realistic concurrency, connection overhead alone would consume a significant portion of available memory, exhausting our database threads and memory buffers long before we hit peak traffic.

With independent connections per agent, failure cascades through the system in this order:

  1. Agents compete for connections
  2. Connection pools saturate
  3. Requests queue behind slower queries
  4. Timeouts trigger retries
  5. Retries amplify load
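To give a rough feel for step 5, here is a minimal sketch of retry amplification. It assumes each attempt times out independently with a fixed probability and is retried up to a fixed limit. Both the model and the function name are illustrative assumptions. A real cascade is worse, because the timeout rate itself climbs as load grows.

```python
def expected_attempts(timeout_rate: float, max_retries: int) -> float:
    """Expected number of attempts per request.

    Attempt k (counting from 0) only happens if the previous k attempts
    all timed out, which has probability timeout_rate ** k.
    """
    return sum(timeout_rate ** k for k in range(max_retries + 1))


# Load multiplier on the database for a few timeout rates, with 3 retries allowed.
for rate in (0.0, 0.1, 0.5, 0.9):
    print(f"timeout rate {rate:.0%}: {expected_attempts(rate, 3):.3f}x load")
```

Even in this simplified model, a saturated pool that times out half of its requests nearly doubles the traffic it has to serve.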

The Pivot: From Isolation to Agentic Concurrency

For inspiration, I looked at how autonomous vehicles coordinate: self-driving cars share a unified, real-time state of the map they are driving on.

Similarly, our agents don’t act in isolation but as a swarm. They are part of the same workflow, operating on the same user context, within the same time window.

When a user says "I have a flat tire," the Voice Agent, Context Agent, and Dispatch Agent all need data simultaneously. Here, the database connection becomes a shared highway.

If they fight for separate connections from a standard pool, they create head-of-line blocking. They wait. They time out. They fail.

Standard connection pooling helps, but it doesn’t solve the core issue: Agentic Concurrency.

The Solution: Intelligent Multiplexing

We built a shared connection layer that allowed multiple agents to pipeline queries over a single, persistent TLS tunnel. We moved from a "request-per-connection" model to a "session-per-workflow" model.

1. Super-Connection (Multiplexing)

Multiple agents can send queries over the same TCP/TLS tunnel simultaneously. When the first agent (Voice) connects, it establishes the TLS handshake and auth. Subsequent agents (Context, Dispatch) don’t open new connections; they borrow the existing secure channel.
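The borrowing behaviour can be sketched with asyncio. Everything named here (`FakeConnection`, `WorkflowConnections`, the workflow id) is a hypothetical stand-in rather than the production layer. Whether a real driver lets several agents issue queries over one connection concurrently depends on the database protocol and client library.

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in for a TLS database connection."""

    opened = 0  # how many times the handshake cost was paid

    @classmethod
    async def open(cls) -> "FakeConnection":
        await asyncio.sleep(0.01)  # simulated TCP + TLS + auth setup
        cls.opened += 1
        return cls()

    async def query(self, sql: str) -> str:
        await asyncio.sleep(0)  # simulated round trip
        return f"result of {sql}"


class WorkflowConnections:
    """Hands every agent in the same workflow the same connection."""

    def __init__(self) -> None:
        self._pending: dict[str, asyncio.Task] = {}

    async def get(self, workflow_id: str) -> FakeConnection:
        # The first caller starts the handshake; later callers await the same
        # task, so the setup cost is paid once even if they arrive mid-handshake.
        if workflow_id not in self._pending:
            self._pending[workflow_id] = asyncio.create_task(FakeConnection.open())
        return await self._pending[workflow_id]


async def agent(name: str, conns: WorkflowConnections, workflow_id: str) -> str:
    conn = await conns.get(workflow_id)
    return await conn.query(f"{name} lookup")


async def main() -> int:
    conns = WorkflowConnections()
    await asyncio.gather(
        *(agent(n, conns, "emergency-1") for n in ("voice", "context", "dispatch"))
    )
    return FakeConnection.opened


handshakes = asyncio.run(main())
print(f"handshakes paid for three agents: {handshakes}")
```

Storing the in-flight task rather than the finished connection is what keeps three simultaneous agents from racing to open three connections.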

2. Query Pipelining (Removing Head-of-Line Blocking)

Standard pools are FIFO (First-In, First-Out). If Agent A is slow, Agent B waits.

With asynchronous query pipelining over the shared connection, agents send their database requests into a prioritized queue on the client side. The database processes them as fast as it can, returning results out of order if necessary.

As a result, agents no longer block each other. A slow "log this event" query doesn’t delay a critical "get user location" query.
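A minimal sketch of the client-side prioritized queue, using `asyncio.PriorityQueue`. The priority levels, the `submit` and `sender` helpers, and the fake "ok" responses are assumptions made for illustration. The sketch models only the ordering of outgoing requests, not out-of-order responses from a real database.

```python
import asyncio
import itertools

CRITICAL, BACKGROUND = 0, 1   # lower number is sent first
_sequence = itertools.count() # tie-breaker so equal priorities stay in arrival order


async def submit(queue: asyncio.PriorityQueue, priority: int, sql: str) -> asyncio.Future:
    """Enqueue a query and return a future that resolves with its result."""
    future = asyncio.get_running_loop().create_future()
    await queue.put((priority, next(_sequence), sql, future))
    return future


async def sender(queue: asyncio.PriorityQueue, sent_order: list) -> None:
    """Single writer that drains the queue onto the shared connection."""
    while True:
        _priority, _seq, sql, future = await queue.get()
        sent_order.append(sql)            # stand-in for writing to the socket
        future.set_result(f"ok: {sql}")   # stand-in for the database response
        queue.task_done()


async def main() -> list:
    queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
    sent_order: list = []
    # The background query is enqueued first, the critical one second.
    log_future = await submit(queue, BACKGROUND, "log this event")
    location_future = await submit(queue, CRITICAL, "get user location")
    worker = asyncio.create_task(sender(queue, sent_order))
    await asyncio.gather(log_future, location_future)
    worker.cancel()
    return sent_order


sent_order = asyncio.run(main())
print(sent_order)
```

Although the logging query was queued first, the location query goes out ahead of it.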

3. Intelligent Lifecycle Management

We tied the connection lifecycle to the user’s emergency session rather than to any individual agent’s execution time. This prevents resource leaks while maximizing reuse during the critical window of the emergency.
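One way to express a session-scoped lifecycle is an async context manager. The sketch below is an assumption about shape, not the original implementation. `FakeConnection` and `emergency_session` are hypothetical names.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeConnection:
    """Hypothetical stand-in for the shared database connection."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


@asynccontextmanager
async def emergency_session(session_id: str):
    """Open one connection for the session and close it when the session ends."""
    conn = FakeConnection()
    try:
        yield conn
    finally:
        # Runs even if an agent raises, so the connection cannot leak.
        await conn.close()


async def agent(conn: FakeConnection, name: str) -> str:
    await asyncio.sleep(0)  # simulated work on the shared connection
    return name


async def main():
    async with emergency_session("emergency-1") as conn:
        finished = await asyncio.gather(agent(conn, "voice"), agent(conn, "dispatch"))
        open_during = not conn.closed
    return finished, open_during, conn.closed


finished, open_during, closed_after = asyncio.run(main())
print(finished, open_during, closed_after)
```

Agents come and go inside the `async with` block, but the connection is closed exactly once, when the session itself ends.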

The Impact:

| Metric | Before (Isolated) | After (Multiplexed) | Improvement |
| --- | --- | --- | --- |
| Round Trips | 120-150 per workflow | 4-7 per workflow | 95% Reduction |
| Latency Overhead | ~4,000ms per workflow | ~20ms amortized | 99.5% Reduction |
| Memory Usage | ~40MB per concurrent workflow | ~2MB per concurrent workflow | 95% Reduction |
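The improvement column follows from the before and after figures. Here is a quick check, where the helper name is illustrative:

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction from before to after, rounded to one decimal place."""
    return round((before - after) / before * 100, 1)


# Round trips: compare the low ends and the high ends of the quoted ranges.
round_trips = (reduction_pct(120, 4), reduction_pct(150, 7))
latency = reduction_pct(4000, 20)   # milliseconds per workflow
memory = reduction_pct(40, 2)       # MB per concurrent workflow

print(f"round trips: {round_trips[1]}-{round_trips[0]}% reduction")
print(f"latency:     {latency}% reduction")
print(f"memory:      {memory}% reduction")
```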

We didn’t just save time. We changed the complexity class of the system.

Before: O(agents × users × connection overhead)
After: O(active workflows × shared connections)

The Human Outcome

Engineering decisions stay abstract until they hit the road. Here’s what those milliseconds actually bought us:

  • 2-3 Minutes Faster Response: By shaving 4 seconds off every interaction, we accelerated the entire dispatch chain. In urban traffic, that’s the difference between a tow truck arriving before rush hour peaks or after.
  • 99% Fewer Failures: Connection storms cause timeouts. Timeouts cause retries. Retries cause cascading failures. By stabilizing the network layer, we stabilized the user experience.
  • Scalability Without Panic: When we hit regional disasters (spikes to 10x normal load), the system would scale linearly. We didn’t need to throw hardware at the problem; we had already solved it in software.

The Lesson: Complexity is a Choice

It’s tempting to let architecture drift. To let each microservice own its own connections, its own configs, its own chaos. It feels "clean" in the short term.

But in high-stakes systems, local decisions compound.

Treating connections as shared, high-cost primitives made the system faster and viable at scale.