# Queues & Async
Long-running or error-prone work doesn't belong in a request handler. It belongs in a queue. This is factor VIII of our architecture: scale out via the process model, not by making individual requests do more.
We use BullMQ backed by Redis. Workers are independent processes: they can be scaled, restarted, or rate-limited without touching the API.
## The problem with doing work in the request
A request handler has a hard contract: respond quickly, don't fail the user because a downstream service is slow. When you do real work inside a request:
- If the external API (OpenAI, GCS) is slow, the user waits or times out
- If the service restarts mid-operation, the work is lost with no way to retry
- If traffic spikes, every request does the full work: no backpressure, no throttling
The same applies to in-process alternatives:
| Pattern | Problem |
|---|---|
| EventEmitter for mutations | In-process: if the service crashes, the event is gone. No retry, no trace. |
| MongoDB change streams | Listener is in-process: a restart loses every missed event forever. |
| `@Cron` / NestJS scheduler | Every running instance fires the job. Two instances = duplicate emails, duplicate reports. |
The rule: if the outcome matters and it can fail, it goes in a queue.
## What queues give you
- Retries with backoff: a failed job is retried automatically, not silently dropped
- Backpressure: a rate limiter on the worker protects external APIs from being hammered
- Chaining: a processor enqueues the next job, so a multi-step pipeline (receive → transcribe → parse) retries each step independently
- Observability: every job has an ID, a state, a history; Bull Board shows what's running, waiting, or failed
- Horizontal scaling: add workers for the slow queues, leave the fast ones alone
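To make "retries with backoff" concrete: with BullMQ's built-in exponential strategy, the wait before retry *n* is `delay * 2^(n-1)`. A small helper of our own (not part of BullMQ) that computes the schedule:

```typescript
// Wait times between attempts for a job configured with
// { attempts, backoff: { type: 'exponential', delay } } in BullMQ.
// attempts counts the first try, so there are (attempts - 1) retries.
export function backoffSchedule(attempts: number, delayMs: number): number[] {
  return Array.from({ length: attempts - 1 }, (_, i) => delayMs * 2 ** i);
}

// A job with attempts: 5 and delay: 1000 waits 1s, 2s, 4s, 8s between tries:
console.log(backoffSchedule(5, 1000)); // [ 1000, 2000, 4000, 8000 ]
```

A transient outage that lasts a few seconds is absorbed entirely by the first couple of retries, with no code in the processor itself.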
## Queues in this codebase
The audio pipeline is the clearest example of why this matters:
`call-log.receive` → `asset.acquire` → `transcribe.openai` → `transcribe.parse`

Each step is independently retryable. A transient OpenAI timeout retries just the transcription, not the entire pipeline from the start.
Other domains follow the same pattern (chat message delivery, attachment analysis, schema audits, Slack notifications): anything that touches an external service or takes more than a few milliseconds.
## Why BullMQ and not Pub/Sub or RabbitMQ
Different brokers solve different problems. We use BullMQ because our jobs are tightly coupled to the backend: they need NestJS DI and MongoDB access, and Redis is already in the stack. It's the right tool for task queues within a single service.
| | BullMQ | Pub/Sub (GCP) | RabbitMQ |
|---|---|---|---|
| Best for | Background jobs within a service | Cross-service event fan-out at scale | Complex routing across many services |
| Delivery | At-least-once (exactly-once with Redis lock) | At-least-once (exactly-once available on pull) | At-least-once with manual ack; at-most-once with auto-ack |
| Model | Pull (workers poll Redis) | Push or pull (per subscription) | Push (broker pushes to subscribed consumers) |
| Persistence | Redis | Google-managed | Broker-managed |
| Retries | Built-in, per-job | Ack/nack, per-subscription | Ack/nack, per-consumer |
| Routing | Queue name | Topic + subscription filters | Exchanges + binding keys |
| Overhead | Low (Redis already present) | Managed, no infra to run | Requires a broker to operate |
If we ever need to fan events out to multiple independent services (e.g. a data pipeline consuming the same call log events as the backend), Pub/Sub would be the right addition, not a replacement.
Further reading: BullMQ docs · Cloud Pub/Sub overview · RabbitMQ tutorials
## Related
- Stateless Services: why work must leave the process
- Scheduling: repeatable jobs with Redis locking
- Bootstrap jobs: one-time setup jobs