# Queues & Async
Long-running or error-prone work doesn't belong in a request handler. It belongs in a queue. This is factor VIII of our architecture: scale out via the process model, not by making individual requests do more.
We use BullMQ backed by Redis. Workers are independent processes: they can be scaled, restarted, or rate-limited without touching the API.
## The problem with doing work in the request
A request handler has a hard contract: respond quickly, don't fail the user because a downstream service is slow. When you do real work inside a request:
- If the external API (OpenAI, GCS) is slow, the user waits or times out
- If the service restarts mid-operation, the work is lost with no way to retry
- If traffic spikes, every request does the full work: no backpressure, no throttling
The same applies to in-process alternatives:
| Pattern | Problem |
|---|---|
| EventEmitter for mutations | In-process: if the service crashes, the event is gone. No retry, no trace. |
| MongoDB change streams | Listener is in-process: a restart loses every missed event forever. |
| `@Cron` / NestJS scheduler | Every running instance fires the job. Two instances = duplicate emails, duplicate reports. |
The rule: if the outcome matters and it can fail, it goes in a queue.
## What queues give you
- Retries with backoff: a failed job is retried automatically, not silently dropped
- Backpressure: a rate limiter on the worker protects external APIs from being hammered
- Chaining: a processor enqueues the next job, so a multi-step pipeline (receive → transcribe → parse) retries each step independently
- Observability: every job has an ID, a state, a history; Bull Board shows what's running, waiting, or failed
- Horizontal scaling: add workers for the slow queues, leave the fast ones alone
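To make "retries with backoff" concrete: with BullMQ's built-in exponential strategy, the wait before retry *n* is `delay * 2^(n-1)`. A small helper of our own (not part of BullMQ) that computes the schedule:

```typescript
// Wait times between attempts for a job configured with
// { attempts, backoff: { type: 'exponential', delay } } in BullMQ.
// attempts counts the first try, so there are (attempts - 1) retries.
export function backoffSchedule(attempts: number, delayMs: number): number[] {
  return Array.from({ length: attempts - 1 }, (_, i) => delayMs * 2 ** i);
}

// A job with attempts: 5 and delay: 1000 waits 1s, 2s, 4s, 8s between tries:
console.log(backoffSchedule(5, 1000)); // [ 1000, 2000, 4000, 8000 ]
```

A transient outage that lasts a few seconds is absorbed entirely by the first couple of retries, with no code in the processor itself.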
## Queues in this codebase
The audio pipeline is the clearest example of why this matters:
`call-log.receive` → `asset.acquire` → `transcribe.openai` → `transcribe.parse`

Each step is independently retryable. A transient OpenAI timeout retries just the transcription, not the entire pipeline from the start.
Other domains follow the same pattern (chat message delivery, attachment analysis, schema audits, Slack notifications): anything that touches an external service or takes more than a few milliseconds.
## Why BullMQ and not Pub/Sub or RabbitMQ
Different brokers solve different problems. We use BullMQ because our jobs are tightly coupled to the backend: they need NestJS DI and MongoDB access, and Redis is already in the stack. It's the right tool for task queues within a single service.
| | BullMQ | Pub/Sub (GCP) | RabbitMQ |
|---|---|---|---|
| Best for | Background jobs within a service | Cross-service event fan-out at scale | Complex routing across many services |
| Delivery | At-least-once (exactly-once with Redis lock) | At-least-once (exactly-once available on pull) | At-least-once with manual ack; at-most-once with auto-ack |
| Model | Pull (workers poll Redis) | Push or pull (per subscription) | Push (broker pushes to subscribed consumers) |
| Persistence | Redis | Google-managed | Broker-managed |
| Retries | Built-in, per-job | Ack/nack, per-subscription | Ack/nack, per-consumer |
| Routing | Queue name | Topic + subscription filters | Exchanges + binding keys |
| Overhead | Low (Redis already present) | Managed, no infra to run | Requires a broker to operate |
If we ever need to fan events out to multiple independent services (e.g. a data pipeline consuming the same call log events as the backend), Pub/Sub would be the right addition, not a replacement.
Further reading: BullMQ docs · Cloud Pub/Sub overview · RabbitMQ tutorials
## Related
- Stateless Services: why work must leave the process
- Scheduling: repeatable jobs with Redis locking
- Bootstrap jobs: one-time setup jobs