MongoDB: Records vs. References
When designing Mongoose schemas in apps/agri-backend, data relationships follow two patterns: embedded documents (denormalized) and ObjectId references (normalized).
Note: The Workspace model has an isResources: boolean field. This is the key distinction: workspaces with isResources: true are "references" (resource/knowledge bases), while workspaces with isResources: false are "records" (parse results from conversations).
Embedded Documents
Used when data is hierarchical, immutable, or always read together with the parent:
// Field embeds GeoJSON geometry: always read together, never shared
@Prop({ type: Geometry, required: true })
geometry: Polygon | MultiPolygon;
// Draft embeds full message objects: a snapshot, no join needed
@Prop({ type: [DraftMessageSchema], default: [] })
messages: DraftMessage[];
// CallLog embeds asset metadata: tightly coupled to the call record
@Prop({ type: Map, of: Object })
assets: Map<string, { gcs: GcsAsset; transcriptions: Record<string, Transcription> }>;
ObjectId References
Used for cross-domain relationships where each entity has its own lifecycle:
// Draft references User: the user has an independent lifecycle
@Prop({ type: Types.ObjectId, ref: 'User' })
userId: Types.ObjectId;
// Draft references multiple CallLogs: they grow independently of the draft
@Prop({ type: [{ type: Types.ObjectId, ref: 'CallLog' }] })
callLogIds: Types.ObjectId[];
// Virtual reverse lookup: Transcription owns the foreign key
CallLogSchema.virtual('transcriptions', {
ref: 'Transcription',
localField: '_id',
foreignField: 'callLogId',
});
Decision Guide
| Use embedding when... | Use references when... |
|---|---|
| Data is a value object (GeoJSON, metadata bundle) | The sub-entity has its own lifecycle |
| Always read together with the parent | The sub-collection can grow very large |
| Never shared across documents | Cross-domain queries are needed |
| Immutable or append-only | Reverse lookups are required |
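The two columns can be compared in a small in-memory sketch. This is not Mongoose code: plain arrays and Maps stand in for collections, the sample ids and texts are made up, and the helper names resolveCallLogs and transcriptionsFor are hypothetical. It only illustrates that embedded data arrives with the parent, a forward reference costs a second lookup, and a reverse lookup filters children by the key they own.

```typescript
// In-memory stand-ins for collections. All shapes are simplified assumptions,
// not the real schemas from apps/agri-backend.
type Id = string;

interface DraftMessage { role: 'user' | 'assistant'; text: string }
interface CallLog { _id: Id }
interface Transcription { _id: Id; callLogId: Id; text: string }

interface Draft {
  _id: Id;
  userId: Id;               // reference: User has its own lifecycle
  callLogIds: Id[];         // references: CallLogs grow independently
  messages: DraftMessage[]; // embedded: snapshot read together with the draft
}

const callLogs = new Map<Id, CallLog>([
  ['c1', { _id: 'c1' }],
  ['c2', { _id: 'c2' }],
]);

const transcriptions: Transcription[] = [
  { _id: 't1', callLogId: 'c1', text: 'first segment' },
  { _id: 't2', callLogId: 'c1', text: 'second segment' },
  { _id: 't3', callLogId: 'c2', text: 'follow-up call' },
];

const draft: Draft = {
  _id: 'd1',
  userId: 'u1',
  callLogIds: ['c1', 'c2'],
  messages: [
    { role: 'user', text: 'Log the irrigation call' },
    { role: 'assistant', text: 'Saved as a draft' },
  ],
};

// Forward reference: ids must be resolved against another collection.
function resolveCallLogs(d: Draft): CallLog[] {
  return d.callLogIds
    .map((id) => callLogs.get(id))
    .filter((c): c is CallLog => c !== undefined);
}

// Reverse lookup: the child owns the foreign key, so filter children by parent id.
function transcriptionsFor(callLogId: Id): Transcription[] {
  return transcriptions.filter((t) => t.callLogId === callLogId);
}

// Embedded data needs no extra step; it is already on the parent.
console.log(draft.messages.length, resolveCallLogs(draft).length, transcriptionsFor('c1').length);
```

In a real deployment the forward and reverse lookups would typically be handled by Mongoose population rather than hand-written helpers; the sketch only shows where the extra lookup comes from.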
Domain Relationship Map
User
 ├─ userId ────────────────────────────────────┐
 │                                             │
 ▼                                             ▼
Draft                                         CallLog
 ├─ messages[] (embedded DraftMessage[])       ├─ assets (embedded GCS + transcription data)
 ├─ messageIds[] ──────────► Message           ├─ parseId ──────────► ParseResult
 └─ callLogIds[] ──────────► CallLog           └─ virtual: transcriptions ← Transcription
Field (standalone, no cross-domain refs)
 └─ geometry (embedded GeoJSON Polygon)
Note: Field and Draft have no direct schema link. They connect through the BullMQ job queue at processing time, a deliberate decoupling so that geospatial and conversational data can evolve independently.
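One way to picture this decoupling is a job payload that carries both ids, so the two documents meet only inside the worker. The sketch below is an assumption about how such a link could look, not the project's actual queue code: the ProcessJob payload, the processJob function, and the simplified Field and Draft shapes are all hypothetical, and a plain array stands in for the BullMQ queue.

```typescript
// Hypothetical shapes: the real Field, Draft, and job payload are not shown in this document.
interface Field { _id: string; geometryType: 'Polygon' | 'MultiPolygon' }
interface Draft { _id: string; messages: { text: string }[] }

// The job payload is the only place where a Field id and a Draft id appear together.
interface ProcessJob { fieldId: string; draftId: string }

// Maps stand in for the two independent collections.
const fields = new Map<string, Field>([
  ['f1', { _id: 'f1', geometryType: 'Polygon' }],
]);
const drafts = new Map<string, Draft>([
  ['d1', { _id: 'd1', messages: [{ text: 'Irrigate the north plot' }] }],
]);

// Stand-in for a queue worker: loads both documents by id at processing time.
function processJob(job: ProcessJob): { fieldId: string; draftId: string; messageCount: number } {
  const field = fields.get(job.fieldId);
  const draft = drafts.get(job.draftId);
  if (!field || !draft) {
    throw new Error(`missing document for job ${JSON.stringify(job)}`);
  }
  return { fieldId: field._id, draftId: draft._id, messageCount: draft.messages.length };
}

// A plain array plays the role of the queue here.
const queue: ProcessJob[] = [{ fieldId: 'f1', draftId: 'd1' }];
const results = queue.map(processJob);
console.log(results);
```

Because neither schema stores the other's id, either one can change shape without a migration on the other; only the payload and the worker need to agree.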