
πŸ—„οΈ MongoDB: Records vs. References ​

Mongoose schemas in apps/agri-backend model data relationships with two patterns: embedded documents (denormalized) and ObjectId references (normalized).

Note: The Workspace model has an isResources: boolean field, and it is the key distinction. Workspaces with isResources: true are "references" (resource/knowledge bases), while those with isResources: false are "records" (parse results from conversations).
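
As a minimal sketch of that distinction, the flag can be mapped to a label in plain TypeScript. The `WorkspaceLike` type and the `workspaceKind` helper are hypothetical names used only for illustration; `isResources` is the one name taken from the model described above.

```typescript
// Hypothetical minimal shape: only `isResources` comes from the Workspace model.
interface WorkspaceLike {
  isResources: boolean;
}

type WorkspaceKind = 'reference' | 'record';

// isResources: true  -> "reference" (resource/knowledge base)
// isResources: false -> "record" (parse result from a conversation)
function workspaceKind(ws: WorkspaceLike): WorkspaceKind {
  return ws.isResources ? 'reference' : 'record';
}
```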

Embedded Documents

Used when data is hierarchical, immutable, or always read together with the parent:

typescript
import { Prop } from '@nestjs/mongoose';

// Excerpts from three separate schema classes (Field, Draft, CallLog).

// Field embeds GeoJSON geometry: always read together, never shared
@Prop({ type: Geometry, required: true })
geometry: Polygon | MultiPolygon;

// Draft embeds full message objects: a snapshot, no join needed
@Prop({ type: [DraftMessageSchema], default: [] })
messages: DraftMessage[];

// CallLog embeds asset metadata: tightly coupled to the call record
@Prop({ type: Map, of: Object })
assets: Map<string, { gcs: GcsAsset; transcriptions: Record<string, Transcription> }>;

ObjectId References

Used for cross-domain relationships where each entity has its own lifecycle:

typescript
import { Prop } from '@nestjs/mongoose';
import { Types } from 'mongoose';

// Draft references User: the user has an independent lifecycle
@Prop({ type: Types.ObjectId, ref: 'User' })
userId: Types.ObjectId;

// Draft references multiple CallLogs: they grow independently
@Prop({ type: [{ type: Types.ObjectId, ref: 'CallLog' }] })
callLogIds: Types.ObjectId[];

// Virtual reverse lookup: Transcription owns the foreign key
CallLogSchema.virtual('transcriptions', {
  ref: 'Transcription',
  localField: '_id',
  foreignField: 'callLogId',
});
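
To make the reverse lookup concrete, here is a plain TypeScript sketch of what resolving that virtual amounts to: matching each CallLog's `_id` against `callLogId` on Transcription. It works on in-memory arrays with string ids rather than Mongoose documents. The `CallLogDoc` and `TranscriptionDoc` types, the `text` field, and the `attachTranscriptions` helper are assumptions made for illustration.

```typescript
// Hypothetical simplified shapes; real documents use ObjectId and more fields.
interface CallLogDoc {
  _id: string;
}

interface TranscriptionDoc {
  _id: string;
  callLogId: string; // foreign key owned by Transcription
  text: string;      // illustrative payload field
}

// Groups transcriptions by callLogId, then attaches each group to its CallLog.
function attachTranscriptions(
  callLogs: CallLogDoc[],
  transcriptions: TranscriptionDoc[],
): Array<CallLogDoc & { transcriptions: TranscriptionDoc[] }> {
  const byCallLog = new Map<string, TranscriptionDoc[]>();
  for (const t of transcriptions) {
    const bucket = byCallLog.get(t.callLogId) ?? [];
    bucket.push(t);
    byCallLog.set(t.callLogId, bucket);
  }
  // A CallLog without matches gets an empty array, not undefined.
  return callLogs.map((c) => ({ ...c, transcriptions: byCallLog.get(c._id) ?? [] }));
}
```

In Mongoose itself, a virtual like this is typically resolved by calling `.populate('transcriptions')` on a CallLog query, so the grouping above happens in the library rather than in application code.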

Decision Guide

| Use embedding when... | Use references when... |
| --- | --- |
| Data is a value object (GeoJSON, metadata bundle) | The sub-entity has its own lifecycle |
| Always read together with the parent | The sub-collection can grow very large |
| Never shared across documents | Cross-domain queries are needed |
| Immutable or append-only | Reverse lookups are required |
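
The guide can also be read as a rule of thumb in code. The sketch below is one possible encoding, not part of the codebase. The `RelationshipTraits` shape and `chooseStrategy` function are hypothetical, and treating any single reference-side trait as decisive is an assumption about how to weigh the rows.

```typescript
// Hypothetical checklist mirroring the "Use references when..." column,
// plus sharing (the inverse of "never shared across documents").
interface RelationshipTraits {
  hasOwnLifecycle: boolean;
  canGrowVeryLarge: boolean;
  sharedAcrossDocuments: boolean;
  needsCrossDomainQueries: boolean;
  needsReverseLookups: boolean;
}

type Strategy = 'embed' | 'reference';

// Assumption: any one reference-side trait is enough to choose a reference.
function chooseStrategy(t: RelationshipTraits): Strategy {
  const needsReference =
    t.hasOwnLifecycle ||
    t.canGrowVeryLarge ||
    t.sharedAcrossDocuments ||
    t.needsCrossDomainQueries ||
    t.needsReverseLookups;
  return needsReference ? 'reference' : 'embed';
}
```

Under this encoding, a GeoJSON geometry (no traits set) comes out as `'embed'`, while a Draft's link to User (own lifecycle) comes out as `'reference'`, matching the examples earlier in this page.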

Domain Relationship Map

User
 ├─ userId ──────────────────────────────────────────────┐
 │                                                       │
 ▼                                                       ▼
Draft                                              CallLog
 ├─ messages[]  (embedded DraftMessage[])           ├─ assets  (embedded GCS + transcription data)
 ├─ messageIds[] ──────────► Message                ├─ parseId ──────────► ParseResult
 └─ callLogIds[] ──────────► CallLog                └─ virtual: transcriptions ← Transcription

Field (standalone, no cross-domain refs)
 └─ geometry (embedded GeoJSON Polygon)

Note: Field and Draft have no direct schema link. They connect through the BullMQ job queue at processing time, a deliberate decoupling so that geospatial and conversational data can evolve independently.
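
One way to picture this decoupling is a job payload that carries both ids, so the two schemas never reference each other. This is a sketch under stated assumptions, not the project's actual job definition. The `ProcessDraftJob` shape, its field names, and the `buildProcessDraftJob` helper are all hypothetical, and no BullMQ call is shown.

```typescript
// Hypothetical payload: the only place a Draft id and a Field id meet.
interface ProcessDraftJob {
  draftId: string;
  fieldId: string;
}

// Builds the payload a producer might enqueue; rejects missing ids up front.
function buildProcessDraftJob(draftId: string, fieldId: string): ProcessDraftJob {
  if (!draftId || !fieldId) {
    throw new Error('draftId and fieldId are both required');
  }
  return { draftId, fieldId };
}
```

With a payload like this, neither the Draft schema nor the Field schema needs a `ref` to the other; a worker would load each document by id when the job runs.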