Notes, Voice Notes & Attachments: Upload Architecture
Reference implementation:
apps/agri-frontend/src/components/field_app/Map/NoteEditorDrawer.tsx and apps/agri-frontend/src/hooks/useAudioRecorder.ts
Overview: What is a "Note"?
A Note is a geo-located observation created from the Map view. On the backend, it is stored as a Draft with source: 'map'. There is no separate Note model.
The frontend uses a lightweight type for rendering:
type Note = {
  id: string;
  title: string;
  body: string;
  createdAt: number;
  lat?: number;
  lng?: number;
  status?: 'sending' | 'sent' | 'error';
  attachments?: any[];
};

On the backend, the Draft document stores: name (title), transcript (body), source: 'map', metadata.locations (coordinates), attachments (file references), and callLogIds (audio recording references).
Key Concept: Two Separate Upload Paths
Attachments and Audio Notes use completely different upload pipelines and are linked to the note differently.
| | Attachments (images, PDFs, docs) | Audio Notes (voice recordings) |
|---|---|---|
| Upload mechanism | 3-step signed URL via /attachments/upload | 4-step signed URL via /call-logs/generate-s3-signed-url |
| Backend model | Attachment collection | CallLog collection (legacy) |
| Post-upload processing | Confirm upload, optional analysis | Transcription via OpenAI (BullMQ pipeline) |
| Link to Note (Draft) | attachments[] array | callLogIds[] array |
| Real-time feedback | None needed | WebSocket transcription_completed event |
| Max count | 5 files per note | Unlimited recordings |
Flow 1: File Attachments (images, PDFs, documents)
Uses a 3-step signed-URL pattern (no backend proxy, files go directly to S3).
Step 1: Request upload URL
POST /api/attachments/upload
Content-Type: application/json
Authorization: Bearer <firebase-id-token>
{
"fileName": "photo.jpg",
"fileType": "image/jpeg",
"attachmentType": "image",
"companyId": "<company-id>",
"draftId": "<optional>",
"conversationId": "<optional>",
"metadata": {
"gps": { "latitude": 48.856, "longitude": 2.352, "altitude": 35 }
}
}

Response:
{
"attachment": { "_id": "abc123", "attachmentType": "image", ... },
"uploadUrl": "https://s3.amazonaws.com/bucket/images/1234_photo.jpg?X-Amz-..."
}

This creates an Attachment record in MongoDB with uploadingStatus: 'uploading'.
Step 2: Upload directly to S3
PUT <uploadUrl>
Content-Type: image/jpeg
<raw file bytes>

Step 3: Confirm upload
POST /api/attachments/<attachment-id>/confirm-upload
Authorization: Bearer <firebase-id-token>

Marks uploadingStatus: 'completed'. For HEIC images, the backend converts them to JPEG.
Frontend preprocessing (images)
- Images larger than 2.5 MB are downscaled to a maximum of 1600 px on the longest side (JPEG at 82% quality).
- GPS EXIF data is extracted from the original image and sent as metadata.
- HEIC/HEIF images are converted client-side when possible.
The returned attachment._id is collected and sent in the note's attachments[] array when saving.
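Putting the three steps together, a minimal client helper might look like this. This is an illustrative sketch, not the real uploadMultipleFiles implementation: the fetchFn parameter, the uploadAttachment name, and the image/document attachmentType mapping are all assumptions.

```typescript
// Sketch of the 3-step attachment upload (signed-URL pattern).
// `fetchFn` mirrors the shape of the global fetch, but is injected so the
// chain can be exercised without a network. Names here are hypothetical.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: unknown },
) => Promise<{ json(): Promise<any> }>;

async function uploadAttachment(
  fetchFn: FetchLike,
  apiBase: string,
  token: string,
  file: { name: string; type: string; bytes: Uint8Array },
  companyId: string,
): Promise<{ attachmentId: string; url: string }> {
  // Step 1: request a signed URL; this also creates the Attachment record
  // with uploadingStatus: 'uploading'.
  const res = await fetchFn(`${apiBase}/api/attachments/upload`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({
      fileName: file.name,
      fileType: file.type,
      // Assumed mapping; the real attachmentType values may differ.
      attachmentType: file.type.startsWith('image/') ? 'image' : 'document',
      companyId,
    }),
  });
  const { attachment, uploadUrl } = await res.json();

  // Step 2: PUT the raw bytes straight to S3 (no backend proxy).
  await fetchFn(uploadUrl, {
    method: 'PUT',
    headers: { 'Content-Type': file.type },
    body: file.bytes,
  });

  // Step 3: confirm, flipping uploadingStatus to 'completed'.
  await fetchFn(`${apiBase}/api/attachments/${attachment._id}/confirm-upload`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}` },
  });

  // The durable URL is the signed URL minus its query string.
  return { attachmentId: attachment._id, url: uploadUrl.split('?')[0] };
}
```

Injecting the fetch function keeps the sequencing testable; in the app you would pass the global fetch (wrapped with auth) instead.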
Flow 2: Audio Notes (Voice Recordings) + Transcription
This is more complex: the audio is uploaded through a legacy CallLog pipeline and then transcribed asynchronously.
A. Recording
The web app uses the browser MediaRecorder API via the useAudioRecorder hook (apps/agri-frontend/src/hooks/useAudioRecorder.ts).
- Auto-detects a supported MIME type, in preference order: audio/mp4 > audio/webm;codecs=opus > audio/webm > audio/ogg;codecs=opus
- Collects chunks every 250 ms
- Creates a File with a timestamped name (e.g. audio-2026-02-26T10-30-00-000Z.m4a)
For mobile: Use the equivalent native recording API (e.g., expo-av or react-native-audio-recorder-player). The output format should be m4a (AAC) or webm (Opus); the backend supports both.
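The MIME preference order can be expressed as a small pure helper. The isSupported predicate stands in for the browser's MediaRecorder.isTypeSupported, injected so the selection logic also runs (and can be tested) outside a browser; the function name is illustrative:

```typescript
// Preference order as documented above:
// audio/mp4 > audio/webm;codecs=opus > audio/webm > audio/ogg;codecs=opus
const PREFERRED_MIME_TYPES = [
  'audio/mp4',
  'audio/webm;codecs=opus',
  'audio/webm',
  'audio/ogg;codecs=opus',
] as const;

// In the browser, pass MediaRecorder.isTypeSupported.bind(MediaRecorder).
function pickRecordingMimeType(
  isSupported: (mime: string) => boolean,
): string | undefined {
  // Returns the first supported type, or undefined if none match.
  return PREFERRED_MIME_TYPES.find(isSupported);
}
```

Returning undefined (rather than a hard-coded fallback) lets the caller decide whether to let MediaRecorder pick its own default.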
B. Upload + Transcription Request (4 sequential steps)
When recording stops, the upload chain runs automatically:
Step 1: Generate signed URL + create CallLog
POST /api/call-logs/generate-s3-signed-url
Content-Type: application/json
Authorization: Bearer <firebase-id-token>
{
"fileName": "audio-2026-02-26T10-30-00-000Z.m4a",
"fileType": "audio/mp4",
"companyId": "<company-id>",
"userId": "<user-id>",
"source": "notes"
}

Important: source: 'notes' maps to CallSource.TRANSCRIBE_ONLY, meaning the audio will be transcribed but NOT parsed into structured records. Use 'file_upload' for the dashboard flow that also runs AI parsing.
Response:
{
"_id": "calllog-abc123",
"signedUrl": "https://s3.amazonaws.com/bucket/audio/1234_audio.m4a?X-Amz-..."
}

Step 2: Upload to S3
PUT <signedUrl>
Content-Type: audio/mp4
<raw audio bytes>

Step 3: Update the CallLog with the permanent URL
PUT /api/call-logs/calllog-abc123/file-url
Content-Type: application/json
Authorization: Bearer <firebase-id-token>
{
"fileUrl": "https://s3.amazonaws.com/bucket/audio/1234_audio.m4a"
}

The permanent URL is the signed URL without its query parameters (strip everything after the ?).
Step 4: Request transcription
POST /api/transcription
Content-Type: application/json
Authorization: Bearer <firebase-id-token>
{
"audioUrl": "https://s3.amazonaws.com/bucket/audio/1234_audio.m4a",
"userId": "<user-id>",
"callLogId": "calllog-abc123",
"companyId": "<company-id>",
"conversationId": "<optional>"
}

After this call, the audio's status becomes transcribing.
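The four steps can be chained in one helper. This is a sketch under the request/response shapes shown above; uploadAudioNote and the injected fetchFn are illustrative names, not the actual hook internals:

```typescript
// `fetchFn` mirrors the global fetch but is injected for testability.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: unknown },
) => Promise<{ json(): Promise<any> }>;

// The permanent URL is the signed URL with its query string stripped.
const permanentUrl = (signedUrl: string): string => signedUrl.split('?')[0];

// Sketch of the 4-step audio upload chain (hypothetical helper name).
async function uploadAudioNote(
  fetchFn: FetchLike,
  apiBase: string,
  token: string,
  audio: { fileName: string; fileType: string; bytes: Uint8Array },
  ids: { companyId: string; userId: string },
): Promise<{ callLogId: string; fileUrl: string }> {
  const json = { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` };

  // Step 1: create the CallLog and get a signed S3 URL.
  const res = await fetchFn(`${apiBase}/api/call-logs/generate-s3-signed-url`, {
    method: 'POST',
    headers: json,
    body: JSON.stringify({
      fileName: audio.fileName,
      fileType: audio.fileType,
      companyId: ids.companyId,
      userId: ids.userId,
      source: 'notes', // TRANSCRIBE_ONLY: transcribe, don't AI-parse
    }),
  });
  const { _id, signedUrl } = await res.json();

  // Step 2: PUT the raw audio to S3.
  await fetchFn(signedUrl, {
    method: 'PUT',
    headers: { 'Content-Type': audio.fileType },
    body: audio.bytes,
  });

  // Step 3: store the permanent (unsigned) URL on the CallLog.
  const fileUrl = permanentUrl(signedUrl);
  await fetchFn(`${apiBase}/api/call-logs/${_id}/file-url`, {
    method: 'PUT',
    headers: json,
    body: JSON.stringify({ fileUrl }),
  });

  // Step 4: kick off the async transcription pipeline.
  await fetchFn(`${apiBase}/api/transcription`, {
    method: 'POST',
    headers: json,
    body: JSON.stringify({
      audioUrl: fileUrl,
      userId: ids.userId,
      callLogId: _id,
      companyId: ids.companyId,
    }),
  });

  return { callLogId: _id, fileUrl };
}
```

Keep the returned callLogId on the recording object; it is what you match against the WebSocket event later.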
C. Backend Transcription Pipeline
The transcription runs asynchronously through 3 BullMQ queues:
POST /api/transcription
        |
        +--> callLogsService.queueAudioProcessing()
                  |
                  v
        +-------------------+
        |   asset.acquire   |   Downloads from S3, uploads to GCS for archival
        +---------+---------+
                  |  Fan-out to 3 transcription engines (in parallel)
        +---------+---------+
        v         v         v
     openai   whisper-1   qwen3
    (gpt-4o) (whisper-1) (qwen3)
        |         |         |
        +---------+---------+
                  v
     TranscriptionService.save()   (deduplicated via SHA-256)
                  |
                  v
     ChatGateway.emitTranscriptionCompleted()   (WebSocket)

Three models run in parallel for accuracy. Results are deduplicated by the SHA-256 hash of the transcription text.
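The hash-based deduplication can be illustrated as follows. This is a minimal sketch using Node's crypto module; the real TranscriptionService.save() may hash and store results differently:

```typescript
import { createHash } from 'node:crypto';

// SHA-256 of the transcript text: identical outputs from the parallel
// engines collapse to a single stored result.
const transcriptHash = (text: string): string =>
  createHash('sha256').update(text, 'utf8').digest('hex');

// Keep the first engine's copy of each unique transcript (illustrative).
function dedupeTranscripts(
  results: { engine: string; text: string }[],
): { engine: string; text: string }[] {
  const seen = new Map<string, { engine: string; text: string }>();
  for (const r of results) {
    const h = transcriptHash(r.text);
    if (!seen.has(h)) seen.set(h, r);
  }
  return [...seen.values()];
}
```

Hashing the text (rather than comparing strings directly) gives a compact, fixed-size key for storage-level deduplication.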
D. WebSocket: Receiving the Transcription Result
When transcription finishes, the backend emits a Socket.io event scoped to the user's room:
Event name: 'transcription_completed'
Room: 'user:<userId>'
Payload: {
callLogId: string;
status: 'completed' | 'ai-processed' | 'ai-processing-failed' | 'error';
transcript?: string; // the transcribed text
error?: string;
}For mobile: Connect to the Socket.io server and listen for transcription_completed. Match incoming events by callLogId to update the correct recording.
A successful transcription has status === 'completed' or status === 'ai-processed'.
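Applying an incoming event to local recording state can be kept as a pure function, which makes the callLogId matching easy to test. The names below (applyTranscriptionEvent, Recording) are illustrative; the socket wiring itself (socket.on('transcription_completed', ...)) is omitted:

```typescript
// Payload shape taken from the event description above.
interface TranscriptionEvent {
  callLogId: string;
  status: 'completed' | 'ai-processed' | 'ai-processing-failed' | 'error';
  transcript?: string;
  error?: string;
}

// Minimal recording shape for this sketch (subset of RecordedAudio).
interface Recording {
  id: string;
  callLogId?: string;
  status: string;
  transcriptionText?: string;
  error?: string;
}

// Match the event to the right recording by callLogId and move it to
// 'transcribed' or 'error'; all other recordings are left untouched.
function applyTranscriptionEvent(
  recordings: Recording[],
  ev: TranscriptionEvent,
): Recording[] {
  const ok = ev.status === 'completed' || ev.status === 'ai-processed';
  return recordings.map((r) =>
    r.callLogId === ev.callLogId
      ? ok
        ? { ...r, status: 'transcribed', transcriptionText: ev.transcript }
        : { ...r, status: 'error', error: ev.error ?? ev.status }
      : r,
  );
}
```

In the listener you would call this reducer with the current state and re-render from its return value.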
E. Transcription Text Auto-Append
In the web app, when a transcription completes, the onTranscribed callback fires and the transcript text is appended to the note's description automatically. This lets users dictate notes hands-free and see the text appear in real time.
F. RecordedAudio State Machine
Each audio recording goes through these states:
idle -> recording -> processing -> uploading -> transcribing -> transcribed
           |                            |
           +--> (cancelled)             +--> error

The RecordedAudio interface:
interface RecordedAudio {
id: string;
fileName: string;
blobUrl?: string;
status: 'recording' | 'uploading' | 'transcribing' | 'transcribed' | 'error';
error?: string;
permanentUrl?: string;
callLogId?: string; // set after Step 1, used to match WebSocket events
audioFile?: File;
transcriptionText?: string; // set when transcription completes
}

Flow 3: Saving the Note (Combining Everything)
When the user presses "Send", the handleSaveNote function in MapView.tsx orchestrates the final save:
Pre-save Validation

- The title or the description must be non-empty
- All audio recordings must be in transcribed or error status (wait for pending transcriptions)
- A maximum of 5 file attachments
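These rules can be condensed into one guard function (a sketch, not the actual handleSaveNote logic; canSaveNote and its parameter shapes are hypothetical):

```typescript
interface NoteDraft { title: string; body: string; }
interface Rec { status: string; }

const MAX_ATTACHMENTS = 5;

// Mirrors the pre-save rules documented above.
function canSaveNote(
  note: NoteDraft,
  recordings: Rec[],
  attachmentCount: number,
): { ok: boolean; reason?: string } {
  // Rule 1: title or description must be non-empty.
  if (!note.title.trim() && !note.body.trim())
    return { ok: false, reason: 'Title or description must be non-empty' };
  // Rule 2: every recording must have finished (transcribed or error).
  if (recordings.some((r) => r.status !== 'transcribed' && r.status !== 'error'))
    return { ok: false, reason: 'Wait for pending transcriptions' };
  // Rule 3: at most 5 file attachments per note.
  if (attachmentCount > MAX_ATTACHMENTS)
    return { ok: false, reason: 'Max 5 attachments per note' };
  return { ok: true };
}
```

Returning a reason string makes it straightforward to surface the failing rule to the user instead of silently disabling the Send button.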
Save Sequence

1. Upload pending file attachments via uploadMultipleFiles() (the 3-step attachment flow above)
2. Normalize attachments into { id, type, url, name, metadata } objects
3. Collect callLogIds from transcribed audio recordings (filter for status === 'transcribed' and a truthy callLogId)
4. Build the payload and call the backend:
POST /api/drafts/from-map-note
Content-Type: application/json
Authorization: Bearer <firebase-id-token>
{
"companyId": "<company-id>",
"title": "Field observation",
"transcript": "The crop looks healthy in the north section...",
"attachments": [
{
"id": "att-abc123",
"type": "image",
"url": "https://s3.../images/photo.jpg",
"name": "photo.jpg",
"metadata": { "gps": { "latitude": 48.856, "longitude": 2.352 } }
}
],
"callLogIds": ["calllog-abc123", "calllog-def456"],
"location": {
"type": "Point",
"coordinates": [2.352, 48.856],
"properties": {
"fieldId": "field-xyz",
"fieldName": "North Parcel",
"timestamp": "2026-02-26T10:30:00.000Z",
"label": "Field observation"
}
}
}

Backend Processing
DraftService.createFromMapNote():
- Geo-matches the coordinates against the Field collection to find which agricultural field the note belongs to
- Fetches the full attachment documents from the DB by their IDs
- Creates the Draft with source: 'map', status: 'processing'
- Enqueues attachment analysis (BullMQ) for image/document processing
Offline Support (Reference)
The web app's VoiceNoteButton component (used in the chat, not in notes) has offline support via offlineQueue.ts:
- When offline, audio blobs are stored in IndexedDB (using idb-keyval)
- When the app comes back online, a flush function replays the queue
- A 5-second delay between uploads avoids overwhelming the backend
For mobile: Consider implementing a similar queue using AsyncStorage or SQLite for offline note creation.
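A storage-agnostic version of that queue might look like this. KeyValueStore is a hypothetical interface you would back with idb-keyval on web or AsyncStorage/SQLite on mobile; enqueue/flush are illustrative names, not the real offlineQueue.ts API:

```typescript
// Minimal offline-queue sketch over an injected key-value store.
interface KeyValueStore {
  get(key: string): Promise<unknown>;
  set(key: string, value: unknown): Promise<void>;
}

interface QueuedUpload { id: string; payload: unknown; }

const QUEUE_KEY = 'pending-audio-uploads';

// While offline: append the item to the persisted queue.
async function enqueue(store: KeyValueStore, item: QueuedUpload): Promise<void> {
  const queue = ((await store.get(QUEUE_KEY)) as QueuedUpload[] | undefined) ?? [];
  await store.set(QUEUE_KEY, [...queue, item]);
}

// On reconnect: replay the queue with a delay between uploads so the
// backend is not flooded (the web app uses 5 s; configurable here).
async function flush(
  store: KeyValueStore,
  upload: (item: QueuedUpload) => Promise<void>,
  delayMs = 5000,
): Promise<number> {
  const queue = ((await store.get(QUEUE_KEY)) as QueuedUpload[] | undefined) ?? [];
  let sent = 0;
  for (const item of queue) {
    await upload(item);
    sent++;
    // Persist progress after each upload so a crash mid-flush loses nothing.
    await store.set(QUEUE_KEY, queue.slice(sent));
    if (sent < queue.length) await new Promise((r) => setTimeout(r, delayMs));
  }
  return sent;
}
```

Persisting the remaining queue after every successful upload (rather than once at the end) keeps the queue crash-safe at the cost of extra writes.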
Quick Reference: API Endpoints
Audio Notes
| Step | Method | Endpoint | Purpose |
|---|---|---|---|
| 1 | POST | /api/call-logs/generate-s3-signed-url | Create CallLog + get signed URL |
| 2 | PUT | <S3 signed URL> | Upload audio to S3 |
| 3 | PUT | /api/call-logs/:id/file-url | Set permanent URL on CallLog |
| 4 | POST | /api/transcription | Trigger transcription pipeline |
| (async) | WebSocket | transcription_completed | Receive transcript result |
File Attachments
| Step | Method | Endpoint | Purpose |
|---|---|---|---|
| 1 | POST | /api/attachments/upload | Create Attachment + get signed URL |
| 2 | PUT | <S3 signed URL> | Upload file to S3 |
| 3 | POST | /api/attachments/:id/confirm-upload | Confirm upload complete |
Note (Draft) Creation
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /api/drafts/from-map-note | Create the note with attachments + callLogIds + location |
Architecture Diagram
+------------------------------------------------------------------+
|                            MOBILE APP                            |
|                                                                  |
|  +--------------+   +--------------+   +--------------------+    |
|  | Audio Record |   | File Picker  |   |    Note Editor     |    |
|  |  (expo-av)   |   | (images/PDF) |   |  (title + body)    |    |
|  +------+-------+   +------+-------+   +---------+----------+    |
|         |                  |                     |               |
|         v                  v                     |               |
|  +---------------+  +----------------+          |                |
|  | CallLog Flow  |  |  Attachment    |          |                |
|  |  (4 steps)    |  |  Flow (3 steps)|          |                |
|  |               |  |                |          |                |
|  | Returns:      |  | Returns:       |          |                |
|  |  callLogId    |  |  attachment.id |          |                |
|  +------+--------+  +------+---------+          |                |
|         |                  |                     |               |
|         |  WebSocket       |                     |               |
|   <-----+  transcription   |                     |               |
|         |  completed       |                     |               |
|         |                  |                     |               |
|         v                  v                     v               |
|  +-----------------------------------------------------------+   |
|  |              POST /api/drafts/from-map-note               |   |
|  |  { title, transcript, attachments[], callLogIds[],        |   |
|  |    location, companyId }                                  |   |
|  +-----------------------------------------------------------+   |
+------------------------------------------------------------------+
                               |
                               v
+------------------------------------------------------------------+
|                             BACKEND                              |
|                                                                  |
|  CallLog Pipeline              Attachment Pipeline               |
|  +-----------------+           +------------------+              |
|  | S3 + GCS        |           | S3 + HEIC conv   |              |
|  | BullMQ queues   |           | Analysis jobs    |              |
|  | 3x transcribe   |           +------------------+              |
|  | WebSocket emit  |                                             |
|  +-----------------+           Draft (source: 'map')             |
|                                +-------------------+             |
|            Attachment IDs ---> | attachments[]     |             |
|            CallLog IDs ------> | callLogIds[]      |             |
|            GeoPoint ---------> | location          |             |
|                                | name / transcript |             |
|                                +-------------------+             |
+------------------------------------------------------------------+