Notes, Voice Notes & Attachments: Upload Architecture
Reference implementation:
apps/agri-frontend/src/components/field_app/Map/NoteEditorDrawer.tsx and apps/agri-frontend/src/hooks/useAudioRecorder.ts
Overview: What is a "Note"?
A Note is a geo-located observation created from the Map view. On the backend, it is stored as a Draft with source: 'map'. There is no separate Note model.
The frontend uses a lightweight type for rendering:
type Note = {
  id: string;
  title: string;
  body: string;
  createdAt: number;
  lat?: number;
  lng?: number;
  status?: 'sending' | 'sent' | 'error';
  attachments?: any[];
};

On the backend, the Draft document stores: name (title), transcript (body), source: 'map', metadata.locations (coordinates), attachments (file references), and callLogIds (audio recording references).
Key Concept: Two Separate Upload Paths
Attachments and Audio Notes use completely different upload pipelines and are linked to the note differently.
| | Attachments (images, PDFs, docs) | Audio Notes (voice recordings) |
|---|---|---|
| Upload mechanism | 3-step signed URL via /attachments/upload | 4-step signed URL via /call-logs/generate-s3-signed-url |
| Backend model | Attachment collection | CallLog collection (legacy) |
| Post-upload processing | Confirm upload, optional analysis | Transcription via OpenAI (BullMQ pipeline) |
| Link to Note (Draft) | attachments[] array | callLogIds[] array |
| Real-time feedback | None needed | WebSocket transcription_completed event |
| Max count | 5 files per note | Unlimited recordings |
Flow 1: File Attachments (images, PDFs, documents)
Uses a 3-step signed-URL pattern (no backend proxy, files go directly to S3).
Step 1: Request upload URL
POST /api/attachments/upload
Content-Type: application/json
Authorization: Bearer <firebase-id-token>
{
"fileName": "photo.jpg",
"fileType": "image/jpeg",
"attachmentType": "image",
"companyId": "<company-id>",
"draftId": "<optional>",
"conversationId": "<optional>",
"metadata": {
"gps": { "latitude": 48.856, "longitude": 2.352, "altitude": 35 }
}
}

Response:
{
"attachment": { "_id": "abc123", "attachmentType": "image", ... },
"uploadUrl": "https://s3.amazonaws.com/bucket/images/1234_photo.jpg?X-Amz-..."
}

This creates an Attachment record in MongoDB with uploadingStatus: 'uploading'.
Step 2: Upload directly to S3
PUT <uploadUrl>
Content-Type: image/jpeg
<raw file bytes>

Step 3: Confirm upload
POST /api/attachments/<attachment-id>/confirm-upload
Authorization: Bearer <firebase-id-token>

Marks uploadingStatus: 'completed'. For HEIC images, the backend converts them to JPEG.
Frontend preprocessing (images)
- Images larger than 2.5 MB are downscaled to a maximum of 1600 px on the longest side (JPEG at 82% quality).
- GPS EXIF data is extracted from the original image and sent as metadata.
- HEIC/HEIF images are converted client-side when possible.
The returned attachment._id is collected and sent in the note's attachments[] array when saving.
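Putting the three steps together, a minimal client helper might look like this. This is an illustrative sketch, not the real uploadMultipleFiles implementation: the fetchFn parameter, the uploadAttachment name, and the image/document attachmentType mapping are all assumptions.

```typescript
// Sketch of the 3-step attachment upload (signed-URL pattern).
// `fetchFn` mirrors the shape of the global fetch, but is injected so the
// chain can be exercised without a network. Names here are hypothetical.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: unknown },
) => Promise<{ json(): Promise<any> }>;

async function uploadAttachment(
  fetchFn: FetchLike,
  apiBase: string,
  token: string,
  file: { name: string; type: string; bytes: Uint8Array },
  companyId: string,
): Promise<{ attachmentId: string; url: string }> {
  // Step 1: request a signed URL; this also creates the Attachment record
  // with uploadingStatus: 'uploading'.
  const res = await fetchFn(`${apiBase}/api/attachments/upload`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({
      fileName: file.name,
      fileType: file.type,
      // Assumed mapping; the real attachmentType values may differ.
      attachmentType: file.type.startsWith('image/') ? 'image' : 'document',
      companyId,
    }),
  });
  const { attachment, uploadUrl } = await res.json();

  // Step 2: PUT the raw bytes straight to S3 (no backend proxy).
  await fetchFn(uploadUrl, {
    method: 'PUT',
    headers: { 'Content-Type': file.type },
    body: file.bytes,
  });

  // Step 3: confirm, flipping uploadingStatus to 'completed'.
  await fetchFn(`${apiBase}/api/attachments/${attachment._id}/confirm-upload`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}` },
  });

  // The durable URL is the signed URL minus its query string.
  return { attachmentId: attachment._id, url: uploadUrl.split('?')[0] };
}
```

Injecting the fetch function keeps the sequencing testable; in the app you would pass the global fetch (wrapped with auth) instead.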
Flow 2: Audio Notes (Voice Recordings) + Transcription
This is more complex: the audio is uploaded through a legacy CallLog pipeline and then transcribed asynchronously.
A. Recording
The web app uses the browser MediaRecorder API via the useAudioRecorder hook (apps/agri-frontend/src/hooks/useAudioRecorder.ts).
- Auto-detects a supported MIME type, in preference order: audio/mp4 > audio/webm;codecs=opus > audio/webm > audio/ogg;codecs=opus
- Collects chunks every 250 ms
- Creates a File with a timestamped name (e.g. audio-2026-02-26T10-30-00-000Z.m4a)
For mobile: Use the equivalent native recording API (e.g., expo-av or react-native-audio-recorder-player). The output format should be m4a (AAC) or webm (Opus); the backend supports both.
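The MIME preference order can be expressed as a small pure helper. The isSupported predicate stands in for the browser's MediaRecorder.isTypeSupported, injected so the selection logic also runs (and can be tested) outside a browser; the function name is illustrative:

```typescript
// Preference order as documented above:
// audio/mp4 > audio/webm;codecs=opus > audio/webm > audio/ogg;codecs=opus
const PREFERRED_MIME_TYPES = [
  'audio/mp4',
  'audio/webm;codecs=opus',
  'audio/webm',
  'audio/ogg;codecs=opus',
] as const;

// In the browser, pass MediaRecorder.isTypeSupported.bind(MediaRecorder).
function pickRecordingMimeType(
  isSupported: (mime: string) => boolean,
): string | undefined {
  // Returns the first supported type, or undefined if none match.
  return PREFERRED_MIME_TYPES.find(isSupported);
}
```

Returning undefined (rather than a hard-coded fallback) lets the caller decide whether to let MediaRecorder pick its own default.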
B. Upload + Transcription Request (4 sequential steps)
When recording stops, the upload chain runs automatically:
Step 1: Generate signed URL + create CallLog
POST /api/call-logs/generate-s3-signed-url
Content-Type: application/json
Authorization: Bearer <firebase-id-token>
{
"fileName": "audio-2026-02-26T10-30-00-000Z.m4a",
"fileType": "audio/mp4",
"companyId": "<company-id>",
"userId": "<user-id>",
"source": "notes"
}

Important: source: 'notes' maps to CallSource.TRANSCRIBE_ONLY, meaning the audio will be transcribed but NOT parsed into structured records. Use 'file_upload' for the dashboard flow that also runs AI parsing.
Response:
{
"_id": "calllog-abc123",
"signedUrl": "https://s3.amazonaws.com/bucket/audio/1234_audio.m4a?X-Amz-..."
}

Step 2: Upload to S3
PUT <signedUrl>
Content-Type: audio/mp4
<raw audio bytes>

Step 3: Update the CallLog with the permanent URL
PUT /api/call-logs/calllog-abc123/file-url
Content-Type: application/json
Authorization: Bearer <firebase-id-token>
{
"fileUrl": "https://s3.amazonaws.com/bucket/audio/1234_audio.m4a"
}

The permanent URL is the signed URL without its query parameters (strip everything after the ?).
Step 4: Request transcription
POST /api/transcription
Content-Type: application/json
Authorization: Bearer <firebase-id-token>
{
"audioUrl": "https://s3.amazonaws.com/bucket/audio/1234_audio.m4a",
"userId": "<user-id>",
"callLogId": "calllog-abc123",
"companyId": "<company-id>",
"conversationId": "<optional>"
}

After this call, the audio's status becomes transcribing.
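The four steps can be chained in one helper. This is a sketch under the request/response shapes shown above; uploadAudioNote and the injected fetchFn are illustrative names, not the actual hook internals:

```typescript
// `fetchFn` mirrors the global fetch but is injected for testability.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: unknown },
) => Promise<{ json(): Promise<any> }>;

// The permanent URL is the signed URL with its query string stripped.
const permanentUrl = (signedUrl: string): string => signedUrl.split('?')[0];

// Sketch of the 4-step audio upload chain (hypothetical helper name).
async function uploadAudioNote(
  fetchFn: FetchLike,
  apiBase: string,
  token: string,
  audio: { fileName: string; fileType: string; bytes: Uint8Array },
  ids: { companyId: string; userId: string },
): Promise<{ callLogId: string; fileUrl: string }> {
  const json = { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` };

  // Step 1: create the CallLog and get a signed S3 URL.
  const res = await fetchFn(`${apiBase}/api/call-logs/generate-s3-signed-url`, {
    method: 'POST',
    headers: json,
    body: JSON.stringify({
      fileName: audio.fileName,
      fileType: audio.fileType,
      companyId: ids.companyId,
      userId: ids.userId,
      source: 'notes', // TRANSCRIBE_ONLY: transcribe, don't AI-parse
    }),
  });
  const { _id, signedUrl } = await res.json();

  // Step 2: PUT the raw audio to S3.
  await fetchFn(signedUrl, {
    method: 'PUT',
    headers: { 'Content-Type': audio.fileType },
    body: audio.bytes,
  });

  // Step 3: store the permanent (unsigned) URL on the CallLog.
  const fileUrl = permanentUrl(signedUrl);
  await fetchFn(`${apiBase}/api/call-logs/${_id}/file-url`, {
    method: 'PUT',
    headers: json,
    body: JSON.stringify({ fileUrl }),
  });

  // Step 4: kick off the async transcription pipeline.
  await fetchFn(`${apiBase}/api/transcription`, {
    method: 'POST',
    headers: json,
    body: JSON.stringify({
      audioUrl: fileUrl,
      userId: ids.userId,
      callLogId: _id,
      companyId: ids.companyId,
    }),
  });

  return { callLogId: _id, fileUrl };
}
```

Keep the returned callLogId on the recording object; it is what you match against the WebSocket event later.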
C. Backend Transcription Pipeline
The transcription runs asynchronously through 3 BullMQ queues:
POST /api/transcription
        |
        +--> callLogsService.queueAudioProcessing()
                  |
                  v
        +-------------------+
        |   asset.acquire   |   Downloads from S3, uploads to GCS for archival
        +---------+---------+
                  |  Fan-out to 3 transcription engines (in parallel)
        +---------+---------+
        v         v         v
     openai   whisper-1   qwen3
    (gpt-4o) (whisper-1) (qwen3)
        |         |         |
        +---------+---------+
                  v
     TranscriptionService.save()   (deduplicated via SHA-256)
                  |
                  v
     ChatGateway.emitTranscriptionCompleted()   (WebSocket)

Three models run in parallel for accuracy. Results are deduplicated by the SHA-256 hash of the transcription text.
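The hash-based deduplication can be illustrated as follows. This is a minimal sketch using Node's crypto module; the real TranscriptionService.save() may hash and store results differently:

```typescript
import { createHash } from 'node:crypto';

// SHA-256 of the transcript text: identical outputs from the parallel
// engines collapse to a single stored result.
const transcriptHash = (text: string): string =>
  createHash('sha256').update(text, 'utf8').digest('hex');

// Keep the first engine's copy of each unique transcript (illustrative).
function dedupeTranscripts(
  results: { engine: string; text: string }[],
): { engine: string; text: string }[] {
  const seen = new Map<string, { engine: string; text: string }>();
  for (const r of results) {
    const h = transcriptHash(r.text);
    if (!seen.has(h)) seen.set(h, r);
  }
  return [...seen.values()];
}
```

Hashing the text (rather than comparing strings directly) gives a compact, fixed-size key for storage-level deduplication.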
D. WebSocket: Receiving the Transcription Result
When transcription finishes, the backend emits a Socket.io event scoped to the user's room:
Event name: 'transcription_completed'
Room: 'user:<userId>'
Payload: {
callLogId: string;
status: 'completed' | 'ai-processed' | 'ai-processing-failed' | 'error';
transcript?: string; // the transcribed text
error?: string;
}For mobile: Connect to the Socket.io server and listen for transcription_completed. Match incoming events by callLogId to update the correct recording.
A successful transcription has status === 'completed' or status === 'ai-processed'.
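Applying an incoming event to local recording state can be kept as a pure function, which makes the callLogId matching easy to test. The names below (applyTranscriptionEvent, Recording) are illustrative; the socket wiring itself (socket.on('transcription_completed', ...)) is omitted:

```typescript
// Payload shape taken from the event description above.
interface TranscriptionEvent {
  callLogId: string;
  status: 'completed' | 'ai-processed' | 'ai-processing-failed' | 'error';
  transcript?: string;
  error?: string;
}

// Minimal recording shape for this sketch (subset of RecordedAudio).
interface Recording {
  id: string;
  callLogId?: string;
  status: string;
  transcriptionText?: string;
  error?: string;
}

// Match the event to the right recording by callLogId and move it to
// 'transcribed' or 'error'; all other recordings are left untouched.
function applyTranscriptionEvent(
  recordings: Recording[],
  ev: TranscriptionEvent,
): Recording[] {
  const ok = ev.status === 'completed' || ev.status === 'ai-processed';
  return recordings.map((r) =>
    r.callLogId === ev.callLogId
      ? ok
        ? { ...r, status: 'transcribed', transcriptionText: ev.transcript }
        : { ...r, status: 'error', error: ev.error ?? ev.status }
      : r,
  );
}
```

In the listener you would call this reducer with the current state and re-render from its return value.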
E. Transcription Text Auto-Append
In the web app, when a transcription completes, the onTranscribed callback fires and the transcript text is appended to the note's description automatically. This lets users dictate notes hands-free and see the text appear in real time.
F. RecordedAudio State Machine
Each audio recording goes through these states:
idle -> recording -> processing -> uploading -> transcribing -> transcribed
           |                            |
           +--> (cancelled)             +--> error

The RecordedAudio interface:
interface RecordedAudio {
id: string;
fileName: string;
blobUrl?: string;
status: 'recording' | 'uploading' | 'transcribing' | 'transcribed' | 'error';
error?: string;
permanentUrl?: string;
callLogId?: string; // set after Step 1, used to match WebSocket events
audioFile?: File;
transcriptionText?: string; // set when transcription completes
}

Flow 3: Saving the Note (Combining Everything)
When the user presses "Send", the handleSaveNote function in MapView.tsx orchestrates the final save:
Pre-save Validation

- The title or the description must be non-empty
- All audio recordings must be in transcribed or error status (wait for pending transcriptions)
- A maximum of 5 file attachments
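These rules can be condensed into one guard function (a sketch, not the actual handleSaveNote logic; canSaveNote and its parameter shapes are hypothetical):

```typescript
interface NoteDraft { title: string; body: string; }
interface Rec { status: string; }

const MAX_ATTACHMENTS = 5;

// Mirrors the pre-save rules documented above.
function canSaveNote(
  note: NoteDraft,
  recordings: Rec[],
  attachmentCount: number,
): { ok: boolean; reason?: string } {
  // Rule 1: title or description must be non-empty.
  if (!note.title.trim() && !note.body.trim())
    return { ok: false, reason: 'Title or description must be non-empty' };
  // Rule 2: every recording must have finished (transcribed or error).
  if (recordings.some((r) => r.status !== 'transcribed' && r.status !== 'error'))
    return { ok: false, reason: 'Wait for pending transcriptions' };
  // Rule 3: at most 5 file attachments per note.
  if (attachmentCount > MAX_ATTACHMENTS)
    return { ok: false, reason: 'Max 5 attachments per note' };
  return { ok: true };
}
```

Returning a reason string makes it straightforward to surface the failing rule to the user instead of silently disabling the Send button.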
Save Sequence

1. Upload pending file attachments via uploadMultipleFiles() (the 3-step attachment flow above)
2. Normalize attachments into { id, type, url, name, metadata } objects
3. Collect callLogIds from transcribed audio recordings (filter for status === 'transcribed' and a truthy callLogId)
4. Build the payload and call the backend:
POST /api/drafts/from-map-note
Content-Type: application/json
Authorization: Bearer <firebase-id-token>
{
"companyId": "<company-id>",
"title": "Field observation",
"transcript": "The crop looks healthy in the north section...",
"attachments": [
{
"id": "att-abc123",
"type": "image",
"url": "https://s3.../images/photo.jpg",
"name": "photo.jpg",
"metadata": { "gps": { "latitude": 48.856, "longitude": 2.352 } }
}
],
"callLogIds": ["calllog-abc123", "calllog-def456"],
"location": {
"type": "Point",
"coordinates": [2.352, 48.856],
"properties": {
"fieldId": "field-xyz",
"fieldName": "North Parcel",
"timestamp": "2026-02-26T10:30:00.000Z",
"label": "Field observation"
}
}
}

Backend Processing
DraftService.createFromMapNote():
- Geo-matches the coordinates against the Field collection to find which agricultural field the note belongs to
- Fetches the full attachment documents from the DB by their IDs
- Creates the Draft with source: 'map', status: 'processing'
- Enqueues attachment analysis (BullMQ) for image/document processing
Offline Support (Reference)
The web app's VoiceNoteButton component (used in the chat, not in notes) has offline support via offlineQueue.ts:
- When offline, audio blobs are stored in IndexedDB (using idb-keyval)
- When the app comes back online, a flush function replays the queue
- A 5-second delay between uploads avoids overwhelming the backend
For mobile: Consider implementing a similar queue using AsyncStorage or SQLite for offline note creation.
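A storage-agnostic version of that queue might look like this. KeyValueStore is a hypothetical interface you would back with idb-keyval on web or AsyncStorage/SQLite on mobile; enqueue/flush are illustrative names, not the real offlineQueue.ts API:

```typescript
// Minimal offline-queue sketch over an injected key-value store.
interface KeyValueStore {
  get(key: string): Promise<unknown>;
  set(key: string, value: unknown): Promise<void>;
}

interface QueuedUpload { id: string; payload: unknown; }

const QUEUE_KEY = 'pending-audio-uploads';

// While offline: append the item to the persisted queue.
async function enqueue(store: KeyValueStore, item: QueuedUpload): Promise<void> {
  const queue = ((await store.get(QUEUE_KEY)) as QueuedUpload[] | undefined) ?? [];
  await store.set(QUEUE_KEY, [...queue, item]);
}

// On reconnect: replay the queue with a delay between uploads so the
// backend is not flooded (the web app uses 5 s; configurable here).
async function flush(
  store: KeyValueStore,
  upload: (item: QueuedUpload) => Promise<void>,
  delayMs = 5000,
): Promise<number> {
  const queue = ((await store.get(QUEUE_KEY)) as QueuedUpload[] | undefined) ?? [];
  let sent = 0;
  for (const item of queue) {
    await upload(item);
    sent++;
    // Persist progress after each upload so a crash mid-flush loses nothing.
    await store.set(QUEUE_KEY, queue.slice(sent));
    if (sent < queue.length) await new Promise((r) => setTimeout(r, delayMs));
  }
  return sent;
}
```

Persisting the remaining queue after every successful upload (rather than once at the end) keeps the queue crash-safe at the cost of extra writes.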
Quick Reference: API Endpoints
Audio Notes
| Step | Method | Endpoint | Purpose |
|---|---|---|---|
| 1 | POST | /api/call-logs/generate-s3-signed-url | Create CallLog + get signed URL |
| 2 | PUT | <S3 signed URL> | Upload audio to S3 |
| 3 | PUT | /api/call-logs/:id/file-url | Set permanent URL on CallLog |
| 4 | POST | /api/transcription | Trigger transcription pipeline |
| (async) | WebSocket | transcription_completed | Receive transcript result |
File Attachments
| Step | Method | Endpoint | Purpose |
|---|---|---|---|
| 1 | POST | /api/attachments/upload | Create Attachment + get signed URL |
| 2 | PUT | <S3 signed URL> | Upload file to S3 |
| 3 | POST | /api/attachments/:id/confirm-upload | Confirm upload complete |
Note (Draft) Creation
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /api/drafts/from-map-note | Create the note with attachments + callLogIds + location |
Architecture Diagram
+------------------------------------------------------------------+
|                            MOBILE APP                            |
|                                                                  |
|  +--------------+   +--------------+   +--------------------+    |
|  | Audio Record |   | File Picker  |   |    Note Editor     |    |
|  |  (expo-av)   |   | (images/PDF) |   |  (title + body)    |    |
|  +------+-------+   +------+-------+   +---------+----------+    |
|         |                  |                     |               |
|         v                  v                     |               |
|  +---------------+  +----------------+          |                |
|  | CallLog Flow  |  |  Attachment    |          |                |
|  |  (4 steps)    |  |  Flow (3 steps)|          |                |
|  |               |  |                |          |                |
|  | Returns:      |  | Returns:       |          |                |
|  |  callLogId    |  |  attachment.id |          |                |
|  +------+--------+  +------+---------+          |                |
|         |                  |                     |               |
|         |  WebSocket       |                     |               |
|   <-----+  transcription   |                     |               |
|         |  completed       |                     |               |
|         |                  |                     |               |
|         v                  v                     v               |
|  +-----------------------------------------------------------+   |
|  |              POST /api/drafts/from-map-note               |   |
|  |  { title, transcript, attachments[], callLogIds[],        |   |
|  |    location, companyId }                                  |   |
|  +-----------------------------------------------------------+   |
+------------------------------------------------------------------+
                               |
                               v
+------------------------------------------------------------------+
|                             BACKEND                              |
|                                                                  |
|  CallLog Pipeline              Attachment Pipeline               |
|  +-----------------+           +------------------+              |
|  | S3 + GCS        |           | S3 + HEIC conv   |              |
|  | BullMQ queues   |           | Analysis jobs    |              |
|  | 3x transcribe   |           +------------------+              |
|  | WebSocket emit  |                                             |
|  +-----------------+           Draft (source: 'map')             |
|                                +-------------------+             |
|            Attachment IDs ---> | attachments[]     |             |
|            CallLog IDs ------> | callLogIds[]      |             |
|            GeoPoint ---------> | location          |             |
|                                | name / transcript |             |
|                                +-------------------+             |
+------------------------------------------------------------------+