Notes, Voice Notes & Attachments – Upload Architecture

Reference implementation: apps/agri-frontend/src/components/field_app/Map/NoteEditorDrawer.tsx and apps/agri-frontend/src/hooks/useAudioRecorder.ts


Overview: What is a "Note"?

A Note is a geo-located observation created from the Map view. On the backend, it is stored as a Draft with source: 'map'. There is no separate Note model.

The frontend uses a lightweight type for rendering:

ts
type Note = {
  id: string;
  title: string;
  body: string;
  createdAt: number;
  lat?: number;
  lng?: number;
  status?: 'sending' | 'sent' | 'error';
  attachments?: any[];
};

On the backend, the Draft document stores: name (title), transcript (body), source: 'map', metadata.locations (coordinates), attachments (file references), and callLogIds (audio recording references).


Key Concept: Two Separate Upload Paths

Attachments and Audio Notes use completely different upload pipelines and are linked to the note differently.

|                        | Attachments (images, PDFs, docs)          | Audio Notes (voice recordings)                          |
| ---------------------- | ----------------------------------------- | ------------------------------------------------------- |
| Upload mechanism       | 3-step signed URL via /attachments/upload | 4-step signed URL via /call-logs/generate-s3-signed-url |
| Backend model          | Attachment collection                     | CallLog collection (legacy)                             |
| Post-upload processing | Confirm upload, optional analysis         | Transcription via OpenAI (BullMQ pipeline)              |
| Link to Note (Draft)   | attachments[] array                       | callLogIds[] array                                      |
| Real-time feedback     | None needed                               | WebSocket transcription_completed event                 |
| Max count              | 5 files per note                          | Unlimited recordings                                    |

Flow 1: File Attachments (images, PDFs, documents)

Uses a 3-step signed-URL pattern (no backend proxy, files go directly to S3).

Step 1 – Request upload URL

http
POST /api/attachments/upload
Content-Type: application/json
Authorization: Bearer <firebase-id-token>

{
  "fileName": "photo.jpg",
  "fileType": "image/jpeg",
  "attachmentType": "image",
  "companyId": "<company-id>",
  "draftId": "<optional>",
  "conversationId": "<optional>",
  "metadata": {
    "gps": { "latitude": 48.856, "longitude": 2.352, "altitude": 35 }
  }
}

Response:

json
{
  "attachment": { "_id": "abc123", "attachmentType": "image", ... },
  "uploadUrl": "https://s3.amazonaws.com/bucket/images/1234_photo.jpg?X-Amz-..."
}

This creates an Attachment record in MongoDB with uploadingStatus: 'uploading'.

Step 2 – Upload directly to S3

http
PUT <uploadUrl>
Content-Type: image/jpeg

<raw file bytes>

Step 3 – Confirm upload

http
POST /api/attachments/<attachment-id>/confirm-upload
Authorization: Bearer <firebase-id-token>

Marks uploadingStatus: 'completed'. For HEIC images, the backend converts them to JPEG.

Frontend preprocessing (images)

  • Images larger than 2.5 MB are downscaled to max 1600px on the longest side (JPEG at 82% quality).
  • GPS EXIF data is extracted from the original image and sent as metadata.
  • HEIC/HEIF images are converted client-side when possible.

The returned attachment._id is collected and sent in the note's attachments[] array when saving.
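The three steps can be chained in one helper. Endpoints are the ones documented above; `uploadAttachment` and the injectable `fetchFn` parameter are illustrative, not the real client code.

```typescript
// Sketch of the 3-step attachment flow; fetchFn is injectable for testing.
type UploadInit = { attachment: { _id: string }; uploadUrl: string };

export async function uploadAttachment(
  file: File,
  token: string,
  companyId: string,
  fetchFn: typeof fetch = fetch
): Promise<string> {
  // Step 1 - create the Attachment record and get a signed S3 URL
  const init: UploadInit = await fetchFn('/api/attachments/upload', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({
      fileName: file.name,
      fileType: file.type,
      attachmentType: 'image',
      companyId,
    }),
  }).then((r) => r.json());

  // Step 2 - PUT the raw bytes straight to S3 (no backend proxy)
  await fetchFn(init.uploadUrl, {
    method: 'PUT',
    headers: { 'Content-Type': file.type },
    body: file,
  });

  // Step 3 - confirm, which flips uploadingStatus to 'completed'
  await fetchFn(`/api/attachments/${init.attachment._id}/confirm-upload`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}` },
  });

  return init.attachment._id; // collected into the note's attachments[] on save
}
```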


Flow 2: Audio Notes (Voice Recordings) + Transcription

This is more complex: the audio is uploaded through a legacy CallLog pipeline and then transcribed asynchronously.

A. Recording

The web app uses the browser MediaRecorder API via the useAudioRecorder hook (apps/agri-frontend/src/hooks/useAudioRecorder.ts).

  • Auto-detects MIME type: audio/mp4 > audio/webm;codecs=opus > audio/webm > audio/ogg;codecs=opus
  • Collects chunks every 250ms
  • Creates a File with a timestamped name (e.g. audio-2026-02-26T10-30-00-000Z.m4a)

For mobile: Use the equivalent native recording API (e.g., expo-av or react-native-audio-recorder-player). Output format should be m4a (AAC) or webm (Opus) – the backend supports both.
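A minimal sketch of the web recorder setup described above. The MIME preference order, 250 ms timeslice, and filename pattern are from this doc; `startRecording` is an illustrative stand-in for the real hook.

```typescript
// Preference order from the doc; first supported type wins.
const MIME_PREFERENCE = [
  'audio/mp4',
  'audio/webm;codecs=opus',
  'audio/webm',
  'audio/ogg;codecs=opus',
];

// Pure helper with an injectable support check, so it is testable off-browser.
export function pickMimeType(isSupported: (m: string) => boolean): string | undefined {
  return MIME_PREFERENCE.find(isSupported);
}

export function startRecording(stream: MediaStream, onFile: (f: File) => void): MediaRecorder {
  const mimeType = pickMimeType((m) => MediaRecorder.isTypeSupported(m));
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => {
    if (e.data.size > 0) chunks.push(e.data);
  };
  recorder.onstop = () => {
    // Timestamped name, e.g. audio-2026-02-26T10-30-00-000Z.m4a
    const stamp = new Date().toISOString().replace(/[:.]/g, '-');
    const ext = mimeType === 'audio/mp4' ? 'm4a' : 'webm';
    onFile(new File(chunks, `audio-${stamp}.${ext}`, { type: mimeType ?? 'audio/webm' }));
  };
  recorder.start(250); // collect a chunk every 250 ms
  return recorder;
}
```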

B. Upload + Transcription Request (4 sequential steps)

When recording stops, the upload chain runs automatically:

Step 1 – Generate signed URL + create CallLog

http
POST /api/call-logs/generate-s3-signed-url
Content-Type: application/json
Authorization: Bearer <firebase-id-token>

{
  "fileName": "audio-2026-02-26T10-30-00-000Z.m4a",
  "fileType": "audio/mp4",
  "companyId": "<company-id>",
  "userId": "<user-id>",
  "source": "notes"
}

Important: source: 'notes' maps to CallSource.TRANSCRIBE_ONLY – the audio will be transcribed but NOT parsed into structured records. Use 'file_upload' for the dashboard flow that also runs AI parsing.

Response:

json
{
  "_id": "calllog-abc123",
  "signedUrl": "https://s3.amazonaws.com/bucket/audio/1234_audio.m4a?X-Amz-..."
}

Step 2 – Upload to S3

http
PUT <signedUrl>
Content-Type: audio/mp4

<raw audio bytes>

Step 3 – Update CallLog with permanent URL

http
PUT /api/call-logs/calllog-abc123/file-url
Content-Type: application/json
Authorization: Bearer <firebase-id-token>

{
  "fileUrl": "https://s3.amazonaws.com/bucket/audio/1234_audio.m4a"
}

The permanent URL is the signed URL without query parameters (strip everything after ?).

Step 4 – Request transcription

http
POST /api/transcription
Content-Type: application/json
Authorization: Bearer <firebase-id-token>

{
  "audioUrl": "https://s3.amazonaws.com/bucket/audio/1234_audio.m4a",
  "userId": "<user-id>",
  "callLogId": "calllog-abc123",
  "companyId": "<company-id>",
  "conversationId": "<optional>"
}

After this call, the audio status becomes transcribing.
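The four steps above can be sketched as one chain. Endpoints and field names are from this doc; `uploadAndTranscribe` and the injectable `fetchFn` are illustrative helper names.

```typescript
// Permanent URL = signed URL minus its query parameters (Step 3).
export function permanentUrl(signedUrl: string): string {
  return signedUrl.split('?')[0];
}

export async function uploadAndTranscribe(
  audio: File,
  opts: { token: string; companyId: string; userId: string },
  fetchFn: typeof fetch = fetch
): Promise<string> {
  const headers = { 'Content-Type': 'application/json', Authorization: `Bearer ${opts.token}` };

  // Step 1 - create the CallLog and get a signed URL ('notes' => transcribe only)
  const { _id, signedUrl } = await fetchFn('/api/call-logs/generate-s3-signed-url', {
    method: 'POST',
    headers,
    body: JSON.stringify({
      fileName: audio.name,
      fileType: audio.type,
      companyId: opts.companyId,
      userId: opts.userId,
      source: 'notes',
    }),
  }).then((r) => r.json());

  // Step 2 - upload the raw audio to S3
  await fetchFn(signedUrl, { method: 'PUT', headers: { 'Content-Type': audio.type }, body: audio });

  // Step 3 - record the permanent URL on the CallLog
  const fileUrl = permanentUrl(signedUrl);
  await fetchFn(`/api/call-logs/${_id}/file-url`, { method: 'PUT', headers, body: JSON.stringify({ fileUrl }) });

  // Step 4 - kick off the transcription pipeline
  await fetchFn('/api/transcription', {
    method: 'POST',
    headers,
    body: JSON.stringify({ audioUrl: fileUrl, userId: opts.userId, callLogId: _id, companyId: opts.companyId }),
  });

  return _id; // keep this to match the later WebSocket event by callLogId
}
```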

C. Backend Transcription Pipeline

The transcription runs asynchronously through 3 BullMQ queues:

POST /api/transcription
    │
    └─> callLogsService.queueAudioProcessing()
        │
        ▼
   ┌───────────────────┐
   │  asset.acquire    │  Downloads from S3, uploads to GCS for archival
   └─────────┬─────────┘
             │  Fan-out to 3 transcription engines (in parallel)
             │
      ┌──────┼──────────────┐
      ▼      ▼              ▼
   openai  whisper-1      qwen3
  (gpt-4o) (whisper-1)   (qwen3)
      │      │              │
      └──────┼──────────────┘
             ▼
    TranscriptionService.save()  (deduplicated via SHA256)
             │
             ▼
    ChatGateway.emitTranscriptionCompleted()  (WebSocket)

Three models run in parallel for accuracy. Results are deduplicated by SHA256 hash of the transcription text.
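The deduplication step can be illustrated in a few lines: identical transcription text from the three engines collapses to a single entry keyed by its SHA-256 hash. This is a sketch of the idea, not the actual TranscriptionService code.

```typescript
import { createHash } from 'node:crypto';

// Collapse identical transcripts to one entry, keyed by SHA-256 of the text.
export function dedupeTranscripts(texts: string[]): Map<string, string> {
  const byHash = new Map<string, string>();
  for (const text of texts) {
    const hash = createHash('sha256').update(text, 'utf8').digest('hex');
    if (!byHash.has(hash)) byHash.set(hash, text); // first occurrence wins
  }
  return byHash;
}
```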

D. WebSocket: Receiving the Transcription Result

When transcription finishes, the backend emits a Socket.io event scoped to the user's room:

Event name: 'transcription_completed'
Room: 'user:<userId>'

Payload: {
  callLogId: string;
  status: 'completed' | 'ai-processed' | 'ai-processing-failed' | 'error';
  transcript?: string;    // the transcribed text
  error?: string;
}

For mobile: Connect to the Socket.io server and listen for transcription_completed. Match incoming events by callLogId to update the correct recording.

A successful transcription has status === 'completed' or status === 'ai-processed'.

E. Transcription Text Auto-Append

In the web app, when a transcription completes, the onTranscribed callback fires and the transcript text is automatically appended to the note's description. This lets users dictate notes hands-free and watch the text appear in real time.

F. RecordedAudio State Machine

Each audio recording goes through these states:

idle → recording → processing → uploading → transcribing → transcribed
                       │                          │
                       └──→ (cancelled)           └──→ error

The RecordedAudio interface:

ts
interface RecordedAudio {
  id: string;
  fileName: string;
  blobUrl?: string;
  status: 'recording' | 'uploading' | 'transcribing' | 'transcribed' | 'error';
  error?: string;
  permanentUrl?: string;
  callLogId?: string; // set after Step 1, used to match WebSocket events
  audioFile?: File;
  transcriptionText?: string; // set when transcription completes
}
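The allowed transitions in the diagram above can be written as an explicit guard table. This is illustrative: the real hook tracks status directly on each RecordedAudio rather than through a table, and the exact error edges are an assumption based on the diagram.

```typescript
type AudioStatus =
  | 'idle' | 'recording' | 'processing' | 'uploading'
  | 'transcribing' | 'transcribed' | 'error';

// Assumed transition table derived from the state diagram above.
const TRANSITIONS: Record<AudioStatus, AudioStatus[]> = {
  idle: ['recording'],
  recording: ['processing'],
  processing: ['uploading'],              // or the recording is cancelled and dropped
  uploading: ['transcribing', 'error'],
  transcribing: ['transcribed', 'error'],
  transcribed: [],                        // terminal
  error: [],                              // terminal
};

export function canTransition(from: AudioStatus, to: AudioStatus): boolean {
  return TRANSITIONS[from].includes(to);
}
```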

Flow 3: Saving the Note (Combining Everything)

When the user presses "Send", the handleSaveNote function in MapView.tsx orchestrates the final save:

Pre-save Validation

  1. Title or description must be non-empty
  2. All audio recordings must be in transcribed or error status (wait for pending transcriptions)
  3. Max 5 file attachments

Save Sequence

  1. Upload pending file attachments via uploadMultipleFiles() (the 3-step attachment flow above)
  2. Normalize attachments into { id, type, url, name, metadata } objects
  3. Collect callLogIds from transcribed audio recordings (filter for status === 'transcribed' and truthy callLogId)
  4. Build payload and call the backend:
http
POST /api/drafts/from-map-note
Content-Type: application/json
Authorization: Bearer <firebase-id-token>

{
  "companyId": "<company-id>",
  "title": "Field observation",
  "transcript": "The crop looks healthy in the north section...",
  "attachments": [
    {
      "id": "att-abc123",
      "type": "image",
      "url": "https://s3.../images/photo.jpg",
      "name": "photo.jpg",
      "metadata": { "gps": { "latitude": 48.856, "longitude": 2.352 } }
    }
  ],
  "callLogIds": ["calllog-abc123", "calllog-def456"],
  "location": {
    "type": "Point",
    "coordinates": [2.352, 48.856],
    "properties": {
      "fieldId": "field-xyz",
      "fieldName": "North Parcel",
      "timestamp": "2026-02-26T10:30:00.000Z",
      "label": "Field observation"
    }
  }
}
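The payload assembly in the save sequence can be sketched as a pure function. Field names match the endpoint above; the `buildMapNotePayload` helper and its argument shape are illustrative, not the real handleSaveNote code.

```typescript
type RecordedAudio = { status: string; callLogId?: string };
type NormalizedAttachment = { id: string; type: string; url: string; name: string; metadata?: unknown };

export function buildMapNotePayload(args: {
  companyId: string;
  title: string;
  transcript: string;
  attachments: NormalizedAttachment[];
  recordings: RecordedAudio[];
  lat: number;
  lng: number;
}) {
  // Step 3 of the save sequence: only transcribed recordings with a callLogId count.
  const callLogIds = args.recordings
    .filter((r) => r.status === 'transcribed' && r.callLogId)
    .map((r) => r.callLogId!);

  return {
    companyId: args.companyId,
    title: args.title,
    transcript: args.transcript,
    attachments: args.attachments,
    callLogIds,
    location: {
      type: 'Point' as const,
      coordinates: [args.lng, args.lat], // GeoJSON order: [lng, lat]
      properties: { timestamp: new Date().toISOString(), label: args.title },
    },
  };
}
```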

Backend Processing

DraftService.createFromMapNote():

  1. Geo-matches coordinates against the Field collection to find which agricultural field the note belongs to
  2. Fetches full attachment documents from DB by IDs
  3. Creates the Draft with source: 'map', status: 'processing'
  4. Enqueues attachment analysis (BullMQ) for image/document processing

Offline Support (Reference)

The web app's VoiceNoteButton component (used in the chat, not in notes) has offline support via offlineQueue.ts:

  • When offline, audio blobs are stored in IndexedDB (using idb-keyval)
  • When the app comes back online, a flush function replays the queue
  • It uses a 5-second delay between uploads to avoid overwhelming the backend

For mobile: Consider implementing a similar queue using AsyncStorage or SQLite for offline note creation.
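A portable sketch of that queue, with the storage abstracted so the same logic can sit on idb-keyval, AsyncStorage, or SQLite. The key name, item shape, and `flush` signature are assumptions, not the actual offlineQueue.ts API; only the 5-second spacing is from this doc.

```typescript
// Minimal key-value store interface (idb-keyval, AsyncStorage, etc. all fit).
type KVStore = {
  get(key: string): Promise<unknown>;
  set(key: string, value: unknown): Promise<void>;
};

const QUEUE_KEY = 'pending-audio-notes'; // assumed key name
const FLUSH_DELAY_MS = 5000;             // 5 s between uploads, as in the web app

export async function enqueue(store: KVStore, item: unknown): Promise<void> {
  const queue = ((await store.get(QUEUE_KEY)) as unknown[] | undefined) ?? [];
  await store.set(QUEUE_KEY, [...queue, item]);
}

// Replays the queue in order; returns how many items were uploaded.
export async function flush(
  store: KVStore,
  upload: (item: unknown) => Promise<void>,
  delay: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms))
): Promise<number> {
  const queue = ((await store.get(QUEUE_KEY)) as unknown[] | undefined) ?? [];
  let sent = 0;
  for (const item of queue) {
    await upload(item);
    sent += 1;
    // Persist progress immediately so a crash mid-flush does not resend items.
    await store.set(QUEUE_KEY, queue.slice(sent));
    if (sent < queue.length) await delay(FLUSH_DELAY_MS);
  }
  return sent;
}
```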


Quick Reference: API Endpoints

Audio Notes

| Step | Method    | Endpoint                              | Purpose                         |
| ---- | --------- | ------------------------------------- | ------------------------------- |
| 1    | POST      | /api/call-logs/generate-s3-signed-url | Create CallLog + get signed URL |
| 2    | PUT       | <S3 signed URL>                       | Upload audio to S3              |
| 3    | PUT       | /api/call-logs/:id/file-url           | Set permanent URL on CallLog    |
| 4    | POST      | /api/transcription                    | Trigger transcription pipeline  |
| –    | WebSocket | transcription_completed               | Receive transcript result       |

File Attachments

| Step | Method | Endpoint                            | Purpose                            |
| ---- | ------ | ----------------------------------- | ---------------------------------- |
| 1    | POST   | /api/attachments/upload             | Create Attachment + get signed URL |
| 2    | PUT    | <S3 signed URL>                     | Upload file to S3                  |
| 3    | POST   | /api/attachments/:id/confirm-upload | Confirm upload complete            |

Note (Draft) Creation

| Method | Endpoint                  | Purpose                                                  |
| ------ | ------------------------- | -------------------------------------------------------- |
| POST   | /api/drafts/from-map-note | Create the note with attachments + callLogIds + location |

Architecture Diagram

┌─────────────────────────────────────────────────────────────────┐
│                        MOBILE APP                               │
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌───────────────────┐  │
│  │ Audio Record │    │ File Picker  │    │ Note Editor       │  │
│  │ (expo-av)    │    │ (images/PDF) │    │ (title + body)    │  │
│  └──────┬───────┘    └──────┬───────┘    └────────┬──────────┘  │
│         │                   │                     │             │
│         ▼                   ▼                     │             │
│  ┌──────────────┐    ┌──────────────┐             │             │
│  │ CallLog Flow │    │ Attachment   │             │             │
│  │ (4 steps)    │    │ Flow (3 step)│             │             │
│  │              │    │              │             │             │
│  │ Returns:     │    │ Returns:     │             │             │
│  │ callLogId    │    │ attachment.id│             │             │
│  └──────┬───────┘    └──────┬───────┘             │             │
│         │                   │                     │             │
│         │    WebSocket      │                     │             │
│         │◄── transcript ────│                     │             │
│         │    completed      │                     │             │
│         │                   │                     │             │
│         ▼                   ▼                     ▼             │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │              POST /api/drafts/from-map-note              │   │
│  │  { title, transcript, attachments[], callLogIds[],       │   │
│  │    location, companyId }                                 │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                        BACKEND                                  │
│                                                                 │
│  CallLog Pipeline          Attachment Pipeline                  │
│  ┌────────────────┐        ┌─────────────────┐                  │
│  │ S3 + GCS       │        │ S3 + HEIC conv  │                  │
│  │ BullMQ queues  │        │ Analysis jobs   │                  │
│  │ 3x transcribe  │        └─────────────────┘                  │
│  │ WebSocket emit │                                             │
│  └────────────────┘        Draft (source: 'map')                │
│                            ┌──────────────────┐                 │
│                            │ attachments[]    │ ← Attachment IDs│
│                            │ callLogIds[]     │ ← CallLog IDs   │
│                            │ location         │ ← GeoPoint      │
│                            │ name / transcript│                 │
│                            └──────────────────┘                 │
└─────────────────────────────────────────────────────────────────┘