Before You Begin: How Does Transcription Work?
Modjo automatically transcribes your calls after they are imported. This process includes two steps:
Transcription: converting audio to text
Identification: identifying and assigning voices to each participant
💡 Good to know: Voice attribution is more reliable when each participant speaks at the beginning of the call. A quick round of introductions at the start helps Modjo distinguish between voices.
Case 1: The transcription doesn't appear at all
What you see: the message "⚡️ Start listening to this call while it is being transcribed ⚡️" is displayed, or the transcription is empty.
Step 1 – Check the wait time
Transcription may take a few minutes after the call ends. If your call just ended, wait 5 to 10 minutes then refresh the page.
⏱️ If the delay has passed and the transcription still doesn't appear, move on to Step 2.
Step 2 – Check that the audio is audible
Open the call in Modjo and check:
The audio player starts correctly
The sound is audible
There are at least two participants speaking from the beginning of the call
If the issue persists after these steps:
📨 Contact Modjo support via chat, providing:
The URL of the call in Modjo
A description of the problem (no sound, empty transcription, incorrect duration)
Case 2: The transcription is present but participants are incorrectly identified
How does Modjo identify participants?
Before diagnosing a problem, here is how identification works depending on the call type:
Video conferencing (Zoom, Teams, Meet…): Participants are identified by the name displayed in the application. Modjo retrieves this information directly from the calendar invitation. Over time, Modjo memorises the voices and numbers associated with registered users to automatically refine identification.
Phone calls (stereo audio): Audio is recorded on two separate channels: one channel for the Modjo user, one channel for the client. Attribution is therefore reliable as long as your VOIP system records both channels distinctly.
Phone calls (mono audio): Some VOIP systems merge both voices into a single channel. In this case, Modjo applies diarisation: it separates the voices, identifies the vocal fingerprint of the registered Modjo user, and attributes the other voice to the client.
💡 Limitation to be aware of: if several people are in the same room using a single device, they will appear under a single track, identified by the person who initiated the call. It is not possible to distinguish them individually (except for Zoom Rooms).
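If you can export a recording from your VOIP system as a WAV file, a quick local check can tell you which of the phone-call cases above applies. The sketch below is only an illustration and is not part of Modjo: the function name channel_layout is hypothetical, and it assumes a 16-bit PCM WAV file. It also flags stereo files whose two channels are identical, which suggests the voices were merged before recording.

```python
import wave
from array import array


def channel_layout(path: str) -> str:
    """Describe the channel layout of a WAV recording (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        frames = wav.readframes(wav.getnframes())

    if channels == 1:
        return "mono"
    if channels == 2 and width == 2:
        # 16-bit PCM stereo is interleaved: left, right, left, right, ...
        samples = array("h", frames)
        if samples[0::2] == samples[1::2]:
            # Both channels carry the same signal: effectively mono.
            return "stereo file, identical channels"
        return "stereo"
    return f"{channels} channels"
```

A result of "stereo" means the two channels differ, which is what reliable attribution needs. "mono" or "stereo file, identical channels" means voice separation has to rely on diarisation.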
2a – Voices are swapped or incorrectly attributed
Modjo has detected several participants correctly, but the names do not match the right voices.
How to fix: you can swap participants directly from the call page, without contacting support.
2b – A participant is missing from the list
Modjo only detects people present in the calendar invitation. A participant not included in the invitation or who has not accepted the event on their calendar cannot be automatically linked.
Two specific cases:
A participant joined by phone number: they will appear with their partial or full number, not their name. Modjo will associate them with their profile once this number is known in the database.
Several people in the same room on a single device: they will be grouped under a single track, that of the person who initiated the call.
📨 If you encounter this problem, contact Modjo support via chat, providing:
The URL of the call in Modjo
A description of the problem (specify who the missing person is and who their speech was attributed to)
Case 3: Topics Are Not Detected
Two possible reasons:
No Topic was identified in the conversation: Topics are generated from the keywords and categories you have configured. If no configured keyword was spoken, no Topic appears.
The transcription rendered the word differently: the algorithm may interpret a word differently when it is spoken unusually quickly, unusually slowly, or with a strong accent. For example, "Thomas" spoken slowly may be transcribed as "Tom as", and will therefore not match the Topic "Thomas" you have defined.
💡 Tip: if an important Topic is never detected, check in your settings that spelling or phonetic variants of the word are properly added.
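To see why a variant helps, here is a minimal sketch of keyword matching against a transcript. It is an assumption made for illustration, not Modjo's actual detection logic: the function find_topics and the dictionary layout are hypothetical, and matching is a simple case-insensitive whole-phrase search.

```python
import re


def find_topics(transcript: str, topics: dict[str, list[str]]) -> set[str]:
    """Return the names of topics whose keyword or variant appears in the text."""
    text = transcript.lower()
    found = set()
    for name, keywords in topics.items():
        for keyword in keywords:
            # Match the keyword as a whole word or phrase, ignoring case.
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            if re.search(pattern, text):
                found.add(name)
                break
    return found


transcript = "Yesterday Tom as called about pricing"

# Only the exact spelling is configured: nothing matches.
print(find_topics(transcript, {"Thomas": ["Thomas"]}))

# The phonetic variant is configured too: the Topic is found.
print(find_topics(transcript, {"Thomas": ["Thomas", "Tom as"]}))
```

The first call returns an empty set because "Tom as" never equals "Thomas". Once the variant is listed alongside the original keyword, the same transcript matches.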
👉 Article on topics here
Case 4: The Call Language Is Incorrectly Identified
Modjo runs an automatic language detection algorithm on each call. In the vast majority of cases this detection is reliable, but the identified language is occasionally incorrect.
Two main reasons:
No audio: the recording contains no sound, so the algorithm cannot analyse the language.
Poor audio quality: significant background noise disrupts detection. For example, if all participants are in the same room sharing a single speakerphone, the algorithm may identify the wrong language even if the conversation took place normally.
💡 The source of the problem is almost always linked to the quality of the audio input. The clearer the audio and the more distinct the speakers, the more precise the detection.
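If you have access to the original recording as a WAV file, you can roughly check whether it contains any usable sound before contacting support. The sketch below is only an assumption-laden illustration and not something Modjo provides: rms_level, looks_silent, and the SILENCE_THRESHOLD value are hypothetical, and the code only handles 16-bit PCM files.

```python
import math
import sys
import wave
from array import array

# Hypothetical cut-off: below 0.1% of full scale we treat the file as silent.
SILENCE_THRESHOLD = 0.001


def rms_level(path: str) -> float:
    """Return the RMS amplitude of a 16-bit PCM WAV file, from 0.0 to 1.0."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())

    samples = array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768


def looks_silent(path: str, threshold: float = SILENCE_THRESHOLD) -> bool:
    """Return True when the overall level is below the chosen threshold."""
    return rms_level(path) < threshold
```

A level close to zero points to the "no audio" case. A normal level with a wrong language points instead to noise or overlapping speakers, which this simple measure cannot distinguish.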
Example: The following call took place in French but was identified as English
📨 If the language is incorrectly identified on a call, contact Modjo support via chat, providing:
The URL of the call in Modjo
The actual language of the call
A description of the audio context (speaker, background noise, several people in the same room, etc.)





