Speaker identification

Background

When Modjo analyzes calls, it uses multiple methods to determine who was on the call and when they spoke.

This information is displayed visually on the call page, allowing you to easily navigate to the relevant part of the conversation.

Participants can join a conference call in two ways:

  • From a computer: participants identify themselves with their full name or a nickname.

  • By dialing in: the conference system usually displays the participant's full or partial phone number.

Over time, Modjo learns the voices and phone numbers of registered users, which lets it identify them correctly on future calls.
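Modjo does not document how it matches a participant label to a registered user. The following is a hypothetical Python sketch of one way to do it, assuming that the conference system shows either the participant's name or the trailing digits of their phone number, and that registered numbers are stored in the same format. `RegisteredUser`, `match_participant`, and the 4-digit minimum are illustrative names and values, not part of Modjo.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegisteredUser:
    name: str
    phone: str  # hypothetical: full number as stored for the user


def digits(s: str) -> str:
    """Keep only the digits of a phone-number-like string."""
    return re.sub(r"\D", "", s)


def match_participant(
    label: str, users: List[RegisteredUser], min_digits: int = 4
) -> Optional[RegisteredUser]:
    """Hypothetical matcher from a conference-system label to a registered user.

    1. An exact (case-insensitive) name match wins.
    2. Otherwise the label is treated as a full or partial phone number,
       assumed here to show the trailing digits, and matched by suffix.
    Returns None when nothing matches or the match is ambiguous
    (e.g. a nickname, which would need another method such as the voice).
    """
    lowered = label.strip().lower()
    for user in users:
        if user.name.lower() == lowered:
            return user
    partial = digits(label)
    if len(partial) < min_digits:
        return None
    candidates = [u for u in users if digits(u.phone).endswith(partial)]
    return candidates[0] if len(candidates) == 1 else None
```

A nickname such as "Ali" returns `None` here, which is where learned voiceprints would have to take over.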

Stereo Phone Calls

When Modjo analyzes stereo phone calls, the audio arrives in two channels: one carries the recorded Modjo user and the other carries the customer.

In addition, Modjo knows which extension (or, in the more general case, which recorded Modjo user) the call is associated with.

In this case, Modjo associates one side of the call with the recorded Modjo user and the other side with the other party. This gives the highest accuracy, provided the VoIP system consistently records each party on the same channel.
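The channel-to-speaker mapping for stereo calls can be sketched as follows. This is a hypothetical illustration, not Modjo's code: it assumes the telephony integration reports which channel (0 or 1) carries the recorded user, and that speech segments have already been detected per channel. `Segment` and `label_stereo_segments` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    channel: int  # 0 or 1: the stereo channel the speech was found on
    start: float  # seconds
    end: float    # seconds


def label_stereo_segments(
    segments: List[Segment],
    user_channel: int,
    user_name: str,
    customer_label: str = "Customer",
) -> List[Tuple[float, float, str]]:
    """Label each segment: the user's channel gets the user's name,
    the other channel gets the customer label (hypothetical mapping)."""
    if user_channel not in (0, 1):
        raise ValueError("user_channel must be 0 or 1")
    return [
        (seg.start, seg.end, user_name if seg.channel == user_channel else customer_label)
        for seg in segments
    ]
```

Because the mapping is fixed per call, a VoIP system that swaps channels mid-recording would mislabel every segment after the swap, which is why consistent channel recording matters.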

Mono Phone Calls

Some VoIP systems record audio in a single channel: the audio from both parties is merged, so there is no direct way to tell who is speaking.

In this scenario, Modjo applies the following sequence:

  1. First, it splits the single-channel audio into two speaker tracks, a process known in speech processing as diarization.

  2. It then tries to determine which of the two tracks matches the voiceprint of the recorded Modjo user.

  3. Once identified, it labels that track as the recorded Modjo user known to be on the call and the other track as the customer.
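Steps 2 and 3 can be sketched as a voiceprint comparison. This hypothetical Python example starts from the output of diarization (step 1 is not shown): one averaged voice embedding per speaker track. It assumes voiceprints are numeric embedding vectors compared by cosine similarity and uses an illustrative threshold of 0.7; Modjo's actual model, features, and threshold are not documented here.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_mono_speakers(
    track_embeddings: Dict[str, List[float]],
    user_voiceprint: List[float],
    user_name: str,
    threshold: float = 0.7,  # hypothetical acceptance threshold
    customer_label: str = "Customer",
) -> Optional[Dict[str, str]]:
    """Label the diarized track closest to the user's voiceprint as the user
    and the other track as the customer. Returns None if no track is similar
    enough to the voiceprint to be confident."""
    if len(track_embeddings) != 2:
        raise ValueError("expected exactly two diarized tracks")
    scores = {tid: cosine(emb, user_voiceprint) for tid, emb in track_embeddings.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None
    return {tid: (user_name if tid == best else customer_label) for tid in track_embeddings}
```

Returning `None` below the threshold, rather than always picking the closer track, avoids confidently mislabeling a call when neither voice resembles the recorded user.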

Note

Modjo does not limit the number of speakers it can recognize on a call. However, each speaker must be identifiable either through the web-conferencing platform or by their phone number.

If several people in the same room share a single device, they all appear as a single speaker track.

This track is attributed to the person who launched the web-conferencing application or dialed in by phone, except with Zoom Rooms.

When no other means of identification is available, Modjo uses a voiceprint to identify the speaker.

Voiceprint identification applies only to the meeting host.
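The fallback order described in this note can be sketched as a small resolver for one speaker track. This is a hypothetical illustration: the field names are invented, and the order of the first two checks (platform name before phone number) is an assumption, since the note lists them as alternatives. A shared device yields one track, so it resolves to one name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Track:
    platform_name: Optional[str] = None  # name reported by the web-conferencing platform
    phone_match: Optional[str] = None    # user resolved from the dial-in phone number
    host_voice_match: bool = False       # True if the audio matches the host's voiceprint


def resolve_track_speaker(track: Track, host_name: str) -> Optional[str]:
    """Hypothetical resolution order for a single speaker track."""
    if track.platform_name:
        return track.platform_name
    if track.phone_match:
        return track.phone_match
    # Voiceprint fallback: only the meeting host can be identified this way.
    if track.host_voice_match:
        return host_name
    return None  # speaker stays unidentified
```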
