Voice Communications

A Turning Point

April 14, 2025 · 5 min read

I recently spent time in London, engaging in thought-provoking roundtable discussions about the latest trends in surveillance, the challenges facing conduct and culture agendas in financial services, and numerous one-on-one coffee meetings that revealed what's truly happening within compliance teams across the city.

One topic that has notably evolved since my deep dive last January is voice communications surveillance, particularly the application of AI in this space.

From Skepticism to Optimism

Last year, industry conversations were dominated by pessimism. Technology vendors had overpromised and underdelivered. The AI solutions didn't meet senior management's expectations for cost savings within their desired timeframes. More than anything, there was a palpable sense of fatigue with the lack of progress.

This sentiment has shifted significantly. London was buzzing with success stories of firms finally turning the corner and seeing tangible results from their technology investments. There's been a liberating move away from seeking a single tool to solve all problems. Instead, firms are now adopting best-in-class tools for specific challenges while leveraging internal development to achieve better results tailored to their unique risks and data.

Most teams are implementing a hybrid approach to electronic communications surveillance—a strategy many have been developing over the past year or two. The difference now is that firms can demonstrate this approach is working for them, with declining false positive rates and improved risk detection.

Transcription and Translation: The Energy Centres

The most energizing conversations centered around transcription and translation capabilities. The insights from the industry and the results being achieved are worth sharing.

Voice-to-Text Transcription

For several years, increased scrutiny of voice surveillance coverage has prompted firms to intensify their efforts in this area. Initially, they faced the typical challenges of establishing a new surveillance program:

  • Data sets lacked sufficient quality, availability, and accuracy

  • Available technology couldn't perform surveillance efficiently enough for teams to manage the output

  • Detection capabilities weren't effective enough to identify actual risk

As remediation efforts build out voice coverage programs, data capture and archives improve, data lineage standards are applied, and the technology continues to develop. The industry has largely concluded that the optimal approach for voice surveillance is voice-to-text transcription, which lets voice communications use the existing electronic communications tools and programs for detection, with some retraining needed for this communication type.

This approach makes sense if your transcription quality supports acceptable detection in the electronic communications format. During my deep dives in 2023 and last year, and at roundtables in early 2024, this was a major frustration point for many: transcription accuracy wasn't reaching the levels needed for high-quality surveillance review.

This year, I've heard reports of improved transcription accuracy, some of them remarkable. With smiles, practitioners mentioned consistent accuracy rates in the high 90% range for voice-to-text conversion. Several vendors have addressed issues with their third-party transcription service providers, while others have built in-house solutions to deliver sufficient quality. The result appears to be a significant leap forward in market confidence in transcription reliability.

Bridging Transcription and Translation

Before exploring multilingual coverage challenges, it's worth noting that many firms have opted to adapt electronic communications surveillance strategies for voice coverage by converting voice files into text.

Depending on your electronic communications surveillance tool, this may be an English-only strategy, meaning all non-English voice communications need translation (along with all written communications). This is where translation AI becomes the next critical solution for voice surveillance efforts.

Multilingual Surveillance Coverage

This has been one of surveillance's most persistent challenges. The fundamental reality is that very few international financial services firms operate exclusively in English. People conducting business locally generally communicate, at least partially, in their local language. Where this isn't English, communications in other languages naturally occur. This makes perfect business sense and is often essential for relationship-building and promoting business activity. Surveillance must work across these languages, regardless of any controls implemented to limit non-English communication.

Traditionally, surveillance teams needed detection capabilities in each language, requiring dedicated lexicons (or models) for each. They needed teams of linguists or access to compliance personnel with relevant language skills to review communications. They lacked strategies for blended communications where multiple languages appear in a single conversation. They also faced limited funding to build comprehensive coverage, especially when a particular language might represent only a small percentage of total communications—perhaps 4%—yet require almost the same effort as the primary English program to build.

Firms have employed various translation tools for years, but accuracy concerns persisted, with even some market-leading technology reporting only 65-75% accuracy a few years ago, varying further depending on the language.

The Compound Results

The industry has long aspired to achieve compound accuracy of transcribed and translated voice communications data high enough to perform effective surveillance. What's new is that real results are now being reported in the market. Not just from a select few with limited language coverage, but from institutions handling a wide range of languages, who report accuracy rates from 85% (for character-based languages) to 95+% (for transcribed Latin-based languages).
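To see why both stages matter, here is a rough back-of-the-envelope sketch. It assumes that transcription and translation errors are independent, which is a simplification (real errors often correlate), so that compound accuracy is approximately the product of the two stage accuracies. The function name and the input figures are illustrative assumptions, not results reported by any firm.

```python
def compound_accuracy(transcription_acc: float, translation_acc: float) -> float:
    """Approximate end-to-end accuracy, assuming the two stages err independently."""
    for name, value in (("transcription_acc", transcription_acc),
                        ("translation_acc", translation_acc)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return transcription_acc * translation_acc


# Illustrative inputs only: the same transcription quality paired with
# a weaker and a stronger translation stage.
for stt, mt in [(0.97, 0.70), (0.97, 0.95)]:
    combined = compound_accuracy(stt, mt)
    print(f"transcription {stt:.0%} x translation {mt:.0%} -> compound {combined:.1%}")
```

Under this assumption the weaker stage caps the combined figure, which is why strong transcription alone does not deliver effective multilingual coverage.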

This shift over the past year is significant and is driving more conversations between firms and their technology providers and in-house teams, creating a self-perpetuating push for continual refinement.

Some firms haven't reached this point yet—several participants in my discussions were seeking more information or facing ongoing challenges, and many are looking for new technology partners to deliver these results. The notable change is the more hopeful, positive, and less frustrated atmosphere among surveillance leaders and practitioners regarding voice surveillance capabilities.

Looking Forward: Key Questions for 2025

For surveillance practitioners reviewing and seeking to upgrade their programs in 2025, there are some key questions to ask beyond simply which electronic communications surveillance vendor and tool you'll use.

Establish what technology your vendor uses for transcription, and what its translation capability is. The main surveillance vendors often outsource this functionality, yet it is the foundation of your multilingual voice surveillance strategy. There are market leaders in these two critical areas of AI technology, and the choice of provider will determine the success (or otherwise) of your voice communications program. Expect to assess both of these components carefully and test them separately.
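One common way to test the transcription component on its own is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the vendor's transcript into a human-verified reference, divided by the length of the reference. Word accuracy is then often quoted as 1 minus WER. Which metric a given vendor or firm actually reports is an assumption here, so ask them. The sketch below is a minimal implementation with a made-up sample sentence; translation quality needs its own measure and is not covered.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the transcripts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: human-verified reference versus a machine transcript.
reference = "please move the order before the close"
machine = "please move the order for the close"
wer = word_error_rate(reference, machine)
print(f"WER: {wer:.1%}, word accuracy: {1 - wer:.1%}")
```

Running this on a sample of your own calls, split by language, desk, and audio quality, gives a baseline that is independent of whatever headline accuracy figure a vendor quotes.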

As a consultant who regularly navigates these complex considerations with financial institutions, I understand the challenges of building a robust, future-proof surveillance program. If you're looking to enhance your voice communications surveillance capabilities or would like to discuss how these emerging technologies might integrate with your existing framework, I'd be happy to share further insights.

Reach out for a conversation about your specific needs and how the latest advancements might benefit your organization.
