Speech-To-Text vs. Phonetics-Based Speech Analytics | CallFinder Blog


July 08, 2015 by Morgan Pulitzer - Last Updated: November 05, 2020

Over the years, call center technology has changed drastically, and the customer experience has become the main differentiator between many companies and their competitors. What does this mean?

Call centers must employ the latest technology in quality monitoring and employee training. Basic call recording software and manual call monitoring are no longer adequate. QA managers can glean valuable insights from every daily customer interaction, so manually listening to a handful of calls is both inefficient and incomplete: it is slow, and it leaves most conversations unreviewed.

Speech analytics technology has improved the call analysis process by leaps and bounds. However, it’s important to note that “speech analytics technology” is an umbrella term that includes many aspects of automated quality monitoring. With the vast number of speech analytics solutions on the market, it’s often difficult for businesses to know where to start when choosing the right one.

One place to start is to compare the technologies behind speech analytics solutions. The solution can be vocabulary/dictionary-based (speech-to-text) or phonetics-based. Let’s take a closer look at each.

Speech-To-Text Engines in Speech Analytics

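Speech-to-text technology is based on a large vocabulary continuous speech recognition (LVCSR) engine, which translates audio recordings into searchable text. This type of engine depends on a provided language model and dictionary to identify words correctly. During audio processing, the engine searches its entire dictionary for the most likely word matches, using the language model to weigh each candidate in context, so the resulting transcript preserves the complete semantic context of the call. The trade-off is that a word missing from the dictionary, such as a new product name, cannot appear in the transcript and therefore cannot be found by a text search.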
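To make the dictionary dependence concrete, here is a toy Python sketch, not a real LVCSR engine. The `transcribe()` function simply stands in for the recognizer, and the dictionary contents and the brand name "zappify" are made up for illustration. Once audio has become text, keyword search is ordinary text matching, but a word the dictionary lacks never reaches the transcript.

```python
# Toy model of why LVCSR output depends on the dictionary. The dictionary,
# the transcribe() stand-in, and the brand name "zappify" are hypothetical.

DICTIONARY = {"i", "want", "to", "cancel", "my", "account", "plan", "please"}


def transcribe(spoken_words: list[str]) -> list[str]:
    """Stand-in for an LVCSR engine: it can only emit dictionary words.

    Out-of-vocabulary words are marked "<unk>" here; a real engine would
    usually substitute the most likely in-vocabulary words instead.
    """
    return [w if w in DICTIONARY else "<unk>" for w in spoken_words]


def search_transcript(transcript: list[str], phrase: str) -> list[int]:
    """Return every word offset where the phrase appears in the transcript."""
    terms = phrase.lower().split()
    n = len(terms)
    return [i for i in range(len(transcript) - n + 1)
            if transcript[i:i + n] == terms]


transcript = transcribe("i want to cancel my zappify plan please".split())
print(search_transcript(transcript, "cancel my"))  # [3]
print(search_transcript(transcript, "zappify"))    # [] (never transcribed)
```

Searching the text is fast once transcripts exist; the cost is paid up front, when every recording is decoded against the full dictionary and language model.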

Phonetics-Based Speech Analytics Engines

Phonetic searches use phonemes, the smallest units of sound that distinguish one word from another, to identify keywords and phrases. The engine scans the original audio recordings for strings of phonemes that match the search criteria. Because it matches sounds rather than dictionary words, a phonetics-based engine can detect slang and accented speech (Southern vs. Northern accents, for example).
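A minimal sketch of phonetic keyword search, assuming the recording has already been reduced to a flat sequence of ARPAbet-style phoneme symbols (real engines typically search a richer index of alternative phoneme hypotheses with probabilities). The `LETTER_TO_SOUND` table, the function names, and the mismatch threshold are hypothetical; a real engine would derive pronunciations from spelling automatically, which is why it needs no dictionary of known words.

```python
# Minimal phonetic keyword search. Assumptions (illustrative, not a real
# product API): the recording is already a flat list of ARPAbet-style
# phonemes, and search terms are converted with a small hypothetical table
# standing in for automatic letter-to-sound rules.

LETTER_TO_SOUND = {
    "cancel": ["K", "AE", "N", "S", "AH", "L"],
    "zappify": ["Z", "AE", "P", "IH", "F", "AY"],  # made-up brand name
}


def to_phonemes(phrase: str) -> list[str]:
    """Convert a search phrase into one flat phoneme string."""
    phonemes: list[str] = []
    for word in phrase.lower().split():
        phonemes.extend(LETTER_TO_SOUND[word])
    return phonemes


def phonetic_search(stream: list[str], phrase: str,
                    max_mismatches: int = 1) -> list[tuple[int, float]]:
    """Slide the target phoneme string over the recording's phoneme stream.

    A window counts as a hit if at most `max_mismatches` phonemes differ,
    which tolerates small pronunciation shifts such as an accented vowel.
    Returns (phoneme offset, score) pairs; score is the fraction matched.
    """
    target = to_phonemes(phrase)
    n = len(target)
    hits = []
    for i in range(len(stream) - n + 1):
        mismatches = sum(a != b for a, b in zip(stream[i:i + n], target))
        if mismatches <= max_mismatches:
            hits.append((i, 1 - mismatches / n))
    return hits


# "I want to cancel my Zappify", with the vowel in "cancel" shifted (AE -> EH)
stream = "AY W AA N T T UW K EH N S AH L M AY Z AE P IH F AY".split()
print(phonetic_search(stream, "cancel"))   # one hit at offset 7, score 5/6
print(phonetic_search(stream, "zappify"))  # [(15, 1.0)]
```

The comparison is position by position (Hamming-style) for brevity: it tolerates a substituted phoneme, such as a shifted vowel, but not an inserted or dropped one, which an edit-distance comparison would handle.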

This technology does not require a language model or dictionary; its recognition unit is the phoneme. Although audio processing may be quicker, the search process may be slightly slower, because results often require manual review to confirm that the matched sounds really are the intended words.
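Near matches are part of why phonetic results can call for a human check: similar-sounding words may differ by a single phoneme. This sketch assumes ARPAbet-style phoneme input and uses illustrative thresholds and hypothetical function names. It accepts exact matches automatically and routes close matches to a review queue. In the example, "council" (K AW N S AH L) scores 5/6 against "cancel" (K AE N S AH L).

```python
# Sketch of confidence-based triage for phonetic hits. The phoneme symbols,
# thresholds, and function names are illustrative assumptions.

CANCEL = ["K", "AE", "N", "S", "AH", "L"]  # ARPAbet-style "cancel"


def match_score(window: list[str], target: list[str]) -> float:
    """Fraction of positions where the window and target phonemes agree."""
    return sum(a == b for a, b in zip(window, target)) / len(target)


def triage(stream: list[str], target: list[str], min_score: float = 0.8,
           auto_accept: float = 1.0):
    """Split candidate hits into auto-accepted ones and ones to review."""
    accepted, needs_review = [], []
    n = len(target)
    for i in range(len(stream) - n + 1):
        score = match_score(stream[i:i + n], target)
        if score >= auto_accept:
            accepted.append((i, score))
        elif score >= min_score:
            needs_review.append((i, score))  # close call: a person checks it
    return accepted, needs_review


# "the council will cancel": "council" differs from "cancel" by one vowel
stream = "DH AH K AW N S AH L W IH L K AE N S AH L".split()
accepted, needs_review = triage(stream, CANCEL)
print(accepted)      # [(11, 1.0)]
print(needs_review)  # "council" at offset 2, score 5/6
```

Raising `min_score` shrinks the review queue but risks missing accented pronunciations; lowering it catches more variants at the cost of more manual checking.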

Because the recognition unit in each engine is different, the two technologies cannot be easily compared. It’s important to remember that each approach has its own benefits, and every company has different needs (and different budgets). While a phonetics-based engine may be right for some, others may benefit more from the results provided by a speech-to-text engine.

To learn more about speech analytics technology, visit our resource center for videos, white papers, case studies, and more.