September 11, 2012 by Morgan Pulitzer - Last Updated: June 05, 2019

There are three types of technology used to power speech analytics: speech-to-text, phonetics-based, and direct phrase recognition.

1. Speech-to-text technology is based on a large vocabulary continuous speech recognition (LVCSR) engine, which translates audio recordings into searchable text. Speech-to-text relies on a language model and a dictionary to identify words correctly.

2. Phonetics-based speech analytics solutions scan call recordings for strings of phonemes (the smallest units of sound that make up language) that match the search phrases defined by the user. Phonetics-based technology does not require a language model or dictionary, largely because the set of units it must recognize is very small: most languages have only a few dozen distinct phonemes. The recognizer outputs a stream of phonemes as text, which the application then searches.

3. Direct phrase recognition works by analyzing speech for specific phrases that have been pre-defined in the application.
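
To show what "searchable text" gives the speech-to-text approach (item 1), here is a minimal Python sketch. It assumes an LVCSR engine has already produced plain-text transcripts and builds a simple inverted index over them. The call IDs and transcripts are made up, and the helper names (`build_index`, `search_phrase`) are hypothetical. This is one way to make transcripts searchable, not how any particular product stores them.

```python
from collections import defaultdict


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the IDs of the calls whose transcript contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for call_id, text in transcripts.items():
        for word in text.lower().split():
            index[word].add(call_id)
    return dict(index)


def contains_sequence(tokens: list[str], target: list[str]) -> bool:
    """True if target appears as a contiguous run inside tokens."""
    k = len(target)
    return any(tokens[i:i + k] == target for i in range(len(tokens) - k + 1))


def search_phrase(transcripts: dict[str, str], index: dict[str, set[str]], phrase: str) -> list[str]:
    """Return the sorted IDs of calls whose transcript contains the exact phrase."""
    words = phrase.lower().split()
    if not words:
        return []
    # Use the index to keep only calls that contain every word of the phrase...
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    # ...then confirm that the words appear together and in order.
    return sorted(
        call_id for call_id in candidates
        if contains_sequence(transcripts[call_id].lower().split(), words)
    )


# Hypothetical transcripts standing in for LVCSR output.
transcripts = {
    "call-001": "I would like to cancel my account",
    "call-002": "can I speak to a manager please",
    "call-003": "my account number is wrong",
}
index = build_index(transcripts)
print(search_phrase(transcripts, index, "my account"))      # ['call-001', 'call-003']
print(search_phrase(transcripts, index, "cancel account"))  # []
```

Because the index is built from the transcript, a search like this can only find words that the engine's dictionary allowed it to transcribe in the first place.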

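The search step of the phonetics-based approach (item 2) can be sketched as a scan over the recognized phoneme stream. This minimal Python example assumes the recognizer has already emitted ARPAbet-style phoneme symbols as text. It also assumes the query phrase's phoneme spelling is known; here it is written by hand. The function name `find_phoneme_matches` and its `max_mismatches` tolerance for misrecognized phonemes are hypothetical simplifications. They do not describe how any commercial engine scores matches.

```python
from typing import Sequence


def find_phoneme_matches(
    stream: Sequence[str], query: Sequence[str], max_mismatches: int = 0
) -> list[int]:
    """Return start offsets where query occurs in stream, allowing up to
    max_mismatches substituted phonemes."""
    k = len(query)
    if k == 0:
        return []
    matches = []
    for start in range(len(stream) - k + 1):
        window = stream[start:start + k]
        # Count positions where the recognized phoneme differs from the query phoneme.
        mismatches = sum(1 for a, b in zip(window, query) if a != b)
        if mismatches <= max_mismatches:
            matches.append(start)
    return matches


# Hand-written phoneme stream for "please cancel my account" (assumed recognizer output).
stream = "P L IY Z K AE N S AH L M AY AH K AW N T".split()
cancel = "K AE N S AH L".split()
print(find_phoneme_matches(stream, cancel))  # [4]

# The same audio with one phoneme misrecognized (AE heard as AA).
noisy = "P L IY Z K AA N S AH L M AY AH K AW N T".split()
print(find_phoneme_matches(noisy, cancel))                    # []
print(find_phoneme_matches(noisy, cancel, max_mismatches=1))  # [4]
```

Searching a few dozen symbol types this way never needs a word dictionary. The trade-off is that the application must decide how closely a stretch of phonemes has to match before it counts as a hit.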

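Item 3 does not say how pre-defined phrases are matched. One classic way to spot a small, fixed set of phrases directly in audio, without transcribing everything else, is template matching. The Python sketch below uses subsequence dynamic time warping (DTW) for this. DTW is a swapped-in technique chosen for illustration, not necessarily what any direct phrase recognition product uses. The sketch assumes each pre-defined phrase is stored as a template of per-frame feature values, reduced here to single made-up numbers; real acoustic features are multi-dimensional. The phrase names, feature values, threshold, and function names are all hypothetical.

```python
import math
from typing import Sequence


def subsequence_dtw_cost(template: Sequence[float], signal: Sequence[float]) -> float:
    """Lowest DTW cost of aligning the whole template to any contiguous stretch of the signal."""
    if not template or not signal:
        raise ValueError("template and signal must be non-empty")
    # prev[j]: cost of the best alignment of the template frames processed so far
    # that ends at signal frame j - 1. Row 0 is all zeros, so a match may start anywhere.
    prev = [0.0] * (len(signal) + 1)
    for t in template:
        curr = [math.inf] * (len(signal) + 1)
        for j in range(1, len(signal) + 1):
            dist = abs(t - signal[j - 1])  # local distance between two 1-D feature values
            # Come from prev[j - 1] (both advance), prev[j] (template advances,
            # same signal frame) or curr[j - 1] (signal advances, same template frame).
            curr[j] = dist + min(prev[j - 1], prev[j], curr[j - 1])
        prev = curr
    # A match may also end anywhere in the signal.
    return min(prev[1:])


def spot_phrases(
    templates: dict[str, Sequence[float]], signal: Sequence[float], threshold: float
) -> list[str]:
    """Return the pre-defined phrases whose best match cost is at or below threshold."""
    return sorted(
        name for name, template in templates.items()
        if subsequence_dtw_cost(template, signal) <= threshold
    )


# Hypothetical phrase templates and incoming audio features.
templates = {
    "cancel my account": [1.0, 5.0, 5.0, 2.0],
    "speak to a manager": [4.0, 1.0, 1.0, 4.0],
}
signal = [0.0, 0.0, 1.0, 5.0, 5.0, 5.0, 2.0, 0.0, 0.0]
print(spot_phrases(templates, signal, threshold=0.5))  # ['cancel my account']
```

DTW lets a template stretch or compress in time. In this example the second `5.0` template frame absorbs an extra `5.0` signal frame, so a phrase spoken at a slightly different speed can still match its template.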