An utterance is a unit of speech, usually spanning from when a speaker starts talking until the next significant pause. In speech recognition, audio is segmented into utterances based on the level and duration of voice activity.
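The segmentation rule above can be sketched as a simple energy-based voice activity detector. This is an illustrative assumption, not the recognizer's actual segmenter: the function `segment_utterances`, the per-frame energy input, and the parameters `threshold` and `min_pause_frames` are all hypothetical. A frame counts as voiced when its energy reaches the threshold, and an utterance closes once enough consecutive unvoiced frames follow it.

```python
from typing import List, Optional, Tuple


def segment_utterances(
    frame_energy: List[float],
    threshold: float,
    min_pause_frames: int,
) -> List[Tuple[int, int]]:
    """Hypothetical energy-based segmenter.

    Returns (start, end) frame indices for each utterance, with `end`
    exclusive. A pause of at least `min_pause_frames` unvoiced frames
    ends the current utterance; shorter gaps stay inside it.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None        # first voiced frame of the open utterance
    last_voiced: Optional[int] = None  # most recent voiced frame
    for i, energy in enumerate(frame_energy):
        if energy >= threshold:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced >= min_pause_frames:
            # The pause is long enough: close the current utterance.
            segments.append((start, last_voiced + 1))
            start = None
    if start is not None:
        # The audio ended while an utterance was still open.
        segments.append((start, last_voiced + 1))
    return segments


energies = [0, 0, 5, 6, 0, 7, 0, 0, 0, 4, 4, 0]
print(segment_utterances(energies, threshold=1.0, min_pause_frames=3))
# [(2, 6), (9, 11)]
```

The single unvoiced frame at index 4 is shorter than the minimum pause, so frames 2 to 5 stay in one utterance; the three silent frames that follow close it.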
An Utterance object holds the decoded data for a single audio utterance. It contains a time-ordered list of the most likely decoded events, as well as the speech recognition lattice.
Each event contains the decoded word, its start and end time stamps, and the system’s confidence that the word is the correct decoding.
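To make the shape of an event concrete, here is a minimal stand-in record. It is a sketch only: the real `UtteranceEvent` class is not specified here, so the class name `EventSketch`, its field names, seconds as the time unit, and a confidence range of [0, 1] are all assumptions. The helper `confident_words` (also hypothetical) shows how time order and confidence might be used together.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EventSketch:
    """Hypothetical stand-in for one decoded event (not the real UtteranceEvent)."""
    word: str
    start: float       # start time stamp, assumed to be in seconds
    end: float         # end time stamp, assumed to be in seconds
    confidence: float  # assumed to lie in [0, 1]


def confident_words(events: List[EventSketch], min_confidence: float) -> List[str]:
    """Return, in time order, the words whose confidence meets the threshold."""
    ordered = sorted(events, key=lambda e: e.start)
    return [e.word for e in ordered if e.confidence >= min_confidence]


events = [
    EventSketch("world", 0.62, 1.05, 0.71),
    EventSketch("hello", 0.10, 0.55, 0.93),
]
print(confident_words(events, 0.8))  # ['hello']
```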
- class client.utterance.Utterance(metadata=None, lattice=None)
An Utterance object holds decoded data: the time-ordered lists of words and events, and the associated client.lattice.Lattice.
- words: list of decoded words.
- events: time-ordered list of UtteranceEvent objects.
- metadata: dictionary of tags associated with this utterance.
- start: time stamp of the first contained event.
- end: time stamp of the last contained event.
- text(): Return the text string for this utterance.
- id(): Return the id string of this utterance, in the form <source>[c<channel id>][u<utterance number>].
- merge(*utts): Merge this utterance with the given utterances. Merging several utterances in a single call is more efficient than merging them one at a time.
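The id format and the advice on merging can be illustrated with two small helpers. Both are hypothetical sketches, not the client library's implementation: `make_utterance_id` reads the square brackets in the id format as marking optional parts, and `merge_events` represents each event as an assumed `(start, end, word)` tuple. One plausible reason a single multi-way merge is cheaper is that `heapq.merge` walks all time-ordered inputs together, copying each event once (O(n log k) for n events in k lists), whereas merging one utterance at a time rebuilds the growing event list on every step.

```python
import heapq
from typing import List, Optional, Tuple

# Hypothetical event representation: (start, end, word).
Event = Tuple[float, float, str]


def make_utterance_id(source: str, channel: Optional[int] = None,
                      number: Optional[int] = None) -> str:
    """Build an id of the form <source>[c<channel id>][u<utterance number>].

    The bracketed parts are treated as optional and omitted when not given.
    """
    parts = [source]
    if channel is not None:
        parts.append(f"c{channel}")
    if number is not None:
        parts.append(f"u{number}")
    return "".join(parts)


def merge_events(*event_lists: List[Event]) -> List[Event]:
    """Merge several time-ordered event lists in one pass, ordered by start time."""
    return list(heapq.merge(*event_lists, key=lambda e: e[0]))


a = [(0.0, 0.4, "good"), (1.2, 1.6, "see")]
b = [(0.5, 1.0, "to")]
c = [(1.7, 2.0, "you")]
print(make_utterance_id("call42", channel=1, number=7))  # call42c1u7
print([word for _, _, word in merge_events(a, b, c)])    # ['good', 'to', 'see', 'you']
```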