The utterance Module

An utterance is a unit of speech, usually spanning from the start of speaking until a significant pause. In speech recognition, speech audio is segmented into utterances based on the level and duration of voice activity.
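This page does not say how the client detects voice activity. As a rough illustration of segmenting by "level and duration", here is a minimal sketch that assumes a simple per-frame energy threshold. The function `segment_utterances` and its parameters `threshold`, `min_pause` and `min_speech` are hypothetical and are not part of the client API. A frame counts as voiced when its energy exceeds the threshold, and an utterance closes after enough consecutive unvoiced frames.

```python
from typing import List, Optional, Tuple


def segment_utterances(energies: List[float], threshold: float,
                       min_pause: int, min_speech: int = 1) -> List[Tuple[int, int]]:
    """Hypothetical energy-based segmenter (not the client's actual method).

    Returns (start, end) frame ranges, end exclusive. A frame is voiced when
    its energy exceeds `threshold`; an utterance ends once `min_pause`
    unvoiced frames follow its last voiced frame. Utterances with fewer than
    `min_speech` voiced frames are discarded as noise.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first frame of the open utterance, if any
    last_voiced = -1             # index of the most recent voiced frame
    voiced_count = 0             # voiced frames in the open utterance
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i
                voiced_count = 0
            last_voiced = i
            voiced_count += 1
        elif start is not None and i - last_voiced >= min_pause:
            # The pause is long enough: close the utterance after its last voiced frame.
            if voiced_count >= min_speech:
                segments.append((start, last_voiced + 1))
            start = None
    # Close an utterance that is still open when the audio ends.
    if start is not None and voiced_count >= min_speech:
        segments.append((start, last_voiced + 1))
    return segments
```

For example, `segment_utterances([0, 0, 5, 6, 0, 7, 0, 0, 0, 4, 4, 0], threshold=1, min_pause=3)` yields two utterances, `(2, 6)` and `(9, 11)`. The single unvoiced frame at index 4 is too short a pause to split the first one. Real systems typically add smoothing or a trained voice-activity model on top of a scheme like this.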

An Utterance object holds the decoded data for a given audio utterance. It contains a time-ordered list of the most likely decoded events, as well as the speech recognition lattice.

Each event contains the decoded word, its start and end time stamps, and the system’s confidence that the entry is the correct decoding.

Utterance Objects

class client.utterance.Utterance(metadata=None, lattice=None)

An Utterance object holds decoded data: the time-ordered lists of words and events, and the associated client.lattice.Lattice.

words
List of decoded words.
events
Time-ordered list of UtteranceEvent objects.
lattice
Decoded speech Lattice object.
metadata
Dictionary of tags associated with this utterance.
start
Time stamp of the first contained event.
end
Time stamp of the last contained event.
text()
Return the text string for this utterance.
id()
Return the id string of this utterance in the form <source>[c<channel id>][u<utterance number>].
merge(*utts)
Merge this utterance with the given utterances. Merging several utterances in a single call is more efficient than merging them one at a time.
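The relationship between the event list and the derived members can be sketched with a small stand-in class. This is not the client's implementation. `Event`, `SketchUtterance` and `format_utterance_id` are hypothetical names, the lattice is omitted entirely, and whether the real `merge` returns a new object or modifies the utterance in place is not documented here (this sketch returns a new one). It also assumes that `end` is the end time of the last event. The sketch suggests one plausible reason a single multi-way merge is cheaper: merging k time-ordered lists at once with `heapq.merge` avoids re-copying a growing list once per utterance.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Event:
    """Hypothetical stand-in for UtteranceEvent."""
    word: str
    start: float
    end: float
    confidence: float


@dataclass
class SketchUtterance:
    """Hypothetical stand-in for Utterance; the lattice is not modelled."""
    events: List[Event] = field(default_factory=list)  # kept in time order
    metadata: Dict[str, str] = field(default_factory=dict)

    @property
    def words(self) -> List[str]:
        return [e.word for e in self.events]

    @property
    def start(self) -> float:
        # Time stamp of the first event.
        return self.events[0].start

    @property
    def end(self) -> float:
        # Assumed: end time of the last event.
        return self.events[-1].end

    def text(self) -> str:
        return " ".join(self.words)

    def merge(self, *utts: "SketchUtterance") -> "SketchUtterance":
        # One k-way merge over all already time-ordered event lists.
        lists = [self.events] + [u.events for u in utts]
        merged = list(heapq.merge(*lists, key=lambda e: e.start))
        return SketchUtterance(events=merged, metadata=dict(self.metadata))


def format_utterance_id(source: str, channel: Optional[int] = None,
                        number: Optional[int] = None) -> str:
    """Build an id in the documented form <source>[c<channel id>][u<utterance number>]."""
    uid = source
    if channel is not None:
        uid += f"c{channel}"
    if number is not None:
        uid += f"u{number}"
    return uid
```

As a usage example, merging an utterance holding "hello" (0.0 s) and "world" (0.5 s) with one holding "there" (0.45 s) gives the text `"hello there world"`. `format_utterance_id("call42", 1, 3)` gives `"call42c1u3"`, and the bracketed parts are simply left out when no channel or utterance number is given.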

UtteranceEvent Objects

class client.utterance.UtteranceEvent

Describes a single utterance event. Each event has the following attributes:

word
The word for this event.
start
The time stamp at which the event starts.
end
The time stamp at which the event ends.
confidence
The system’s level of confidence that this entry is correct.
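A common use of these four attributes is to flag words the recognizer is unsure about, together with where they occur in the audio. The sketch below uses a hypothetical `Event` named tuple in place of `client.utterance.UtteranceEvent` and a hypothetical helper `flag_low_confidence`. It assumes confidence is a number where higher means more certain. The scale (for example 0 to 1) is not stated on this page, so the threshold is illustrative only.

```python
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    """Hypothetical stand-in with the four documented UtteranceEvent attributes."""
    word: str
    start: float
    end: float
    confidence: float


def flag_low_confidence(events: Iterable[Event], min_confidence: float) -> List[str]:
    """Return 'word [start-end]' labels for events below min_confidence."""
    return [f"{e.word} [{e.start:.2f}-{e.end:.2f}]"
            for e in events if e.confidence < min_confidence]
```

With events for "turn" (confidence 0.95) and "left" (confidence 0.42, from 0.3 s to 0.6 s), `flag_low_confidence(events, 0.5)` returns `["left [0.30-0.60]"]`.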
