Using Audio Transformations to Improve Comprehension in Voice Question Answering


Many popular form factors of digital assistants, such as Amazon Echo or Google Home, enable users to converse with speech-based systems. The lack of a screen presents unique challenges: to satisfy users' information needs, the presentation of answers has to be optimized for voice-only interaction. We evaluate the usefulness of audio transformations (i.e., prosodic modifications) for voice-only question answering. We introduce a crowdsourcing setup that evaluates the quality of our proposed modifications along multiple dimensions: the informativeness of the answer, its naturalness, and the ability of users to identify its key parts. We propose a set of prosodic modifications that highlight potentially important parts of the answer using various acoustic cues. Our experiments show that different modifications lead to better comprehension at the expense of slightly degraded naturalness of the audio.

CLEF'19 Proceedings of the Conference and Labs of the Evaluation Forum