Abstract This chapter describes information modeling and integration for an embedded audio-visual speech recognition system, aimed at improving speech recognition in the adverse noisy environment of an automobile. In particular, we employ lip-reading as an added feature for enhanced speech recognition. Lip motion features are extracted by active shape models, and the corresponding hidden Markov models are constructed for lip-reading. To realize efficient hidden Markov models, a tied-mixture technique is introduced for both the visual and the acoustic information. It makes the model structure simple and small while maintaining suitable recognition performance. In the decoding process, the audio-visual information is integrated into the state output probabilities of the hidden Markov model as multistream features. Each stream is weighted according to the signal-to-noise ratio so that the visual information becomes more dominant under adverse noisy conditions in an automobile. Representative experimental results demonstrate that the audio-visual speech recognition system achieves promising performance in adverse noisy conditions, making it suitable for embedded devices.
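The SNR-dependent stream weighting described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the function names (`stream_weights`, `combined_log_prob`), the SNR range, and the linear interpolation scheme are all assumptions; the chapter only states that stream weights depend on the signal-to-noise ratio and that the streams combine in the HMM state output probability.

```python
def stream_weights(snr_db, snr_min=-5.0, snr_max=20.0):
    """Map an SNR estimate (dB) to audio/visual stream weights.

    Linear interpolation between assumed bounds; real systems
    typically tune the weights per noise condition.
    """
    t = min(max((snr_db - snr_min) / (snr_max - snr_min), 0.0), 1.0)
    lam_audio = t          # clean signal: trust the acoustic stream
    lam_visual = 1.0 - t   # heavy noise: the visual stream dominates
    return lam_audio, lam_visual


def combined_log_prob(log_b_audio, log_b_visual, snr_db):
    """Weighted multistream state output log-probability:
    log b_j(o) = lam_a * log b_ja(o_a) + lam_v * log b_jv(o_v).
    """
    lam_a, lam_v = stream_weights(snr_db)
    return lam_a * log_b_audio + lam_v * log_b_visual
```

In log space the weighted product of stream likelihoods becomes a weighted sum, which is how decoders usually apply the exponents; with the weights constrained to sum to one, the combined score stays on a comparable scale across SNR conditions.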

