With the widespread application and growing popularity of speech technology, speech emotion recognition has attracted significant research attention. This study presents an approach to speech emotion recognition based on a 1D convolutional neural network. To ensure a smooth and efficient experimental process, the system first preprocesses the speech emotion sample files: each file is resized to a fixed length, and the sample set is enlarged through data augmentation. The system then extracts Mel-frequency cepstral coefficients (MFCCs) from each speech sample and feeds the resulting feature vectors into the 1D convolutional neural network for training. The system outputs the emotion category of each speech sample, achieving effective emotion recognition. The proposed system is evaluated on two publicly available datasets, CREMA-D and RAVDESS. The experimental results show that, after data augmentation, it achieves an average accuracy of 94.69% on CREMA-D (6 emotion classes) and 97.33% on RAVDESS (8 emotion classes), outperforming other competitive speech emotion recognition models on both datasets.
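The preprocessing described in the abstract (fixed-length resizing plus augmentation of the waveform samples) could be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the 3-second/16 kHz target length, and the noise and shift parameters are all assumptions.

```python
import numpy as np

def fix_length(signal, target_len):
    """Zero-pad or truncate a 1-D waveform to a fixed length."""
    if len(signal) >= target_len:
        return signal[:target_len]
    return np.pad(signal, (0, target_len - len(signal)))

def augment(signal, rng, noise_scale=0.005, max_shift=1600):
    """Return two simple augmented copies: additive Gaussian noise
    and a random circular time shift (assumed augmentations)."""
    noisy = signal + rng.normal(0.0, noise_scale, size=signal.shape)
    shifted = np.roll(signal, rng.integers(-max_shift, max_shift + 1))
    return [noisy, shifted]

# Toy usage: a ~1.9 s clip resized to 3 s at an assumed 16 kHz rate.
rng = np.random.default_rng(0)
clip = rng.standard_normal(30000)
fixed = fix_length(clip, 48000)
samples = [fixed] + augment(fixed, rng)
print(len(samples), fixed.shape)  # 3 (48000,)
```

In a full pipeline, MFCC feature vectors would then be computed from each (original and augmented) fixed-length waveform before being passed to the 1D CNN.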
Speech Emotion Recognition Based on 1D CNN and MFCC
2023-10-11
3865837 bytes
Conference paper
Electronic resource
English
Speech-Based Driver Emotion Recognition | TIBKAT | 2023
Speech-Based Driver Emotion Recognition | Springer Verlag | 2022
Acoustic Target Recognition Based on MFCC and SVM | Springer Verlag | 2022
Europäisches Patentamt | 2019