With the widespread application and growing popularity of speech technology, speech emotion recognition has attracted significant research attention. This study presents an approach to speech emotion recognition based on a 1D convolutional neural network. To ensure a smooth and efficient experimental process, the system first preprocesses the speech emotion sample files: each file is resized to a fixed length, and the sample set is enlarged through data augmentation. The system then extracts Mel-frequency cepstral coefficients (MFCCs) from each speech sample and feeds the resulting feature vectors into the 1D convolutional neural network for training. The system outputs the emotion category of each speech sample, achieving effective emotion recognition. The proposed system is evaluated on two publicly available datasets, CREMA-D and RAVDESS. The experimental results show that, after data augmentation, it achieves an average accuracy of 94.69% on CREMA-D (6 emotion classes) and 97.33% on RAVDESS (8 emotion classes), outperforming other competitive speech emotion recognition models on both datasets.
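The preprocessing described in the abstract (fixed-length resizing plus augmentation of the waveform samples) could be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the 3-second/16 kHz target length, and the noise and shift parameters are all assumptions.

```python
import numpy as np

def fix_length(signal, target_len):
    """Zero-pad or truncate a 1-D waveform to a fixed length."""
    if len(signal) >= target_len:
        return signal[:target_len]
    return np.pad(signal, (0, target_len - len(signal)))

def augment(signal, rng, noise_scale=0.005, max_shift=1600):
    """Return two simple augmented copies: additive Gaussian noise
    and a random circular time shift (assumed augmentations)."""
    noisy = signal + rng.normal(0.0, noise_scale, size=signal.shape)
    shifted = np.roll(signal, rng.integers(-max_shift, max_shift + 1))
    return [noisy, shifted]

# Toy usage: a ~1.9 s clip resized to 3 s at an assumed 16 kHz rate.
rng = np.random.default_rng(0)
clip = rng.standard_normal(30000)
fixed = fix_length(clip, 48000)
samples = [fixed] + augment(fixed, rng)
print(len(samples), fixed.shape)  # 3 (48000,)
```

In a full pipeline, MFCC feature vectors would then be computed from each (original and augmented) fixed-length waveform before being passed to the 1D CNN.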
Speech Emotion Recognition Based on 1D CNN and MFCC
2023-10-11
3865837 bytes
Conference paper
Electronic resource
English
Speech-Based Driver Emotion Recognition | TIBKAT | 2023
Speech-Based Driver Emotion Recognition | Springer Verlag | 2022
Acoustic Target Recognition Based on MFCC and SVM | Springer Verlag | 2022
Europäisches Patentamt | 2019