Automatic Spoken Word Recognition


Code:

GitHub (repository containing all assignments in the course)
Feature Extraction Notebook (this project, Assignment 3)
Word Recognition Notebook (this project, Assignment 3)

Python libraries used: hmmlearn, librosa, soundfile, pandas, NumPy

This was an assignment for the course EE 679: Speech Processing. The dataset was a subset of the Google Speech Commands dataset. For the problem statement, please refer to this document.

From the audio files of the words, Mel-frequency cepstral coefficients (MFCCs) and their first and second differences (delta and double-delta terms) were computed using librosa. Using these features, one hidden Markov model (HMM) was trained for each of the 10 words with the hmmlearn package. To predict the word in a test utterance, its MFCC feature sequence was passed to each of the 10 models, and the word whose model gave the maximum log-probability score was chosen as the system's prediction.
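The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the notebook code: the function names (extract_features, train_word_models, predict_word) are hypothetical, and the sample rate, number of MFCCs, number of HMM states, covariance type, and iteration count are all assumed values that the write-up does not specify.

```python
import numpy as np


def extract_features(path, sr=16000, n_mfcc=13):
    """Return a (frames, 3 * n_mfcc) array of MFCCs, deltas, and double deltas.

    sr and n_mfcc are assumed values, not taken from the assignment.
    """
    import librosa  # imported here so the helpers below work without it

    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    delta = librosa.feature.delta(mfcc, order=1)
    delta2 = librosa.feature.delta(mfcc, order=2)
    # librosa returns (coefficients, frames); hmmlearn expects (frames, features).
    return np.vstack([mfcc, delta, delta2]).T


def train_word_models(features_by_word, n_states=5):
    """Fit one Gaussian HMM per word from a dict {word: [feature arrays]}.

    n_states, the diagonal covariance, and n_iter are assumptions.
    """
    from hmmlearn import hmm

    models = {}
    for word, sequences in features_by_word.items():
        # hmmlearn takes all sequences stacked, plus the length of each one.
        X = np.concatenate(sequences)
        lengths = [len(seq) for seq in sequences]
        model = hmm.GaussianHMM(
            n_components=n_states, covariance_type="diag", n_iter=20
        )
        model.fit(X, lengths)
        models[word] = model
    return models


def predict_word(models, features):
    """Return the word whose model assigns the highest log-likelihood."""
    scores = {word: model.score(features) for word, model in models.items()}
    return max(scores, key=scores.get)
```

With a dictionary of training features per word, train_word_models would return the 10 models, and predict_word(models, extract_features("some_clip.wav")) would return the best-scoring word for a test clip (the file name here is a placeholder).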

I got an accuracy of 89% (10-class classification) on the clean test data and 81% on the noisy test data (seven types of noise files were provided as part of the problem statement). For implementation details, please see the comments and code in the two notebooks linked above.

The GitHub repository also contains the problem statements and code/solutions for the other two assignments in the course (Source Filter Model for Vowel Generation, and Linear Predictive Analysis).