- Volume 13 Issue 3
In this paper, we describe a hardware-based scheme for speeding up a real-time automatic speech recognition (ASR) system by designing a parallel feature extraction algorithm on a field-programmable gate array (FPGA). A computationally intensive block of the algorithm is identified and implemented in hardware logic on the FPGA. One such block is the mel-frequency cepstral coefficient (MFCC) algorithm used in the feature extraction process. We demonstrate that the FPGA platform can perform the feature extraction computation of a speech recognition system more efficiently than a general-purpose CPU, including the ARM processor. The Xilinx Zynq-7000 System on Chip (SoC) platform is used for the MFCC implementation. With this implementation, we confirmed that the FPGA platform is approximately 500× faster than a sequential CPU implementation and 60× faster than a sequential ARM implementation. We thus verified that a parallelized and optimized MFCC architecture on the FPGA platform can significantly reduce the execution time of an ASR system compared with the CPU and ARM platforms.
ARM;Feature extraction;FPGA;MFCC;Automatic speech recognition;Zynq
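For orientation, the MFCC feature extraction stage named in the abstract can be sketched as a sequential software reference. This is a minimal sketch of the textbook pipeline (pre-emphasis, framing, Hamming window, FFT power spectrum, mel filterbank, log, DCT), not the authors' FPGA architecture. All function names and parameter values here (16 kHz sampling, 400-sample frames, 160-sample step, 512-point FFT, 26 filters, 13 coefficients, 0.97 pre-emphasis) are assumptions chosen as common defaults and do not come from the paper.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (assumed; the paper may use a variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, frame_step=160,
         n_fft=512, n_filters=26, n_ceps=13):
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # 1. Pre-emphasis (coefficient 0.97 is an assumed default).
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Split into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // frame_step
    idx = (np.arange(frame_len)[None, :]
           + frame_step * np.arange(n_frames)[:, None])
    frames = emphasized[idx] * np.hamming(frame_len)
    # 3. Power spectrum of each frame (zero-padded to n_fft points).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 4. Mel filterbank energies, floored to avoid log(0), then log.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # 5. DCT-II (unnormalized) keeps the first n_ceps cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1)
                   / (2.0 * n_filters))
    return log_energies @ basis.T

if __name__ == "__main__":
    # One second of a 440 Hz tone as a stand-in for speech input.
    t = np.arange(16000) / 16000.0
    features = mfcc(np.sin(2 * np.pi * 440.0 * t))
    print(features.shape)
```

Each frame in this sketch is processed independently of the others, and within a frame the FFT, filterbank, and DCT are regular multiply-accumulate workloads. Those properties are what make this stage a plausible candidate for the kind of hardware parallelization the abstract describes.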