Speaker Identification and Verification Using Convolutional Neural Network (CNN)
Abstract
Speaker identification and verification are important for smart IoT, phone banking, remote login services, e-learning, and other applications. This work shows experimentally that the two processes enhance each other when they are properly combined: they work cooperatively, so that the verifier improves the identifier model. The first step identifies the speaker from a context-independent speech signal, with the identifier (ID) model trained as a classifier. The identifier's outputs then control the verification step. When the verification result is positive, the identification outcome is approved with high confidence. Otherwise, the negative verification forces the ID process to reconfigure itself, and the loop continues until the verifier and the identifier agree on the speaker. A multi-component Gaussian mixture model (GMM), fitted with expectation maximization (EM), was used to efficiently model each person's speech features (Mel-frequency cepstral coefficients, MFCC). In the conducted experiments, the one-dimensional convolutional neural network (1D-CNN) outperformed other models for speaker identification. A novel approach is proposed, showing that a small dataset can be expanded with split-add-noise and train-on-the-fly procedures. Many speaker identification approaches rely on a specific context, such as a keyword or password, to simplify processing, and require big data to achieve high accuracy. By contrast, a small amount of data was enough to train the proposed model efficiently, with a verification error of around 3%, i.e., an accuracy of about 97%. Identification accuracies of 95% and 96% were achieved on two different datasets. Additionally, the suggested algorithm does not require any keyword or password because it is context-independent.
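The identify-then-verify loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the identifier "reconfigures" itself, so the sketch assumes the simplest reading, in which a rejected candidate is discarded and the next-best one is tried. The function name `identify_then_verify`, the score dictionaries, and the threshold are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Optional


def identify_then_verify(
    id_scores: Dict[str, float],
    verify: Callable[[str], bool],
    max_rounds: Optional[int] = None,
) -> Optional[str]:
    """Return the first speaker accepted by both stages, or None.

    id_scores: hypothetical identifier output, speaker label -> score
               (for example, classifier probabilities).
    verify:    hypothetical verifier, True if the utterance matches the
               claimed speaker's model (for example, a likelihood test).
    """
    # Assumption: "reconfiguring" the identifier is modelled as discarding
    # the rejected candidate and falling back to the next-best one.
    ranked = sorted(id_scores, key=id_scores.get, reverse=True)
    if max_rounds is not None:
        ranked = ranked[:max_rounds]
    for candidate in ranked:
        if verify(candidate):
            return candidate  # identifier and verifier agree
    return None  # no candidate passed verification


# Toy example with made-up numbers: the identifier prefers "alice", but the
# verifier's per-speaker log-likelihood only clears the threshold for "bob".
scores = {"alice": 0.55, "bob": 0.40, "carol": 0.05}
log_likelihood = {"alice": -61.0, "bob": -42.0, "carol": -80.0}
THRESHOLD = -50.0  # hypothetical acceptance threshold

result = identify_then_verify(scores, lambda s: log_likelihood[s] >= THRESHOLD)
print(result)
```

In this toy run the top-ranked candidate is rejected by the verifier and the second candidate is accepted, which mirrors the abstract's point that a negative verification sends the process back to the identifier rather than ending it.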
This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0): http://creativecommons.org/licenses/by/4.0/