Speaker Identification and Verification Using Convolutional Neural Network CNN

Azhar S. Abdulaziz; Akram Dawood; Amar Daood

doi:10.25130/tjes.32.2.19


Published: 2025-05-31


Keywords:

Artificial intelligence, Biometrics, Digital voice communication, Deep learning, Signal processing, Speaker identification, Speaker verification

Azhar S. Abdulaziz

Computer Networks and Internet Department, College of IT, Ninevah University, Mosul, Iraq.

https://orcid.org/0000-0002-8631-5788

Akram Dawood

Department of Computer Engineering, Engineering College, Mosul University, Mosul, Iraq.

https://orcid.org/0000-0003-4544-875X

Amar Daood

Department of Computer Engineering, Engineering College, Mosul University, Mosul, Iraq.

https://orcid.org/0000-0002-6841-5259

Abstract

Speaker identification and verification are important fields that support smart IoT, phone banking, remote login services, e-learning, and other applications. In this work, speaker identification and verification are shown experimentally to enhance each other when they are combined appropriately: the two processes work cooperatively so that the verifier improves the identifier model. In the first step, the speaker is identified from a context-independent speech signal using an identifier model (ID) trained as a classification model. The model's outputs then control the verification process in the next step. When the verification result is positive, the identification outcome is accepted with high confidence; a negative verification instead forces the ID process to reconfigure itself. The loop continues until the verification and ID processes agree on the speaker. A Gaussian mixture model (GMM) with multiple components, trained with expectation maximization (EM), was used to model each person's Mel-frequency cepstral coefficient (MFCC) speech features. For identification, the experiments showed that a one-dimensional convolutional neural network (1D-CNN) outperformed the other models tested. A novel approach was also proposed, showing that a small dataset can be expanded using split-add-noise and train-on-the-fly procedures. Many speaker identification approaches use a specific phrase as a keyword or password to simplify processing, and they require large amounts of data to achieve high accuracy. Notably, a small amount of data was sufficient to train the proposed model effectively, with a verification error of around 3%, i.e., an accuracy of 97%. Identification accuracies of 95% and 96% were achieved on two different datasets. Additionally, the proposed algorithm does not require any keyword or password, because it is context-independent.
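The cooperative identification-verification loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the ID process "reconfigures itself", so here a rejected candidate makes the loop fall back to the identifier's next-ranked speaker. `max_attempts` is a hypothetical parameter that bounds the loop so it always terminates, and `rank_speakers` and `verify` are hypothetical callables standing in for the trained identifier and the per-speaker verifier.

```python
from typing import Callable, Optional, Sequence


def identify_then_verify(
    utterance: object,
    rank_speakers: Callable[[object], Sequence[str]],
    verify: Callable[[str, object], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first identifier candidate that the verifier accepts, else None.

    rank_speakers: hypothetical identifier returning speaker labels, best first.
    verify: hypothetical verifier deciding whether `utterance` matches a speaker.
    """
    candidates = list(rank_speakers(utterance))
    for speaker in candidates[:max_attempts]:
        if verify(speaker, utterance):
            return speaker  # identifier and verifier agree on this speaker
        # Rejected: fall back to the next-ranked candidate (assumed "reconfiguration").
    return None  # no agreement within max_attempts: treat as an unknown speaker


# Usage with toy stand-ins for the identifier and the verifier.
scores = {"alice": 0.7, "bob": 0.2, "carol": 0.1}


def rank(utterance):
    return sorted(scores, key=scores.get, reverse=True)


print(identify_then_verify(None, rank, lambda spk, utt: spk == "bob"))  # bob
```

Returning `None` lets the caller treat the utterance as coming from an unknown speaker. Capping the number of attempts is a design choice made here to guarantee termination; the abstract describes the loop as running until both processes agree.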
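Per-speaker GMM modelling with EM can be illustrated with the standard-library sketch below. It assumes that MFCC frames have already been extracted (each frame is a list of floats), uses diagonal covariances, and scores a verification claim by the average log-likelihood ratio between the claimed speaker's model and a background model trained on other speakers' frames. The diagonal covariance, the background model, the zero threshold, and all function names are assumptions made for illustration. The abstract does not specify the authors' scoring rule or number of mixture components.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def train_gmm(frames, k=4, iters=30, var_floor=1e-3, seed=0):
    """Fit a k-component diagonal GMM to feature frames with EM."""
    rng = random.Random(seed)
    n, dim = len(frames), len(frames[0])
    means = [list(f) for f in rng.sample(frames, k)]  # random frames as initial means
    mu = [sum(f[d] for f in frames) / n for d in range(dim)]
    var0 = [max(sum((f[d] - mu[d]) ** 2 for f in frames) / n, var_floor) for d in range(dim)]
    variances = [list(var0) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each frame.
        resp = []
        for x in frames:
            logp = [math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j]) for j in range(k)]
            norm = logsumexp(logp)
            resp.append([math.exp(lp - norm) for lp in logp])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-10
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj for d in range(dim)]
            variances[j] = [
                max(sum(r[j] * (x[d] - means[j][d]) ** 2 for r, x in zip(resp, frames)) / nj, var_floor)
                for d in range(dim)
            ]
    return weights, means, variances


def avg_log_likelihood(gmm, frames):
    weights, means, variances = gmm
    return sum(
        logsumexp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in zip(weights, means, variances)])
        for x in frames
    ) / len(frames)


def verify(speaker_gmm, background_gmm, frames, threshold=0.0):
    """Accept the identity claim if the average log-likelihood ratio exceeds the threshold."""
    llr = avg_log_likelihood(speaker_gmm, frames) - avg_log_likelihood(background_gmm, frames)
    return llr > threshold


# Usage: 2-D Gaussian blobs stand in for real MFCC frames (typically ~13 coefficients).
rng = random.Random(1)


def blob(cx, cy, count):
    return [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)] for _ in range(count)]


alice_gmm = train_gmm(blob(0.0, 0.0, 200), k=2)
background = train_gmm(blob(4.0, -4.0, 200), k=2)
print(verify(alice_gmm, background, blob(0.0, 0.0, 50)))   # True
print(verify(alice_gmm, background, blob(4.0, -4.0, 50)))  # False
```

The variance floor keeps a component from collapsing onto a few frames, a common safeguard when each speaker has only a small amount of enrolment data.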
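To show what "one-dimensional" means for a speech classifier, the sketch below runs the forward pass of a single 1D convolution layer over a sequence of MFCC frames: the kernel slides along the time axis only, and the MFCC coefficients act as input channels. The layer sizes, ReLU activation, global average pooling, linear output layer, and the parameter names in `params` are illustrative assumptions, not the architecture reported in the paper. Training (e.g. by backpropagation in a deep learning framework) is omitted.

```python
import random


def conv1d(frames, kernels, bias):
    """'Valid' 1-D convolution along time (cross-correlation, as in most CNN libraries).

    frames:  [T][C_in]   e.g. T MFCC frames with C_in coefficients each
    kernels: [C_out][W][C_in]
    bias:    [C_out]
    returns: [T - W + 1][C_out]
    """
    width = len(kernels[0])
    out = []
    for t in range(len(frames) - width + 1):
        window = frames[t:t + width]
        out.append([
            b + sum(kw * xv for k_row, x_row in zip(kern, window) for kw, xv in zip(k_row, x_row))
            for kern, b in zip(kernels, bias)
        ])
    return out


def relu(rows):
    return [[max(0.0, v) for v in row] for row in rows]


def global_avg_pool(rows):
    """Average each channel over time, giving one fixed-length vector per utterance."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def forward(frames, params):
    """Conv -> ReLU -> pool -> linear layer; returns one logit per enrolled speaker."""
    h = global_avg_pool(relu(conv1d(frames, params["kernels"], params["conv_bias"])))
    return [b + sum(w * v for w, v in zip(row, h)) for row, b in zip(params["dense_w"], params["dense_b"])]


# Toy usage with random (untrained) weights: 50 frames x 13 MFCCs, 8 filters of width 3, 5 speakers.
rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(50)]
params = {
    "kernels": [[[rng.gauss(0.0, 0.1) for _ in range(13)] for _ in range(3)] for _ in range(8)],
    "conv_bias": [0.0] * 8,
    "dense_w": [[rng.gauss(0.0, 0.1) for _ in range(8)] for _ in range(5)],
    "dense_b": [0.0] * 5,
}
logits = forward(frames, params)
predicted = max(range(len(logits)), key=logits.__getitem__)  # index of the most likely speaker
```

Global average pooling makes the output size independent of utterance length, which suits context-independent input of varying duration.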
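The split-add-noise and train-on-the-fly procedures mentioned in the abstract can be sketched as follows. Each recording is split into fixed-length segments, and every epoch draws fresh white Gaussian noise at a random signal-to-noise ratio (SNR), so the model rarely sees exactly the same example twice. The segment length, SNR range, noise type, and function names are assumptions; the abstract does not give the authors' settings.

```python
import math
import random


def split_signal(samples, seg_len, hop=None):
    """Cut a 1-D signal into fixed-length segments (overlapping if hop < seg_len)."""
    hop = hop or seg_len
    return [samples[i:i + seg_len] for i in range(0, len(samples) - seg_len + 1, hop)]


def add_noise(segment, snr_db, rng):
    """Add white Gaussian noise scaled so the segment has the requested SNR in dB."""
    signal_power = sum(s * s for s in segment) / len(segment)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in segment]


def on_the_fly_examples(recordings, seg_len, snr_range=(10.0, 30.0), epochs=3, seed=0):
    """Yield (epoch, speaker, noisy_segment); fresh noise is drawn every epoch.

    recordings: dict mapping a speaker label to a list of audio samples.
    """
    rng = random.Random(seed)
    pool = [(spk, seg) for spk, sig in recordings.items() for seg in split_signal(sig, seg_len)]
    for epoch in range(epochs):
        rng.shuffle(pool)
        for spk, seg in pool:
            yield epoch, spk, add_noise(seg, rng.uniform(*snr_range), rng)


# Usage: one second of a 440 Hz tone at 8 kHz stands in for a short recording.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
examples = list(on_the_fly_examples({"spk1": tone}, seg_len=2000, epochs=2))
print(len(examples))  # 8: 4 segments x 2 epochs
```

Because the noisy copies are generated inside the training loop rather than stored, the expanded dataset costs no extra disk space.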

Issue

Vol. 32, No. 2 (2025)

Section

Articles

This is an open access article licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0): http://creativecommons.org/licenses/by/4.0/

