Enhancing Speaker Diarization in Forensic Audio: A Comparative Analysis of Machine Learning Algorithms for Gender Classification

  • Rahmat Ullah
  • Ikram Asghar
  • Gareth Evans
  • Rab Nawaz
  • Saeed Akbari
  • Dorothy Anne Roberts

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Speaker diarization is vital in contexts like police interrogations, where it enhances the security and personalization of data access and improves confidentiality in multi-speaker environments. Transcribing low-quality forensic audio recordings is challenging because they are often marred by unclear speech, which impedes the accuracy of conventional Automatic Speech Recognition (ASR) systems. This paper evaluates the efficacy of four traditional machine learning algorithms, namely Support Vector Machine (SVM), Decision Tree Classifier, Random Forest Classifier, and XGBoost, in gender classification from voice samples for speaker diarization systems. We test these algorithms against real-world data, simulating practical conditions to ensure robustness. Our findings reveal that ensemble methods, particularly Random Forest and XGBoost, demonstrate high accuracy and strong generalizability when dealing with unfiltered, real-world audio data. XGBoost shows significant resistance to overfitting, making it highly suitable for secure voice-driven applications. This study aids in algorithm selection for speaker diarization tasks. It addresses gaps in forensic audio transcription accuracy, thereby enhancing the reliability of transcriptions and reducing risks of erroneous interpretations in legal contexts.
Original language: English
Title of host publication: International Conference on Smart Systems and Emerging Technologies
Subtitle of host publication: SMARTTECH 2024
Editors: Anis Koubaa, Adel Ben Mnaouer, Wadii Boulila, Said Raghay
Publisher: Springer
Pages: 150-161
Number of pages: 11
ISBN (Electronic): 9783031912351
ISBN (Print): 9783031912344
DOIs
Publication status: Published - 14 Aug 2025
Event: International Conference on Smart Systems and Emerging Technologies - Cadi Ayyadh University, Marrakech, Morocco
Duration: 19 Nov 2024 to 21 Nov 2024
Conference number: 3

Publication series

Name: Lecture Notes in Networks and Systems
Publisher: Springer Cham
Volume: 1401
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: International Conference on Smart Systems and Emerging Technologies
Country/Territory: Morocco
City: Marrakech
Period: 19/11/24 to 21/11/24
