DEVELOPING AN OPEN-SOURCE DATASET FOR SPEECH SOUND DISORDERS IN KAZAKH CHILDREN


Authors

  • Arypzhan ABEN AYU

Abstract

Speech sound disorders (SSDs) in children are a significant barrier to effective communication, affecting literacy, social interaction, and mental health. In low-resource languages such as Kazakh, the absence of child-specific speech datasets hinders the development of diagnostic and therapeutic tools. This study develops an open-source dataset of SSDs in Kazakh children aged 3–10 years, comprising audio recordings and metadata from 100 participants (50 with SSDs and 50 typically developing). Data were collected in controlled clinical settings using high-fidelity recording equipment and standardized phonological tasks, and were preprocessed with AI-driven tools to ensure quality. The dataset captures distinctive acoustic and developmental characteristics, including the higher fundamental and formant frequencies typical of child speech and prevalent error patterns such as substitutions and omissions. This resource supports the design of AI-based diagnostic tools and culturally tailored interventions, addressing a critical gap in Kazakh speech-language pathology. Because it is openly available, the dataset also supports international SSD research and cross-linguistic studies, with the aim of improving communication outcomes for Kazakh children and advancing the broader understanding of pediatric SSDs.
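The abstract reports higher fundamental frequencies (F0) in the recordings but does not state how F0 was measured. As an illustration only, the sketch below estimates F0 for a single voiced frame by picking the autocorrelation peak within a search range. The function name `estimate_f0`, the 150–600 Hz search range for child voices, and the 16 kHz sample rate in the usage line are assumptions, not details of the dataset's actual processing pipeline.

```python
import math


def estimate_f0(samples, sample_rate, f0_min=150.0, f0_max=600.0):
    """Estimate F0 (Hz) of one voiced frame via the autocorrelation peak.

    Hypothetical helper: the default 150-600 Hz search range is an assumed
    range for child voices, not a value taken from the dataset.
    Returns None when no positive correlation is found (e.g. silence).
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]                     # remove DC offset
    lag_min = int(sample_rate / f0_max)                 # shortest period searched
    lag_max = min(int(sample_rate / f0_min), n - 1)     # longest period searched
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None:
        return None                                     # treat frame as unvoiced
    return sample_rate / best_lag                       # period in samples -> Hz


# Usage: a synthetic 300 Hz tone sampled at an assumed 16 kHz, 40 ms frame
frame = [math.sin(2 * math.pi * 300 * t / 16000) for t in range(640)]
print(estimate_f0(frame, 16000))
```

The unnormalized autocorrelation slightly favors shorter lags, which helps avoid picking a multiple of the true period; a production pipeline would more likely use a dedicated pitch tracker and a voicing decision per frame.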


Published

2025-12-31