DEVELOPING AN OPEN-SOURCE DATASET FOR SPEECH SOUND DISORDERS IN KAZAKH CHILDREN
Abstract
Speech sound disorders (SSDs) in children are a significant barrier to effective communication and affect literacy, social interaction, and mental health. In low-resource languages such as Kazakh, the absence of child-specific speech datasets hinders the development of diagnostic and therapeutic tools. This study aims to create an open-source dataset of SSDs in Kazakh children aged 3–10 years, comprising audio recordings and metadata from 100 participants (50 with SSDs and 50 typically developing). Data were collected in controlled clinical settings using high-fidelity recording equipment and standardized phonological tasks, with AI-driven preprocessing applied to ensure recording quality. The dataset captures the acoustic and developmental characteristics of child speech, revealing higher fundamental and formant frequencies and prevalent error patterns such as substitutions and omissions. This resource supports the design of AI-based diagnostic tools and culturally tailored interventions, addressing a critical gap in Kazakh speech-language pathology. Because the dataset is open source, it can also support SSD research and cross-linguistic studies elsewhere, with the goal of improving communication outcomes for Kazakh children and contributing to the broader understanding of pediatric SSDs.
License
Copyright (c) 2025 Q.A.Iasaýı atyndaǵy Halyqaralyq qazaq-túrіk ýnıversıtetіnіń habarlary

This work is licensed under a Creative Commons Attribution 4.0 International License.