Artificial Intelligence and the Scientific Development of the Kazakh Language: Corpus, Terminology, and Content Automation
Keywords:
Kazakh language corpus, terminology automation, artificial intelligence, machine learning, semantic analysis, content automation, digital ecosystem.
Abstract
This article analyzes strategies for advancing the scientific and theoretical development of the Kazakh language by integrating artificial intelligence technologies with linguistic corpora. The primary aim of the study is to examine terminological standardization and automated language processing at the morphological, syntactic, semantic, and lexical levels of Kazakh, combining linguistic resources with AI tools in a digital environment.
The theoretical section substantiates the necessity of developing a national language data repository designed for automated Kazakh language processing. The construction of this repository involves morphological annotation, automatic part-of-speech recognition, and the segmentation of word stems and affixes. Additionally, semantic fields, interconceptual relations, and the functional load of lexical units are analyzed to support consistent terminological structuring.
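The segmentation of word stems and affixes mentioned above can be illustrated with a minimal sketch. The suffix lists below are small, hypothetical samples (a few plural and locative allomorphs), not the annotation scheme used in the study, and the greedy stripping rule is an assumption chosen for brevity; a real morphological analyzer would need vowel harmony rules, a stem lexicon, and disambiguation.

```python
# Toy suffix-stripping segmenter for Kazakh (illustrative assumption, not the study's method).
# Suffix inventories are deliberately tiny samples.
PLURAL = ("лар", "лер", "дар", "дер", "тар", "тер")
LOCATIVE = ("да", "де", "та", "те")


def segment(word, min_stem=3):
    """Split a word into (stem, [affixes]) by stripping known suffixes from the right.

    Case suffixes sit outside the plural suffix, so the locative group is
    tried first. A suffix is only stripped if at least `min_stem` characters
    remain, which limits (but does not prevent) over-segmentation.
    """
    stem = word
    affixes = []
    for group in (LOCATIVE, PLURAL):  # outermost suffix group first
        for suffix in group:
            if stem.endswith(suffix) and len(stem) - len(suffix) >= min_stem:
                stem = stem[: -len(suffix)]
                affixes.insert(0, suffix)  # keep affixes in surface order
                break
    return stem, affixes
```

Under these assumptions, `segment("кітаптарда")` separates the stem `кітап` from the plural and locative affixes, while a bare stem such as `сөз` is returned unchanged. Because the rule is purely string-based, any word that merely ends in one of the listed strings will also be split, which is why corpus-scale annotation relies on lexicon-backed analyzers.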
In the practical part of the study, texts collected via web scraping were analyzed to identify high-frequency terms and their contextual usage using concordance methods. A comparative diachronic approach was employed to trace the semantic evolution and grammatical roles of specific terms across different time periods. Modern AI tools, including semantic modeling techniques and machine learning algorithms, facilitated the systematic analysis of these processes.
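The frequency and concordance steps described above can be sketched as follows. This is a simplified illustration, assuming whitespace-and-punctuation tokenization and exact-form matching; the function names and the two sample sentences are hypothetical and are not taken from the study's corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def top_terms(texts, n=5):
    """Return the n most frequent tokens across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(n)


def kwic(texts, keyword, window=2):
    """Key Word In Context: list (left context, keyword, right context) tuples."""
    keyword = keyword.lower()
    rows = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - window): i])
                right = " ".join(tokens[i + 1: i + 1 + window])
                rows.append((left, token, right))
    return rows


# Hypothetical two-sentence sample, used only to exercise the functions.
sample = [
    "Бұл термин жиі қолданылады.",
    "Жаңа термин сөздікке енді.",
]
```

Calling `top_terms(sample, 1)` surfaces the most frequent token in the sample, and `kwic(sample, "термин", window=1)` lists each occurrence with one word of context on either side. Since Kazakh is agglutinative, exact-form matching undercounts inflected variants of a term, so in practice the tokens would be lemmatized or stemmed before counting.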
As a result, the study identified the rate of term formation, the mechanisms of semantic expansion, and the level of terminological standardization in the information technology vocabulary of Kazakh. The findings are accompanied by specific proposals for improving the digital ecosystem of the Kazakh language. By systematically describing the interrelationships between morphology, syntax, semantics, and lexicon, the research contributes to expanding the scientific use of Kazakh and strengthening terminological regulation.