PHILOLOGICAL STUDIES: COMPUTER ASPECTS OF STYLOMETRY AUTOMATION

21 36

Authors

  • R.Zh. SAURBAYEV Toraighyrov University
  • A.K. ZHETPISBAY Pavlodar Margulan Pedagogical University
  • F.T. YEREKHANOVA Central Asian Innovation University

Keywords:

computational linguistics, stylometry, text structuring, computer aspects, stylometry automation, artificial intelligence, attribution of authorship.

Abstract

This article examines the automation of stylometry in philological studies. Its relevance stems from the need to trace and critically analyze the results of the many recent studies in the interdisciplinary field that seeks to identify the author of a text using artificial intelligence methods (authorship attribution and profiling). A further aim is to provide theoretical foundations and a comprehensive stylometric methodology for identifying the author of a text, i.e. one based on the analysis of quantifiable linguistic features using statistical methods and machine learning algorithms, and grounded in the principles of explanation, objectivity, evidence, and open science. Within the framework of subject and activity identification, idiolectology is a developing field that focuses on the systematic study of idiolect and the computer aspects of its identification, drawing on modern achievements of computational and corpus linguistics and data science. The authors argue that the task of computational and corpus linguistics is to provide the researcher with all the necessary material, to prepare the data for counting, and to offer a wide range of computational procedures for testing hypotheses, which together can confirm or refute increasingly subtle and profound philological observations.
Philological problems whose solution draws on linguistic information usually have a clear applied orientation: in such cases the language and style of the text are not the goal of the study but a means of solving extralinguistic problems. The solution of a specific philological problem (e.g., disputed authorship) is usually not confined to the rigid framework of a single research methodology but draws on methods and facts from various fields of knowledge and practice.
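
To illustrate what "analysis of quantifiable linguistic features using statistical methods" can mean in the simplest case, the following Python sketch compares texts by the relative frequencies of a few function words and assigns a disputed text to the closest candidate. It is not the authors' methodology: the marker list FUNCTION_WORDS, the function names, and the choice of a plain Manhattan distance are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical marker list (an assumption): a handful of English function words.
FUNCTION_WORDS = ("the", "and", "of", "to", "in", "that", "but", "with")


def profile(text):
    """Relative frequency of each marker word among all word tokens of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def distance(p, q):
    """Manhattan distance between two frequency profiles (illustrative choice)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def nearest_author(disputed, samples):
    """Return the candidate whose sample profile is closest to the disputed text."""
    target = profile(disputed)
    return min(samples, key=lambda name: distance(target, profile(samples[name])))
```

A call such as nearest_author(disputed_text, {"A": sample_a, "B": sample_b}) returns the label of the nearest candidate. In practice a result of this kind would be only one piece of evidence: real studies use far larger feature sets, much longer samples, and statistical validation.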

Published

2024-09-30