AUTOMATIC DETECTION OF AI-GENERATED TEXTS: A COMPARATIVE ANALYSIS OF MODELS

Authors

  • Arypzhan ABEN AYU

Abstract

This paper investigates how effectively machine learning methods can automatically distinguish texts generated by artificial intelligence (AI) from texts written by humans. The study was conducted on a balanced dataset of 2,750 essays (1,375 per class). Fourteen linguistic-statistical features were extracted from each text; of these, vocabulary_richness, word_count, text_length, sentence_count, and complex_word_ratio showed high discriminative value as measured by Cohen's d. Text representations were built using TF-IDF and embeddings, and the following algorithms were evaluated with stratified cross-validation: RandomForest, GradientBoosting, XGBoost, LightGBM, LogisticRegression, SVM, KNN, DecisionTree, AdaBoost, and MLP. Gradient boosting models (especially XGBoost) and transformer-based methods performed well, and classification scores on the test set reached very high values. Cluster analysis showed an association between the thematic structure of the texts and the class split. However, whether these high scores generalize still needs to be tested under cross-domain evaluation, adversarial attacks, and manipulations such as shortening or paraphrasing. Future research is recommended to focus on transformer fine-tuning, adversarial robustness, and multilingual settings.
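The feature-ranking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes Cohen's d is computed with the pooled standard deviation of the two classes, and that vocabulary_richness is a type-token ratio; the abstract states neither definition, so both are assumptions, and the function names and sample values are hypothetical.

```python
import math
import statistics


def cohens_d(sample_a, sample_b):
    """Cohen's d for two independent samples, using the pooled standard deviation.

    Assumption: the paper does not say which variant of Cohen's d it uses;
    this is the common pooled-SD form.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / pooled_sd


def vocabulary_richness(text):
    """Type-token ratio: unique words divided by total words.

    Assumption: a hypothetical definition of the vocabulary_richness feature.
    """
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0


if __name__ == "__main__":
    # Hypothetical feature values for a handful of essays in each class.
    human_values = [2.0, 4.0, 6.0]
    ai_values = [1.0, 3.0, 5.0]
    print("Cohen's d:", cohens_d(human_values, ai_values))
    print("TTR:", vocabulary_richness("the cat the dog"))
```

A larger absolute value of d means the two class distributions of a feature are further apart relative to their spread, which is the sense in which a feature can be said to have high discriminative value.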

Published

2025-10-19