Gevorg Petrosyan

APPLICATION PERSPECTIVES OF LANGUAGE MODELS IN THE INTELLIGENT SYSTEMS FOR DETERMINING THE DEGREE OF TEXT UNIQUENESS

https://doi.org/10.59982/18294359-25.1-gp-22

Abstract

This paper provides an overview of possible approaches to using language models in intelligent systems for determining the degree of uniqueness of a text. To determine the degree of uniqueness of a text, all borrowed parts contained in it, including cross-language ones, must first be identified. Finding cross-language borrowings involves comparing the meaning of texts written in different languages, since a direct translation normally does not preserve the linguistic features of the original text. The applications of language models discussed in this work, with possible changes and adaptations, will later be used to design and develop the proposed two-stage approach for identifying cross-language borrowings and determining the degree of uniqueness of a text. Detecting cross-language borrowings is an especially urgent task for languages that are underrepresented on the Internet, such as Armenian. Methods based on natural language processing are considered among the highest-priority methods for detecting cross-language borrowings, since they allow texts written in natural language to be analyzed at a deeper level. The work also presents a diagram of the proposed system, which uses these applications of language models to determine the degree of uniqueness of a text and to identify the borrowings (including cross-language ones) it contains.
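As a rough illustration of the final step described above, the following sketch computes a uniqueness score once borrowed fragments have already been located. The abstract does not give a formula, so the definition used here (the share of characters not covered by any borrowed span) is an assumption, and the function name `uniqueness_degree` and the `(start, end)` span format are hypothetical.

```python
def uniqueness_degree(text_length, borrowed_spans):
    """Return the share of characters NOT covered by any borrowed span.

    Assumption: uniqueness = 1 - (borrowed characters / total characters).
    borrowed_spans is a list of (start, end) character offsets, end exclusive.
    Overlapping or nested spans are counted only once.
    """
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    covered = 0
    last_end = 0  # right edge of the region already counted
    for start, end in sorted(borrowed_spans):
        start = max(start, last_end, 0)  # skip what was already counted
        end = min(end, text_length)      # clip to the text boundary
        if end > start:
            covered += end - start
            last_end = end
    return 1.0 - covered / text_length


# Two overlapping borrowed fragments in a 20-character text cover 15 characters.
print(uniqueness_degree(20, [(0, 10), (5, 15)]))
```

Locating the spans themselves, especially cross-language ones, is the hard part and is not covered by this sketch.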

Keywords: Natural language processing, cross-language borrowing, lemmatization, plagiarism, text uniqueness, language model.

Pages: 214-223
