Istamuqlov Hasanjon (PhD Student, Khujand State University named after academician B. Gafurov, Khujand, Tajikistan)
Muzafarov Dilshod (Dean of the Faculty of Mathematics, Khujand State University named after academician B. Gafurov, Khujand, Tajikistan)
This article examines methods for tokenizing Tajik text in Python. The authors analyze the features of the Tajik alphabet and grammar and the tokenization problems that arise from them. The article surveys the main Python libraries and packages for text processing and describes tokenization approaches drawn from work on other languages. It presents the results of experiments with morphological, statistical, and neural network approaches to tokenization and suggests directions for future research.
Keywords: tokenization, Tajik language, Python programming language, morphological approach, statistical approach, neural networks, deep learning, natural language processing, alphabet, grammar
Citation: Istamuqlov H., Muzafarov D. TOKENIZATION METHODS FOR TAJIK TEXT USING PYTHON // Современная наука: актуальные проблемы теории и практики. Серия: Естественные и Технические Науки. 2023. No. 06/2. P. 78-82. DOI 10.37882/2223-2966.2023.6-2.16