The Classification of Documents in Malay and Indonesian Using the Naive Bayesian Method Uses Words and Phrases as a Training Set

Date
2020-12-21
Publisher
Institute of Automation and Computer Science, Brno University of Technology
Abstract
Malay and Indonesian are two closely related languages that have much in common in word meanings and grammar. Classifying documents in the two languages automatically is a challenge precisely because the languages are so similar. The Naive Bayesian method is widely used for classification today, but it needs to be implemented in a particular way to raise classification accuracy. This study uses a new approach: the training set consists of both words and phrases rather than words only. With this approach, the classification accuracy for the two languages increases.
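The idea of training on words and phrases can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a "phrase" is a pair of adjacent words, uses add-one smoothing, and trains on four made-up sentences. The names features and NaiveBayes, the labels "ms" and "id", and the toy sentences are all hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def features(text, use_phrases=True):
    """Return word features, plus adjacent two-word phrases when enabled."""
    words = text.lower().split()
    feats = list(words)
    if use_phrases:
        # Assumption: a "phrase" is modelled as a pair of neighbouring words.
        feats += [" ".join(pair) for pair in zip(words, words[1:])]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self, use_phrases=True):
        self.use_phrases = use_phrases
        self.feature_counts = {}     # label -> Counter of feature frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every feature seen during training

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            feats = features(text, self.use_phrases)
            self.feature_counts.setdefault(label, Counter()).update(feats)
            self.doc_counts[label] += 1
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counter in self.feature_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counter.values()) + len(self.vocab)
            for feat in features(text, self.use_phrases):
                if feat in self.vocab:  # features never seen in training are skipped
                    score += math.log((counter[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (hypothetical, not taken from the paper).
docs = [
    "saya hendak pergi ke pejabat",  # Malay
    "kereta saya rosak",             # Malay
    "saya mau pergi ke kantor",      # Indonesian
    "mobil saya rusak",              # Indonesian
]
labels = ["ms", "ms", "id", "id"]

model = NaiveBayes(use_phrases=True).fit(docs, labels)
print(model.predict("dia pergi ke pejabat"))
```

Setting use_phrases=False gives the words-only baseline, so the two variants can be compared on the same data. A real comparison would need a far larger corpus than this toy set.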
Citation
Mendel. 2020, vol. 26, no. 2, pp. 23-28. ISSN 1803-3814
https://mendel-journal.org/index.php/mendel/article/view/116
Document type
Peer-reviewed
Document version
Published version
Language of document
en
Document licence
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license
http://creativecommons.org/licenses/by-nc-sa/4.0