The Classification of Documents in Malay and Indonesian Using the Naive Bayesian Method Uses Words and Phrases as a Training Set

Wijaya, Marvin Chandra

Date

2020-12-21

Authors

Wijaya, Marvin Chandra

Publisher

Institute of Automation and Computer Science, Brno University of Technology

Abstract

Malay and Indonesian are two closely related languages that share much of their vocabulary and grammar. Because the two languages are so similar, automatically classifying a document as Malay or Indonesian is challenging. A widely used classification method is the Naive Bayesian method, but it must be applied carefully to achieve high classification accuracy. This study proposes a new approach: training the classifier on a training set of both words and phrases instead of words alone. With this approach, the classification accuracy for the two languages increases.
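The abstract does not state how phrases are extracted or which Naive Bayes variant is used. The sketch below is an illustration under stated assumptions, not the author's implementation: it assumes "phrases" are contiguous word bigrams and uses a multinomial Naive Bayes model with add-one (Laplace) smoothing. The names `features` and `NaiveBayes` and the four training sentences are hypothetical. The phrase "rumah sakit" shows why phrases can help: it is the usual Indonesian word for "hospital", while Malay typically uses "hospital"; "rumah" and "sakit" on their own occur in both languages, so the two-word feature carries a signal that the single words do not.

```python
import math
from collections import Counter


def features(text: str, max_n: int = 2) -> list[str]:
    """Return single words plus word n-grams ("phrases") up to length max_n.

    Assumption: a phrase is any contiguous sequence of 2..max_n words.
    With max_n=1 this reduces to a words-only feature set.
    """
    words = text.lower().split()
    feats = list(words)  # single-word features
    for n in range(2, max_n + 1):
        feats += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (hypothetical)."""

    def __init__(self, max_n: int = 2):
        self.max_n = max_n
        self.doc_counts: Counter = Counter()          # documents per label
        self.feat_counts: dict[str, Counter] = {}     # feature counts per label
        self.vocab: set[str] = set()                  # all features seen in training

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        for doc, label in zip(docs, labels):
            self.doc_counts[label] += 1
            feats = features(doc, self.max_n)
            self.feat_counts.setdefault(label, Counter()).update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, doc: str) -> str:
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.feat_counts.items():
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods; unseen features are skipped
            score = math.log(self.doc_counts[label] / total_docs)
            for f in features(doc, self.max_n):
                if f in self.vocab:
                    score += math.log((counts[f] + 1) / (total + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set (not the study's data).
docs = [
    "saya pergi ke hospital dengan kereta",     # Malay
    "dia boleh datang ke universiti esok",      # Malay
    "saya pergi ke rumah sakit dengan mobil",   # Indonesian
    "dia bisa datang ke universitas besok",     # Indonesian
]
labels = ["malay", "malay", "indonesian", "indonesian"]

model = NaiveBayes(max_n=2).fit(docs, labels)
print(model.predict("ibu saya di rumah sakit"))  # indonesian
```

Setting `max_n=1` gives the words-only baseline that the abstract contrasts with, so the two feature sets can be compared on the same data. A toy example like this cannot reproduce or confirm the accuracy gains reported in the paper.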

Keywords

Malay, Indonesian, Language, Naive Bayesian, Classification

Citation

Mendel, 2020, vol. 26, no. 2, pp. 23-28. ISSN 1803-3814
https://mendel-journal.org/index.php/mendel/article/view/116

Document type

Peer-reviewed

Document version

Published version

Language of document

English

Document licence

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license
http://creativecommons.org/licenses/by-nc-sa/4.0