This study investigates the impact of training set size on authorship attribution (AA) for short Arabic texts, a relatively under-researched area in linguistic analysis. In experiments with several classifiers, including Mahalanobis distance, linear regression, and a multilayer perceptron, the study finds that increasing the training set size generally improves classifier accuracy, and that word-based features combined with n-grams yield the best results. The paper highlights the need for further research on AA for Arabic and suggests choosing the classifier according to the size of the available training set.
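A minimal sketch, not the authors' pipeline, of how training set size can be varied for an attribution experiment of this kind: word n-gram TF-IDF features feed a multilayer perceptron (one of the classifiers named above), and accuracy is measured at increasing training fractions. The toy corpus, the two hypothetical authors, and all classifier settings below are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy corpus standing in for real short Arabic texts: a few sentences per
# hypothetical author, repeated so each split has enough samples per class.
corpus_a = ["نص قصير عن الطقس اليوم", "الطقس جميل في الصباح", "أحب المشي في الصباح الباكر"]
corpus_b = ["الاقتصاد يشهد نموا هذا العام", "ارتفعت الأسعار في الأسواق", "تقرير عن السوق المالية"]
texts = corpus_a * 5 + corpus_b * 5
authors = ["author_a"] * 15 + ["author_b"] * 15

def evaluate(train_fraction):
    """Train on a fraction of the corpus and return test accuracy."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, authors, train_size=train_fraction, stratify=authors, random_state=0
    )
    model = make_pipeline(
        # Word unigrams and bigrams as the n-gram feature representation.
        TfidfVectorizer(analyzer="word", ngram_range=(1, 2)),
        MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0),
    )
    model.fit(x_train, y_train)
    return accuracy_score(y_test, model.predict(x_test))

# Compare accuracy as the training set grows.
for frac in (0.2, 0.4, 0.6, 0.8):
    print(f"train fraction {frac}: accuracy {evaluate(frac):.2f}")
```

On a realistic corpus, swapping in the other classifiers (e.g. a Mahalanobis-distance or regression-based model) at the final pipeline step would allow the kind of per-size comparison the study describes.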