Amharic Language Hate Speech Detection on Social Media - Abstract
Social media platforms enable rapid communication, information sharing, and opinion expression. However, their misuse to spread hate speech targeting race,
religion, and political views has become a growing concern. The issue is particularly pressing for underrepresented languages such as Amharic, the Semitic
language with the most speakers after Arabic and the working language of Ethiopia. This study addresses the challenge of detecting
hate speech in Amharic text by analyzing posts and comments from Facebook, YouTube, and Twitter. A dataset of 7,590 labeled entries was collected
using the Facepager tool; it covers three hate speech categories (race, religion, and politics) and a neutral class. The dataset was annotated under the guidance of
researchers, legal experts, and language specialists. Preprocessing techniques, including data cleaning, tokenization, and normalization, were applied, and
feature extraction was performed using embedding layers. The dataset was split into training (80%), validation (10%), and testing (10%) sets. Several deep
learning models (LSTM, BiLSTM, GRU, BiGRU, and RoBERTa) were developed and evaluated using precision, recall, F1-score, and accuracy. The RoBERTa
model outperformed the others, achieving an accuracy of 91%. This research highlights the effectiveness of deep learning techniques in detecting
Amharic hate speech and provides a practical tool for mitigating this problem on Ethiopian social media.
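The abstract names data cleaning, tokenization, and normalization without specifying them. The Python sketch below shows one common way to implement these steps for Amharic. It assumes, without confirmation from the abstract, that normalization means collapsing homophone character series (ሐ and ኀ to ሀ, ሠ to ሰ, ዐ to አ, ፀ to ጸ), that cleaning strips URLs, @mentions, and non-Ethiopic characters, and that tokenization splits on whitespace and Ethiopic punctuation. The names `preprocess`, `_SERIES_PAIRS`, and `_HOMOPHONES` are hypothetical.

```python
import re
import unicodedata

# Hypothetical normalization table (an assumption, not the study's published rules).
# Each pair gives the first-order syllable of a variant Ethiopic series and of its
# canonical series. The seven vowel orders of a series occupy consecutive code
# points, so one offset maps the whole series (e.g. ሐ ሑ ... ሖ -> ሀ ሁ ... ሆ).
_SERIES_PAIRS = [
    ("\u1210", "\u1200"),  # ሐ-series -> ሀ-series
    ("\u1280", "\u1200"),  # ኀ-series -> ሀ-series
    ("\u1220", "\u1230"),  # ሠ-series -> ሰ-series
    ("\u12D0", "\u12A0"),  # ዐ-series -> አ-series
    ("\u1340", "\u1338"),  # ፀ-series -> ጸ-series
]
_HOMOPHONES = {
    ord(variant) + order: ord(canonical) + order
    for variant, canonical in _SERIES_PAIRS
    for order in range(7)
}

# URLs and @mentions are treated as noise (assumption).
_URL_OR_MENTION = re.compile(r"https?://\S+|www\.\S+|@\w+")
# Anything outside the Ethiopic syllables U+1200-U+135A (Latin text, digits,
# emoji, Ethiopic punctuation such as ፡ and ።) becomes a separator.
_NON_ETHIOPIC = re.compile(r"[^\u1200-\u135A\s]+")


def preprocess(text: str) -> list[str]:
    """Clean, normalize, and whitespace-tokenize one Amharic post."""
    text = unicodedata.normalize("NFC", text)
    text = _URL_OR_MENTION.sub(" ", text)
    text = text.translate(_HOMOPHONES)
    text = _NON_ETHIOPIC.sub(" ", text)
    return text.split()


if __name__ == "__main__":
    print(preprocess("ሐበሻ ሰላም! https://example.com @user"))
```

Using `str.translate` with a code-point dictionary keeps normalization to a single pass over the text. Whether first- and fourth-order forms (for example ሀ and ሃ) should also be merged is a further design choice the abstract does not settle.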
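The 80/10/10 split and the four reported metrics can be sketched in plain Python as follows. Stratifying the split by class, the fixed seed, unweighted macro-averaging, and the label strings in `LABELS` are illustrative assumptions; `stratified_split` and `evaluate` are hypothetical helpers, not the study's code.

```python
import random
from collections import Counter, defaultdict

# Hypothetical label strings for the three hate speech categories plus neutral.
LABELS = ("race", "religion", "politics", "neutral")


def stratified_split(examples, train=0.8, val=0.1, seed=13):
    """Split (text, label) pairs into train/validation/test within each class.

    Stratification and the seed are assumptions; the abstract gives only the
    80/10/10 proportions. Whatever remains after train and val goes to test.
    """
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    rng = random.Random(seed)
    train_set, val_set, test_set = [], [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_train = round(len(group) * train)
        n_val = round(len(group) * val)
        train_set.extend(group[:n_train])
        val_set.extend(group[n_train:n_train + n_val])
        test_set.extend(group[n_train + n_val:])
    return train_set, val_set, test_set


def evaluate(gold, pred, labels=LABELS):
    """Accuracy plus per-class and macro-averaged precision, recall, and F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted class gets a false positive
            fn[g] += 1  # the true class gets a false negative
    per_class = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    # Macro average: unweighted mean over classes (the study may weight them differently).
    macro = tuple(sum(scores[i] for scores in per_class.values()) / len(labels) for i in range(3))
    accuracy = sum(tp.values()) / len(gold) if gold else 0.0
    return {"accuracy": accuracy, "per_class": per_class, "macro": macro}
```

In practice, two calls to scikit-learn's `train_test_split` with the `stratify` argument, followed by `classification_report`, provide the same functionality; the standard-library version only makes the arithmetic explicit.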