A Multimodal Hate Speech Classification Process Using Dual Feature Extraction Techniques

Chibuike Onuoha¹, Ikerionwu Charles² and Obi Nwokonkwo¹

Abstract

Racist and ethnic violence, fabricated persecution, and other forms of intimidation are among the risks associated with hate speech, which makes its detection a concern for natural language processing. Given the sensitivity of hate speech in our society, it is essential to classify speech into hate and non-hate categories in real time to minimize these risks. The main objective of this work is to investigate selected supervised machine learning algorithms for the classification of hate speech on social media. Features were extracted using term frequency-inverse document frequency (TF-IDF) and bag-of-words (BOW) representations. Porter's stemming algorithm and WordNet lemmatization were applied during the preprocessing step. Models were trained on the datasets using logistic regression, naive Bayes, and random forest, and logistic regression was also used to build the classifier. The data were split so that 80% was used to train the model and 20% was used to test it. The logistic regression classifier achieved 98% accuracy and a 98% F1-score. These scores indicate that the approach detects and classifies hate speech with high accuracy.
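
The dual feature extraction and the 80/20 split described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the toy corpus and the helper names (`tokenize`, `bag_of_words`, `tf_idf`, `split_80_20`) are hypothetical, the TF-IDF weighting is one common smoothed variant, and the BOW and TF-IDF vectors are simply concatenated. Stemming, lemmatization, and the classifier itself are omitted for brevity.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus for illustration only; not the paper's data.
docs = [
    "you people are awful and should leave",
    "what a lovely day at the park",
    "those people are awful",
    "the park was lovely today",
    "leave now you awful people",
]


def tokenize(text):
    # Lowercase and split on whitespace (no stemming or lemmatization here).
    return text.lower().split()


def bag_of_words(tokens, vocabulary):
    # Raw term counts, one entry per vocabulary term.
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tf_idf(tokens, vocabulary, doc_freq, n_docs):
    # Assumed variant: relative term frequency times a smoothed IDF,
    # idf = ln((1 + n_docs) / (1 + df)) + 1.
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [
        (counts[term] / total)
        * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
        for term in vocabulary
    ]


def split_80_20(items, seed=0):
    # Shuffle a copy, then keep the first 80% for training.
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


train_docs, test_docs = split_80_20(docs)
tokenized_train = [tokenize(d) for d in train_docs]

# Vocabulary and document frequencies come from the training split only.
vocabulary = sorted({t for tokens in tokenized_train for t in tokens})
doc_freq = Counter(t for tokens in tokenized_train for t in set(tokens))

# Dual features: BOW counts followed by TF-IDF weights (an assumed combination).
train_features = [
    bag_of_words(tokens, vocabulary)
    + tf_idf(tokens, vocabulary, doc_freq, len(train_docs))
    for tokens in tokenized_train
]
```

Building the vocabulary from the training split alone keeps information from the test documents out of the features, so the held-out 20% remains a fair estimate of performance.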

Keywords

NLP; hate speech; classification; accuracy

Cite This Article

Onuoha, C., Charles, I., Nwokonkwo, O. (2022). A Multimodal Hate Speech Classification Process Using Dual Feature Extraction Techniques. International Journal of Scientific Advances (IJSCIA), Volume 3 | Issue 5: Sep-Oct 2022, Pages 768-771, URL: https://www.ijscia.com/wp-content/uploads/2022/10/Volume3-Issue5-Sep-Oct-No.348-768-771.pdf

Volume 3 | Issue 5: Sep-Oct 2022 

 

ISSN: 2708-7972

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 (International) Licence (CC BY-NC 4.0).
