LIWBC: a bigram algorithm to enhance results in polarity classification

  • Flavio Carvalho CEFET/RJ
  • Rafael G. Rodrigues CEFET/RJ
  • Gustavo Paiva Guedes CEFET/RJ

Abstract


The text mining literature shows a growing body of work on the automatic identification of sentiment in text, and sentiment polarity classification is one of the most important text mining tasks. The typical approach to polarity classification uses lexicons to count words associated with linguistic or emotional categories. One of the most widely used lexicons is the Linguistic Inquiry and Word Count (LIWC). LIWC assigns words to categories (e.g., positive emotion) based on a lexicon of words associated with psycholinguistic categories. It has been widely used in polarity classification tasks with good results. However, it only accounts for word counts, discarding the text structure and ignoring important semantic relationships between words. In this work, we present LIWBC, an algorithm that counts bigrams using the lexicon provided by LIWC. The goal is to incorporate text structure information to improve polarity classification with the LIWC lexicon. We conducted experiments to evaluate LIWBC on two real datasets: the first consists of blogger posts; the second is the movie reviews dataset, which contains full-text movie reviews from IMDB. Both datasets were processed with LIWC and with LIWBC. We then ran four classification algorithms on the data produced by each. The SVM algorithm run on LIWBC data yielded the best result on both datasets. The F1 score of SVM improved by 2.2% on the blogger posts dataset and by 2.5% on the movie reviews dataset.
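To make the contrast between word counting and bigram counting concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a bigram is counted as the ordered pair of lexicon categories of two adjacent words; the abstract does not specify the exact scheme. The lexicon `TOY_LEXICON`, its category labels, and both function names are hypothetical stand-ins, since the real LIWC dictionary is a separate licensed resource.

```python
from collections import Counter
from itertools import product

# Hypothetical toy lexicon (word -> category labels); NOT the real LIWC dictionary.
TOY_LEXICON = {
    "not": {"negate"},
    "good": {"posemo"},
    "bad": {"negemo"},
    "happy": {"posemo"},
}


def count_categories(tokens, lexicon):
    """LIWC-style word counting: add one per category of each known word."""
    counts = Counter()
    for token in tokens:
        for category in lexicon.get(token, ()):
            counts[category] += 1
    return counts


def count_category_bigrams(tokens, lexicon):
    """Assumed bigram scheme: count ordered category pairs of adjacent words."""
    counts = Counter()
    for first, second in zip(tokens, tokens[1:]):
        pairs = product(lexicon.get(first, ()), lexicon.get(second, ()))
        for pair in pairs:
            counts[pair] += 1
    return counts


tokens = "the movie was not good".split()
unigrams = count_categories(tokens, TOY_LEXICON)
bigrams = count_category_bigrams(tokens, TOY_LEXICON)
print(unigrams)  # one "negate" and one "posemo" count
print(bigrams)   # a single ("negate", "posemo") pair
```

In this toy case the word counts record one negation and one positive emotion word independently, while the bigram count keeps the fact that the negation immediately precedes the positive word. That is the kind of structural information the abstract says plain word counting discards.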
Keywords: Text mining, Sentiment analysis, LIWC
Published
16/10/2018
How to Cite

CARVALHO, Flavio; G. RODRIGUES, Rafael; GUEDES, Gustavo Paiva. LIWBC: a bigram algorithm to enhance results in polarity classification. In: SIMPÓSIO BRASILEIRO DE SISTEMAS MULTIMÍDIA E WEB (WEBMEDIA), 24., 2018, Salvador. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 419-422.