Unsupervised Aspect Extraction from Free-form Conversations
Aspect-based sentiment analysis of forum data can yield a wealth of knowledge because the discussions involved are massive and free-form. Existing work on aspect extraction for sentiment analysis includes: 1) simple frequency counts of noun phrases relying on labelled datasets for learning supervised models, and 2) topic modelling trained on large unlabelled datasets, which requires tuning of complex parameters. Our goal is to efficiently and effectively extract aspects (features and attributes) of certain entities (products or brands) from massive heterogeneous collections of user-generated free-form conversations. We construct an aspect dictionary in three steps: 1) we first extract candidate aspects using simple lexico-syntactic patterns that capture the “aspect-of” relation between a noun phrase and a mention of an entity; 2) next, we filter the candidates by drawing on an automatically compiled commonness blacklist, as well as a neighbourhood-based measure of aspecthood; and 3) lastly, we expand the dictionary to increase coverage using a variety of simple techniques. Compared with state-of-the-art methods for aspect extraction, our method efficiently constructs an extremely compact aspect dictionary (98% more compact) with comparable performance.
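As a rough illustration of the first two steps, the sketch below matches two simple "aspect-of" patterns around an entity mention and then drops blacklisted candidates. The specific patterns ("the X of ENTITY" and "ENTITY's X"), the function names, the brand name "Acme", and the hand-written blacklist are all assumptions made for this example. They are not the patterns or the automatically compiled blacklist used in this work, and the neighbourhood-based aspecthood measure and the expansion step are omitted.

```python
import re
from collections import Counter


def extract_candidates(sentences, entity):
    """Count single-word candidates matched by two illustrative patterns."""
    ent = re.escape(entity)
    patterns = [
        # Hypothetical pattern: "the <aspect> of (the) <entity>"
        re.compile(rf"\bthe (\w+) of (?:the )?{ent}\b", re.IGNORECASE),
        # Hypothetical pattern: "<entity>'s <aspect>"
        re.compile(rf"\b{ent}'s (\w+)", re.IGNORECASE),
    ]
    counts = Counter()
    for sentence in sentences:
        for pattern in patterns:
            for match in pattern.finditer(sentence):
                counts[match.group(1).lower()] += 1
    return counts


def filter_candidates(counts, blacklist):
    """Drop candidates found in a blacklist of overly common words."""
    return {aspect: n for aspect, n in counts.items() if aspect not in blacklist}


# Toy input; "Acme" is a made-up brand name.
sentences = [
    "The battery of the Acme drains fast.",
    "I love the Acme's camera.",
    "The rest of the Acme is fine.",
]
candidates = extract_candidates(sentences, "Acme")
aspects = filter_candidates(candidates, blacklist={"rest"})
```

On this toy input, "rest" matches the first pattern but is removed by the blacklist, which is the kind of generic noun the filtering step is meant to catch.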