 Title & Authors
A Crowdsourcing-Based Paraphrased Opinion Spam Dataset and Its Implication on Detection Performance
Lee, Seongwoon; Kim, Seongsoon; Park, Donghyeon; Kang, Jaewoo
 
 Abstract
Today, opinion reviews on the Web are widely used as a means of information exchange. As the importance of such reviews grows, so does the problem of opinion spam. Although many studies on detecting spam reviews have been conducted, limitations of existing gold-standard datasets hinder research. We therefore introduce a new dataset, "Paraphrased Opinion Spam (POS)", which contains a new type of review spam that imitates truthful reviews. We observed that spammers refer to existing truthful reviews when fabricating spam reviews. To build such a seemingly truthful spam dataset, we asked crowdsourcing participants to paraphrase truthful reviews into new deceptive reviews. Our experiments show that the POS dataset is more difficult to classify than existing spam datasets because its reviews are linguistically more similar to truthful reviews. We also found that the amount of training data is an important factor in classification performance.
 Keywords
paraphrasing; opinion spam; crowdsourcing; resource generation; resource evaluation
 Language
Korean
 References
1.
M. Ott, Y. Choi, C. Cardie and J. T. Hancock, "Finding deceptive opinion spam by any stretch of the imagination," Proc. of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Vol. 1, pp. 309-319, 2011.

2.
J. Li, M. Ott, C. Cardie and E. Hovy, "Towards a general rule for identifying deceptive opinion spam," Proc. of the 52nd Annual Meeting of the Association for Computational Linguistics, Vol. 1, pp. 1566-1576, 2014.

3.
A. Heydari, M. A. Tavakoli, N. Salim and Z. Heydari, "Detection of review spam: A survey," Expert Systems with Applications, pp. 3634-3642, 2015.

4.
S. Rendle, "Factorization machines with libFM," ACM Transactions on Intelligent Systems and Technology, Vol. 3, pp. 57-80, 2012.

5.
H. Sun, A. Morales and X. Yan, "Synthetic review spamming and defense," Proc. of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013.

6.
S. Kim, H. Chang, S. Lee, M. Yu and J. Kang, "Deep semantic frame-based deceptive opinion spam analysis," Proc. of the 24th ACM CIKM International Conference on Information and Knowledge Management, pp. 1131-1140, 2015.

7.
N. Jindal, B. Liu and E-P. Lim, "Finding unusual review patterns using unexpected rules," Proc. of the 19th ACM CIKM International Conference on Information and Knowledge Management, pp. 1549-1552, 2010.

8.
E-P. Lim, V-A. Nguyen, N. Jindal, B. Liu and H. W. Lauw, "Detecting product review spammers using rating behaviors," Proc. of the 19th ACM CIKM International Conference on Information and Knowledge Management, 2010.

9.
A. Mukherjee, B. Liu and N. S. Glance, "Spotting fake reviewer groups in consumer reviews," Proc. of the 21st International World Wide Web Conference (WWW), pp. 191-200, 2012.

10.
A. Mukherjee, V. Venkataraman, B. Liu and N. Glance, "What Yelp fake review filter might be doing," Proc. of the Seventh International Conference on Weblogs and Social Media, 2013.

11.
N. Jindal and B. Liu, "Opinion spam and analysis," Proc. of the WSDM International Conference on Web Search and Web Data Mining, 2008.

12.
S. Gokhman, J. Hancock, P. Prabhu, M. Ott and C. Cardie, "In search of a gold standard in studies of deception," Proc. of the Workshop on Computational Approaches to Deception Detection, pp. 23-30, 2012.