In recent years, flaming (hostile or insulting interaction) on social media has become a problem. One way to avoid or minimize flaming is to have the system automatically check each message before it is posted and determine whether it contains expressions likely to trigger flaming. We target two types of harmful expressions: insulting expressions and expressions that are likely to cause a quarrel. We first constructed an original dictionary of harmful expressions. To minimize the cost of collecting expressions, we built the dictionary semi-automatically using distributed word representations. The method uses the distributed representations of harmful and general expressions as features and trains a classifier on these features to distinguish harmful from general expressions. An evaluation experiment found that the proposed method extracted harmful expressions with an accuracy of approximately 70%. The method was able to identify unknown harmful expressions not included in the basic dictionary and to capture semantic relationships among harmful expressions; however, it also tended to wrongly extract non-harmful expressions. Although the method cannot presently be applied directly to multi-word expressions, it should be possible to add this capability by introducing time-series learning.
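The idea of classifying expressions as harmful or general from their distributed representations can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three-dimensional vectors are invented toy values rather than trained embeddings, the English words are placeholders, and a nearest-centroid rule with cosine similarity stands in for whatever classifier the authors actually trained. The names `TOY_VECTORS`, `centroid`, `cosine`, and `classify` are hypothetical.

```python
import math

# Hypothetical 3-dimensional "embeddings". Real distributed representations
# would be learned from a large corpus and have far more dimensions.
TOY_VECTORS = {
    "idiot":   [0.90, 0.10, 0.00],
    "stupid":  [0.80, 0.20, 0.10],
    "moron":   [0.85, 0.15, 0.05],
    "weather": [0.10, 0.90, 0.20],
    "lunch":   [0.00, 0.80, 0.30],
    "train":   [0.20, 0.70, 0.40],
}


def centroid(words, vectors):
    """Average the vectors of the given seed words, dimension by dimension."""
    dims = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dims)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(word, vectors, harmful_seeds, general_seeds):
    """Label a word by whichever seed centroid its vector is closer to.

    This nearest-centroid rule is a stand-in for a trained classifier.
    """
    harmful_c = centroid(harmful_seeds, vectors)
    general_c = centroid(general_seeds, vectors)
    v = vectors[word]
    return "harmful" if cosine(v, harmful_c) > cosine(v, general_c) else "general"


# Seed words play the role of the basic dictionary; the candidates are
# "unknown" expressions that the seeds do not contain.
harmful_seeds = ["idiot", "stupid"]
general_seeds = ["weather", "lunch"]
labels = {
    w: classify(w, TOY_VECTORS, harmful_seeds, general_seeds)
    for w in ["moron", "train"]
}
print(labels)
```

With these toy vectors, the unseen word "moron" lies close to the harmful seeds and "train" lies close to the general seeds. In a realistic setting the vectors would come from an embedding model and the decision rule would be a classifier trained on labeled expressions.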
To cite this article
K. Matsumoto, S. Tsuchiya, T. Miyake, M. Yoshida, and K. Kita, “Flame prediction based on harmful expression judgement using distributed representation,” International Journal of Technology and Engineering Studies, vol. 4, no. 1, pp. 7–15. doi: https://dx.doi.org/10.20469/ijtes.4.10002-1