Potential and Limits of Automated Classification of Big Data - A Case Study

Abstract

This case study highlights the potential and limits of big-data analyses of media sources compared with conventional quantitative content analysis. In a multidisciplinary project in Austria funded by the FFG (under the KIRAS security research program), the software tool WebLyzard was used for automated analysis of online news and social media sources (comments on articles, Facebook postings, and Twitter statements) to examine the media representation of pressing societal issues and citizens’ perceptions of security. In parallel to the automated WebLyzard analysis, two independent raters carried out frequency and sentiment analyses. Articles on selected key topics such as technology or Muslims in two major Austrian online newspapers (Der Standard and Kronen Zeitung) were counted, as were user comments, and both were evaluated according to different sentiment categories. The results reveal various weaknesses of the software that lead to misinterpretations: the automated analyses yield substantially different results from the sentiment analysis carried out by the two raters, especially for cynical or irrelevant statements. From a social-science methodological perspective, the results show that methodology in our discipline should promote theory-based research, counteract the attraction of superficial analyses of complex social issues, and emphasize not only the potential but also the dangers and risks associated with big data.

Publication
Historical Social Research/Historische Sozialforschung