Methodology: social tensions and emotions in society

The indicators were developed using publicly available posts by Dutch users on sites such as Twitter, Facebook and Instagram. Other sources were also used, such as public Dutch news websites, forums and blogs. The text of these sources was used; images and emoticons were not. Comments on posts were not incorporated into the development process, although forwarded posts such as retweets were included. Posts by spambots were removed wherever possible.

The Dutch users who are active on the social media sites analysed are not representative of the population as a whole; Twitter is the primary source, accounting for approximately 60% of the posts. Nevertheless, previous CBS research has shown that social media can be used to measure changes in indicators that are based on sentiment. This makes social media a useful indicator of changes in sentiment in society.

Social tension indicators

To select posts expressing feelings of social tension and unrest, CBS used qualitative research to compile a validated list of words that relate specifically to those feelings [1, 2]. This list was then combined with a word list compiled by the WODC, which contains both words related to feelings of insecurity in society and words linked to topics within the domain of the Ministry of Justice and Security.

Based on this combined list, the percentage of posts about these topics is calculated for each day. The average value in 2011, the first year of the dataset, is used to index the values in later years. The dashboard's default display is the combined tension indicator, but the indicators based on the CBS list and the WODC list can also be viewed separately. In general, the three social tensions indicators present comparable results, with occasional discrepancies resulting from their different selection criteria.
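The daily share and its indexation can be sketched as follows. This is a minimal illustration, not the CBS implementation: the function names, the whitespace tokenisation and the convention that the 2011 average equals 100 are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_share(posts, word_list):
    """Return {day: percentage of posts containing at least one listed word}.

    `posts` is an iterable of (day, text) pairs. Matching on lowercased,
    whitespace-separated tokens is an assumption of this sketch.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, text in posts:
        totals[day] += 1
        if set(text.lower().split()) & word_list:
            hits[day] += 1
    return {day: 100.0 * hits[day] / totals[day] for day in totals}


def index_to_base_year(shares, base_year=2011):
    """Express each daily share relative to the base-year average.

    Setting the base-year average to 100 is an assumed convention.
    """
    base = mean(v for d, v in shares.items() if d.year == base_year)
    return {d: 100.0 * v / base for d, v in shares.items()}


# Hypothetical example data: two posts per day, two days.
posts = [
    (date(2011, 1, 1), "onrust in de stad"),
    (date(2011, 1, 1), "mooi weer vandaag"),
    (date(2012, 1, 1), "veel onrust en angst"),
    (date(2012, 1, 1), "onrust overal"),
]
word_list = {"onrust", "angst"}

shares = daily_share(posts, word_list)
indexed = index_to_base_year(shares)
```

In this toy data half of the 2011 posts match the list and all of the 2012 posts do, so the 2012 value is twice the base-year level.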

The Prophet algorithm (see the article on Prophet [6]) is used to determine whether social tension has increased. The general trends in the social tensions indicators are used to determine a confidence interval of three times the standard deviation, which is visualised as shading around the lines. Peaks that fall outside the confidence interval are treated as days of increased or decreased social tension. To date, only days with increased social tension have been recorded.
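The idea of flagging days that fall outside a band around the trend can be illustrated with a simplified stand-in. The sketch below does not use Prophet: it replaces the fitted trend with a centred rolling median and takes the standard deviation of the residuals, which is an assumption about how the band might be formed. The function name and the window size are hypothetical.

```python
from statistics import median, pstdev


def flag_peaks(values, window=7, k=3.0):
    """Flag days whose value lies outside trend +/- k * sigma.

    The centred rolling median is a simple stand-in for the trend that
    Prophet would fit; sigma is the standard deviation of the residuals.
    Returns a list of (day_index, "increase" | "decrease") tuples.
    """
    n = len(values)
    half = window // 2
    trend = [
        median(values[max(0, i - half): min(n, i + half + 1)])
        for i in range(n)
    ]
    residuals = [v - t for v, t in zip(values, trend)]
    sigma = pstdev(residuals)
    return [
        (i, "increase" if r > 0 else "decrease")
        for i, r in enumerate(residuals)
        if abs(r) > k * sigma
    ]


# Hypothetical indexed series: flat at 100 with a single spike on day 15.
series = [100.0] * 30
series[15] = 200.0
peaks = flag_peaks(series)
```

A rolling median is used here because a single spike does not pull it upwards, so the spike remains clearly visible in the residuals.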

Sentiment filter and sentiment indicator

Each post is assigned a sentiment (positive, neutral or negative) based on its content. By default, the social tensions indicators are calculated from posts that are classified as positive or negative and that satisfy the word filter for social tension. This default was chosen because events that lead to a great deal of social tension often cause the indicators to increase for both positive and negative sentiment. Users can also choose to display the indicators for negative posts only or for all posts. Positive sentiment cannot be selected on its own because the relatively low number of positive posts makes the indicator unstable.
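The three selectable options can be pictured as a filter over labelled posts. The following sketch assumes each post already carries a sentiment label; the mode names `polar`, `negative` and `all`, the dictionary layout and the token matching are hypothetical choices for illustration.

```python
# Sentiment labels allowed under each (hypothetically named) dashboard option.
MODES = {
    "polar": {"positive", "negative"},          # default: positive or negative
    "negative": {"negative"},                    # negative posts only
    "all": {"positive", "neutral", "negative"},  # every post
}


def select_posts(posts, word_list, mode="polar"):
    """Keep posts whose sentiment is allowed and that match the word filter."""
    allowed = MODES[mode]
    return [
        p for p in posts
        if p["sentiment"] in allowed
        and set(p["text"].lower().split()) & word_list
    ]


# Hypothetical labelled posts.
posts = [
    {"text": "onrust in de stad", "sentiment": "negative"},
    {"text": "geen onrust meer", "sentiment": "positive"},
    {"text": "onrust gemeld", "sentiment": "neutral"},
    {"text": "mooi weer", "sentiment": "positive"},
]
word_list = {"onrust"}

default_selection = select_posts(posts, word_list)
```

Note that no positive-only mode is defined, mirroring the dashboard's restriction.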

Emotion indicators

Separate word lists were used to compile the emotion indicators. This process was applied to each of the ‘basic emotions’: fear, happiness, sadness, disgust and anger. The sixth basic emotion, surprise, was also studied but did not yield any usable indicators. For happiness, posts conveying good wishes for public holidays, such as ‘Gelukkig Nieuwjaar’ (‘Happy New Year’), were filtered out. The word lists for emotions were compiled using WordNet (see Annex 7), a tool which displays ‘meaningfully related’ words. The emotion-related words found in this process were translated into Dutch and then filtered according to their presence in the Dutch-language social media posts.

Sentiment indicator

The dashboard also offers a sentiment indicator based on the difference between the total number of positive and negative posts in the social tensions indicator. This produces a line showing the relationship between posts with these two types of sentiment. This means that the sentiment indicator’s calculation differs from the other indicator lines, which are based on a subset of posts relative to the total number of posts.
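The contrast with the share-based indicators can be made concrete with a small sketch. It computes only the raw difference between positive and negative counts for one day; whether and how the dashboard normalises this difference is not stated here, so no normalisation is applied. The function name is hypothetical.

```python
from collections import Counter


def sentiment_balance(sentiments):
    """Return (number of positive posts) - (number of negative posts).

    `sentiments` holds the labels of the posts selected for one day.
    Neutral posts do not affect the result.
    """
    counts = Counter(sentiments)
    return counts["positive"] - counts["negative"]


# Hypothetical labels for one day's selected posts.
balance = sentiment_balance(["positive", "negative", "negative", "neutral"])
```

A negative result indicates that negative posts outnumber positive ones on that day.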

Word cloud and ranking

The dashboard offers the option to generate a word cloud and a ranking for each day. The ranking shows the level of social tension on the selected day relative to earlier days of raised tension. This calculation is only carried out for days with a peak in social tensions, that is, days on which the tension score rises above the confidence interval.

A word cloud can be generated for any day and shows the 20 most commonly used words in the selected posts on that day. It is most informative on days with a significant increase in social tensions. Because the words collected in the word cloud are not monitored on a daily basis, words may occasionally be visible that CBS would not normally use in its own communications.
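Selecting the most frequent words for a day can be sketched with a simple counter. This is an illustration only: the tokenisation rule and the function name are assumptions, and the sketch applies no stop-word removal or other cleaning that a production word cloud might use.

```python
import re
from collections import Counter


def top_words(texts, n=20):
    """Return the n most frequent words across the given post texts.

    Words are lowercased runs of word characters (an assumed tokenisation).
    """
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return counts.most_common(n)


# Hypothetical posts selected for one day.
day_posts = ["Onrust in de stad", "onrust op straat"]
most_common = top_words(day_posts, n=1)
```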