Social media has been critical for organizing, spreading ideas, and sharing streaming video of the Ferguson protests. In this project, I examine Twitter users' general sentiment toward important topics related to Ferguson, police brutality, protesting, and the media, both within Ferguson and around the world.
Twitter activity related to Ferguson peaked around mid-August. During a period
of two hours on August 17, 262,999 tweets using the word ferguson were
collected using the Twitter Streaming API. The Twitter Streaming API returns a
random sample of 1% of all tweets containing a word or phrase, with or without a
hashtag (so both ferguson and #ferguson were included in the data set).
Many of these tweets were non-original. A preliminary analysis showed that 81% of the tweets collected were retweets of a subset of more popular tweets. To prevent the opinions of several popular Twitter users from overpowering the original opinions of other users in the analysis, these retweets were eliminated, leaving 46,604 original tweets.
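The retweet filtering step can be sketched roughly as follows. This is a minimal illustration, not the code used for the project: it assumes each tweet is a dict parsed from the Streaming API's JSON, that native retweets carry a `retweeted_status` field, and that manual retweets begin with "RT @". The helper names `is_retweet` and `keep_originals` and the sample tweets are hypothetical.

```python
def is_retweet(tweet):
    """Return True if a tweet dict looks like a retweet."""
    # Native retweets embed the original tweet under "retweeted_status".
    if "retweeted_status" in tweet:
        return True
    # Assumption: manual retweets follow the "RT @user" convention.
    return tweet.get("text", "").startswith("RT @")


def keep_originals(tweets):
    """Keep only the tweets that are not retweets."""
    return [t for t in tweets if not is_retweet(t)]


# Hypothetical sample data for illustration only.
sample = [
    {"text": "Watching the live stream from Ferguson"},
    {"text": "RT @someone: Watching the live stream from Ferguson"},
    {"text": "Curfew announced", "retweeted_status": {"text": "Curfew announced"}},
]
originals = keep_originals(sample)
```

Only the first sample tweet survives the filter, since the other two match one of the two retweet checks.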
A simple measurement of word frequency was used to find the common themes of Ferguson tweets. Each tweet was given a positivity score by using the MPQA subjectivity lexicon to score each of its words. The positivity of a common word was calculated as the mean positivity of the tweets containing that word. The sentiment and co-occurrence frequency of these common words were then combined to create a topic graph showing how the words relate to one another.
The chart above shows the frequency of the 40 most common words used in the Ferguson tweets. Stopwords (such as a, the, and for) were eliminated using the Reuters-21578 Text Categorization Test Collection stopword list.
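A word count of this kind might look like the sketch below. The tokenizer, the tiny `STOPWORDS` set (a stand-in for the Reuters-21578 list), the function name `top_words`, and the sample texts are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny stand-in stopword set; the analysis used the Reuters-21578 list.
STOPWORDS = {"a", "the", "for", "in", "of", "to", "is"}


def top_words(texts, n=40, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep runs of letters/apostrophes, so "#Ferguson"
        # and "Ferguson" both become "ferguson".
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


# Hypothetical sample texts.
texts = [
    "The police in #Ferguson",
    "Police set a curfew for Ferguson",
    "Curfew is in effect",
]
top = top_words(texts, n=3)
```

Stripping the hash sign during tokenization mirrors the decision to treat hashtagged and plain uses of a word as the same term.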
The top five most common words were
curfew. Less common words included
Common words can be used as a simple way to determine the common subjects of
tweets. In terms of frequency, police violence is one of the main issues on
people’s minds. Racial words, like
white are present, but less
common in conversation.
A subjectivity lexicon contains a list of words and their corresponding polarity
and magnitude. In the MPQA lexicon, words are labeled as positive, negative,
neutral, or both, and as strongly or weakly subjective. For this project, strong
words were given an absolute value score of 2 and weak words a score of 1, with
positive words scored above zero and negative words below zero. For example,
horrible would be scored as -2, while
okay would be scored as 1.
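As a rough sketch of this scoring rule, the snippet below turns lexicon entries into signed scores. It assumes the lexicon's `key=value` line layout (`type`, `word1`, `priorpolarity`); the two sample lines are written in that layout for illustration rather than copied from the file. Scoring neutral and "both" entries as 0, and letting a later entry overwrite an earlier one for the same word, are assumptions, since the text does not say how those cases were handled.

```python
STRENGTH = {"strongsubj": 2, "weaksubj": 1}
SIGN = {"positive": 1, "negative": -1}


def parse_lexicon(lines):
    """Map each word to a signed score: +/-2 for strong, +/-1 for weak."""
    scores = {}
    for line in lines:
        # Each entry is a run of space-separated key=value fields.
        fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
        if "word1" not in fields:
            continue
        # Assumption: "neutral" and "both" polarities contribute 0.
        sign = SIGN.get(fields.get("priorpolarity"), 0)
        magnitude = STRENGTH.get(fields.get("type"), 0)
        scores[fields["word1"]] = sign * magnitude
    return scores


# Illustrative entries in the lexicon's layout (not copied from the file).
sample_lines = [
    "type=strongsubj len=1 word1=horrible pos1=adj stemmed1=n priorpolarity=negative",
    "type=weaksubj len=1 word1=okay pos1=adj stemmed1=n priorpolarity=positive",
]
word_scores = parse_lexicon(sample_lines)
```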
The polarity score of a tweet was simply the sum of the polarity scores of its words. The distribution of tweet scores is shown above. One drawback of this coding scheme is that it cannot distinguish between low-sentiment tweets that contain mostly neutral words and mixed-sentiment tweets that contain both positive and negative words. However, because tweets can contain only 140 characters, it is difficult to express more than one sentiment per tweet, so this limitation should be less of a problem when applied to Twitter data.
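The summation, and the drawback just described, can be illustrated with a short sketch. The whitespace tokenizer, the function name `tweet_score`, and the entry for "great" are hypothetical; only the scores for "horrible" and "okay" come from the example above.

```python
# Small illustrative lexicon; the "great" entry is hypothetical.
LEXICON = {"horrible": -2, "okay": 1, "great": 2}


def tweet_score(text, lexicon):
    """Sum the lexicon scores of a tweet's words; unknown words count as 0."""
    return sum(lexicon.get(token, 0) for token in text.lower().split())


neutral = tweet_score("Curfew starts at midnight", LEXICON)
mixed = tweet_score("A horrible night but great people", LEXICON)
negative = tweet_score("Horrible news and a horrible response", LEXICON)
```

Here `neutral` and `mixed` both come out as 0 even though only the second tweet carries sentiment, which is exactly the ambiguity the summed score cannot resolve.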
The plot above shows the distribution of positivity scores for all 46,604 original tweets. The modal sentiment is zero. Zero-sentiment tweets made up 34% of all original tweets, while 30% of tweets expressed an overall positive sentiment and 35% expressed an overall negative sentiment.
Overall, this is a fairly even distribution of sentiment.
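The shares of negative, zero, and positive tweets can be tallied from the per-tweet scores along these lines. This is a sketch with made-up scores; the function name `sentiment_shares` is hypothetical.

```python
from collections import Counter


def sentiment_shares(scores):
    """Return the fraction of scores that are negative, zero, and positive."""
    if not scores:
        raise ValueError("scores must not be empty")
    labels = Counter(
        "positive" if s > 0 else "negative" if s < 0 else "zero" for s in scores
    )
    return {k: labels[k] / len(scores) for k in ("negative", "zero", "positive")}


# Made-up scores for illustration; not the project's data.
shares = sentiment_shares([-2, -1, 0, 0, 1, 3, 0, -4])
```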
One possible problem with analyzing tweets is self-selection: the kind of people who tweet in the first place. It is possible that people who feel strongly about Ferguson are more likely to tweet about it, which would produce a bimodal distribution. The results suggest that this was not a problem in the case of Ferguson. Tweets are distributed across a wide spectrum of sentiment, so the data appear to reflect more general sentiment rather than only the sentiment of people with extreme opinions.
This graph uses spectral clustering to group some of the more commonly used words by how frequently they are mentioned together in tweets. The red coloring is proportional to the negativity expressed in tweets related to those topics. While it’s often difficult to identify the meaning behind clusterings of this kind, a number of distinct clusters emerged (clockwise, starting from the top):
- Events on the ground of the Ferguson protests
- Michael Brown, his autopsy, and the inciting event
- Traditional media coverage and citizen journalism
- General, broader topics from a black focus/perspective
- General, broader topics from a white focus/perspective
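A clustering of this kind starts from a word-by-word co-occurrence matrix. The sketch below builds one in plain Python; it is an illustration under assumptions, not the project's code, and the function names, vocabulary, and sample texts are hypothetical. It counts a pair once per tweet in which both words appear.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(texts, vocabulary):
    """Count, for each pair of vocabulary words, the tweets containing both."""
    pairs = Counter()
    for text in texts:
        # Sort so each pair is always stored in the same (a, b) order.
        present = sorted(set(text.lower().split()) & vocabulary)
        pairs.update(combinations(present, 2))
    return pairs


def affinity_matrix(pairs, words):
    """Turn pair counts into a symmetric matrix indexed by position in words."""
    index = {w: i for i, w in enumerate(words)}
    matrix = [[0] * len(words) for _ in words]
    for (a, b), n in pairs.items():
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n
    return matrix


# Hypothetical vocabulary and tweets.
words = ["autopsy", "brown", "curfew", "police"]
texts = [
    "brown autopsy released",
    "police enforce curfew",
    "police curfew tonight",
    "brown autopsy police",
]
matrix = affinity_matrix(cooccurrence(texts, set(words)), words)
```

A symmetric matrix like this can serve as a precomputed affinity for a spectral clustering routine, for example scikit-learn's `SpectralClustering` with `affinity="precomputed"`; whether the original graph was produced that way is not stated.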
The most striking result is the treatment of the word
black by Twitter users,
both in terms of sentiment and clustering. In most cases it is used in a
negative context; however, the reason for this may differ based on the Twitter
user’s political views and perspective of the situation.
Some Twitter users use the word
black in tweets that make negative comments
regarding race, counting as a negative usage of the word
black. However, many
tweets use this word negatively with respect to the situation rather than the
group of people. Because the situation is highly negative, this gives the word
black a strongly negative score, even though the negative sentiment
is directed toward the police, Ferguson, the media, or the concept of police
brutality rather than the group of people.
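The effect described here follows from how a word's positivity is computed: as the mean score of the tweets that contain it, regardless of what the sentiment is aimed at. A minimal sketch, with made-up tweets and scores and a hypothetical function name `word_positivity`:

```python
from collections import defaultdict


def word_positivity(scored_tweets, vocabulary):
    """Mean tweet score over the tweets that contain each vocabulary word."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, score in scored_tweets:
        # Use a set so a word repeated within one tweet is counted once.
        for word in set(text.lower().split()) & vocabulary:
            totals[word] += score
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in counts}


# Made-up (text, tweet score) pairs for illustration.
scored = [
    ("police fire tear gas", -3),
    ("police protect peaceful protest", 1),
    ("curfew tonight", 0),
]
means = word_positivity(scored, {"police", "curfew"})
```

A word inherits the score of every tweet it appears in, so a word that mostly appears in tweets about a negative situation ends up with a negative mean even when the word itself is not the target of the sentiment.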