The Center for Survey Research (CSR) at East Carolina University collected more than 150,000 tweets (of which 85,606 could be categorized as positive or negative for a specific candidate) posted during the hours of the Democratic national debate on November 20, 2019. We used sentiment analysis to calculate a ratio of positive to negative tweets for each of the ten candidates. Elizabeth Warren received the highest percentage of positive tweets, whereas former Vice President Joe Biden, one of the leading contenders for the Democratic Party’s presidential nomination, failed to impress the Twitter audience. With the exception of tweets about Tulsi Gabbard, tweets about Biden during the debate were the most negative. Biden also generated 14,528 tweets, more than any other candidate. This combination of a high number of tweets that were overwhelmingly negative makes Joe Biden one of the clear losers of Wednesday night’s debate among the Twitter audience. (For additional information about the analysis, see the methodology section below.)
In addition, the sentiment analysis of tweets reveals some good news for Cory Booker. Booker finished a close second, only narrowly behind Elizabeth Warren in the percentage of positive tweets. The one piece of bad news for the Booker campaign is that he generally failed to excite those who took to Twitter during the debate. His 5,548 categorized tweets ranked eighth, ahead of only Tom Steyer (2,815) and Amy Klobuchar (5,529).
Rounding out the top three, Andrew Yang continued to score comparatively high among Twitter users, finishing third based on the ratio of positive tweets. (Previous analysis from ECU’s CSR showed Yang finishing first in October’s debate and second in September’s debate.) Yang’s 8,826 tweets, however, ranked sixth, suggesting that Yang, much like Cory Booker, generated below-average attention from Twitter users during Wednesday night’s debate. Kamala Harris, conversely, generated the second-highest number of categorized tweets, but ranked sixth in her positive-to-negative ratio of tweets. The same was largely true for two other leading candidates, Pete Buttigieg and Bernie Sanders. Twitter users mentioned both candidates frequently during Wednesday night’s debate, but less positively than they mentioned Warren, Booker, and Yang.
Our results parallel some of the post-debate analysis from political writers and pundits, who likewise identified Joe Biden as one of the losers of Wednesday night’s debate and viewed the debate performances of Cory Booker and Andrew Yang favorably. In other instances, our results run counter to analyses of Wednesday’s debate. Amy Klobuchar and Pete Buttigieg received high marks for their debate performances from several commentators; however, our analysis failed to show a similar reaction from Twitter users. Additionally, while some commentators praised Elizabeth Warren’s debate performance, many of the analyses highlighted other winners.
As we have noted in previous reports, Twitter users are certainly not representative of the nation’s population as a whole. Nonetheless, Twitter users do reflect an audience of people who are engaged enough politically to voice their opinions. In primary elections, where voter turnout is often low, this engaged audience seems worthy of some attention.
If public opinion moves in Twitter’s direction, then Warren is likely to see her polling numbers rise in the days and weeks to come, while Joe Biden’s campaign could see further erosion in his support for the Democratic presidential nomination.
* * * * *
About the Authors
Dr. Peter L. Francia is the Director of the Center for Survey Research and is a professor of political science at East Carolina University. He can be reached at 252-328-6126 or at franciap@ecu.edu.
Dr. Baekkwan Park is a Senior Data Analyst at the Center for Survey Research at East Carolina University.
Dr. Venkat N. Gudivada is chairperson and professor in the Department of Computer Science at East Carolina University.
Jennifer Andriot is a graduate student in data science in the Department of Computer Science at East Carolina University.
Methodology Summary
The analysis in this article is based on a collection of more than 150,000 tweets containing the hashtags #demdebate, #demdebates, #democraticdebate, #democraticdebates, #presidentialdebate, and #presidentialdebates, collected through Twitter’s streaming API between 8:00 and 11:00 p.m. on November 20, 2019. To generate our results, we filtered the data so that the analysis contained only tweet texts for each of the ten Democratic primary candidates. The analysis relied on a supervised machine learning approach (Support Vector Machine, accuracy of 83.1 percent). This required annotated training data. We manually annotated about 4,000 tweets. Five groups of students were each assigned a random set of tweets and manually labeled each tweet as one of two classes: positive or negative. To check inter-coder reliability, we examined all the labeled data again. After retrieving and filtering all the tweets about the primary debate, we pre-processed the texts to eliminate noisy data. After classifying the tweets into two categories (positive or negative), we calculated the proportion of positive tweets out of the total categorized tweets for each candidate.
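
The final step described above, computing each candidate's proportion of positive tweets and ranking the candidates by it, can be sketched in a few lines of Python. This is a minimal illustration and not the code used for the analysis: the function names (positive_share, rank_by_positive_share) and the sample rows are hypothetical, and the sketch assumes the classifier has already produced one (candidate, label) pair per categorized tweet.

```python
from collections import Counter

# Hypothetical classifier output: one (candidate, label) pair per tweet.
# These rows are made-up illustration data, not figures from the study.
classified = [
    ("Warren", "positive"),
    ("Warren", "positive"),
    ("Warren", "negative"),
    ("Biden", "negative"),
    ("Biden", "negative"),
    ("Biden", "positive"),
    ("Biden", "negative"),
]


def positive_share(rows):
    """Return {candidate: (share of positive tweets, total tweets)}."""
    totals = Counter()
    positives = Counter()
    for candidate, label in rows:
        totals[candidate] += 1
        if label == "positive":
            positives[candidate] += 1
    return {c: (positives[c] / totals[c], totals[c]) for c in totals}


def rank_by_positive_share(rows):
    """Sort candidates from the highest to the lowest positive share."""
    shares = positive_share(rows)
    return sorted(shares.items(), key=lambda item: item[1][0], reverse=True)


ranking = rank_by_positive_share(classified)
for candidate, (share, total) in ranking:
    print(f"{candidate}: {share:.1%} positive of {total} tweets")
```

Keeping the tweet count next to the share mirrors how the article reads the results: a candidate can rank high on positivity while drawing relatively little attention, or the reverse.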