Final: Analyzing the Roseanne Response

For the final, we are going to analyze the tweets posted around the cancellation of the TV show "Roseanne" in May.

First, please get some background on this event if you don't remember it. Here are some articles to begin with. I recommend you read these before reading the rest of the final, as they will add important context.

I have a dataset of tweets posted on the day of the cancellation. For this project, the entire class will work as a group to label the tweets, building a publicly accessible dataset. Then, we will break into two groups: one will analyze the tweets' content, and the other will analyze the network of people around the event.

A few notes before we get into it. I have run many, many large group projects like this, and I have learned in painful ways that, to be successful, we need very strict organization. In practice, that means I keep tight control over our activities and processes. If we were working one-on-one, I'd be very open to discussing alternative ways to store or format data, better schemes for working, etc. But with this many people, we don't have that kind of flexibility (trust me - I've tried it, and it has been a nightmare for everyone). So even if you can see a way to do this that seems more efficient, please don't change what we're doing. I promise I've made every decision about how we handle this based on my deep knowledge of the dynamics of large virtual teams and where things go wrong. If I say to use a Google Doc, please use one. If I give you a spreadsheet, please do not change the formatting. What I'm sharing is simplified for a reason, and added complexity is going to break things, even if your changes seem like an improvement to you.

Dataset Building

In Phase 1 of the project, we will build a dataset by labeling each tweet with one of the following categories:

  • Pro-Roseanne - supporters of Roseanne Barr who are opposed to her firing. Because she was fired for a racist tweet, any tweets that talk about the politically correct police or liberal snowflakes, that are pro-Trump, or that use #maga, etc. can be put into this category. Example: @therealroseanne Seriously? Guess the snowflakes can dish it out but can't take it. Nothing to forgive here. Still... https://t.co/oHC8i1MLrQ
  • Anti-Roseanne - people who support the firing and oppose Roseanne (or at least her content). Again, because of the politics around this issue, tweets with anti-Trump comments in this context can also be marked as Anti-Roseanne (unless the tweet is obviously pro-Roseanne but anti-Trump). Example: @CarJoJoe @TT45Pac @therealroseanne Asshole = A Racist, bigoted, uneducated, ignorant white Trump supporter.
  • Neutral - these are on topic but are usually things like news stories or statements of fact, e.g. "Roseanne Canceled by ABC".
  • Unclear/Unrelated - some tweets are arguments between people who are insulting one another without their position on Roseanne being clear. Some are single words (e.g. "Roseanne" is the entire tweet). Some just don't make sense. It's important that we not guess about the pro/anti/neutral stance. If the tweet is confusing, put it in this category. This may end up being a large percentage of the tweets and that is fine.

To categorize the tweets, read only the text provided. Do not follow the links or look up the tweets. Base your judgments only on the text you can see.

When doing this kind of labeling, we need to make sure we all understand how to do it. Thus, you will begin by coding a set of sample tweets.

Deadline: July 11

Complete the sample coding at this link: https://goo.gl/forms/ZSQ344CZQ5vdfZP63

When you are done, you can compare your answers to others here: https://docs.google.com/spreadsheets/d/1cBxlb8Ks_bgipOUZJ6Nz9DCUtX81no4RTzSCKeHpL8E/edit?usp=sharing I have provided the correct answers. If you don't understand why your answers don't match mine, please discuss it on the Final: Sample Coding Discussion page on Canvas.

Next, we will label the larger dataset. Check the email I sent to the class for a link to that spreadsheet. There are two tabs, each covering a range of last-name initials; use the tab whose range includes the first letter of your last name. You'll see I put your name in the first column. DO NOT RESORT THIS SPREADSHEET! Names are in alphabetical order, so just scroll to find your name in the right tab.

You each have 1500 tweets to label. You must use one of the four codes we have trained on: Neutral, Unclear/Unrelated, Pro-Roseanne, Anti-Roseanne. Each code must have that exact capitalization and spelling. I have already entered a few codes on each sheet, so the spreadsheet should auto-fill them as you type, which will minimize the chance of error.
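If you want to double-check your labels before the deadline, the exact-match rule is easy to test mechanically. The Python below is a minimal, optional sketch: it assumes you have copied your label column into a list, and `ALLOWED_CODES` and `find_invalid_labels` are hypothetical names. It only reads the labels, so it never touches the shared spreadsheet.

```python
# The four Phase 1 codes, with their exact spelling and capitalization.
ALLOWED_CODES = {"Pro-Roseanne", "Anti-Roseanne", "Neutral", "Unclear/Unrelated"}


def find_invalid_labels(labels):
    """Return (position, label) pairs for labels that are not exact matches.

    Positions count from 1 (the first label in the list). Blank entries are
    reported too, because every tweet needs a label.
    """
    problems = []
    for position, label in enumerate(labels, start=1):
        if label not in ALLOWED_CODES:
            problems.append((position, label))
    return problems


if __name__ == "__main__":
    # Hypothetical sample: entry 2 has the wrong capitalization, entry 4 is blank.
    sample = ["Neutral", "pro-roseanne", "Anti-Roseanne", "", "Unclear/Unrelated"]
    print(find_invalid_labels(sample))  # [(2, 'pro-roseanne'), (4, '')]
```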

Each tweet is being coded by two people, which will allow me to reconcile the disagreements later on. I will also use this to check you - if you have a LOT of disagreements, I'm going to take a very close look to make sure you were doing this correctly. If you just randomly or thoughtlessly enter labels, I will know.
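One common way to summarize how well two coders agree is simple percent agreement, plus Cohen's kappa, which discounts the agreement you would expect by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement. The Python below is an illustrative sketch of those two measures with hypothetical function names and made-up labels; it is not necessarily the check that will be applied to your coding.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Fraction of tweets on which both coders chose the same label."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of tweets")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: how often two coders with these label distributions
    # would match if each picked labels independently.
    p_expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(counts_a) | set(counts_b)
    )
    if p_expected == 1:
        # Both coders used one identical label for every tweet; kappa is
        # undefined here, so report perfect agreement by convention.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical labels from two coders for the same four tweets.
    a = ["Pro-Roseanne", "Anti-Roseanne", "Neutral", "Anti-Roseanne"]
    b = ["Pro-Roseanne", "Anti-Roseanne", "Unclear/Unrelated", "Anti-Roseanne"]
    print(percent_agreement(a, b))       # 0.75
    print(round(cohens_kappa(a, b), 2))  # 0.64
```

Kappa is lower than raw agreement here because two of the three matches are on the most frequent label, which two coders would often agree on by chance alone.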

That said, disagreements happen, and I do not want you to agonize over the codes. If you find yourself agonizing over how to categorize a tweet, you are doing it wrong! Give each tweet a careful read, make a fairly quick decision (there are only 4 categories, so this shouldn't be a big struggle), and label it. Do not try to be creative or clever; everyone should be able to understand what you did. If you are not sure, put it in the Unclear/Unrelated category. This should take around 2 hours to complete. I know people work at different speeds, but if you find you are on pace to take more than a few hours, you are probably spending too much time.

Deadline: July 18

I will send you a set of tweets to code by July 11th. They must be coded by July 18th. I will be checking everyone's coding and if you have a bunch of incorrect codes, I will know and you will lose points. Pay close attention to this.

Team Analysis

Once we are done with the dataset, we will create two teams to do two types of analysis.

Deadline: July 18

You must have your team selected by July 18. Pick your team here: https://goo.gl/forms/4aj0Rg6gUqKhMvup2

Team 1: Content Analysis

This team will develop a deeper analysis of the themes within the tweets. Specifically, we will look at the pro-Roseanne and anti-Roseanne tweets. Within each group, we will use a thematic analysis (a common qualitative analysis technique) to develop a set of high-level themes discussed by each group. Please read this article on thematic analysis: https://www.psych.auckland.ac.nz/en/about/our-research/research-groups/thematic-analysis/about-thematic-analysis.html. This is the process we will follow.

Examples of pro-Roseanne themes we might come up with include "Donald Trump", "Insulting liberals", and "Denial of Racism". Anti-Roseanne tweets may have themes like "Thanking ABC", "Criticizing Trump", "Insulting/Mocking Roseanne", etc. (I don't know if these are right - they are just rough guesses based on reading a few tweets.) As a group, we will iteratively develop this set of themes by reading the tweets and discussing them together. The working document for our thematic analysis is here: https://docs.google.com/document/d/1vwKVSI-mteP4MYEM_utTl8BhniCqlyJBkjA2K-nZ0lE/edit?usp=sharing

Deadline: July 25

Our set of themes should be complete by July 25. You are all expected to participate actively in developing the themes over the course of the week; you can't just jump in on the morning of the 25th. I will review the document over the week, but you must also submit an activity log that roughly details your participation in the discussion over the course of the week. It's fine if you only spend a few minutes checking in and commenting each day, but you as a group (i.e., without my intervention) must have a final list of themes by the 25th. If you slack off, I will know. Submit your activity log via Canvas under Final: Team Activity Log 1.

Once we have the themes finalized, I will assign each of you a batch of tweets. As in the whole-class Phase 1 activity, you will label each tweet, but this time with all the relevant themes instead of a pro/anti/neutral label. Each tweet will be labeled by 2 people. If there are disagreements, those 2 people will either sort them out together or leave them as disagreements for me to resolve.

Deadline: August 3

All tweets must be labeled with themes by August 3. I will spend that weekend resolving any differences. There is nothing to turn in here other than your labels, which will go in a shared spreadsheet we create together.

Finally, your group will write up the results of your thematic analysis. You will describe the themes and how often each was found in your datasets. You will share any additional insights about what this says about the supporters/opponents and the larger issues for them. You will produce one document which will become a major section of the resulting paper we submit for publication. You are responsible not just for writing up your results but for producing a smooth, well-written document. That means editing, integrating text, and making the writeup work is an important part of this task.
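Because each tweet can carry several themes, "how often each was found" is easiest to report as the number and share of tweets tagged with each theme; those shares will not sum to 100%. A minimal Python sketch, assuming the final theme labels for one group (pro or anti) are exported as one list of themes per tweet. `theme_frequencies` is a hypothetical name, and the sample labels are invented, reusing the guessed themes from earlier in this section.

```python
from collections import Counter


def theme_frequencies(tweet_themes):
    """Count how many tweets carry each theme.

    tweet_themes: one list of themes per tweet (a tweet may have several).
    Returns {theme: (tweet_count, share_of_tweets)}, most common first.
    Shares can add up to more than 1.0 because one tweet can have many themes.
    """
    total = len(tweet_themes)
    if total == 0:
        return {}
    # set() makes a theme entered twice on the same tweet count only once.
    counts = Counter(theme for themes in tweet_themes for theme in set(themes))
    return {theme: (n, n / total) for theme, n in counts.most_common()}


if __name__ == "__main__":
    # Made-up labels for four anti-Roseanne tweets, using the guessed themes.
    labels = [
        ["Thanking ABC"],
        ["Thanking ABC", "Criticizing Trump"],
        ["Insulting/Mocking Roseanne"],
        ["Criticizing Trump", "Insulting/Mocking Roseanne", "Thanking ABC"],
    ]
    for theme, (n, share) in theme_frequencies(labels).items():
        print(f"{theme}: {n} tweets ({share:.0%})")
```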

Deadline: August 17

Final writeup is due. Your group should email Jen a link to your shared Google Doc when you begin work on it. That is what will be graded for this section.

Team 2: Network Analysis

This team will analyze a network of participants in the Roseanne discussion. On July 18, I will provide you with a Gephi network file. As a group, you need to divide up the work of calculating statistics and performing an analysis on this network similar to what you did on the Enron assignment. You will try to find important nodes and clusters and explain why they are linked. This will involve computing statistics, creating visualizations, and connecting the stats to what you learn from reading these people's tweets on Twitter.
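Gephi's Statistics panel computes degree measures for you, but it can help to see what the simplest one actually counts before interpreting it. The Python sketch below assumes the network's edges are directed (source, target) pairs such as "account A mentioned account B"; that is an assumption, since the network file's edge definition isn't described here. The function names and all accounts other than @therealroseanne are hypothetical.

```python
from collections import defaultdict


def degree_stats(edges):
    """Return {node: (in_degree, out_degree)} for a directed edge list.

    edges: iterable of (source, target) pairs. Repeated pairs count once,
    so each degree is the number of distinct neighbors in that direction.
    """
    out_neighbors = defaultdict(set)
    in_neighbors = defaultdict(set)
    for source, target in edges:
        out_neighbors[source].add(target)
        in_neighbors[target].add(source)
    nodes = set(out_neighbors) | set(in_neighbors)
    return {node: (len(in_neighbors[node]), len(out_neighbors[node])) for node in nodes}


def top_by_in_degree(stats, k=10):
    """The k nodes with the highest in-degree (ties in arbitrary order)."""
    return sorted(stats, key=lambda node: stats[node][0], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical edges: who mentioned whom.
    edges = [
        ("user_a", "therealroseanne"),
        ("user_b", "therealroseanne"),
        ("user_a", "user_b"),
        ("user_a", "therealroseanne"),  # duplicate mention, counted once
    ]
    stats = degree_stats(edges)
    print(top_by_in_degree(stats, k=1))  # ['therealroseanne']
```

Under this assumption, a high in-degree means many distinct accounts directed attention at a node, which is one starting point for spotting important nodes before reading their tweets.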

The first step will be to run a bunch of statistics and create some good visualizations. As a group, you need to lay out what statistics you want and who will do what; you need to organize this amongst yourselves. Notes, statistics, and organization should happen in this Google Doc: https://docs.google.com/document/d/1N4y6ackCArsl8Sn_i1T2Ew4_eMFUjb2qgm3V9sqvHNg/edit?usp=sharing

Deadline: July 25

Your initial statistics and visualizations should be complete by July 25th, along with a list of analysis activities to carry out going forward that will let you connect those statistics to the content. You are all expected to participate actively in the group organization and computation over the course of the week; you can't just jump in on the morning of the 25th. I will review the document over the week, but you must also submit an activity log that roughly details your participation in the discussion over the course of the week. It's fine if you only spend a few minutes checking in and commenting each day, but you as a group (i.e., without my intervention) must have a final list of statistics, shared Gephi files to work from, and analyses to complete by the 25th. If you slack off, I will know. Submit your activity log via Canvas under Final: Team Activity Log 1.

Next, you will carry out all that analysis. Each person should claim certain tasks and connect those statistics to the content. Keep detailed notes on your stats, the analysis you do, and the insights you find.

Deadline: August 3

All analysis linking the stats to Twitter content must be complete by August 3. Everyone's analysis notes will go into a shared document by this date.

Finally, your group will write up the results of your analysis. You will describe the statistics and insights. You will produce one document which will become a major section of the resulting paper we submit for publication. You are responsible not just for writing up your results but for producing a smooth, well-written document. That means editing, integrating text, and making the writeup work is an important part of this task.

Deadline: August 17

Final writeup is due. Your group should email Jen a link to your shared Google Doc when you begin work on it. That is what will be graded for this section.

Deadlines

All deadlines are 11:59pm on the date listed:
  • July 11 - Sample Coding Complete
  • July 18 - Coding Complete
  • July 18 - Team Selected
  • July 25 - Team Activity Log 1
  • August 3 - Team Analysis Complete
  • August 17 - Final Document

Grading

  • 25% - Phase 1 coding. You cannot receive credit for this if you don't do the sample coding. Accuracy matters.
  • 25% - Team activity organization up through and including Activity Log 1
  • 25% - Analysis due August 3
  • 25% - Final document