We anticipate that you have questions. Here are some that may come to mind; if you have others, please let us know!

  • Why did you not use real Facebook/Twitter data?
    • As we constructed this assignment, we had to consider what was most appropriate for our audience. At a university, and especially at a large one like UIUC, students come from all walks of life. A significant portion of the content on Facebook, and particularly the types of posts we’d need to include in our dataset, would likely be extremely graphic, offensive, or both. We concluded that, to provide all students with an optimal and safe learning experience, our dataset and assignment should be neutral while still being as illustrative as possible. We hope we have achieved this balance, but if you feel there is a different way we could set up the assignment, please let us know!
  • How did you gather the project data?
    • The data is pulled from the r/politics subreddit and ESPN news headlines. To create the dataset, we merged a subset of posts from each source, assigning on-topic labels to the posts from ESPN and off-topic labels to the posts from r/politics. As mentioned earlier, a number of interesting posts that are labeled as off-topic may actually be on-topic for SportIt (and vice versa for posts labeled as on-topic). It is our hope that students notice these patterns and think critically about the effects of removing, or not removing, posts from a platform.
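
The merge-and-label step described above can be sketched roughly as follows. This is a minimal illustration, not the script we actually used: the function name `build_dataset`, the label strings, the equal-sized random sampling, and the placeholder posts are all assumptions made for the example.

```python
import random

# Assumed label strings; the real dataset may encode labels differently.
ON_TOPIC = "on-topic"
OFF_TOPIC = "off-topic"


def build_dataset(espn_headlines, politics_posts, n_per_source, seed=0):
    """Merge a subset of each source into one labeled, shuffled list.

    Hypothetical helper: labels are assigned purely by source, so a
    sports-related r/politics post would still be marked off-topic.
    """
    rng = random.Random(seed)

    # Take a random subset from each source (sampling scheme is assumed).
    espn_subset = rng.sample(espn_headlines, min(n_per_source, len(espn_headlines)))
    politics_subset = rng.sample(politics_posts, min(n_per_source, len(politics_posts)))

    # Label by source: ESPN -> on-topic, r/politics -> off-topic.
    rows = [(text, ON_TOPIC) for text in espn_subset]
    rows += [(text, OFF_TOPIC) for text in politics_subset]

    # Shuffle so the two sources are interleaved in the merged dataset.
    rng.shuffle(rows)
    return rows


if __name__ == "__main__":
    # Placeholder text only; these are not real posts from either source.
    espn = ["placeholder sports headline A", "placeholder sports headline B"]
    politics = ["placeholder politics post A", "placeholder politics post B"]
    for text, label in build_dataset(espn, politics, n_per_source=2):
        print(label, "|", text)
```

Because the label depends only on where a post came from, this kind of construction is exactly what produces the mislabeled edge cases mentioned above.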