Comments from YouTube are a rich source of data. The reactions that a video elicits can be analyzed using the comment section of a YouTube video. For instance, viral or popular videos are likely to have enough comments to be analyzed using data mining techniques and sentiment analysis tools.
YouTube Reactions Gauged by Comments is a very exciting topic that was part of my final project for the Data Mining in Cyberspace course at Northeastern University. The findings reveal that comments are highly correlated with a video's intended outcome, but the correlation needs extra interpretation. For example, a negative comment can be a positive outcome if the video was published to elicit negative sentiments. Watch the following video from Mathew Santoro's channel and think about that.
This video shows that comments containing negative words (phobia, scared, holes, etc.) can be considered a positive outcome. If Santoro set out to elicit negative sentiments, he did a good job; in fact, sentiment analysis reveals that people like this video despite the negative sentiment of the comments.
The following image depicts the polarity distribution of the video's comments. The black line represents the average score, and the thick red line marks neutral comments, which have a score of 0. The bulk of the comments are negative, yet the video's outcome is positive.
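To make the polarity scores concrete, here is a minimal sketch of lexicon-based scoring in Python. It is not the original analysis, which was written in R. The tiny `LEXICON`, the function names `polarity` and `summarize`, and the sample comments are all assumptions made up for illustration. A comment that scores 0 is treated as neutral, matching the red line described above.

```python
import re
from statistics import mean

# Hypothetical toy lexicon (an assumption for illustration only);
# a real analysis would use a full sentiment lexicon.
LEXICON = {
    "love": 1, "great": 1, "awesome": 1, "like": 1,
    "phobia": -1, "scared": -1, "holes": -1, "gross": -1,
}

def polarity(comment):
    """Sum the lexicon scores of the words in one comment (0 means neutral)."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(LEXICON.get(word, 0) for word in words)

def summarize(comments):
    """Score every comment and count negative, neutral, and positive ones."""
    scores = [polarity(c) for c in comments]
    return {
        "scores": scores,
        "average": mean(scores),  # raises StatisticsError on an empty list
        "negative": sum(s < 0 for s in scores),
        "neutral": sum(s == 0 for s in scores),
        "positive": sum(s > 0 for s in scores),
    }

# Made-up sample comments, not taken from the original data sets.
comments = [
    "Those holes gave me a phobia, I am so scared",
    "Great video, love it",
    "First",
    "So gross and I am scared",
]
summary = summarize(comments)
print(summary)
```

Summing word scores is the simplest possible scheme; it ignores negation and sarcasm, so the averages it produces should be read as a rough signal only.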
More details about this analysis can be found in my publications, and the source code (in R) can be downloaded here:
- Retrieve comments from YouTube [Download]
- Data munging and sentiment analysis [Download]
- Data sets [Download]
UPDATE: The code to retrieve comments from YouTube no longer works because the YouTube API V2 services were shut down. More information about the API V2 services and migration procedures can be found on the YouTube Engineering and Developers Blog. I have included the original data sets of the comments in the download section.
YouTube is now a massively popular medium for users to find content tailored to their needs and activities. They can view movie trailers, watch music videos, study educational material in all kinds of fields, follow the latest news stories, and listen to blogs, rants, and talks posted by other users. YouTube also captures viewers' reactions by allowing comments on all of this material. Users' opinions, feelings, and reviews are recorded, and threads of discussion can result. Ultimately, the users, organizations, and entities that post videos can study the comment sections, reflect on the feedback, and respond with changes, improvements, and even counter-arguments. That is the benefit of the post-and-comment format of today's social networking sites, and YouTube's comment capabilities serve it well.
Taking advantage of YouTube responses can bring substantial improvements in business, and this data mining study shows how. The reaction-gauging process developed and tested in the research captures what precedes user discussion of a video: viewers recording their mental and emotional reactions as a written comment, to be read and either understood or contested by the rest of the viewer population. When businesses advertise their products, they need to learn from their target market's reactions. Was the product liked or disliked, and why? Are there competitors, and what did they do better? Feedback on video advertisements can capture these things, and the data mining methodology surfaces the opinions of intended customers and haters alike.
On the topic of haters and negativity, a worthwhile highlight of the research is its breakdown of the anticipated reaction to a YouTube video. Sometimes the purpose of a video is to provoke negative reactions. In this case, haters' comments are expected, and they provide the best feedback. Keep your customers close and your haters closer, so to speak. The study provides a decision tree of sorts, describing how a negative or positive reaction may or may not be helpful depending on the reaction the video was expected to draw. This shows that any and all feedback is welcome. Because comments multiply rapidly through threads and replies, including comments on one video that reference others, there are plenty of comments to analyze with the data mining methodology.
This leads, however, to a drawback of the massive volume of comments: when is it simply too much to cover? For businesses, a quick gauging of reactions from the top 3000 or so comments on their product videos, as mentioned in the study, is more than likely sufficient. But you never know what rich information might be gained as thousands upon thousands of comments pour in. Still, this drawback is not serious enough to discard the methodology. In the information age, with the advent of social networking and content sharing, everything posted in cyberspace is eventually lost in the sea of information. This goes for nearly all websites with blogs, where too much information accumulates almost instantly. So when it comes to analyzing that information, applying data mining to capture feedback from a reasonable sample size is probably worthwhile. The published research deserves credit for showing video posters how to do just that.
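The idea that a reaction is helpful or not depending on what the video was meant to provoke can be sketched as a small decision function. This is only an illustration under stated assumptions, not the decision tree from the study: the names `sentiment_label` and `video_outcome`, the outcome labels, and the choice to treat a neutral average as inconclusive are all hypothetical. The cap of 3000 reflects the "top 3000 or so comments" mentioned above.

```python
# All names and labels below are hypothetical, chosen for illustration only.
MAX_COMMENTS = 3000  # the "top 3000 or so comments" mentioned in the text

def sentiment_label(average_score):
    """Map an average polarity score to a coarse sentiment label."""
    if average_score > 0:
        return "positive"
    if average_score < 0:
        return "negative"
    return "neutral"

def video_outcome(intended, scores, limit=MAX_COMMENTS):
    """Compare the observed sentiment of the first `limit` scores with the intended one.

    `intended` is "positive" or "negative": the reaction the poster hoped for.
    Treating an empty sample or a neutral average as inconclusive is an assumption.
    """
    sample = scores[:limit]
    if not sample:
        return "inconclusive"
    observed = sentiment_label(sum(sample) / len(sample))
    if observed == "neutral":
        return "inconclusive"
    return "positive outcome" if observed == intended else "negative outcome"

# Mostly negative comments on a video meant to provoke negative reactions.
print(video_outcome("negative", [-3, 2, 0, -2]))
# The same comments on a video meant to please its viewers.
print(video_outcome("positive", [-3, 2, 0, -2]))
```

Keeping the intended reaction as an explicit input is the point of the sketch: the same set of scores can signal success for one video and failure for another.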