YouTube recommendation algorithm


YouTube’s recommendation algorithm is the method by which new videos are recommended to users on the YouTube video-sharing platform. The algorithm is designed to present users with engaging new videos, and it generates recommendations based on both the user’s history of watched videos and the recorded tendencies of other users. The algorithm employs deep learning to narrow millions of possible videos down to a smaller subset, which is then displayed to the user. YouTube also employs an algorithm to autocomplete search queries, based on search history and on which searches are currently most popular. These algorithms have undergone changes since YouTube’s creation in 2005, but their fundamental principle remains the same: to maximize user engagement. Many ethical concerns arise from the use of these algorithms, and YouTube has taken several steps in recent years to resolve some of these issues.

[Figure: Basic model for YouTube's recommendation algorithm]

Algorithm details

The recommendation algorithm consists of two neural networks: one for video candidate generation and one for ranking[1].

Candidate Generation

During candidate generation, the algorithm reads data from the user’s activity history (the list of previously watched videos) and generates a subset of a few hundred possible videos from the larger corpus of millions of videos in the database. This is achieved using a deep neural network trained on user history and implicit video feedback. Implicit feedback for a video refers to signals such as its total number of watches and whether users watched it to completion. Explicit feedback, on the other hand, includes direct signals such as thumbs up/down ratings and survey responses. Selecting candidates from these signals is done using a softmax classifier together with a nearest-neighbor search algorithm.[2] Specific inputs to the algorithm include users’ search history, watch history, geographic location, device used, gender, logged-in state, and age. Furthermore, according to Google, “training examples are generated from all YouTube watches (even those embedded on other sites) rather than just watches on the recommendations we produce. Otherwise, it would be very difficult for new content to surface and the recommender would be overly biased towards exploitation." All of these factors serve as input to the algorithm at different stages, depending on the context and goal, such as generating home page refresh results versus specific search results. This phase of the algorithm selects a few hundred videos to move on to the ranking phase.
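The core idea of this stage can be illustrated with a minimal sketch. The code below is not YouTube's production system; the embedding size, corpus size, context features, and projection weights are illustrative assumptions. It shows the basic pattern described in the paper: a user embedding built from watch history, search history, and context features is compared against video embeddings, and the nearest videos become the candidate set.

```python
# Minimal sketch of candidate generation (illustrative only, not YouTube's code).
# A user embedding is compared against video embeddings; the top-k nearest videos
# (the highest softmax logits) become the candidate set passed on to ranking.
import numpy as np

EMBED_DIM = 64          # illustrative embedding size
CORPUS_SIZE = 100_000   # stand-in for a corpus of millions of videos

rng = np.random.default_rng(0)
video_embeddings = rng.normal(size=(CORPUS_SIZE, EMBED_DIM))  # learned offline in a real system

def user_embedding(watch_embeddings, search_embeddings, context_features):
    """Average watch/search history embeddings, concatenate context features
    (geography, device, gender, age, ...), and map into the video embedding space.
    A real system uses a trained feed-forward network here; this is a stub."""
    history = np.concatenate([watch_embeddings.mean(axis=0),
                              search_embeddings.mean(axis=0),
                              context_features])
    projection = rng.normal(size=(history.size, EMBED_DIM))  # stands in for trained weights
    return history @ projection

def generate_candidates(user_vec, k=300):
    """Nearest-neighbor search: score every video by dot product with the user
    embedding and keep the top-k as candidates."""
    scores = video_embeddings @ user_vec
    top_k = np.argpartition(scores, -k)[-k:]
    return top_k[np.argsort(scores[top_k])[::-1]]

# Example: a user with 40 watched videos, 10 searches, and a few context features
watches = rng.normal(size=(40, EMBED_DIM))
searches = rng.normal(size=(10, EMBED_DIM))
context = rng.normal(size=8)
candidates = generate_candidates(user_embedding(watches, searches, context))
print(candidates[:10])  # a few hundred candidate video indices move on to ranking
```

In the system described in the paper, the projection is a trained multi-layer network and the nearest-neighbor lookup is approximate, so the search can run at serving time over millions of videos.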

[Figure: YouTube related-music videos network; colours indicate different clusters.[3]]

Ranking

During video ranking, scores are assigned to each candidate video using additional ranking factors. These scores, which are derived from the user’s expected watch time, are assigned using logistic regression. Features considered include the video thumbnail, previous user interaction with the channel that posted the video, and whether the video has previously been recommended to the user. Watch time is used instead of click-through rate in order to avoid “deceptive videos that the user does not complete (‘clickbait’)." The list of videos is then sorted by score and output to the user’s page.
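As a rough illustration of this stage, the sketch below scores a handful of hypothetical candidates with a logistic-regression-style model and sorts them by score. The feature names, weights, and bias are invented for the example; in the published design the training examples are additionally weighted by watch time so that the learned score tracks expected watch time rather than click probability.

```python
# Illustrative ranking sketch (not YouTube's actual ranker): score each candidate
# with a logistic-regression-style model and present them in descending score order.
import math

# Hypothetical per-candidate features mirroring those mentioned in the text
candidates = [
    {"id": "vid_a", "channel_watches": 12, "previously_recommended": 1, "thumbnail_ctr": 0.08},
    {"id": "vid_b", "channel_watches": 0,  "previously_recommended": 0, "thumbnail_ctr": 0.21},
    {"id": "vid_c", "channel_watches": 3,  "previously_recommended": 1, "thumbnail_ctr": 0.05},
]

# Illustrative weights; a real system learns these with (watch-time-weighted) logistic regression
WEIGHTS = {"channel_watches": 0.15, "previously_recommended": -0.6, "thumbnail_ctr": 2.0}
BIAS = -1.0

def score(candidate):
    """Sigmoid of a weighted feature sum, standing in for the learned ranking score."""
    z = BIAS + sum(WEIGHTS[name] * candidate[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

ranked = sorted(candidates, key=score, reverse=True)
for c in ranked:
    print(c["id"], round(score(c), 3))
```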



Timeline of Changes

YouTube has made many changes to their algorithms over time, including their video recommendation algorithm.[4]

2012

Before 2012, YouTube’s recommendation algorithm used view count as its sole metric for recommending new videos. The problem with this method was that it rewarded “clickbait” videos that users would click on but not spend substantial time watching. In 2012, YouTube altered the algorithm to favor video watch time instead of total views. One side effect of this change was that content creators began adding longer “intros” to their videos, so that viewers had to spend more time watching in order to get to the point of the video.

2016

In 2016, YouTube again re-examined its algorithm, releasing the research paper “Deep Neural Networks for YouTube Recommendations.” This paper, discussed above in “Algorithm details,” outlined a new deep learning approach that took multiple additional factors of video engagement into consideration.

2017

YouTube made efforts to reduce inflammatory content present in news-related videos.[5]

2018

YouTube changed their monetization policy, raising the bar for which videos and channels were eligible to earn ad revenue. Previously, content creators needed 10,000 total lifetime views on their channel in order to be eligible for ads. In 2018, this requirement was changed to 4,000 hours of watch time over the previous 12 months and 1,000 subscribers.[6]

2019

YouTube announced a ban on radical and borderline content, saying “we’ll begin reducing recommendations of borderline content and content that could misinform users in harmful ways—such as videos promoting a phony miracle cure for a serious illness, claiming the earth is flat, or making blatantly false claims about historic events like 9/11.”[7]

Ethical Concerns

Radicalization

[Figure: YouTube users tend to find radical content more engaging]

The YouTube recommendation algorithm has faced scrutiny for recommending radical videos based on innocuous user history. For example, watching Donald Trump rallies led one user to be recommended white supremacy and Holocaust denial speeches. This same user, after watching videos about vegetarianism, was recommended videos about veganism.[8] These recommendations led Zeynep Tufekci, a writer for The New York Times, to assert that YouTube’s algorithm exploits a “natural human desire: to look ‘behind the curtain,’ to dig deeper into something that engages us." She cites Google’s bottom line as the cause: “YouTube leads viewers down a rabbit hole of extremism, while Google racks up the ad sales." Tufekci is not alone in condemning the Google and YouTube business model. Ben Popken, of NBC News, criticized YouTube for recommending extremist videos in order to capture as much of users’ time as possible and make more money. According to Guillaume Chaslot, an artificial intelligence engineer who formerly worked on YouTube’s recommendation algorithm, the consequence of these recommendations, combined with YouTube’s large audience, is “gaslighting people to make them believe that everybody lies to them just for the sake of watch time."[9]

Mar Masson Maack commented on this effect, citing Chaslot: “divisive and sensational content is often recommended widely: conspiracy theories, fake news, flat-Earther videos, for example. Basically, the closer it stays to the edge of what’s allowed under YouTube’s policy, the more engagement it gets.”[10]

Algorithm inputs and privacy

In order to make the recommendation algorithm as effective as possible, YouTube gathers a wide range of user data. According to Google’s Privacy & Terms, YouTube collects[11]:

  • Your name and password
  • Unique Identifiers such as your browser, device, application you are using, device settings, operating system, mobile network information such as carrier and phone number, and Google/YouTube application version number
  • Payment information
  • Email Address
  • Content you create, upload, or receive from others, such as emails, photos, videos, documents, spreadsheets, and YouTube comments.
  • Terms you search for
  • Videos you watch
  • Your interaction with content and ads
  • Your voice and audio information (if you use audio features such as dictation)
  • Purchase activity
  • People with whom you communicate
  • Activity on third-party sites that use Google services
  • Chrome browser history
  • Your geographical location
  • Information about things near your device (Wi-Fi, cell towers, bluetooth devices)
  • Information about you available through public sources
  • Information about you gathered by their marketing partners and advertisers

This information is then stored and used for a variety of purposes, including customizing search results, personalizing ads, and improving their software. Much of this information is highly personal, and its collection and use could be compromising for some users.

Enabling Pedophilia

YouTube faced scrutiny in 2019 regarding the algorithm's pattern of recommending videos of children. According to The New York Times, the algorithm groups erotic videos together with videos of children, which enables pedophilia: "On its own, each video might be perfectly innocent, a home movie, say, made by a child. Any revealing frames are fleeting and appear accidental. But, grouped together, their shared features become unmistakable."[12] Forbes commented on this issue, saying the algorithm is successfully working to maximize engagement from every possible audience, but that this comes at the cost of endangering children and sacrificing moral values. They argue YouTube's primary goal should instead be "ensuring that algorithmic models don’t violate moral codes in their ruthless pursuit of business objectives."[13]

Children's privacy

In September 2019, Google was fined $170 million for violating the Children’s Online Privacy Protection Act.[14] According to Makena Kelly, Google refused to acknowledge that portions of YouTube were specifically geared towards children, and as such lacked appropriate privacy protections for children's content. After the COPPA case, YouTube created more algorithms to identify content geared toward children, stating that they “will limit data collection and use on videos made for kids only to what is needed to support the operation of the service”[15].

Restriction of LGBTQ content

YouTube is currently being sued by a group of content creators on the grounds of “unlawful content regulation, distribution, and monetization practices that stigmatize, restrict, block, demonetize, and financially harm the LGBT Plaintiffs and the greater LGBT Community”[16][17]. According to these creators, the YouTube algorithm targets channels and videos with keywords such as “gay,” “bisexual,” or “transgender." According to Tom Foremski, the issue is not that YouTube is purposefully discriminatory; it is that, in the interest of saving money on labor, YouTube uses algorithms in place of human content moderation: “Google's algorithms are not that smart -- especially when it comes to cultural and political issues where they can't discriminate between legitimate and harmful content. The software has no understanding of what it is viewing.”[18]

"Gaming" the algorithm

[Figure: The disastrous result of a weakness in YouTube's autocomplete algorithm[19]]

YouTube's search autocomplete algorithm recommends search queries based on what a user has typed and on what other people have searched for.[20] This algorithm has the potential to be taken advantage of by people who search for a given query many times in order to make it appear more popular to the algorithm. This "gamed" search result would then be recommended to other users via the autocomplete algorithm. This loophole allowed the search result "how to have s*x with your kids" to be recommended when users had typed only "how to have."[21] In response to this issue, YouTube released a statement detailing changes to the algorithm and a guide on how to report offensive autocomplete results for removal.[22]
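To see why repeated searches can "game" a suggestion system, consider a toy autocomplete that simply surfaces the most frequently logged queries beginning with a prefix. This is an illustrative sketch, not YouTube's actual implementation, which applies additional signals and policy filters beyond raw popularity.

```python
# Toy model of a frequency-based autocomplete, showing how coordinated repeat
# searches can push an unwanted query to the top of the suggestions.
from collections import Counter

search_log = Counter()

def record_search(query):
    """Log one search; real systems aggregate across many users and sessions."""
    search_log[query.lower()] += 1

def autocomplete(prefix, n=3):
    """Return the n most frequently logged queries that start with the prefix."""
    matches = {q: c for q, c in search_log.items() if q.startswith(prefix.lower())}
    return [q for q, _ in Counter(matches).most_common(n)]

# Normal traffic
for _ in range(500):
    record_search("how to have a good day")
for _ in range(300):
    record_search("how to have fun at home")

# Coordinated repeat searches for an unwanted query (placeholder text)
for _ in range(1000):
    record_search("how to have <gamed query>")

print(autocomplete("how to have"))  # the gamed query now appears first
```

Because this toy ranking depends only on raw counts, a coordinated group or a bot repeating a query can dominate the suggestions, which mirrors the loophole described above.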


See Also

References

  1. Paul Covington, Jay Adams, Emre Sargin, "Deep Neural Networks for YouTube Recommendations." Google, 2016, https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45530.pdf
  2. Adrian Rosebrock. "Softmax Classifiers Explained." pyImageSearch, September 12 2016, https://www.pyimagesearch.com/2016/09/12/softmax-classifiers-explained/
  3. Massimo Airoldia, Davide Beraldob, Alessandro Gandinic, "Follow the algorithm: An exploratory investigation of music on YouTube." May 2016, https://www.researchgate.net/profile/Massimo_Airoldi/publication/303096460_Follow_the_algorithm_An_exploratory_investigation_of_music_on_YouTube/links/5ca7483792851c64bd513531/Follow-the-algorithm-An-exploratory-investigation-of-music-on-YouTube.pdf
  4. Paige Cooper. “How Does the YouTube Algorithm Work? A Guide to Getting More Views.” Hootsuite, April 8 2019, https://blog.hootsuite.com/how-the-youtube-algorithm-works/
  5. Emily Birnbaum. "YouTube removed 58 million videos in latest quarter." The Hill, December 13 2018, https://thehill.com/policy/technology/421106-youtube-removed-78-million-videos-in-latest-quarter
  6. Reshma Mandal. “New Changes to YouTube Monetization in 2018 to Better Protect Creators.” Digital Ready, March 10 2018, https://digitalready.co/blog/new-changes-to-youtube-monetization-in-2018-to-better-protect-creators
  7. “Continuing our work to improve recommendations on YouTube.” Official YouTube Blog, January 25 2019, https://youtube.googleblog.com/2019/01/continuing-our-work-to-improve.html
  8. Zeynep Tufekci. "YouTube, the Great Radicalizer." The New York Times, March 10 2018, https://coinse.io/assets/files/teaching/2019/cs489/Tufekci.pdf
  9. Ben Popken. "As algorithms take over, YouTube's recommendations highlight a human problem." NBC News, April 19 2018, https://www.nbcnews.com/tech/social-media/algorithms-take-over-youtube-s-recommendations-highlight-human-problem-n867596
  10. Mar Masson Maack. "‘YouTube recommendations are toxic,’ says dev who worked on the algorithm." TNW News, 2019, https://thenextweb.com/google/2019/06/14/youtube-recommendations-toxic-algorithm-google-ai/
  11. Google Privacy Policy, https://policies.google.com/privacy?hl=en
  12. Max Fisher and Amanda Taub. The New York Times, June 3 2019, https://www.nytimes.com/2019/06/03/world/americas/youtube-pedophiles.html
  13. Forbes, June 14 2019, https://www.forbes.com/sites/forrester/2019/06/14/youtubes-algorithmic-pedo-failure/#14ebae097e91
  14. Makena Kelly. "Google will pay $170 million for YouTube’s child privacy violations." The Verge, September 4 2019, https://www.theverge.com/2019/9/4/20848949/google-ftc-youtube-child-privacy-violations-fine-170-milliion-coppa-ads
  15. Makena Kelly. "Google will pay $170 million for YouTube’s child privacy violations." The Verge, September 4 2019, https://www.theverge.com/2019/9/4/20848949/google-ftc-youtube-child-privacy-violations-fine-170-milliion-coppa-ads
  16. Julia Alexander. "LGBTQ YouTubers are suing YouTube over alleged discrimination." The Verge, August 14 2019, https://www.theverge.com/2019/8/14/20805283/lgbtq-youtuber-lawsuit-discrimination-alleged-video-recommendations-demonetization
  17. Tom Foremski. "LGBTQ: The missing letters in Google’s YouTube alphabet and the moral struggle of algorithms." ZDNet, August 30 2019, https://www.zdnet.com/article/lgbtq-the-missing-letters-in-googles-youtube-alphabet-and-the-moral-struggle-of-algorithms/
  18. Tom Foremski. "LGBTQ: The missing letters in Google’s YouTube alphabet and the moral struggle of algorithms." ZDNet, August 30, 2019, https://www.zdnet.com/article/lgbtq-the-missing-letters-in-googles-youtube-alphabet-and-the-moral-struggle-of-algorithms/
  19. Geoff Weiss. "YouTube Investigating Pedophiliac Phrases In Autocomplete Search Suggestions." Tubefilter, November 27 2017, https://www.tubefilter.com/2017/11/27/youtube-pedophiliac-phrases-autocomplete/
  20. Google Search Help, https://support.google.com/websearch/answer/106230?hl=en
  21. Charlie Warzel. "YouTube's Search Autofill Surfaced Disturbing Child Sex Results." BuzzFeed News, November 26 2017, https://www.buzzfeednews.com/article/charliewarzel/youtubes-search-autofill-is-surfacing-disturbing-child-sex#.emrqYkvkV
  22. Tamar Yehoshua. "Google Search Autocomplete." Google, June 10 2016, https://blog.google/products/search/google-search-autocomplete/