Content moderation is the process of monitoring, filtering and removing online user-generated content according to the rules of a private organization or the regulations of a government. It is used to restrict illegal or obscene content, spam, and content considered offensive or incongruous with the values of the moderator. When applied to dominant platforms with significant influence, content moderation may be conflated with censorship. Ethical issues involving content moderation include the psychological effects on content moderators, human and algorithmic bias in moderation, the trade-off between free speech and free association, and the impact of content moderation on minority groups.


Most types of moderation involve a top-down approach, where a moderator or small group of moderators is given discretionary power by a platform to approve or disapprove user-generated content. These moderators may be paid contractors or unpaid volunteers. A moderation hierarchy may exist, or each moderator may have independent and absolute authority to make decisions.

In general, content moderation can be broken down into six major categories.[1]

  • Pre-Moderation screens each submission before it is visible to the public. This creates a bottleneck in user-engagement, and the delay may cause frustration in the user-base. However, it ensures maximum protection against undesired content, eliminating the risk of exposure to unsuspecting users. It is only practical for small user communities, and was common in moderated newsgroups on Usenet.[2]
  • Post-Moderation screens each submission after it is visible to the public. While preventing the bottleneck problem, it is still impractical for large user communities. Furthermore, as the content is often reviewed in a queue, undesired content may remain visible for an extended period of time, drowned out by benign content ahead of it, which must still be reviewed.
  • Reactive moderation reviews only that content which has been flagged by users. It retains the benefits of both pre- and post-moderation, allowing for real-time user-engagement and the immediate review of only potentially undesired content. However, it is reliant on user participation and is still susceptible to benign content being falsely flagged. Most modern social media platforms, including Facebook and YouTube, rely on this method.
  • Distributed moderation is an exception to the top-down approach. It instead gives the power of moderation to the users, often through a voting system. This is common on Reddit and Slashdot, the latter also using a meta-moderation system in which users rate the moderation decisions of other users.[3] This method scales well across user communities of all sizes, but relies on users sharing the platform's perception of undesired content. It is also susceptible to groupthink and to malicious coordination, known as brigading.[4]
  • Automated moderation is the use of software to automatically assess content for desirability. It can be used in conjunction with any of the above moderation types. Its accuracy depends on the quality of its implementation, and it is susceptible to algorithmic bias and adversarial examples.[5] Copyright detection software on YouTube and spam filtering are examples of automated moderation.[6]
  • No moderation is the lack of moderation entirely. Such platforms, including The Pirate Bay and Dark Web markets, often host illegal and obscene content and typically operate outside the law. Spam is a perennial problem for unmoderated platforms, but may be mitigated by other methods, such as limited posting frequency and monetary barriers to entry. However, small communities with shared values and few bad actors can also thrive under no moderation, such as unmoderated Usenet newsgroups.
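
The difference between the first three categories is mainly one of timing: when a submission becomes visible and when (or whether) it enters a review queue. The following Python sketch illustrates that difference under simplifying assumptions. It is not any platform's actual implementation: the names Board, Post, Mode, submit, flag, and review_next are hypothetical, and the is_acceptable callback stands in for a human moderator's judgement.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    PRE = "pre"            # review before publication
    POST = "post"          # publish first, review everything afterwards
    REACTIVE = "reactive"  # publish first, review only flagged items


@dataclass(eq=False)  # compare posts by identity, not by their text
class Post:
    text: str
    visible: bool = False


@dataclass
class Board:
    """Hypothetical message board using one top-down moderation mode."""
    mode: Mode
    posts: list = field(default_factory=list)
    review_queue: deque = field(default_factory=deque)

    def submit(self, text: str) -> Post:
        post = Post(text)
        self.posts.append(post)
        if self.mode is Mode.PRE:
            # Hidden until a moderator approves it (the bottleneck).
            self.review_queue.append(post)
        elif self.mode is Mode.POST:
            # Visible at once, but every post is still queued for review.
            post.visible = True
            self.review_queue.append(post)
        else:
            # Reactive: visible at once, reviewed only if a user flags it.
            post.visible = True
        return post

    def flag(self, post: Post) -> None:
        # A user report; in this sketch only reactive mode queues on flags.
        if self.mode is Mode.REACTIVE and post not in self.review_queue:
            self.review_queue.append(post)

    def review_next(self, is_acceptable: Callable[[str], bool]) -> Optional[Post]:
        # The moderator takes the oldest queued post and decides its fate.
        if not self.review_queue:
            return None
        post = self.review_queue.popleft()
        post.visible = is_acceptable(post.text)
        return post
```

In this sketch, a post submitted to a Board in PRE mode stays hidden until review_next approves it, a post in POST mode is visible while it waits in the queue, and a post in REACTIVE mode is never reviewed unless flag is called on it. Distributed and automated moderation could be modelled by replacing the single is_acceptable decision with a vote tally or a classifier, respectively.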


Pre-1993: Usenet and the Open Internet

Usenet emerged in the early 1980s as a network of university and private computers, and quickly became the world's first Internet community. The decentralized platform hosted a collection of message boards known as newsgroups. These newsgroups were small communities by modern standards, and consisted of like-minded, technologically-inclined users sharing the hacker ethic. This collection of principles, including "access to computers should be unlimited", "mistrust authority: promote decentralization", and "information wants to be free", created a culture that was resistant to moderation and free of top-down censorship.[7] The default assumptions were that users acted in good faith and that new users could be gradually assimilated into the shared culture. As a result, only a minority of newsgroups were moderated, most allowing anyone to post however they pleased, as long as they followed the community's social norms, known as "netiquette."[8] Furthermore, the Internet in general was considered separate and distinct from the physical space in which its servers were located, existing in its own "cyberspace" not subject to the will of the state. Throughout this era of the Open Internet, online activity mostly escaped the notice of government regulation, creating a policy gap that only began to close in the late 1990s.[9]

1994 - 2005: Eternal September and Growth

In September 1993, AOL began offering Internet access to the general public. The resulting flood of users arrived too quickly and in too great a number to be assimilated into the existing culture, and the shared values that had allowed unmoderated newsgroups to flourish were lost. This became known as the Eternal September, and the resulting growth transformed the Internet from a high-trust to a low-trust community.[10] The consequences of this transformation were first seen in 1994, when the first recorded instance of spam was sent out across Usenet.[11] The spam outraged Usenet users, and the first anti-spam bot was created in response, ushering in the era of content moderation.[12]

With the invention of the World Wide Web, users began to drift away from Usenet, while thousands of forums and blogs emerged as replacements. These small communities were often overseen by single individuals or small teams, who exercised total moderating control over their domains. In response to the growth of spam and other bad actors, these communities often had much stricter rules than early Usenet groups. However, the marketplace of available forums and places of discussion was so vast that a user who did not like the moderation policies on one platform could easily move to another.

As corporate platforms matured, they began to adopt limited content policies as well, though in a more ad-hoc manner. In 2000, CompuServe was the first platform to develop an "Acceptable Use" policy, which banned racist speech.[13] eBay soon followed in 2001, banning the sale of hate memorabilia and propaganda.[14]

2006 - 2010: Social Media and Early Corporate Moderation

In the mid-2000s, social media platforms such as YouTube, Twitter, Tumblr, Reddit, and Facebook began to emerge, and quickly became dominant, centralized platforms that gradually displaced the multitude of blogs and message boards as a place for user discussion. These platforms initially struggled with content moderation. YouTube in particular developed ad-hoc policies from individual cases, gradually building up an internal set of rules that was opaque, arbitrary, and difficult for moderators to apply.[13][15]

Other platforms, such as Twitter and Reddit, adopted the unmoderated, free speech ethos of old, with Twitter claiming to be the "free speech wing of the free speech party" and Reddit stating that "distasteful" subreddits would not be removed, "even if we find it odious or if we personally condemn it."[16][17]

2010 - Present: Centralization and Expanded Moderation

Throughout the 2010s, as social media platforms became ubiquitous, the ethics of their moderation policies were brought into question. As these centralized platforms began to have significant influence over national and international discourse, concerns were raised over both the presence of offensive content and the stifling of expression.[18][19] Additionally, Internet infrastructure providers began to remove content hosted on their platforms.

In 2010, WikiLeaks leaked the US Diplomatic Cables and hosted the documents on Amazon Web Services. Amazon later removed them for violating its content policies. WikiLeaks' DNS provider also decided to drop the website, effectively removing WikiLeaks from the Internet until an alternative host could be found.[20]

In 2012, Reddit user /u/violentacrez was doxxed by Gawker Media for moderating several controversial subreddits, including /r/Creepshots. The subsequent media spotlight caused Reddit to reconsider its minimalist approach to content moderation.[21] This set a precedent which was used to ban more subreddits over the next few years. In 2015, Reddit banned /r/FatPeopleHate, which marked a turning point at which Reddit no longer considered itself a "bastion of free speech."[22] In 2019, Reddit banned /r/WatchPeopleDie in an effort to suppress the spread of the Christchurch mass shooting video, a move widely considered censorship.[23]

In 2015, Instagram came under fire for moderating female nipples, but not male nipples. It was later revealed that this decision stemmed from the content moderation policies for apps in Apple's App Store.[24]

In 2016, in the aftermath of Gamergate and its associated harassment, Twitter instituted the Trust and Safety Council, and began enforcing stricter moderation policies on its users.[25] In 2019, Twitter was heavily criticized for political bias, inconsistency and lack of transparency in its moderation practices.[26]

In 2018, Tumblr banned all adult content from their platform. This resulted in a mass removal of LGBT and GSM support groups and communities.[27]

Ethical Issues

Psychological Effects on Moderators

Content moderation can have significant negative effects on the individuals tasked with carrying out the moderation. Because most content must be reviewed by a human, professional content moderators spend hours every day reviewing disturbing images and videos, including pornography (sometimes involving children or animals), gore, executions, animal abuse, and hate speech. Viewing such content repeatedly, day after day, can be stressful and traumatic, with moderators sometimes developing PTSD-like symptoms. Others, after continuous exposure to fringe ideas and conspiracy theories, begin to internalize and believe them.[13][28][29]

Further negative effects are brought on by the stress of applying the subjective and inconsistent rules regarding content moderation. Moderators are often called upon to make judgement calls regarding ambiguously objectionable material or content that is offensive but breaks no rules. However, the accuracy of their moderation decisions is strictly monitored and measured against the subjective judgement calls of other moderators. A few mistakes are all it takes for a professional moderator to lose their job.[29]

Information Transparency

Information transparency is the degree to which information about a system is visible to its users.[30] By this definition, content moderation is not transparent at any level. First, content moderation is often not transparent to the public, those it is trying to moderate. While a platform may have public rules regarding acceptable content, these are often vague and subjective, allowing the platform to enforce them as broadly or as narrowly as it chooses. Furthermore, such public documents are often supplemented by internal documents accessible only to the moderators themselves.[13]

Second, content moderation is not transparent at the level of moderators either. The internal documents are often as vague as the public ones, and contain significantly more internal inconsistencies and judgement calls that make them difficult to apply fairly. Furthermore, such internal documents are often contradicted by statements from higher-ups, which in turn may be contradicted by later statements of the same kind.[29]

Finally, even at the corporate level where policy is set, moderation is not transparent. Moderation policies are often created by an ad-hoc, case-by-case process and applied in the same manner. Some content that would normally be removed under moderation rules is accepted under special circumstances, such as "newsworthiness". For example, videos of violent government suppression could be displayed or not, depending on the whims of moderation policy-makers and moderation QAs at the time.[13]


Bias

Due to its inherently subjective nature, content moderation can suffer from various kinds of bias. Algorithmic bias is possible when automated tools are used to remove content. For example, YouTube's automated Content ID tools may flag reviews of films or games that feature clips or gameplay as copyright violations, even though such use for the purpose of criticism may qualify as fair use. Moderation may also suffer from cultural bias, when something considered objectionable by one group is considered acceptable by another. For example, moderators tasked with removing content that depicts minors engaging in violence may disagree over what constitutes a minor. Classification of obscenity is also culturally biased, with different societies around the world having different standards of modesty.[13][15]

Free Speech

