Comment sections have long acted like the wire garbage cans of news websites, collecting the worst and slimiest of human thought. Thoughtful reactions get mixed in with off-topic offal, personal attacks, and enticing suggestions to “learn how to make over $7,000 a month by working from home online!” (So goes the old adage: never read the comments.) Things got so bad in the last decade that many websites put the kibosh on comments altogether, trading the hope of lively, interactive debate for the promise of peace and quiet.
But while some people ran away screaming, others leapt in with a mission to make the comment section better. Today, dozens of newsrooms use commenting platforms like Coral and OpenWeb that aim to keep problematic discourse at bay with a combination of human chaperones and algorithmic tools. (When WIRED added comments back to the website earlier this year, we turned to Coral.) These tools flag and categorize potentially harmful comments before a human reviews them, helping to manage the moderation workload and reduce the visibility of toxic content.
Another approach that’s gained steam is to give commenters automated feedback, encouraging them to rethink a toxic comment before they hit publish. A new study looks at how effective these self-editing prompts can be. The study, conducted by OpenWeb and Perspective API, Google’s AI conversation platform, involved over 400,000 comments on news sites including AOL, RT, and Newsweek, which tested a real-time feedback feature in their comment sections. Rather than automatically rejecting a comment that violated community standards, the algorithm would first prompt commenters with a warning message: “Let’s keep the conversation civil. Please remove any inappropriate language from your comment,” or “Some members of the community may find your comment inappropriate. Try Again?” A separate group of commenters served as a control and saw no such intervention message.
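The flow described above can be sketched in a few lines. This is only an illustration of the warn-then-allow pattern, not the actual OpenWeb or Perspective code: the threshold, the toy `score_toxicity()` lexicon, and the `submit_comment()` helper are all hypothetical stand-ins for what, in production, would be a trained model served via an API.

```python
# Minimal sketch of a real-time feedback loop: score the comment, warn
# once if it looks toxic, but never block the post outright.
# TOXIC_WORDS and the 0.2 threshold are illustrative, not from the study.

TOXIC_WORDS = {"idiot", "moron", "scum"}  # toy lexicon, stands in for a real model


def score_toxicity(comment: str) -> float:
    """Return a 0-1 score: the fraction of words found in the toy lexicon."""
    words = comment.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.strip(".,!?") in TOXIC_WORDS)
    return hits / len(words)


def submit_comment(comment: str, already_warned: bool = False) -> dict:
    """Warn on the first toxic submission; always post on the second try."""
    if score_toxicity(comment) >= 0.2 and not already_warned:
        return {
            "posted": False,
            "warning": "Let's keep the conversation civil. "
                       "Please remove any inappropriate language from your comment.",
        }
    return {"posted": True, "warning": None}


first_try = submit_comment("You are an idiot")          # triggers the warning
second_try = submit_comment("You are an idiot", already_warned=True)  # posts anyway
```

Note that `submit_comment` returns the same comment as postable on the second attempt: as in the study, the intervention is a speed bump, not a gate.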
The study found that for about a third of commenters, seeing the intervention did cause them to revise their comments. Jigsaw, the group at Google that makes Perspective API, says that jibes with previous research, including a study it did with Coral, which found that 36 percent of people edited toxic language in a comment when prompted. Another experiment—from The Southeast Missourian, which also uses Perspective’s software—found that giving real-time feedback to commenters reduced the number of comments considered “very toxic” by 96 percent.
The ways people revised their comments weren’t always positive, though. In the OpenWeb study, about half of people who chose to edit their comment did so to remove or replace the toxic language, or to reshape the comment entirely. Those people seemed both to understand why the original comment got flagged and to acknowledge that they could rewrite it in a nicer way. But about a quarter of those who revised their comment did so to navigate around the toxicity filter, changing the spelling or spacing of an offensive word to skirt algorithmic detection. The rest changed the wrong part of the comment, seeming not to understand what was wrong with the original version, or revised their comment to respond directly to the feature itself (e.g., “Take your censorship and stuff it”).
As algorithmic moderation has become more common, language adaptations have followed. People learn that specific words—say, “cuck”—trip up the filter, and start to write them differently (“c u c k”) or invent new words altogether. After the death of Ahmaud Arbery in February, for example, Vice reported that some white supremacist groups online began to use the word “jogger” in place of better-known racial slurs. Those patterns largely escape algorithmic filters and can make it harder to police intentionally offensive language online.
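One common countermeasure to this kind of spacing-based evasion is to normalize a comment before matching. The sketch below is a toy, regex-based illustration of the idea; real moderation systems rely on far more robust techniques (character-level models, subword tokenization), and the `normalize()` function here is an assumption of mine, not anything from the study.

```python
import re

# Collapse runs of single characters separated by spaces, dots, or
# hyphens, so "c u c k" or "c.u.c.k" becomes "cuck" before any filter
# runs. Aggressive: it would also collapse legitimate spelled-out
# letters, which is one reason production systems go beyond regexes.
_SPACED_RUN = re.compile(r"\b(?:\w[ .\-]){2,}\w\b")


def normalize(text: str) -> str:
    """Lowercase the text and join deliberately spaced-out words."""
    text = text.lower()
    return _SPACED_RUN.sub(lambda m: re.sub(r"[ .\-]", "", m.group()), text)
```

New coinages like “jogger”, of course, survive any amount of normalization, which is why lexical tricks alone can’t keep up with adversarial commenters.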
Ido Goldberg, OpenWeb’s SVP of product, says this kind of adaptive behavior was one of the main concerns in designing the real-time feedback feature. “There’s this window for abuse that’s open to try to trick the system,” he says. “Obviously we did see some of that, but not as much as we thought.” Rather than use the warning messages as a way to game the moderation system, most users who saw interventions didn’t change their comments at all. Thirty-six percent of users who saw the intervention posted their comment anyway, without making any edits. (The intervention message acted as a warning, not a barrier to posting.) Another 18 percent posted their comment, unedited, after refreshing the page, suggesting that they took the warning as a block. And 12 percent simply gave up, abandoning the effort and not posting at all.