Twitter rolls out refined prompts to combat harmful language

Feature suggests you reconsider a tweet reply if it contains offensive language.

Steven Musil Night Editor / News

Steven Musil is the night news editor at CNET News. He's been hooked on tech since learning BASIC in the late '70s. When not cleaning up after his daughter and son, Steven can be found pedaling around the San Francisco Bay Area. Before joining CNET in 2000, Steven spent 10 years at various Bay Area newspapers.

Expertise: I have more than 30 years' experience in journalism in the heart of Silicon Valley.

May 5, 2021 10:30 a.m. PT

Twitter aims to reduce the amount of offensive speech on the platform.
James Martin/CNET

A year ago, Twitter began testing prompts that encouraged people to rethink replies containing potentially harmful or offensive language. After a year of testing, Twitter says it's learned a few things and revised the prompts.

The new prompts begin rolling out for English-speaking iOS users on Wednesday, while Android users can expect them in a few days, Twitter said. The prompts, which let you edit, delete or just hit send on the tweet, are part of Twitter's efforts to promote healthier conversation on its site.

While social media can connect people with family and friends, it's increasingly being used to sow division. A Pew Research study published in January found that 75% of those who said they experienced online abuse indicated their most recent experience was on social media.

Twitter's testing showed that the system it used to analyze potentially harmful tweets needed fine-tuning to address detection errors and inconsistencies, Twitter said.

"In early tests, people were sometimes prompted unnecessarily because the algorithms powering the prompts struggled to capture the nuance in many conversations and often didn't differentiate between potentially offensive language, sarcasm and friendly banter," Twitter's Anita Butler and Alberto Parrella said in a blog post Wednesday.

Twitter said its changes include taking into account the nature of the relationship between the author and the person replying, and how often they interact. The system also accounts for certain language being reclaimed by underrepresented communities and used in non-harmful ways. It was also improved to more accurately detect strong language, including profanity.

Besides helping identify parts of the system that required tweaking, the tests reduced the number of potentially offensive replies being sent, Twitter found. Of those who were prompted, 34% either revised their reply or opted not to send it, Twitter said. Users who had seen a prompt also composed an average of 11% fewer offensive replies.

Twitter said it will continue to collect feedback from Twitter users who receive reply prompts. It will use that feedback to make further changes to the feature. It also plans to expand the feature's reach to other languages.