Key2Kindness: A cross-platform proactive content moderation system

Mark Warner


The Key2Kindness project aims to understand proactive content moderation, and how this form of moderation can be embedded responsibly and effectively within different communication platforms. It takes a critical approach, working with stakeholders to understand the potential risks and unintended harms of these systems. Finally, through engagement with stakeholders, it will evaluate several proactive content moderation interaction design factors using a controlled experiment and qualitative user interviews.

Main findings

Current research indicates that proactive moderation through preliminary flagging of problematic content can be effective in reducing instances of online harassment, but direct positive effects may not be seen across all users. For example, users who are organised and determined to harass are more likely to ignore these types of prompts, whereas users who are acting “in the heat of the moment” are likely to be positively influenced. However, all users may benefit indirectly through positive downstream effects: if the prompts lead to fewer harassing messages being sent, there is less “fuel” for more organised and determined users to respond to and use within organised harassment campaigns.

In our controlled experiment, prompts were displayed to users when the content of their messages was considered toxic by an AI language model. Prompts either informed users that their message was toxic and showed a corresponding toxicity score, or informed users without showing a score. Prompts were also tested at different points in time, both while a message was being drafted and at the point of sending (i.e., once a user hits ‘send’). Finally, prompts with and without time delays were tested to understand the impact of adding friction to the process of sending a message in which toxic content is detected.
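The three design factors described above (score shown or hidden, prompt timing, and time delay) can be illustrated with a minimal Python sketch. It is not the project's implementation: the names `PromptCondition`, `Timing`, `Prompt`, `build_prompt` and `toy_score`, the prompt wording, and the 0.5 threshold are all assumptions made for illustration, and the keyword-based scorer is only a stand-in for the AI language model used in the study.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Timing(Enum):
    """When a prompt may appear (hypothetical names)."""
    WHILE_DRAFTING = "while_drafting"
    ON_SEND = "on_send"


@dataclass(frozen=True)
class PromptCondition:
    """One combination of the design factors being compared."""
    show_score: bool      # include the toxicity score in the prompt text
    timing: Timing        # point at which the prompt is allowed to appear
    delay_seconds: float  # friction attached to the prompt; 0 means none


@dataclass(frozen=True)
class Prompt:
    text: str
    delay_seconds: float


def build_prompt(
    message: str,
    event: Timing,
    condition: PromptCondition,
    score_fn: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the study
) -> Optional[Prompt]:
    """Return a prompt if one should be shown for this event, else None."""
    if event is not condition.timing:
        return None  # this condition does not prompt at this point in time
    score = score_fn(message)
    if score < threshold:
        return None  # message not considered toxic, so no prompt
    text = "Your message may be considered toxic."  # assumed wording
    if condition.show_score:
        text += f" Toxicity score: {score:.0%}."
    return Prompt(text=text, delay_seconds=condition.delay_seconds)


def toy_score(message: str) -> float:
    """Stand-in scorer: a keyword check, NOT a real toxicity model."""
    return 0.9 if "idiot" in message.lower() else 0.1


if __name__ == "__main__":
    condition = PromptCondition(
        show_score=True, timing=Timing.ON_SEND, delay_seconds=3.0
    )
    print(build_prompt("You are an idiot", Timing.ON_SEND, condition, toy_score))
    print(build_prompt("Have a nice day", Timing.ON_SEND, condition, toy_score))
```

Keeping the scorer as an injected function means the same decision logic can be paired with any toxicity model, and each experimental condition is a single `PromptCondition` value.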

The findings highlight the benefits of prompting users proactively. While changes in the design of the prompt did affect efficacy, the simple presence of a prompt had the greatest effect in reducing the toxicity of content. The findings also highlight the potential for these types of prompts to enhance awareness of platform rules, and to act as “in the moment” educational resources that inform and support users when interacting online.


Warner, M., Strohmayer, A., Higgs, M., & Coventry, L. (2024). A Critical Reflection on the Use of Toxicity Detection Algorithms in Proactive Content Moderation Systems. arXiv preprint arXiv:2401.10629.

Warner, M., Strohmayer, A., Higgs, M., Rafiq, H., Yang, L., & Coventry, L. (2024). Key to Kindness: Reducing Toxicity In Online Discourse Through Proactive Content Moderation in a Mobile Keyboard. arXiv preprint arXiv:2401.10627.