OpenAI announced it has developed an AI system using GPT-4 to assist with content moderation on online platforms.
The company says this system allows for faster iteration on policy changes and more consistent content labeling than traditional human-led moderation.
OpenAI said in its announcement:
“Content moderation plays a crucial role in sustaining the health of digital platforms. A content moderation system using GPT-4 results in much faster iteration on policy changes, reducing the cycle from months to hours.”
This move aims to improve consistency in content labeling, speed up policy updates, and reduce reliance on human moderators.
It could also benefit human moderators’ mental health by limiting their exposure to harmful content.
Challenges In Content Moderation
OpenAI explained that content moderation is challenging work that requires meticulous effort, a nuanced understanding of context, and continual adaptation to new use cases.
Traditionally, these labor-intensive tasks have fallen on human moderators. They review large volumes of user-generated content to remove harmful or inappropriate materials.
This work can be mentally taxing. Using AI to do the job could reduce the human cost of online content moderation.
How OpenAI’s AI System Works
OpenAI’s new system aims to assist human moderators by using GPT-4 to interpret content policies and make moderation judgments.
Policy experts first write content guidelines and label a set of examples according to the policy.
GPT-4 then assigns the labels to the same examples without seeing the reviewer’s answers.
By comparing GPT-4’s labels to human labels, OpenAI can refine ambiguous policy definitions and repeat the process until GPT-4 reliably interprets the guidelines.
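The comparison step can be pictured with a short Python sketch. This is an illustration only, not OpenAI's code: the example texts, the K0 "no violation" label, and the function names are assumptions made for this sketch, and the model's labels are hardcoded rather than fetched from GPT-4.

```python
def find_disagreements(examples, human_labels, model_labels):
    """Return (text, human_label, model_label) for each example where the labels differ."""
    return [
        (text, human, model)
        for text, human, model in zip(examples, human_labels, model_labels)
        if human != model
    ]


def agreement_rate(human_labels, model_labels):
    """Fraction of examples on which the human and model labels match."""
    if not human_labels:
        return 0.0
    matches = sum(h == m for h, m in zip(human_labels, model_labels))
    return matches / len(human_labels)


# Made-up examples for illustration. K3 comes from the article;
# K0 is a hypothetical label meaning "no violation".
examples = [
    "Tips for taking items from a store without paying",
    "What is a good pasta recipe?",
    "How to break into a parked car",
]
human_labels = ["K3", "K0", "K3"]
model_labels = ["K0", "K0", "K3"]  # hardcoded stand-in for GPT-4's answers

disagreements = find_disagreements(examples, human_labels, model_labels)
rate = agreement_rate(human_labels, model_labels)

# Each disagreement marks a spot where the policy wording may need clarification.
for text, human, model in disagreements:
    print(f"human={human} model={model}: {text}")
print(f"agreement rate: {rate:.2f}")
```

In a workflow like the one described, the list of disagreements is what policy experts would review, and the agreement rate gives a rough signal of when the policy wording is clear enough.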
In a blog post, OpenAI demonstrates how a human reviewer could clarify policies when they disagree with a label GPT-4 assigns to content.
In the example below, a human reviewer labeled something K3 (promoting non-violent harm), but GPT-4 judged that it didn’t violate the illicit behavior policy.
Screenshot from: openai.com/blog/using-gpt-4-for-content-moderation, August 2023.
Having GPT-4 explain why it chose a different label allows the human reviewer to understand where policies are unclear.
In this case, the reviewer realized GPT-4 was missing the nuance that property theft would qualify as promoting non-violent harm under the K3 policy.
This interaction highlights how human oversight can improve AI-assisted moderation by clarifying policies in areas where the AI’s interpretation falls short.
Once the policy is understood, GPT-4 can be deployed to moderate content at scale.
Benefits Highlighted By OpenAI
OpenAI outlined several benefits it believes the AI-assisted moderation system provides:
- More consistent labeling, since the AI adapts quickly to policy changes
- Faster feedback loop for improving policies, reducing update cycles from months to hours
- Reduced mental burden for human moderators
To that last point, OpenAI should consider emphasizing AI moderation’s potential mental health benefits if it wants people to support the idea.
Using GPT-4 to moderate content could spare many human moderators from directly viewing traumatic, offensive, or harmful material, reducing their mental burden.
Limitations & Ethical Considerations
OpenAI acknowledged that judgments made by AI models can contain unwanted biases, so results must be monitored and validated. It emphasized that humans should remain “in the loop” for complex moderation cases.
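One way to picture keeping humans “in the loop” is a simple routing rule, sketched below in Python. OpenAI has not described its implementation at this level, so everything here is an assumption made for illustration: the `complex` flag, the `audit_rate` parameter, and the `route_for_review` function are hypothetical.

```python
import random


def route_for_review(items, audit_rate=0.1, seed=0):
    """Split items into a human review queue and an automated queue.

    Items flagged as complex always go to humans. A random sample of the
    rest is also sent to humans so the AI's labels can be spot-checked.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    human_queue, auto_queue = [], []
    for item in items:
        if item["complex"] or rng.random() < audit_rate:
            human_queue.append(item)
        else:
            auto_queue.append(item)
    return human_queue, auto_queue


# Made-up items; the "complex" flag is a hypothetical field.
items = [
    {"id": 1, "label": "K0", "complex": False},
    {"id": 2, "label": "K3", "complex": True},
    {"id": 3, "label": "K0", "complex": False},
]

human_queue, auto_queue = route_for_review(items, audit_rate=0.0)
print([item["id"] for item in human_queue])  # only the complex item when audit_rate is 0
```

The random audit sample is one simple way to check AI labels for bias over time; a real system would likely use richer signals to decide which cases count as complex.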
The company is exploring ways to enhance GPT-4’s capabilities and aims to leverage AI to identify emerging content risks that can inform new policies.