OpenAI to use GPT-4 LLM for content moderation, warns against bias

The company expects to eliminate undesired biases introduced during training with the involvement of humans in the loop.

ChatGPT creator OpenAI is working on using its GPT-4 large language model (LLM) to automate content moderation across digital platforms, especially social media.

OpenAI is exploring the use of GPT-4’s ability to interpret rules and nuances in long content policy documentation, along with its capability to adapt instantly to policy updates, the company said in a blog post.

"We believe this offers a more positive vision of the future of digital platforms, where AI can help moderate online traffic according to platform-specific policy and relieve the mental burden of a large number of human moderators," the company said, adding that anyone with access to OpenAI’s API can implement their own moderation system. 

In contrast to the present practice of content moderation, which is completely manual and time-consuming, OpenAI’s GPT-4 large language model can be used to create custom content policies in hours, the company said.

To do so, data scientists and engineers can take a policy guideline crafted by policy experts, along with data sets containing real-life examples of policy violations, and use them to label the data.

Humans to help test AI content moderation

"Then, GPT-4 reads the policy and assigns labels to the same dataset, without seeing the answers. By examining the discrepancies between GPT-4's judgments and those of a human, the policy experts can ask GPT-4 to come up with reasoning behind its labels, analyse the ambiguity in policy definitions, resolve confusion and provide further clarification in the policy accordingly," the company said. 

These steps may be repeated by data scientists and engineers until the large language model generates satisfactory results, it added, explaining that the iterative process yields refined content policies that are translated into classifiers, enabling the deployment of the policy and content moderation at scale.
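
The comparison step in this loop can be sketched in a few lines of Python. This is only an illustration, not OpenAI's implementation: the function name, the `Discrepancy` record, the label values and the sample posts are all hypothetical, and the model's labels are assumed to have been collected already rather than fetched from any API.

```python
from dataclasses import dataclass


@dataclass
class Discrepancy:
    """One example where the human and model labels differ (hypothetical record)."""
    index: int
    text: str
    human_label: str
    model_label: str


def compare_labels(texts, human_labels, model_labels):
    """Return (agreement_rate, discrepancies) for two parallel lists of labels."""
    if not (len(texts) == len(human_labels) == len(model_labels)):
        raise ValueError("texts and both label lists must have the same length")

    # Keep only the examples on which the two labelers disagree.
    discrepancies = [
        Discrepancy(i, text, human, model)
        for i, (text, human, model) in enumerate(
            zip(texts, human_labels, model_labels)
        )
        if human != model
    ]

    total = len(texts)
    agreement = 1.0 if total == 0 else (total - len(discrepancies)) / total
    return agreement, discrepancies


# Made-up sample data; the label names are assumptions for illustration only.
texts = ["post A", "post B", "post C", "post D"]
human = ["allow", "violation", "allow", "violation"]
model = ["allow", "violation", "violation", "violation"]

agreement, diffs = compare_labels(texts, human, model)
for d in diffs:
    print(f"Review #{d.index}: human={d.human_label}, model={d.model_label}")
```

The disagreements listed here are the cases a policy expert would take back to the model to ask for its reasoning, and then use to decide whether the policy wording needs clarifying before the next round.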

Other advantages of using GPT-4 over the present manual approach to content moderation include a decrease in inconsistent labelling and a faster feedback loop.

"People may interpret policies differently or some moderators may take longer to digest new policy changes, leading to inconsistent labels. In comparison, LLMs are sensitive to granular differences in wording and can instantly adapt to policy updates to offer a consistent content experience for users," the company said. 

The new approach, according to the company, also takes less effort in terms of training the model.

Further, OpenAI claims that this approach is different from so-called constitutional AI, under which content moderation is dependent on the model's own internalized judgment of what is safe. Various companies, including Anthropic, have taken a constitutional AI approach in training their models to be free of bias and error.

Nevertheless, OpenAI warned that undesired biases may creep into content moderation models during training.

"As with any AI application, results and output will need to be carefully monitored, validated, and refined by maintaining humans in the loop," it said. 

Industry experts think OpenAI's approach to content moderation has potential. "GPT-4 is a super capable model and OpenAI has a never-ending stream of users trying to make it do harmful things. Which is great training data," said Tobias Zwingmann, managing partner at AI services company Rapyd.AI.

If OpenAI’s large language model can be used successfully for content moderation, it will open up a multibillion-dollar market for the company. 

The global content moderation services market, according to a report from Allied Market Research, was valued at $8.5 billion in 2021, and is projected to reach $26.3 billion by 2031, growing at a compound annual growth rate of 12.2% from 2022 to 2031. 