Facebook’s algorithms for detecting hate speech are working harder than ever. If only we knew how good they are at their jobs.
Tuesday the social network reported a big jump in the number of items removed for breaching its rules on hate speech. The increase stemmed from better detection by the automated hate speech sniffers developed by Facebook’s artificial intelligence experts.
The accuracy of those systems remains a mystery. Facebook doesn’t release, and says it can’t estimate, the total volume of hate speech posted by its 1.7 billion daily active users.
Facebook has released quarterly reports on how it is enforcing its standards for acceptable discourse since May 2018. The latest says the company removed 9.6 million pieces of content it deemed hate speech in the first quarter of 2020, up from 5.7 million in the fourth quarter of 2019. The total was a record, topping the 7 million removed in the third quarter of 2019.
Of the 9.6 million posts removed in the first quarter, Facebook said its software detected 88.8 percent before users reported them. That indicates algorithms flagged 8.5 million posts for hate speech in the quarter, up 86 percent from the previous quarter’s total of 4.6 million.
In a call with reporters, Facebook chief technology officer Mike Schroepfer touted advances in the company’s machine learning technology that parses language. “Our language models have gotten bigger and more accurate and nuanced,” he said. “They’re able to catch things that are less obvious.”
Schroepfer wouldn’t specify how accurate those systems now are, saying only that Facebook tests systems extensively before they are deployed, in part so that they do not incorrectly penalize innocent content.
He cited figures in the new report showing that although users had appealed decisions to take down content for hate speech more often in the most recent quarter—1.3 million times—fewer posts were subsequently restored. Facebook also said Tuesday it had altered its appeals process in late March, reducing the number of appeals logged, because Covid-19 restrictions shut some moderation offices.
Facebook’s figures do not indicate how much hate speech slips through its algorithmic net. The company’s quarterly reports estimate the prevalence of some types of content banned under Facebook’s rules, but not hate speech. Tuesday’s release shows the prevalence of violent content declining since last summer. The hate speech section says Facebook is “still developing a global metric.”
The missing numbers shroud the true size of the social network’s hate speech problem. Caitlin Carlson, an associate professor at Seattle University, says the 9.6 million posts removed for hate speech look suspiciously small compared with the size of Facebook’s user base and how readily users encounter troubling content. “It’s not hard to find,” Carlson says.
Carlson published results in January from an experiment in which she and a colleague collected more than 300 Facebook posts that appeared to violate the platform’s hate speech rules and reported them via the service’s tools. Only about half of the posts were ultimately removed; the company’s moderators appeared more rigorous in removing racial and ethnic slurs than misogynistic content.
Facebook says content flagged by its algorithms is reviewed the same way as posts reported by users. That process determines whether to remove the content or add a warning, and can involve human reviewers or software alone. On Friday, Facebook agreed to a $52 million settlement with moderators who say reviewing content for the company caused them to develop PTSD. News of the settlement was earlier reported by The Verge.
Facebook’s moderation reports are part of a recent transparency drive that also includes a new panel of outside experts with the power to overturn the company’s moderation decisions. The company launched those projects after scandals such as Russia-orchestrated election misinformation, which spurred lawmakers in the US and elsewhere to consider new government constraints on social platforms.
Carlson says Facebook’s disclosures appear to be intended to show that the company can self-regulate, but the reports are inadequate. “To be able to have a conversation about this we need the numbers,” she says. Asked why it doesn’t report prevalence for hate speech, a company spokesperson pointed to a note in its report saying its measurement is “slowly expanding to cover more languages and regions, to account for cultural context and nuances for individual languages.”