Hate Speech: Why algorithms fail

Typographical errors, missing spaces, and flattering words mislead artificial intelligences

Algorithms are used today against hate comments on the net, with only limited success. © bigtunaonline / iStock

AI against hate comments: An experiment reveals why many algorithms fail to detect hate comments on social media. Typos, incorrect grammar, and missing spaces between words are enough to mislead the AI systems. Interspersed positive words such as "love" also prevent the recognition of hate speech. However, targeted training of the algorithms on such features could help, the researchers say.

As useful as social media are, they also have their "dark sides": thanks to Facebook, Twitter and co., fake news and hate comments run rampant. The echo chamber effect also ensures that users no longer experience true diversity of opinion. Providers have long tried to use learning algorithms to filter out fake news and hate speech, but so far with only limited success.

Seven adaptive filter systems put to the test

Why algorithms so often fail at hate comments has now been investigated by Tommi Gröndahl of Aalto University and his team. For their experiment, they tested seven current AI systems for hate speech detection. In the first test, the artificial intelligences were given hate comments that came from the training datasets of the other algorithms.

In the second test, the researchers examined how well the AI systems coped with typos, incorrect grammar, and omitted spaces. Finally, the scientists simply appended a positive word such as "love" to classic hate comments like "I hate you": would that affect the algorithms?
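The kinds of perturbations tested can be illustrated with a few lines of Python. This is a minimal sketch; the function names and the exact transformations are illustrative assumptions, not the researchers' code.

```python
import random

def add_typo(text: str) -> str:
    """Swap two adjacent characters to simulate a typo."""
    if len(text) < 2:
        return text
    i = random.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def remove_spaces(text: str) -> str:
    """Run the words together by deleting the spaces."""
    return text.replace(" ", "")

def append_positive_word(text: str, word: str = "love") -> str:
    """Append a benign word to shift the classifier's judgement."""
    return f"{text} {word}"

comment = "I hate you"
print(add_typo(comment))              # e.g. "I haet you"
print(remove_spaces(comment))         # "Ihateyou"
print(append_positive_word(comment))  # "I hate you love"
```

A human reader still understands each perturbed variant, which is exactly what makes these changes effective against automated filters.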

Total failure with missing spaces and "love"

The result: None of the intelligent machine brains performed well when confronted with hate speech from one of the "foreign" datasets. Typos and incorrect grammar also let many hate comments "slip through". Even more drastic, however, was the result when the researchers omitted the spaces between words: not a single filter algorithm then recognized a phrase like "Ihate you" as hate speech.

This means that common filters can easily be outsmarted: "In the simplest case, the text is simply changed so that a human reader still understands the intended message, while the filters classify the text incorrectly," explain Gröndahl and his colleagues. "Almost all models are helpless against such deliberately disguised inputs." Even simply appending the word "love" to a hate comment often led to misclassification.

Better training needed

As the experiments showed, this even affects the relatively advanced Google system "Perspective". This learning algorithm rates comments for their "toxicity" as they are entered. After a study in 2017 revealed how easily the system can be misled by typos, Google improved it significantly. But as the researchers discovered, "Perspective" is also fooled by missing spaces and an appended "love": the sentence "I hate you" was no longer classified as "toxic" in the form "Ihate you love".
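For readers who want to reproduce this kind of check, a comment can be scored against the publicly documented Perspective API roughly as follows. This is a sketch, not code from the study; the API key is a placeholder and access requires registration with Google.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; requires a Google Cloud API key
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def toxicity(text: str) -> float:
    """Return Perspective's TOXICITY summary score (0..1) for a comment."""
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=payload)
    response.raise_for_status()
    scores = response.json()["attributeScores"]
    return scores["TOXICITY"]["summaryScore"]["value"]

# The perturbed phrase from the experiment scores far lower than the original.
print(toxicity("I hate you"))
print(toxicity("Ihate you love"))
```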

Google's "Perspective" examines the content when entering "toxicity" - but can be overstated. "Grahlahl et al.

However, according to the scientists, the problem lies not in the basic structure of the models and algorithms used, but rather in the datasets previously used to train them. These should increasingly be supplemented with deliberately misspelled terms, words run together, and appended positive words. (ACM AISec workshop, 2018)
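Such training-data augmentation could look roughly like the sketch below, assuming the perturbation helpers shown earlier and a toy list of labelled comments; the structure and labels are illustrative, not taken from the paper.

```python
def augment(dataset):
    """Add perturbed copies of each hateful example so the classifier sees
    misspellings, run-together words and appended 'love' during training."""
    augmented = list(dataset)
    for text, label in dataset:
        if label == "hate":
            augmented.append((add_typo(text), label))
            augmented.append((remove_spaces(text), label))
            augmented.append((append_positive_word(text), label))
    return augmented

train = [("I hate you", "hate"), ("have a nice day", "ok")]
print(augment(train))
```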

(Aalto University, 17.09.2018 - NPO)