One of humanity's greatest strengths is the ability to navigate the world using only limited data, relying in large part on experience built up over years of personal exposure, education and media.
This means, for example, that we drive more slowly around schools because children may be nearby, or offer a seat to the elderly because we reasonably expect them to be frailer than the average person.
The dark side of these assumptions is, of course, racist and sexist bias: beliefs that are poorly substantiated, unfairly extrapolated from a few people to a whole population, or that allow no exceptions to the rule.
Speaking to Wired, Microsoft researchers have revealed that AIs are even more susceptible to developing this kind of bias.
Researchers from Boston University and Microsoft showed that software trained on text collected from Google News would form connections such as “Man is to computer programmer as woman is to homemaker.”
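The analogy above comes from word embeddings, where words are represented as vectors and analogies are solved by vector arithmetic. The sketch below illustrates the idea with a tiny hand-made vocabulary; the vectors and words are invented for illustration, not taken from the Google News model the researchers studied, which learns hundreds of dimensions from billions of words.

```python
from math import sqrt

# Toy 3-dimensional "embeddings" invented for this example. In a real
# model like word2vec these directions are learned from text, which is
# how biases in the text end up encoded in the geometry.
vectors = {
    "man":        [1.0, 0.0, 0.2],
    "woman":      [0.0, 1.0, 0.2],
    "programmer": [1.0, 0.1, 0.9],
    "homemaker":  [0.1, 1.0, 0.9],
    "engineer":   [0.9, 0.2, 0.8],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (sqrt(dot(a, a)) * sqrt(dot(b, b)))

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' via the vector b - a + c,
    returning the remaining word closest to it by cosine similarity."""
    target = [vb - va + vc for va, vb, vc in
              zip(vectors[a], vectors[b], vectors[c])]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(analogy("man", "programmer", "woman"))  # → "homemaker"
```

Because these toy vectors place "homemaker" near the "woman" direction, the arithmetic reproduces the biased analogy; a trained embedding does the same thing for the same geometric reason.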
Another study found that when an AI was trained on two large sets of photos, consisting of more than 100,000 images of complex scenes drawn from the web and labelled by humans with descriptions, it developed strong associations between women and domestic items, and between men and technology and outdoor activities.
In the COCO dataset, kitchen objects such as spoons and forks were strongly associated with women, while outdoor sporting equipment such as snowboards and tennis rackets, and technology items such as keyboards and computer mice were very strongly associated with men.
In fact, the AI's biases were even stronger than those in the dataset itself, making it much more likely to identify a person in a kitchen as a woman even when the person was a man.
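This amplification effect can be quantified by comparing how skewed the labels are in the data against how skewed the trained model's predictions are. The sketch below shows the comparison with invented counts, not figures from the COCO study:

```python
# Hypothetical label counts for kitchen scenes in a training set:
# 66% of the labelled examples show women.
dataset_counts = {"woman": 660, "man": 340}

# Hypothetical predictions from a model trained on that data: it says
# "woman" 84% of the time -- more skewed than the data it learned from.
model_counts = {"woman": 840, "man": 160}

def female_ratio(counts):
    """Fraction of examples labelled or predicted as 'woman'."""
    return counts["woman"] / (counts["woman"] + counts["man"])

data_bias = female_ratio(dataset_counts)   # 0.66: skew in the dataset
model_bias = female_ratio(model_counts)    # 0.84: skew in the predictions
amplification = model_bias - data_bias     # positive: the model amplifies it

print(f"data {data_bias:.2f}, model {model_bias:.2f}, "
      f"amplified by {amplification:+.2f}")
```

A positive gap means the model has not merely learned the correlation in its training data but exaggerated it, which is what the researchers reported.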
Such biases, if detected, can be corrected with additional training, but there is a significant risk that an AI model will slip into production before all such issues have been resolved.
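One simple form this corrective training can take is rebalancing the data itself, for instance by oversampling the under-represented group until the skew disappears. The sketch below uses invented records mirroring the kitchen example; it is one illustrative technique, not the specific fix used by the researchers:

```python
import random

random.seed(0)

# Hypothetical (scene, gender) records standing in for labelled training
# images, with the same 66/34 skew as the earlier kitchen example.
records = [("kitchen", "woman")] * 66 + [("kitchen", "man")] * 34

def oversample_balanced(records):
    """Duplicate randomly chosen minority-group records until both
    genders appear equally often in the training set."""
    women = [r for r in records if r[1] == "woman"]
    men = [r for r in records if r[1] == "man"]
    minority, majority = (men, women) if len(men) < len(women) else (women, men)
    extra = [random.choice(minority)
             for _ in range(len(majority) - len(minority))]
    return records + extra

balanced = oversample_balanced(records)
counts = {g: sum(1 for _, gg in balanced if gg == g)
          for g in ("woman", "man")}
print(counts)  # both genders now appear 66 times
```

Retraining on the balanced set removes the statistical incentive to guess "woman" for every kitchen scene, though it also illustrates Caliskan's later objection: the rebalanced data no longer reflects the real-world statistics.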
Eric Horvitz, director of Microsoft Research, said “I and Microsoft as a whole celebrate efforts identifying and addressing bias and gaps in datasets and systems created out of them. Researchers and engineers working with COCO and other datasets should be looking for signs of bias in their own work and others.”
Horvitz is considering an interesting solution for getting AI right from the start: instead of training on images drawn from reality, an AI might be trained on idealised images that already show items with an equal gender balance, much as children's educational material reflects the world as we want it to be rather than as it is.
“It’s a really important question: when should we change reality to make our systems perform in an aspirational way?” he says.
Other researchers are not so sure.
If there really are more male construction workers, image-recognition programs should be allowed to see that, says Aylin Caliskan, a researcher at Princeton. Steps can be taken afterwards to measure and adjust any bias if needed. “We risk losing essential information,” she says. “The datasets need to reflect the real statistics in the world.”