MIT doctoral student Irene Chen is trying to extricate a particularly embarrassing fly from the ointment, a fly that acolytes of AI have been slow to acknowledge and are at a loss to remedy. That fly is bigotry magnified in the online ether, perpetuating institutionalized bigotry in the real world in which we humans live:
Making ML systems less biased.
With machine learning systems now being used to determine everything from stock prices to medical diagnoses, it’s never been more important to look at how they arrive at decisions.
A new approach out of MIT demonstrates that the main culprit is not just the algorithms themselves, but how the data itself is collected.
“Computer scientists are often quick to say that the way to make these systems less biased is to simply design better algorithms,” says lead author Irene Chen (above), a PhD student who wrote the paper with MIT professor David Sontag and postdoctoral associate Fredrik D. Johansson. “But algorithms are only as good as the data they’re using, and our research shows that you can often make a bigger difference with better data.”
The problem is, at least in part, the misplaced faith AI proponents place in the algorithms themselves rather than in the data that feeds them:
They then showed how changing the way they collected data could reduce each type of bias while still maintaining the same level of predictive accuracy…
Chen says that one of the biggest misconceptions is that more data is always better. Getting more participants doesn’t necessarily help, since drawing from the exact same population often leads to the same subgroups being under-represented. Even the popular image database ImageNet, with its many millions of images, has been shown to be biased towards the Northern Hemisphere.
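To make that concrete, here is a minimal, hypothetical sketch (Python with NumPy and scikit-learn, entirely synthetic data; none of this code is from Chen's paper) of how aggregate accuracy can hide an under-served subgroup, and why collecting ten times more data from the same skewed population barely moves the needle:

```python
# Minimal synthetic illustration (NOT the authors' code): aggregate
# accuracy can look fine while an under-represented subgroup fares far
# worse, and more data from the SAME skewed population doesn't fix it.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def draw_population(n, minority_frac=0.05):
    """Skewed population: the minority subgroup's labels depend on a
    different feature, so a model fit mostly on the majority
    generalizes poorly to it."""
    group = rng.random(n) < minority_frac        # True = minority member
    x = rng.normal(size=(n, 2))
    # Majority labels follow x[:, 0]; minority labels follow x[:, 1].
    logits = np.where(group, 3 * x[:, 1], 3 * x[:, 0])
    y = (logits + rng.normal(size=n) > 0).astype(int)
    return x, y, group

x_te, y_te, g_te = draw_population(50_000)       # common test set

for n_train in (1_000, 100_000):                 # "more data", same skew
    x_tr, y_tr, _ = draw_population(n_train)
    pred = LogisticRegression().fit(x_tr, y_tr).predict(x_te)
    print(f"n={n_train:>7}  "
          f"overall={accuracy_score(y_te, pred):.3f}  "
          f"majority={accuracy_score(y_te[~g_te], pred[~g_te]):.3f}  "
          f"minority={accuracy_score(y_te[g_te], pred[g_te]):.3f}")
```

The minority column stays near chance no matter how large n_train gets, because every new batch is drawn from the same skew; the remedy the MIT result points toward is changing what gets collected (in this toy example, raising minority_frac at collection time), not just collecting more of it.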
Earlier this year, the World Economic Forum produced a white paper on the nature and scope of bias in the output of AI (not surprisingly, this bias consistently favors white hetero males in every context and every domain):
How to Prevent Discriminatory Outcomes in Machine Learning
March 2018
Global Future Council on Human Rights 2016-2018
ML applications are already being used to make many life-changing decisions – such as who qualifies for a loan, whether someone should be given parole, or what type of care a child should receive from social service programs. These decisions affect human rights, especially of society’s most vulnerable: as framed by the Universal Declaration of Human Rights, a pillar of the international legal system since 1948, “the idea of human rights is as simple as it is powerful: that all people are free and equal, and have a right to be treated with dignity.” Machine learning can be disproportionately harmful in low- and middle-income countries, where existing inequalities are often deeper, training data are less available, and government regulation and oversight are weaker.
Many current ML applications might not seem relevant to human rights, such as the image recognition systems used to tag photos on social media. However, it is easy to conceive of scenarios in which they become so: image recognition systems can, for example, identify a person’s sexual orientation with reasonable accuracy – consider how they might be used by governments in countries where homosexuality is illegal. The potential for bias and discrimination goes well beyond sectors such as lending, insurance, hiring, employment, and education. As Cathy O’Neil says, “Predictive models are, increasingly, the tools we will be relying on to run our institutions, deploy our resources, and manage our lives.” (pg. 6)
Because it is impossible to define in advance when discrimination may happen in any given context, humans need to be kept involved and systems made interpretable for them. (pg. 10)
In some cases, bias is intentionally built into algorithms. For instance, if employers want to avoid hiring women who are likely to become pregnant, they might employ ML systems to identify and filter out this subset of women. In the absence of adequate regulation, the burden lies with the company leadership, designers, data scientists, engineers, and others involved in creating ML systems to build them in ways that predict, prevent, and monitor bias. (pg. 10)
The application of human rights standards to machine learning is a very recent topic of inquiry, and the recommendations in this paper are among the first to be developed and published in this area. We expect that they will be further developed and elaborated by others. These recommendations are meant to function not as a universal manual, but as a useful starting point for companies (from leadership through to development teams), building on any existing mechanisms in their sector. We encourage readers to choose the elements from these recommendations that are relevant to them, and integrate them as best fits their individual needs and context.

This white paper has sought to move non-discrimination as a human rights issue to the center of the discussion about the potential social impacts of machine learning, and to expand the focus of these concerns to include parts of the world that are currently absent from the conversation. In our exploration of this emerging and complex subject, we have sought to identify areas (geographic, industry-specific, technical) where discrimination in machine learning is most likely to impact human rights, evaluate where businesses’ responsibilities lie in addressing algorithmic discrimination, and present the realistic ways forward in overcoming these challenges. (pg. 15)
Of course, we might note in passing that software companies, computer technology companies, and computer science programs in most colleges and universities are notoriously lacking in diversity.
The consequences have been, shall we say, neither surprising, nor really all that mysterious:
White men dominate Silicon Valley not by accident, but by design.
Documentarian Robin Hauser Reynolds, who directed and produced the new documentary CODE: Debugging the Gender Gap, appeared on a panel at the Capital One House at SXSW along with Nathan Ensmenger, an associate professor at Indiana University’s School of Informatics and Computing. Together they discussed the social and cultural history of computing—with a special emphasis on the fact that the field is dominated by white men not by accident, but by design.
The myth of “great men and their machines” perpetuates a reductionist version of the history of computer science, according to Ensmenger. Not only were the world’s first computer programmers women back in the 1940s; women made up roughly 26% of computer science professionals in 1960. Cosmopolitan magazine even ran an article in 1967 urging young women to consider careers as “Computer Girls.” But as the tech field grew in the mid-20th century, companies had to hire thousands of workers to fill computing jobs that had never before existed. Recruiters relied on personality analysis to find the best-suited workers. They assumed that the ideal computer programmer was a focused young man who was more interested in machines than in other people. (emphasis added)
And so the culture and mythic status of ‘tech bros’ was born:
This idea quickly became self-perpetuating. The growing prevalence in popular culture of young white male “hackers” was inversely proportional to the number of women enrolling in computer science college programs in the latter decades of the 20th century, Ensmenger said.
I’m not really a tech person, but I believe the salient expression for bias in the world of computer science is ‘it’s a feature, not a bug’:
A recent study on gender bias in code, which has yet to be peer-reviewed, found that women’s contributions to the open-source code-hosting platform GitHub were more likely to be accepted than men’s as long as their gender was not revealed. When the author’s gender was known, women’s contributions were more likely to be rejected.
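The comparison behind a finding like that is simple arithmetic on acceptance rates. Here is an illustrative two-proportion z-test in Python; the counts are invented to mirror the direction of the finding, not the study's actual numbers:

```python
# Illustrative two-proportion z-test with MADE-UP counts; these are not
# the study's data, only the shape of the comparison it describes.
from statistics import NormalDist

def acceptance_gap(acc_a, total_a, acc_b, total_b):
    """Compare acceptance rate A vs. rate B; return rates, z, p-value."""
    p_a, p_b = acc_a / total_a, acc_b / total_b
    pooled = (acc_a + acc_b) / (total_a + total_b)
    se = (pooled * (1 - pooled) * (1 / total_a + 1 / total_b)) ** 0.5
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a, p_b, z, p_value

hidden = (787, 1000)    # hypothetical: gender-neutral profiles
revealed = (625, 1000)  # hypothetical: identifiably female profiles
p_h, p_r, z, p = acceptance_gap(*hidden, *revealed)
print(f"hidden={p_h:.1%}  revealed={p_r:.1%}  z={z:.1f}  p={p:.1e}")
```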
Which raises the question: an entire industry that is essentially the province of white hetero males, and that thinks of itself as simply doing objective, scientifically sound work independent of (or perhaps floating above) the brute sociological facts of race and gender, produces tools and systems steeped in bias, in fact amplifying the pernicious effects of bias in every facet of our lives... who’d a thunk it?