About artificial intelligence bias


tl;dr:

  • Machine learning looks for patterns in data. But artificial intelligence can be "biased" - that is, it can find the wrong patterns. For example, a photo-based skin cancer detection system might pay special attention to pictures taken in a doctor's office. Machine learning does not understand anything: its algorithms only detect patterns in numbers, and if the data is unrepresentative, so is the result of processing it. Catching such bugs can be tricky because of the very mechanics of machine learning.
  • The most obvious and frightening problem area is human diversity. There are many reasons why data about people can lose objectivity even at the stage of collection. But do not think that this problem concerns only people: exactly the same difficulties arise when trying to find a flood in a warehouse or a failed gas turbine. Some systems may be prejudiced about skin color, others will be prejudiced against Siemens sensors.
  • Such problems are not new to machine learning, and are far from unique to it. Wrong assumptions are made in any complex structure, and understanding why a particular decision was made is always difficult. This needs to be tackled in a comprehensive way: create tools and processes for verification - and educate users so that they do not blindly follow the recommendations of AI. Machine learning does do some things much better than we do, but dogs, for example, are much more effective than humans at detecting drugs, which is not at all a reason to use them as witnesses and pass sentences based on their testimony. And dogs, by the way, are much smarter than any machine learning system.

Machine learning is one of the most important fundamental technology trends today, and one of the main ways technology will change the world around us in the next decade. Some aspects of these changes are troubling: the potential impact of machine learning on the labor market, for example, or its use for unethical purposes (say, by authoritarian regimes). There is another problem, and it is the one this post is about: artificial intelligence bias.

This is not an easy story.

AI from Google learned to find cats. Back in 2012, this was remarkable news.

What is "AI bias"?

"Raw data" is both an oxymoron and a bad idea; data must be prepared well and carefully. —Jeffrey Bocker

Until sometime around 2013, to build a system that could, say, recognize cats in photographs, you had to describe the logical steps yourself. How to find edges in an image, recognize eyes, analyze textures for fur, count paws, and so on. Then you bolted all the components together - and found that the whole thing did not really work. Something like a mechanical horse: theoretically possible, but in practice too complicated to describe. You ended up with hundreds (or even thousands) of hand-written rules, and not a single working model.

With the advent of machine learning, we stopped writing "manual" rules for recognizing an object. Instead, we take a thousand examples of one thing, X, and a thousand examples of another, Y, and have the computer build a model based on a statistical analysis of them. We can then give that model a new sample and it determines, with some degree of accuracy, which of the two sets it fits. Machine learning generates the model from the data, rather than having a person write it. The results are impressive, especially in image and pattern recognition, which is why the entire tech industry is now shifting to machine learning (ML).
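To make that workflow concrete, here is a minimal sketch (assuming Python with NumPy and scikit-learn, and made-up toy data): the model is fitted from labelled examples rather than written as explicit rules.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# A thousand examples of "X" and a thousand of "Y", each described by 20 numbers.
features = rng.normal(size=(2000, 20))
labels = np.concatenate([np.zeros(1000), np.ones(1000)])
# Make the two classes genuinely differ a little along the first feature.
features[labels == 1, 0] += 1.0

train_X, test_X, train_y, test_y = train_test_split(features, labels, random_state=0)

model = LogisticRegression()   # the model is fitted from data, not hand-coded
model.fit(train_X, train_y)
print("accuracy on held-out samples:", model.score(test_X, test_y))
```

The point is only that no rule about "what X looks like" appears anywhere in the code; everything the model knows comes from the statistics of the examples.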

But not everything is so simple. In the real world, your thousand examples of X or Y also contain A, B, J, L, O, R, and so on. These may be unevenly distributed, and some of them may occur so often that the system pays more attention to them than to the things that actually interest you.

What does this mean in practice? My favorite example is image recognition systems that look at a grassy hill and say "sheep". It is clear why: most photographs of sheep are taken in the meadows where they live, and in those images the grass takes up much more space than the little white fluffy creatures, so it is the grass that the system treats as most important.

There are more serious examples. A recent one: a project to detect skin cancer in photographs. It turned out that dermatologists often photograph a ruler alongside the manifestations of skin cancer, to record the size of the lesions. The example photos of healthy skin contain no rulers. For the AI system, those rulers (more precisely, the pixels that we recognize as a "ruler") became one of the differences between the sets of examples, sometimes a more important one than a small rash on the skin. So a system built to recognize skin cancer sometimes recognized rulers instead.

The key point here is that the system has no semantic understanding of what it is looking at. We look at a set of pixels and see sheep, skin, or rulers; for the system it is just a series of numbers. It does not see three-dimensional space, objects, textures, or sheep. It just sees patterns in the data.
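A hedged, synthetic illustration of the effect (not the actual study, and with invented numbers): if a "ruler present" flag merely co-occurs with the positive class, a simple classifier will put most of its weight on the flag and largely ignore a weak genuine signal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
label = rng.integers(0, 2, size=n)                             # 1 = "cancer" in this toy setup

real_signal = 0.3 * label + rng.normal(scale=1.0, size=n)      # weak genuine feature
ruler = ((label == 1) & (rng.random(n) < 0.9)).astype(float)   # rulers mostly photographed with lesions

X = np.column_stack([real_signal, ruler])
model = LogisticRegression().fit(X, label)
print("weight on real signal:", model.coef_[0][0])
print("weight on 'ruler' flag:", model.coef_[0][1])  # typically far larger
```

Nothing in the model "knows" what a ruler is; the flag is simply the easiest pattern that separates the two sets.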

The difficulty in diagnosing such problems is that a neural network (the model generated by your machine learning system) consists of thousands, or hundreds of thousands, of nodes. There is no easy way to look inside the model and see how it makes a decision. If there were such a way, the process would be simple enough to describe all the rules by hand, without machine learning at all. People worry that machine learning has become a black box. (I will explain later why this comparison is overstated.)

This, in a nutshell, is the problem with artificial intelligence or machine learning bias: a system for finding patterns in data can find the wrong patterns, and you might not notice it. This is a fundamental characteristic of technology, and it is obvious to everyone who works with it in academia and in large technology companies. But its consequences are complex, and so are our possible solutions to those consequences.

Let's talk about the consequences first.

AI can implicitly make a choice for us in favor of certain categories of people, based on a large number of imperceptible signals

AI bias scenarios

The most obvious and frightening way this problem can manifest itself is around human diversity. Recently there was a rumor that Amazon tried to build a machine learning system for the initial screening of job applicants. Since there are more men among Amazon employees, examples of "successful hiring" were also more often male, and so there were more men among the resumes the system put forward. Amazon noticed this and never released the system into production.

The most important thing in this example is that the system reportedly favored male candidates even though gender was not listed on the resumes. The system saw other patterns in the examples of "good hires": for instance, women might use particular words to describe accomplishments, or have particular hobbies. Of course, the system did not know what "hockey" was, who "people" were, or what "success" was - it simply ran a statistical analysis of the text. But the patterns it saw would most likely go unnoticed by a person, and some of them (for example, the fact that people of different sexes describe success differently) would probably be hard for us to see even if we were looking straight at them.

It gets worse. A machine learning system that is very good at finding cancer on pale skin may do less well on dark skin, or vice versa. Not necessarily because of bias, but because you probably need to build a separate model for a different skin color, with different characteristics. Machine learning systems are not interchangeable even in a field as narrow as image recognition. You have to tune the system, sometimes just through trial and error, until it is good at spotting the features in the data you care about and reaches the accuracy you want. What you may not notice, however, is that the system is 98% accurate on one group and only 91% accurate (even if still more accurate than human analysis) on the other.
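The check itself is simple once you think to make it: report accuracy per group, not just overall. A minimal sketch, with invented group names and toy predictions deliberately made worse for one group:

```python
import numpy as np

def accuracy_by_group(y_true, y_pred, group):
    # Print accuracy separately for each subgroup instead of one overall number.
    for g in np.unique(group):
        mask = group == g
        acc = np.mean(y_true[mask] == y_pred[mask])
        print(f"group {g}: accuracy {acc:.1%} on {mask.sum()} samples")

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, size=1000)
group = np.where(rng.random(1000) < 0.5, "A", "B")
y_pred = y_true.copy()
flip = (group == "B") & (rng.random(1000) < 0.09)  # ~9% errors for group B, none for A
y_pred[flip] = 1 - y_pred[flip]

accuracy_by_group(y_true, y_pred, group)
```

The overall accuracy of such a model looks fine; only the per-group breakdown shows the gap.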

So far I have mostly used examples involving people and their characteristics, and that is where most of the discussion of this problem is focused. But it is important to understand that bias towards people is only part of the problem. We will be using machine learning for many things, and sampling error will be relevant to all of them. Conversely, even when you are working with people, the bias in the data may have nothing to do with them.

To understand this, let's go back to the skin cancer example and consider three hypothetical possibilities for system failure.

  1. An unrepresentative distribution of people: an unbalanced number of photographs of different skin tones, leading to false positives or false negatives linked to pigmentation.
  2. The training data contains a frequently occurring and unevenly distributed characteristic that has nothing to do with people and no diagnostic value: a ruler in the photographs of skin cancer, or grass in the photographs of sheep. In this case the result is skewed whenever the system finds pixels that the human eye would identify as a "ruler".
  3. The data contains some third characteristic that a person cannot see even when looking for it.

What does this mean? We know a priori that data can represent different groups of people in different ways, so at the very least we can plan to look for such distortions. In other words, there are plenty of social reasons to assume that data about groups of people already contains some bias. If we look at the photo with the ruler, we will see the ruler - we simply ignored it before, knowing that it does not matter and forgetting that the system knows nothing at all.

But what if all your photos of unhealthy skin were taken in an office with incandescent light, and the photos of healthy skin under fluorescent light? What if, after you finished shooting healthy skin and before shooting unhealthy skin, you updated the operating system on your phone, and Apple or Google slightly changed the noise reduction algorithm? A person will not notice this, no matter how hard they look for such features. A machine learning system will see it immediately and use it. It knows nothing.

So far we have talked about spurious correlations, but it may also be that the data is accurate and the results are correct, and you still do not want to use them for ethical, legal, or managerial reasons. In some jurisdictions, for example, women cannot be given a discount on insurance, even though women may be safer drivers. We can easily imagine a system that, analyzing historical data, assigns a lower risk factor to female names. Fine, let's remove the names from the data. But remember the Amazon example: the system can infer gender from other factors (even though it does not know what gender is, or what a car is), and you will not notice this until a regulator retroactively analyzes the rates you have been offering and fines you.

Finally, it is often assumed that we will only use such systems for projects that involve people and social interactions. This is wrong. If you build gas turbines, you will probably want to apply machine learning to the telemetry from the dozens or hundreds of sensors on your product (audio, video, temperature, and any other sensors generate data that can very easily be adapted to train a machine learning model). Hypothetically, you can say: "Here is data from a thousand turbines shortly before they failed, and here is data from a thousand turbines that did not break. Build a model that tells me the difference between them." Now imagine that Siemens sensors are installed on 75% of the bad turbines and only 12% of the good ones (and this has nothing to do with the failures). The system will build a model to find turbines with Siemens sensors. Oops!
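One simple, hedged way to catch this kind of imbalance before training is to cross-tabulate every metadata field against the label and flag anything that is suspiciously predictive. A sketch using the invented numbers from the example above (assuming pandas):

```python
import pandas as pd

# Invented example data: sensor vendor recorded per turbine alongside the outcome.
df = pd.DataFrame({
    "failed": [1] * 1000 + [0] * 1000,
    "sensor_vendor": (["Siemens"] * 750 + ["Other"] * 250      # 75% of failed turbines
                      + ["Siemens"] * 120 + ["Other"] * 880),  # 12% of healthy ones
})

# Share of each vendor within the failed and healthy groups.
print(pd.crosstab(df["failed"], df["sensor_vendor"], normalize="index"))
```

A table like this makes the 75%-versus-12% skew visible at a glance, long before any model is built on it.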

Image — Moritz Hardt, UC Berkeley

Managing AI Bias

What can we do about it? You can approach the issue from three angles:

  1. Methodological rigor in collecting and managing data for system training.
  2. Technical tools for analyzing and diagnosing the behavior of the model.
  3. Training, learning, and caution when implementing machine learning into products.

There is a joke in Molière's play "The Bourgeois Gentleman": a man is told that literature is divided into prose and poetry, and he is delighted to discover that he has been speaking prose all his life without knowing it. This is probably how statisticians feel today: without noticing it, they have devoted their careers to artificial intelligence and sampling error. Looking for sampling error and worrying about it is not a new problem; we just need to approach it systematically. As mentioned above, in some ways this is actually easier to do with data about people. We assume a priori that we may have prejudices about different groups of people, but it is hard for us even to imagine a prejudice about Siemens sensors.

The new thing about all this, of course, is that people don't do statistical analysis directly anymore. It is carried out by machines that create large complex models that are difficult to understand. The issue of transparency is one of the main aspects of the problem of bias. We are afraid that the system is not just biased, but that there is no way to detect its bias, and that machine learning differs in this from other forms of automation, which are supposed to consist of clear logical steps that can be checked.

There are two answers to this. First, we may still be able to carry out some kind of audit of machine learning systems. Second, auditing any other system is not really any easier.

First, one direction of current machine learning research is the search for methods to identify what a machine learning system treats as important. At the same time, machine learning (in its current state) is a completely new and rapidly changing field of science, so do not assume that what is impossible today cannot become quite real soon. The OpenAI project is an interesting example of this.
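As one small illustration of what such inspection can look like today (not the frontier research the paragraph refers to), permutation importance measures how much a model's score drops when a feature is shuffled. It does not "open" the model, but it does show which inputs the model actually relies on, e.g. a ruler-like confound. A sketch with toy data, assuming scikit-learn:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(3)
label = rng.integers(0, 2, size=1000)
X = np.column_stack([
    label + rng.normal(scale=2.0, size=1000),   # weak genuine signal
    label + rng.normal(scale=0.1, size=1000),   # confound that almost encodes the label
    rng.normal(size=1000),                      # pure noise
])

model = RandomForestClassifier(random_state=0).fit(X, label)
result = permutation_importance(model, X, label, n_repeats=10, random_state=0)
for name, imp in zip(["genuine signal", "confound", "noise"], result.importances_mean):
    print(f"{name}: importance {imp:.3f}")
```

Here the confound dominates the importance scores, which is exactly the kind of warning sign you would want such tools to surface.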

Second, the idea that it is possible to audit and understand decision-making in existing systems or organizations is good in theory and so-so in practice. Understanding how decisions are made in a large organization is not easy at all. Even where there is a formal decision-making process, it does not reflect how people actually interact, and the people themselves often have no logical, systematic approach to their own decisions. As my colleague Vijay Pande put it, people are black boxes too.

Take a thousand people across several overlapping companies and institutions, and the problem becomes even harder. We know after the fact that the Space Shuttle was doomed to break apart on reentry, and individuals inside NASA had information that made them think something bad might happen, but the system as a whole did not know it. NASA had even just been through exactly this kind of audit after losing the previous shuttle, and yet it lost another one, for a very similar reason. It is easy to claim that organizations and people follow clear, logical rules that can be tested, understood, and changed, but experience proves otherwise. This is the "Gosplan fallacy": the delusion of the State Planning Committee.

I often compare machine learning to databases, and especially relational databases: a new fundamental technology that changed what computer science could do, changed the world around it, and became part of everything we use without our noticing it. Databases have problems too, and they are of a similar nature: a system can be built on wrong assumptions or bad data, it is hard to notice, and the people using the system will do whatever it tells them without asking questions. There are plenty of old jokes about a tax office that misspelled your name, where convincing them to fix the mistake is much harder than actually changing your name. There are many ways to think about this, and it is not clear which is right: is it a technical problem in the SQL, a bug in an Oracle release, or a failure of bureaucratic institutions? How hard is it to find the bug in the process that left the system with no way to fix a typo? Could it have been spotted before people started complaining?

An even simpler illustration of this problem is the stories of drivers who follow outdated data in their navigation system and drive into a river. Fine, maps should be kept up to date. But how much is TomTom to blame when your car ends up in the water?

I say all this because, yes, machine learning bias will create problems. But these problems will be similar to those we have faced in the past, and they will be noticed and solved (or not) about as well as we have managed before. Therefore, the scenario in which AI bias does real harm is unlikely to happen to leading researchers working at a large organization. Most likely, some minor technology contractor or software vendor will slap something together out of open-source components, libraries, and tools they do not understand. An unlucky customer will buy into the phrase "artificial intelligence" in the product description and, without asking too many questions, hand it to their low-paid workers with instructions to do whatever the AI says. This is exactly what happened with databases. It is not an artificial intelligence problem, and not even a software problem. It is the human factor.

Conclusion

Machine learning can do anything you can teach a dog, but you can never be sure exactly what you have taught that dog.

It often seems to me that the term "artificial intelligence" only gets in the way of conversations like this. It creates the false impression that we have actually created it - intelligence. That we are on the way to HAL 9000 or Skynet, to something that actually understands. But no. These are just machines, and it is far more accurate to compare them with, say, a washing machine. A washing machine is much better at doing laundry than a human, but if you put dishes in it instead of laundry, it will... wash them. The dishes will even come out clean. But the result will not be what you expected, and not because the machine has some kind of prejudice about dishes. The washing machine does not know what dishes are or what clothes are: it is just a piece of automation, conceptually no different from the way processes have been automated before.

Whether it's cars, aircraft, or databases, these systems will be both very powerful and very limited. They will depend entirely on how people use these systems, whether their intentions are good or bad, and how much they understand how they work.

Therefore, to say that "artificial intelligence is mathematics, so it cannot be biased" is completely wrong. But it is equally wrong to say that machine learning is "subjective by nature." Machine learning finds patterns in data; which patterns it finds depends on the data, and the data depends on us, as does what we do with the results. Machine learning does do some things much better than we do, but dogs, for example, are far more effective than humans at detecting drugs, and that is not a reason to use them as witnesses and pass sentences based on their testimony. And dogs, by the way, are much smarter than any machine learning system.

Translation: Diana Letskaya.
Editing: Alexey Ivanov.
Community: @PonchikNews.

Source: habr.com
