Rotman Management Magazine

Machine Learning in Business: Issues for Society

From data privacy to bias to ethics, teaching machines to behave intelligently raises a number of difficult issues for society.

- By John C. Hull


AT ITS CORE, machine learning is concerned with using large data sets to learn the relationships between variables, make predictions and interact with a changing environment. And it is becoming an increasingly important tool in business — so much so that almost all employees are likely to be impacted by it in one way or another over the next few years.

Large data sets on variables describing consumer purchases, stock price movements and many other aspects of a business are not new. What is new is that advances in computer processing speeds and reductions in data storage costs allow us to reach conclusions from large data sets in ways that were simply not possible 20 or 30 years ago.

Machine learning, also referred to as data science, can be viewed as the new world of statistics. Traditionally, statistics has been concerned with such topics as probability distributions, confidence intervals, significance tests and linear regression.

Knowledge of these topics remains important, but we are now able to learn from large data sets in new ways. For example:

• We can develop non-linear models for forecasting and improved decision making;

• We can search for patterns in data to improve a company’s understanding of its customers and the environment in which it operates; and

• We can develop decision rules where we are interacting with a changing environment.

These applications of machine learning are now possible because of increases in computer processing speeds and reductions in data storage costs. And as a result, data science may well prove to be the most rewarding and exciting profession of the 21st century.

My latest book, Machine Learning in Business: An Introduction to the World of Data Science, explains the most popular algorithms used by data scientists. The objective is to enable readers to interact productively with data scientists and understand how data science can be used in a variety of business situations.

In this excerpt from the book, I will present some of the key issues posed to society by AI, which should be on the radar of leaders everywhere. But first, a brief history of our long-standing relationship with machines.

Human vs. Machine: A Brief History

Human progress has been marked by four industrial revolutions:

1. Steam and water power (1760-1840)

2. Electricity and mass production (1840-1920)

3. Computers and digital technology (1950-2000)

4. Artificial intelligence (2000-present)

There can be no doubt that the first three revolutions have brought huge benefits to society. The benefits were not always realized immediately, but they have eventually produced big improvements in our quality of life. At various times there were concerns that jobs traditionally carried out by humans would be moved to machines and that unemployment would result. This did not happen. Some jobs were lost during the first three industrial revolutions, but others were created.

For example, the first industrial revolution led to people leaving rural lifestyles to work in factories; the second changed the nature of the work done in factories with the introduction of assembly lines; and the third has led to more jobs involving the use of computers. The impact of the fourth industrial revolution remains to be seen.

It is worth noting that the third industrial revolution did not require all employees to become computer programmers. But it did require people in many jobs to learn how to use computers and work with software such as Word and Excel. We can expect the fourth industrial revolution to be similar in that many individuals will have to learn new skills related to the use of artificial intelligence.

We are now reaching the stage where machine learning algorithms can make many routine decisions as well as, if not better than, human beings. But the key word here is ‘routine’, because the nature of the decision and the environment must be similar to that in the past. If the decision is non-standard or the environment has changed so that past data is no longer relevant, we cannot expect a machine learning algorithm to make good decisions.

Driverless cars provide an example here. If we changed the rules of the road — perhaps regarding how cars can make right or left turns — it would be very dangerous to rely on a driverless car that had been trained using the old rules.

Going forward, a key task for human beings is likely to be managing large data sets and monitoring machine learning algorithms to ensure that decisions are not made on the basis of inappropriate data. Just as the third industrial revolution did not require everyone to become a computer programmer, the fourth will not require everyone to become a data scientist. However, for many jobs it will be important to understand the language of data science and what data scientists do. Today, many jobs involve using programs developed by others for carrying out various tasks. In the future, they may involve monitoring the operation of machine learning algorithms that have been developed by others.

The fact is, for some time to come, a human plus a trained machine is likely to be more effective than a human or a machine on its own. I will now look at some of the key issues this raises for society — and for organizational leaders.

Issues for Society

Computers have been used to automate business tasks such as record keeping and sending out invoices for many years, and for the most part, society has benefited from this. But it is important to recognize that AI innovations involve more than just the automation of tasks: They actually allow machines to learn. Their aim is to allow machines to make decisions and interact with the environment similarly to the way humans do. Indeed, in many cases, the goal is to train machines so that they improve on the way human beings carry out certain tasks.

Most readers are familiar with the success of Google’s AlphaGo in beating the world champion Go player, Ke Jie. For those who aren’t familiar with it, Go is a very complex game. It has too many moves for a computer to calculate all the possibilities, so AlphaGo uses a deep learning strategy to approximate the way the best human players think about their moves, and then improve on it. The key point is that AlphaGo’s programmers did not teach AlphaGo ‘how to play Go’: They taught it ‘to learn how to play Go’.

Teaching machines to use data to learn and behave intelligently raises a number of difficult issues for society. Following are five particular issues that leaders should familiarize themselves with.

DATA PRIVACY. Issues associated with data privacy received a great deal of publicity as a result of the Cambridge Analytica saga. This company worked for both Donald Trump’s 2016 presidential campaign and for an organization campaigning for the UK to leave the European Union. It managed to acquire and use personal data on millions of Facebook users without obtaining permission from them. The data was detailed enough for the company to create profiles and determine what kind of advertisements or other actions would be most effective in promoting the interests of the organizations that had hired it.

Many governments are concerned about data privacy issues. The European Union has been particularly proactive and passed the General Data Protection Regulation (GDPR), which came into force in May 2018. It recognizes that data is valuable and includes in its requirements the following:

• A person must provide consent before a company can use the person’s data for anything other than the purpose for which it was collected.

• If there is a data breach, notifications to everyone affected are mandatory within 72 hours.

• Data must be safely handled across borders.

• Companies must appoint a data protection officer.

Fines for non-compliance with GDPR can be as high as 20 million euros or four per cent of a company’s global revenue, whichever is greater. It is likely that other governments will pass similar legislation in the future. Interestingly, it is not just governments that are voicing concerns about the need to regulate the way data is used by companies. Mark Zuckerberg, Facebook’s CEO, agrees that rules are needed to govern the Internet and has expressed support for GDPR.

BIASES. By now, we all know that human beings exhibit biases. Some lead to risk-averse behaviour; others to risk seeking; some make us care about people; others lead us to be insensitive. It might be thought that one advantage of machines is that they take logical decisions and are not subject to biases at all. Unfortunately, this is not the case. Machine learning algorithms exhibit many biases. One of the main ones to pay attention to concerns the data that has been collected: It might not be representative.

A classic example here is an attempt by the Literary Digest to predict the result of the U.S. presidential election in 1936. The magazine polled 10 million people (a huge sample) and received 2.4 million responses. It predicted that Landon (a Republican) would beat Roosevelt (a Democrat) by 57.1 to 42.9 per cent. In fact, Roosevelt won. What went wrong? The answer is that they used a biased sample consisting of Digest readers, telephone users and those with car registrations. It turned out that, taken together, these were predominantly Republican supporters. More recently, we can point to examples where facial recognition software was trained largely on images of white people and therefore did not recognize other races properly, resulting in misidentifications by police forces using the software.

There is a natural tendency for machine learning projects to use readily available data and to be biased in favour of existing practices. The data available for making lending decisions in the future is likely to be the data on loans that were actually made in the past. It would be nice to know how the loans that were not made in the past would have worked out, but this data, by its nature, is not available. Amazon experienced a similar bias when developing recruiting software. Its existing recruits were predominantly male and this led to the software being biased against women.

As a result, choosing the features that will be considered in a machine learning exercise is a critical task. In most cases, it is clearly unacceptable to use features such as race, gender or religious affiliation. But data scientists also have to be careful not to include other features that are highly correlated with these sensitive features. For example, if a particular neighbourhood has a high proportion of black residents, using ‘neighbourhood of residence’ as a feature when developing an algorithm for loan decisions may lead to racial biases.
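The book does not prescribe a procedure for catching such proxies, but a crude screening step can be sketched in a few lines of Python. The example below uses pandas with an invented miniature loan-application table and illustrative column names; the idea is simply to check, before a candidate feature is admitted to a model, how strongly it lines up with a sensitive attribute.

```python
import pandas as pd

# Invented miniature loan-application table; column names are illustrative only.
applications = pd.DataFrame({
    "neighbourhood": ["A", "A", "B", "B", "C", "C", "A", "B"],
    "income":        [40, 55, 38, 62, 75, 58, 41, 60],
    "credit_score":  [620, 700, 610, 720, 760, 690, 640, 710],
    "race":          ["black", "black", "white", "white",
                      "white", "white", "black", "white"],  # sensitive attribute
})

# Crude proxy check: cross-tabulate each candidate feature against the sensitive
# attribute and see whether any level of the feature is dominated by one group.
# A value close to 1.0 flags the feature as a potential proxy.
for feature in ["neighbourhood", "income", "credit_score"]:
    values = applications[feature]
    if values.dtype != object:
        values = pd.qcut(values, q=2, duplicates="drop")  # bucket numeric features
    table = pd.crosstab(values, applications["race"], normalize="index")
    print(feature, "max within-group share:", round(table.max(axis=1).max(), 2))
```

In this toy table, ‘neighbourhood’ comes back as a near-perfect proxy for race even though race itself is never used as a feature, which is exactly the situation described above.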

There are many other ways in which an analyst can (consciously or unconsciously) exhibit biases when developing a machine learning algorithm. For example, the way in which data is cleaned, the choice of models, and the way the results from an algorithm are interpreted and used can all be subject to biases.

ETHICS. Machine learning raises numerous ethical considerations. Many people feel that China has gone too far with its Social Credit System, which is intended to standardize the way citizens are assessed. An individual’s ‘social score’ moves up and down depending on his or her behaviour. Bad driving, smoking in non-smoking areas and buying too many video games are examples of activities that will lower one’s score. The score can affect the schools your children attend, whether you can travel abroad and employment prospects.

Should machine learning be used in warfare? It is perhaps inevitable that it will be. After thousands of Google employees signed an open letter condemning the project, Google cancelled Project Maven, which was a collaboration with the U.S. Department of Defense to improve drone strike targeting. However, the U.S. and other nations continue to research how AI can be used for military purposes.

Can machine learning algorithms be programmed to behave in a morally responsible and ethical way? One idea here is to create a new machine learning algorithm and provide it with a large amount of data labelled as ‘ethical’ or ‘unethical’ so that it learns to identify unethical data. When new data arrives for a particular project, the algorithm could be used to decide whether or not it is ethically appropriate to use the data. The thinking here is that if a human being can learn ethical behaviour, so can a machine. Indeed, some have argued that machines can learn to be more ethical than humans.
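The passage above describes the idea only in outline. A minimal sketch, assuming the ‘data’ takes the form of short text descriptions of proposed data uses and using a scikit-learn text classifier trained on a few invented, hand-labelled examples, might look like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented, hand-labelled descriptions of proposed uses of data.
descriptions = [
    "use anonymized purchase history to recommend products",
    "use aggregated sensor data to schedule equipment maintenance",
    "buy personal profiles scraped from social media without consent",
    "infer health conditions from browsing data and sell them to insurers",
]
labels = ["ethical", "ethical", "unethical", "unethical"]

# Train a simple text classifier on the labelled examples.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(descriptions, labels)

# Screen a new proposal before the data is used in a project.
proposal = "use location history bought from an app vendor without user consent"
print(model.predict([proposal])[0])   # returns 'ethical' or 'unethical'
```

A real system would of course need far more labelled examples and careful review of its mistakes, but the structure, learning from labelled judgments and then screening new cases, is the one described in the text.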

An interesting ethical dilemma arises in connection with driverless cars. If an accident is unavoidable, what decision should be taken? How should an algorithm choose between killing a senior citizen and a younger person? How should it choose between killing a jaywalker and someone who is obeying the rules? How should it choose between hitting a cyclist wearing a helmet and one who is not?

The interaction of human beings with machine learning technologies can sometimes lead to unexpected results, with inappropriate and unethical behaviour being learned. In March 2016, Microsoft released Tay (short for ‘thinking about you’), which was designed to learn by interacting with human beings on Twitter so that it would mimic the language patterns of a 19-year-old American girl. Some Twitter users began tweeting politically incorrect phrases. Tay learned from these, and as a result sent racist and sexually charged messages to other Twitter users. Microsoft shut down the service just 16 hours after it was released.


TRANSPARENCY. When a bank uses a decision tree machine learning algorithm to make loan decisions, it is fairly easy to see why a loan was accepted or rejected. However, most machine learning algorithms are ‘black boxes’ in the sense that the reasons for the output are not immediately apparent.

This can create problems. An applicant who is refused a loan might, not unreasonably, ask why the decision was made. An answer along the lines of ‘The algorithm has rejected you. I have no further information’ is likely to prove unsatisfactory. The General Data Protection Regulation mentioned earlier includes a ‘right to explanation’ with regard to machine learning algorithms applied to the data of citizens of the European Union. Specifically, individuals have the right to “meaningful information about the logic involved in, as well as the significance and the envisaged consequences of, such processing for the data subject.”

When making predictions, it is important to develop ways of making the results of machine learning algorithms accessible to those who are affected by the results. One way of assessing the importance of a particular feature (e.g., a credit score in a loan application) is to make a change to the feature and see what effect it has on the target (the probability of default in the case of a loan application). The change made can reflect the dispersion of feature values in the data on which the machine learning algorithm has been trained.

Using this approach, it is possible to provide an explanation that assigns a certain percentage to each of the features used. For example, a loan applicant might be told: ‘40 per cent of the decision to reject your application was based on your credit score, 25 per cent on your income, 20 per cent on your debt-to-income ratio and 15 per cent on other factors.’
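No particular implementation is prescribed here; the sketch below shows one way the perturbation idea could be realized, using a toy logistic regression default model built with scikit-learn on invented data. Each feature of a rejected application is shifted by one standard deviation of its training values, the resulting change in the predicted probability of default is recorded, and the changes are rescaled so the explanation sums to 100 per cent.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy training data: credit score, income and debt-to-income ratio (all invented).
X = np.column_stack([
    rng.normal(650, 60, 500),     # credit score
    rng.normal(60, 20, 500),      # income ($000s)
    rng.normal(0.35, 0.10, 500),  # debt-to-income ratio
])
# In this toy world, default is more likely with a low score, low income, high ratio.
logit = -0.01 * (X[:, 0] - 650) - 0.03 * (X[:, 1] - 60) + 6.0 * (X[:, 2] - 0.35)
y = (logit + rng.normal(0, 1, 500) > 0).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Explain one rejected application by perturbing each feature in turn.
applicant = np.array([[600.0, 45.0, 0.50]])
base_prob = model.predict_proba(applicant)[0, 1]   # predicted probability of default
impacts = []
for j in range(X.shape[1]):
    shifted = applicant.copy()
    shifted[0, j] += X[:, j].std()                 # move feature j by one std dev
    impacts.append(abs(model.predict_proba(shifted)[0, 1] - base_prob))

# Rescale the impacts so the explanation sums to 100 per cent.
shares = 100 * np.array(impacts) / sum(impacts)
for name, share in zip(["credit score", "income", "debt-to-income ratio"], shares):
    print(f"{name}: {share:.0f}% of the decision")
```

The percentages produced this way are only as meaningful as the perturbations chosen, but they give the applicant something far more useful than ‘the algorithm rejected you’.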

It is also important for companies to understand the algorithms they use so they can be confident that decisions are being made in a sensible way. There is always a risk that algorithms appear to be making intelligent decisions when they are actually taking advantage of obscure correlations. An example here is the story of a German horse named Hans, who in the early 20th century appeared to be intelligent and able to solve mathematical problems. For example, he could add, subtract, multiply, divide and answer questions such as: ‘If the ninth day of the month is a Wednesday, what day of the month is the following Friday?’ Hans indicated answers by stomping his hoof a number of times and received a reward when the answer was correct.

It turned out that the horse was really good at reading the expressions on the face of the person asking the questions, and as a result, knew when to stop stomping. He did not actually have any mathematical intelligence. In short, there was a correlation between the correct answer and the expressions on the questioner’s face as the horse stomped its foot.

Similarly, there are stories of image recognition software that can distinguish between polar bears and dogs but is actually just responding to the background (ice or grass/trees), not to the images of the animals themselves. If we are to trust an algorithm to make important decisions for an organization, it is clearly important that we understand exactly how it is making those decisions.

ADVERSARIAL MACHINE LEARNING. Adversarial machine learning refers to the possibility of a machine learning algorithm being attacked with data designed specifically to fool it. Arguably it is easier to fool a machine than a human being. A simple example of this is an individual who understands how a spam filter works and designs an email to get past it.
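To make the spam-filter example concrete, the sketch below trains a toy filter (a Naive Bayes classifier built with scikit-learn on a few invented messages) and then shows how an attacker who knows which words the filter treats as legitimate can pad a spam message with them to tilt the prediction. The messages and model are illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# A toy spam filter trained on a handful of invented messages.
messages = [
    "win a free prize now, click here",
    "cheap pills, limited offer, click now",
    "meeting agenda attached for tomorrow",
    "please review the quarterly budget report",
]
labels = ["spam", "spam", "ham", "ham"]

spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, labels)

# The attack message on its own is caught...
attack = "win a free prize, click here"
print(spam_filter.predict([attack])[0])      # expected: 'spam'

# ...but padding it with words the filter associates with legitimate mail
# tilts the word counts and can flip the prediction.
padded = attack + " meeting agenda budget report review quarterly"
print(spam_filter.predict([padded])[0])      # may now come back as 'ham'
```

Real spam filters are harder to game than this toy, but the principle, crafting an input specifically to exploit what the model has learned, is exactly what adversarial machine learning describes.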

‘Spoofing’ in algorithmic trading is a form of adversarial machine learning. A spoofer attempts to (illegally) manipulate the market by feeding it with buy or sell orders and cancelling them before execution. A serious example of adversarial machine learning could be a malevolent individual who targets driverless cars, placing a sign beside the road that will confuse the car’s algorithm and cause accidents.

One approach to this problem is to generate examples of adversarial machine learning attempts and train the machine not to be fooled by them. However, it seems likely that humans will have to monitor machine learning algorithms for some time to come to ensure that the algorithms are not being fooled or manipulated. The dangers of adversarial machine learning reinforce the point that machine learning algorithms should not be black boxes without any interpretation: Transparency and interpretability of the output are extremely important.

In closing

We should not underestimate future advances in machine learning. Eventually, machines will very likely be smarter than human beings in almost every respect. As a result, a continuing challenge for the human race will be to address the issues discussed herein and figure out how to partner with machines in ways that benefit rather than damage humankind.

John C. Hull is University Professor, Maple Financial Group Chair in Derivatives and Risk Management and Academic Director of FinHub, the financial innovation lab at the Rotman School of Management. His latest book is Machine Learning in Business: An Introduction to the World of Data Science (2019). He is the author of three best-selling books in the derivatives and risk management area. Rotman faculty research is ranked in the top 10 worldwide by the Financial Times.
