Mail & Guardian

Data rules the world

TW Kambule-NSTF Award: Research and its outputs over a period of up to 15 years

- Kerry Haggard

One of the first things many of us do when we have to make a decision is gather information to guide our choice. But what happens if you don’t have enough information at hand to be confident that you’re making a good decision? You just figure it out. That is easy for people like you and me, but not for a bank that has been asked for a loan by someone with no credit record.

That’s where Dr Bhekisipho Twala comes in. He is director of the Institute for Intelligent Systems and Professor of Artificial Intelligence and Statistical Sciences in the Faculty of Engineering and the Built Environment at the University of Johannesburg.

Inspired by his high school maths teacher, Eamon Molloy, Twala fell in love with the study of statistics. This led to his undergraduate degree in economics and statistics.

His first job was as a transport statistician, which is where he first encountered problems with data quality while working on transport modelling and validation. Never one to leave a problem unsolved, he realised that his only option was to do a master’s degree in computational statistics, which he followed with a PhD in machine learning and statistical science at the Open University in Milton Keynes in the United Kingdom.

Building on diverse expertise

His work over the years has built on diverse expertise in making decisions with incomplete information, in using artificial intelligence (AI) techniques to predict outcomes, and in classification techniques. He has applied it in fields such as banking and finance, insurance, biomedicine, robotics, psychology, software engineering and, more recently, electrical and electronic engineering.

“As we continue into the 21st century, we are at the dawn of the Information Age,” Twala says. “Data and information are now as vital to an organisation’s wellbeing and future success as oxygen is to humans. Without a fresh supply of clean, unpolluted data, companies will struggle to survive.”

AI framework

He says that most of the problems that academia and industry deal with can be usefully cast in the framework of AI, the discipline that studies the design of agents that exhibit intelligent behaviour.

Since high-quality data is critical to success in the Information Age, he has developed strategies for dealing with incomplete data in classification and prediction tasks. He applies AI and machine learning techniques to handle uncertain knowledge in a range of fields.

The proposed methods involve a series of learning experiments that estimate the limits on performance imposed by the quality of the database on which a task is defined.

Importance of data quality

Twala and his colleagues have two research goals. First, they seek to demonstrate that data quality is an important component of machine learning tools and should be carefully considered when developing and using them.

They believe that while the importance of data quality is now understood in the business community, where researchers have equated quality decision making to earnings, this realisation has not yet occurred in the engineering and science communities.

They therefore embarked on research into the effects of data quality on machine learning algorithms, in an effort to demonstrate that data quality is a large factor in the algorithms’ outcomes and should be afforded more respect.

Second, they developed and tested some preliminary methods that incorporate data quality assessments, creating more robust and useful algorithms.

Decision trees

Twala’s research merges two communities within computer science: data quality and machine learning, specifically the field of decision trees. A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences.
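To make the idea of a decision tree concrete, here is a minimal sketch in Python built around the bank loan example from the start of this article. It is an illustration only and not Twala’s method: the attribute names (`monthly_income`, `years_of_credit_history`), the thresholds and the `if_missing` fallback branch are all invented for this example.

```python
# Illustrative sketch only: attribute names, thresholds and the
# "if_missing" fallback are assumptions made for this example.
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    decision: str  # the outcome reached at the end of a path


@dataclass
class Node:
    attribute: str   # field of the applicant record tested at this node
    threshold: float  # go left if value < threshold, otherwise right
    left: "Tree"
    right: "Tree"
    if_missing: str  # "left" or "right": branch taken when the value is absent


Tree = Union[Leaf, Node]


def classify(tree: Tree, record: dict) -> str:
    """Walk from the root to a leaf and return that leaf's decision."""
    while isinstance(tree, Node):
        value = record.get(tree.attribute)
        if value is None:
            branch = tree.if_missing  # no data: follow the fallback branch
        else:
            branch = "left" if value < tree.threshold else "right"
        tree = tree.left if branch == "left" else tree.right
    return tree.decision


LOAN_TREE = Node(
    attribute="monthly_income",
    threshold=5000.0,
    left=Leaf("decline"),
    right=Node(
        attribute="years_of_credit_history",
        threshold=2.0,
        left=Leaf("refer to an officer"),
        right=Leaf("approve"),
        if_missing="left",
    ),
    if_missing="left",
)

print(classify(LOAN_TREE, {"monthly_income": 8000, "years_of_credit_history": 5}))
# → approve
print(classify(LOAN_TREE, {"monthly_income": 8000}))
# → refer to an officer
```

The second call shows why missing data matters: an applicant with a good income but no credit record cannot be judged on history, so this sketch sends the case to a person instead of guessing.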

“Most research in these fields begins with the assumption that the data feeding the algorithms is of high quality: accurate, complete and timely. Researchers that do take data quality into account normally focus on the aspect of missing data. We start by presenting and elaborating on the theory of missing data, and use a variety of models to arrive at a collaborative prediction.”
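One simple way to picture a “collaborative prediction” is a majority vote among several models, each of which abstains when the data it needs is missing. The Python sketch below assumes exactly that; it is not a description of the methods Twala’s group uses, and the three rules, their thresholds and the field names are hypothetical.

```python
# Illustrative sketch only: the rules, thresholds and field names are
# assumptions, and majority voting is one of many ways to combine models.
from collections import Counter


def income_rule(record):
    income = record.get("monthly_income")
    if income is None:
        return None  # abstain: the value this rule needs is missing
    return "approve" if income >= 5000 else "decline"


def history_rule(record):
    years = record.get("years_of_credit_history")
    if years is None:
        return None
    return "approve" if years >= 2 else "decline"


def debt_rule(record):
    ratio = record.get("debt_to_income")
    if ratio is None:
        return None
    return "approve" if ratio <= 0.4 else "decline"


MODELS = [income_rule, history_rule, debt_rule]


def collaborative_prediction(record, default="refer"):
    """Majority vote over the models that can answer; ties and silence go to default."""
    votes = [v for v in (model(record) for model in MODELS) if v is not None]
    if not votes:
        return default
    (top, top_count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == top_count:
        return default  # the models disagree evenly
    return top


print(collaborative_prediction({"monthly_income": 8000, "debt_to_income": 0.2}))
# → approve
print(collaborative_prediction({}))
# → refer
```

An applicant with no credit record still gets a decision here, because the two rules that have data outvote the one that has to stay silent.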

Data sets are used for research into prediction, as well as in many other areas, to better understand software development phenomena. Results from such research will feed back into industrial practice, further benefiting the software development industry.

Discoveries in education

One of his recent areas of study centres on developing methods for making discoveries from data gathered in education settings, and using those methods to better understand students and their learning environments.

Some of the work has helped identify factors affecting students’ academic performance. Findings have revealed that age, father/guardian’s socioeconomic status and daily study hours significantly contribute to the academic performance of graduate students.

Another area is estimating teaching effectiveness at high school level using data mining methods. It is anticipated that the findings of these studies will give curriculum developers new insights into emerging issues in performance, and influence policy formulation in the Department of Basic Education.

“I love that my work lets me play in everyone else’s backyard,” says Twala. “The interdisciplinary nature of what we do means that we work with leading minds in philosophy, neuroscience, architecture and law, to name but a few.”

He says he also loves that his work is relevant to South Africa, and that it can help solve problems from traffic management to health, and from insurance to software development.

Photo: Supplied. Dr Bhekisipho Twala, University of Johannesburg.
