Public Sector Manager

Trailblazer

Senior data scientist Nyalleng Moorosi knows how to follow the patterns

Imagine being able to build a computer system that can give clear patterns of human behaviour.

At just 34, Nyalleng Moorosi is part of a team that develops such patterns using raw data collected during research.

She is a senior data scientist at the Council for Scientific and Industrial Research (CSIR).

Data science is a skill set that includes machine learning, mathematical modelling and computer programming.

One of the patterns Moorosi is modelling is which proteins change, and how, in sick versus healthy human cells.

“My team and I are working on understanding the expression patterns of proteins and how they respond to diseases, specifically looking at pancreatic cancer in black people.”

She is part of a team of other academics such as biologists who have spent the last year working with specific hospitals collecting samples for analysis from black people who have pancreatic cancer.

“We find out which proteins are present in the sample and by how much, then we do statistical analysis to determine the significance or the impact of this protein in the body,” she explained.

“These changes in protein signal how the body responds to a specific disease. This is important because it helps us determine where we may apply the therapy,” said Moorosi.
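As a rough illustration of the kind of comparison Moorosi describes, a minimal Python sketch might measure how much one protein's abundance differs between sick and healthy samples. The readings, the function names and the choice of a log2 fold change with Welch's t-statistic are all assumptions made for illustration, not the CSIR team's actual method.

```python
import math
from statistics import mean, variance


def log2_fold_change(sick, healthy):
    """Log2 ratio of mean abundance in sick versus healthy samples."""
    return math.log2(mean(sick) / mean(healthy))


def welch_t(sick, healthy):
    """Welch's t-statistic for two samples with unequal variances."""
    standard_error = math.sqrt(
        variance(sick) / len(sick) + variance(healthy) / len(healthy)
    )
    return (mean(sick) - mean(healthy)) / standard_error


# Hypothetical abundance readings for one protein (arbitrary units).
sick_samples = [8.0, 9.0, 10.0, 9.0]
healthy_samples = [4.0, 5.0, 4.0, 5.0]

fold_change = log2_fold_change(sick_samples, healthy_samples)
t_statistic = welch_t(sick_samples, healthy_samples)
print(f"log2 fold change: {fold_change:.2f}, Welch t: {t_statistic:.2f}")
```

A large fold change with a large t-statistic would flag the protein as worth a closer look. A real study would repeat this across thousands of proteins and correct for multiple comparisons.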

Collecting data

In a study such as this, one can expect to collect a terabyte of data. Researchers spent the whole of last year collecting the data.

Health data is protected by law and there are many hurdles to overcome before obtaining it, but by developing relationships with local hospitals, Moorosi and her team are making progress.

Moorosi's role is to write the algorithms that will be used to analyse the data and highlight the differences between samples and thus hypothesise the sources and effects of the disease.

Algorithms are code for computers – in this case the code helps develop a clear plan of how to solve problems.

This three-year study began in 2017.

“[If] you want to use machine learning or a data scientist, it is usually because you have a lot of data and the data is very complex; otherwise you can do the processes manually,” she said.

Moorosi added that machine learning allows the research world to automate and model data, which is the beauty of computers: they can deal with massive amounts of information.

A fascination with patterns

It was Moorosi's fascination with patterns and curiosity about human behaviour that made her fall in love with data science.

“I love patterns. I also wanted to understand why people do the things they do. I wanted to understand why things move the way they do and why they flow in a certain direction,” she added.

By determining specific patterns and understanding the reasons for these patterns, predictions can also be made, explained Moorosi.

“For example, we can determine who is most likely to shop at a particular supermarket at a specific hour and what the reasons are for them choosing that specific time.”

By understanding these trends, the supermarket can use them to attract more customers.

Keeping up with trends

In 2016 Moorosi and her team were approached by the SABC to build a system that would give a clear picture of what people were saying about the Municipal Elections on social media.

She was the project leader of this study, which she sees as the highlight of her career.

The SABC wanted to know what people were saying on the ground. They wanted a way to listen to social media and get to know the general trends.

“The SABC wanted to know what were the issues that people had that were leading the discussions on social media. They wanted a computerised agent that would constantly be gathering tweets and Facebook posts and a way to quantify it.”

The broadcasting agency also wanted an indication of the positive and negative sentiments gathered on social media and what people were saying about different political parties.

“This was extremely hard because there are 11 official languages. We had to dissect the different messages and, for example, determine if a particular Sesotho word was positive or a particular Xitsonga word negative. We needed to go through all the relevant dictionaries and other documents.”

She explained that the system was built to collect tweets and classify them according to specific political parties or topics.
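To give a sense of how such a classifier might work, here is a minimal Python sketch of word-list sentiment scoring combined with keyword-based topic tagging. Everything in it is a hypothetical stand-in: the English word lists, the placeholder party keywords and the function names are assumptions for illustration, and the article does not describe how the CSIR system was actually implemented.

```python
from collections import Counter

# Hypothetical sentiment lexicon; a real system would need entries
# for each of the official languages.
POSITIVE_WORDS = {"good", "hope", "proud"}
NEGATIVE_WORDS = {"bad", "angry", "broken"}

# Hypothetical keywords that map a message to a party or topic.
TOPIC_KEYWORDS = {"party_a": {"partya"}, "party_b": {"partyb"}}


def tokenize(message):
    """Lowercase a message and strip basic punctuation from each word."""
    return [word.strip(".,!?#@\"'").lower() for word in message.split()]


def sentiment(message):
    """Label a message by counting positive and negative lexicon hits."""
    words = tokenize(message)
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def topics(message):
    """Return the sorted topics whose keywords appear in the message."""
    words = set(tokenize(message))
    return sorted(t for t, keys in TOPIC_KEYWORDS.items() if words & keys)


def tally(messages):
    """Count (topic, sentiment) pairs across a batch of messages."""
    counts = Counter()
    for message in messages:
        label = sentiment(message)
        for topic in topics(message):
            counts[(topic, label)] += 1
    return counts


sample = [
    "Proud of #PartyA today!",
    "PartyB left the roads broken.",
    "Good turnout, PartyA.",
]
result = tally(sample)
print(result)
```

The word-list approach mirrors the dictionary work Moorosi mentions: each language needs its own list of positive and negative words before messages in that language can be scored.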

One of the lessons that came out of the project was that social media was not being used across the entire country, as the tweets came mainly from people in metropolitan areas.

She said if you wanted to know what Gauteng, Cape Town, Nelson Mandela Bay and Durban had to say then Twitter would be a good place to start.

“It was very interesting because we plotted a map of where the tweets were coming from and found that Gauteng and Cape Town were red hot.”

The system had to run 24/7 throughout the voting process for about four days. There was a time when it was collecting over one million tweets an hour.

“This was the first system that I was leading which was also live on television. It was really exciting,” recalled Moorosi.

African solutions for African problems

She is particularly proud of her field of work because it allows the country to look for “African solutions for African problems”.

“I'm very excited to be African. We have very interesting problems. With our cancer project we are specifically studying black people's bodies because those are not represented in the data. If you look at the study of protein there are not a lot of African samples,” Moorosi pointed out.

Nyalleng Moorosi is a senior data scientist at the Council for Scientific and Industrial Research.
