Mail & Guardian

Building a new generation of local data scientists

Projects include monitoring the effects of climate change on South Africa’s coastline

South Africa’s data science capacity continues to expand, with more students pursuing this discipline and using data science to help solve the country’s problems.

The department of science and technology (DST) has made huge investments in the Data Science for Impact and Decision Enhancement (DSIDE) programme, which has trained 141 candidates since its inception in 2014.

The programme is hosted at the Council for Scientific and Industrial Research (CSIR), an entity of the department. It aims to build capacity in the ever-growing field of data science by having recruits take part in mentor-guided, learn-by-doing problem solving of real-world needs presented by different stakeholders.

The projects have a common theme that adapts a visual analytics framework, with goals that include understanding the dataset through interactive visual exploration and model development. Extracted insights are intended to trigger actions towards better decision-making for various users.

The CSIR DSIDE programme puts emphasis on problem solving and creativity, and encourages students to be curious. Experienced mentors from the CSIR data science community will introduce machine learning topics, tools and theories, and guide students in this project-driven environment. Given that this is a learn-by-doing initiative, stakeholders do not expect the delivery of market-ready output by the end of the programme.

The programme is held over 12 weeks: four weeks in the June/July university vacation, and eight weeks in December and January.

Current recruits include students from third year to PhD level, in various fields related to data science, including engineering, applied mathematics and business informatics.

Coastal News Watch

Some of the data-science solutions that have been developed have been implemented by government departments and municipalities. The DSIDE Coastal News Watch project, led by Bolelang Sibolla with a team that includes Mpheng Magome and Retief Lubbe from Unisa and Promise Msomi from the University of Pretoria, is aimed at protecting our coastal areas.

Coastal News Watch is tasked with developing an oceans and coastal information management system that gives researchers and managers access to details of events happening in certain areas of interest, the location of these events and their causes. The team developed a dashboard for visualising geospatial events on South African coasts and topical media-based data about South Africa’s coastline and Exclusive Economic Zones, and applied exploratory geospatial visual analytics to harvest information by topic, to aid in delivering information rapidly.

The team also focused on developing a core engine for classifying news articles about coastal events, and by the end of the DSIDE programme the engine was running.
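The article does not say how the team's engine works. As a rough illustration only, news text can be sorted into event categories with a small naive Bayes classifier, one of the techniques DSIDE students report learning. In the sketch below, the category names, the headlines and the class name `NaiveBayesTextClassifier` are all invented for the example and are not taken from the project.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Start from the log prior, then add one log likelihood per word.
            score = math.log(doc_count / total_docs)
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented headlines and labels, for illustration only.
TRAINING = [
    ("Oil spill reported near harbour after tanker leak", "pollution"),
    ("Sewage leak closes beach to swimmers", "pollution"),
    ("Storm surge floods coastal road and damages sea wall", "flooding"),
    ("High tide and heavy rain flood seaside homes", "flooding"),
]

classifier = NaiveBayesTextClassifier().fit(
    [text for text, _ in TRAINING], [label for _, label in TRAINING]
)
```

Calling `classifier.predict("Tanker leak causes oil spill")` on this toy data picks the "pollution" category. A real engine would need far more training articles and a proper evaluation.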

The students felt that the DSIDE vacation work programme was incredibly educational and inspirational: many of them were new to software development and machine learning, while others gained more experience with their Python and JavaScript programming skills.

CoastCam

Another project focused on the coast was Project CoastCam, which was led by Dr Michael Burke, with team members Thembelani Bheza (Wits), Mokuwe Windy (Sefako Makgatho Health Sciences University) and Henneth Malatji (University of Limpopo).

Project CoastCam focuses on investigating the impacts of climate change on the coast. These include rising sea levels and flooding, which affect coastal activities, cause delays at ports, damage coastal infrastructure and harm the ecosystem. These effects can be worsened by sand erosion, so it is important to monitor sand movement over time. CoastCam’s team is designing a classification tool that will be used to label coastal image areas as either dry sand, wet sand or water.

The tool will also allow researchers to label a small subset of image areas appropriately and then use a classifier that can label previously unseen images. The CoastCam dataset contains 25 453 images of Fish Hoek’s shoreline in Cape Town, captured from September 2014 to September 2015. During the first phase, CoastCam investigated classification algorithms and machine learning approaches to deal with images, prototyping a supervised classification system. In the second phase of the project, these algorithms were deployed to a dashboard that allows image labelling, trains a classifier and returns a segmented image. Work is ongoing on processing these segmented images to produce long-term measurements of changes in sand volume.
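The label-then-classify idea can be sketched in a few lines of Python. The example below is not CoastCam's method: it substitutes a simple nearest-centroid rule on pixel colour for the algorithms the team investigated, and every colour value, the tiny image and the function names are invented for illustration.

```python
import math

# Hypothetical hand-labelled pixels: (R, G, B) colour -> class name.
LABELLED_PIXELS = [
    ((225, 205, 165), "dry sand"),
    ((215, 195, 155), "dry sand"),
    ((150, 125, 95), "wet sand"),
    ((140, 115, 90), "wet sand"),
    ((40, 90, 150), "water"),
    ((50, 105, 160), "water"),
]


def train_centroids(labelled_pixels):
    """Return the mean colour of each class from the labelled samples."""
    sums, counts = {}, {}
    for colour, label in labelled_pixels:
        total = sums.setdefault(label, [0.0, 0.0, 0.0])
        for channel, value in enumerate(colour):
            total[channel] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(value / counts[label] for value in total)
        for label, total in sums.items()
    }


def classify_pixel(colour, centroids):
    """Assign the class whose mean colour is closest to this pixel."""
    return min(centroids, key=lambda label: math.dist(colour, centroids[label]))


def segment_image(image, centroids):
    """Label every pixel of an image given as rows of (R, G, B) tuples."""
    return [[classify_pixel(pixel, centroids) for pixel in row] for row in image]


centroids = train_centroids(LABELLED_PIXELS)

# A tiny 2x3 "unseen image", invented for illustration.
unseen_image = [
    [(220, 200, 160), (145, 120, 92), (45, 100, 155)],
    [(210, 190, 150), (155, 130, 100), (60, 110, 165)],
]
segmented = segment_image(unseen_image, centroids)
```

Here `segmented` holds one label per pixel, which is the kind of segmented image the dashboard is described as returning. Real shoreline photographs would need richer features than raw colour, since lighting and wave foam change how sand and water look.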

For this team, learning new programming languages and frameworks such as Python, JavaScript and Django to develop a web app was very important, as were teamwork and engaging with machine learning concepts including decision trees, support vector machines, naive Bayes classifiers, and supervised and unsupervised learning.

The CSIR’s Data Science for Impact and Decision Enhancement programme places emphasis upon problem solving and creativity, and encourages students to be curious. The Coastal News Watch project and Project CoastCam utilise students to monitor the effects...