Open Source for you

How You Can Use Julia for Machine Learning


Julia is a high-level, dynamic and general-purpose programming language. While it can be used for multiple purposes, it is best suited to complex numerical data analytics. This makes Julia a strong choice for implementing machine learning models. In this article, we will demonstrate how Julia can be used for machine learning.

First launched in 2012, Julia has become one of the top emerging open source programming languages to learn in 2021 and beyond. This is because Julia, by design, is a language that is well suited to complex analysis of numerical data. With the rise of data science (DS) and cloud computing, and the abundance of Big Data, Julia is becoming more and more relevant for the ML/DS community. In this article, our focus will be to demonstrate how you can get started with machine learning (ML) using Julia. Julia's syntax is strikingly similar to Python's. This makes the language quick to grasp and easy to implement for most developers.

Instead of taking you through the rather self-explanatory installation process or the basics of the language, I am going to jump-start this article by taking you directly to the good stuff. We will learn how to use Julia for machine learning by implementing a linear regression model to predict house prices in Boston, Massachusetts, USA.

Why use Julia for machine learning?

Before jumping into the implementation, let me list a few reasons why you should consider using Julia for machine learning. After all, in a world where there is Python and R, why would you want to add another language to perform the same tasks? Here’s why:

■ It is open source and free to use under the MIT licence.

■ Julia is often faster than Python and R for numerical work. Yes, you read that right. The language was designed from the outset to execute complex mathematical computations quickly.

■ Julia supports concurrent, parallel and distributed computing.

■ Julia can call C or Fortran code directly, without the need for additional glue code.

■ Julia uses an LLVM-based just-in-time (JIT) compiler, sometimes described as ‘just-ahead-of-time’ (JAOT): by default, code is compiled to native machine code just before it is first executed.

■ Julia has some of the most efficient libraries for floating-point calculations and linear algebra (i.e., calculations involving matrices), which are essential for machine learning.

■ Julia is supported by popular IDEs such as Visual Studio Code and execution environments such as Jupyter Notebook.

■ It is super easy to install and get started.

■ Julia has a vibrant online community that is active and increasing by the day.
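
To make the point about calling C without glue code concrete, here is a minimal sketch. It assumes a Linux or macOS system where the C standard library's strlen function is visible to the Julia process; the string passed in is just an example.

```julia
# Call strlen from the C standard library directly.
# No wrapper, header file or binding generator is involved.
# Arguments to ccall: function name, return type, tuple of argument types, then the values.
len = ccall(:strlen, Csize_t, (Cstring,), "Julia")

# Csize_t is an unsigned integer type; convert it to a plain Int for display.
println("Length reported by C: ", Int(len))
```

The same ccall form works for other C functions, as long as the return type and argument types given match the C signature.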

Pre-requisites

Now that we have a pretty good grasp of what Julia is and what makes it well suited to machine learning, it’s time to get our hands dirty. In this section of the article, we are going to implement a simple linear regression model to help us predict the prices of houses in Boston, Massachusetts, USA.

There are a few pre-requisites that need to be taken care of before getting started with the implementation:

■ Visit https://julialang.org/ and install the language on your operating system. The procedure is pretty straightforward and Julia is available on all major platforms.

■ Visit https://jupyter.org/ and install Jupyter Notebook on your operating system. Again, the procedure is pretty easy to follow, so we do not need to go into too many details.

■ You will also need the standard Boston house prices data set used to demonstrate linear regression. Visit https://www.kaggle.com/ to download it.

Now that we have taken care of the necessary pre-requisites for this demonstration, let’s get to the code.

Implementation

Developers familiar with Python are going to find many similarities in the syntax, structure and method of implementation in Julia. We will use a Jupyter Notebook to write and run our code (running Julia in Jupyter requires the IJulia package). So open up a fresh Jupyter Notebook in the desired folder and make sure to save your data set in the same location.

1. To start with, we are going to require certain libraries that we will make use of in this example. We will deal with data from a CSV file. This will require us to use DataFrames. Additionally, we will perform some statistical calculations. Last but not least, we will require a generalised linear model (GLM) for the implementation.

2. Using the CSV and DataFrames libraries that we imported, we load the data from the data set.
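
The code for steps 1 and 2 did not survive in this copy of the article, so what follows is only a sketch of what they might look like. It assumes the CSV.jl and DataFrames.jl packages are installed. The file name boston.csv, the lower-case column names and the three sample rows are placeholders for illustration; your Kaggle download may use different names and will have many more rows and columns.

```julia
using CSV, DataFrames

# With the real file saved next to the notebook, loading would look like this
# (file name is an assumption):
# df = CSV.read("boston.csv", DataFrame)

# To keep this sketch runnable without the download, we parse a tiny
# illustrative sample from memory instead.
sample = """
crim,rm,lstat,medv
0.00632,6.575,4.98,24.0
0.02731,6.421,9.14,21.6
0.02729,7.185,4.03,34.7
"""
df = CSV.read(IOBuffer(sample), DataFrame)

# Show the first rows to confirm that the data loaded as expected.
println(first(df, 5))
```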

Figure 1 shows the first five rows of data in the form of a DataFrame. This confirms that the data has loaded properly.

3. We can now explore this data set to find out its size, i.e., the number of rows and columns. We can also use the describe method to draw up some statistical data.
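
A sketch of step 3 under the same assumptions as before (CSV.jl and DataFrames.jl installed, a small illustrative sample standing in for the downloaded file): size returns the row and column counts, and describe builds a summary table with one row per column.

```julia
using CSV, DataFrames

# Illustrative stand-in for the downloaded data set (column names assumed).
sample = """
crim,rm,lstat,medv
0.00632,6.575,4.98,24.0
0.02731,6.421,9.14,21.6
0.02729,7.185,4.03,34.7
"""
df = CSV.read(IOBuffer(sample), DataFrame)

# size returns a (rows, columns) tuple.
rows, cols = size(df)
println("rows = ", rows, ", columns = ", cols)

# describe returns a DataFrame of summary statistics
# (mean, min, median, max and so on) for every column.
summary_stats = describe(df)
println(summary_stats)
```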

Figure 2 shows the statistical description of the data, i.e., mean, median, min, max, etc.

4. An important step before implementing the model is to divide the data set into features and target variable. We will take the target variable (house prices) as the Y list and the features as X.

5. In order to both train and test the model, we will need to divide the data set into training data and testing data. We will use 80 per cent of the data for training and the remaining 20 per cent for testing.

6. As a necessary pre-processing step, we will perform scaling and transformation on the training and testing data.

7. We will also need to define a cost function for determining the mean squared error.
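
Steps 4 to 7 can be sketched with nothing but Julia's standard library. The version below is an illustration rather than the article's original code: it uses synthetic data in place of the housing table, standardisation (zero mean, unit variance) as the scaling method, and a halved mean squared error as the cost. The helper names standardise and cost are made up for this sketch.

```julia
using Random, Statistics

# Step 4: features X and target y. Synthetic stand-in for the housing data.
# With a real DataFrame df, X = Matrix(df[:, Not(:medv)]) and y = df.medv
# would play the same role (the column name medv is an assumption).
Random.seed!(42)
X = randn(100, 3)
y = X * [2.0, -1.0, 0.5] .+ 3.0 .+ 0.1 .* randn(100)

# Step 5: shuffle the row indices, keep 80 per cent for training, the rest for testing.
idx = shuffle(1:size(X, 1))
n_train = round(Int, 0.8 * length(idx))
train_idx, test_idx = idx[1:n_train], idx[n_train+1:end]
X_train, y_train = X[train_idx, :], y[train_idx]
X_test, y_test = X[test_idx, :], y[test_idx]

# Step 6: scale with statistics from the training data only,
# then prepend a column of ones for the intercept term.
mu = mean(X_train, dims=1)
sigma = std(X_train, dims=1)
standardise(A) = hcat(ones(size(A, 1)), (A .- mu) ./ sigma)
Xs_train, Xs_test = standardise(X_train), standardise(X_test)

# Step 7: cost function, the mean squared error halved (a common convention
# that simplifies the gradient).
cost(X, y, theta) = sum((X * theta .- y) .^ 2) / (2 * length(y))

println("Initial cost with theta = 0: ", cost(Xs_train, y_train, zeros(size(Xs_train, 2))))
```

Computing mu and sigma from the training rows alone keeps information from the test rows out of the model, which is why the same two values are reused for both sets.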

8. The cost function requires the theta value for making the predictions from which the error can be calculated. In order to update the theta value, we require the gradient descent function.

9. Finally, we are ready to train the model with our scaled and transformed data set.

10. We will now make predictions on both the training and the testing data using our model.

11. We have now reached the final step where we will verify the accuracy of our model by measuring the r-squared value of our predictions for the testing data.
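
Steps 8 to 11 might look like the sketch below. It is self-contained and again uses synthetic, already-scaled data, so the score it prints is not comparable to the figures reported for the Boston data. The function names, the learning rate alpha and the iteration count are illustrative choices, not values taken from the article.

```julia
using Random, Statistics

# Synthetic, already-scaled data: a column of ones plus two features.
Random.seed!(1)
n = 200
X = hcat(ones(n), randn(n, 2))
y = X * [3.0, 2.0, -1.0] .+ 0.1 .* randn(n)
X_train, y_train = X[1:160, :], y[1:160]
X_test, y_test = X[161:end, :], y[161:end]

# Halved mean squared error, as in step 7.
cost(X, y, theta) = sum((X * theta .- y) .^ 2) / (2 * length(y))

# Step 8: batch gradient descent. Each iteration moves theta against
# the gradient of the cost and records the cost for inspection.
function gradient_descent(X, y; alpha=0.1, iterations=1000)
    theta = zeros(size(X, 2))
    m = length(y)
    history = Float64[]
    for _ in 1:iterations
        grad = X' * (X * theta .- y) ./ m
        theta -= alpha .* grad
        push!(history, cost(X, y, theta))
    end
    return theta, history
end

# Step 9: train the model.
theta, history = gradient_descent(X_train, y_train)

# Step 10: predictions on the training and testing data.
pred_train = X_train * theta
pred_test = X_test * theta

# Step 11: r-squared, i.e. one minus the ratio of residual to total sum of squares.
r_squared(y, pred) = 1 - sum((y .- pred) .^ 2) / sum((y .- mean(y)) .^ 2)
println("R-squared (train): ", round(r_squared(y_train, pred_train), digits=3))
println("R-squared (test):  ", round(r_squared(y_test, pred_test), digits=3))
```

If the recorded cost history grows instead of shrinking, the learning rate is too large for the data and should be reduced.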

Figure 3 shows us the r-squared score for the model, which can be rounded to about 0.73. In this demonstration, we trained the linear regression model from scratch. Of course, just like in Python, you always have the option to use a ready-made model directly from the stats package in Julia. That way, you can execute this entire process in four lines of code.
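
The article's own four lines are not reproduced in this copy. As a rough sketch, and assuming that the "stats package" refers to GLM.jl (the generalised linear model library mentioned in step 1), fitting a linear regression could look like this. The data is synthetic and the column names rooms, lstat and price are hypothetical.

```julia
using DataFrames, GLM, Random

# Synthetic stand-in for the housing data (hypothetical column names).
Random.seed!(7)
n = 100
rooms = 6 .+ randn(n)
lstat = 12 .+ 5 .* randn(n)
price = 5 .* rooms .- 0.5 .* lstat .+ randn(n)
df = DataFrame(rooms = rooms, lstat = lstat, price = price)

# Hold out the last 20 per cent of rows for testing.
train, test = df[1:80, :], df[81:end, :]

# Fit an ordinary least squares model: price as a function of rooms and lstat.
model = lm(@formula(price ~ rooms + lstat), train)

# The coefficient table lists the intercept and one coefficient per feature.
println(coeftable(model))
```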

Figure 4 shows us the coefficients that are derived using the linear regression model from the stats package. Let us now calculate the r-squared value for this model.
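
One way to compute that score, again assuming GLM.jl and using synthetic data with hypothetical column names, is sketched below. GLM's r2 function reports the r-squared on the data the model was fitted to, while the score for held-out rows is computed by hand from the predictions.

```julia
using DataFrames, GLM, Random, Statistics

# Synthetic stand-in for the housing data (hypothetical column names).
Random.seed!(7)
n = 100
rooms = 6 .+ randn(n)
lstat = 12 .+ 5 .* randn(n)
price = 5 .* rooms .- 0.5 .* lstat .+ randn(n)
df = DataFrame(rooms = rooms, lstat = lstat, price = price)
train, test = df[1:80, :], df[81:end, :]

model = lm(@formula(price ~ rooms + lstat), train)

# r-squared on the training data, as reported by GLM.
r2_train = r2(model)

# r-squared on the held-out rows, computed from the model's predictions.
pred = predict(model, test)
r2_test = 1 - sum((test.price .- pred) .^ 2) / sum((test.price .- mean(test.price)) .^ 2)

println("R-squared (train): ", round(r2_train, digits=3))
println("R-squared (test):  ", round(r2_test, digits=3))
```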

From Figure 5, it becomes clear that it doesn’t make much of a difference whether you train the model from scratch or take the more convenient approach of using a regression model from the stats package. The difference between the two r-squared scores, i.e., 0.73 and 0.74, is negligible.

In this article, we implemented a linear regression model using the Julia programming language to predict prices in the housing market of Boston, Massachusetts, USA. This article has been written to give you a taste of the Julia programming language. The example used is but a simple demonstration of the power of Julia. If you are coming from a Python background, then you are most certainly going to find many similarities between the two languages in terms of syntax, structure and approach. This is good news, as it makes the learning curve gentler.

It is my opinion that Julia is a must-have on your resume, whether you are a practising or a prospective data scientist.

Figure 1: The first five rows of the data set
Figure 2: Statistical description of the data
Figure 3: R-Square score for test data
Figure 4: Output of regression model using stats package
Figure 5: The r-square score using stats package
