
Data science has been called the sexiest job of the century and the electricity of the 21st century. If you are interested in becoming a data scientist, machine learning engineer or business analyst, you are on the right path.

The only problem is, this path is confusing as hell.

I have used three terms: data scientist, machine learning engineer and business analyst.

Please allow me to increase the confusion.

Do you want to learn machine learning, data science, predictive analytics, deep learning, business analytics or artificial intelligence?

Better?

I am sure the answer is “no”.

I have written a detailed post on machine learning vs deep learning vs artificial intelligence; you should check it out.


If you are still not sure about business analytics, predictive analytics or other keywords related to data science, let us spend some time understanding these terms so that you can plan your transition better.

For now, I will refer to the position as data scientist to keep things simple and easy to understand.

If you are serious about becoming a data scientist and are looking to start your learning journey, you must understand these concepts and the differences between predictive analytics, machine learning, data science and other such terms.

In this post, I am going to focus on the top machine learning algorithms or models because the field is simply huge. If you are just starting out, you will find it extremely hard to figure out the starting point.

We will talk about these top machine learning algorithms for beginners who are looking to become data scientists.

Please note that the list mentioned here is not complete or exhaustive; it is intended to help beginners. If you are new to data science, you should start by learning these and then continue to learn more based on your interest.

Alright, coming back to machine learning algorithms, just to highlight one thing: they are also referred to as ML algorithms or machine learning techniques, so do not get confused.

In this post, we will also touch on deep learning algorithms, but we will not go into their details. The idea here is to help beginners start learning. To begin with machine learning, you need to start with the models mentioned in this post.

After you are comfortable with these models, you can start focusing on deep learning if that is something that excites you.

But first, a short description of these different terms to make sure you understand the concepts.

Let me put a few screenshots from my previous post here to make it easier for you to refer to. As I mentioned earlier, please refer to the “machine learning vs deep learning vs artificial intelligence” post for more details.

After you have acquainted yourself with the concept of machine learning, you need to carve out a learning path for yourself.

This path should always start with the models described here. These are the most common and most important models, and the building blocks of machine learning. Alright, so let us start with machine learning algorithms.

Just to add a few words on predictive analytics: it is an established part of advanced analytics that can use machine learning or statistical models for prediction. A predictive analytics model is often treated as a static model that needs to be rebuilt when the data changes, whereas a machine learning model can be retrained as new data arrives, in some setups automatically and in near real time.

Top machine learning algorithms

As you can see in this picture, we have the 8 most common machine learning algorithms. This list comprises supervised and unsupervised machine learning models.

If you are not sure about supervised machine learning algorithms or unsupervised machine learning algorithms, here are the details.

Supervised machine learning

When you are using labeled data, you are using supervised machine learning.
The model uses the labels to learn how to produce an output. For example, a logistic regression model predicts a class based on the labels it was trained on.

Unsupervised machine learning

When you are using unlabeled data, you are using unsupervised machine learning.
The model needs to extract features and find patterns on its own, as in clustering, where the model groups similar data elements together based on its own understanding.

This is a very high-level classification of supervised and unsupervised machine learning algorithms, but you get the point.

In a nutshell, in supervised machine learning you supply labels along with the data and ask the model to predict a future value or class based on that information, whereas in unsupervised machine learning you do not supply any labels and the model needs to develop its own understanding based on features and patterns.

Alright, let us continue with these top machine learning algorithms, and then we will talk about the most common starting points for deep learning.

Please note that I am only mentioning their names and very basic details. Discussing these models in detail is beyond the scope of this post because each of them would need its own post. If you are interested in learning these models with R, you can check out my course.

Supervised machine learning models

Linear regression algorithm

There are two types of linear regression models, simple linear regression models and multiple linear regression models.

Simple linear regression models

Linearity is a mathematical concept which means that the relationship between variables can be expressed by a straight line.
Simple means that we are analyzing only 1 explanatory variable.
In linear regression, the relationship between the output (dependent variable) and the input (independent variable) is measured.
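
To make this concrete, here is a minimal Python sketch of fitting a straight line by least squares. The function name and the spend/sales numbers are made up for illustration; in practice you would use a statistics library rather than this hand-rolled version.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: 1 explanatory variable (spend) and 1 output (sales).
# The points lie exactly on the line sales = 2 * spend + 1.
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_linear_regression(spend, sales)
print(f"sales = {slope:.2f} * spend + {intercept:.2f}")
```

The slope tells you how much the output changes when the input increases by 1 unit, and the intercept is the predicted output when the input is 0.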


Multiple linear regression

It is used when you need to explain the relationship between 1 dependent variable and more than 1 independent variable.
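
As a rough sketch, here is the same idea with 2 independent variables, solved directly from the normal equations. I have limited it to exactly 2 predictors to keep the arithmetic readable; the function name and the data are hypothetical, and real projects would rely on a library that handles any number of predictors.

```python
def fit_two_predictor_regression(x1, x2, y):
    """Least-squares fit of y = b0 + b1 * x1 + b2 * x2 (exactly 2 predictors)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (t - my) for a, t in zip(x1, y))
    s2y = sum((b - m2) * (t - my) for b, t in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("the two predictors are perfectly collinear")
    # Solve the 2x2 system for the slopes, then recover the intercept
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Hypothetical data generated from y = 1 + 2 * x1 + 3 * x2
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]
y = [9.0, 8.0, 19.0, 18.0]

b0, b1, b2 = fit_two_predictor_regression(x1, x2, y)
print(b0, b1, b2)
```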

Logistic Regression

Also known as the logit model.
Logistic regression is used for classification problems.
It provides the probability of a certain event.
The logistic function is used in logistic regression and produces an output between 0 and 1. A threshold value is then used to convert that output to 0 or 1.
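
Here is a small Python sketch of the two steps described above: the logistic function turns a score into a probability, and a threshold turns the probability into a class. The weights and bias are hand-picked for illustration, not fitted to any real data, and the 0.5 threshold is just a common default.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(features, weights, bias, threshold=0.5):
    """Return (probability, label) for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    label = 1 if probability >= threshold else 0
    return probability, label


# Hypothetical coefficients, chosen by hand for this example
weights = [0.8, -0.5]
bias = -0.2

probability, label = predict_class([2.0, 1.0], weights, bias)
print(f"probability = {probability:.3f}, predicted class = {label}")
```

Raising the threshold makes the model more conservative about predicting class 1; lowering it does the opposite.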

CART – Classification and regression tree

Assume the data contains 2 labeled classes, for example defaulter and non-defaulter in a loan dataset. A dataset that contains both classes is not completely pure; if we split the data so that each subset contains only 1 class, each subset is pure.
To create splits and measure purity, you use the Gini index of the node.
A dataset will usually have 1 labeled class variable and multiple independent variables. To determine which independent variable gives the strongest split, you calculate the Gini gain for each variable.
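
The Gini calculations are easy to sketch in Python. I am assuming the usual definition here (Gini index = 1 minus the sum of squared class proportions, and Gini gain = parent impurity minus the size-weighted impurity of the child nodes); the function names and the tiny loan example are mine.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity of a node: 1 - sum(p_k ** 2). A value of 0 means the node is pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(parent, left, right):
    """Reduction in impurity obtained by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * gini_index(left) + (len(right) / n) * gini_index(right)
    return gini_index(parent) - weighted


# Hypothetical loan data: 4 defaulters and 4 non-defaulters in the parent node
parent = ["defaulter"] * 4 + ["non-defaulter"] * 4

# Split on variable A separates the classes perfectly
perfect_gain = gini_gain(parent, ["defaulter"] * 4, ["non-defaulter"] * 4)

# Split on variable B leaves both child nodes mixed
mixed_left = ["defaulter"] * 3 + ["non-defaulter"]
mixed_right = ["defaulter"] + ["non-defaulter"] * 3
mixed_gain = gini_gain(parent, mixed_left, mixed_right)

print(perfect_gain, mixed_gain)
```

The variable with the larger Gini gain (variable A in this example) is the stronger one to split on.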

Time series algorithms

In machine learning and predictive modeling, certain independent variables are used to classify or predict a dependent variable.
But if you do not have independent variables (predictors) and only have 1 response variable, like sales data, you use time series forecasting.
It is also known as time series analysis.
A time series involves measuring a variable over time.
The unit of time is constant. If you measure a variable like sales by week, you use weeks for the entire time frame.
Overall, in time series analysis you use the historical movement of the variable over the given time horizon.
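
As a very simple illustration, here is a moving-average forecast in Python. This is only one of the most basic time series techniques, picked because it fits in a few lines; the weekly sales figures are invented, and the function name is my own.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]
    return sum(recent) / window


# Hypothetical weekly sales, one value per week (constant unit of time)
weekly_sales = [100, 110, 105, 115, 120, 125]

forecast = moving_average_forecast(weekly_sales, window=3)
print(f"forecast for next week: {forecast}")
```

Notice that there are no predictors here at all: the forecast uses only the history of the variable itself.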

KNN- K nearest neighbor

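It is a distance-based (instance-based) machine learning algorithm.
K represents the number of neighbors you use for comparison.
NN = nearest neighbor.
It works best on small to medium datasets, because every prediction compares the new point against the stored training data.
It measures the distance between the new data point and each training data point.
It makes no assumptions about the variances and covariances of the data, but because it relies on distances, features should be on comparable scales.
It is used for classification, and it can also be used for regression.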
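
Here is a minimal KNN classifier in Python to show the idea. I am assuming Euclidean distance and a simple majority vote; the data points, labels and function name are made up for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every training point (Euclidean)
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-feature dataset with two classes
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (2, 2), k=3))
print(knn_predict(train_points, train_labels, (6, 5), k=3))
```

There is no training step as such: the model is just the stored data, which is why prediction gets slower as the dataset grows.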

Naïve Bayes

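It is used for pattern recognition.
It is based on Bayes' theorem.
It is a conditional probability model. It is called naive because it assumes the features are independent of each other given the class.
It is a frequency-based model: it estimates probabilities from how often each feature value occurs within each class.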
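
A small categorical Naive Bayes can be sketched in Python with nothing more than counters. I am assuming add-one (Laplace) smoothing so that unseen values do not produce a zero probability; the weather data and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value frequencies
    distinct_values = defaultdict(set)   # feature index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            distinct_values[i].add(value)
    return class_counts, value_counts, distinct_values


def predict_naive_bayes(model, row):
    """Return the class with the highest prior * product of likelihoods."""
    class_counts, value_counts, distinct_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(class)
        for i, value in enumerate(row):
            # P(value | class) with add-one smoothing
            numerator = value_counts[(i, label)][value] + 1
            denominator = n_label + len(distinct_values[i])
            score *= numerator / denominator
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical data: (outlook, temperature) -> did we play outside?
rows = [
    ("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
    ("rainy", "cool"), ("overcast", "hot"), ("overcast", "cool"),
]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("sunny", "hot")))
print(predict_naive_bayes(model, ("rainy", "cool")))
```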

Random Forest- Ensemble- Bagging algorithm

Random forest is an ensemble technique.
Random forest is a bagging method.
It is a supervised machine learning technique.
Random forest can be used for classification as well as regression.
The randomForest package in R does not support categorical variables with more than 53 categories, so you will need to drop or recode any variable that has more than 53 categories.
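
To show what bagging means, here is a heavily simplified Python sketch. Please note the assumptions: a real random forest grows full decision trees, while I use one-split "stumps" that threshold a single randomly chosen feature at its mean, purely to keep the example short. All names and data are made up.

```python
import random
from collections import Counter


def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) examples with replacement (the bagging step)."""
    indices = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in indices], [labels[i] for i in indices]


def train_stump(rows, labels, feature):
    """Toy base learner: split one feature at its mean, predict the majority label per side."""
    threshold = sum(row[feature] for row in rows) / len(rows)
    left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
    right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
    overall = Counter(labels).most_common(1)[0][0]
    left_label = Counter(left).most_common(1)[0][0] if left else overall
    right_label = Counter(right).most_common(1)[0][0] if right else overall
    return feature, threshold, left_label, right_label


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample_rows, sample_labels = bootstrap_sample(rows, labels, rng)
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump(sample_rows, sample_labels, feature))
    return forest


def predict_forest(forest, row):
    """Ensemble prediction: majority vote over all stumps."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical, well-separated data
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 9), (9, 8), (8, 8), (9, 9)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

forest = train_forest(rows, labels)
print(predict_forest(forest, (1.5, 1.5)), predict_forest(forest, (8.5, 8.5)))
```

The point to take away is the structure: many weak models, each trained on a different random sample, combined by voting.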

Unsupervised machine learning algorithms

Principal Component Analysis

Also known as PCA.
It is an unsupervised machine learning technique.
It is a dimension reduction technique.
The objective of PCA is to summarize the correlation among a set of observed variables with a smaller set of linear combinations.
We perform principal component analysis using singular value decomposition (SVD) or using princomp in R. The principal components are orthogonal to each other, so the variation captured by PC1 is uncorrelated with the variation captured by PC2 and vice versa.
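
Here is a rough Python sketch that finds only the first principal component. Instead of SVD it uses power iteration on the covariance matrix, which I chose because it needs no external library; the function names, the starting vector and the data are my own assumptions.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(rows, iterations=200):
    """Power iteration: multiply by the covariance matrix and renormalize, repeatedly."""
    cov = covariance_matrix(rows)
    d = len(cov)
    # Arbitrary non-zero starting vector (an assumption of this sketch)
    vector = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in product))
        if norm == 0:
            break
        vector = [v / norm for v in product]
    return vector


# Hypothetical data: two strongly correlated variables
data = [(2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]

pc1 = first_principal_component(data)
print(pc1)
```

Because the two variables move together, PC1 points roughly along the diagonal, and a single number along that direction summarizes most of the variation in both variables.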

K-Means clustering unsupervised machine learning algorithm

Clustering is considered an unsupervised machine learning technique.
It is also used as a data reduction technique.
You need to decide in advance how many segments there are in the data, so use your domain expertise while developing the model.
Popular applications of K-means include image and information clustering, along with data mining.
K-means is a partitioning, non-hierarchical clustering technique.
K in K-means represents the number of clusters.
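
The K-means loop itself is short enough to sketch in Python. To keep the result repeatable I simply take the first K points as the starting centroids, which is an assumption of this sketch (real implementations usually pick smarter or random starting points); the data and function name are hypothetical.

```python
import math


def k_means(points, k, max_iterations=100):
    """Cluster `points` into k groups; returns (centroids, assignments)."""
    centroids = [tuple(p) for p in points[:k]]  # simplistic initialization
    assignments = None
    for _ in range(max_iterations):
        # Step 1: assign every point to its nearest centroid
        new_assignments = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        if new_assignments == assignments:
            break  # nothing changed, so the clusters are stable
        assignments = new_assignments
        # Step 2: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignments


# Hypothetical data with 2 obvious segments, so we choose K = 2
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]

centroids, assignments = k_means(points, k=2)
print(centroids)
print(assignments)
```

No labels are supplied anywhere; the only thing you tell the algorithm is K.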

Deep learning

It is a subset of machine learning. In deep learning, you do not need to specify features by hand; the model learns them from the data.
It is loosely based on neurons and the way the human brain functions.
A neuron (aka nerve cell) is the basic unit of the human nervous system. It is like a wire, used to conduct stimuli. Our body has senses in the form of receptors, and lots of neurons come together to deliver the signals from these receptors to the brain. For example, if we think of lifting something, our brain uses neurons to send this signal to the hand (muscles), which then lifts the object. Our entire body contains stacks of neurons.
Deep learning is based on mimicking this behavior: it is about training so-called neural networks to deliver the desired results.
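
An artificial neuron is much simpler than a biological one. As a minimal Python sketch, assuming a sigmoid activation and hand-picked weights (nothing here is trained), a neuron is a weighted sum followed by an activation, and a network is layers of such neurons feeding into each other:

```python
import math


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, weights, bias) for weights, bias in zip(weight_rows, biases)]


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron
hidden = layer([1.0, 0.0], [[0.5, -0.5], [-1.0, 1.0]], [0.0, 0.0])
output = neuron(hidden, [1.0, 1.0], -1.0)
print(hidden, output)
```

Training a neural network means adjusting these weights and biases until the outputs match the desired results, which is what frameworks for deep learning automate.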

Common models that are a good starting point for deep learning:

ANN- artificial neural network
CNN- convolutional neural network
GAN- generative adversarial network
RNN- recurrent neural network
LSTM- long short-term memory network


Final words on machine learning algorithms

As mentioned earlier in the post, if you are just starting out, start with the machine learning models and cover all the algorithms mentioned here, including basic neural networks, but leave the deep learning models for later. Once you have gone through the machine learning models mentioned here, you can start with deep learning if required.

Deep learning has grown into a separate field of its own because of its applications. So you need to learn not only these models but also additional technologies like TensorFlow or PyTorch.
Deep learning is great, and a lot of people just love it. But if you jump directly into it, you will not understand the concepts behind machine learning or why deep learning is needed in the first place.
Therefore, start with the supervised and unsupervised machine learning algorithms mentioned in this post and take it forward from there.
Please leave your questions in the comments.

If you are just starting out, start by installing Python and setting up your system with it.
