In the age of information technology, data is the new oil, and AI (Artificial Intelligence) and applicable insights are the new refined products from this oil. Algorithms act as the refinery, the platform for getting the best out of the mined data. In the petroleum industry, having crude oil is not enough: you also need massive, state-of-the-art refineries and factories to process the oil into useful products.
In the same way, having data is not enough. You also need to know the following:
- The type of data you need
- The type of algorithm to use
- How long it will take you to derive useful and applicable insights from the data
- How to profit from the insights mined from the data
Let us look at some of the factors to consider when choosing a Machine Learning algorithm, the refinery for our new oil, "data".
Types of ML Algorithms
To begin, let us familiarize ourselves with the types of ML, based on the problem we want to solve and the kind of data we have.
Supervised ML (Machine Learning): This involves working with labelled data, meaning the computer is trained on examples whose correct outputs are already known. The main problems this type of ML can solve include:
- Classification Problems
- Regression Problems
Unsupervised ML (Machine Learning): This involves working with unlabelled data. The main problems this type of ML can solve include:
- Clustering Problems
- Segmentation Problems
- etc
There are also Semi-Supervised ML and Reinforcement Learning.
Other groups of ML include Deep Learning, Transfer Learning, etc.
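To make the difference between supervised and unsupervised ML concrete, here is a minimal sketch. It assumes scikit-learn and its bundled iris dataset, neither of which is prescribed by this article; LogisticRegression and KMeans are simply stand-ins for "a supervised algorithm" and "an unsupervised algorithm".

```python
# Assumption: scikit-learn and its bundled iris dataset are used only
# for illustration; any labelled tabular dataset would work the same way.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised: the model is trained on the features X together with the labels y.
classifier = LogisticRegression(max_iter=1000)
classifier.fit(X, y)
predicted_classes = classifier.predict(X[:5])

# Unsupervised: the model only sees X and groups similar rows together.
clusterer = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = clusterer.fit_predict(X)

print("Predicted classes for the first five rows:", predicted_classes)
print("Cluster ids for the first five rows:", cluster_ids[:5])
```

Note that the cluster ids are arbitrary group numbers. They are not the same thing as the class labels, even when the groups happen to line up with them.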
How to Choose ML Algorithms
So how do you choose a Machine Learning algorithm? These are some of the things to consider:
- Type of Problem
- Data at hand
- Type of Data
- Size of Data
- Nature of Data
- Features you can extract from the data
- Speed and Compute
- Accuracy
- Domain
Let us take these things one after the other.
Type of Problem
In choosing a machine learning (ML) algorithm, the first factor to consider is the type of problem you want to solve. This will inform and direct you on the data you need and how to obtain it. Once you have defined the problem, you can decide how to collect the data and what form it will take in the end: structured, unstructured, or semi-structured, and labelled or unlabelled.
After this you can know for sure whether you are dealing with a classification, regression, clustering, or other kind of problem.
So if you have collected a labelled dataset, you can think of predicting classes (classification) or numeric ranges (regression); if the dataset is unlabelled, you can think of identifying clusters within your data.
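The reasoning above can be sketched as a small helper function. This is only an illustration: the function name suggest_problem_type and its two parameters are hypothetical, and real projects often need more nuance than two yes/no questions.

```python
def suggest_problem_type(has_labels: bool, target_is_numeric: bool = False) -> str:
    """Map two simple facts about the data to a broad ML problem type.

    Hypothetical helper for illustration only.
    """
    if not has_labels:
        # No target column to learn from, so look for structure instead.
        return "clustering"
    if target_is_numeric:
        # A continuous target such as a price or a temperature.
        return "regression"
    # A labelled, categorical target such as spam / not spam.
    return "classification"


print(suggest_problem_type(has_labels=True, target_is_numeric=True))
print(suggest_problem_type(has_labels=True))
print(suggest_problem_type(has_labels=False))
```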
Data
The next most important factor to consider is the data. Under data we can talk about the following:
- Type of Data: Is it labelled or unlabelled? Is it numeric, categorical, or a mixture? If you want a numeric output, it can be seen as a regression problem, so use regression algorithms such as LinearRegression. If you want a categorical output, see it as a classification problem when you have labels and use classification algorithms, or as a clustering problem when you do not.
- Nature of Data: What is the nature of the data? Is it linear or non-linear?
  - Linear Nature of Data
    - Logistic and Linear Regression
    - SVM (with a linear kernel)
    - Regression Algorithms
  - Non-Linear Nature of Data
    - Random Forest
    - XGBoost
- Size of Data: How big is the data? This is a very important factor to consider when choosing an algorithm because it will influence almost everything else you need from here on, such as compute, tools and libraries, and programming language. For a basic dataset, such as one below 1GB, these are some algorithms you can use:
  - Small Dataset: Naive Bayes
  - Large Dataset: KNN (keep in mind that its prediction time grows with the number of training rows)
- Speed and Compute: As stated above, speed and compute are important when choosing an algorithm. They go hand in hand with the size of the data, how long you need to train on it, and the accuracy you want. More clean data means a longer time to train and, hopefully, better accuracy. Some ML algorithms grouped by speed include:
  - Faster Algorithms
    - LogisticRegression
    - Naive Bayes
    - XGBoost
    - Linear Regression
  - Slow Algorithms
    - Random Forest
    - SVM
    - Neural Networks
- Number of Features & Dimensions: The number of features should be considered when choosing an ML algorithm. If you are dealing with a high number of features, then SVM is a good option.
- Accuracy: How accurate do you want your predictions to be? What metrics are you interested in: accuracy, precision, recall, error, R-squared score, etc.? This should also be considered when choosing an ML algorithm.
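One practical way to weigh the nature of the data, speed, and accuracy against each other is to fit a few candidate algorithms on the same data and compare them. The sketch below assumes scikit-learn and a synthetic non-linear dataset from make_moons; the dataset, the two candidate models, and their settings are illustrative choices, not recommendations from this article.

```python
import time

from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic two-class data with a curved (non-linear) class boundary.
X, y = make_moons(n_samples=1000, noise=0.25, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# One linear and one non-linear candidate, chosen only for illustration.
candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in candidates.items():
    # Time only the training step.
    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start

    # Score on data the model has not seen during training.
    y_pred = model.predict(X_test)
    results[name] = {
        "fit_seconds": fit_seconds,
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
    }

for name, scores in results.items():
    print(name, {key: round(value, 3) for key, value in scores.items()})
```

The exact timings and scores depend on your machine and your data, so treat a comparison like this as a starting point for your own dataset rather than a general ranking of algorithms.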
To conclude, we have seen some important factors to consider when choosing a Machine Learning algorithm for your task or project. However, these are not the only factors. Let us know some other factors you use when choosing your ML algorithm.
Thank You For Your Time
Jesus Saves
By Jesse E. Agbe (JCharis)