Logistic regression is one of the most widely used classification algorithms in machine learning. It is straightforward to understand and interpret, so it is often the first model tried when building a classifier.
Although logistic regression is used for classification, its name contains the term “regression”. Why this is so is one of the most common questions about logistic regression in machine learning and data science interviews.
So a natural question arises: which is correct, calling it a Logistic Regression algorithm or a Logistic Classification algorithm?
In this article, we will discuss logistic regression and the reasons it is called a regression algorithm rather than a classification algorithm. We will also discuss how to answer this question well in interviews.
To answer this question, we first need to understand linear regression and how it works.
So let us start with the very basic level, with linear regression.
Linear Regression
Linear regression is one of the simplest statistical machine-learning algorithms, and it is known to almost everyone exploring data science and machine learning.
Here the data points are plotted in the feature space, and the best-fit regression line is obtained by calculating the slope and intercept of the line. The model chooses the slope and intercept that minimize the prediction error on the training data, and they can be tuned further if the model’s performance is not satisfactory.
A simple straight-line equation is used in this case, which is:
Y = mX + C
Where,
Y = Target Observations
X = Training Data Observations
m = Slope of the Line
C = Y-axis Intercept of the line
Once the model is trained, the slope and intercept are fixed, so for any new query point the model predicts the target simply by substituting the point's X value into the equation.
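The fit-then-predict flow described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming ordinary least squares as the fitting criterion; the function names fit_line and predict and the toy data are hypothetical and chosen only for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope m = sum of co-deviations of x and y / sum of squared deviations of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    # The least-squares line always passes through the point (x_bar, y_bar)
    c = y_bar - m * x_bar
    return m, c


def predict(x, m, c):
    """Apply Y = mX + C to a new query point."""
    return m * x + c


# Hypothetical toy data lying exactly on Y = 2X + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

m, c = fit_line(xs, ys)
new_y = predict(6.0, m, c)
print(m, c, new_y)  # prints: 2.0 1.0 13.0
```

Because the slope and intercept are stored after fitting, predicting for a new point costs only one multiplication and one addition.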
Note: One of the core assumptions of linear regression is that the relationship between the features and the target is linear. If the data fed to the model is nonlinear, the algorithm might not perform well, and polynomial regression can be preferred instead. It works the same way as linear regression, except that additional terms up to a chosen polynomial degree (such as X squared) are added to the equation.
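The point of this note can be illustrated with a small sketch. It assumes a deliberately simplified setup: the toy data follow Y = X squared, and the polynomial model uses only the squared term as its feature, whereas a full degree-2 polynomial regression would keep the X term as well. The helper names fit_line and sum_squared_error are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


def sum_squared_error(xs, ys, m, c):
    """Total squared distance between the targets and the line's predictions."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Hypothetical nonlinear toy data: Y = X^2
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

# A straight line in X cannot follow the curve
m_lin, c_lin = fit_line(xs, ys)
sse_linear = sum_squared_error(xs, ys, m_lin, c_lin)

# The same fitting routine on the transformed feature X^2 fits the data
squared = [x ** 2 for x in xs]
m_poly, c_poly = fit_line(squared, ys)
sse_poly = sum_squared_error(squared, ys, m_poly, c_poly)

print(sse_linear, sse_poly)
```

The fitting routine is identical in both cases; only the input feature changes, which is why polynomial regression is usually described as linear regression on polynomial terms.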
Now, in the same way, let us talk about Logistic Regression, its core intuition, and its working mechanism.
Logistic Regression
Logistic regression is mainly used for classification problems. Like the perceptron, it learns a linear decision boundary from the patterns in the data, and points are then classified according to which side of that boundary they lie on.
The working mechanism of logistic regression is similar to that of linear regression: a slope and intercept are learned from the data to form a linear equation. The difference is that the output of this equation is passed through the sigmoid function, which converts it into a probability between 0 and 1, and a threshold is then applied to assign the data observation to its class. (In practice, the slope and intercept are chosen to maximize the likelihood of the observed class labels rather than to minimize squared error.)
For example, suppose we have a dataset with a target variable that has two classes. In this case, logistic regression will calculate the slope and intercept of the line that best suits the data. Once the best-fit line is obtained, the model applies the activation function called sigmoid.
The sigmoid function transforms the output of the line into the probability of one class, and the probability of the other class is one minus that value (e.g., 0.70 for Yes and 0.30 for No). A threshold is then used to get the final class of the data observation (e.g., 0.70 > 0.30, so final class = Yes).
So, in short, logistic regression works much like linear regression: the slope and intercept that best fit the data are calculated, the sigmoid function then turns the output of the line into the probability of each observation belonging to each class, and the data point is assigned to whichever class has the higher probability.
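The whole pipeline (linear equation, sigmoid, threshold) can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes a single feature, fits the slope and intercept by gradient descent on the log loss, and uses a threshold of 0.5. The function names, learning rate, epoch count, and toy data are all hypothetical choices made for this example.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Learn slope m and intercept c by gradient descent on the log loss."""
    m, c = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_m = grad_c = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(m * x + c) - y  # predicted probability minus label
            grad_m += error * x
            grad_c += error
        m -= lr * grad_m / n
        c -= lr * grad_c / n
    return m, c


def predict_proba(x, m, c):
    """Probability of the positive class ("Yes") for a query point."""
    return sigmoid(m * x + c)


def predict_class(x, m, c, threshold=0.5):
    """Turn the probability into a class label using a threshold."""
    return "Yes" if predict_proba(x, m, c) >= threshold else "No"


# Hypothetical toy data: label 1 means "Yes", label 0 means "No"
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]

m, c = fit_logistic(xs, ys)
for x in (1.0, 6.0):
    print(x, round(predict_proba(x, m, c), 2), predict_class(x, m, c))
```

The only difference from the linear regression case is the sigmoid wrapped around m * x + c and the threshold applied afterwards; the learned quantities are still a slope and an intercept.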
Now let us discuss the proper reason why it is called a regression algorithm.
Is It Logistic Regression or Logistic Classification?
A beginner may be confused about the true nature of logistic regression, as both the terms regression and classification are associated with it.
To clarify, logistic regression is a classification algorithm that is mainly used for binary classification tasks, but the term regression is attached to it for historical reasons and because of its working mechanism.
So Logistic Regression is the correct name for the algorithm, but by its working mechanism and nature, it is used for classification problems, not for regression problems.
Note: Logistic regression should not be used for regression tasks. The algorithm is designed for a categorical target, so when given a continuous target it will, depending on the implementation, either fail or treat the target values as class labels, which results in a very poorly performing model.
The Reason Behind Calling Logistic Regression a Regression Algorithm
We saw that the working mechanism of logistic regression is almost the same as that of linear regression, where the model’s ultimate goal is to find the best-fit line for the data points, which is obtained by tuning the slope and intercept values of the line.
Because it works so similarly to regression algorithms, the term regression is attached to its name. More specifically, there are three main reasons behind calling it a regression algorithm. Let us discuss them.
1. Mathematical Formulations
As we discussed, logistic regression first finds the best-fit line for the data points and then applies the sigmoid function. The best-fit line is described by the same straight-line equation used in other regression algorithms, and in logistic regression this equation predicts a continuous quantity, the log-odds of the positive class. Because of this shared mathematical formulation, it is called a regression algorithm.
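The link between the straight-line equation and the sigmoid can be made explicit with a short derivation. Here p denotes the predicted probability of the positive class and \sigma the sigmoid function; both symbols are introduced only for this illustration, while m, X, and C are as in the straight-line equation given earlier.

```latex
p = \sigma(mX + C) = \frac{1}{1 + e^{-(mX + C)}}
\quad\Longrightarrow\quad
\frac{p}{1 - p} = e^{mX + C}
\quad\Longrightarrow\quad
\log\frac{p}{1 - p} = mX + C
```

In words, the log-odds of the positive class is a linear function of X, so the model regresses a continuous quantity on the input even though the final output is a class.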
2. Connection to Linear Regression
If we know how linear regression and logistic regression work, we can easily see that the two algorithms share almost the same mechanism; logistic regression simply adds the sigmoid function to obtain probabilities. Because of this strong connection with linear regression, it is called a regression algorithm.
3. Historical Naming Conventions
Historically, in statistics, the term regression was used for methods that fit a best-fit line to the data and tune its parameters. When logistic regression was introduced, it was named after this working mechanism, and the same naming convention has been followed ever since.
Now let us go over some of the key points to remember from this article.
Key Takeaways
1. Linear regression calculates the slope and intercept that best fit the dataset, and the resulting regression line is used for prediction.
2. Logistic regression works much like linear regression, with the addition of the sigmoid function and a threshold value.
3. Logistic regression is closely related to the perceptron: a single neuron with a sigmoid activation function, as used in deep learning, works the same way as logistic regression.
4. The mathematical formulation of logistic regression is the same as that of other regression algorithms, which is one of the strongest reasons for calling it a regression algorithm.
5. Logistic regression can be considered an extension of linear regression; the only addition is an activation function, the sigmoid.
6. When logistic regression was introduced, it was named for its similarity to regression algorithms, and the same naming convention is still followed today.
Conclusion
In this article, we discussed linear regression, logistic regression, their working mechanisms and nature, and the three main reasons behind calling logistic regression a regression algorithm.
This article concludes that although logistic regression is a classification algorithm that works on the basis of a best-fit line and the sigmoid function, it is termed a regression algorithm because its working mechanism is almost the same as that of other regression algorithms.
The term regression has been used with logistic regression since it was introduced, and the same naming convention has been followed ever since.
This article should help you understand how linear and logistic regression work, the true nature of logistic regression, and the reasons behind calling it a regression algorithm. It should also help you answer related interview questions clearly and confidently.