
In machine learning, the no free lunch theorem, or NFL, comes up in many problem statements and model-building discussions, so it is essential to know about it. The theorem is most often invoked when two machine learning algorithms are compared for their performance on a particular problem statement and dataset.
In this article, we will discuss the no free lunch theorem: what it is, the core intuition behind it, what it states, and the reason behind its name. This article will help you build a clear idea of the theorem and answer interview questions related to it.
Before jumping directly into the no free lunch theorem, let us discuss parametric algorithms, as it is essential to have an idea about them first.
Parametric Algorithms
In machine learning, algorithms fall into two main types based on the assumptions made about the data during training:
- Parametric Algorithms
- Non-Parametric Algorithms
Parametric algorithms make assumptions about the data before training. In simple words, they fit a fixed functional form to the data, and those assumptions must be satisfied for the resulting model to be accurate and reliable.
On the other hand, nonparametric algorithms make no such assumptions about the data while training the model. Instead of committing to a specific function, they learn the patterns and behavior of the data directly from the data itself.
A parametric algorithm works well only if its assumptions are satisfied. In return, parametric algorithms are faster than nonparametric ones: a nonparametric algorithm commits to no function in advance, so it must discover the patterns of the data during training, which is a more time-consuming and complex process.
The learning scope of parametric algorithms is also narrower than that of nonparametric algorithms: they can only be trained on datasets where their assumptions are satisfied, so comparatively few datasets and problem statements can be solved with them.
Examples
Linear regression is a parametric algorithm that assumes the relationship between the features and the target is linear, so it can only be applied to linear data. If we apply linear regression to nonlinear data, the straight line cannot capture the relationship, and the model performs poorly.
Naive Bayes is a parametric algorithm that assumes the features are independent of one another given the class, meaning there should be no strong correlation between the independent columns of the dataset; hence, if the dataset has very high multicollinearity, naive Bayes may fail.
On the other hand, algorithms like decision trees and artificial neural networks are nonparametric: they make no such assumptions about the data and can therefore be used for almost any type of data, as the sketch below illustrates.
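To make the contrast concrete, here is a minimal sketch (assuming scikit-learn and NumPy are installed; the synthetic sine data and the hyperparameters are chosen purely for illustration) that fits one parametric and one nonparametric regressor to the same nonlinear data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(2 * X).ravel() + rng.normal(0, 0.1, size=300)  # nonlinear target

# Parametric: commits to a straight line, which cannot follow the curve.
linear = LinearRegression().fit(X, y)
print(f"linear regression R^2: {linear.score(X, y):.3f}")  # typically low here

# Nonparametric: learns the shape of the data with no linearity assumption.
tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)
print(f"decision tree R^2: {tree.score(X, y):.3f}")  # typically much higher
```

The tree is free to follow the curve because it commits to no functional form in advance, while linear regression is locked into a straight line.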
Now let us discuss the idea of a perfect model before turning to the no free lunch theorem.
The Perfect Model

In machine learning, a high-performing and reliable model mainly needs two things: good-quality data and the algorithm best suited to it.
Both matter throughout training, building, and prediction, and both directly affect the performance of the model.
If we train different models with different algorithms on a single dataset, we observe that their performance varies: the algorithm whose backend logic and training assumptions suit the data is the one that performs well.
Similarly, if we carefully check the performance of a single algorithm on different datasets, we observe that it does not perform the same on every dataset. Instead, it performs well on some datasets and poorly on others. Hence, there is no single best-performing algorithm for all datasets.
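The same observation can be sketched in a few lines (assuming scikit-learn; both datasets are synthetic stand-ins): one linear classifier is scored on roughly linearly separable data and on concentric circles, where a straight decision boundary is of little help.

```python
from sklearn.datasets import make_circles, make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Dataset 1: classes that a straight boundary can mostly separate.
X1, y1 = make_classification(n_samples=500, n_features=5, random_state=0)
# Dataset 2: concentric circles, where no straight boundary helps.
X2, y2 = make_circles(n_samples=500, noise=0.1, factor=0.5, random_state=0)

model = LogisticRegression(max_iter=1000)
print("accuracy on near-linear data:", cross_val_score(model, X1, y1, cv=5).mean())
print("accuracy on circles data:", cross_val_score(model, X2, y2, cv=5).mean())
```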
So, put simply, the perfect model is the right pairing of dataset and algorithm, and the best-fit data-algorithm combination should be selected to achieve a high-performing and reliable model.
Now let us jump to the core intuition behind the no free lunch theorem.
What is the No Free Lunch Theorem?
As we discussed in the section above, to achieve the perfect model we can tune mainly two things: the data and the algorithm. For any given problem statement, the data is fixed; it can be cleaned and preprocessed, but that does not fundamentally change its behavior. So the only real lever left is algorithm selection.
Algorithm selection is one of the essential steps in building a model. No matter how good or how plentiful our data is, selecting an appropriate machine learning algorithm remains a crucial step in model building.
We cannot simply look at the data and decide which algorithm will perform best. Instead, we try different algorithms on the same dataset, check their performance, and pick the best-fit algorithm for our problem statement based on those results.
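In practice, this trial is usually automated with cross-validation. A minimal sketch, assuming scikit-learn and using its bundled breast-cancer dataset purely as an example:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Candidate algorithms; scaling is added for logistic regression so it converges.
candidates = {
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "naive bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

# Score every candidate on the same dataset with 5-fold cross-validation.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("best fit for this dataset:", max(scores, key=scores.get))
```

Whichever candidate wins here is only the best fit for this dataset; on another dataset the ranking can easily flip.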
It is quite obvious that algorithms are not all the same: they have different working mechanisms and are not designed for a single type of problem statement, so they will perform differently on different problem statements.
The theorem says that if we take two algorithms, A and B, and apply them across every possible dataset, their average performance will be identical. The performance of one algorithm on a particular dataset can be higher or lower than the other's, but averaged over all problem statements and datasets, the two come out equal.
In other words, there is no single best-performing algorithm in machine learning for all types of problem statements. Performance depends entirely on the data, the problem statement, and our requirements; every algorithm performs differently on individual problems, yet when performance is averaged across all datasets and problem statements, it is the same.
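No finite experiment can reproduce that averaging, which runs over all possible problems, so the following toy sketch (assuming scikit-learn; the two datasets are deliberately chosen so that each algorithm is typically favored once) only illustrates the weaker point that neither algorithm dominates everywhere:

```python
from sklearn.datasets import make_circles, make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

datasets = {
    # Linear signal buried in many noise features: typically favors the linear model.
    "linear + noise": make_classification(n_samples=400, n_features=50,
                                          n_informative=2, random_state=0),
    # Concentric circles: typically favors k-nearest neighbors.
    "circles": make_circles(n_samples=400, noise=0.1, factor=0.5, random_state=0),
}
algorithms = {
    "A: logistic regression": LogisticRegression(max_iter=2000),
    "B: k-nearest neighbors": KNeighborsClassifier(),
}

for alg_name, alg in algorithms.items():
    for data_name, (X, y) in datasets.items():
        score = cross_val_score(alg, X, y, cv=5).mean()
        print(f"{alg_name} on {data_name}: {score:.3f}")
```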
So the no free lunch theorem simply states that there is no single best-performing algorithm for all types of problem statements and datasets; the best-fit combination of algorithm, dataset, and problem statement should be selected to achieve an accurate and reliable model.
An algorithm performs well on a particular dataset when the patterns and behavior of the data suit its backend logic and the assumptions it makes while training. Another algorithm performs poorly on the same dataset when those assumptions are not satisfied and the data does not suit its working mechanism.
For example, on a linear dataset, both linear regression and neural networks work well, but linear regression is the wiser choice because it trains faster and is more efficient. On a nonlinear dataset, however, the linearity assumption of linear regression is violated, and the algorithm will not perform well.
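A minimal sketch of that trade-off (assuming scikit-learn; the data size and network hyperparameters are arbitrary): both models reach a similar fit on linear data, but linear regression gets there in a fraction of the time.

```python
import time

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = X @ rng.normal(size=10) + rng.normal(0, 0.1, size=2000)  # linear target

for model in (LinearRegression(),
              MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                           random_state=0)):
    start = time.perf_counter()
    model.fit(X, y)  # the network may warn about convergence; this is only a sketch
    elapsed = time.perf_counter() - start
    print(f"{type(model).__name__}: R^2 = {model.score(X, y):.3f}, "
          f"fit time = {elapsed:.3f} s")
```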
In a nutshell, the no free lunch theorem means that no algorithm is designed for all types of problem statements, and the performance of each algorithm varies with the patterns and behavior of the data. The algorithm whose backend logic, working mechanism, and assumptions best match the data performs best of all.
Now let us discuss the reason behind naming this theorem “no free lunch.”
Why is it Named “No Free Lunch”?
When we have lunch or dinner at a hotel or restaurant, we order the food we like most. None of it comes free; there is no such scheme. So we order the food that best fits us, based on our taste and our budget, and to get something of our choice we have to pay for it, because nothing is free.
Similarly, in machine learning, we want the best-performing model out of all the options available for training and tuning. Since there is no single best-performing algorithm, we adopt the assumptions that different machine learning algorithms make. By adopting an assumption we limit an algorithm to certain problem statements, but in exchange we improve the model's performance on the problem statements where the assumption holds.
For example, assuming linearity in linear regression limits its use on nonlinear datasets, but it also increases the algorithm's performance and efficiency on linear datasets.
Hence, the limiting assumption we accept about the algorithm and the data while training a model is like the bill, the price we pay for lunch. In the end, to get the best-performing model, we pay with the algorithm's assumptions: they restrict where the algorithm can be used, but they buy us the best-performing model for the case at hand.
This is where the name comes from: achieving the best-performing model in machine learning is not free. We have to pay something, and the price is the limited applicability we accept by adopting the algorithm's assumptions.
Features of No Free Lunch Theorem

There are mainly three features of the no free lunch theorem. Let us discuss them one by one.
1. Universality
As we discussed above, the no free lunch theorem states that no single algorithm is the best fit for all problem statements and datasets, and this applies to every algorithm, dataset, and problem statement.
2. Trade-Offs
As we discussed above, there is a trade-off between the algorithm and the dataset, and the best-fit combination should be selected to achieve the best-performing model.
3. Averaging of Performance
The theorem states that the average performance of every algorithm, taken across all problem statements and datasets, is the same. That is, if we compare the average performance of algorithms A and B over all possible problem statements, the two averages will be equal.
Key Takeaways
1. Parametric algorithms make certain assumptions about the data during training.
2. Non-parametric algorithms make no assumptions about the data; they learn its patterns during training instead.
3. Parametric models are faster than non-parametric ones because they fit a fixed function during training and testing.
4. Different algorithms perform differently on a single dataset, and a single algorithm performs differently on different datasets.
5. There is no single best-performing model for all problem statements and datasets.
6. The algorithm that best suits our data and problem statement will give the optimal model.
7. The assumptions we adopt while training, and the limits they place on where the algorithm can be used, are the fee we pay for the best-performing model.
Conclusion
In this article, we discussed the no free lunch theorem: what it is, the core idea behind it, how it applies to different algorithms, and why it is named “no free lunch.”
To conclude, every algorithm is different, every dataset is different, and there is no single best-fit combination of the two that works everywhere. Different combinations should be tried, and the assumptions adopted during training, which limit the use of the algorithm in some cases, are the fee we pay for achieving the best-performing model.
This article should help you understand the core idea behind the no free lunch theorem and the meaning of the term, and help you answer related interview questions easily and confidently.