
Probability Vs. Likelihood in Machine Learning

In machine learning, probability and likelihood are two of the most important and most commonly confused terms, and they come up when implementing almost every traditional machine learning algorithm. It is important to understand these terms and the core ideas behind them, as they look and behave almost the same.

In this article, we will discuss probability and likelihood in machine learning, the core intuition behind them, their formulas for the normal and Bernoulli distributions, and the key differences between them. This article will help one understand the logic behind probability and likelihood and the distinct differences between the two.

So let us start by discussing the terms event and parameters before jumping directly into probability and likelihood.

What is an Event?

An event can be understood as the observed data that we are working with. It represents specific values from the dataset or specific outcomes that we observe.

Example of an Event

If we have a fair coin, then tossing this coin can be considered an event, and observing a particular sequence of heads and tails can also be considered an event.

If we have data on people’s heights, then measuring a height can be considered an event, and selecting a random person from the dataset can also be considered an event.

What are the Parameters?

The parameters are the unknown variables or quantities of the model that we are trying to estimate. They are the values with which we calculate the probability and likelihood, and they help the model predict unknown values.

Example of Parameters

If we have a fair coin, then the probability of getting heads, 0.5, can be considered a parameter. (P(H) = 0.5)

If we have a dataset of people’s heights, and it follows a normal distribution, then the mean and standard deviation of the distribution can be considered as parameters.

Probability

Probability in machine learning is the chance of some event occurring. It is a measure that signifies the chance of something happening, and it can be understood as the degree of certainty or uncertainty about an event.

The probability value ranges from 0 to 1, where 0 means the event cannot happen and 1 means the event is certain (a 100% chance) to happen.

In the case of probability, we go from the parameters to the event. That is, while calculating the probability, we already have the parameters and we find the probability of an event happening.

So the probability can be written as:

Probability = F(Event | Parameters)

Note that here the event may change, but the parameters are fixed; we cannot change them.

Probability is a measure that is calculated before the event happens.

Normally, the formula of the probability is:

Probability = Number of Favourable Outcomes / Total Number of outcomes
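As a quick sketch of this counting formula (a minimal illustration, not from the article, using a fair six-sided die):

```python
# Classical probability: favourable outcomes / total outcomes.
# Example: the probability of rolling an even number on a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
favourable = [o for o in outcomes if o % 2 == 0]  # {2, 4, 6}

probability = len(favourable) / len(outcomes)
print(probability)  # 0.5
```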

Formulas for Probability

Mostly, there are two widely known distributions for which the probability is calculated: the Bernoulli and the normal distribution. For other distributions, the formulas vary, which a data analyst or data scientist should keep in mind.

1. Bernoulli Distribution

If we have a categorical variable with two possible outcomes, the variable follows a Bernoulli distribution. The formula of the probability for the Bernoulli distribution is:

P(X = x) = p^x * (1 – p)^(1 – x), where x is 1 (success) or 0 (failure) and p is the probability of success.

2. Normal Distribution

If we have a numerical or continuous variable, and the variable follows the normal distribution, then the formula for the probability density will be:

f(x | mean, sd) = (1 / (sd * sqrt(2π))) * e^(−(x − mean)² / (2 * sd²)), where mean and sd are the mean and standard deviation of the distribution.
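The two formulas above can be sketched in Python (a minimal illustration using only the standard library; the function names are ours):

```python
import math

def bernoulli_pmf(x, p):
    # P(X = x) = p^x * (1 - p)^(1 - x), with x = 1 (success) or x = 0 (failure)
    return (p ** x) * ((1 - p) ** (1 - x))

def normal_pdf(x, mean, sd):
    # f(x | mean, sd) = (1 / (sd * sqrt(2*pi))) * exp(-(x - mean)^2 / (2 * sd^2))
    return (1.0 / (sd * math.sqrt(2 * math.pi))) * math.exp(-((x - mean) ** 2) / (2 * sd ** 2))

print(bernoulli_pmf(1, 0.5))     # 0.5 (chance of heads with a fair coin)
print(normal_pdf(150, 150, 50))  # density at the mean of a N(150, 50) distribution
```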

Probability Example

1. Bernoulli Distribution Example

Let us suppose we have a fair coin, and we toss it. Now we know that there are two possible outcomes, and hence the probability of getting heads and the probability of getting tails will be equal, which is 0.5.

So, written out plainly, the probability of getting heads is:

P(H) = 0.5; this can be considered the parameter.

And so, P(T) = 1 – P(H) = 1 – 0.5 = 0.5; this is the probability calculated from the parameter.

The same thing can also be calculated using the formula mentioned above.

So in this example:

Probability = F(Event | Parameters)

Here, if we consider P(H) the parameter and getting tails the event, the probability function for this example can be written as:

Probability = F(T | P(H))

The probability can be calculated using the formulas discussed above.

2. Normal Distribution Example

Let us suppose we have a dataset of 100 people, with heights ranging from 0 to 200. Now, if we randomly select one person from the dataset, the probability of getting a person with a height of 100, or 110, or any other value can be calculated using the above formula.

Here the height variable follows the normal distribution, so the parameters for this example are the mean and the standard deviation of the distribution.

So here,

Parameters = Mean and Standard deviation

Event = Calculating the probability of getting a person with a height of 100

Probability = F(h=100 | mean, sd)

The probability can be calculated using the formulas discussed above.
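This calculation can be sketched in Python. The mean of 150 and standard deviation of 50 are assumed here for illustration (the same values used in the likelihood example later):

```python
import math

def normal_pdf(x, mean, sd):
    # f(x | mean, sd) = (1 / (sd * sqrt(2*pi))) * exp(-(x - mean)^2 / (2 * sd^2))
    return (1.0 / (sd * math.sqrt(2 * math.pi))) * math.exp(-((x - mean) ** 2) / (2 * sd ** 2))

# F(h = 100 | mean, sd): density of observing a height of 100,
# assuming mean = 150 and sd = 50 for illustration
density = normal_pdf(100, mean=150, sd=50)
print(density)  # ~0.0048
```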

Likelihood

The likelihood can be defined as the probability of observing a specific set of data given the parameters of the model or the hypothesis. It is a measure, which gives us an idea about the goodness or reliability of the parameters that we are selecting.

It basically signifies the plausibility, correctness, or trueness of the parameters in the model. Its value gives us an idea of how well-suited the parameter values are to the data and the model.

With the help of likelihood, we can easily understand the plausibility of the parameters and we can tune the same.

The value of the likelihood can range from 0 to any positive number. The higher the value of the likelihood, the higher the plausibility or reliability of the parameters for the data. Note that we always want higher values of the likelihood, as they denote that the parameters suit the data well.

The likelihood is calculated after the event happens; here the event is kept constant and the parameter values are changed in order to get the best-fit values of the parameters according to the model and dataset.

In the case of likelihood, we go from the event to the parameters (justification), meaning that the event has already happened and we have values for the parameters, but we check the plausibility of those values.

In short, by calculating the likelihood we question the values of the parameters, or we try to justify those values. The lower the likelihood, the more the parameter values need to change, and vice versa.

So the likelihood can be written as:

Likelihood = F(Parameters | Event)

Here, note that the parameters will change, but the event is kept constant the whole time.

Formulas for Likelihood

Mostly, there are two widely known distributions for which the likelihood is calculated: the Bernoulli and the normal distribution. For other distributions, the formulas vary, which a data analyst or data scientist should keep in mind.

1. Bernoulli Distribution

If the dataset or variable has only two possible outcomes, it follows the Bernoulli distribution, and the formula for the likelihood of n observations x1, …, xn will be:

L(p) = p^x1 * (1 – p)^(1 – x1) * … * p^xn * (1 – p)^(1 – xn), i.e. the product of the Bernoulli probabilities of the individual observations.

2. Normal Distribution

If the dataset or the variable is continuous or numerical, it can take any value, and if it follows the normal distribution, then the formula for the likelihood will be:

L(mean, sd) = f(x1 | mean, sd) * f(x2 | mean, sd) * … * f(xn | mean, sd), i.e. the product of the normal densities of the individual observations.
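Both likelihood formulas can be sketched as products over the observations (a minimal illustration; the function names are ours):

```python
import math

def bernoulli_likelihood(observations, p):
    # Product of p^x * (1 - p)^(1 - x) over all observations (1 = heads, 0 = tails)
    likelihood = 1.0
    for x in observations:
        likelihood *= (p ** x) * ((1 - p) ** (1 - x))
    return likelihood

def normal_likelihood(observations, mean, sd):
    # Product of the normal densities of the individual observations
    likelihood = 1.0
    for x in observations:
        likelihood *= (1.0 / (sd * math.sqrt(2 * math.pi))) * math.exp(
            -((x - mean) ** 2) / (2 * sd ** 2)
        )
    return likelihood

# The coin sequence HHTHH under a fair-coin parameter p = 0.5
print(bernoulli_likelihood([1, 1, 0, 1, 1], 0.5))  # 0.03125
```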

Example of Likelihood

1. Bernoulli Distribution

Let us suppose we have a fair coin and we are tossing it five times. And we observe the following pattern:

HHTHH = Heads, Heads, Tails, Heads, Heads

Here we can see that we get heads 4 times and tails only 1 time, but according to the value of our parameter, P(H) = 0.5, the probabilities of getting heads and tails are equal. That is not what we observe in the event (the five observations).

So here the obvious thought is that the coin may not be fair; there is a chance that the coin is biased, with a higher probability of getting heads and a lower probability of getting tails.

So, to question the parameter, we calculate the probability of observing this sequence while changing the value of the parameter.

So here, the likelihood will be:

Likelihood = F(P(H) | HHTHH)

So according to the formula, the likelihood will be:

Likelihood = 0.5 * 0.5 * 0.5 * 0.5 * 0.5 = 0.03125

The likelihood can also be calculated using the formulas discussed above.

Here, we can see that the likelihood of this pattern is very low, meaning that there is only a small chance of observing such a pattern from the dataset, given that the value of the parameter is 0.5.

So in simple words, for this example, the likelihood can be considered the measure with which we check the plausibility of the parameters. It is the probability of observing the data pattern given the parameter values.

Note that here we toss the coin five times. The likelihood can also be calculated for a single observation, but in that case the likelihood equals the probability, as the formula for both is the same; for multiple observations, the likelihood multiplies the probabilities of the individual observations.
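The HHTHH example can be sketched in Python by trying a few candidate values of P(H) (a minimal illustration, not the article's code):

```python
# Likelihood of observing HHTHH (1 = heads, 0 = tails) for candidate values of P(H)
data = [1, 1, 0, 1, 1]

def likelihood(p, observations):
    result = 1.0
    for x in observations:
        result *= p if x == 1 else (1 - p)
    return result

for p in [0.3, 0.5, 0.8]:
    print(p, likelihood(p, data))

# p = 0.5 gives 0.03125, while p = 0.8 gives about 0.082:
# the observed data make P(H) = 0.8 more plausible than P(H) = 0.5
```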

2. Normal Distribution

Let us suppose that we have a dataset of people’s heights ranging from 0 to 200. We will assume that the dataset follows a normal distribution, and hence its parameters are the mean and the standard deviation.

Suppose the mean of the dataset is 150 and the standard deviation is 50.

Now the likelihood in this case will help us identify how good a fit the parameters are to the dataset, or how plausible the mean and standard deviation are according to the data.

Let us suppose we pick a random person from the dataset, and that person has a height of 100.

So in this case, the likelihood can be defined as the probability (density) of getting a person of height 100, given the parameters: a mean of 150 and a standard deviation of 50.

Note that the higher the value of the likelihood, the better the parameter values fit the dataset.

Likelihood = F(mean, standard deviation | h = 100)

The likelihood for this example can be calculated with the formulas discussed above.
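This can be sketched in Python with the assumed mean of 150 and standard deviation of 50; shifting the mean towards the observation shows how the likelihood guides parameter tuning:

```python
import math

def normal_density(x, mean, sd):
    # Normal density: (1 / (sd * sqrt(2*pi))) * exp(-(x - mean)^2 / (2 * sd^2))
    return (1.0 / (sd * math.sqrt(2 * math.pi))) * math.exp(-((x - mean) ** 2) / (2 * sd ** 2))

# Likelihood of the single observation h = 100 under mean = 150, sd = 50
like_original = normal_density(100, mean=150, sd=50)

# Shifting the mean towards the observation raises the likelihood,
# i.e. mean = 100 is a more plausible parameter for this single observation
like_shifted = normal_density(100, mean=100, sd=50)

print(like_original)  # ~0.0048
print(like_shifted)   # ~0.0080
```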

Now let us discuss the key differences between probability and likelihood in the next section.

Key Differences

There are several key differences between probability and likelihood. Let us discuss them one by one to clarify the idea behind them.

1. Measure

Probability is the measure of how likely an event is to happen, whereas likelihood is the measure of how well the parameters of the model fit the data; it focuses on the justification of the parameters according to the model and data.

2. Ranges

The value of the probability can range from 0 to 1, where 0 means the event surely will not happen and 1 means the event will surely happen. In contrast, the value of the likelihood ranges from 0 to any positive number, where the higher the value of the likelihood, the better the fit of the parameters to the model.

3. Calculations

Probability is a measure that is calculated before the event happens, meaning that we have the parameters and we calculate the probability of the event before it occurs. Likelihood, in contrast, is calculated after the event happens: we have the event that occurred and the parameters of the model, and we try to justify the parameter values against the data.

4. Functions

Probability is a function of the event, which means that the values of the model's parameters are kept constant and the probabilities of different events are calculated. Likelihood, in contrast, is a function of the parameters: the event is kept constant and the parameter values are changed to find the best-fit values.

5. Roles

The key role of probability is to give an idea about the chance of something or some event to happen or predict the outcomes, whereas the role of likelihood is to give us an idea about the fit of parameters and the model. It can help in setting best-fit parameters according to the data and the model.

6. Usecase

The probability can be used to make decisions, whereas the likelihood can mainly be used to test the hypothesis. For example, the probability can help us in making a decision whether to do this or not, whereas the likelihood can help us test the hypothesis like the coin is biased or not.

7. Complexity

Probability is a relatively less complex concept to understand, interpret, and use, whereas likelihood is more complex, as it involves calculating the best-fit parameters of the model, which can be tedious to understand.

Now let us discuss the key points to remember from this article.

Key Takeaways

1. Probability is the measure of the chance of something happening or some event occurring. It can help us in getting or predicting some outcomes based on the probabilities.

2. The likelihood is the measure of the fitness of parameters to the model. It gives us an idea of how well the parameters fit the data and the model. It helps in getting the best-fit parameters of the model.

3. While calculating the probability, the event changes but the parameters are kept constant, whereas, in the case of likelihood, the parameters change but the event is kept constant.

4. The probability is calculated before the event happens and the likelihood is calculated once the event has already happened.

5. The value of probability can range from 0 to 1, whereas the value of likelihood ranges from 0 to any positive number.

6. Probability is mainly used in the decision-making process whereas likelihood is used in testing the hypothesis of the model.

7. The probability is a relatively less complex process to understand, whereas the likelihood is more complex to understand and interpret.

Conclusion

In this article, we discussed probability and likelihood, the core intuition behind them, their formulas, how they are calculated, and the key differences between them with examples.

To conclude, probability can be understood as the likeliness (chance) of an event happening, whereas likelihood can be understood as the measure of the parameters' fitness to the model, which can be used to find the best-fit parameters that suit the model. The two can be thought of as opposite directions through the same tunnel.

This article will help one understand the core idea behind probability and likelihood, their calculation and formulas, and the key differences between them, with examples. It will also help one apply these concepts while building intelligent models in AI.

Parth Shukla
http://parthshukla.me
EnergyAI Engineer - A dynamic blend of an Energy Engineer by profession and an AI/ML wizard by passion.
