Confidence Interval — Helps better prediction in Machine Learning

Oct 18, 2024

To estimate the mean of an entire population, we take a sample and calculate the sample mean. Our aim is for the sample mean to be as close as possible to the population mean. Even if we make sure the observations in the sample are independent and identically distributed, we get a different sample mean every time we draw a new sample. So there is always some uncertainty about how accurate any single sample mean is.

Can we still use a sample mean to say something about the population mean with a known degree of certainty? Statisticians do this using a confidence interval.

A confidence interval is a range of values, bounded by a lower and an upper limit, that is constructed to contain the population parameter with a stated level of confidence.

Imagine that one day you’re walking along a road to visit your friend. When you get to your friend’s house, you realize you’ve dropped your key somewhere along the road. That’s okay, your friend says, I’ll help you look for it. You get in your friend’s car and drive back to the road where you dropped your key. On the way, you and your friend plan the search. First, you’ll park the car at your best guess of where you dropped it. Then you’ll each walk the same search distance in opposite directions along the road. You need to agree on this distance before you start, so that you both get back to the car at the same time. You could always park somewhere else, which would shift the entire stretch of road you search, but where you park is always going to be a guess. The thing that will have the biggest impact on whether you find the key is your search distance. So the big question is: how large should your search distance be? To decide, you think about how confident you need to be that you will actually find the key, and you express that as a percentage.

Let’s say that with a certain search distance you’re 80% confident you’ll find the key. But what if you left your oven on? Then you really need to find that key, so you’d agree to a bigger search distance. With a search distance that big, you’re 95% confident you’ll find the key. Of course, you’ll also have to search a much larger portion of the road. Now say you only have a little time to look. You could shrink your search distance, but your confidence level will drop too. There’s a tradeoff here: if you want a higher level of confidence, you need a larger search distance. You can think of the section of road you and your friend search as a single interval with a lower and an upper limit. It’s important to remember how that interval was constructed: at its middle is your best guess of the key’s true location, and you add a buffer on each side because you know your guess is probably wrong. The whole searched stretch (the orange line in the figure) is the overall confidence interval. I use this example from the physical world because it reinforces an idea that is easy to forget when you’re looking at probability distributions: the key never moves. Its location is unknown but fixed.

Let us assume our data is normally distributed with mean μ and variance σ². To start, suppose the variance σ² is known and only the mean μ is unknown. Take a random sample of size 1 and calculate its sample mean x̄ (with a single observation, the sample mean is just that observation). Define a random variable X̄ that describes the probability of getting different sample means. X̄ has the same mean as X, namely μ. So how far are most sample means from the true population mean μ?

If you know the range that contains most sample means, then a typical sample mean will also fall within that range. To describe that range, you need two related concepts: a margin of error, which is a distance on either side of μ, and a confidence level, which is the probability that your sample mean falls within that margin of error. To set these two values, you normally start with a third one, called the significance level and denoted by alpha (α).

The significance level is the probability that your sample mean falls outside your margin of error. From α you calculate your confidence level, 1 − α, which is the probability that a randomly generated sample mean falls within the margin of error.

Since α is usually a small number (0.05 is a common choice), 1 − α is usually close to one; α = 0.05 gives a 95% confidence level.

Intuition

The mean of the sampling distribution of the sample mean is always equal to the population mean μ, no matter what sample size you use. The standard deviation of that sampling distribution, however, is the population standard deviation divided by the square root of the sample size, σ/√n. So as the sample size n increases, the center stays at μ but the spread shrinks, which means the margin of error narrows. With larger samples, the sample mean tends to be closer to the population mean, which increases accuracy.

As the sample size increases, the confidence interval narrows down

To Summarize,

A confidence interval is a sample mean with a margin of error added to each side.
Confidence level: the proportion of such intervals, over repeated sampling, that contain μ (for example, 95%)
Ideally you have both high confidence and a narrow interval.
Larger samples (more data) will give a narrower interval
Decreasing confidence level will also shrink the interval.
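The two trade-offs in this summary can be checked numerically. The sketch below uses the margin of error z_{1−α/2} · σ/√n that is developed in the next section, with a hypothetical known σ = 10 and Python’s standard-library NormalDist for the critical value; the helper name margin_of_error is illustrative, not from any particular library.

```python
from statistics import NormalDist

def margin_of_error(sigma: float, n: int, confidence: float) -> float:
    """Half-width of the interval: z_{1 - alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the standard normal
    return z_crit * sigma / n ** 0.5

sigma = 10.0  # hypothetical known population standard deviation
for confidence in (0.90, 0.95, 0.99):
    # Full interval width is twice the margin of error.
    widths = [round(2 * margin_of_error(sigma, n, confidence), 2) for n in (25, 100, 400)]
    print(f"confidence={confidence:.2f}  widths for n=25, 100, 400: {widths}")
```

Reading across a row, quadrupling the sample size halves the width; reading down a column, a higher confidence level widens it.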

Margin of Error

To calculate a confidence interval, we need two ingredients: the sample mean and the margin of error. The margin of error is determined by the size of the sample you collected, the confidence level you’re aiming for, and the spread of the population (σ). Once you have the margin of error, you subtract it from and add it to the sample mean to get the lower and upper limits of the confidence interval. We already know how to calculate the sample mean, so let’s see how to get the margin of error.

Suppose we want to estimate the average height of a population. To do that, you take a sample of size n: you measure the heights of n people and calculate their average height, the sample mean x̄. That sample mean probably isn’t exactly the average height of the overall population, but it’s probably close. Your goal is to quantify how close. If you happen to choose many short people at random, x̄ will be lower than μ.

If you randomly happen to choose many tall people, x̄ will be higher than μ. X̄ is itself a random variable. It will be normally distributed with a mean of μ and a variance of σ²/n. The distribution is centered around μ and as you take larger samples, the spread of your distribution shrinks, increasing the probability your sample mean is closer to μ.
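A quick simulation makes the claim that X̄ is centered at μ with standard deviation σ/√n concrete. This is a sketch with hypothetical values (μ = 170 cm, σ = 10 cm, n = 25): it draws many samples from a normal population and looks at the spread of their means.

```python
import random
from statistics import fmean, pstdev

random.seed(0)
mu, sigma, n = 170.0, 10.0, 25   # hypothetical population of heights (cm) and sample size
num_samples = 10_000             # number of independent samples to draw

# Each entry is the mean of one random sample of size n.
sample_means = [
    fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(num_samples)
]

print(f"mean of sample means: {fmean(sample_means):.2f} (theory: {mu})")
print(f"sd of sample means:   {pstdev(sample_means):.3f} (theory: {sigma / n ** 0.5})")
```

The individual heights vary with standard deviation 10, but the sample means vary with a standard deviation close to 10/√25 = 2.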

For a normal distribution, roughly 68% of the probability lies within 1 standard deviation on either side of the mean, and roughly 95% lies within 2 standard deviations. The z-score of a point is the number of standard deviations it lies from the mean.

The name z-score is taken from the standard normal distribution, which is often called the z-distribution. You can easily convert normal distributions to the standard normal by subtracting the mean and dividing by the standard deviation.

The z-distribution has a mean of 0 and a variance of 1.
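The 68% and 95% figures and the standardization step can be checked with Python’s standard-library NormalDist. The height example (a hypothetical N(170, 10²) population) shows that a probability computed on the original scale equals the one computed on the z-scale.

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

std_normal = NormalDist(mu=0.0, sigma=1.0)  # the z-distribution

# Probability mass within 1 and 2 standard deviations of the mean.
within_1sd = std_normal.cdf(1) - std_normal.cdf(-1)
within_2sd = std_normal.cdf(2) - std_normal.cdf(-2)
print(f"within 1 sd: {within_1sd:.4f}")
print(f"within 2 sd: {within_2sd:.4f}")

# Standardizing: P(X <= 185) in N(170, 10^2) equals P(Z <= 1.5).
heights = NormalDist(mu=170, sigma=10)
z = z_score(185, 170, 10)
print(z, heights.cdf(185), std_normal.cdf(z))
```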

When using the z-distribution, the z-score of a point is simply its value. If you want the central 95% of the distribution, the z-scores (to two decimal places) are −1.96 and +1.96. If you randomly sample from the standard normal distribution, the result will fall inside that range 95% of the time and outside it 5% of the time.

Negative 1.96 and positive 1.96 are called critical values. They are cut-off points inside of which an exact percentage of a probability distribution is contained. To find these values, you’d either look them up in a pre-computed lookup table or use a software library. This isn’t something you would calculate by hand.

The first critical value is z_{0.025}. This notation means: find the z-score of the point that has 2.5% of the distribution to its left. This shaded region has a probability of 0.025. From a lookup table or software library you would get −1.96. The second critical value is z_{0.975}, the point with 97.5% of the distribution to its left, which is +1.96.

You want your critical values to exclude 5% of the distribution in total. That 5% is just the significance level, so here α = 0.05. That means the left critical value is z_{α/2} and the right critical value is z_{1−α/2}. The same holds if α is 0.10: your critical values are then z_{0.05} and z_{0.95}, which are the z-scores −1.645 and +1.645 (about ±1.65). Between these two values lies 90% of the distribution.
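Rather than a lookup table, a software library can return these critical values directly. Here is a minimal sketch using the inverse CDF of the standard normal from Python’s statistics module; critical_values is a hypothetical helper name.

```python
from statistics import NormalDist

def critical_values(alpha: float) -> tuple[float, float]:
    """Return (z_{alpha/2}, z_{1 - alpha/2}) for the standard normal."""
    z = NormalDist()  # mean 0, standard deviation 1
    return z.inv_cdf(alpha / 2), z.inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.10):
    lo, hi = critical_values(alpha)
    print(f"alpha={alpha:.2f}: {lo:.3f}, {hi:.3f}")
# alpha=0.05: -1.960, 1.960
# alpha=0.10: -1.645, 1.645
```

Because the standard normal is symmetric, the left critical value is always the negative of the right one.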

What about non-standard normal distribution?

You can still use these critical values, but since the distribution isn’t standardized, you need to multiply them by the standard deviation and measure from the mean. For a normal distribution, 95% of the probability lies within 1.96 standard deviations of the mean, so you multiply the critical value by σ. Now you’re ready to calculate the margin of error. You know that X̄ has a normal distribution centered at μ with variance σ²/n, where n is the number of observations in your sample. This means its standard deviation is σ/√n, where σ is the population standard deviation; this quantity is also called the standard error. Using this standard deviation, the range μ ± 1.96 σ/√n contains 95% of all sample means, and the margin of error is simply 1.96 · σ/√n.

If you choose a different value of α, you get a different margin of error: look up z_{1−α/2} and multiply it by the standard error.
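To see that the margin of error really captures 95% of sample means on the original (non-standardized) scale, the sketch below builds the sampling distribution N(μ, σ²/n) directly with NormalDist and measures its probability mass within μ ± margin. The values μ = 170, σ = 10, n = 25 are hypothetical.

```python
from statistics import NormalDist

mu, sigma, n = 170.0, 10.0, 25   # hypothetical population parameters and sample size
alpha = 0.05

standard_error = sigma / n ** 0.5                 # sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # z_{1 - alpha/2}, about 1.96
margin = z_crit * standard_error                  # margin of error

# Sampling distribution of the sample mean: normal with mean mu, sd sigma / sqrt(n).
xbar_dist = NormalDist(mu, standard_error)
coverage = xbar_dist.cdf(mu + margin) - xbar_dist.cdf(mu - margin)
print(f"standard error = {standard_error:.3f}, margin = {margin:.3f}, P(within margin) = {coverage:.3f}")
```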

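Our goal is to find the lower and upper limits of an interval for μ built around the sample mean.

We know that x̄ falls within the following range with probability 1 − α:

$$\mu - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Subtracting μ from all three parts gives

$$-z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Now subtract x̄ from all three parts and multiply by −1, which flips the inequalities:

$$\bar{x} - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}$$

So the confidence interval is

$$\bar{x} \pm z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}$$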

Calculation Steps of Confidence Interval

Find the sample mean x̄.
Choose the desired confidence level (1 − α).
Get the critical value z_{1−α/2}. For 95% confidence it is approximately 1.96.
Find the standard error σ/√n.
Find the margin of error, z_{1−α/2} · σ/√n.
Subtract and add the margin of error to the sample mean to obtain the confidence interval.
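The six steps above can be sketched as a single function for the known-σ case. The function name z_confidence_interval and the ten height measurements are hypothetical; the sketch assumes σ = 10 is known and the population is approximately normal.

```python
from statistics import NormalDist, fmean

def z_confidence_interval(sample: list[float], sigma: float,
                          confidence: float = 0.95) -> tuple[float, float]:
    """Confidence interval for the mean when the population sd sigma is known."""
    n = len(sample)
    xbar = fmean(sample)                           # step 1: sample mean
    alpha = 1 - confidence                         # step 2: confidence level is 1 - alpha
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # step 3: critical value z_{1 - alpha/2}
    standard_error = sigma / n ** 0.5              # step 4: standard error
    margin = z_crit * standard_error               # step 5: margin of error
    return xbar - margin, xbar + margin            # step 6: lower and upper limits

# Hypothetical height measurements in cm.
heights = [168.2, 172.5, 165.9, 180.1, 171.3, 169.8, 175.4, 162.7, 177.0, 170.6]
low, high = z_confidence_interval(heights, sigma=10.0)
print(f"95% CI for the mean height: ({low:.2f}, {high:.2f})")
```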

This calculation relies on a few assumptions:

Simple random sample
Sample size > 30 or population is approximately normal.

Difference between Confidence and Probability

Suppose you compute a sample estimate x̄ and a 95% confidence interval. If you interpret it as “intervals constructed this way contain the true population parameter 95% of the time,” you are correct. But if you say “there is a 95% probability that the population parameter falls within this confidence interval,” you are wrong.

There’s a subtle difference between these two statements, so let’s see what it is. Consider the population parameter μ, the population mean.

A defining characteristic of μ is that it is fixed but unknown; it is the quantity we are trying to estimate. Note that μ does not have a probability distribution, because it is not random, just unknown. It is always the same value for a given population. So for any given interval, μ is either in the interval or it is not, and that does not change; it does not fall within a specific interval 95% of the time. The sample mean, on the other hand, does have a probability distribution: the sampling distribution of the sample mean, whose value changes from sample to sample. The confidence interval is tied to the sample mean, so it also changes depending on the sample you draw. Saying that you’re 95% confident refers to repeating the sampling experiment many times and calculating an interval from each sample: about 95% of those intervals will contain μ.

This confidence level has to do with the success rate of constructing the confidence interval. It is not the probability that a specific interval contains the population mean because as we’ve seen, the population mean is either on the interval or not. So this is something that is very subtle, but it needs to be clarified.
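The “success rate” interpretation can be simulated directly: keep μ fixed, draw many samples, build an interval from each, and count how often the interval contains μ. This sketch assumes a hypothetical normal population with μ = 50, σ = 8 (treated as known) and samples of size 20.

```python
import random
from statistics import NormalDist, fmean

random.seed(42)
mu, sigma, n = 50.0, 8.0, 20          # hypothetical fixed population parameters
confidence = 0.95
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_crit * sigma / n ** 0.5    # same width every time because sigma is known

trials = 10_000
hits = 0
for _ in range(trials):
    xbar = fmean(random.gauss(mu, sigma) for _ in range(n))
    # mu never changes; only the interval moves from sample to sample.
    if xbar - margin <= mu <= xbar + margin:
        hits += 1

print(f"fraction of intervals containing mu: {hits / trials:.3f}")
```

Any single interval either contains μ or it does not; it is the long-run fraction of intervals that comes out close to 0.95.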

What if we don’t know the population standard deviation?

We found that when the population standard deviation σ is known, the confidence interval is x̄ ± z_{1−α/2} · σ/√n.

It is because X̄ is normally distributed that we can use the critical value z_{1−α/2}. More often than not, however, we do not know the standard deviation of the population. That’s a problem, because we can’t use σ in the formula. So what do we do if we still want a confidence interval?

We use the sample standard deviation s instead. We can’t use the population standard deviation, but we did take a sample, so we can compute s from it. This gives the standardized quantity (x̄ − μ)/(s/√n), in which σ is simply replaced by s. However, this quantity no longer follows a normal distribution. It follows a different distribution called Student’s t distribution.

In the figure, the orange dotted line is Student’s t distribution and the blue line is the normal distribution. The t distribution is very similar to the normal distribution, but it has fatter tails, meaning more of its probability lies far from the center. If you sample a point from a t distribution, it is more likely to be far from the center than if you sample it from a normal distribution. So how do we adjust for this extra variability and still get accurate confidence intervals?

Since we are no longer using the z-distribution, we replace it with the t distribution and use a t-score instead of a z-score. The margin of error becomes t_{1−α/2} · s/√n, which accounts for the extra uncertainty from estimating σ with s.

The t distribution also depends on its degrees of freedom, which here is n − 1.

As the sample size n increases, the t distribution gets closer and closer to the standard normal distribution, so the t critical values approach the z critical values.
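Here is a sketch of a t-based interval, assuming the population is approximately normal and σ is unknown. Python’s standard library has no t quantile function (in practice scipy.stats.t.ppf(q, df) is the usual choice), so the hypothetical helpers t_pdf, t_cdf and t_ppf below compute it from the t density using Simpson’s rule and bisection. The height data is hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """CDF via Simpson's rule on [0, |x|], using the symmetry of t around 0."""
    b = abs(x)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # probability between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_ppf(p: float, df: int) -> float:
    """Inverse CDF by bisection; only handles p in [0.5, 1), enough for upper critical values."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_confidence_interval(sample: list[float], confidence: float = 0.95) -> tuple[float, float]:
    """Confidence interval for the mean when sigma is unknown."""
    n = len(sample)
    xbar = fmean(sample)
    s = stdev(sample)                  # sample standard deviation (divides by n - 1)
    t_crit = t_ppf(1 - (1 - confidence) / 2, df=n - 1)
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# t critical values shrink toward the z critical value as degrees of freedom grow.
z_crit = NormalDist().inv_cdf(0.975)
for df in (4, 9, 29, 99):
    print(f"df={df:3d}: t critical = {t_ppf(0.975, df):.3f}  vs z critical = {z_crit:.3f}")

heights = [168.2, 172.5, 165.9, 180.1, 171.3, 169.8, 175.4, 162.7, 177.0, 170.6]  # hypothetical
print(t_confidence_interval(heights))
```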

Confidence Interval of Proportions

How do we estimate the proportion of a population that satisfies a certain condition, for example, owning a car? We cannot check the entire population, so we take a sample and count how many satisfy the condition. Say we take a sample of 30 people, of whom 24 own a car. The sample proportion p̂ is 24/30 = 0.8, so 80% of the sample owns a car. How can we calculate a confidence interval for the proportion?

The confidence interval is the sample proportion ± the margin of error, where the margin of error is the critical value times √(p̂(1 − p̂)/n). The critical value depends on the confidence level you choose. So the main difference from the interval for a mean is in how the standard error is calculated.
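Applying this formula to the car example (24 owners out of 30) gives the following sketch. It is the normal-approximation interval described above (often called the Wald interval); the function name proportion_confidence_interval is hypothetical.

```python
from statistics import NormalDist

def proportion_confidence_interval(successes: int, n: int,
                                   confidence: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n                                   # sample proportion
    standard_error = (p_hat * (1 - p_hat) / n) ** 0.5       # sqrt(p_hat * (1 - p_hat) / n)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * standard_error
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(24, 30)
print(f"p_hat = {24 / 30:.2f}, 95% CI = ({low:.3f}, {high:.3f})")
# p_hat = 0.80, 95% CI = (0.657, 0.943)
```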

Happy Learning!!

References: DeepLearning.ai