This lecture is part of a course on probability, estimation theory, and random signals. Welcome back to the lecture slide set on multiple random variables. In this video, we're going to look at the central limit theorem, and it's highly likely that you've come across the central limit theorem in a number of courses. The central limit theorem is a key concept in probability and statistics, because in many situations a random variable can be expressed as increments, or combinations, of other random variables. As a result, under certain conditions, if these incremental elements each have the same distribution, the central limit theorem tells us that the overall effect of the sum of these increments is Gaussian distributed. There are also other versions of the central limit theorem, for example where the increments are not independent.

Nevertheless, let's motivate the central limit theorem by considering an example which is actually in one of the previous self-study questions. Suppose that we have four IID random variables denoted by Xk, for k = 1 to 4, and that each of these random variables is uniformly distributed between -1/2 and +1/2, as shown in the diagram at the bottom. This is f(x), where I've dropped the subscript k just for convenience. Now, the self-study question asked us to compute and plot the probability density functions of the sums of those random variables: Y2 = X1 + X2, Y3 = Y2 + X3, and Y4 = Y3 + X4. In general, you could continue this, so that Yn = Yn-1 + Xn, and you can see that effectively each Yn is the previous Yn-1 plus an increment.

Now, in order to solve this problem, we can use the techniques from the topic on the sum of independent random variables, as shown on this handout, where we saw that the PDF of the sum of two independent random variables is indeed given by the convolution of the probability density functions of each of those random variables. So if we're summing independent and identically distributed random variables, then we're effectively convolving a PDF with itself. Using this convolution result, the PDF of Y2 is the PDF of X, as shown in this diagram, convolved with itself; the PDF of Y3 is the result of the previous step convolved again with the rectangular pulse; and you can carry this on, so the PDF of Y4 is the convolution of the PDF of Y3 with this rectangular pulse.

Now, this is an exercise for you to do, and I don't intend to cover it in this video. But as a starting point, we do know that if you convolve a rectangular pulse with a rectangular pulse, you get a triangular function. So, indeed, the convolution calculation for Y2 does yield this triangular pulse here, described by this analytical form here. To get the PDF of Y3, we now convolve this triangular function with the original uniform distribution f(x) again. Just as a hint for how to do these questions: notice that f(y2) here is described by effectively four regions, namely what happens when y is less than -1, when y is greater than +1, and the two regions in between. When we now convolve with a finite-duration pulse such as f(x) here, the resulting number of regions effectively increases by one, because you must imagine this rectangular pulse overlapping different regions at different points as it slides along.
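To pin down the recursion and the triangular result in symbols (my own notation for what I believe the slides show):

$$
f_{Y_n}(y) = (f_{Y_{n-1}} \ast f_X)(y) = \int_{-\infty}^{\infty} f_{Y_{n-1}}(u)\, f_X(y-u)\, \mathrm{d}u,
\qquad
f_{Y_2}(y) = \begin{cases} 1 - |y|, & |y| \le 1, \\ 0, & \text{otherwise}, \end{cases}
$$

where the four regions mentioned in the hint are y < -1, -1 <= y <= 0, 0 <= y <= 1, and y > 1.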
Anyway, if you work through that calculation, you find that the probability density function for Y3 is the piecewise combination of three different quadratic terms in y over these different regions here, and zero outside those regions. You'll notice that, because the region where f(y2) is non-zero ranges from -1 to +1, and the region of non-zero PDF of the uniform is between -1/2 and +1/2, the region of interest now extends from -1 - 1/2 to +1 + 1/2, that is, from -1.5 to +1.5. This is what the function looks like. Now, again, you convolve this with the uniform distribution to get the probability density function of Y4, and remember my comment about the number of regions increasing. We had five distinct regions here, bearing in mind that the regions where y is less than -1.5 and where y is greater than +1.5 are effectively trivial zero regions. So now we increase the number of regions by one, and you can see that there are four non-zero regions and two zero regions, and we now have cubic equations; obviously the order of the polynomial goes up by one each time.

But the most fascinating thing about this is that in this diagram, the black line is f(y4), and if you plot in the thin blue line here the closest Gaussian approximation, then the Gaussian fit and the actual polynomial form are incredibly close to each other. However, the PDF of Y4 actually only extends between -2 and +2, which makes sense, because if you're adding together four uniform random variables, each between -1/2 and +1/2, then the sum can only range between -2 and +2. The Gaussian, of course, continues, because the Gaussian has infinite extent, between minus infinity and infinity. So this really interesting result effectively says that if you add together four uniform random variables, then the probability distribution of the resulting random variable is almost Gaussian, and you've only had to add four uniform random variables together. And this is the central limit theorem: it tells us that if you keep adding together independent and identically distributed random variables, then the resulting random variable is Gaussian distributed.

So, just to verify this result numerically, I'm going to show you how to do this in MATLAB. I've posted this code on Learn, so you can go and experiment with it yourself. Let me just run you through the MATLAB code. The first line clears all variables; that's just good practice when you want to make sure that the code is repeatable. Then we close all figures, which is also just a tidy thing to do. Next, N = 4 is the number of random variables that we're going to add together, and M = 100,000 is the number of Monte Carlo experiments. The reason we're going to do this so many times is that if I just did it once, you would end up with one observation of the sum of the four random variables, and that doesn't allow you to build up an empirical histogram; you have to run it many times. The line y = zeros(M, 1) is just pre-allocating memory space, which is always quite useful. Then we use the built-in function called makedist, which makes a distribution object: we're going to make a uniform distribution, with parameters between -1/2 and +1/2.
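Since the file itself isn't reproduced here, the following is a minimal sketch of the kind of script being described; the line numbers quoted in the narration refer to the version posted on Learn, so they may not match this sketch exactly, and details such as variable names are my assumptions:

```matlab
% Monte Carlo demonstration of the central limit theorem (sketch).
clear all;                        % clear all variables for repeatability
close all;                        % close all figures

N = 4;                            % number of random variables to add together
M = 100000;                       % number of Monte Carlo experiments

y = zeros(M, 1);                  % pre-allocate memory for the sums

pd = makedist('Uniform', 'lower', -0.5, 'upper', 0.5);  % uniform on [-1/2, 1/2]

% Sanity check: empirical PDF of a single uniform random variable.
x = random(pd, M, 1);
figure; histogram(x, 'Normalization', 'pdf');

% Monte Carlo experiment: sum N IID draws, repeated M times.
for m = 1:M
    y(m) = sum(random(pd, N, 1));
end

% Empirical PDF of the sum; compare its shape with a Gaussian by eye.
figure; histogram(y, 'Normalization', 'pdf');
```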
I will come back to line 10 in a moment, when we change the distribution to a Rayleigh distribution. Lines 12 to 15 just plot some uniform random numbers; we're going to plot M = 100,000 of them, simply to check that we're generating the random variables as intended. Now, the key lines are lines 17 to 20. Here I'm doing my Monte Carlo experiment: for each trial, I generate four random variables according to the uniform distribution, which is held in this probability distribution object, pd. If you want to know more about these objects, use the help function in MATLAB. We add together those four random variables after generating them, and the loop does that 100,000 times; that's to make sure we can build up an empirical histogram. Lines 20 to 24 plot this histogram.

So if I run this code, we generate two figures. The first graph is an empirical histogram plotted as a probability density function, which means that the histogram counts have been normalized by the bin width and the total number of samples. You can see that it's approximately uniform. It's not perfectly uniform, because we've only generated 100,000 samples, but it's close to being flat. The second figure plots the histogram of the random variable Y, which is the sum of those four, again using the 100,000 realizations that we've generated. You can see that this is very Gaussian-like in shape: it is zero mean, and the extent of the distribution does appear to go from roughly -2 to +2. That really just verifies the calculations that we did on the slides.

So let's turn back to a more formal central limit theorem. An informal interpretation is that if you consider a random variable YN given by the sum of all of these individual independent random variables, then the distribution of YN, as N tends to infinity, is usually considered just to be Gaussian. That's a very informal phrasing, and there is a problem with being a little bit too informal. So let's consider the statistics of this random variable YN. Let's assume that the Xn's are IID, and that the mean and variance of Xn are finite and given by mu_x and sigma_x squared respectively. Now, the mean of YN, that is the expected value of YN, is the expected value of this sum, the sum from n = 1 to capital N of Xn. Using linearity as usual for the expectation, we can interchange the order of expectation and summation, and we get the sum of the expected values of Xn. But the expected value of Xn is mu_x, so the mean of YN is N times the mean of X. What happens to that mean as N tends to infinity? Well, mu_x was finite, but N tends to infinity, so our mean has shifted to infinity. Similarly, the variance of YN is given by the variance of this sum. Now, a wonderful result that I haven't discussed in previous videos, but which is worth mentioning now, is that the variance of a sum of independent random variables is the sum of the variances: the variance of A + B becomes the variance of A plus the variance of B, but only if A and B are independent. You can prove this yourself; it's quite straightforward, and I'm quite happy for you to use the result as long as you know when it applies. Now, the variance of each Xn, we're told, is sigma_x squared, so we see that the variance of YN is again N times the variance of X. So again, as N tends to infinity, the variance of YN tends to infinity as well.
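In symbols, the two statistics just derived are (the second using the independence result for the variance of a sum):

$$
E[Y_N] = \sum_{n=1}^{N} E[X_n] = N\mu_x,
\qquad
\operatorname{var}(Y_N) = \sum_{n=1}^{N} \operatorname{var}(X_n) = N\sigma_x^2,
$$

both of which diverge as N tends to infinity.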
So whilst informally the central limit theorem is well known to give a Gaussian, we need to be careful, because in this case the PDF of YN is Gaussian but has infinite variance and infinite mean, and that causes some conceptual difficulties. It's easier to deal with the central limit theorem by considering a normalized random variable. So let Xk, for k between 1 and N, be a collection of random variables that are independent and identically distributed for all k. What we're going to do is define a normalized random variable, Y-hat, which is the sum of the individual random variables, YN, but with the mean of YN subtracted from it and the result normalized by the standard deviation. As we know from the previous slide, the mean of YN is N times the mean of X, and similarly for the variance. In that case, the PDF of Y-hat of N approaches a normal distribution as N tends to infinity, but in this case with zero mean and unit variance. Why is it zero mean and unit variance? Well, the expectation of Y-hat of N is the expected value of YN minus the mean of YN, over sigma_y, and of course those two terms are equal, so the expectation equals zero. Similarly, if you do the same calculation for the variance of Y-hat of N, you'll be able to show that it equals one. So that central limit theorem result is incredibly important, and there are also some interesting caveats which we will look at in a moment.

Now, just before I continue, if I return to the example we had at the start of this video, we can see that the expected value of YN for our four IID uniform random variables is equal to zero. The variance of each Xn is the variance of a uniform, and if you remind yourself of what that was from the lecture notes, it is (b - a) squared over 12, where a and b are the lower and upper limits; in this case, you can see that it is 1/12. Therefore the variance of Y4 is four times 1/12, which gives you one third. This explains why this Gaussian shape here has zero mean and a variance of one third.

Now, to demonstrate that this is the case, let's return to the script that we had earlier. Instead of using the uniform distribution, which we've already tried, let's demonstrate it with another distribution. Here's a Rayleigh distribution: the Rayleigh distribution is positive only, and its density is of the form x times e to the minus x squared, up to constants. If I run this script, this is the shape of a Rayleigh distribution, and we can see that it's positive only. And here's the sum of four Rayleigh random variables: you can see that, again, it is in fact almost Gaussian. If I added even more of them together, it would end up being even more Gaussian-like.

Just to round off this video on the central limit theorem, I'm going to give an outline sketch of a proof, because, depending on what area you go into, it is useful to know where these results come from. Whilst the central limit theorem has been observed in practice, it equally needs a theoretical foundation, so these results do come from somewhere, and it turns out that we are equipped to actually derive them. Let me show you how. The first stage of the proof is to create a normalized version of each random variable. We define Zk, which is the original random variable Xk minus its mean, divided by its standard deviation. That means the mean of Zk is zero and the variance of Zk is equal to one, and this is the case for all of the Zk's, because all the Xk's are identically distributed.
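Collecting the normalization just described into one statement (again my notation, intended to match the slides):

$$
Z_k = \frac{X_k - \mu_x}{\sigma_x},
\qquad
\hat{Y}_N = \frac{Y_N - N\mu_x}{\sqrt{N}\,\sigma_x} = \frac{1}{\sqrt{N}} \sum_{k=1}^{N} Z_k
\;\longrightarrow\; \mathcal{N}(0, 1) \quad \text{as } N \to \infty.
$$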
So an alternative way of writing the sum that we had on the previous slide is as the sum of these Zk's, along with a one over square root of N factor in front. They are equivalent expressions, as written out above, and I encourage you to go and check this. Now, we're going to prove the result using characteristic functions. With characteristic functions, I'm going to use a simple result that I actually described in a previous video, about what happens if you scale a random variable. Take two random variables, U and V = aU, and suppose that the random variable U has a characteristic function Phi_U of xi. Then the characteristic function of V is the expected value of e to the j xi a U, and that can easily be written as the characteristic function of U but with the variable replaced by a times xi. That is basically the scaling-in-time theorem from Fourier theory.

So let's return to this normalized sum that we had on the previous slide. From the previous video, we know that if we have a sum of independent random variables, then the characteristic function of the sum is the product of the characteristic functions. That's because the PDF of Y-hat is the convolution of the PDFs of the Zk's, and we know that if you take the Fourier transform, convolutions become products. Because we've got the scaling term in front, we need to use the scaling result for characteristic functions from the previous slide, which simply gives us this result here. Now, the next key step is that, because the random variables Zk are all identically distributed, this immediately simplifies to the characteristic function of one of them raised to the power of N, and that's the result we have here. So, so far, we've got that the characteristic function of the sum of all these random variables is the characteristic function of one of the independent, identically distributed random variables, raised to the power N. How do we move forward?

At this point, you start using some expansions, or alternative representations, of characteristic functions. On this slide, which is just reordered slightly, we're starting off with this result here, but we notice that, by definition, the characteristic function of Z is the expected value of e to the j xi Z. I'm going to apply an expansion to the exponential: e to the theta is 1 plus theta plus theta squared over two factorial plus theta cubed over three factorial, and so on; in general, the terms are given by theta to the n over n factorial, summed from n = 0 to infinity. That's a quite standard result. In this case, theta is j xi Z, and so we apply the expansion to get this expression here. But notice that really it's the expectation of a sum, and so it becomes the sum of expectations; I've effectively applied the linearity property again. So this is an infinite summation of (j xi) to the n over n factorial, times the moments; this is a subexpression we saw in the chapter on scalar random variables, because this is exactly how we obtain moments from the characteristic function. Now we take this expansion and substitute it into the expression for the characteristic function of the sum, which is the expansion of the characteristic function of Z raised to the power of N. In the subsequent line, I've substituted in the moments, mu_z and sigma_z squared, written out the first two terms, and denoted the remaining higher-order terms. Now remember, N is going to be large.
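In symbols, the chain of characteristic-function results being described is (my notation):

$$
\Phi_{\hat{Y}_N}(\xi) = \prod_{k=1}^{N} \Phi_{Z_k}\!\left(\frac{\xi}{\sqrt{N}}\right) = \left[\Phi_Z\!\left(\frac{\xi}{\sqrt{N}}\right)\right]^{N},
\qquad
\Phi_Z(\xi) = E\!\left[e^{j\xi Z}\right] = \sum_{n=0}^{\infty} \frac{(j\xi)^n}{n!}\, E[Z^n] = 1 + j\xi\mu_z - \frac{\xi^2}{2}\,\sigma_z^2 + \cdots,
$$

where the quadratic coefficient uses E[Z squared] = sigma_z squared, which holds here since mu_z = 0.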
And therefore, when you're dividing by the square root of N, these higher-order terms are going to get much, much smaller than the low-order terms, and that's why we can discard them. Now, the next step is to notice that mu_z is equal to zero by definition, because of the way we normalized the variable, and sigma_z is equal to one. That gives us the characteristic function as this expression here. Now, this might not look particularly manageable, but in fact it is, because there is another identity we can use, the following limit, which is proved using other methods of calculus and is a standard result in mathematics: if you have the expression (1 + x/n) to the power n, then in the limit as n goes to infinity, it equals e to the x. So, looking at our expression, we can ignore the higher-order terms, and then effectively, in this identity, little n is equal to capital N, and x is equal to minus xi squared over two. Therefore you can see that the limiting operation here is indeed e to the minus a half xi squared. Now, this expression is actually the characteristic function of a normal distribution with zero mean and unit variance, and you could verify that result, because we determined the characteristic function of a multivariate Gaussian in a previous video; you could return to that. And so the theorem is effectively proved. Now, this is quite an informal proof, and there are some technicalities in what we've done here; there are more formal proofs which get around some of these limiting operations, and I encourage you to go and read those if you're interested. And as I mentioned at the start of the video, this is just one form of the central limit theorem, and there are many, many others which you can have a look at.

So if you want to verify the central limit theorem further, feel free to go back and play with this code. As a final example, let's try an exponential distribution. Now, for some distributions you will need to change how many random variables are added together, so here I've chosen N equal to 100, for one hundred exponentials. If we run that, you can see on the left-hand side the empirical distribution of one of the exponential random variables, but the sum of the 100 random variables, as shown here, is indeed Gaussian in shape. Because I chose an exponential parameter of three, the mean of the sum ends up at 300. So you can go and play with that a little bit more.

In summary, this video has discussed the central limit theorem. It showed how it works when adding together just four uniform random variables, but we've also looked at what happens if you add random variables with other distributions together, such as summing up exponentials, where you need to add many more than four to get the Gaussian shape. Nevertheless, you don't really need to send N to infinity; it can be reasonably finite. On Learn, you can go and play with the MATLAB script to investigate the central limit theorem. And finally, in this video, I've given an outline sketch of a proof, so that you can see basically where some of this theory comes from. So thank you very much.
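As a closing note on the code: based on the description above, swapping in a different distribution should only require changing the makedist line (line 10 in the narration) and, for slower-converging cases, the number of variables added together. The parameter values below are my own illustrative guesses at what was used:

```matlab
% Rayleigh case: positive-only density proportional to x*exp(-x^2/2).
% pd = makedist('Rayleigh', 'b', 1);

% Exponential case: with mean mu = 3 and N = 100 variables added together,
% the sum has mean N*mu = 300, matching the figure described above.
pd = makedist('Exponential', 'mu', 3);
N  = 100;   % many more than four are needed for a Gaussian-like shape
```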