This lecture is part of a course on probability, estimation theory, and random signals. Hello, and welcome back to this lecture slide set on stochastic processes. In this topic we are going to cover some remaining topics in this chapter on stochastic processes. We're going to look at joint statistics, namely the cross-correlation and cross-covariance functions. We're also going to look at representing a random process using a random vector, where we take a finite number of samples and concatenate them into a vector; this will lead to correlation matrices, auto-correlation matrices, and so on.

So let's start off with joint signal statistics. These are natural extensions of the statistics we came up with when we looked at multiple random variables: they describe the correlations and covariances between two different processes rather than within the same process. To explain this, let's first of all consider the cross-correlation function. Here I'm going to sketch a number of arbitrary realisations of a process. These are just drawn in continuous time and are completely arbitrary; they really are just squiggly lines, and it's quite hard to draw too many squiggly lines, so we just draw our best possible approximation. The cross-correlation simply asks: what is the correlation between the random signal x at time index n1 and the values of the second random signal y at time index n2? The easiest way of thinking about this is to imagine that we've actually got pairs of random processes, so for each particular outcome zeta you have x(n1, zeta) times y*(n2, zeta); these are effectively paired realisations which come from the random experiment. Remember, we normally draw the sample space, and for the different outcomes you end up with different realisations. What we're doing is looking at a particular time index n1 through the ensemble of the random process x, and similarly at a different time index, in this case n2, for the random process y. Then we're taking the product of the values across the pairs of realisations, x at time index n1 times y conjugated at time index n2, and averaging over those pairs. So really it's an extension of the idea of auto-correlation, the correlation function where you apply the same idea within a single random process.

The cross-correlation therefore measures the statistical similarity of one process with another. You're going to get a strong correlation if the pairs of realisations always vary coherently together, so that the average takes a relatively large value. I encourage you to think about the physical interpretation of this cross-correlation, but the intuitive one is that it simply measures the dependence between these two separate processes. Now you might ask why we are looking at cross-correlation in the first place. In the next handout we're going to look at what happens when signals go through systems. If I've got a signal x(n) that is the input to a system, and a signal y(n) at the output, then I'm mostly interested in the relationship between the input and the output of that system, and that's exactly where the idea of cross-correlation comes into play. We can also extend this to the cross-covariance: just as we had with random vectors, the cross-covariance is equal to the cross-correlation minus the product of the means.
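To make these quantities concrete, here are the definitions in the usual notation (a sketch for reference; the exact symbols may differ slightly from the handout), including the normalised version discussed next:

\[
r_{xy}(n_1, n_2) = \mathbb{E}\big[\, x(n_1)\, y^{*}(n_2) \,\big]
\]
\[
c_{xy}(n_1, n_2) = r_{xy}(n_1, n_2) - \mu_x(n_1)\, \mu_y^{*}(n_2)
\]
\[
\rho_{xy}(n_1, n_2) = \frac{c_{xy}(n_1, n_2)}{\sigma_x(n_1)\, \sigma_y(n_2)}, \qquad \lvert \rho_{xy}(n_1, n_2) \rvert \le 1
\]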
The cross-covariance is the central moment, and it's simply an extension of things that we've seen before. Something else we've seen before is the idea of normalising correlation functions, which we met back in the chapter on multiple random variables. We know that a correlation function can end up being a large number simply because the processes have large amplitudes, so in order to get a meaningful measure of cross-correlation, of statistical similarity, it makes sense to normalise with respect to the standard deviations. We end up with a function called the normalised cross-correlation, and you will see it coming up quite often. This normalised cross-correlation is bounded, and it has the properties which result from that; I'm not going to cover those right now.

Now, the ideas of cross-correlation and cross-covariance lead us to extend some of the ideas we've had for stochastic processes earlier. For example, the idea of statistical independence extends: if you have two processes x(n) and y(n), then these two random processes are statistically independent if the joint pdf of the two processes factorises into the pdf of one process times the pdf of the other. Here my notation is relatively loose and I'm just considering a single time instant for each process; you can generalise this to any number of time points for x and any number of time points for y, so the definition can be extended as you need it. We can also extend the definitions we've already seen for single random processes to joint stochastic processes. For example, two stochastic processes are uncorrelated if their cross-covariance is equal to zero; that's just an extension of something we've seen before, and equivalently it means that the cross-correlation equals the product of the means. Of course, when I say the product of the means, I mean at each time instant, so effectively the product of the mean sequences. Following on from the theory we've already seen, two jointly distributed stochastic processes that are statistically independent are also uncorrelated, but that's not necessarily true the other way round: if two random processes are uncorrelated, they are not necessarily independent. The exception to that rule is Gaussian random processes.

Now, this idea that two statistically independent processes have a cross-covariance equal to zero is a very powerful tool, and let me share an example of why. Back in the introduction I introduced the idea of blind source separation, and here is a graphical representation of a standard source-separation problem; let me go through and explain exactly what it is again. The idea in blind source separation is that you have a number of speakers, that is, a number of source signals. Each of these source signals goes through some kind of channel, in this case an acoustic channel: there is an impulse response between the source, which here is a person speaking, and the actual microphone. This impulse response is denoted by a transfer function, H11(z) for example, for the path between source one and microphone one.
At the same time, the second source also travels to the first microphone, and that path has the transfer function H21(z). Source two also arrives at microphone two, and source one also arrives at microphone two, so you get this mixture. If you listened to the signal at microphone one, you would hear two people speaking, overlapping at the same time; at the second microphone you would hear a similar overlapping mixture. The aim is simply to separate out these two signals. Although the two microphone signals sound very similar, there is actually enough information in them to separate the two talkers out. So we put them through what's called an un-mixing system. If we're able to choose the inverse transfer functions correctly, for example if we knew what H was, then we would be able to set W equal to the inverse of H. I'm not going to go into that detail now, but you get the general idea: we try to separate them out. For example, at the output of the un-mixing system one channel contains just one talker counting "one, two, three, four, five, six, seven, eight, nine, ten", and the other channel contains just the other talker counting "cuatro, cinco, seis, siete, ocho, nueve". Now the question is: how do we design the un-mixing system? If we don't know what the mixing system was, what's the best way of designing it? So let's have a look at how we might solve that.

One way to solve this problem is to assume that the signals spoken at the input to the system are statistically independent. The assumption is that the statistics of the source signals don't depend on each other: the statistics of one talker are quite distinct from the statistics of the second talker. When signals go through systems, those systems tend to introduce correlations; in particular, the mixing definitely introduces cross-correlations between the two signals at the microphones. So the idea would simply be to design an un-mixing system such that the statistics of the two signals at its output are independent of each other, because we've assumed that the signals at the input are independent of each other. You might suggest that two people speaking are not completely independent, because each is part of the same conversation, but we're working at the signal level rather than at any higher, behavioural level; looking at the signal level, at the signal statistics, which is why we're doing signal processing, we can say the source signals are independent. However, measuring independence is hard, and therefore, instead of trying to measure independence directly, which is possible using higher-order statistics and so forth, a simpler method is to calculate the cross-correlation, or indeed the cross-covariance. The idea is that if your signals are independent, then you would aim to minimise the cross-covariance, or at least the magnitude of the cross-covariance, because we know that the cross-covariance should be zero for independent signals. So what you do is start with an initial guess for W, then calculate the cross-covariance gamma_xy(l) over the lags l, and then adapt W as a result. There are various optimisation strategies for how you do the adaptation, but the idea is that you basically go around this loop several times until you converge on a W that minimises the cross-covariance.
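As a rough, much-simplified numerical illustration of what that loop is aiming for (a sketch under strong assumptions: an instantaneous 2-by-2 mixing matrix A stands in for the convolutive channels H(z), white Gaussian signals stand in for speech, and W is simply set to the known inverse of A rather than adapted blindly), the following Python snippet shows how mixing introduces cross-covariance between the observed signals and how a correct un-mixing drives it back towards zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent source signals (stand-ins for the two talkers).
N = 10_000
s = rng.standard_normal((2, N))

# Assumed instantaneous mixing matrix A (a memoryless stand-in for H(z)).
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s                                  # the two "microphone" signals

def cross_cov(a, b):
    """Sample cross-covariance at lag zero, estimated by a time average."""
    return np.mean((a - a.mean()) * (b - b.mean()))

print("sources:    ", cross_cov(s[0], s[1]))   # close to 0: independent sources
print("microphones:", cross_cov(x[0], x[1]))   # clearly non-zero: mixing correlates them

# Ideal un-mixing: W = A^{-1}.  A blind algorithm would instead adapt W
# iteratively to drive the output cross-covariance towards zero.
W = np.linalg.inv(A)
y = W @ x
print("un-mixed:   ", cross_cov(y[0], y[1]))   # back close to 0
```

In a real blind algorithm, W would be updated iteratively, for example by gradient descent on a measure of the output cross-covariance over several lags, which is exactly the loop described above.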
Now this is a topic well beyond this lecture course, but it is an application that shows how useful the cross-covariance can actually be. Blind source separation is a problem that has been studied for several decades, so a lot is known about it; if you're really interested in knowing more about blind source separation, or blind signal separation, then you might consider a project in, for example, statistical signal processing.

Just wrapping up on the different types of jointly distributed stochastic processes, we have natural extensions of the earlier definitions: orthogonal processes are ones where the cross-correlation is equal to zero, and we can extend the definition of wide-sense stationarity to jointly wide-sense stationary processes by assuming that the cross-correlation and cross-covariance depend only on the lag l. This is very useful, both because it allows us to estimate these statistics from single realisations using ergodicity, which I'll mention in a moment, and because, as we'll see in a future handout, it allows us to analyse signals going through systems, as well as to give a spectral description of joint stochastic processes. We also have the idea of joint ergodicity, which, as I just mentioned a moment ago, allows the cross-correlation and cross-covariance, for example, to be estimated from a time average. So all of these definitions are really just very natural extensions of what we did in the previous topics.

Now, moving on to a separate topic that ties together some of the ideas we've been discussing in the past few handouts, we can introduce the idea of a correlation matrix. This starts off by considering a snapshot of samples from a random process. What we can do is define an M-dimensional random vector; I'm going to use a capital letter here to denote random vectors, which may cause some confusion later when we start dealing with spectral properties of random processes, but for the moment let's just treat it as a capital letter. What we've done is take past samples of our random process and write them as a vector, using a transpose. You can see straight away that we can start defining things such as the correlation matrix as the expectation of that random vector times its own conjugate transpose, which is just a very natural extension of what we had in the chapter on multiple random variables and random vectors. We can also define the mean vector as the expected value of that random vector.

Now, why would we do that? The reason for introducing this random-vector view of stochastic processes is that it will help us define algorithms where we only need to take a finite number of past samples of our random process; it just becomes a very convenient representation when dealing with mean sequences or auto-correlation sequences. So here is our mean vector defined, and here is our correlation matrix, defined, as I said, by forming this outer product. I'll leave it up to you to work out the expansion; it's relatively straightforward, because if you take a column vector and multiply it by a row vector, each term in the correlation matrix is simply the expected value of one sample times the conjugate of another, and you can just expand it out.
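As a brief sketch of how such a correlation matrix could be estimated in practice (assuming ergodicity, so that a time average over snapshots stands in for the ensemble average; the AR(1) signal below is just an illustrative stand-in for a roughly wide-sense stationary process):

```python
import numpy as np

rng = np.random.default_rng(1)

# One long realisation of a (roughly WSS) random process: an AR(1) process
# is used here purely as a stand-in signal.
N = 100_000
x = np.zeros(N)
for n in range(1, N):
    x[n] = 0.8 * x[n - 1] + rng.standard_normal()

M = 3  # dimension of the random vector (number of past samples kept)

# Stack snapshots [x(n), x(n-1), ..., x(n-M+1)] as columns and average the
# outer products to estimate R = E[X X^H] (time average in place of the
# ensemble average).
snapshots = np.array([x[n - M + 1:n + 1][::-1] for n in range(M - 1, N)]).T
R_hat = (snapshots @ snapshots.conj().T) / snapshots.shape[1]

print(R_hat)   # approximately Toeplitz and symmetric for a WSS process
```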
Now, this has some interesting properties, and it has even more interesting properties if we assume that the random process for which we are creating these correlation matrices is wide-sense stationary. If it is wide-sense stationary, we see a couple of interesting things. We see that, because the correlation is now a function only of the time difference, all the elements on the main diagonal become the correlation at zero lag. More importantly, the correlations themselves no longer depend on the time index n, so effectively this becomes a constant matrix. There are a number of properties which I'll leave you to verify for yourself; they are described in more detail in the handout. Above all, we find that the correlation matrix is a constant matrix, and from the simplification of the auto-correlation functions you end up with what's called a Toeplitz structure, which I'll show you in a moment. There is also conjugate symmetry, and that means the correlation matrix simplifies to this structure. I'll leave you to verify this yourself, as I've already said, but what you'll see is that the main diagonal always contains the same value, the correlation at zero lag. You'll also see that each of the other diagonals has a constant value along it, and in fact the corresponding diagonals on either side are conjugates of each other. What that means is that the entire matrix is actually defined by its first row; everything else follows from the structure. So in fact we only need to know M correlation terms and our correlation matrix is completely defined in a very straightforward way. This is what is called a Toeplitz matrix, meaning it has this constant-diagonal band structure, and it is also Hermitian because of the conjugate symmetry.

Just as an example of correlation matrices, suppose a certain random process x(n) has this exponential form for the correlation function. We've seen this type of correlation function before: at zero lag the correlation has a value of four, and it then drops off in an exponentially decaying fashion in both directions; I have given several examples of this in previous videos. To create the correlation matrix for M equal to three, for example, where we've just taken three samples, then writing it out from the previous page, the first three terms are r_xx(0), r_xx(1) and r_xx(2). You've got the Toeplitz structure, which means you've got this banded structure, which makes it very simple: you merely need to substitute in the values for l equal to 0, 1 and 2. This is the type of correlation matrix you end up with, and it is clearly Toeplitz; it's also symmetric. You can also use the theory we had in previous lectures about testing whether a matrix is positive definite. This one certainly satisfies the basic checks: it's symmetric, and r_xx(0), which lies along the main diagonal, is greater than or equal to zero; you could then test for positive definiteness properly as we did in the earlier topic.

So, in summary for today, in wrapping up this chapter on stochastic processes I've covered a few loose ends, if you like: some definitions and some possible ways of manipulating stochastic processes which in themselves are not really worthy of a separate topic, but which are all important to know, in particular the topics of joint statistics and correlation matrices. These will now allow you to address the remainder of the self-study questions on stochastic processes. It's worth noting that at this point in the course you are beginning to bring together lots of different bits of theory, and as a result each question might draw on several elements that we've covered elsewhere. So, for example, there isn't going to be one specific question just on cross-correlation; rather, cross-correlation is a tool that you use to solve more complicated problems.
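Before closing, here is a small numerical companion to the Toeplitz correlation-matrix example above (a sketch: the zero-lag value of four comes from the example, but the exponential decay factor of 0.5 below is an assumed, illustrative value, since the exact figure from the slide is not reproduced here). It builds the three-by-three Toeplitz matrix and checks positive definiteness via the eigenvalues:

```python
import numpy as np
from scipy.linalg import toeplitz

# Assumed illustrative autocorrelation: r_xx(l) = 4 * (0.5)**|l|
# (the zero-lag value 4 is from the example; the decay factor 0.5 is made up).
def r_xx(l):
    return 4.0 * 0.5 ** abs(l)

M = 3
first_row = np.array([r_xx(l) for l in range(M)])   # r_xx(0), r_xx(1), r_xx(2)
R = toeplitz(first_row)   # symmetric Toeplitz: the first row defines the whole matrix

print(R)
print("symmetric:", np.allclose(R, R.T))
print("eigenvalues:", np.linalg.eigvalsh(R))   # all positive => positive definite
```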
Thank you very much.