This presentation is about cohort studies and their role in epidemiology. Thanks to Nanker Halbert Smart and Sarah Wild for sharing their materials. What will this presentation cover? We'll look at what a cohort study is and what its key features are. We'll look at some examples of influential cohort studies that have improved our understanding of health and disease. We'll look at how data are analysed in a cohort study. We'll look at the strengths and limitations of cohort studies. And I'll highlight what to look out for when you're critically appraising a cohort study. This is a recap from the previous presentation on cross-sectional studies. In that presentation, we highlighted a key weakness of cross-sectional studies: because they collect data on exposures and outcomes at a single point in time, they cannot determine which comes first. Did the exposure precede the outcome, or did the outcome precede the exposure? We used the example of a cross-sectional survey of London Transport workers: bus drivers with sedentary jobs and bus conductors with more physically active jobs. If we had conducted a cross-sectional survey at point B on the slide, we would have found that, whereas none of the more active bus conductors had angina, 9.5 per cent of the sedentary bus drivers had angina. This might lead us to conclude that having a sedentary job, such as being a bus driver, increases the risk of angina. But we have to be very careful here. We have to remember that a survey conducted at point B is just a snapshot in time. We don't know whether the bus drivers got angina because of their sedentary jobs or whether they chose sedentary jobs because of their poor health. We don't know which came first, the sedentary work or the angina, the exposure or the outcome. This is a key drawback of cross-sectional studies, and there's no way round it. If we want to figure out which came first, the exposure or the outcome, we need a different study design.
We need to start our study at point A with a healthy cohort of workers in which nobody has angina. We then need to follow up this cohort of workers over time, keeping track of their exposures (what job they do, whether they change jobs and so on) and also keeping track of their outcomes (whether they develop angina). It's only this sort of study, a cohort study, that enables us to determine which comes first, the exposure or the outcome. In other words, it's only by using a cohort study that we can establish a temporal relationship between an exposure and an outcome. The easiest way to understand the mechanics of a cohort study is to sketch it out in a diagram like the one on the slide. At the beginning of the study (the box on the left), all the study participants should be free of the outcome of interest. Some of them should have the exposure of interest and some of them should not. So, e.g., if we were doing a cohort study to investigate the association between smoking and lung cancer, at the beginning of the study none of the participants should have lung cancer; some should be smokers and some should be non-smokers. Next, the researchers follow up the participants over time. At the end of the study (the box on the right), they count the number of people with the outcome in the exposed group and in the unexposed group. So they would count the number of smokers with and without lung cancer and the number of non-smokers with and without lung cancer. If the proportion of smokers with lung cancer is bigger than the proportion of non-smokers with lung cancer, this would suggest that smoking is associated with lung cancer. This type of cohort study is called a prospective cohort study. If something is prospective, it means it's expected to happen in the future: the researchers have to wait for the outcome to occur.
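The mechanics of a prospective cohort study described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real analysis pipeline: the `Participant` record, its field names and the example data are all hypothetical. It shows the two key steps: excluding anyone who already has the outcome at baseline, then comparing the proportion who develop the outcome in the exposed and unexposed groups.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical record for one person considered for the cohort."""
    pid: int
    exposed: bool              # e.g. a smoker at the start of the study
    outcome_at_baseline: bool  # e.g. already had lung cancer when recruited
    outcome_at_end: bool       # e.g. diagnosed with lung cancer during follow-up


def enrol(candidates):
    """Keep only people who are free of the outcome at the start of the study."""
    return [p for p in candidates if not p.outcome_at_baseline]


def proportion_with_outcome(cohort, exposed):
    """Proportion of the exposed (or unexposed) group who developed the outcome."""
    group = [p for p in cohort if p.exposed == exposed]
    if not group:
        raise ValueError("no participants in this group")
    return sum(p.outcome_at_end for p in group) / len(group)


# Hypothetical example data, not from any real study
candidates = [
    Participant(1, exposed=True, outcome_at_baseline=False, outcome_at_end=True),
    Participant(2, exposed=True, outcome_at_baseline=False, outcome_at_end=False),
    Participant(3, exposed=True, outcome_at_baseline=True, outcome_at_end=True),  # excluded at enrolment
    Participant(4, exposed=False, outcome_at_baseline=False, outcome_at_end=False),
    Participant(5, exposed=False, outcome_at_baseline=False, outcome_at_end=True),
    Participant(6, exposed=False, outcome_at_baseline=False, outcome_at_end=False),
    Participant(7, exposed=False, outcome_at_baseline=False, outcome_at_end=False),
]

cohort = enrol(candidates)
p_exposed = proportion_with_outcome(cohort, exposed=True)
p_unexposed = proportion_with_outcome(cohort, exposed=False)
print(f"Exposed: {p_exposed:.2f}, unexposed: {p_unexposed:.2f}")
```

Participant 3 is dropped at enrolment because they already had the outcome. In this made-up data, half of the exposed group and a quarter of the unexposed group develop the outcome, which would suggest an association between the exposure and the outcome.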
One of the challenges of doing a prospective cohort study is that it takes a long time, sometimes decades, to get the results, and so it's very expensive. There is an alternative, called a retrospective cohort study. Instead of waiting decades for the outcomes to occur, the researchers look back in time to collect data on exposures and outcomes that have already happened. Nowadays, with the availability of routinely collected data, this is much easier than it used to be, and the approach is becoming much more common. Here's an example of how a retrospective cohort study would work in practice. Let's say that the researchers are again interested in smoking and lung cancer. They might choose a group of people who are alike in many ways but differ with respect to the exposure. So, e.g., they might choose as their study population female nurses, some of whom smoke and some of whom don't. They would then collect data on the exposure, which is smoking, and the outcome, which is lung cancer, from the medical records of this study population. It still follows the same principles: all the participants are free of the outcome at the start of the study period. The difference is that the study period begins sometime in the past. Here's a summary of the pros and cons of these two approaches. In general, prospective cohort studies are more likely to have complete, reliable data. Retrospective cohort studies rely on routinely collected data, so data on important confounding factors such as smoking status are often missing. On the other hand, retrospective cohort studies are quicker, cheaper and easier to conduct, and they are better for diseases with long latency periods because you don't have to wait years for participants to develop the outcome. Let's look at some examples of influential cohort studies that have contributed to advances in our understanding of disease risk factors.
The figure on this slide is taken from a very famous cohort study on smoking and lung cancer. The epidemiologists Richard Doll and Austin Bradford Hill used British doctors as their study population because they thought that doctors would be a relatively easy group to keep in touch with and to follow up. In 1951, they sent out questionnaires to British doctors and recruited around 40,000 of them to the study. The number of female responders was quite small, so they decided to restrict the study to men. They followed up this group of doctors several times over the next few decades. They published a paper in 1956 showing that the death rate from lung cancer among heavy smokers was 20 times the death rate in non-smokers. They also showed that lung cancer death rates were substantially higher among cigarette smokers than among pipe or cigar smokers, and that the heavier the smoker, the more likely he was to die of lung cancer. An important advantage of cohort studies is that it's possible to collect data on more than one cause of death. This study was also able to show that men who smoked were more likely than non-smokers to die from coronary heart disease. The last paper from this study was published by Doll and colleagues in 2004, just a year before he died. By this time, the idea that smoking causes lung cancer was well established. However, the British Doctors Study was still making a valuable contribution to our understanding. The 2004 paper showed that smoking lowered life expectancy by an average of ten years, and that half of those who smoked were killed by their habit, as shown in the image on the slide. Another highly influential cohort study is the Framingham Heart Study, which is still going on more than half a century after it was set up. In the middle of the 20th century, little was known about the general causes of heart disease and stroke.
But death rates from cardiovascular disease had been increasing steadily in the USA since the beginning of the century, in common with other developed countries. In 1948, the Framingham Heart Study was set up in the small town of Framingham, Massachusetts, to identify the common factors that contribute to cardiovascular disease by following its development over a long period of time in a large group of participants who had not yet developed overt symptoms of cardiovascular disease or suffered a heart attack or stroke. The original study involved over 5,000 men and women between the ages of 30 and 62, who returned to the study every two years for a detailed medical history, physical examination and laboratory tests. Over the subsequent decades, several more cohorts were recruited, comprising the adult children and grandchildren of the original cohort. And because the original cohort was overwhelmingly white, an additional cohort was established to reflect the ethnic diversity of the population. Over the years, the Framingham Heart Study has identified the major cardiovascular risk factors, including high blood pressure, high blood cholesterol, smoking, obesity, diabetes and physical inactivity, as well as providing a great deal of valuable information on the effects of related factors such as blood triglyceride and HDL cholesterol levels, age, gender and psychosocial issues. This has led to the development of effective treatment and preventive strategies. The image on the slide shows some data from this study, which illustrate that although all-cause and cardiovascular disease mortality rates have fallen over the period of the study, mortality rates are consistently higher for those with diabetes than for those without. The graph on this slide is taken from a famous series of studies conducted in the UK called the Whitehall Study of British civil servants.
This study was very influential in demonstrating the importance of social factors in determining our health and life expectancy. The Whitehall Study was a long-term cohort study of the health of men employed as civil servants in Britain. The study sample encompassed civil servants from the top of the civil service down to clerical and unskilled staff at the bottom. In other words, there was a social hierarchy of participants: the administrators at the top, in pale purple on the graph, followed by the professional and executive staff in dark green, then the clerical staff in pale green, and the ancillary staff (drivers, caterers, cleaners and so on) in blue at the bottom of the social hierarchy. The graph shows the relative death rates of these different cadres of civil servants at different ages. The first set of bars shows the death rates among 40 to 64 year olds, the next set shows the death rates among 65 to 69 year olds, and so on. What this study showed was that not only did the people at the top of the hierarchy live longer than the people at the bottom, but there was a strong correlation between position in the social hierarchy and death rates. The administrators had the lowest death rates, followed by the professional and executive staff, followed by the clerical staff, followed by the ancillary staff. In other words, the health of these men followed a social gradient. The researchers looked at one disease, coronary heart disease, in more detail to see if they could explain why this was happening. They found that one-third of the social gradient in coronary heart disease mortality was attributable to smoking, high plasma cholesterol, high blood pressure, being overweight and having low levels of physical activity. So some of the difference was accounted for by differences in lifestyle.
People further down the social hierarchy were more likely to smoke, less likely to be physically active and more likely to be overweight than people higher up the social hierarchy. However, that left two-thirds of the social gradient in coronary heart disease mortality unexplained: even after taking lifestyle factors into account, death rates from coronary heart disease were inversely related to social position. We still don't fully understand why this is the case. One possible explanation is that chronic stress, which has a disproportionate impact on poorer people, might be to blame. However, more research is needed in this important area. I now want to look briefly at how cohort studies are analysed. You can see that at the end of a cohort study there are four potential groups. Group A is the exposed group who developed the outcome of interest, e.g. smokers who have developed lung cancer. Group B is the exposed group who did not develop the outcome of interest, so smokers who don't have lung cancer. Group C is the unexposed group who developed the outcome of interest; that would be non-smokers who developed lung cancer. And finally, Group D is the unexposed group who did not develop the outcome of interest, so non-smokers who don't have lung cancer. This information can be transferred onto a two-by-two table, like the one on the slide. The first thing you might do in a cohort study is to examine the incidence of the outcome in the exposed population, e.g. the incidence of lung cancer among smokers. This is calculated by dividing the number of smokers with lung cancer by the total number of smokers, so a divided by (a + b). In other words, this is the proportion of the exposed participants who went on to develop the outcome during the period of the study. Next, you can work out the incidence of the outcome in the unexposed population, in other words, the incidence of lung cancer in non-smokers.
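The two-by-two table and the incidence calculations can be sketched in Python. This is an illustration with made-up counts, not data from any real study, and the `TwoByTwo` class name is an assumption of this sketch. The cells follow the lecture's labels: a = exposed with the outcome, b = exposed without, c = unexposed with, d = unexposed without.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical container for the four groups at the end of a cohort study."""
    a: int  # exposed, developed the outcome (e.g. smokers with lung cancer)
    b: int  # exposed, did not develop the outcome
    c: int  # unexposed, developed the outcome (e.g. non-smokers with lung cancer)
    d: int  # unexposed, did not develop the outcome

    def incidence_exposed(self) -> float:
        """Proportion of exposed participants who developed the outcome: a / (a + b)."""
        return self.a / (self.a + self.b)

    def incidence_unexposed(self) -> float:
        """Proportion of unexposed participants who developed the outcome: c / (c + d)."""
        return self.c / (self.c + self.d)


# Made-up counts for illustration only
table = TwoByTwo(a=30, b=970, c=5, d=1995)
print("Incidence in exposed:", table.incidence_exposed())      # 30 / 1000
print("Incidence in unexposed:", table.incidence_unexposed())  # 5 / 2000
```

With these invented numbers, 3 per cent of smokers and 0.25 per cent of non-smokers develop lung cancer during follow-up.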
The incidence in the unexposed population is calculated as the number of lung cancer cases who are non-smokers divided by the total number of non-smokers, so c divided by (c + d). In other words, it is the proportion of unexposed participants who went on to develop the outcome during the period of the study. At this point, I want to introduce a new metric, relative risk, also referred to as the risk ratio. Relative risk is the main measure of association produced by cohort studies. Relative risk is defined as the incidence of the outcome in the exposed population divided by the incidence of the outcome in the unexposed population, so in our example, the incidence of lung cancer in smokers divided by the incidence of lung cancer in non-smokers. It is calculated as shown on the slide: [a / (a + b)] divided by [c / (c + d)]. It's a measure of the association between the exposure and the outcome. So how do you interpret relative risk? You can see from the equation that if the incidence in the exposed population is exactly the same as the incidence in the unexposed population, then the relative risk will be equal to one. So if you get a relative risk of exactly one, it indicates that there is no association between the exposure and the outcome. If the incidence in the exposed population is greater than the incidence in the unexposed population, the relative risk will be greater than one. A relative risk greater than one indicates that the exposure is a risk factor for the outcome, in other words, that there's a positive association between the exposure and the outcome. Conversely, if the incidence in the exposed population is less than the incidence in the unexposed population, the relative risk will be less than one. A relative risk less than one indicates that the exposure is protective against the outcome, or in other words, that there's a negative association between the exposure and the outcome. This latter situation might occur, e.g.
if the exposure is a vaccine, which you would expect to protect against the development of disease. Like any study design, cohort studies have their advantages and disadvantages. Let's start with the advantages. Cohort studies are particularly good for rare exposures. This is because you select your study sample on the basis of the exposure. If, e.g., you wanted to investigate the impact of exposure to a particular pesticide on the risk of Parkinson's disease, you would set out to recruit people known to have been exposed to that pesticide. Another advantage is that cohort studies enable you to study multiple outcomes. So, in the example I've just given, you could study the association of pesticide exposure with a whole range of different disease outcomes. Another advantage of cohort studies, as I discussed in the introduction to this presentation, is that they can provide data on the temporal relationship between the exposure and the outcome. In contrast to cross-sectional studies, you can be sure that the exposure was present before the outcome occurred. Compared with case-control studies, which we will look at next, there's a lower risk of selection bias in cohort studies. This is particularly true for prospective cohort studies. The final advantage of cohort studies is that you can estimate the incidence of disease, as we have seen. However, cohort studies have disadvantages too. They are inefficient for rare outcomes: if it is a rare disease, you have to wait a long time, or have a very large study, to accumulate enough cases. Cohort studies are relatively expensive and time-consuming, particularly prospective cohort studies. Finally, cohort studies have a high risk of loss to follow-up, in other words, people being recruited into the study and then later dropping out. This is a really serious problem because it can distort and bias the results of the study if the dropouts differ in important respects from those who remained in the study.
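The relative risk described earlier in this presentation, and the rule for interpreting it, can be sketched in Python. This is a minimal sketch using invented counts; `relative_risk` and `interpret` are hypothetical helper names, and a real analysis would also report a confidence interval, which this sketch omits.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk ratio: [a / (a + b)] / [c / (c + d)].

    Rearranged as a * (c + d) / (c * (a + b)) so that the whole-number
    products are exact and the result is rounded only once.
    """
    if c == 0:
        raise ZeroDivisionError("no cases among the unexposed: relative risk is undefined")
    return (a * (c + d)) / (c * (a + b))


def interpret(rr: float) -> str:
    """Translate a relative risk into the wording used in this presentation."""
    if rr > 1:
        return "positive association: the exposure is a risk factor"
    if rr < 1:
        return "negative association: the exposure appears protective"
    return "no association"


# Invented counts: smoking (exposure) and lung cancer (outcome)
rr_smoking = relative_risk(a=30, b=970, c=5, d=1995)
# Invented counts: a vaccine (exposure) and infection (outcome)
rr_vaccine = relative_risk(a=2, b=998, c=10, d=990)

print(rr_smoking, interpret(rr_smoking))
print(rr_vaccine, interpret(rr_vaccine))
```

With these made-up numbers the smoking example gives a relative risk of 12 (the risk among smokers is twelve times the risk among non-smokers), and the vaccine example gives 0.2, consistent with a protective effect.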
This is the final slide of the presentation. What should you look out for when critically appraising a cohort study? The points on this slide are summarised from the Joanna Briggs Institute (JBI) and Critical Appraisal Skills Programme (CASP) critical appraisal tools. Firstly, pay attention to how participants were selected. The two groups, the exposed and the unexposed, should be recruited from the same population. They should be as alike as possible, except for the exposure of interest. Think about the generalizability of the results of the cohort study. Cohort studies often use occupational groups as their study population, but this limits the generalizability of the results of the study to the general population. Secondly, pay attention to how the exposure and the outcome were measured. This should be done objectively, not subjectively, and any tools should ideally be validated. E.g. if the outcome is depression, a validated tool to measure depression is much more reliable than relying on self-reported depressive symptoms. Is it clear that participants were free of the outcome at the start of the study? This is a really important feature of a cohort study. So how do you know that? What is there in the study that demonstrates that participants were indeed free of the outcome at the start of the study? Another important point about the measurement of exposures and outcomes is that the same approach should be used in both groups to avoid measurement bias. So the exposed and the unexposed should be treated in exactly the same way in how they are measured and how they are followed up throughout the study. Next, have potential confounding factors been identified and measured in the study participants? Are any really important confounding factors missing? Have confounding factors been taken into account in the analysis of the results?
So in the analysis section, look out for where the results have been adjusted for confounding factors, whether regression methods have been used in the analysis, whether sensitivity analyses have been done and so on. Pay attention to how participants were followed up. What follow-up time is reported, and was it long enough for the outcomes to occur? Were all participants followed up, or were there dropouts from the study, which is called loss to follow-up? If there was loss to follow-up, are the reasons for it described and explored? Was there anything special or different about the people who left the study compared with the people who stayed, and what strategies were used to maximise follow-up? Those are the key elements that you need to take into account when you're critically appraising a cohort study. But you should always use a defined tool to do your critical appraisal, and the Joanna Briggs Institute and CASP tools are two that you could use.
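One simple sensitivity analysis for loss to follow-up is an extreme-case (best-case/worst-case) analysis: recompute the relative risk assuming the people who dropped out had the most favourable and the most unfavourable outcomes possible. The sketch below is one illustrative approach, not a method prescribed by the JBI or CASP tools; the function names and all counts are hypothetical.

```python
def risk_ratio(cases_exp: int, n_exp: int, cases_unexp: int, n_unexp: int) -> float:
    """Relative risk from case counts and group sizes (assumes cases_unexp > 0)."""
    return (cases_exp * n_unexp) / (cases_unexp * n_exp)


def loss_to_follow_up_bounds(a, b, lost_exp, c, d, lost_unexp):
    """Return (complete-case RR, lowest possible RR, highest possible RR).

    a, b, c, d are the two-by-two counts among people who completed follow-up;
    lost_exp and lost_unexp are the numbers recruited but lost to follow-up.
    Assumes c > 0, otherwise the relative risk is undefined.
    """
    complete_case = risk_ratio(a, a + b, c, c + d)
    n_exp = a + b + lost_exp
    n_unexp = c + d + lost_unexp
    # Lowest RR: no lost exposed person had the outcome, every lost unexposed person did
    lowest = risk_ratio(a, n_exp, c + lost_unexp, n_unexp)
    # Highest RR: every lost exposed person had the outcome, no lost unexposed person did
    highest = risk_ratio(a + lost_exp, n_exp, c, n_unexp)
    return complete_case, lowest, highest


# Invented example: 10% of each group lost to follow-up
complete, low, high = loss_to_follow_up_bounds(a=27, b=873, lost_exp=100,
                                               c=9, d=1791, lost_unexp=200)
print(f"complete-case RR = {complete:.1f}, extreme-case range {low:.2f} to {high:.1f}")
```

In this made-up example the complete-case relative risk is 6, but the extreme-case range runs from about 0.26 to about 28, so the conclusion could change completely depending on what happened to the people who dropped out. This is why the appraisal questions about loss to follow-up matter.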