Taking on the challenge of “big data” to enhance the value of longitudinal cohort studies

Lydia Marshall
18 February 2020

“Future proofing” longitudinal studies means emphasising their relevance alongside and in combination with “big data”.

Longitudinal surveys are resource-intensive, expensive to run and demand a lot from the people who take part. Meanwhile, a wealth of information has become available in administrative and commercial datasets. For example, many governments and companies hold huge amounts of data about individual people, from national school exam results and tax records through to statistics on what people buy. One challenge facing longitudinal studies is the idea that this “big data” means that we can now find out everything we need to know without directly contacting participants and without paying to conduct surveys.

Longitudinal researchers, like our team at Young Lives, dispute this idea. Survey research can offer unparalleled insights into people’s attitudes, values, behaviours and decision-making. It can also address major blind spots in administrative datasets, which can often overlook or misrepresent certain voices from society. Longitudinal surveys in particular can help us to understand how early circumstances can shape people’s lives, and to track the impact of different policies and demographic changes.

In January, a group of us from Young Lives attended the CLOSER conference on “international approaches to challenges facing the longitudinal population studies”. I was pleased that discussion at the conference did not fall into the trap of bullishly ploughing ahead with survey data to prove it is “better than” big data. Instead, there was a shared sense that “future proofing” longitudinal studies means emphasising their relevance alongside and in combination with big data. Data linkage was seen as a huge opportunity both for conducting powerful research and for convincing funders and participants of the value of our work.

Linking to other datasets can support powerful research…

At the conference, we heard about numerous examples of interesting and impactful research projects making use of data linkage. These included linking survey data to medical records to understand the social determinants of health in older age, linking to employment data to examine social mobility, and linking to geographical information to investigate the impact of outside influences – for instance the availability of fast food – on children’s health and nutrition. All of these examples demonstrated that linking survey data to other datasets can support really powerful research. The crucial takeaway for me was that such powerful research must be hypothesis-led: identifying a key research question or policy problem, and then exploring the data available to answer that question, rather than “linking for linkage’s sake”.

…and publishing and amplifying powerful research is the best way to make the case for our surveys.

Discussions at the conference also really reinforced for me the idea that data linkage can help to maximise the resonance and relevance of our research.

If we want to carry on successfully conducting longitudinal surveys, we need to convince participants to continue to take part and funders to continue investing in our studies. And, if we do want to link to other datasets, we need to persuade data owners such as government bodies to let us do so.

It was widely agreed at the conference that the best advert for our survey data is the research we can produce from it and how we communicate that research. If our cohorts don’t see the relevance of our research to their lives, or understand how it could make a difference, why would they want to continue to be involved? Why would data owners and research participants agree to data linkage if they don’t see the purpose of doing so? And why would research funders continue to invest in our studies if we aren’t communicating their significance and value?

This means that we need to amplify the resonance and relevance of the work that we are doing. Making the most of our data by using it as a ‘platform’ for data linkage is one exciting way that we can do this. For example, there is a lot of debate in the UK at the moment about the influence of social media use on children’s health. Linking longitudinal surveys to social media data would help us to inform this debate, and this research would likely resonate in the media, policy arena and beyond. Another way to amplify our impact is to encourage other researchers (with ideas other than our own) to make use of our data, whether on its own or linked to other data that they have access to.

Data linkage offers endless opportunities for future Young Lives work. However, it also throws up challenges. There was a general consensus at the conference that our technological capacity for data linkage has developed more quickly than the ethical frameworks and standards for best practice needed to support it. The CLOSER conference was a great opportunity to reflect on Young Lives’ own work, meet researchers, data managers and communications and engagement staff working on longitudinal studies across the globe, and learn from their experiences. Such events seem essential for developing shared frameworks and standards, and for ensuring that we make the most of the rich data produced by our longitudinal studies.