Conversations with Young Lives' Data Managers
Ahead of the publication of our new report on data management, we sat down to speak with some of Young Lives’ Data Managers, past and present, about their experiences of the role.
This is the first conversation in the series, with Anne Yates Solon, Monica Lizama, Tien Nguyen, Shyam Sunder and Hazel Ashurst.
What were your roles at Young Lives and what did they involve?
“My job was first and foremost to manage all the data.”
Anne Yates Solon, International Data Manager, 2007-2018
Anne: I advised on how the questionnaires were designed to ensure they matched a database system, and I built the databases. I’d then work with the country teams, typically the data managers, to edit and make sure those databases were fully functioning.
In the early days, when we had paper-based questionnaires, teams would enter the data into the databases. They entered everything twice so we could compare the two versions and catch data entry errors. I’d work with them to manage those processes; then I would clean the data, archive it* and provide all the supporting documentation, which involved tracking the preliminary questionnaires.
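Double data entry, as Anne describes it, means the same paper questionnaire is keyed in twice by different people, and any field where the two versions disagree is checked against the paper original. The following is a minimal Python sketch of that comparison step, not Young Lives’ actual tooling: it assumes each entry pass is stored as a dictionary keyed by a hypothetical (questionnaire ID, variable name) pair.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (questionnaire ID, variable name) -> value as typed.
Key = Tuple[str, str]


def compare_entries(
    first: Dict[Key, str], second: Dict[Key, str]
) -> List[Tuple[Key, str, str]]:
    """Return every field where the two entry passes disagree or one is missing."""
    discrepancies = []
    # Look at every field that appears in either pass, in a stable order.
    for key in sorted(set(first) | set(second)):
        a = first.get(key, "<missing>")
        b = second.get(key, "<missing>")
        if a != b:
            discrepancies.append((key, a, b))
    return discrepancies


if __name__ == "__main__":
    pass1 = {("HH001", "age"): "12", ("HH001", "grade"): "6"}
    pass2 = {("HH001", "age"): "21", ("HH001", "grade"): "6"}
    # A transposed age ("12" vs "21") is flagged for checking against the paper form.
    for key, a, b in compare_entries(pass1, pass2):
        print(key, a, b)  # prints: ('HH001', 'age') 12 21
```

The sketch only flags disagreements; deciding which value is correct still requires someone to look at the original questionnaire.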
“A lot of the job related to conducting data checks, to ensure that the data was valid.”
Monica Lizama, Peru’s Data Manager since 2008
(joined as Statistical Assistant in 2003)
Monica: We relied on the enumerators to conduct data checks in the field, then we conducted additional data checks when we cleaned it. There are various ways to corroborate the data, to ensure it is as clean as possible.
“I’m in charge of data management for the Vietnam team and worked to support Anne in Oxford on data management.”
Tien Nguyen, Vietnam’s Data Manager, since 2006
Tien: For each survey, I support the Principal Investigator (PI) in Vietnam in preparing the questionnaire. When the fieldworkers go to the field, I go with them to supervise the fieldwork. When the survey is finished, I support the team with the data entry. In Round 2, I supervised the process of entering the data from the paper surveys into the program.
“My job involves checking, validating and keeping data securely, as well as analysing the data, when required.”
Shyam Sunder, India’s Data Manager, since 2006
Shyam: I joined the project in Round 2. Maintaining a large dataset and corresponding with other partners was initially a challenge for me, as I was new to such roles. Fortunately, I had support from the Senior Researcher, who had been involved since the beginning and was familiar with the data management protocols, as well as from a Senior Statistician in our institute who had been involved in Young Lives in the initial years.
Anne: The role changed slightly with the introduction of Computer Assisted Personal Interviewing (CAPI). We piloted versions of CAPI on tablets. I developed those programs based on the questionnaires and built them with the country teams. Then I would send them out and monitor the process in the field. Confidentiality, data security and sharing files on encrypted, password-protected servers were all part of the role. That’s an important part of the job: holding confidential data and making sure it doesn’t get out.
“It was an interesting time because from Round 3 onward CAPI was used. It was exciting and completely different for everyone.”
Hazel Ashurst, Data Coordinator, Oxford, 2011-2013
Hazel: When I joined, Round 3 was being cleaned, and I got Round 4 up and running. I helped set up Round 5 too.
Tien: From Round 3 we surveyed with tablets. We supported the programming team to prepare CAPI. In Rounds 4 and 5 we used SurveyBe. Anne came to help prepare the program, but we translated the program into Vietnamese.
Hazel: My job was to set up CAPI screens in SurveyBe. I set up data checks and balances at the point of data entry, including skip patterns, and carried out testing.
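A skip pattern decides which questions appear based on earlier answers, so the interviewer never sees a question that does not apply. The following is a minimal Python sketch of the idea only. It is not SurveyBe syntax, and the question names and conditions are hypothetical examples, not taken from the Young Lives questionnaires.

```python
from typing import Callable, Dict, List, Tuple

Answers = Dict[str, str]

# Hypothetical questionnaire: each question carries a condition on earlier answers.
QUESTIONNAIRE: List[Tuple[str, Callable[[Answers], bool]]] = [
    ("currently_enrolled", lambda a: True),
    ("school_name", lambda a: a.get("currently_enrolled") == "yes"),
    ("current_grade", lambda a: a.get("currently_enrolled") == "yes"),
    ("reason_not_enrolled", lambda a: a.get("currently_enrolled") == "no"),
    ("household_size", lambda a: True),
]


def questions_to_ask(answers: Answers) -> List[str]:
    """Return the questions that apply, in order, given the answers so far."""
    return [name for name, condition in QUESTIONNAIRE if condition(answers)]


if __name__ == "__main__":
    # A child who is not enrolled skips the school questions entirely.
    print(questions_to_ask({"currently_enrolled": "no"}))
```

Because skipped questions are never displayed, this is also why, as Tien notes below, a fieldworker cannot review a skipped item the way they could leaf back through a paper form.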
How did the introduction of CAPI change how you worked?
Anne: With the paper-based questionnaires, it took six months to a year to get usable data because of all the entry and cleaning needed to get it from paper into an electronic version. If there were queries, you could ask the fieldworker for clarification, though they might not remember; you certainly couldn’t go back and clarify anything with the respondents, since so much time had passed.
With the introduction of a technology-based platform, we were able to build data checks into it, so they happened in the field. For example, if [a Young Lives child] said they weren’t in education, but later answered an education question, CAPI could flag that. Then the interviewer would know to go back and dig a bit deeper into why the answers weren’t consistent. CAPI also provided usable data almost immediately. It still needed further cleaning, but it was in a usable format.
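The in-field consistency check Anne describes can be thought of as a rule that runs as answers are recorded and produces a warning for the interviewer instead of silently accepting contradictory data. This is a minimal Python sketch under stated assumptions. The field names (`currently_enrolled`, `current_grade`, `hours_at_school`) are hypothetical. The rule is illustrative and does not reproduce how CAPI or SurveyBe defined checks.

```python
from typing import Dict, List, Optional

# Hypothetical later-section questions that only make sense for a child in education.
EDUCATION_FOLLOW_UPS = ["current_grade", "hours_at_school"]


def check_education_consistency(answers: Dict[str, Optional[str]]) -> List[str]:
    """Return warnings to show the interviewer when responses contradict each other."""
    warnings = []
    if answers.get("currently_enrolled") == "no":
        # Any non-empty answer to an education follow-up contradicts "not enrolled".
        answered = [q for q in EDUCATION_FOLLOW_UPS if answers.get(q)]
        if answered:
            warnings.append(
                "Child reported not being in education but answered: "
                + ", ".join(answered)
            )
    return warnings


if __name__ == "__main__":
    record = {"currently_enrolled": "no", "current_grade": "5"}
    for message in check_education_consistency(record):
        print(message)
```

The check flags the inconsistency but does not decide which answer is right; that is left to the interviewer, who can ask the respondent while still in the household.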
What was your experience with using software for data management?
Anne: By and large it sped up the process. It was much easier to clean data, we got it faster, so it was easier to use. A lot of the data checks that previously we would do manually, at a later time, were built into the program. It is important to note that the paper-based system was time intensive on the back end after the interviews, while the CAPI system was time intensive on the front end.
Hazel: We used SurveyBe, and setting everything up in it was very time-consuming. In Oxford, one person would set it up and another would check it. They were under a great deal of time pressure. In a couple of instances, things were missed because of tight deadlines. However, this was rare.
Tien: Things have changed since Round 3, when we moved from paper to the program. In Round 3, two-thirds of our survey data were collected on tablets and one-third on paper. It was very different. We had also used a Personal Digital Assistant (a handheld PC) in Round 2, but its screen was very small, only 5 inches. In Rounds 4 and 5 we did all our surveys by tablet. These tablets had 10-inch screens, and the larger screen made it much easier.
Anne: The main challenge was that as the research team was building the questionnaire, we had to be building the program. The checks you have to build in are time intensive. You also want to load in data from the previous round, like the names of the schools, what grade they were in, the names of the household members, date of birth, and things like that for verification reasons.
What challenges did CAPI present?
Hazel: It was the right decision to go to CAPI, but it didn’t save as much time as I thought it would. We were still data checking in Oxford. The set-up and testing process of SurveyBe was time consuming and we were dealing with high volumes of data.
Anne: It’s a very detail-oriented process. I think a large challenge was that sometimes people didn’t understand how long it took to get clean data, or to get the program set up. They would say, “just add in this check to the program,” and although that sounds like it would take a few minutes, it could take three days of your time to try to figure out how to program it.
Tien: With the transition to CAPI, some fieldworkers were unfamiliar with it and initially struggled, so they needed to use paper surveys for the third round. We then had to enter the paper data and merge it with the tablet data. It was a lot of work. We trained fieldworkers and hired new ones, and by Round 4 we required all of them to be able to use CAPI.
Monica: A challenge with CAPI was introducing the technology to families who were used to paper-based surveys. For some families, the technology scared them. Older people, in general, had a more difficult time than young people, so we had to be mindful of the way we introduced CAPI, both for the family being surveyed and for the enumerator using it. Some of the enumerators struggled with the transition to CAPI, so we had to train them and ensure they were comfortable with the technology.
Tien: Data skips are built into the program. The fieldworker can check an answer and correct it in the field if needed, and if an answer is not valid, the program flags it. But if the program skips a question, the fieldworker cannot go back and check that information as they could with paper surveys, because the skips are automatic. Interviews do go faster with tablets, though.
Monica: There were also other factors that affected CAPI in the beginning. For example, Peru has different regions: the coast, the Sierra and the rainforest. In the mountains, the tablets functioned more slowly because of the high altitude. By Round 5, the situation had improved and everything was easier. The new tablets were faster, even at high altitude. They were also smaller and lighter than the previous heavy tablets. Another difficulty is that the survey program needs to run on a Windows tablet, and those tablets are more expensive than Android tablets.
Anne: We were also concerned about how we’d keep the charge on the devices. If they were in the field all day, or if they were at a site that didn’t have electricity, we thought about how they would charge them at night. It was a challenge, so we bought external battery packs for people to have on hand.
The software we chose didn’t require anyone to have access to WiFi. Everything could be saved on the device and shared later. Some software is internet-based, while other programs are downloaded and can sync later when you’re in an area with WiFi access.
Shyam: In terms of technical challenges, the biggest was access to the internet at the village/mandal level. We also had to make sure we had two standby laptops to act as replacements when needed. There were some ethical issues as well, but field supervisors and investigators have been trained to stop villagers, who are often eager to see the information recorded on the laptop, from accessing it. They’re also trained to keep this information private.
Anne: Another ethical concern was security. You’ve got people running around with tablets in some maybe not-so-safe areas. We went to a lot of effort to ensure they had bags to carry them in and that they didn’t bring them out on the street. If they felt threatened and someone was trying to steal a tablet, we told them not to fight back, just to hand it over. If they felt they were in a dangerous situation or a place with high crime, they didn’t use the tablets; they used a paper-based questionnaire at that site instead.
I think one tablet was stolen…actually I think a fieldworker fought them off, even though we told them not to!
I know some were broken, screens would stop working. Sometimes they wouldn’t open, or they got a virus. But those instances were rare. I expected that to be more prevalent than it was.
Monica: Our surveys take a very long time, usually around two hours to complete. In Round 3, unfortunately, we lost four surveys due to problems with the technology. The only option would have been to return to the field and ask the survey questions again.
We would have needed to tell families that we had lost their answers. In the end, we were able to recover the surveys, and thankfully in Rounds 4 and 5 we didn’t lose any. But returning to families would take so much time that we always need a ‘plan B’ to ensure we don’t lose data, or that we can recover it if we do. So we have told enumerators that if something goes wrong with a tablet, they should start recording answers on a paper survey. Either way, data collection is a long process: the enumerators send the data to Lima, we clean it, and I send it to Oxford. We hope that in future rounds we won’t have any issues with the technology, because we are always afraid of losing data.
Leading up to the release of our data management report, we will be publishing further conversations with our data managers. Continue the series: read part two and part three of our conversations now.
Due to COVID-19, our in-person research for Round 6, Young Lives at Work, has been delayed until 2021, and we have adapted our processes to deliver a telephone survey, which will provide rapid new research data and insights into COVID-19-related impacts. With DFID’s support, Young Lives longitudinal research will continue into 2025.
* Young Lives survey data are systematically prepared and archived with supporting documentation for each round through the UK Data Service.