Social development projects often collect substantial data for monitoring and evaluation. The aim is usually to track participants through their project experiences and determine the circumstances under which they are able to achieve project goals. Projects often collect data not only at the individual level but also about surrounding environments. Sometimes they even collect data about non-participants for comparison. We collect a lot of data!
In some cases, data feed directly into government management information systems. In other cases, data are maintained only by the projects themselves for tracking and client reporting. When projects end, these data are shared with the client and then archived for posterity. Ideally, before the data are transferred, they are organized in line with the necessary human subjects protection protocols so that individual participants cannot be identified and targeted for nefarious purposes.
In almost all cases, though, data transferred to clients are never used again. While an occasional graduate student may request access to historical data for a master’s thesis or doctoral dissertation, the benefits of the large financial and human resources invested in projects’ data collection and management usually end when the projects close.
With such a wealth of underutilized data available, there is an immense opportunity to examine big questions in social development research. This is particularly so given technological advances in big data management and analysis. It is now possible not only to analyze data from one project at a time but also to extract data from multiple sources to examine issues across time and context. A system built for this purpose could even include current projects’ data, with appropriate protections to safeguard existing participants.
This initiative will build, test, and deliver a publicly available big data extraction and analysis system. We will start with data from donor-funded education projects and government-implemented education management information systems in target countries. We will then consider developing parallel systems for other social development sub-sectors. In Phase 1, we are working with partners to identify and prepare at least three large-scale, current project databases for piloting, create the user interface, and conduct initial queries. After the system has been tested and validated, Phase 2 will include the preparation of new databases, using both current and extant data, as well as the release of the user database for public use. Finally, Phase 3 will include tools to help project directors and government officials add their own databases directly to the data portal.
We will share regular updates about progress in developing the system, opportunities for researchers and database managers to test and use the system, and links to research reports based on data analysis using the system.