This is a list inspired by some of our current or potential lines of work at the World Bank Innovation Labs. The “Innovations in Big Data Analytics” program helps strengthen the World Bank's capabilities to use big data effectively in its operational and strategic work.
We are always looking for great Data Scientists. If you can solve any of these [using open software], you'll be heads down helping us from day one. Email us at [email protected]
(This list is updated frequently).
We are building an open stack to process nightly satellite data and query the light output of all known villages. Currently we are processing 20 years of nightly data for 600,000 villages in India.
Beta site and API at nightlights.io
This API is rich with information. The team at the University of Michigan has done great analysis, such as calculating the electrification point or the slope of growth, and creating statistics and maps. There are many possible improvements to this and related work. We are particularly interested in those with the highest operational value, like opportunity analysis, access, covariant dimensions, ...
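As a rough illustration of the kind of per-village statistics mentioned above, here is a minimal sketch that fits a least-squares growth slope to a yearly radiance series and picks an "electrification year". The function names, the sample series, and the definition of the electrification point (the first year from which radiance stays at or above a threshold) are all assumptions made for this example; they are not the University of Michigan team's method and not the nightlights.io API.

```python
def growth_slope(years, radiance):
    """Ordinary least-squares slope of radiance against year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(radiance) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, radiance))
    return sxy / sxx


def electrification_year(years, radiance, threshold):
    """First year from which radiance stays at or above the threshold.

    This definition is an assumption for illustration only.
    Returns None if the series never settles above the threshold.
    """
    for i, year in enumerate(years):
        if all(value >= threshold for value in radiance[i:]):
            return year
    return None


# Hypothetical yearly mean radiance for one village.
years = [2000, 2001, 2002, 2003, 2004, 2005]
radiance = [0.1, 0.2, 0.1, 3.0, 4.0, 5.0]

print(growth_slope(years, radiance))
print(electrification_year(years, radiance, threshold=1.0))
```

In practice the threshold and the aggregation from nightly to yearly values would need calibration against ground truth.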
We are currently using a proprietary source for the 600,000 locations. We would like to use an open source, like OSM. OSM currently only has 30k villages. We could find other sources to improve our open options, e.g.:
- Find open databases
- Based on land classification on Landsat, or higher resolution sources.
- Based on light output. It will be biased towards electrified villages, but it will improve OSM.
We are currently only looking at the output at the village coordinates. In many cases, villages have a distinct, isolated light area that could be measured, giving an indication of growth.
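One simple way to measure such an isolated light area is to grow a region outward from the village pixel over all connected pixels above a brightness threshold. The sketch below assumes the nightlight image is already available as a small 2D grid of radiance values; the function name `lit_patch`, the 4-neighbour connectivity, and the threshold are illustrative choices, not part of our current stack.

```python
from collections import deque


def lit_patch(grid, seed, threshold):
    """Return (pixel_count, total_radiance) of the lit patch containing seed.

    The patch is the 4-connected set of pixels with radiance >= threshold
    that includes the seed pixel. A dark seed yields (0, 0.0).
    """
    rows, cols = len(grid), len(grid[0])
    r0, c0 = seed
    if grid[r0][c0] < threshold:
        return 0, 0.0
    seen = {seed}
    queue = deque([seed])
    total = 0.0
    while queue:
        r, c = queue.popleft()
        total += grid[r][c]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen and grid[nr][nc] >= threshold:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen), total


# Hypothetical radiance grid: one patch around (1, 1), another on the right edge.
grid = [
    [0, 0, 0, 0, 0],
    [0, 5, 6, 0, 0],
    [0, 4, 0, 0, 9],
    [0, 0, 0, 0, 8],
]
print(lit_patch(grid, seed=(1, 1), threshold=1))
```

Tracking the pixel count and total radiance of the patch year over year would give a growth signal that does not depend on a single coordinate.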
Alexei Abrahams has shown a process to increase the spatial resolution of the satellite data. In the older satellites, the swiveling motion of the detector head introduces a known spread function that can be deconvolved. By de-blurring the nightlight images, we can make better comparisons of nightlights across time.
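To make the idea of deconvolving a known spread function concrete, here is a minimal one-dimensional sketch using Richardson-Lucy iteration, a generic deconvolution technique. It is not necessarily the method Alexei Abrahams uses, and the three-tap spread function and the point-source signal are made up for the example.

```python
def convolve_same(signal, kernel):
    """Zero-padded convolution with an odd-length kernel, same output length."""
    half = len(kernel) // 2
    out = [0.0] * len(signal)
    for i in range(len(signal)):
        for k, weight in enumerate(kernel):
            j = i + half - k
            if 0 <= j < len(signal):
                out[i] += weight * signal[j]
    return out


def richardson_lucy(observed, psf, iterations=30, eps=1e-12):
    """Richardson-Lucy deconvolution of a 1D signal with a known PSF.

    Assumes a non-negative signal and a PSF that sums to 1.
    """
    # Start from a flat estimate carrying the same total flux.
    estimate = [sum(observed) / len(observed)] * len(observed)
    flipped = psf[::-1]
    for _ in range(iterations):
        blurred = convolve_same(estimate, psf)
        ratio = [o / b if b > eps else 0.0 for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


# Hypothetical example: a single bright pixel blurred by a 3-tap spread function.
truth = [0.0] * 11
truth[5] = 100.0
psf = [0.25, 0.5, 0.25]
observed = convolve_same(truth, psf)
sharpened = richardson_lucy(observed, psf)
print(max(observed), max(sharpened))
```

Real imagery would need a two-dimensional spread function and some care with noise, since Richardson-Lucy tends to amplify noise if run for too many iterations.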
In 2015 the world agreed to the goals and targets of the Global Goals or Sustainable Development Goals (SDGs). The indicators will likely be agreed in March 2016. At our lab we are focusing on the data dimension. We started an open repository to collect all this information and offer it openly in machine-readable format. As part of this effort, we would like to answer the following questions:
Some [proposed] SDG indicators are successors of indicators from the MDGs or other systems for which data has been collected over the past years. Some are new. In most cases this data is already available via different API formats. A system that pulls and collects this information would greatly help evaluate where we stand, so we can plan how to get to the target.
Make visualizations to understand where we are in the data inventory and the characteristics of the SDGs indicators by using the metadata of the indicators, such as sources of data, countries and years of availability.
We have road network, road classification, village location, and population data for a defined region in Asia. To determine the impact of Bank rural road projects and to more effectively prioritize roads for future improvements, we seek to measure how improvements in rural road networks affect the percentage of the population that can access urban services within a given timeframe.
Given an OSM road network and GIS census data (points), use open software tools (like turf.js and OSRM) to generate isochrones and calculate access to the closest city (boundaries and/or points).
Based on isochrones, generate statistics (X% of target population can access Y in Z minutes). Build a very simple scenario query-builder to see how different road rehabilitation projects (i.e., increases in travel speed) affect these statistics. Build an optimization model for the minimum length of road improvement necessary to meet pre-determined accessibility targets.
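The accessibility statistic and the scenario comparison can be sketched on a toy network. The example below stands in for what OSRM would compute on a real OSM graph: it runs Dijkstra's algorithm over a hand-written road table and reports the share of population within a time budget, before and after a hypothetical speed upgrade. All names, roads, speeds, and populations here are invented for illustration.

```python
import heapq


def travel_times(roads, origin):
    """Shortest travel time in minutes from origin to every reachable node.

    roads maps (node_a, node_b) to (length_km, speed_kmh) and is undirected.
    """
    graph = {}
    for (a, b), (length_km, speed_kmh) in roads.items():
        minutes = length_km * 60 / speed_kmh
        graph.setdefault(a, []).append((b, minutes))
        graph.setdefault(b, []).append((a, minutes))
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, minutes in graph.get(node, []):
            new_cost = cost + minutes
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return best


def share_with_access(roads, population, city, max_minutes):
    """Fraction of the population that can reach the city within max_minutes."""
    times = travel_times(roads, city)
    reached = sum(
        people for village, people in population.items()
        if times.get(village, float("inf")) <= max_minutes
    )
    return reached / sum(population.values())


# Hypothetical network: (length in km, travel speed in km/h).
roads = {("city", "a"): (10, 60), ("a", "b"): (20, 20), ("a", "c"): (15, 30)}
population = {"a": 1000, "b": 3000, "c": 1000}

baseline = share_with_access(roads, population, "city", max_minutes=45)

# Scenario: rehabilitating road a-b doubles its travel speed.
upgraded = dict(roads)
upgraded[("a", "b")] = (20, 40)
scenario = share_with_access(upgraded, population, "city", max_minutes=45)

print(f"baseline {baseline:.0%}, after upgrade {scenario:.0%}")
```

The optimization step could then search over candidate upgrades for the shortest total length that lifts this share above a target, which is a harder combinatorial problem than the sketch suggests.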
We are currently partnering with providers of traffic data generated via vehicle fleet GPS sensors. The GPS locations are aggregated, anonymized and converted into timestamped speeds on OSM segments. Besides offering traffic-aware directions, the system builds the capacity to adjust traffic light timing to reduce congestion, thereby improving system performance and reducing CO2. We are seeking technical support to prepare congestion analyses with these data. Applicants should have expertise in working with very large datasets and GIS.
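As a starting point for a congestion analysis, one common approach is to compare the observed speed on each segment with its free-flow speed. The sketch below assumes records of the form (segment id, hour of day, speed in km/h) and takes the fastest hourly mean as a proxy for free flow; the record layout, the proxy, and the function name are assumptions for this example, not the providers' actual data format.

```python
from collections import defaultdict


def congestion_index(records):
    """Congestion per (segment, hour) as 1 - mean_speed / free_flow_speed.

    Free-flow speed is approximated by the highest hourly mean speed seen on
    the segment. Assumes every segment has at least one positive speed.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for segment, hour, speed in records:
        entry = sums[(segment, hour)]
        entry[0] += speed
        entry[1] += 1
    mean_speed = {key: total / count for key, (total, count) in sums.items()}

    free_flow = defaultdict(float)
    for (segment, _hour), speed in mean_speed.items():
        free_flow[segment] = max(free_flow[segment], speed)

    return {
        key: 1 - speed / free_flow[key[0]]
        for key, speed in mean_speed.items()
    }


# Hypothetical observations: segment id, hour of day, speed in km/h.
records = [
    ("s1", 3, 60.0), ("s1", 3, 60.0),
    ("s1", 8, 30.0), ("s1", 8, 30.0),
    ("s2", 3, 40.0), ("s2", 8, 40.0),
]
for key, value in sorted(congestion_index(records).items()):
    print(key, round(value, 2))
```

At real data volumes the same group-and-aggregate logic would run in a database or a distributed framework rather than in memory.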
We are helping the Bank’s Trade & Competitiveness Global Practice bring data into their work on understanding global trade flows. For example, matching trade & tariff codes and descriptions in large text files, then helping us process and visualize these data.
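A simple baseline for matching free-text product descriptions to codes is token overlap. The sketch below scores each reference description by Jaccard similarity on lowercase word tokens and returns the best code above a cutoff. The reference table is a tiny illustrative sample, and the function names and the cutoff value are assumptions; a production matcher would likely need stemming, synonyms, and the full code hierarchy.

```python
import re


def tokens(text):
    """Lowercase alphanumeric word tokens of a description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_match(description, reference, min_score=0.2):
    """Return (code, score) of the closest reference description, or None.

    Similarity is the Jaccard index of the two token sets.
    """
    query = tokens(description)
    best_code, best_score = None, 0.0
    for code, ref_text in reference.items():
        ref = tokens(ref_text)
        union = query | ref
        score = len(query & ref) / len(union) if union else 0.0
        if score > best_score:
            best_code, best_score = code, score
    if best_score < min_score:
        return None
    return best_code, best_score


# Tiny illustrative reference table (code -> description).
reference = {
    "0901": "coffee, whether or not roasted or decaffeinated",
    "0902": "tea, whether or not flavoured",
    "1006": "rice",
}

print(best_match("COFFEE, ROASTED", reference))
print(best_match("steel pipes", reference))
```

Rows that come back as None are the ones worth routing to a human reviewer or a more expensive model.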
We are hoping to soon start using Github.com to share some of the World Bank Group’s research code / methods / algorithms / etc. We need somebody to help us figure out the best way to manage governance of a Github presence in a large, complex organization, and to help us write some guidelines to balance control and openness. This may involve designing well-defined manual procedures; configuring automatic triggers; and a degree of open knowledge evangelism!
We are partnering with several units across the Bank and across the world to demonstrate how drones can be used efficiently to address land rights, flooding, coastal erosion, urban sprawl, ... We are also building tools, best practices and lessons learned from our experience. For example, scripts to automate cloud uploading and processing when local resources are limited (but connectivity is not an issue), and flight and preparation checklists to gather all the needed material, reduce risks, and incorporate all relevant stakeholders.
We want to know (1) where roads are, (2) their overall condition (paved or not), and (3) whether they are single or multi-lane, at the regional, national or global level.
We believe we could develop an open stack to train a Deep Learning network to detect roads on mid-resolution satellite/plane/drone images (say ~1m or better) using OSM as a training set. The stack would identify candidates and produce the traces of the un-traced roads as vectors (phase 1), classify them as paved or not based on the color (phase 2), and as multilane or not based on the width (phase 3).
At the current stage we are asking experts and vendors about feasibility, with the idea of producing an appropriate Scope and Terms of Reference.
FYI, I have since left the Innovation Labs, but work continues, of course. Some of these have been delivered or are about to be, some are still on the radar, and much more is of course possible.
Please forward questions regarding this list to Trevor Monroe (@trevmon28 on Twitter) ... or myself (but now wearing my new hat of Social Impact at Satellogic).
Thanks!!