Research - Digital Geo

Digital Geo

Geographic Data Science - with a HINT of nerdom


As a Geographer and Data Scientist, I work on large collections of data in order to understand the patterns within them. More specifically, my research lies at the intersection of remote sensing, geographic data science and urban science. With respect to remote sensing, I’ve had the opportunity to work with and learn from well-known researchers who use and analyze such data across a wide range of applications. These applications range from basic mapping and analysis of land use and land cover changes using optical and radar data to applications that require more advanced image processing and machine learning algorithms. With respect to data science, some of my research involves the extraction and analysis of open sources of geographic data (e.g. OpenStreetMap, Wikipedia and Google Trends) in order to derive knowledge and insights in a data-driven manner. Finally, as for urban science, my PhD research centered on the development of a methodology for automatically mapping slums in developing countries using open sources of socio-economic and remote sensing data. This research has allowed me to appreciate not only the increasing role that remote sensing plays in supporting applications such as the mapping of vulnerable populations and poverty, but also the limitations of using this data on its own for such applications. Some highlights from selected research projects follow. A main theme throughout all these projects is the use of open sources of data for understanding, monitoring and building more resilient, smart and sustainable cities.

Slum mapping using open sources of data and data mining techniques
My PhD focused on the use of data mining tools and techniques for detecting and mapping slums. So what are slums, you may ask? Slums are typically characterized by very poor people living under substandard and improper living conditions, which lead to severe issues with respect to public health and discrimination and, in some cases, limited opportunities to leave such situations. Close to one billion people currently live in slums. The number of slum dwellers is expected to increase to two billion by the year 2030, and further to three billion by 2050, if adequate measures are not put in place to better manage this growth. Most of these vulnerable people live in and around urban centers in developing countries, where the majority of future population growth is also expected to be absorbed. This is cause for serious concern given that many of these areas often lack the infrastructure and basic services necessary to adequately provide for the increasing number of people, leading to further growth in slum populations and an increased incidence of slum settlements.

Slum in Jakarta (Image source: Wikipedia)

One of the biggest challenges in detecting and mapping slums is the acquisition of up-to-date and relevant data. Data on slums is usually non-existent, outdated, or inaccessible. As a result, no global database of slums yet exists. One could argue that remote sensing data can be used to map slums globally. However, remote sensing captures only the physical aspects of slums, and their socio-economic aspects are equally important to capture. Further, while high and very high resolution imagery has been around for almost two decades, studies using this imagery have been few in comparison to the many locations that contain large slum populations and have yet to be studied. This is shown in the images below. Consequently, there has been a skewed distribution of slum studies using high and very high resolution image data. To overcome this challenge, during my research I curated a new database of remote sensing and socio-economic slum indicators and, together with data mining tools, fused these indicators to map slums at the settlement level in the three largest cities in Kenya, which also contained the largest slum populations in that country. In addition, unlike other research that studies slums, I worked on very large geographic areas; slums in these study areas accounted for a very small fraction of the total study area.

Number of studies using high and very high resolution (H/VH-R) imagery to study slums at the country level

Number of studies using high and very high resolution (H/VH-R) imagery to study slums at the administrative district level

Quality assessment of volunteered geographic information
With volunteered geographic information (VGI) platforms such as OpenStreetMap (OSM) becoming increasingly popular, we are faced with the challenge of assessing the quality of their content, in order to better understand its place relative to the authoritative content of more traditional sources. Most studies have focused primarily on developed countries, showing that VGI content can match or even surpass the quality of authoritative sources, with very few studies in developing countries. In this research, I compare the quality of authoritative (data from the Regional Center for Mapping of Resources for Development (RCMRD)) and non-authoritative (data from OSM and Google’s Map Maker) road data in conjunction with population data in and around Nairobi, Kenya. The results of this study show variability in coverage amongst datasets. RCMRD provided the most complete, albeit less current, coverage when taking into account the entire study area, while OSM and Map Maker showed a degradation of coverage as one moves from central Nairobi towards rural areas. Furthermore, OSM had higher content density in large slums, surpassing the authoritative datasets at these locations, while Map Maker showed better coverage in rural housing areas. These results suggest a greater need for a more inclusive approach using VGI to supplement gaps in authoritative data in developing nations.

Pairwise difference in road coverage. Clockwise from top left: (i) RCMRD 2011 vs. Map Maker 2014; (ii) RCMRD 2011 vs. OSM 2011; (iii) RCMRD 2011 vs. OSM 2014; (iv) OSM 2014 vs. Map Maker 2014 (red cells: first layer has higher coverage; green cells: second layer has higher coverage)
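As a rough illustration of the kind of comparison summarized in the figure above, the sketch below computes a per-cell coverage difference between two road layers. It assumes, purely for illustration, that each layer has already been reduced to a total road length per grid cell; the cell indexing, the `coverage_difference` and `label_cell` names, and the example values are all hypothetical and are not taken from the study itself.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # hypothetical (row, col) index of a grid cell


def coverage_difference(first: Dict[Cell, float],
                        second: Dict[Cell, float]) -> Dict[Cell, float]:
    """Road length of the first layer minus the second, per grid cell.

    Cells missing from a layer are treated as having zero road length.
    """
    cells = set(first) | set(second)
    return {cell: first.get(cell, 0.0) - second.get(cell, 0.0) for cell in cells}


def label_cell(diff: float, tolerance: float = 0.0) -> str:
    """Say which layer has higher coverage in a cell (assumed labelling)."""
    if diff > tolerance:
        return "first"   # would be drawn red in the figure's colour scheme
    if diff < -tolerance:
        return "second"  # would be drawn green in the figure's colour scheme
    return "equal"


if __name__ == "__main__":
    # Made-up road lengths (km) per cell, for illustration only.
    layer_a = {(0, 0): 12.0, (0, 1): 3.5}
    layer_b = {(0, 0): 9.0, (0, 1): 3.5, (1, 1): 2.0}
    for cell, diff in sorted(coverage_difference(layer_a, layer_b).items()):
        print(cell, diff, label_cell(diff))
```

Treating missing cells as zero keeps the comparison defined over the union of both layers, so gaps in one dataset show up as a difference rather than being silently dropped.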

News coverage and digital activism
In line with the aforementioned work on VGI, an important area of study is understanding the various motivational factors that drive people to contribute to online collaborative platforms such as OSM. One factor that had yet to be studied is the influence of news coverage. It is reasonable to think that news would influence people to map places on OSM, especially if there is repeated coverage and if the topic resonates with users. Using refugee camps from around the world as a case study, we examine the relationship between news coverage (via Google News), search trends (via Google Trends) and user edit contribution patterns in OpenStreetMap, a prominent geospatial data crowdsourcing platform. In addition, we compare and contrast these patterns with user edit patterns in Wikipedia, a well-known non-geospatial crowdsourcing platform. Using Google News and Google Trends to derive a measure of thematic public awareness, our findings indicate that digital activism bursts tend to take place during periods of sustained build-up of public awareness deficit or surplus. These findings are in line with two prominent mass communication theories, agenda setting and corrective action, and suggest the emergence of a novel stimulus-awareness-activism framework in today’s participatory digital age. Moreover, these findings complement existing research examining the motivational factors that drive users to contribute to online collaborative communities. This paper brings us one step closer to understanding the underlying mechanisms that drive digital activism, in particular in the geospatial domain.

Land use/land cover mapping
I’ve worked on various projects that fuse optical and radar remote sensing data to map land use/land cover (LULC) in both developed and developing countries. While various sources of global and regional LULC information exist, this information is usually captured at different spatial, temporal and spectral resolutions, and is derived using data from different remote sensing sensors and, in some cases, with different nomenclatures. Such information may also not be updated as frequently as some countries require for supporting land use/land cover activities. Traditionally, land use/land cover data was sourced from optical data. However, the persistent cloud cover that exists over some regions of the world makes this data less useful in some cases. Radar can penetrate cloud and other atmospheric disturbances, making it suitable for collecting information in persistently cloudy areas. The fusion of this data, as my research shows, results in higher LULC mapping accuracies compared to the use of optical or radar data on their own. Throughout such research on LULC mapping and monitoring, I also investigate the use of image texture, along with related properties such as window size and image despeckling, for improving LULC mapping accuracy. Further, given that multiple sources of remote sensing data are becoming almost ubiquitous in science applications, I explore various opportunities for selecting the most suitable image bands and combinations of these bands for improving LULC mapping accuracies.
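To make the role of window size in image texture more concrete, here is a minimal sketch of one simple texture measure, local variance, computed in a moving window. This is only an assumed example of a texture measure; the text above does not say which measures were used, and the `local_variance` function and the toy image are hypothetical.

```python
from statistics import pvariance
from typing import List


def local_variance(image: List[List[float]], window: int) -> List[List[float]]:
    """Variance of pixel values in a window x window neighbourhood of each pixel.

    The window is centred on the pixel and clipped at the image edges.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    rows, cols = len(image), len(image[0])
    texture = []
    for r in range(rows):
        texture_row = []
        for c in range(cols):
            # Collect all pixel values that fall inside the clipped window.
            values = [
                image[i][j]
                for i in range(max(0, r - half), min(rows, r + half + 1))
                for j in range(max(0, c - half), min(cols, c + half + 1))
            ]
            texture_row.append(pvariance(values))
        texture.append(texture_row)
    return texture


if __name__ == "__main__":
    # Toy single-band image with a vertical edge between columns 1 and 2.
    toy = [[0.0, 0.0, 10.0, 10.0] for _ in range(4)]
    for row in local_variance(toy, window=3):
        print(row)
```

A larger window smooths the texture measure over a wider neighbourhood, while a smaller one responds to finer detail, which is one reason window size is worth tuning alongside the choice of measure.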

PALSAR 12 May 2007 image (HH, VV, and HV BGR) of Wad Madani. Centre image coordinates 14.4° N, 33.5° E.

Divergence values for Arequipa, Peru, for the best band combination at each number of bands (horizontal axis). This graph shows that only 5 or 6 bands are required for a viable classification.
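The idea behind a graph like this can be sketched as an exhaustive search: for each number of bands, score every band combination with a class separability measure and keep the best. The sketch below assumes the standard pairwise divergence between two Gaussian classes and, as a strong simplification, treats bands as uncorrelated (diagonal covariance), so that divergence adds up across bands. The function names, band names and class statistics are hypothetical, and this is not necessarily the measure or procedure used to produce the graph.

```python
from itertools import combinations
from typing import Dict, Tuple

# Per-band class statistics: band name -> (mean, variance). Hypothetical layout.
BandStats = Dict[str, Tuple[float, float]]


def band_divergence(m1: float, v1: float, m2: float, v2: float) -> float:
    """Divergence between two 1-D Gaussian classes in a single band."""
    return (0.5 * (v1 / v2 + v2 / v1 - 2.0)
            + 0.5 * (1.0 / v1 + 1.0 / v2) * (m1 - m2) ** 2)


def combination_divergence(bands: Tuple[str, ...],
                           stats_a: BandStats, stats_b: BandStats) -> float:
    """Divergence of a band combination, assuming uncorrelated bands."""
    return sum(band_divergence(*stats_a[b], *stats_b[b]) for b in bands)


def best_combinations(stats_a: BandStats,
                      stats_b: BandStats) -> Dict[int, Tuple[Tuple[str, ...], float]]:
    """For each subset size k, return the best band combination and its score."""
    names = sorted(stats_a)
    best = {}
    for k in range(1, len(names) + 1):
        scored = ((combo, combination_divergence(combo, stats_a, stats_b))
                  for combo in combinations(names, k))
        best[k] = max(scored, key=lambda item: item[1])
    return best


if __name__ == "__main__":
    # Made-up statistics for two land cover classes in three bands.
    class_a = {"b1": (10.0, 4.0), "b2": (20.0, 4.0), "b3": (30.0, 4.0)}
    class_b = {"b1": (16.0, 4.0), "b2": (21.0, 4.0), "b3": (33.0, 4.0)}
    for k, (combo, score) in best_combinations(class_a, class_b).items():
        print(k, combo, score)
```

Plotting the best score against the number of bands gives a curve of the kind described in the caption; the point where the curve flattens suggests how many bands are worth keeping. With correlated bands, the full covariance form of the divergence would be needed instead of this per-band sum.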


Academic and Professional Groups
LinkedIn |  Academia |  ResearchGate