Generating Data Action

2013 interactive civic tech landscape study from the Knight Foundation.

How an MIT professor hopes to pave the way for data to empower civic change

By Mary Aviles

Even when well-articulated, the private sector applications of data science can sound quite alien to public servants. This is understandable, as the problems that Netflix and Google strive to solve are very different than those government agencies, think tanks, and nonprofit service providers are focused on.
— Alex Engler, 
Brookings Institution

Technology, applied responsibly, has the potential to drive social change. Public tech, sometimes called gov-tech, can connect and mobilize people, improve city experiences, and reduce government friction. I have seen, just in my own work, the benefits of applying technology to examine issues of inclusive economic development, workforce, education, youth development, mobility, urban planning and design, food security, housing, and poverty. According to Gartner Group, public tech spending is growing on digital services, such as public health, that have been affected by the pandemic. Despite that growth, capacity challenges and scarce funding have left much of this potential untapped, and scalability and sustainability remain major challenges in this sector. Against this backdrop, Sarah Williams, an associate professor at MIT’s Department of Urban Studies and Planning, has built a portfolio around civic empowerment.

Source: MIT Press.

In December 2020, Sarah will release her first book, Data Action: Using Data for Public Good. She considers it “a manifesto for those who want to use data to generate civic change.” Recently, I interviewed her about her interdisciplinary expertise and the project work that informed the book. The following transcript has been lightly edited for clarity.

When did you start to think of yourself as a “data person?” Or do you?

Sarah Williams: I worked a lot on remote sensing and GIS (geographic information systems) very early on in my career trajectory. But I don’t think I ever thought of myself as a “data person” specifically. I always thought that I used data to answer questions that I was interested in. I think of myself as a landscape architect. I have a lot of interest in environment, climate change, and racial equity; that is, applying my skills as a data scientist on environmental and racial equity issues that shape our public landscape. I do think that I’ve been branded more as a data person because of my work at MIT and really trying to emphasize the need to use data to create policy change.

I also felt like there was a missing area, where when we talk about the ethics of data, or we talk about the use of data for both elevating certain positions but also oppressing, that there was perhaps this real hole in the current literature, so I became interested in this stuff. Maybe that’s how I became more of a data person as well.

You’ve worked in all different locations. There are various disciplines involved. Your projects have a variety of applications. I’m curious about the path from one to the next.

SW: I came into data science through geography. I was a computer science and geography major during undergrad. I think my projects have just been a combination of people reaching out to me and me reaching out to them. For example, I have a lot of work in the African continent, and that has to do with somebody very early on in my career asking me to get involved in a project with Nairobi. I developed a commitment to that region. At first, the goal was to improve the condition of the city of Nairobi, but then there was this realization that what we were doing in Nairobi could not only be applied to other cities in the African continent, but also in the global South in general. A lot of the work that I’ve done with informal transit really started there.

[Author’s note: The Digital Matatus Project is an open data effort that collects transit data from cellphones for use in mobile routing applications.]

I’ve also done quite a bit with criminal justice and criminal justice policy, looking at issues of equity and race. In fact, one of the chapters of my book covers the ways in which we can use data to help highlight some of the injustices that exist within our criminal justice system. That started as an area of interest after I left grad school, when I got involved in the Million Dollar Blocks Project, and it just kind of kept going.

[Author’s note: The Spatial Information Design Lab and the Justice Mapping Center sourced inmate residential addresses from Bureau of Justice Statistics data and census data to show blocks where more than one million dollars is spent annually to incarcerate residents.]
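
The block-level idea in the note above can be illustrated with a minimal sketch. It assumes one record per incarcerated resident, each with a home block and an annual cost; the block IDs, the dollar figures, and the function name are invented for illustration and are not the project's data or code.

```python
from collections import defaultdict

# Hypothetical records: (home block ID, annual cost in dollars of
# incarcerating one resident). Values are invented, not project data.
records = [
    ("block-A", 400_000),
    ("block-A", 350_000),
    ("block-A", 300_000),
    ("block-B", 250_000),
    ("block-C", 600_000),
    ("block-C", 450_000),
]

THRESHOLD = 1_000_000  # dollars per year


def million_dollar_blocks(records, threshold=THRESHOLD):
    """Sum annual cost per block and keep blocks whose total exceeds the threshold."""
    totals = defaultdict(int)
    for block_id, cost in records:
        totals[block_id] += cost
    return {block: total for block, total in totals.items() if total > threshold}


flagged = million_dollar_blocks(records)
print(flagged)
```

In this toy data, blocks A and C each total $1,050,000 and are flagged, while block B is not. A real analysis would also need geocoding of addresses to blocks, which this sketch skips.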

L/R Source: Columbia Center for Spatial Research.

Recently, I’ve been reinvesting in restorative justice work. Right now, we’re looking at a visualization project that examines prisoners’ rights, especially related to workforce: how much they get paid and some of the injustice involved in the way those jobs are created. There has been recent coverage related to prisoners fighting fires on the West Coast, but then they can’t actually become firemen after their incarceration.

I’ve also been involved in “data literacy” projects. Data literacy needs to be included in our school curriculum. We use data all the time, and it should be a skill that we learn, just like we learn math. City Digits focused on using data as a way to teach youth about issues in their community, while learning math at the same time. We relied on the kinds of data points that were most relevant to the particular community with whom we were working, tying the teaching to a real-world subject. We worked with the Bushwick School for Social Justice and we embedded data literacy within the math curriculum.

Source: Civic Data Design Lab at MIT

We used maps quite a bit because maps are oftentimes fractions, right? And, we taught ratios and percentages — for example, the percentage of African Americans in a community. We also decided to pick a topic related to a particular issue that the students wanted to investigate. With one particular class, we examined lottery tickets, which also involves math. We could look at the percentage of people who buy lottery tickets and cover not only how much money they spend on lotteries, but also the probability that they’ll win. That way, we could demonstrate how to collect data or where data comes from, but then actually take it to the end and show them, using math skills, how that plays out.
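
The lottery arithmetic described above can be worked through in a short sketch. It assumes a made-up $1 game with two mutually exclusive prize tiers; the prices, payouts, and odds are illustrative assumptions, not figures from City Digits.

```python
from fractions import Fraction


def expected_net_per_ticket(price, prizes):
    """Expected payout minus ticket price, given (payout, probability) pairs."""
    expected_payout = sum(payout * prob for payout, prob in prizes)
    return expected_payout - price


def chance_of_any_prize(prizes):
    """Probability of winning something, assuming the prize tiers cannot overlap."""
    return sum(prob for _, prob in prizes)


# Hypothetical $1 game: a $100 prize at 1-in-1000 odds, a $5 prize at 1-in-20 odds.
price = Fraction(1)
prizes = [(Fraction(100), Fraction(1, 1000)), (Fraction(5), Fraction(1, 20))]

net = expected_net_per_ticket(price, prizes)
win_prob = chance_of_any_prize(prizes)

print(f"Chance of any prize: {float(win_prob):.1%}")
print(f"Expected net per ticket: ${float(net):.2f}")
```

Under these assumed odds, a player wins something about 5.1 percent of the time and loses 65 cents per ticket on average, which is the kind of result students could reach by hand with ratios and percentages.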

How do you get non-data people over the relevance barrier? How do you get them to engage?

SW: So you mean like how do we move them from, “This is a stat or a number” to something where they can take action? It’s absolutely about visualizations. I’ll say it over and over again, the communication strategy is the number one way that I get people to understand the power of data. In almost every project that has been the case. Consider the Digital Matatus Project where we collected data on informal transit systems in Nairobi. Everyone knew that data was important because they’d heard data was important. They knew the hype around smart cities meant you must have data. The city and the officials were kind of loosely interested in the project until we created visualizations of the concept and showed it as one comprehensive system that they could use to make decisions.

Source: MIT Department of Urban Planning.

It just transformed that project and really created something that the government, NGOs — everybody — could use because now they could understand what that data meant and what they could do with it.

Visualization is the number one way that you communicate the power of data. — Sarah Williams

I talk about this idea in Data Action. Very early on, statisticians knew that they needed visualization as a skill to communicate their efforts. Building interdisciplinary teams is critical to making powerful visualizations. You need policy experts in the field who help contextualize the problem. You need data scientists who can help process that information. And then the designers and the communicators who can transform and translate those insights to the broader public. One further team member of critical importance, though, is the community represented by the data itself. The community feedback is absolutely essential. I don’t know how many times I’ve been in a meeting where the stats are wrong, and somebody from the community could have told them that right away, had they just asked.

Tell me about your experience with the networks that develop and evolve to continue to support some of this community work?

SW: In Nairobi, we have a center for the development of open data for transport. We additionally have one in Latin America, called DATUM, which is also focused on development of data for informal transit. The Latin American network was informed by the work that we did in Africa. To step back for a second, these are the main bus systems in most of the world. It’s only Europe and the US and some parts of Asia that have more formal systems where data are collected and can be analyzed. In these informal systems, the data just do not exist. So when we did the project in Nairobi, we sparked interest from a lot of people who wanted to do their own data collection. We started to help them use our tools for their projects. Then, those people started teaching other people. And, through that, we built this network. Then, we actually raised funds to keep that network going. As a result, now on DATUM, there are tutorials, links to resources, and connections with other groups that have done the work.

This kind of data collection is hard during COVID, but we recently finished a project in the Dominican Republic that was informed by what we learned on a project in Mexico City. Now, I don’t have to be personally involved; the network can be teaching other parts of the network, and people from the Latin American context can be teaching each other.

Part of what we recommend as critical to this network is connecting to local universities — having the local communities do all the data collection and work with students or others in that process. We’ve created training materials that go through how to get started and who to connect with in your community. For example, on the transit work, getting buy-in from the local transit system owner was a major first step. Typically, they have a union. It’s not just the government that you need to talk to, but it’s also making sure that you talk to the drivers, the actual workers, as you get started.

Where are some opportunities you see for addressing inequity with data?

SW: We live in a world where we think data is everywhere. One of the things I talk about in the last chapter of my book is missing data. We have so much missing data, and that missing data tells you so much about what we’re interested in, what we care about, but also it can really lead to inequity. As practitioners, we talk a lot about showing observations in data as being inequitable, but what’s missing can be just as inequitable.

Ghost Cities was a project a funder brought to us. The guiding question was: how can we create socially-equitable real estate in China? These ghost cities that have been manufactured in China are going all over the world and they’re not equitable. They create huge risk in the real estate margins. We set out to explore how we could address it with data.

[Author’s note: Researchers scraped data from Chinese social media open-access APIs, including Dianping (Chinese Yelp), Amap (Chinese MapQuest), Fang (Chinese Zillow), and Baidu (Chinese Google Maps), to evaluate community viability and score foreclosure risk on the Ghost Cities project. The model identified areas without amenities and allowed the team to map these over-developed locations.]
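
As a rough illustration of the "areas without amenities" idea in the note above, the sketch below flags residential areas with few distinct amenity types nearby. The area names, amenity categories, threshold, and function are all hypothetical; this is not the project's actual model, which is not described in detail here.

```python
# Hypothetical inputs: residential areas mapped to the amenity types found nearby.
# Names, categories, and the threshold are illustrative assumptions only.
amenities_near = {
    "area-1": ["restaurant", "school", "grocery", "bank"],
    "area-2": [],
    "area-3": ["restaurant", "restaurant"],
}


def flag_low_amenity_areas(amenities_near, min_categories=2):
    """Return sorted IDs of areas with fewer than min_categories distinct amenity types."""
    return sorted(
        area
        for area, found in amenities_near.items()
        if len(set(found)) < min_categories
    )


flagged_areas = flag_low_amenity_areas(amenities_near)
print(flagged_areas)
```

Counting distinct categories rather than raw listings is a design choice in this sketch: two restaurants and nothing else still leaves an area flagged.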

Source: Civic Data Design Lab, urbanNext.

How have you been able to open policy dialogues or get invited to those tables?

SW: A lot of my projects are kind of bottom up. On the Digital Matatus Project, we didn’t have the data we needed to answer the questions. But, we were constantly building dialogue on transportation. And, on the Ghost Cities work, which was also bottom up, I really had to go after it, really leverage my connections and start talking to people in China. But, the biggest door opener was when we made a website where people could visualize our analytics and play around with them.

After trying to get a sit-down with academics and real estate agents and getting nowhere, the visualizations helped a ton to allow for that dialogue to happen. It was absolutely a marketing vehicle. In all of the data visualizations that I’m doing, I’m advocating for something. My bias is all over it. It’s fine to just say it’s a marketing thing. It’s a communication device. It is also a transparency device. It can build trust. In Ghost Cities, I allowed the Chinese government to explore the data and the model behind my work. There was instant trust — that doesn’t always happen when dealing with government bodies.

Alright, now tell me a bit about the book; let’s give readers a preview.

SW: I frame the book with a historical perspective: examples of how we use data for good and bad, so that when we talk about good and bad uses of data, the meanings are clear. I hope that people use the book as a methodology for how they can create change using data.

There are three really important components to that:

  1. I advocate for collecting your own data and using data that’s out there creatively, bringing both qualitative and quantitative data together.

  2. Sharing and visualizing are critical.

  3. I also emphasize that building interdisciplinary teams is the most effective way to create data for policy change.

I end the book with a discussion on the future of data and society, asking some larger questions such as: “Are we data colonialists?” Data access is being consolidated, and not just by the government anymore, making regulation more difficult. Private companies play a large role in decisions that are being made with data. I hope the book challenges people to consider the ways in which they can use data for action in their own communities.

[Author’s note: Data colonialism refers to the process of appropriating data for the purposes of extracting value rather than to, as a government might, establish societal safeguards.]

Data Action: Using Data for Public Good will be available from the MIT Press in December 2020 and can be purchased from a variety of retailers.


For more information about Sarah Williams’ projects, such as Million Dollar Blocks, see the background on the architecture and justice and the pattern, as well as details about the scenario planning used by the project team. Here is an online visualization of Chicago’s Million Dollar Blocks.

Here is some additional detail about City Digits: Local Lotto.

More about the Ghost Cities project can be found here and here is a video explanation of their amenity model.

Derek Poppert provides a useful primer on public tech. And, WIRED has a recent take on anticipated industry growth.


This article first appeared in Nightingale, The Journal of the Data Visualization Society.