Data for Good

So You Want to Be a Data Scientist?

A Q&A with Jake Porway, co-founder of DataKind, captures the emerging ethical considerations and market realities of data science as a profession.

January 25, 2016

A week before Glassdoor ranked “data scientist” the best job in America, based on its potential for career earnings and growth, a discussion was underway about another no less critical dimension of this emerging profession: its potential to contribute to social change. That conversation took place on Reddit, in an “Ask Me Anything” thread with Jake Porway, the co-founder of DataKind, a non-profit organization devoted to partnering data scientists with social sector organizations on Data for Good initiatives. At a time when political and business leaders are gathering in Davos to discuss concerns about how rapid technological change might adversely impact employment, the questions and answers suggest opportunities in a future of tech-driven work, and not just for number crunchers. Some highlights:

The most popular question: The ethics of a new field

question 1

PORWAY: I think data scientists’ first ethical imperative is to practice “safe stats.” It can be tempting to go hacking on a dataset, excitedly training models and building flashy new data products. However, if we’re not using statistical rigor in our work — understanding the biases in the way the data was collected, deeply understanding the biases we’re introducing with different models — then we risk perpetuating junk science or, worse, creating dangerous solutions.

As more people base decisions off of our work, it’s on us to ensure that analyses are sound and that more isn’t taken from results than the results themselves support, so brush up on your statistical methods.

Secondly, I’d say data scientists need to be more aware of the context surrounding their analyses and algorithms. It’s relatively easy to tune a model to, say, maximize a safety score for pedestrian walking directions, navigating people around areas that are higher in crime. Seems like a nice feature to keep users safe, right? However, that seemingly benign algorithm could have huge social ramifications. That same algorithm may inadvertently result in people avoiding neighborhoods based on a strong correlate, like their racial composition. It’s not that that outcome is bad or good, per se, merely a recognition that inadvertent social consequences can arise from our work. As more and more algorithms are put into use in social contexts, e.g. predictive algorithms that are used to sentence people standing trial or predictive algorithms that determine whether you’re fit for a loan, our projects take on real weight.

This is a huge and thorny topic that deserves (and thankfully is getting) much more thought than I have time for here. Check out the Machine Eatable podcast we did about interrogating algorithms (http://www.datakind.org/blog/machine-eatable-recap-interrogating-algorithms), as well as the Berkman Center and others that wrestle every day with the notion of ethical data science. In the meantime, you can do your part by going beyond asking “Can I do it?” to “What will happen when I do?”

In a job market where the most talented data scientists often work in the private sector, pro bono work plays a key role

question 2

PORWAY: Ah yes, of course I have to quote Jeff Hammerbacher here: “The best minds of my generation are thinking about how to make people click ads.” I actually see this as a huge opportunity, and it’s exactly what led me to found DataKind. Not a day goes by that we don’t hear from extremely talented individuals who want to apply their skills to making the world better. We try to highlight job opportunities we hear about that allow data scientists to work on social challenges (http://www.datakind.org/blog/data4good-job-alert-jan-2016), but unfortunately those opportunities are still somewhat rare. All that is to say that people need to pay the bills, and it’s not always possible to find a job that fits in the perfect intersection of data science + social good. That’s why pro bono models are such game changers. They allow people to plug in and use their skills to make a difference without having to leave their day jobs. So yes, your point is well taken, but I’m still optimistic that there’s a huge portion of data scientists that would love the chance to use their skills for good if only there were enough opportunities to do so. We would not exist without these individuals, and I think we’re only going to see more and more physicists, social scientists and biologists at big data companies spreading over to the social sector.

Even in a work ecosystem where data crunching takes the lead, professionals with other complementary skill sets are much needed

question 3

NICK ENG, DATA SCIENCE MANAGER AT DATAKIND: One of the very important roles we have on our project teams is project manager — they help the teams define goals and make sure they’re on track to hit them, and they work with our partners to communicate progress, manage expectations, further define the problem statement, work through issue areas and contribute their soft skills overall in working with the project team. Although it definitely helps if you have data-crunching skills, and we definitely look for those skills in our project managers, that’s one way that someone who doesn’t have data-crunching skills could contribute to this field.

Agriculture and human rights count as major fronts in the future of data science

question 4

PORWAY: We’re really only just starting to scratch the surface of what’s possible when harnessing new, untapped and unique data sources like satellite imagery, open government data, cellphone data, etc., so I think there is room for serious data science work even for organizations whose work isn’t primarily quantitative…

Because underlying data technologies are increasingly being used by new social entrepreneurs, I think we’ll start to see more startup social change organizations like Crisis Text Line leveraging their data to tackle tough challenges in new ways. I think we’ll also see more established organizations like Amnesty International, Red Cross or the World Bank continue to embrace data-driven approaches to their work as more groups become aware of the potential of data science.

Agriculture, healthcare and education also come to mind as sectors where we see HUGE untapped datasets. One area I’m super-fascinated by is the human rights space, which is not only notoriously under-resourced and therefore data poor, but also faces a huge philosophical challenge in using data methodologies, as they’re often the same technologies being used by malicious actors to perpetuate human rights violations.

Overall, I think this movement is just beginning due to the vast quantity of datasets now available and the inevitable uptick in understanding of data science in the social sector.