I have mixed feelings about my Ph.D. in anthropology. I often suspect I wasted a lot of time and money getting that degree. I studiously avoid using most of the theory I picked up in my formal education, I use methods that many (of course not all) anthropologists seem to view as quite un-anthropological, and anthropology as an academic discipline sometimes seems bent on making a poor name for itself among the general population.
Reading a news article about politicians mocking the idea that anthropological training is an employable skill, my initial reaction was “well, take out the snotty attitude and they may have some fair points.” Most of the reactions I saw among other anthropologists, however, seemed to mirror this response (from a LinkedIn thread about the article):
Anthropology students are trained in a wide variety of skills and disciplines that make them exceptionally qualified for the kind of work that Fortune 500 companies are looking for in their R&D; Marketing; Sales; and Legal departments, as well as good candidates for leadership positions in every department. Yet these companies are only looking for engineers, accountants, programmers, and MBAs. The last one being the biggest problem. All of these are valuable, but none of them are oriented towards thinking and working beyond the small perspective of the project immediately in front of them. To target liberal arts in general and anthropology in particular as being “unemployable” simply denotes the lack of imagination and forethought representative of the larger problem. If your people cannot think critically and evaluate problems across a variety of different horizons, then the bigger questions that drive innovation never get asked.
I see two major problems with that kind of defense of anthropology:
It defines anthropologists’ “perspectives” as their major source of value, which seems to be just about the weakest endorsement anyone could come up with, especially in a business environment. I’m not saying this line of defense is wrong. I’m saying it’s ineffective.
It equates anthropological/social science/liberal arts training with the ability to think critically and/or strategically. I think that’s both wrong and ineffective. No discipline has a monopoly on the ability to question assumptions or see the bigger picture – training for any set of skills creates as many blind spots as it does insights.
I haven’t seen any descriptions of what anthropology has to offer the world of engineers, accountants, programmers, and MBAs that doesn’t fall into one or both of the above dead-end arguments. I don’t think people generally assume that anthropology adds value, so it seems we’d be well served by figuring out better ways to justify our existence.
On the other hand, seemingly everyone assumes that data science adds value. But when I look at examples of data science’s value given by data science evangelists, I see something very similar to the public face of anthropology – the only difference is, people seem to accept the proclamations of data science’s value with much less skepticism. For example:
The other day, my twitter feed pointed me to these visualizations of every tweet ever geo-tagged. You can actually identify roads by the density of tweets. It’s totally cool. Is it valuable? I have no idea.
I just saw this one a few days ago: “How can we visualize millions of photos taken in New York, Bangkok or Tel Aviv in such a way that cultural differences between these cities can be revealed? How can we read the “stories” made up by the users’ sequences of photos?” They plot the mean hue and brightness of all the photos on Instagram. Again, the visualizations are great aesthetically, but it seems more than a stretch to say that this tells us anything about “cultural differences.”
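The reduction the quote describes is mechanically simple: each photo collapses to two numbers, mean hue and mean brightness. Here is a minimal stdlib sketch of that reduction, using an invented four-pixel "photo" in place of real Instagram data (the actual Phototrails pipeline isn't specified here, so the pixel values and the simple averaging are assumptions for illustration):

```python
import colorsys

def mean_hue_brightness(pixels):
    """Reduce an image (a list of (r, g, b) tuples, values 0-255) to two
    numbers: mean hue and mean brightness, as in the plots described above.
    Note that naively averaging hue ignores its circularity (hue 0.99 and
    hue 0.01 are both nearly red), one reason two numbers per photo is a
    very lossy summary of a city's visual culture."""
    hues, values = [], []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hues.append(h)
        values.append(v)
    return sum(hues) / len(hues), sum(values) / len(values)

# A hypothetical 2x2 "photo": two pure-red pixels and two mid-gray pixels.
photo = [(255, 0, 0), (255, 0, 0), (128, 128, 128), (128, 128, 128)]
hue, brightness = mean_hue_brightness(photo)
```

That an entire photo, let alone a culture, survives this compression intact is exactly the claim that seems like a stretch.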
I get excited about the tools and skills that make these kinds of data science products possible, but the products themselves don’t demonstrate the value of data science. Most “big data” visualizations look impressive, but they rarely help us do things we couldn’t do just as easily without the visualizations. For example, this site shows heatmaps of protests, stating that “having a live map of protests…with ‘slow motion replay’ functionality could be quite insightful given current upheavals.” I don’t see how that can be particularly insightful, other than being able to point at the picture and say “Wow, look at the protests.” For me, insight = actionability, and this stuff isn’t actionable.
I’m sorry to hear data scientists describe the value of data science as “telling stories” (see here and here, for example). That just so drastically undersells the discipline. If I pull together thousands (or millions, billions, etc.) of data points from dozens (or hundreds, thousands, etc.) of different sources to create an analytic report which I then hand off to a manager, executive, politician, commander, or some other person supposed to allocate funds or make otherwise consequential decisions, then the ability of that product to actually influence a consequential decision depends on (in what I suspect is the order of priority):
how much of the report the decision maker actually paid attention to;
finances, logistics, and other operational concerns that limit the range of decisions that are actually viable at any given point in time;
how many other, conflicting reports of varying authority and credibility the decision maker receives on the same subject;
how much of the report the decision maker forgot between receiving the report and making the decision the report was supposed to inform;
the rhetoric/performance/design in which I dress my report in order to grab people’s attention and get them invested in the analytic results.
So my main concern is that storytelling, in most situations, is probably way down the line of things that allow data to translate into action. But even if we ignore all those other considerations, a focus on storytelling cuts an analysis down to whatever can be packaged into the data scientist’s interpretation of the results – the part of the analysis farthest removed from the actual data.
I tend to view all stories – whether from data scientists or from anthropologists – the same way: often engaging, sometimes thought-provoking, but rarely persuasive. The real power of data lies not in the ability to create stories but to evaluate alternative courses of action. I’m interested in what anthropologists saw in their fieldwork and in what data scientists saw in their models. I’m much more interested in why and to what extent I should believe that they saw what they think they saw, because that helps me decide how much better off I could be by acting on their findings rather than just sticking to what I’m already doing. A report, a presentation, and even most visualizations can’t provide that kind of decision advantage: they present a handful of interpretations and then abandon those stories to fight with all the other concerns and interests competing for decision makers’ attention. That’s a losing strategy.
What I have always found compelling about anthropology is its emphasis on ethnography as a research method. “Ethnography” itself is a poorly-defined concept, but generally it entails non-fleeting participation, ideally immersion, in a particular environment to produce a description that is both recognizable to people familiar with that environment and intelligible to people unfamiliar with it. Anthropologists don’t own the rights to ethnography – sociologists, psychologists, political scientists, journalists, and others have all conducted research that can rightfully claim the title of ethnography – but to my knowledge no other research discipline has made ethnography such a central part of its program. An anthropologist has a hard time getting a degree without doing ethnography. I don’t think that’s generally true of any other discipline.
Ethnographic research has been rightfully criticized for largely ignoring questions of validity and reliability. It is the right of any decision maker (and of the public in general) to ask a researcher questions like “why should I believe that you understood the situation correctly?” and “why should I generalize your insights?” I’ve seen anthropologists wave off those concerns with vague statements that their “qualitative” approaches should “inform” the use of “quantitative” research tools to address those questions. On my more forgiving days, I view that kind of response as a sort of lazy discomfort with unfamiliar tools. Most of the time I view it as an abdication of our responsibility to those we study. Maybe that’s too harsh a response on my part…but if my job as an anthropologist is to help communicate how other people view the world, I feel like doing everything I can to make sure those views don’t get denigrated or ignored is part of that job. It’s perfectly right that someone ask me to prove that I’m not full of crap. And it’s perfectly right for someone to ask me to prove that I’m not cherry picking. Ethnography alone is not capable of answering those charges.
I wonder if data science isn’t heading down the same road. What I like about data science is being involved in idea generation and research design as well as the analysis and product implementation. It looks like the idea generation and research design aspects of the job are what will eventually differentiate a discipline of data science (if such a discipline endures) from disciplines of traditional quant analysis and software engineering. Businesses will always need people who can crunch numbers and write programs, but if data science can’t consistently demonstrate an ability to come up with novel, actionable, relevant questions to answer, it will make more sense to keep the question-asking as the domain of executives and managers.
Anthropology (if done right) produces stories clearly connected to on-the-ground concerns and perspectives but fails to demonstrate that those stories more generally hold true over time or across other populations. Data science produces stories that often have the time- and population-level generalizations down, but frequently fails to connect that output to the practical context and constraints of individual decisions. Data science, as it is often marketed today, assumes away the decision-making part of the analytic problem. Anthropology, on the other hand, assumes away the fact that people’s decisions always involve multiple, competing options and that people need help choosing between those options.
This isn’t some sort of general call for more “mixed-method research.” I don’t find that designation very helpful. A lot of user-experience research, for example, just involves looking over someone’s shoulder to see that an interface is confusing – one data point may be enough to make a decision (unless simplifying the interface carries potential costs). In the context of real-world, consequential decision making, anthropology by itself is largely irrelevant and data science by itself is largely impotent. That’s not true of all products and situations – of course it’s not – but that seems to be the larger picture we’re looking at.
I like John Foreman’s recent post about the need to get data science consumers to try their hands at data science so they can get a sense of what it actually involves beyond the TED talks and PowerPoint presentations. I’m skeptical that people other than analysts, data scientists, and software engineers will ever feel comfortable talking about details of a regression analysis. I’ve even had people comfortable with regression dig in their heels as soon as I mentioned a random forest. But I do think if the systematic tools of data science can be coupled with the on-the-ground connection of anthropology, it’s possible to create prototypes that allow decision makers to pit their intuition against the predictive models, to try out multiple courses of action and get back probabilistic estimates of the outcomes of those actions. That requires a data science more focused on teams of diverse skill sets that go beyond particular technical abilities.
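To make the "try out multiple courses of action and get back probabilistic estimates" idea concrete, here is a toy Monte Carlo sketch of the kind of comparison such a prototype might surface. Everything here is invented for illustration – the action names, the payoff distributions, and the 80-unit "bad outcome" threshold are all assumptions, not a real model:

```python
import random
import statistics

def simulate(action, n=10_000, seed=42):
    """Monte Carlo estimate of the outcome distribution for a hypothetical
    course of action. The (mean, std dev) payoff parameters are made up
    purely to illustrate the shape of the comparison."""
    params = {
        "status_quo": (100, 10),   # modest payoff, low variance
        "new_policy": (110, 40),   # higher expected payoff, much riskier
    }
    mu, sigma = params[action]
    rng = random.Random(seed)
    outcomes = [rng.gauss(mu, sigma) for _ in range(n)]
    return {
        "mean": statistics.mean(outcomes),
        # Probability the outcome falls below an (arbitrary) failure threshold.
        "p_worse_than_80": sum(o < 80 for o in outcomes) / n,
    }

baseline = simulate("status_quo")
candidate = simulate("new_policy")
```

The point of a prototype like this is not the simulation itself but the exchange it enables: a decision maker who insists the new policy is safe can be shown that, under the stated assumptions, it is more likely than the status quo to fall below the failure threshold even though its expected payoff is higher – and can then argue with the assumptions rather than with a finished story.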
I think anthropology’s traditional tools – ethnographic field methods – offer the best opportunity to design products that facilitate decisions rather than just informing them. And I think the tools currently lumped together as “data science” are our best bet for making probability estimates as widely used as year-over-year revenue comparisons. I don’t think either discipline can meet its potential without the other.
If you want to comment on this post, write it up somewhere public and let us know - we'll join in. If you have a question, please email the author at firstname.lastname@example.org.