What would it mean for the digital humanities to build more bridges in their work? Last week nearly 700 digital humanists went to Mexico City to participate in Digital Humanities 2018, the field's annual international conference. The conference title was “Puentes/Bridges” – and a central question was how digital humanities can build bridges and create a more inclusive, global community. Here are a few of the highlights.
Opening keynote speaker Janet Chávez Santiago, an indigenous language activist, delivered an incredible talk in part about creating an online talking dictionary for Teotitlan del Valle Zapotec, to document and teach the Zapotec language. She also highlighted some of the barriers in her work: it is hard to find partners, and it is disheartening to read Wikipedia articles that refer to Zapotec in the past tense. She playfully took photos with her iPhone during her talk to show that she – and other people who speak Zapotec – are not in the past. At the end of her talk, she addressed the audience: what will we put our effort and time toward? Will our work increase access? Audience members turned to Twitter to ask whether scholars were working on indigenous languages.
These are the kinds of conversations that build bridges in the digital humanities community.
Closing keynote speaker Professor Schuyler Esprit (Dominica State College) talked about access in a different sense. She teaches her students digital humanities without many of the Internet resources available in the United States. When students don’t have access to as many resources, they often also face the psychological barrier of feeling “not good enough.” She prepares her students for digital humanities work despite these obstacles. Are there ways that universities with many resources could help?
But even when scholars have all the resources possible, access is still a huge issue in the digital humanities. In several workshops, we used datasets that others had created and made available – datasets released freely by companies like Google and Facebook, or datasets researchers built by using APIs to access digital editions of newspapers and other public documents. There is no central repository for humanities data, so finding relevant data is hard, tedious, and limiting. The New York Times API, for example, will let you gather articles from the late 1980s to the present, so the collection of articles you can build is already limited. And many publications do not have an API at all. How does relying on APIs and pre-made datasets limit what digital humanists study? What are we studying, and what are we not?
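To make that concrete, here is a minimal sketch of what gathering newspaper data through an API can look like, using the New York Times Article Search API. The endpoint and parameter names follow the public documentation as I recall it, the API key is a placeholder, and the query term is just an example – treat this as an illustration rather than a ready-made script.

```python
import requests

API_KEY = "YOUR_NYT_API_KEY"  # placeholder: register for a key at developer.nytimes.com
ENDPOINT = "https://api.nytimes.com/svc/search/v2/articlesearch.json"

def search_articles(query, begin_date, end_date, page=0):
    """Fetch one page of article metadata matching a query and date range."""
    params = {
        "q": query,
        "begin_date": begin_date,  # YYYYMMDD
        "end_date": end_date,      # YYYYMMDD
        "page": page,              # each page returns a small batch of results
        "api-key": API_KEY,
    }
    response = requests.get(ENDPOINT, params=params)
    response.raise_for_status()
    return response.json()["response"]["docs"]

# Example: print dates and headlines for articles mentioning "Zapotec"
for doc in search_articles("Zapotec", "20080101", "20180101"):
    print(doc["pub_date"], doc["headline"]["main"])
```

Even with a working key, a script like this can only return what the publisher has digitized and indexed – which is exactly the limit the workshop conversations kept circling back to.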
These were some of the questions we explored in Eun Seo Jo, Javier de la Rosa, and Scott Bailey’s Machine Reading workshop, which looked in part at an algorithmic approach called “Word2Vec” for making sense of a large corpus. The approach takes a large collection of texts and produces a vector space in which each word in the corpus is assigned a vector; the point is to be able to understand, to some degree, a giant body of text.
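For readers who want to try this themselves, here is a minimal sketch of training word vectors with the gensim library – one common Word2Vec implementation, not necessarily the exact code used in the workshop. The toy corpus stands in for whatever collection of texts you are studying.

```python
from gensim.models import Word2Vec

# Toy corpus: in practice this would be thousands of tokenized documents.
sentences = [
    ["the", "doctor", "examined", "the", "patient"],
    ["the", "nurse", "checked", "the", "patient"],
    ["she", "read", "the", "newspaper"],
    ["he", "read", "the", "newspaper"],
]

# Train a small Word2Vec model (gensim 4.x parameter names).
model = Word2Vec(
    sentences,
    vector_size=50,   # dimensionality of each word vector
    window=3,         # how many context words to consider on each side
    min_count=1,      # keep every word, since the corpus is tiny
    workers=2,
)

print(model.wv["doctor"][:5])           # the first few dimensions of one word's vector
print(model.wv.most_similar("doctor"))  # words closest to "doctor" in vector space
```

With a corpus this small the nearest neighbors are essentially noise; the interesting (and troubling) patterns only emerge at the scale of millions of sentences.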
We looked at the pronouns “she” and “he” in a corpus of Google News articles to see which terms are most likely to appear close to each pronoun in vector space. For example, we asked: will “she” sit as close to words like “doctor” or “philosopher” as “he” does? It turns out, as many of us might have guessed: no.
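Pretrained vectors built from Google News are downloadable through gensim, so you can reproduce this kind of check yourself. A hedged sketch – the model name comes from gensim’s downloader catalog, and the download is large (roughly 1.5 GB):

```python
import gensim.downloader as api

# Download (once) and load pretrained vectors trained on Google News articles.
wv = api.load("word2vec-google-news-300")

# Compare how close each pronoun sits to a few occupation words.
for word in ["doctor", "nurse", "philosopher"]:
    print(
        word,
        "she:", round(wv.similarity("she", word), 3),
        "he:", round(wv.similarity("he", word), 3),
    )
```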
“She” was paired in vector space with “nurse,” while “he” was paired with “doctor.” As we learned how to create word embeddings and explore word analogies, we also spent time thinking about the biases in the text corpora we were exploring. Studying pronouns is one example of how word vectors can surface aspects of a corpus that humans need to think critically about in order to understand why certain words sit close together in the first place. How do we avoid carrying the biases in our datasets forward into our research?
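The classic analogy query makes the same point in a single line. Another hedged sketch using the same pretrained Google News vectors (the exact neighbors depend on the model, but “nurse” typically ranks near the top):

```python
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")  # same pretrained vectors as above

# Analogy via vector arithmetic: "he" is to "doctor" as "she" is to ...?
print(wv.most_similar(positive=["doctor", "she"], negative=["he"], topn=5))
```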
There are no easy answers to the questions of access that so many raised during DH 2018. But I’ll continue thinking about how I can critically look not only at what datasets I choose or make, but also at how my research and digital humanities projects invite or speak to a global audience. What could digital humanities be if we built more bridges with our work?