Three Almost Invisible Women
(Or, What You Can See in a Word Cloud)
By Stephen P. Weldon
Last week, while I was talking with my staff about what to include in a major revamp of the website, it became clear that we could do something very interesting with the thesaurus. Several years ago, I posted a list of the 23,000+ terms (subjects, persons, institutions, and place names) that I have in my thesaurus file, all nicely alphabetized. It was a list that was primarily for reference. It didn’t do anything on its own, and it certainly couldn’t give anyone a sense of the bibliography, but I figured it was important for anyone who wanted to see what we used as authorities.
But now the trick was to figure out how best to provide the information and make it more relevant. “How about a word cloud?” I don’t remember who suggested it, but it was one of those Aha! moments in which a whole host of ideas emerged all at once in my head. More discussion, and we had a plan. A single word cloud wasn’t enough; we needed smaller chunks. So we made an arbitrary cut through the chronology at 1800, which roughly divides the bibliography in half, and we grouped the disciplinary divisions into five or six big chunks. The result was a manageable set of items.
I went home and made some new indexes to extract the terms more easily, and, voilà!, I had a list of words and phrases that I could paste into a word cloud generator. I chose Wordle simply because it is easy and popular. I will likely try a few others before I settle on a final solution, but this is great for now. It displays the 150 most frequent terms in whatever text is pasted into the browser window, and that is quite enough to get a good impression.
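For readers curious about what a word cloud is doing under the hood, here is a minimal sketch, assuming the simplest possible approach: count how often each tag occurs, keep the 150 most frequent, and scale each term's font size linearly against the most frequent one. This is not Wordle's actual algorithm; the function name `top_terms`, the linear scaling rule, and the `max_size` parameter are all hypothetical, and the tags in the example are invented placeholders, not counts from the bibliography.

```python
from collections import Counter

def top_terms(tags, n=150, max_size=100.0):
    """Return (term, count, size) for the n most frequent tags.

    Hypothetical scaling rule: size is proportional to count, with the
    most frequent term drawn at max_size. Wordle's real layout differs.
    """
    counts = Counter(tags)
    top = counts.most_common(n)  # sorted by descending count
    if not top:
        return []
    largest = top[0][1]
    return [(term, count, max_size * count / largest) for term, count in top]

# Invented toy data: one term dominates, as in the "fog" clouds.
tags = ["term A"] * 12 + ["term B"] * 2 + ["term C"]
for term, count, size in top_terms(tags, n=3):
    print(f"{term}: {count} tags, size {size:.1f}")
```

Under this rule a term that appears one-twelfth as often as the leader is drawn at one-twelfth the size, which is why a single dominant name can shrink everything else into "indeterminate lines of pixels."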
The first thing I noticed is that some of the clouds are easily readable, whereas others are not; together, they are quite revealing. In the readable ones, a lot of terms have frequencies high enough that they appear no more than five or six times smaller than the largest words, so they remain legible. This means that you can visually assess the relative importance of quite a lot of people and subjects. The modern social sciences, for example, show this kind of pattern in both the subject term list and the personality list.
The unreadable ones, however, have one or two terms or names that so dramatically outnumber all the rest that most of the list is simply a fog around the central concept or person. This is the case with biology in the twentieth century. Here Charles Darwin dominates so completely that you can barely read the name of his co-theorist, Alfred Russel Wallace, who comes in a very distant second along with a small group of men like Ernst Haeckel, T. D. Lysenko, Theodosius Dobzhansky, and Ernst Mayr. Then the names shrink into indeterminate lines of pixels.
There’s a lot to ponder there for someone who wants to assess where we are as a discipline. Of course, some of this must be a result of the classifier (that is, me), and this list does force me to look closely at what I’m doing, to make sure not only that I’m not missing whole citations that should go in, but also that I’m tagging people and topics fairly. But I know that “classifier bias” can only be a partial explanation. These clouds do reveal a lot about the nature of the sciences we study as well as the way we write history.
Looking at these smaller chunks of the CB was interesting, but I really did want to check out the picture from the very top. What do we get when we use all the indexed names and all the indexed subjects? So, 149,904 subject tags and 24,925 personal name tags later, I produced two more Wordle clouds. Neither one was particularly surprising to me at first glance. These are the people and subjects I would have expected to see dominating the cloud.
However, as I casually looked around at the personalities cloud—the one that showed the top 150 cited persons in the bibliography—I realized that I didn’t see any women. None at all. That was a bit of a shock. I looked and looked and looked, and finally I saw, peeping over the top of the list (above Albertus Magnus) Émilie du Châtelet. With a little more effort I also found, thank goodness, Marie Curie and Margaret Cavendish. (I don’t think you can read any of these on the version below; you’ll have to get them at the Wordle site.)
I always knew we gave a picture of history that was quite male-oriented, but this word cloud really did put things into perspective for me. All kinds of questions come to mind about what this means and whether it tells us anything of special interest that hasn’t been said before. But for now, I’m going to leave it at this. Just the visual image. Just the teaser. For those who want to explore this and a few other word clouds that I made over the weekend, here is the link. And I promise to add more in the next few weeks.
In a couple of months, I will be unveiling the first iteration, a beta version, of the IsisCB Platform that I’ve mentioned in previous posts. When that happens, everyone will have a chance to play around with this data. I’m hoping it will encourage some innovative historiographical research. I know that it will give me ideas about how to make the discovery service more attuned to different researchers’ needs.
Before I close, I want to acknowledge my assistants Kirsty Lawson and Carolyn Scearce for such a scintillating discussion that prompted all this.