The Burden of the Veil

Revealing Themes in Light in August, Native Son, Invisible Man, and To Kill a Mockingbird

Introduction


It's so long ago and far away that here in my invisibility I wonder if it happened at all. Then in my mind's eye I see the bronze statue of the college Founder, the cold Father symbol, his hands outstretched in the breathtaking gesture of lifting a veil that flutters in hard, metallic folds above the face of a kneeling slave; and I am standing puzzled, unable to decide whether the veil is really being lifted, or lowered more firmly in place; whether I am witnessing a revelation or a more efficient blinding.

Invisible Man, Ralph Ellison



  This project is an extension of my dissertation research, in which I examine the affective dimensions of discourses concerning race and race relations since the era of transition between the colonial period and the early republic (when anti-black prejudice was solidified by the systematic conceptualizations of race necessary to ideologically construct the new republic and its citizenry). In the first iteration of my project, I used the default suite of Voyant Tools, primarily Cirrus, Document Type Frequencies Grid ("Terms"), Document Type KWICs Grid ("Contexts"), and Type Frequencies Chart ("Trends"), to identify (potentially latent) key and otherwise significant terms, concepts, themes, and patterns in the following texts: William Faulkner’s Light in August (1932), Richard Wright’s Native Son (1940), Ralph Ellison’s Invisible Man (1952), and Harper Lee's To Kill a Mockingbird (1960). In this iteration, I begin to explore topic modelling processes using MALLET ("MAchine LEarning for LanguagE Toolkit") to continue analyzing these novels. I selected these texts for my corpus because they were written by key critics of U.S. race relations whose thinking reflects significant racial perspectives of their day. I am interested in investigating common themes throughout the corpus: which themes or topics do the novels share, and which do they not? Where do their thematic preoccupations overlap, and why? Although I have spent quite some time answering these questions through tried-and-true close reading, I hope to discover latent or inconspicuous structures and patterns in my corpus by implementing the linguistic, statistical, and machine learning techniques involved in computer-assisted analysis.

Tools & Methods


  "MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text" (McCallum). To get started with the software, I went through The Programming Historian’s quickstart tutorial, and I was soon on my way toward working with my own texts. Before I could do that successfully, however, I had to do some data preparation. First, I had to convert all of my PDF copies of the novels to plain text documents (.txt). Then, I had to do some data cleaning, including removing the HTML code that Adobe uses to format the text as well as symbols that MALLET cannot recognize (e.g., ï, single and double smart quotes/curly quotes). Unfortunately, I did not realize that my data was dirty until the first few trial runs with MALLET, where I saw some funky stuff, like this:


5 0.05204 bigger dalton don’t max jan white felt room eyes mary i’m black life didn’t gus asked britten mrs ain’t feel


I also had to find an alternative copy of Native Son, as my original copy lost the spaces between words in MALLET’s processing (which has happened to me in other instances of machine reading).
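The cleaning step described above can be sketched in Python. This is only an illustration, not the procedure actually used for this project: the function name `clean_text` and the choice to map curly quotes to straight ones (rather than delete them) are assumptions, and a real PDF export may need additional rules.

```python
import html
import re

# Assumed mapping: replace "smart" punctuation with plain ASCII equivalents.
SMART_PUNCTUATION = {
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
}

def clean_text(raw: str) -> str:
    """Strip tag-like markup and normalize curly quotes in one novel's text."""
    text = re.sub(r"<[^>]+>", " ", raw)   # drop anything that looks like an HTML tag
    text = html.unescape(text)            # turn entities such as &amp; back into characters
    text = text.translate(str.maketrans(SMART_PUNCTUATION))
    return re.sub(r"[ \t]+", " ", text).strip()  # collapse runs of spaces left behind
```

Normalizing the apostrophes, rather than stripping them, keeps contractions such as "don't" intact as single tokens, which matters for the topics discussed later.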

  After initially looking through the topics that were produced, I was at a loss as to how best to interpret them. I found the articles in The Programming Historian’s bibliography to be helpful. "[The World's Plainest] Introduction to Mallet & Python" by Ted Underwood gave me a useful perspective. He explains that, for a literary scholar, it’s a good thing when a topic is hard to interpret (this made me feel better about my confusion). He writes, "I want this technique to point me toward something I don’t yet understand, and I almost never find that the results are too ambiguous to be useful. The problematic topics are the intuitive ones — the ones that are clearly about war, or seafaring, or trade. I can’t do much with those" (Underwood). In short, the goal is to get "'meaningfully ambiguous' results" (Underwood). In addition, I searched for things like "how to use MALLET with novels." Christof Schöch’s "Topic Modeling with MALLET: Hyperparameter Optimization" made me realize 1) that I could adjust the optimization interval, yielding more or less helpful results by enabling some topics to be more prominent than others; and 2) that I needed to divide my documents into smaller segments. With regard to the first point, I had been setting the optimization interval to 20, at TPH’s suggestion. The note on hyperparameter optimization on the MALLET website says, "Optimization every 10 iterations is reasonable." I decided to try this setting, which is what I used for the model that I will be analyzing in this iteration of my project.

  Secondly, I wasn’t sure how to go about segmenting the novels, so I looked into an article I remembered the TPH tutorial mentioned, called "Finding scientific topics" by T. L. Griffiths and M. Steyvers (2004). It seemed a bit too scientific for me, so I googled "how to segment novels for topic modelling." In "On Paragraphs: Scale, Themes, and Narrative Form," Algee-Hewitt, Heuser, and Moretti (2015) write, "if one wants to use topic modeling to analyze literature – then paragraphs are a better unit than ‘mechanical’ segments . . . our findings suggest that paragraphs . . . act as the textual habitat of themes" (8). This made absolute sense to me. However, I did not know how to parse my documents and split them by paragraphs into separate files using Python or R; I intend to learn in short order. In the meantime, to do this manually would be an insane endeavor. So, I resigned myself to splitting them by chapter, which I did manually. Because Native Son does not have chapters, I determined ends of "scenes" at which to segment the novel. Although not quite as segmented as suggested, this was a start.
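The paragraph-level segmentation described above could be approached with a short Python script. The following is a minimal sketch under a significant assumption: that paragraphs in the plain-text file are separated by one or more blank lines, which text converted from PDF does not always guarantee. The function names and the output file naming scheme are hypothetical.

```python
import re
from pathlib import Path

def split_paragraphs(text: str) -> list[str]:
    """Return non-empty paragraphs, treating one or more blank lines as a break."""
    parts = re.split(r"\n\s*\n", text)
    # Join wrapped lines inside a paragraph into a single line of text.
    return [" ".join(p.split()) for p in parts if p.strip()]

def write_paragraph_files(source: Path, out_dir: Path) -> int:
    """Write each paragraph of `source` to its own numbered .txt file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paragraphs = split_paragraphs(source.read_text(encoding="utf-8"))
    for i, paragraph in enumerate(paragraphs, start=1):
        name = f"{source.stem}_{i:05d}.txt"   # e.g. native-son_00001.txt
        (out_dir / name).write_text(paragraph, encoding="utf-8")
    return len(paragraphs)
```

Calling `write_paragraph_files(Path("native-son.txt"), Path("segments"))` (both paths are placeholders) would produce one file per paragraph, which could then be imported into MALLET as a directory of documents.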

  In addition to the hyperparameter optimization and segmentation of my novels, I experimented with the number of iterations for MALLET to cycle through and the number of topics to generate. After reading a few articles, I decided to go with 2,000 iterations because it seems to be proportional to the number of texts in my corpus while supposedly increasing the quality of my topic model (this is something that I want to test further through more experimentation). I started off with 20 topics, following the example of the TPH tutorial, but I found that 20 topics weren’t enough because not many showed correlations between all of the texts. I experimented by increasing the number by 5 at a time until I reached 50, and I decided to cap it there because I had read a topic model study of a corpus of 270 crime fiction novels that ran 60 target topics with MALLET (Schöch 2014). I’m glad that I settled on 50 because more would have been a bit too unwieldy for me to handle at this time. It was a good balance between having enough topics and manageability.

  All that said, this was my final string of commands:


bin\mallet train-topics --input corpus-chapterized.mallet --optimize-interval 10 --num-iterations 2000 --num-topics 50 --output-state corpus-17-topic-state.gz --output-topic-keys corpus-17_keys.txt --output-doc-topics corpus-17_compostion.txt


(As you might guess, this was the 17th run I’d executed in MALLET.)

  After running the analysis and getting my results, I explored the three documents that MALLET produced: 1) the list of every word in my corpus and the topic it belongs to (a huge file that MALLET compresses and that I had to unzip); 2) the list of 50 topics; and 3) the text file indicating the percentage breakdown of each topic within each of my files, which I imported into a spreadsheet to help me understand the data better. I went, primarily, between the first and second documents. I used the composition breakdown in the spreadsheet to help me determine which topics strongly correlated to which files (i.e., which segment of a novel), by which I mean that the topic accounted for at least 10% of the file. This helped me better interpret the topics because I could see to which text(s) a topic was particularly pertinent and get an idea of how each text (or segment of a text) was topically related to another. I was able to really start grasping my topic model once I labeled each topic with the name of the text(s) to which it was strongly related because I could then organize the topics to make them less overwhelming and start putting the terms in each topic into context by recalling what a chapter was about or by searching in the corresponding file.
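The 10% rule described above can also be applied outside a spreadsheet. The sketch below assumes the composition file uses the tab-separated layout written by recent MALLET releases (document index, file name, then one proportion per topic in topic order); older releases write topic/proportion pairs instead, so the parsing would need adjusting. The function name `strong_topics` is hypothetical.

```python
def strong_topics(lines, threshold=0.10):
    """Map each document name to [(topic_id, proportion), ...] at or above threshold."""
    result = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and any header line
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: index, file name, then one proportion per topic.
        name, proportions = fields[1], [float(x) for x in fields[2:]]
        hits = [(t, p) for t, p in enumerate(proportions) if p >= threshold]
        # Strongest topic first, mirroring how the topics are ranked per file.
        result[name] = sorted(hits, key=lambda tp: tp[1], reverse=True)
    return result
```

Passing the open composition file to `strong_topics` would return, for each segment, only the topics that clear the threshold, which is the information used to label topics by novel.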

  In the following section are reflections from my initial analysis of the topics generated in this model.




Works Cited

Algee-Hewitt, Mark, Ryan Heuser, and Franco Moretti. "On Paragraphs: Scale, Themes and Narrative Form." Pamphlets of the Stanford Literary Lab, no. 10, 2015. Internet resource.

McCallum, Andrew Kachites. "MALLET: A Machine Learning for Language Toolkit." 2002, http://mallet.cs.umass.edu.

Schöch, Christof. "Topic Modeling French Crime Fiction." The Dragonfly’s Gaze, 1 Feb. 2014, https://dragonfly.hypotheses.org/530.

Analysis


In this section, you will see groups of topics (each its own paragraph) with a heading label indicating the text(s) to which they strongly correlate, followed by my reflections. I put the topics in descending order, from most prevalent to least. The number at the beginning of the topic paragraph indicates the topic number. The second number reports the Dirichlet parameter, which is roughly proportional to the overall portion of the corpus assigned to a given topic. Finally, the terms in the topic are listed in descending order.
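The ordering described above (topics ranked by their Dirichlet parameter) can be reproduced from the topic keys file with a few lines of Python. This is a sketch that assumes each line has the shape shown in the listings below: topic number, Dirichlet parameter, then the terms, separated by whitespace. The function name `parse_topic_keys` is hypothetical.

```python
def parse_topic_keys(lines):
    """Parse 'topic_id  alpha  term term ...' lines into (id, alpha, [terms]) tuples."""
    topics = []
    for line in lines:
        parts = line.split(None, 2)   # split off the two leading numbers only
        if len(parts) < 3:
            continue                  # skip blank or incomplete lines
        topic_id, alpha, terms = parts
        topics.append((int(topic_id), float(alpha), terms.split()))
    # Most prevalent topic (largest Dirichlet parameter) first.
    return sorted(topics, key=lambda t: t[1], reverse=True)
```

Each returned tuple keeps the terms in their original order, so the first few entries of the term list remain the most heavily weighted words for that topic.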



All Novels


14 1.64887 back thought looked time long door heard made knew hear put voice left wanted found stopped mind feet good home

48 1.06125 man time woman night told day face men years life knew kind lived god live young home place dead white

42 1.0285 don't it's i'm that's didn't he's you're i'll things make good folks can't thing ain't man you'll people i've wouldn't

10 0.90526 head white black light eyes heard cold lay voice stood body slowly air sat ran hard move dark people street


  Topic 14 is still a bit tough for me to interpret. It does remind me of the prevalence of terms like "time," "voice," "hear(d)," "knew," and "wanted" from the first iteration. I will later search the files to identify the significance of terms like "good" and "home," or perhaps use Voyant Tools' "Contexts" tool. Topic 48 pertains mostly to Light in August. As I expected from the first iteration, this novel is preoccupied with "man" and "woman," as the text is about man-to-woman relationships, especially considering the tension between accepted intraracial sex and the taboo of/aversion to interracial sex. Topic 42 contains a lot of negative words (e.g., don’t, didn’t, can’t, ain’t, wouldn’t), pronouns (e.g., it’s, I’m, he’s, you’re, I’ll, that, it), and nouns (folks, things, thing, man, people). I'm not sure if this is particularly meaningful, or if it just means that there isn't a ton of thematic similarity represented in this topic. Topic 10 pertains to all of the texts but To Kill a Mockingbird. There seems to be the expected theme of color and race (white, black, dark), the related duality of light and dark, and, as in the first iteration, the significance of voice and hearing. I'm curious to understand how "head" relates. I'm not sure I would have thought about it, but in the three novels to which this topic pertains, there is quite a bit of running from danger, implication, etc. (ran, street); but what is the connection to words like "lay," "stood," and "slowly"? There is a lot related to motion and the lack thereof in this topic, with these words and a word like "move." I'm wondering what this might suggest.



Light in August


41 0.42071 house face thinking child bed door hands hand dark believed don't turned quietly years quiet entered room empty beneath watching

5 0.27669 reckon ain't town don't folks house told didn't fire fellow jefferson wasn't watching run nigger face kind back brown couldn't

25 0.17903 began road negro hard darkness beneath trees town car earth brown single rose window sun summer cot cabin trousers dust

2 0.06964 son father thinks war faces phantom spirit blue slaves stage wheel past yankees instrument accepted physical son's created death slave

15 0.05362 christmas brown byron work mooney square saturday mill whiskey monday drunk bunch brown's whistle sawdust morning yesterday clothes foreman burch

38 0.04113 father cabin i'll tyrica highlight burden mother christmas calvin boy comment dish father's born kitchen married phase pistol saturday year

3 0.03261 byron hightower it's man ain't cabin thinks woman watches brown that's i'll sits doctor desk good bunch wagon feel find

40 0.02261 grimm deputy sheriff running square ran christmas blood train uniform stevens jail courthouse pistol grave negress fire thirty bicycle legion

23 0.01805 sheriff dogs negro men christmas deputy thousand jail roz proprietor square street white durn hot paid sheriff's smell heat immediately

22 0.01608 hines doc nigger man god eupheus christmas jail didn't uncle children matron god's lord devil abomination mottstown mad chair corridor

7 0.01462 mceachern joe boy woman max waitress waiting blonde don counter mrs rope bobbie money downlooking holding road secret lane saturday


  Admittedly, my analyses of Light in August are more limited than those of the other novels because I primarily listened to it on audiobook, and it's more difficult for me to retain material that I did not read. That said, the topics for this novel were the most difficult for me to begin wrapping my mind around. What I note in Topic 41 is that it seems to be domestic and solemn; that is, there is a sense of being inside the home (e.g., entered, house, child, bed, door, room), darkness, quiet, and emptiness. Terms that figured prominently in the first iteration like "face," "thinking," "believe," and "watching" appear in this topic. In Light in August, there is a lot of introspection and watching/being watched, in which the face is common. In Topic 5, there are a lot of negative terms (e.g., ain't, don't, didn't, wasn't, couldn't), which I find fascinating. Why is that? I'm interested in going through the text more to explore this air of negativity. Also, in contrast to Topic 41, Topic 5 seems to be more outward/community-focused, and Topic 25 appears to relate to that which is outside and, perhaps, on the move. It's interesting how these more prominent topics cover different kinds of physical environments. Beginning with Topic 2, the topics become less dispersed across the text and more easily attributed to the specific chapters to which they pertain. Topic 2 seems to be concerned with the past, genealogy, and haunting, in addition to slavery and the American Civil War; it's a very dark chapter. Topic 15 reflects the culture around working at the mill. So on and so forth with connections to particular chapters. Perhaps upon further inspection (i.e., revisiting those chapters), I may be surprised by how what I find those chapters to be concerned with compares to what the MALLET topic model reflects.
Unfortunately, I didn't realize that my notes in the PDF copy of the novel were carried over into the file I ran through MALLET, so Topic 38 includes content from those comments; I will discount that topic for now, but perhaps it's an indication of points in the novel that were of particular interest to me.



Native Son


21 1.00336 eyes room felt face turned hand hands black didn't looked boy head feeling leave side moment open opened answer things

35 0.19116 bigger white don't wanted bessie naw bed i'm ain't knew stood body yeah i'll asked money felt round floor feel

8 0.15518 boy life today crime act guilt boy's fear lives girl honor state form civilization nation social understand deal newspapers land

32 0.13385 negro negroes room white family dalton evidence policemen cross front coroner rape caught led table death public back witnesses rose

6 0.07708 vera buddy mother snow roof gun paper water rat empty flat negro trapdoor yuh curtain police bread alley shoot icy

1 0.06909 dalton bigger jan mary britten mrs car trunk door night furnace peggy yessuh yessum told girl asked miss dalton's kitchen

24 0.05125 max bigger court life thomas men feel honor hate judge guilty die people buckley kill murder black feeling make law

39 0.041 bigger gus jack g.h job asked doc scared laughed blum's sun movie rob shut gus's table tone plane poolroom world

4 0.03708 bigger jan dalton son boy preacher mrs buckley death killed yuh hands kill words thing mother hate talk inquest earth


  Interestingly, Topic 21 is strongly represented throughout Native Son and Invisible Man. Given my interest in affect, I am happy to see that "felt" and "feeling" are such significant words. I am surprised and intrigued by the prominence of the word "eyes." I will be looking into the usage of this word in context, and perhaps the same with "hand(s)," which I recall being a big word in a topic in Light in August. Topic 8 is amazing to me. I correctly surmised that this topic relates to the end of the novel, in which the implications of interracial sexual relations (collapsed into the rape of white women by black men) for the state of the American nation and its society culminate in discussions of Bigger's murder and supposed rape of Mary Dalton in the newspapers and the court trials. The terms in this topic wonderfully encapsulate the stakes, states, and sentiments of this epic issue (e.g., guilt, fear, honor). Similarly, Topic 24 is exciting to me because it involves affect in relation to interracial violence, law, and society in the U.S. The issue of the (alleged) rape and the murder itself is played out in Topic 32. Topic 6 eluded me initially; there is a correlation between the opening scene, in which Bigger is home with his immediate family members and the rat appears, and the section in which he is on the run from the police; I suppose this might be because he is running through homes in his neighborhood or communities like his. Topic 1 is clearly related to Bigger's presence in a "white" space and his murder of a white girl, whose body he burned in a furnace in an attempt to destroy the evidence.
However, I find it interesting how prominently his "yessuh" and "yessum" figure in this topic, and right next to each other in frequency. This is a sign of his subordinate position in this white space, but it also reminds me of Invisible Man's concept of "yessing them [whites] to death" in their spaces in order to fool them by taking advantage of their expectation of deference and inferiority in this era.



Invisible Man


47 0.39553 suddenly glass beneath eye laughed hot lost hadn't great isn't water red close part angry idea history smile large hearing

19 0.3802 street called high side corner shot running run let's coming hell night damn block window houses gun ain't heavy big

30 0.301 man men white crowd watched give suddenly voice building hell yelled chairs forward fellow blue ahead past shouted action man's

31 0.14646 sir college bledsoe letter job school emerson southern north suddenly morning letters return campus south case york power address damn

33 0.08681 pain hospital machine glasses factory vague nurse organ doctor primitive mother struggle personality hidden sense vision prepared appeared focused rabbit

18 0.08436 invisible i've hole audience world invisibility power loved death possibility principle mind freedom point blind human arena applause music feel

28 0.08309 brother jack brotherhood brothers i'd people thought committee dark clifton movement discipline call tarp work wrestrum kind we're tobitt harlem

26 0.07078 boys rug ring told blows speech battle boy fighting floor gold fought blow royal bell dollars big social knocked deliver

44 0.06466 founder bledsoe great words leader chapel days young platform knew friends hear barbee guests campus slavery grass recall story land

37 0.05056 sir i'm sees git bout kate school gits young white eyes road matty gal caint lou somethin tryin ole trueblood

36 0.04274 clifton crowd cop clifton's doll tod bar barrelhouse curb forward moving dolls i'd faces beer dead sun sambo park committee

27 0.0343 norton white sir vet golden he's halley crenshaw day norton's school car supercargo stairs folk bring doctor fat young balcony

12 0.02729 black mahn ras clifton exhorter dark knife red leaders community sign women ideology young shit organization district we'll plan crazy

43 0.02652 called snow we're cake wise crowd steps speech law-abiding dispossessed yam eighty-seven eviction yams they're longer couple bible junk stuff

34 0.00779 case ras scofield dark moved dupre ole git started coal store thought spear hoss goddam holding huge oil water hang

49 0.00771 paint kimbro brockway union fink brother git sir brothers chairman lucius buckets damn that's job floor gauges tank macduffy plant


  There were a surprising number of topics that were most relevant to Invisible Man! Way more compared to the others. Anyway, it's kind of funny how dramatic this novel is compared to the others, with the frequent use of words like "suddenly" (and exclamation marks). But, more importantly for my research, Invisible Man is chock-full of feeling and emotion/emoting, with words such as "laughed," "angry," and "smile" being some of the most common words and a part of the most prevalent topic in the novel (Topic 47), according to this run in MALLET. I was also surprised by how strong Topic 19 is, though it is mostly present in only one chapter, of which I thought immediately when I read the terms in the topic; it is the penultimate chapter of the novel, in which the protagonist is running for his life in the night streets of New York; it is, indeed, like a hell into which he's descended, as though in Dante's Inferno. From then on, the topics start to become more intuitive to interpret and associate with particular scenes of the novel. A significant exception is Topic 18, which connects the opening "Battle Royal" scene, the chapter in which the protagonist is about to give his first big speech as a member of the Brotherhood, and the end of the novel when he is coming out of his hole and reconsidering his "invisibility"; I would never have made the connection myself, but I am eager to look deeply into these three chapters in conjunction to see what they might reveal. What I perceive now is the relation between invisibility and performing before the gaze of a white audience. Another thing I find surprising and, thus, interesting is the co-occurrence of "law-abiding" and "yam(s)" in Topic 43, as "yams" figures as a motif of subversion in the novel. I wonder if this is a reflection of the theme of feigned submission as subversion in the novel; I am very curious to explore this topic further by revisiting the related sections of the novel and juxtaposing them.



To Kill a Mockingbird


0.12372 jem atticus miss scout maycomb asked finch house maudie front children jem's calpurnia yard radley porch head county sir home

16 0.08258 aunt alexandra mrs uncle jack francis aunty dubose family bed ladies merriweather sat jean cousin landing sister rose afternoon louise

13 0.07478 judge atticus ewell taylor mayella tom didn't gilmer jury robinson court witness suh dill finch asked reverend courtroom chair tate

17 0.06363 dill radley boo dill's place summer play doors simon pole gate breath radley's tire till outa sundays fence radleys harris

45 0.06309 church town sunday calpurnia people wife pulpit reverend street negro congregation sykes cal minister women folks jefferson zeebo study ladies

46 0.04839 tate heck cecil mrs ewell dead costume road bob johnson tim link aunt stage thing alexandra boo ground merriweather reynolds


  The topics that are pertinent to To Kill a Mockingbird heavily center around characters (and their relations to each other) and setting (time and place). It is also fairly easy (for me) to connect topics with particular scenes or sections of the text. For example, without checking the composition, I can tell that Topic 13 draws from sections towards the latter half of the novel with court scenes, and Topic 45 pulls from the scene in which Scout and Jem accompany their black housekeeper, Calpurnia, to service at her church congregation. Outside of placing the when, where, and who, it's difficult to derive themes from the topics in the novel. Running analyses on this text alone, without the other three, produces similar results, although there is a bit more nuance and the addition of other types of words that can provide more variation in thematic substance. In future analyses of this novel, I suspect it might be very helpful to create stopwords to remove all of the proper names and then compare the results.
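The stopword idea mentioned above could start from a hand-picked list of names. The sketch below writes such a list to a plain-text file, one word per line. The name list is a hypothetical starting point drawn from the topics above, and the decision to add possessive forms (e.g., "jem's") is an assumption based on forms like "jem's" and "dill's" appearing as their own terms in the topics.

```python
from pathlib import Path

# Hypothetical starting list, taken from terms in the topics above.
CHARACTER_NAMES = ["Jem", "Atticus", "Scout", "Dill", "Calpurnia", "Maudie", "Radley"]

def stopword_forms(names):
    """Return sorted lowercase names plus their possessive forms, without duplicates."""
    forms = set()
    for name in names:
        base = name.strip().lower()
        if base:
            forms.update({base, base + "'s"})
    return sorted(forms)

def write_stopwords(names, path):
    """Write one stopword per line to `path` and return the words written."""
    words = stopword_forms(names)
    Path(path).write_text("\n".join(words) + "\n", encoding="utf-8")
    return words
```

To my understanding, MALLET's import commands accept a file like this through an `--extra-stopwords` option used alongside `--remove-stopwords`; that is worth confirming against the MALLET documentation for the installed version before re-importing the corpus.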

Conclusion


  I found the topics related to specific texts to be much richer and more enlightening than the few related to all of the texts. Moving forward, I will go beyond merely reflecting upon the topics and pursue actually ascribing themes/concepts to them. However, with this particular corpus, I want to segment the texts by paragraphs, continue experimenting with MALLET's settings, and otherwise fine-tune the MALLET analysis before settling on a topic model to work with. Additionally, I would like to compare different approaches to analyzing the texts, such as running the analysis on a single novel at a time, as opposed to all at once. I decided to analyze them all at once because I am interested in automatically capturing the similarities between them using MALLET. If I were to analyze them individually and then compare the topic models, I would have to make those connections myself. In any case, it will be interesting to compare and contrast the varying results, bearing in mind that no two MALLET "runs" are the same.

Project Files


Corpus Topic Keys: corpus-17_keys.txt
Corpus Topic Composition (.csv): corpus-17_compostion.csv
Corpus Topic Composition (.xlsx): corpus-17_compostion.xlsx
Corpus Topic Terms: corpus-17-topic-state-cleaned.txt

Contact


Tyrica Terry
University of Pittsburgh
PhD Candidate in English
MLIS Student

tyt3@pitt.edu