One would be hard-pressed to find a more explosively controversial subject within literary scholarship than the invocation of the Western Literary Canon, no matter the context. A single mention of its mere existence, let alone its contents, is enough to ignite a firestorm of criticism from all sides, one that usually ends only when one side shouts more loudly or more persistently than the other, and both sides, failing to achieve an equilibrium in which they agree to disagree, retire irritably to their respective positions, ready again to take up arms in the name of their version of the list.
The Western Literary Canon, also known as "The Literary Canon," "The Canon," or "Canonical Literature," or some variation thereof, is widely acknowledged to be a list of texts important to Western literature or culture. This fact, however, is just about where the consensus ends. The most vocal, and diverse, debate that surrounds the canon concerns its purported contents. Which authors belong? Which do not? Who has had their time in the sun, and should be shuffled off the list to make space for other authors and texts? Does an author belong more than once, if he or she wrote more than one important text? What about texts that were translated into English, but originally written in another language? Do they count? If so, should they count equally? Other debate surrounds the criteria that qualify a text for inclusion. Is this a list of the best literature? The most influential? The best-selling or most popular? Does a recent adaptation place a work on the list, or not? How about an older one? Still others debate the uses of the list, once it is compiled. Should this be a guide for lifetime reading? What about a purchasing guide for universities? Perhaps a pedagogical map to literary studies? All, or none, of the above? Others still debate the existence of the canon in its totality. Why do we need a canon? Shouldn't we just read what we enjoy, or what is good? Who, ultimately, is responsible for the contents of the canon, and why can't it simply be a personal choice? Should we have a canon for each genre? Each format? Each decade or century? Each demographic? How many divisions is enough, how many is too many, and who, ultimately, draws those lines?
It is unlikely that the answers to any of these questions will ever be forthcoming, at least not in a format that is in any way satisfactory to those asking them. The reason the canon is such a spark-igniting touchstone stems from its existence as a subjective list with pragmatic implications; pragmatic implications, by the way, that carry significant monetary, pedagogical, and cultural impact. If the list itself were merely a subjective "best of" list, not at all dissimilar to any number of other such lists ("The Best Video Games of All Time," 1000 Places to Visit Before You Die, etc.), then the furor that surrounds it would likely not be quite so fierce. Everyone, after all, is entitled to his or her opinion, and it is much easier to reach an equilibrium of opinion when nothing is necessarily on the line.
This, however, is not the only space the canon occupies. Deservedly or not, the canon is used, primarily through the mechanisms of tradition, to govern the literary reading lists we all know so intimately. Millions of dollars are spent by textbook companies acquiring licenses and publication rights to texts. Millions more are distributed through universities in the form of purchasing lists for Composition I textbooks, a requirement at nearly every liberal studies university, and those lists often contain texts from the canon. Syllabi are written, lesson plans are designed, classroom activities are constructed, and, ultimately, minds, culture, and tradition are shaped by the influence of canonical texts. Because of this rather extreme pragmatic impact, one would think it would behoove us to identify the most critically important texts and, for lack of a better phrase, "get it right." When everything from departmental budgets to student enrollment to the literal future of culture is on the line, it seems highly suspect that the selection of canonical texts is left to the devices of whomever elects to draft a new list. With no institution, academic or governmental, asserting control over the list as such, subjectivity reigns, and the arguments continue unabated.
This is a good place to note that this project is not, in any way, attempting to argue that such an authority should exist. Alan Moore reminds us to ask, "Who watches the Watchmen?" or, in this case, "Who watches the Canon?" Nor, by the way, is this project any attempt to assert a final word, an ultimate authority on the list itself or its contents. I readily recognize that such a position is equal parts fruitless and foolhardy: in addition to angering anyone who might happen to browse this project, it would also amount to a hypocritical position at odds with my own; namely, that the debate that surrounds the canon is not reductive but rather highly generative. My pedagogical philosophy has always revolved around the idea that the free exchange of ideas and concepts, in addition to, of course, rigorous scholarly discussion, research, and discourse, is of the utmost importance. "The text is not always right," I am fond of telling my (often aghast) undergraduate students. "Feel free to disagree, but make sure you tell me why. Find me some proof that the text is wrong, and I'll change my mind." It is to this end, then, that this project was envisioned: not as the "last, final, correct canon," but rather as an analysis of the all-important debate that surrounds it.
The existence of the canon itself is a necessity, though many who hold unique perspectives on literature and literary history might argue otherwise. The pragmatic problem that the existence of the Canon purports to solve (and, I would argue, does an excellent job of solving) is to address the questions of "What's worth analyzing?", "What's worth studying?", and, ultimately, "What's worth reading?" We must ask these questions because, over the course of literary history, we have written far more texts than any one individual could ever read in his or her lifetime. Author and mathematician Randall Munroe notes in an article in his "What If?" web series (titled "Reading Every Book") that "...the total number of books in English probably passed the lifetime reading limit sometime in the late 1500s" (par. 14). Even if we assume his mathematical estimate is accurate (and, as he freely admits, "Getting accurate counts of the number of extant books at different times in history is very hard bordering on impossible" (par. 1)), we nevertheless note that "reading everything" is an absolute impossibility. We do have an estimate, however, courtesy of software engineer Leonid Taycher, a member of the Google Books team, who, in 2010, pinned the number at 129,864,880 unique texts (par. 1) (though over a decade old at this point, this is the best estimate we currently have, and it will be used for some calculations later in the project). Therefore, we group works into many different genres to narrow and organize those texts into manageable portions, starting with perhaps the most important categorization, and certainly the most fundamental: "Is it worth reading?" Texts to which we say "yes" frequently find themselves occupying space in the Canon, while texts for which that answer is "no" usually occupy a lower rung on the literary social order: everything from Harlequin romances to airport novels, for example.
The contents of the Canon, while freely debated, are often added to an individual's list because of personal criteria known only to that individual. Perhaps a text was particularly memorable in their youth. Perhaps it was a family member's favorite, or contains some kind of close personal connection. Most often, however, while building this project, the reasoning behind why a text appeared in the canon was little more than tautology. A text appeared on the list because it was "good," and it was "good" because it appeared on the list. Other, scarcely more useful criteria seemed to center around tradition: namely, that scholars read what they have always read, and what they have always read were texts from the canon (because that's what they were taught). I'm sure you see the Venn diagram here, wherein one sphere is "tautological fallacy" and the other is "vicious cycle," and the intersection between the two seems to be "The Canon." The only key deciding factor in textual (or authorial) inclusion on any particular list seemed to be something of a consensus: if other lists included it, more lists were likely to include it. This was a reinforcement of the tautological phenomenon, surely, but it also served to remind me that the Canon itself, by its very nature, was little more than a popularity contest. As rates of adoption increased, so did word of mouth, and texts began to appear more readily and more commonly, which only served to further raise their profile, and so on.
This led me to a rather simple, but perhaps rarely investigated, inquiry, which ultimately became the basis for this project:
"Since the Canon is ultimately a popularity contest," I asked, "what's most popular?"
With this research question in mind, I set about designing my project. As a digital humanities scholar and professor by trade (but also one trained in literary criticism), I quickly surmised that the only way to find an accurate picture of the debate that surrounded the criteria of "most popular" would be to gather heaps of research data. I couldn't, for example, rely on a single source. After all, if the objective of the project was to take a "snapshot" of the debate, then I couldn't very well rely on a single voice within that debate. Further, I couldn't rely on something like sales data alone, or download statistics from free websites, or syllabus data from other DH projects like opensyllabus.org (though these sources eventually became just a few of the many sources I queried for the finished project). I knew that I would need an exhaustive scraping of data to possibly accommodate a realistic picture of the debate. Further, I knew that the debate, ever-shifting as it is, was likely to change within months. Therefore, I needed my data quickly. With these criteria in mind, I began to harvest information.
The techniques I used were basic, at first. I wrote a custom scraper tool in Python that would scour manually entered literary websites (like goodreads.com, for example) and search for texts. This quickly proved both unnecessary and impossible. In addition to retrieving results for every text, rather than the "canonical" ones I was searching for, the parameters by which individuals defined these texts were just as nebulous, making it next to impossible to program guidelines for my software with any kind of reliability. Frustrated, I scrapped that software and began anew, this time manually. I knew that the subjectivity inherent in building various canonical lists would likely be necessary to analyze them, and so I began to Google. And Google. And Google. I searched university reading lists, knowing that inclusion there meant a resounding "Yes" in answer to "What's worth teaching?" I looked at aggregate reading sites (like the aforementioned goodreads.com). I culled data from the publication lists of textbook publishers, anthologizers, and reprinters alike (Penguin Classics, Norton, etc.). I pulled data from other DH projects, and I even began to grab statistics from various individual blog posts and published texts alike in search of anyone's and everyone's idea of the Canon. In the end, I procured over two hundred individual lists, many of which included data from millions of users. It is impossible to give a final total of sources queried, but some, like opensyllabus.org, or the "most downloaded" list from Project Gutenberg, readily counted in the millions and hundreds of thousands, respectively. (Note that because of the extremely varied number of users per source, as well as an unknown number of users giving input on each source, the ultimate data was scored on a log scale, designed to demonstrate not the specific number of recommendations, but rather the relationship amongst qualifying texts.)
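For the curious, that first, ultimately abandoned approach looked something like the minimal sketch below (standard library only). The markup and the "book-title" class name here are hypothetical placeholders, not taken from any real site; in practice, every source site needed its own parsing rules, which is part of why this approach was abandoned.

```python
# Sketch of the kind of title scraper first attempted (illustrative only).
from html.parser import HTMLParser

class TitleListParser(HTMLParser):
    """Collect the text inside <li class="book-title"> elements."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "book-title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title and data.strip():
            self.titles.append(data.strip())

# A hypothetical fragment of a fetched reading-list page:
page = '<ul><li class="book-title">Hamlet</li><li class="book-title">Ulysses</li></ul>'
parser = TitleListParser()
parser.feed(page)
# parser.titles == ["Hamlet", "Ulysses"]
```

The hard part, of course, was never the parsing itself but deciding which of the scraped titles counted as "canonical" in the first place, a judgment no class name encodes.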
Now that I had my data, the next step was to clean it. Easily the most time-consuming step in the process, this required me to examine each individual text and make corrections as necessary, categorizing and sorting information accordingly. There were myriad unforeseen problems that I began to notice. For example, I wanted to group texts on the list into fiction and nonfiction and see which was more prevalent. In which category, then, should I place the King James Bible? That answer, of course, depends on your theological viewpoint. Is that same text verse or prose? What about fictional autobiographies that stem from an author's personal experiences? One would be hard-pressed, for example, to call Frederick Douglass anything but nonfiction, but what about Solzhenitsyn, or Philip K. Dick, or Elie Wiesel? What about texts that were originally published serially (like Dickens's); in what year should I place their publication? What about ancient texts, for which we have only a general idea of their publication date (if you can even call it publication)? When was The Epic of Gilgamesh published, for example? What about when one list included part of a text, but not the entire work? (This seemed to be a popular technique for Dante, Whitman, and Carroll: "Inferno" was often included sans the remainder of The Divine Comedy, Whitman's Leaves of Grass was often reduced only to "Song of Myself," and Carroll saw success with both Alice in Wonderland and "Jabberwocky," his poem from the sequel Through the Looking-Glass.) Should The Divine Comedy, then, get a rec each time only part of it was mentioned? Should I create an additional entry for only part of one text?
As I worked, these problems became clearer and clearer. I had to make editorial decisions on many fronts, and although I eventually created a rough system that I believe was effective in organizing my texts, I will freely admit that much of the data I was interested in sorting by was either impossible to find or subjective in nature. In the end, I believe that the data I have uncovered and recorded here is accurate, though I will be the first to say that it is, at best, an accurate estimate. This is far more empirical, however, than anything we have ever had before.
My next undertaking was to visualize the data itself. I knew that a project like this one would be far less effective as a mere essay; I wanted readers to visualize and recognize the data along with me. I began, then, to graph. Finding the automated chart and graphing systems in Numbers and Excel (where my data spreadsheets were cleaned and sorted) inadequate, I used Adobe Illustrator, alongside quite a bit of math, to present the information in a more accurate and easily digestible way. I made a chart for every parameter I could think of, beginning with the ones I had cleaned for and ultimately moving toward the ones which interested me the most. These charts are presented herein, within the body of the website. You'll note, in some cases, chart errors (like percentages that don't sum to precisely 100%, for example). These are almost always the result of rounding, though some are the result of cleaning ambiguity. How do you choose a nationality for an author, for example, who hails from an extinct kingdom? Should I use modern national boundaries or ancient ones? What about authors who claim more than one nationality? Are they the nationality of their birth, or of their naturalization? ...and so on.
I couldn't, however, call the visualization complete until it was unified on a platform through which I could share my results with everyone. To that end, I wrote the HTML and CSS (and designed the highlight graphics) of this very website. I decided that a website was the medium of choice for presenting such multimodal data, and, dissatisfied with the level of control offered by most web templates, created this one from scratch.
Within this project, which I have titled (canon) aggregate, I have attempted to take a snapshot of the debate that surrounds the Western Literary Canon and what should be included amongst its hallowed (and influential) ranks. This Aggregate Canon (AC) does not attempt to be an authoritative final word on textual inclusion or quality, but rather to provide empirical data concerning the most valued texts within scholarship and casual readership today. Ultimately, I hope it allows us to find forgotten and lost authors that should perhaps be represented in the canon, or to give due credit to texts that belong...no matter which ones you personally believe those are.
We begin our data and analysis with a summary of the highest-scoring texts on the AC. When I began this project, I confess that my first curiosity was to discover who resided at the top of the proverbial literary mountain. This chapter will answer that question, as well as discuss the highest-scoring finishers on the AC in general. Further chapters go into more detail about statistical findings like authorship, demographics, genre, and format. Before we get to the results, however, we must briefly visit the methodology used to rank and score our various entries.
The AC is arranged according to the number of "recs," or recommendations, a text received. Each rec corresponds to one sourced list on which the text appeared. Some sourced canonical lists were ranked (for example, "The Top 100 Books of All Time" was ranked 100-1) and some were not, and were instead merely large groups of works. If a source list was ranked, that ranking was discarded for the purposes of establishing the AC; a text's presence, at any position on any list, is sufficient to earn a rec, and each and every appearance matters. In many cases, if an author was named, but not a specific text, each of that author's texts already on the AC received an additional rec (a common occurrence with Shakespeare). Recs are sorted into two types: University (UR) and Generic (GR). University recs were culled from university reading lists, university websites, and syllabi. Generic recs come from lists created by hobbyist and enthusiast readers, or were found in other published material (like books and blogs, as well as the publication lists of reprinters and anthologizers); in other words, any non-university-linked source. These scores were added together to create a combined rec score (CR), and it is according to this number that the bulk of the AC is ranked. Tied CR scores were left intentionally unresolved, and therefore any text with the same CR may be considered to occupy the same position as another. This is not especially a problem at the highest levels, but near the bottom of the AC, in some cases, over a hundred texts received a single GR or UR apiece. All of these texts may be considered "of equal value" to the AC, despite the ordering system arranging them into a hierarchy.
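The scoring scheme above can be sketched in a few lines of Python. The function names are mine, and the text titles and counts below are hypothetical placeholders, not figures from the AC itself:

```python
from collections import defaultdict

def combined_recs(university_recs, generic_recs):
    """Sum University (UR) and Generic (GR) recs into a CR score per text."""
    cr = defaultdict(int)
    for recs in (university_recs, generic_recs):
        for text, count in recs.items():
            cr[text] += count
    return dict(cr)

def credit_author_mention(cr, author_texts):
    """A source naming only an author grants one rec to each of that
    author's texts already on the AC (a common occurrence with Shakespeare)."""
    for text in author_texts:
        cr[text] = cr.get(text, 0) + 1

ur = {"Hamlet": 3, "Ulysses": 2}       # hypothetical University recs
gr = {"Hamlet": 2, "Dracula": 1}       # hypothetical Generic recs
cr = combined_recs(ur, gr)
credit_author_mention(cr, ["Hamlet"])  # a list that named only "Shakespeare"

# Rank by CR alone; tied CR scores are intentionally left unresolved,
# so equal-CR texts occupy, in effect, the same position.
ranked = sorted(cr.items(), key=lambda kv: -kv[1])
```

Note that the sort imposes an arbitrary order on ties; as stated above, any texts sharing a CR should be read as equivalent in rank.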
As the full AC includes 532 individual works, which may be arranged and grouped according to any number of criteria, I consider a list of the top 50 texts on the canon something of a "Mt. Rushmore" of literature: these texts were well represented enough to warrant near-universal inclusion. It should be noted, however, that even amongst these top texts there is wide variation (text 50 scored 26 CR, while text 1 scored 60 CR), meaning our number one overall text was over twice as popular as number fifty. After fifty, our scores drop more slowly, until we reach the vast spaces in the 300 and 400 rankings, where each text received 3, 2, or only 1 CR. In other words, we may better rank the AC into tiers, like so:
Perhaps the most amazing statistic represented by this graph is the sheer exclusivity of the canon itself. As previously stated, we have a (relatively dated, circa 2010) estimate of published books at nearly 130 million (Taycher). According to publishing statistics given by Forbes, nearly 2.2 million books are published every year in the US alone (circa 2016). This means that if we were to extrapolate that 2.2 million figure (and assume the worldwide publishing industry is even 150% of that found in the US), and then add that publishing figure to our 130M baseline, then by 2022 there are likely closer to 150 million unique texts. Because of the speculative nature of this estimate, however, this study uses Taycher's ~130M figure as our probable baseline. With only 532 texts on the AC receiving any votes whatsoever, this means that just 532 out of 130,000,000 texts are exalted enough to reach any type of "canonical" status. This puts the proportion of books in the canon at 532 in 130,000,000, or roughly 4.09e-6: about 1 in 244,000. To put that number into perspective, if you were to publish a book tomorrow, the odds of it ever entering the canon would be more than ten times longer than your lifetime odds of being struck by lightning (roughly 1 in 15,000), though still far better than the odds of winning the Powerball jackpot (about 1 in 292 million).
Because of this extreme exclusivity, any text that appears on the AC in any manner, no matter how small its representation, should be accorded its just due. Note that our 130M figure includes significantly more nonfiction texts than fiction texts, a trend which is completely reversed on the AC (more information on format may be found in Chapter Four). As you can see represented in our tier chart, texts that receive multiple recs are orders of magnitude rarer than those that receive only a few. By the time we reach Tier A, or "Universal Canonical" status, the figure falls to only 71 texts. Reaching this status with a text is a scant 5.46e-7 chance, or about seven times slimmer than merely reaching the canon at all.
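The arithmetic behind these two probabilities is simple enough to verify directly, using the figures already given (532 texts on the AC, 71 in Tier A, and Taycher's ~130M baseline):

```python
# Exclusivity arithmetic for the AC, using the figures cited above.
AC_TEXTS = 532          # texts receiving any rec at all
TIER_A = 71             # "Universal Canonical" texts
BOOKS = 130_000_000     # Taycher's 2010 estimate, rounded

p_canon = AC_TEXTS / BOOKS    # ~4.09e-6, about 1 in 244,000
p_tier_a = TIER_A / BOOKS     # ~5.46e-7
ratio = p_canon / p_tier_a    # ~7.5: Tier A is roughly 7x more exclusive
```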
When we isolate our evaluation to only this top tier (Tier A, "Universally Canonical"), some interesting patterns of data emerge. Each of these statistics is explored more fully in its relevant chapter, though I'm sure you're likely interested, as I was, in which texts are the "best of the best." It's no surprise, for example, that Shakespeare appears all over the top texts (and the AC in total; more on him later). He boasts ten separate texts within Tier A (Hamlet, Romeo and Juliet, The Tempest, Macbeth, A Midsummer Night's Dream, Othello, King Lear, The Merchant of Venice, Twelfth Night, and his Sonnets, with Julius Caesar just missing the cut by a single CR), besting Jane Austen (3 texts in Tier A: Sense and Sensibility, Pride and Prejudice, and Emma), James Joyce (2 texts in Tier A: Ulysses and A Portrait of the Artist as a Young Man), and George Orwell (2 texts in Tier A: Animal Farm and 1984). More information on Tier A may be found in the chart below, and further on in this chapter you'll find the top 90 texts on the AC visualized as a periodic table.
Above, you'll find the highest-scoring texts on the AC visualized according to a modified and customized periodic table. This chart was the most concise visualization I could build, with each cell containing multitudes of data from each AC entry at the top of the list. Each "element" is a text title. In the upper left-hand corner is the author's name. The upper right is the year of publication. The bottom-left corner indicates the text's position on the total chart (all genres, types, and formats included). The number on the bottom right indicates the CR a text received. Additionally, the texts are colored according to their format, the key to which may be found at the bottom of the chart.
It is important to note that although this graph is ordered as a ranking system, there are numerous ties and little deviation between entrants. Only 3 CR, for example, separates The Tempest (10 overall, 44CR) from To Kill a Mockingbird (18 overall, 41CR). This means, for example, that there is almost as much deviation between numbers one and two on the list (2CR) as between numbers 10 and 18. The further down the AC we go, the more pronounced this clustering becomes. As the AC in total houses 532 texts, almost half of them scored only one or two CR, meaning that, for all intents and purposes, there is no difference in popularity between a text at position 300 and one at position 500 (the majority of Tier C, for example). Any text on this graph (and several that just missed the cut) may be considered highly popular for purposes of this project, and no outsized attention should be given to minute rankings amongst texts, especially amongst those with tied CR scores. Note that this data did not break ties in any fashion, so one may consider a text like Nabokov's Lolita (70 overall, 22CR) to occupy exactly the same position as Eliot's Middlemarch (66 overall, also 22CR).
As you can see demonstrated by the number of blue squares on this chart, the novel and the novella reign as the supreme formats at the top of the canon. This data is elaborated on further in Chapter Four, though even at the top of the list our preference for novels is well established. Some texts, like those of Chaucer, Homer, Voltaire, Thoreau, and Plato, could also be reasonably classified as novels given current publishing trends, further extending the format's dominance. This result was surprising to me, considering the form has been around in its modern incarnation for only somewhere between 200 and 400 years (beginning, debatably, with either Don Quixote, considered by many to be the original novel, or Robinson Crusoe, considered by many to be the first English-language novel). The dominance of the format makes Shakespeare's repeated appearances on the AC even more impressive, considering the Bard wrote precisely zero works in the most popular format of all time. Don Quixote, for its part, appears at number 47 (26CR), while Robinson Crusoe clocks in at number 61 (23CR).
At number 20, Jonathan Swift's Gulliver's Travels (38CR) is the highest-scoring satire, though far from the only one to appear, either from Swift or from other satirists. Harper Lee's To Kill a Mockingbird (18 overall, 41CR) was easily the highest-scoring text not to score a single UR, collecting every single one of its recs from generic sources. This statistic astounded me, until I realized that the text's reading level was likely closer to that preferred by secondary education, and as such it was not likely to appear on numerous university reading lists, despite being recommended by so many. The text had one of the highest GR totals on the entire AC (while tied with many texts for the lowest UR). Shelley's Frankenstein (13 overall, 42CR) was, according to opensyllabus.org, the most assigned text on all gathered syllabi. Additionally, it was the most downloaded text on Project Gutenberg's US affiliate. Despite this incredible showing, however, it didn't quite make the top ten, due to a rather poor GR showing. Other texts in the top twenty (Twain's The Adventures of Huckleberry Finn (14 overall, 42CR), Salinger's The Catcher in the Rye (15 overall, 42CR), Melville's Moby-Dick (17 overall, 41CR), and Hawthorne's The Scarlet Letter (12 overall, 43CR)) encountered the same problem: popular amongst universities, less so amongst generic lists.
The highest-scoring children's text is, unsurprisingly, Carroll's Alice's Adventures in Wonderland (38 overall, 29CR), making it more popular than the Bible, but less so than Stoker's Dracula (33 overall, 30CR). In addition to Shakespeare, the authors with the most texts on the AC in total were Aristotle (8) and Charles Dickens (8). Dickens's highest-rated text, A Tale of Two Cities (43 overall, 27CR), makes an appearance here, as does his second-highest-rated text, David Copperfield (88 overall, 18CR). Aristotle, on the other hand, was an absolute enigma. Despite being tied for the second-most texts on the entire AC, not a single one of his publications reached more than 8 CR (highest-scoring text: Poetics, 207 overall, 8CR). Diverse, but unpopular? Unpopular, but forced? Further discussion of Aristotle's strange metrics may be found in Chapter Three.
The Shakespeare Problem
As you might have guessed from this chart, Shakespeare is going to be a problem: a massive statistical outlier who dwarfs all other authors on this list. It was solely due to his presence and dominance that I briefly considered scoring the data logarithmically, as he sits so far above other authors it is astounding. Ultimately, however, despite his great lead, I elected to include his works as normal, and to discuss him here for the crushing presence that he is. Statistically speaking, his texts absolutely dominate the AC in every conceivable way. In addition to having the most texts on the AC (29, with the next closest, a two-way tie between Aristotle and Dickens, appearing with only 8 apiece), Shakespeare owns the top seven dramas, and accounts for three of the top ten (30%), five of the top 20 (25%), and ten of the top one hundred (10%). These numbers are absolutely outrageous. To put this into perspective, according to these figures, ten percent of the best, most influential, most important texts in the history of the English language came from a single author, and within a thirty-year period. His influence on this list cannot be overstated, and although Austen reigns supreme with the number one text, Shakespeare is the undisputed (and likely unsurprising) king of the AC. If even a few of his 29 entries (nearly six times Austen's five) were removed, the resulting un-splitting of the vote would likely produce a top 10 of Shakespeare only, or at least something similar.
In addition to Shakespeare's texts being incredibly popular at the top of the list, as seen here, further down the list (around position 150, with ~12CR) we encounter an altogether similar problem. A significant number of the canonical lists used as sources (at least a dozen) named "Shakespeare" only as an author, rather than any specific text. Because of my methodology, which granted a GR or UR to each of an author's works if the author was mentioned only by name, this resulted in a veritable data "block" of Shakespearean texts, including most of his lesser-known plays (Titus Andronicus, Timon of Athens, Richard III, or his lowest-scoring work, The Two Gentlemen of Verona, for example), scoring quite highly.
Historically speaking, Shakespeare's dominance is fascinating, as it seems to "kick off" the modern era of English literature. Speculation may abound as to whether Shakespeare was truly groundbreaking, or whether he simply happened to be the best writer in a time when interest in the art was growing faster than ever before, but it is interesting to note that almost none of Shakespeare's contemporaries have more than a single text or two represented on the AC. This includes luminaries like Edmund Spenser (one entry, The Faerie Queene, 124 overall, 14CR); Ben Jonson (one entry, The Alchemist, 372 overall, 3CR); and Philip Sidney (no entries). Shakespeare's texts themselves seemed to sort into a pattern of their own: his tragedies consistently rose to the top, followed by his comedies, and finally his histories, with a few exceptions. Hamlet reigns supreme as Shakespeare's highest-scoring text (7 overall, 50CR), with A Midsummer Night's Dream the highest-scoring comedy (21 overall, 38CR), and Henry IV the highest-scoring history (141 overall, 13CR). Shakespeare's only non-drama entry, his collected sonnets, finished at 56 overall (23CR). Additionally, we can see the distaste for what is classically considered Shakespeare's least-liked work, Titus Andronicus, which, despite being a Shakespearean tragedy (the ever-popular mode), is able to defeat only three other works (The Comedy of Errors, Richard III, and The Two Gentlemen of Verona, with 12CR apiece and firmly entrenched at the bottom of Tier B). It is worth noting that no Shakespearean text sits lower than Tier B on the AC in total.
Further information on date of publication may be found in Chapter Two, but one may be tempted to guess that Shakespeare's dominance is due to a vast lack of other texts from his era, a problem similar to what sees Dante, Chaucer, and, to a lesser extent, Sophocles, Aeschylus, Euripides, and Aristophanes rank so highly. As noted previously, however, many of Shakespeare's contemporaries saw little traction on the AC (if any at all), and Shakespeare's works outstrip even those mentioned here by a (very) wide margin. Know that for the rest of this analysis, Shakespeare will always need to be taken into account as the extreme outlier that he is, and many provided calculations will set aside his works to get a better picture of the rest of the AC, with an acknowledgment that his drama reigns supreme.
Finally, note that Shakespeare's texts were dated according to publication and performance. Many of his texts were first published posthumously in the First Folio (1623), though many of his works had relative publication dates listed as "date of first performance." These dates were used where available, meaning Shakespeare's era, which spanned the end of one century and the beginning of the next, makes him the most popular author in two separate centuries. Of all works on the AC published before the 17th century, Shakespeare's alone account for over half, an absolutely astounding figure. If separate canons were eventually called for, with divisions made for genre, time period, format, and so on, Shakespeare would command a canon of his own, one that outstripped all others in terms of popularity, importance, and (perhaps) even quality. More information on authors in general, and Shakespeare in particular, may be found in Chapter Three.
Conclusion
It is important, at this point, to note once again that a text's inclusion or rank on the AC does not necessarily constitute importance or even quality. People read what they want to read, and this is exactly as it should be. No authoritative source is responsible for informing you, scholars and hobbyists alike, as to what is critically important and what is not. The AC's goal, as previously stated, is to provide a "snapshot of the conversation," so that educators, textbook manufacturers, and other individuals responsible for describing canonical literature can select traditional or untraditional texts; so that something which has previously been hidden may shine; so that some emphasis may be added to a text or author that has been recognized (but otherwise glossed over); and so that readers may identify texts they might enjoy that are similar to texts already engaged at a high level of scholarship. For publication purposes, this project can help scholars identify the ubiquitous "holes in scholarship" that permit us to examine texts, authors, genres, and time periods otherwise ignored. At the end of the day, however, as I said in the introduction (and will repeat several times throughout the body of this project), the Canon is nothing more than a popularity contest, and everyone's answer to "what's the best?" will be subjectively different...and that's a good thing.