Some Notes on Book Hierarchies and Genre Classification

At the moment, Katie and I are working our way through one of the major normalisation tasks we need to complete before we can release the beta version of the database to our testers.  This involves going through a (quite intimidating) spreadsheet detailing the 13,000 editions currently in our system and creating works-level records and assigning genres.

A works-level record is a means of grouping together all versions of a text that might be considered essentially equivalent.  Our system uses a hierarchical structure to organise books, consisting of four levels: works, editions, holdings and volumes:

  • A work is a text considered as a conceptual object or whole (such as Adam Smith’s Inquiry into the Nature and Causes of the Wealth of Nations).
  • An edition is a particular physical printing of a work (such as the 1776 quarto edition of Adam Smith’s An Inquiry into the Nature and Causes of the Wealth of Nations printed in London for William Strahan, Thomas Cadell and William Creech).
  • A holding is a copy of an edition (or in some cases multiple editions, as when pamphlets are bound together) held at one of our libraries (such as the copy of the 1776 quarto edition of Adam Smith’s Wealth of Nations held by the Advocates Library).
  • A volume is a part of a holding (such as volume 2 of the Advocates Library’s copy of the 1776 Wealth of Nations).
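For readers who think in data-model terms, the four-level hierarchy can be sketched as a handful of linked record types. This is only an illustrative sketch, not the project's actual schema: the class names, the `imprint` field and the `holdings_of_work` helper are all hypothetical. It shows the two details the list above calls out, namely that a holding may contain more than one edition and that volumes are parts of a holding, and how a query can roll up from copies to the work level.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare records by identity, not by field values
class Work:
    """A text considered as a conceptual whole (hypothetical model)."""
    title: str


@dataclass(eq=False)
class Edition:
    """A particular printing of a work."""
    work: Work
    imprint: str  # hypothetical free-text imprint description


@dataclass(eq=False)
class Volume:
    """A physical part of a holding, e.g. volume 2 of a multi-volume set."""
    number: int


@dataclass(eq=False)
class Holding:
    """A library's copy; may bind several editions together (e.g. pamphlets)."""
    library: str
    editions: list[Edition]
    volumes: list[Volume] = field(default_factory=list)


def holdings_of_work(work: Work, holdings: list[Holding]) -> list[Holding]:
    """Roll up to the work level: every holding containing any edition of `work`."""
    return [h for h in holdings if any(e.work is work for e in h.editions)]


# Illustrative records based on the examples in the list above
wealth = Work("An Inquiry into the Nature and Causes of the Wealth of Nations")
quarto_1776 = Edition(wealth, "London, for William Strahan, Thomas Cadell and William Creech, 1776, 4to")
advocates_copy = Holding("Advocates Library", [quarto_1776], [Volume(1), Volume(2)])

print(len(holdings_of_work(wealth, [advocates_copy])))  # 1
```

Keeping the link from each edition to its work (rather than duplicating the title on every edition) is what makes the "umbrella" queries described below cheap: moving between all copies of a text and one particular printing is a matter of following references up or down the hierarchy.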

It would have been possible to omit works-level records from our system and only give editions, but that would have made certain kinds of enquiry a lot more difficult.  By adding works-level records as umbrellas, we make it easy for people, for example, to move between viewing all borrowings of Homer’s Odyssey and looking specifically at the edition prepared by Alexander Pope.  For different kinds of enquiry, different levels of our books hierarchy will be helpful – by providing this level of nuance, we both make the database more flexible and acknowledge that while some users will be thinking mainly in textual terms, for others, the precise editions or copies that our historical readers borrowed will be of considerable interest.

In assigning genres to works, we are employing a taxonomy we have been slowly refining.  Kit, who has led on this aspect of the project, discusses the early stages here.  Our genre categories at present are as follows:

  • Sermons
  • Theology
  • Law
  • Medicine
  • Natural Philosophy
  • Philosophy & Morality
  • Mathematics
  • History
  • Politics, Society & Political Economy
  • Lives
  • Education
  • Travel
  • Drama
  • Poetry
  • Fiction
  • Belles Lettres
  • Practical Arts/Useful Knowledge
  • Fine Arts
  • Reference Works
  • Periodicals
  • Miscellaneous/Other

The point of assigning genres is to allow database users both to filter and to trace larger-scale trends in the borrowings in the system, considering, for example, whether religious reading becomes less prevalent over the course of the period our records cover.  We know that some users will be particularly interested in particular subsets of the data, and the genre system provides what we hope will be a helpful means of focusing in.

When deciding how to describe genre, we initially tried out a two-level system, but this made it too difficult to ensure consistency.  We therefore settled on a relatively straightforward system designed to mediate between eighteenth-century understandings of texts and categories that modern users will find fairly self-explanatory (although we will be providing glosses for certain categories, like Fine Arts and Belles Lettres, and clarifying what all the categories contain in the explanatory text we’ll be preparing for the site).  Another aim was to select categories that would be of a useful size for analysis (this is one reason why we divided Theology and Sermons).

One thing we decided against was including temporal designations in our genre categorisation, so Greek and Roman drama is assigned to the Drama category, rather than a separate Classics category.  This is in part to avoid duplicating the filtering potential of our languages, publication dates and author dates categories, which can all be used to isolate works by Greek and Latin authors.  It also reflects an issue that we found with the more complex version of the genre taxonomy, which seemed to be trying to distinguish between too many different orders of thing.  By focusing the genre categorisation on the nature of a text’s content, rather than its age or source culture, we could be a lot more consistent in our assignments.  Of course, this does not preclude other, more specialised divisions of the data in the future (either through use of other categories in our system or through downloading the data and adding further tags).

The point of the genre taxonomy is not to provide an exhaustive commentary on a work’s contents – where possible, we are assigning only one genre, although we have the option to assign two or three where a work is genuinely fairly evenly divided between several categories.  We are generally assigning the first thing that would come to mind for most readers – while fully to describe Laurence Sterne’s The Life and Opinions of Tristram Shandy might mean assigning the majority of our genres, we suspect most people would think of it first as Fiction, so it seems most useful to assign that category alone, rather than aiming for a comprehensive description of everything a work might be considered to be.  It is sometimes difficult to assign genres to the collected works of writers who wrote voluminously, but here we focus on the most prevalent forms in which they wrote, rather than trying to account for every genre a series of collected works might contain.  For certain kinds of writing, we have reflected eighteenth-century understandings, assigning works like William Derham’s Astro-Theology: or, a Demonstration of the Being and Attributes of God, from a Survey of the Heavens to both the Natural Philosophy and Theology categories.  In practice, one of the trickiest distinctions I have found is between Lives and History, particularly where monarchs are concerned – the question of whether the life of a king is a biography or necessarily historical is a complicated one.  However, in cases like this, we are able to survey the contents of works to help us decide, and we can use the flexibility of our tagging system to apply both genres where that seems most appropriate.
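The tagging rule described above (one genre where possible, at most three, drawn from the fixed list of categories) is simple enough to express as a validation check. The sketch below is hypothetical, not the project's code: the `Genre` enum member names, `MAX_GENRES`, `tag_work` and `works_in` are invented for illustration, and only the category labels and the one-to-three limit come from this post.

```python
from enum import Enum


class Genre(Enum):
    """The current genre categories (member names are hypothetical)."""
    SERMONS = "Sermons"
    THEOLOGY = "Theology"
    LAW = "Law"
    MEDICINE = "Medicine"
    NATURAL_PHILOSOPHY = "Natural Philosophy"
    PHILOSOPHY_MORALITY = "Philosophy & Morality"
    MATHEMATICS = "Mathematics"
    HISTORY = "History"
    POLITICS_SOCIETY_ECONOMY = "Politics, Society & Political Economy"
    LIVES = "Lives"
    EDUCATION = "Education"
    TRAVEL = "Travel"
    DRAMA = "Drama"
    POETRY = "Poetry"
    FICTION = "Fiction"
    BELLES_LETTRES = "Belles Lettres"
    PRACTICAL_ARTS = "Practical Arts/Useful Knowledge"
    FINE_ARTS = "Fine Arts"
    REFERENCE_WORKS = "Reference Works"
    PERIODICALS = "Periodicals"
    MISCELLANEOUS = "Miscellaneous/Other"


MAX_GENRES = 3  # one genre preferred; two or three only for evenly divided works


def tag_work(*genres: Genre) -> frozenset[Genre]:
    """Return a work's genre tags, rejecting zero or more than MAX_GENRES distinct genres."""
    tags = frozenset(genres)
    if not 1 <= len(tags) <= MAX_GENRES:
        raise ValueError(f"expected 1 to {MAX_GENRES} distinct genres, got {len(tags)}")
    return tags


def works_in(genre: Genre, catalogue: dict[str, frozenset[Genre]]) -> list[str]:
    """Filter the catalogue to works tagged with `genre`, sorted by title."""
    return sorted(title for title, tags in catalogue.items() if genre in tags)


catalogue = {
    "Tristram Shandy": tag_work(Genre.FICTION),
    "Astro-Theology": tag_work(Genre.NATURAL_PHILOSOPHY, Genre.THEOLOGY),
}

print(works_in(Genre.THEOLOGY, catalogue))  # ['Astro-Theology']
```

Because Astro-Theology carries both tags, it appears whether a user filters on Theology or on Natural Philosophy, which is the practical payoff of allowing a small number of multiple assignments rather than forcing a single choice.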

On the whole, the genre taxonomy seems to be holding up well so far, with around 35% of editions classified.  Mark Towsey, who has been using the same taxonomy for the C18th Libraries database, has also generally found the system robust.  Putting theory into practice has thrown up some issues to consider – for example, we’re currently working out what to do with certain kinds of works on languages for which neither the Education nor the Reference Works heading seems accurate.  However, for the most part, we’re making good progress.  This kind of exercise will always be hard to complete in a manner acceptable to everyone, but we hope that our taxonomy will provide users with another useful way into the data and will prompt discussions that help us refine how we understand both eighteenth-century disciplinarity and modern expectations.