Introduction
It is hard to imagine a more iconic figure in the history of science than Charles Darwin. He was globally famous in his own lifetime, and the image of the old man with the large beard has since become synonymous with scientific progress and is almost universally recognisable. As a result of his prominence, Darwin’s archive was preserved carefully by his family and later deposited in Cambridge University Library (CUL) to make it available to researchers. His significance as a figure in the history of science, combined with a substantial and well-described archive, has led to the creation of numerous editions of his unpublished papers, and recently to multiple websites presenting digital versions of his writings. This article will describe the nature of the different digital editions presenting the work of Charles Darwin and the differences and connections between them. It will focus particularly on the resources developed by the holding institution, CUL, but will also touch on other platforms presenting material in the same collection.
Darwin and geology
As a scientific figure Darwin hardly needs introduction, but as this is a journal devoted to geology it is worth pointing out that although he is now primarily known for his theory of natural selection in plants and animals, including humans, as a young man Darwin was intensely interested in geology. As a student at Cambridge, he studied under Adam Sedgwick both in the classroom and on a field trip to North Wales (Secord, 1991). On the Beagle voyage between 1831 and 1836, he spent about two thirds of his time on shore, where geology was his primary pursuit (Herbert, 2005). From the Andes to the coral reefs of the Pacific, his notes on geology grew twice as fast as those on zoology. In London, he delivered a series of papers at the Geological Society of London before becoming its secretary in 1838, the most significant scientific obligation Darwin would ever accept (Secord, 1991). In the same year, he took up the puzzle which would be one of his most serious failures, trying to explain the mysterious parallel roads of Glen Roy in Scotland (Rudwick, 2017). However, as his health deteriorated, he soon lacked the stamina for demanding field work, and in the end he turned increasingly to subjects he could observe in the garden of his house in Kent.
His time focussing on earth sciences unquestionably influenced him, though. It is difficult to overstate the influence of Charles Lyell’s work on Darwin’s thinking; Lyell’s notions of gradual change over long periods of time, together with the Scotsman’s style of interpretation based on observable causes, are fundamentally important to Darwin’s work on evolution and natural selection (Secord, 1991). Of his theory of coral reef formation, Darwin later wrote, “No other work of mine was begun in so deductive a spirit as this; for the whole theory was thought out on the west coast of S. America before I had seen a true coral reef. I had therefore only to verify and extend my views by a careful examination of living reefs” (C. Darwin, 1958). His geological publications were some of his earliest works and were instrumental in creating the reputation he later banked on to solicit help and observations from people in the scientific establishment. Darwin’s interest in geology is a lesser-known but vital part of his study, and it is best illustrated by tracking his activities across the various platforms now available to the researcher.
The Darwin Archive
The Darwin Archive at Cambridge University Library is at the centre of all major digital resources relating to the work of Charles Darwin. Although Darwin’s children began publishing selections of his manuscripts as early as 1887 with an edition of letters by Francis Darwin, who had worked as his father’s assistant for several years (F. Darwin, 1887), it was not until much later that the original papers became available to the public. The core of the collection consists of a gift to Cambridge University Library of papers from the Darwin family, funded by the Pilgrim Trust, in 1942, although a full listing was not published until 1960 (Burkhardt & Smith, 1994, p. 2). These include the working notes from Darwin’s time on the Beagle forwards, a small number of draft manuscript pages of his published works (mainly kept because they had been used as scrap paper), tens of thousands of letters exchanged with Darwin himself and various members of the family, most of his scientific book collection, and several thousand offprint articles, many now unique and annotated to show significant use by Darwin. Some material, such as non-scientific manuscripts and unannotated books, was sent to Down House, Darwin’s home in Kent, which was already functioning as a museum and is now run by English Heritage. Many of the papers given to Cambridge were put together by his son Francis, further curated by Francis’ son Bernard, and then organised for the library by later descendants such as Lady Nora Barlow, who also published an edition of Darwin’s Autobiography (C. Darwin, 1958; Kohn & Montgomery, 1994). Another large collection of the papers was purchased on the death of Sir Robin Darwin in 1974 (Barrett et al., 1987), and boxes of manuscripts continue to come to CUL. Currently the collection contains 60 linear metres of manuscript material, more than 700 printed books from Darwin’s library, and almost 5,600 article offprints from his working collection (Fig. 1).
It was also in 1974 that the first of the large Darwin publication projects began, the Darwin Correspondence Project (DCP), leading to what some scholars call the ‘Darwin Industry’ (Browne, 2022). This ambitious project led the way for the detailed transcription and description of other portions of the archive, leading to a trio of Darwin-related online platforms. The DCP published the complete letters to and from Charles Darwin. The non-letter manuscripts, such as notebooks and drafts, were published by the Darwin Manuscripts Project (DMP), run out of the American Museum of Natural History in New York, and the published works of Darwin can be found on a website called Darwin Online, which is run from the National University of Singapore. Although all three projects exist in the relatively small world of Darwin studies, they use data forms that are very different to each other and to date have only formally interacted with each other through third party platforms. CUL, which holds the collections, has also presented its Darwin collections through library-based platforms, namely its own Digital Library and the Biodiversity Heritage Library (see Fig. 2). In almost all cases, under UK copyright legislation the copyright of the texts of manuscript materials remains with the Darwin family, so all projects received permission from representatives among his descendants.
The Darwin Correspondence Project
The Darwin Correspondence Project was an independently funded research team consisting of special collections staff at Cambridge University Library, affiliated with the Department of the History and Philosophy of Science at the University of Cambridge, and supported in part by the American Council of Learned Societies in New York. It was founded in 1974 by Frederick Burkhardt, an American scholar who had recently finished editing the papers of William James, with the goal of locating and researching all letters written both to and from Charles Darwin, and publishing complete transcriptions of them with contextual notes, footnotes, and translations of any letters that were not in English (Ruse, 2023). The 30-volume print edition was completed in early 2023 and the project disbanded (Browne, 2023). The complete edition contains about 16,000 letters, of which 9,000 are in the Darwin Archive at CUL. The other letters are scattered around the planet across more than six hundred archives, libraries, museums, and private collections. The full edition is now available to read as a set of hardcover volumes published by Cambridge University Press, on the DCP’s main website, as a downloadable repository of XML files on GitHub, and on the DCP’s offshoot science letters platform Epsilon.ac.uk, which will be described in more detail below. Because the project was started in the 1970s, it has in many ways spanned the history of humanities computing. Many editorial decisions were made on the basis that the core goal of the project was to produce a print run of the letters, based on the options available at the time, and the editors had to work out subsequently how to make the most of the opportunities available and to keep up with evolving data standards. A particular challenge was to make sure the project evolved technologically while still preserving the ability to produce print books that are visually identical to those originally printed in the 1980s.
However, by the completion of the edition in 2023, the books were still being produced directly from the core data which was available on three different platforms in addition to the PDFs sent to Cambridge University Press. Cambridge University Press holds copyright over any texts produced by the project.
Because the project ran over such a long period of time and was relatively well funded for a humanities project, the website is large and has multiple functionalities. At its core, the website is a way to search and read the sizeable database of letters edited by the DCP’s expert editors. Although it would be a lengthy history of humanities computing to go through all the permutations the data went through, the important points are as follows.
Right from the earliest days of the project, the need for identifiers was recognised. Each letter was given a unique identifier. The project commissioned a markup to describe the beginning and end of letters and to mark up anything that would need to appear in a particular way on the printed page – for example addresses and dates at the top of letters were indicated so that they could be right justified. Special characters, italics, and bold text were also marked up. Over the five decades of the project’s work, the editors weathered several technological shifts and had to periodically learn new ways of working. However, these core standards remained throughout. As a result, many of the building blocks of a digital edition were there; when the project decided to start using a more established standard, the files were sufficiently regular, and the letters and correspondents sufficiently distinguished from each other, that it was possible to transform them into TEI XML. This was a vital shift for the project, as it was important to the editors that the data left behind by the project should be in a format that permitted its reuse in multiple contexts, as well as being presented immediately across multiple different platforms, and that it conformed to well-established standards to increase its chances of being preserved and relevant. This is the data that underlies the current website. The core work of the project exists as about 16,000 letter files, along with several thousand person files and bibliography files, all available in various ways on the website, and able to be either searched or browsed.
Unfortunately, the effort to make sure that all the underlying logic of the old files was preserved in the transition to XML, combined with time lost to the pandemic, meant that some work to improve the data was never completed, such as connecting citations to external resources or person files to external authorities. The existing data was nonetheless well documented and made publicly available (DCP public repository). The format has allowed the DCP files to be used for both textual and network analysis in both research and teaching; moreover, the encoding decisions have proved influential in newer correspondence projects, such as Unlocking the Mary Hamilton Papers (https://www.maryhamiltonpapers.alc.manchester.ac.uk/).
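To illustrate the kind of reuse that structured letter files permit, the following is a minimal sketch in Python of how sender and recipient pairs might be counted as the first step of a network analysis. It assumes each letter records its correspondents with the standard TEI elements correspDesc and correspAction; the DCP's actual encoding may differ, and the sample letters, identifiers, and correspondent names here are invented placeholders rather than real project data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily simplified letter file (not schema-valid TEI and
# not a real DCP file): only the correspondence metadata is included.
LETTER_TEMPLATE = """<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="{letter_id}">
  <teiHeader>
    <profileDesc>
      <correspDesc>
        <correspAction type="sent"><persName>{sender}</persName></correspAction>
        <correspAction type="received"><persName>{recipient}</persName></correspAction>
      </correspDesc>
    </profileDesc>
  </teiHeader>
</TEI>"""

# Invented sample data standing in for a folder of letter files.
SAMPLE_LETTERS = [
    LETTER_TEMPLATE.format(letter_id="sample-1", sender="Correspondent A",
                           recipient="Correspondent B"),
    LETTER_TEMPLATE.format(letter_id="sample-2", sender="Correspondent A",
                           recipient="Correspondent B"),
    LETTER_TEMPLATE.format(letter_id="sample-3", sender="Correspondent B",
                           recipient="Correspondent A"),
]


def correspondents(tei_xml):
    """Return (sender, recipient) for one letter; either may be None."""
    root = ET.fromstring(tei_xml)
    names = {}
    for action in root.iterfind(".//tei:correspAction", TEI_NS):
        person = action.find("tei:persName", TEI_NS)
        if person is not None and person.text:
            names[action.get("type")] = person.text.strip()
    return names.get("sent"), names.get("received")


def edge_counts(letters):
    """Count letters per directed (sender, recipient) pair."""
    counts = Counter()
    for xml_text in letters:
        sender, recipient = correspondents(xml_text)
        if sender and recipient:
            counts[(sender, recipient)] += 1
    return counts


EDGES = edge_counts(SAMPLE_LETTERS)
for (sender, recipient), n in sorted(EDGES.items()):
    print(f"{sender} -> {recipient}: {n}")
```

The resulting counts form a weighted, directed edge list of the sort that network-analysis tools typically accept. Because the correspondents are recorded in a predictable place in every file, the same few lines could in principle be run over any collection encoded in the same way.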
One major feature of the website is its capacity to host web articles to capture the expertise of the editors on the project. Most of the staff had been working on Darwin for decades and had knowledge on any number of subjects, including the earth sciences, that had no place in the rather strict format of the letter edition. Moreover, the DCP was able to give a chance for graduate students from Harvard to begin exploring areas of interest that in some cases became PhD topics for them (Browne, 2023). Because of this, even though the core project staff had not worked actively on Darwin’s geology for some decades, the section of the website discussing coral atolls continued to grow (https://www.darwinproject.ac.uk/commentary/geology/darwin-coral-reefs). The website also permitted the DCP to build connections to the other significant collection of Darwin’s geological work in Cambridge, the Sedgwick Museum. Although there was never an opportunity to systematically build connections with the museum database that contained numerous rock specimens gathered by Darwin and his contemporaries, the project was able to produce an interactive resource on the website that connects individual specimens with other materials relating to the Beagle voyage (https://www.darwinproject.ac.uk/commentary/curious/cordillera-beagle-expedition).
The DCP was initially tightly focussed on producing transcriptions only of texts that went to and from Charles Darwin in the mail. The continued drive and passion of generations of Darwin scholars to complete this work sustained the project for almost fifty years and brought it into the digital age. The development of a website around this data, though, gave the opportunity for a large and long-serving staff to present information about Darwin and his world in a popular and accessible manner.
External presentations of the Darwin Archive
In addition to the DCP, there were two more significant projects to make the Darwin Archive public that were not organised in Cambridge or part of CUL. However, both projects worked closely with the institution and fulfilled a different approach to digitising the life and work of Charles Darwin. The Darwin Manuscripts Project and Darwin Online both set out to cover material that would never fall into the remit of the DCP. Between the three platforms, the Darwin Archive has an unusually comprehensive online presence.
The DMP was conceived and implemented as a born-digital project. Originally titled the ‘Darwin Digital Library of Evolution’, the project was founded in 2005 at the American Museum of Natural History (AMNH) by Dr David Kohn, Emeritus Professor of History at Drew University and formerly of the DCP. The project arose as part of an intense interest in Darwin leading into the bicentenary of his birth in 2009 and was re-launched for those celebrations (Goldstein 2009; 2010). This site, still hosted by the AMNH (which holds copyright of the transcriptions), features digital access to 34,643 folios of manuscript primarily in the Darwin Archive at CUL, giving both images and transcriptions. This permits a very different exploration of Darwin’s geological activity, as it presents the notes and specimens Darwin took for his own purposes during his time in South America, rather than those he composed afterwards in letters to others and his published accounts. The DMP also contributed to a significant collaboration between CUL and the Biodiversity Heritage Library, giving images of many of the books from Darwin’s library now held at CUL, with transcriptions of any annotations on the pages – especially useful given how challenging Darwin’s handwriting was, even for experts (https://www.biodiversitylibrary.org/collection/darwinlibrary). The DMP resources are invaluable pieces of the wider picture of Charles Darwin’s online presence.
Often known simply as Darwin Online (DO), The Complete Work of Charles Darwin Online is another relative newcomer to the Darwin landscape. Run by Dr John van Wyhe, now of the National University of Singapore, this invaluable resource is extremely wide ranging and gives necessary context to the more targeted material of the DCP and the DMP. It began in 2002 as a scholarly edition of Darwin’s published works. Since then, it has ballooned into a huge platform providing transcriptions of works about Darwin as well as by him, scans of manuscripts from various institutions, and most recently a list of Darwin’s library that was almost twenty years in the making (Van Wyhe, 2009 and https://darwin-online.org.uk/Complete_Library_of_Charles_Darwin.html). DO features even the most obscure publications and mentions of Darwin, so it is extremely useful for tracking his geological thought, beginning with the privately circulated pamphlet of geological extracts from letters Darwin sent back to his Cambridge professors from the Beagle voyage, which was given to members of the Cambridge Philosophical Society in 1835 (https://darwin-online.org.uk/EditorialIntroductions/Freeman_LettersOnGeology.html). DO features transcriptions and articles about the Archive contributed by many people and groups, which are subject to a variety of copyright restrictions.
Cambridge University Digital Library
Although both the DCP and the DMP were working on material that was often in the same archival collections and recorded in the same catalogue, it was by no means straightforward to present their material together. The Cambridge University Digital Library (CUDL) has in some ways solved that problem, bringing together multiple data sets relating to the same collection. Over many years, the majority of the Darwin Archive has been photographed for inclusion in CUDL, most recently with the completion of a five-year conservation and digitisation project. By the end of this most recent project, 75% of the classes in the collection were available on CUDL (https://specialcollections-blog.lib.cam.ac.uk/?p=26758). This was in part possible because detailed, high-quality metadata already existed for most of the archive, having been published by the DCP and the DMP. Because the DCP and CUDL were both working in TEI XML by the end of the project, the correspondence was straightforwardly added to images of the relevant letters online. The DMP, although external to CUL, worked closely with CUDL and it was possible to include their transcriptions alongside images of the relevant pages. For both publication projects this was not an ideal solution, as both had published transcriptions of manuscripts that were not in Cambridge and thus not on CUDL, but it was a significant step forward to be able to present CUL’s own Darwin Archive, with correspondence and other manuscripts together alongside transcriptions that had already been created for other projects. Now, a keen scholar can view both Darwin’s letters sent from South America and the notes he was taking on a single website, alongside the library catalogue data from the institution which holds the items.
Epsilon
It was a common view among editors of the DCP that working on his correspondence in many ways decentred Darwin; increased attention was paid to his network, rather than to Darwin himself, and to how much could be learned about the world around him (Browne, 2022). Moreover, the traditional model of editing the correspondence of one individual felt increasingly limited, as even a cursory dip into the content of the letters revealed that they were simply a drop in the ocean of the conversations that were occurring around and about Darwin. The inherently scattered nature of correspondence makes it difficult to research, especially given the sheer volume of documents that still exist in archives around the world. One result of these reflections was Epsilon.ac.uk (https://epsilon.ac.uk/).
Epsilon is a nineteenth-century science letters platform (although it interprets both ‘nineteenth century’ and ‘science’ very loosely). It is an attempt not only to give access to multiple sets of science letters all in one place, but to combine transcription data from correspondence projects often run in academic departments with catalogue data from archival institutions. It also permits the enrichment of existing data about correspondence, giving scholars the opportunity to make public transcriptions they have made of letters listed on Epsilon, such as those in the online calendar of the Sir John Herschel Collection. Currently Epsilon contains the letters of Alfred Russel Wallace, Michael Faraday, André-Marie Ampère, Joseph Dalton Hooker, and John Tyndall, among others. It also holds catalogue data from institutional collections such as the Linnean Society of London and the Royal Society of London. It is designed to work alongside other resources, giving an additional view onto data that could be used on multiple other platforms (indeed most of the contributors have their own sites).
Towards the end of the DCP, with CUDL and Epsilon presenting the opportunity for non-Darwin letters and for letter images, it was possible to fill in one final piece of a story around Darwin’s geological interests. In the final years of the project, CUL received a donation of several letters that had been written to William Kemp, a Scottish amateur geologist who had emigrated to Australia with a bound collection of letters as a souvenir, including those from Darwin. Kemp and Darwin had exchanged several letters in the early 1840s about Scottish geology and botany. When the letters from William Kemp were published in volume 2 of the Correspondence of Charles Darwin (Burkhardt et al. 1985-2023), the location of any letters from Darwin to Kemp was unknown. A well-meaning descendant of Kemp donated this volume, which contained additional letters to Kemp and gave a wider sense of his involvement in the scientific networks of the day rather than the simple snapshot of his exchange with Darwin (https://epsilon.ac.uk/search?sort=date&f1-contributor=Ruth%20Cramond). Not only was it photographed for inclusion on CUDL, but all the letters had been transcribed by the family and these transcriptions are featured both on CUDL and Epsilon.
Final summary
To summarise, the digital resources around Darwin’s work were the result of many decades of focussed work dedicated to bringing a single archival collection to the public and making it as accessible as possible. In many ways, this could only occur because of Darwin’s prominence as a popular and scientific figure, so that sustained academic and philanthropic funding could be procured to support the work over almost a hundred years, from the first purchase of archival material from the Darwin family to the present day. In some ways, the patchwork of digital projects describing the archive can seem disparate, spread across many websites hosted by multiple institutions. The data relating to the collection existed in many different formats and was created for many different purposes, and each project had its own coverage, remit, and audience. However, the projects provide a useful variety of approaches and complement each other. The DCP website is able to host letters embedded in in-depth interpretative content for all ages. Its offshoot, Epsilon, is a targeted research tool that nests those letters in wider conversations of the nineteenth century. CUDL pulls together multiple data sets to present CUL’s collection as informatively as possible. Finally, DMP and Darwin Online let the reader explore Darwin’s working life and the scholarly and popular context around him in rich detail.