Appendix - Methodology
This book has been published using the Jupyter Book publishing framework.
The original texts were sourced predominantly from archive.org, the National Library of Wales and the Hathi Trust, all of which are free services. I also retrieved article text via a personal subscription to the British Newspaper Archive, and via a university subscription to the Times online archive.
Source text excerpts were added to text files, one file per publication, using the following convention:
Periodical / book title
The page number element (p...) was REQUIRED and served to separate the metadata (URL, title, date) from the text. The page number element could take various forms (for example, pxiv, and if unknown / not relevant, p?). Individual records in a file were separated by
Records were loaded from the text files into a simple file-based SQLite database. This provided a means of searching over the content, as well as republishing it into skeleton book pages (for example, a page containing all the articles published in a particular year that contained a particular search term).
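As a rough illustration of this step, the following sketch loads some already-parsed records into SQLite and generates a skeleton page for a given year and search term. It is not the code used for this book: the table layout, the function names (build_db, load_records, search_by_year, skeleton_page) and the sample records are all hypothetical, and the date format (YYYY-MM-DD) is an assumption.

```python
import sqlite3

# Hypothetical sample records, standing in for records parsed from the text files.
RECORDS = [
    {"title": "Example Gazette", "date": "1850-03-02", "page": "3",
     "url": "https://example.org/a", "text": "A report of the storm at sea."},
    {"title": "Example Gazette", "date": "1850-07-15", "page": "?",
     "url": "https://example.org/b", "text": "The harvest was plentiful."},
    {"title": "Sample Chronicle", "date": "1851-01-10", "page": "xiv",
     "url": "https://example.org/c", "text": "Another storm struck the coast."},
]


def build_db(path=":memory:"):
    """Create (or open) a database with a single, assumed records table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS records (
               id INTEGER PRIMARY KEY,
               title TEXT, date TEXT, page TEXT, url TEXT, text TEXT)"""
    )
    return conn


def load_records(conn, records):
    """Insert a list of record dictionaries using named placeholders."""
    conn.executemany(
        "INSERT INTO records (title, date, page, url, text) "
        "VALUES (:title, :date, :page, :url, :text)",
        records,
    )
    conn.commit()


def search_by_year(conn, year, term):
    """Return records from a given year whose text contains the term."""
    return conn.execute(
        "SELECT title, date, page, text FROM records "
        "WHERE substr(date, 1, 4) = ? AND text LIKE ? ORDER BY date",
        (str(year), f"%{term}%"),
    ).fetchall()


def skeleton_page(year, term, rows):
    """Render matching records as a simple Markdown skeleton page."""
    lines = [f"# Articles from {year} mentioning '{term}'", ""]
    for title, date, page, text in rows:
        lines += [f"## {title}, {date}, p{page}", "", text, ""]
    return "\n".join(lines)


conn = build_db()
load_records(conn, RECORDS)
rows = search_by_year(conn, 1850, "storm")
page = skeleton_page(1850, "storm", rows)
print(page)
```

Passing a filename instead of ":memory:" to build_db would give the single-file database described above. The LIKE match is a plain substring search; SQLite's full-text search extensions would be a natural alternative for larger collections.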
Several tools were developed to support “fuzzy” searching, where search terms almost, but don’t quite, match terms in a particular text, along with tools for extracting fragments of text from each record (for example, extracting a particular paragraph or the text between start and end fragments).
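The kinds of tool described here might be sketched as follows. This is a minimal illustration rather than the tooling actually developed for the book: it uses the difflib module from the Python standard library for approximate matching, and the function names (fuzzy_find, extract_between, extract_paragraph), the similarity cutoff and the sample text are all assumptions.

```python
import difflib
import re


def fuzzy_find(term, text, cutoff=0.8):
    """Return words in the text that approximately match the search term."""
    words = sorted(set(re.findall(r"[A-Za-z']+", text.lower())))
    return difflib.get_close_matches(term.lower(), words, n=5, cutoff=cutoff)


def extract_between(text, start, end):
    """Return the text from the start fragment to the end fragment, inclusive.

    Returns None if either fragment cannot be found.
    """
    i = text.find(start)
    if i == -1:
        return None
    j = text.find(end, i + len(start))
    if j == -1:
        return None
    return text[i:j + len(end)]


def extract_paragraph(text, index):
    """Return the paragraph at the given position, splitting on blank lines."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if 0 <= index < len(paragraphs):
        return paragraphs[index]
    return None


# Hypothetical record text with a variant spelling of "mermaid".
sample = (
    "The mermayd was seen off the coast.\n\n"
    "Fishermen said the creature sang at dusk."
)

matches = fuzzy_find("mermaid", sample)
fragment = extract_between(sample, "Fishermen", "dusk.")
first_paragraph = extract_paragraph(sample, 0)
print(matches, fragment, first_paragraph, sep="\n")
```

Matching word by word keeps the sketch short, but it will not catch terms split across a line break or multi-word phrases; lowering the cutoff finds more variant spellings at the cost of more false matches.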
As a related activity, I have also started developing full text searchable collections of articles from publications such as Notes & Queries, as described in Story Notes – Technical Recipes.