HathiTrust: Text mining and analysis
Text mining and analysis with HathiTrust
Memorial University students, faculty, and staff have access to the HathiTrust Research Center (HTRC). The HTRC lets you study items in the HathiTrust digital library using computational analysis.
While items covered by current copyright restrictions cannot be read or downloaded, every item in HathiTrust, whether in copyright or in the public domain, can be used for “non-consumptive” research such as text and data mining.
HTRC Analytics provides a toolkit of services for large-scale computational analysis of the works in the HathiTrust digital library. Although the site is advertisement-free, you may need to disable ad blockers or other browser extensions for it to load correctly. To use HTRC Analytics, you must create an HTRC Analytics user account. With this account, you can carry out text and data mining using services in four categories: Datasets, Worksets, Algorithms, and Data Capsules. Learn more about tools offered through the HTRC.
For more information about the relationship between the HathiTrust digital library and the HathiTrust Research Center, watch this short video from the University of Illinois.
Additional resources
- HathiTrust Research Center Documentation: This wiki provides training materials and general information about HathiTrust, HTRC Analytics, and the HTRC. It also explains how to access Bookworm and the HTRC algorithms, and how to get into the Data Capsule environment.
- HathiTrust+Bookworm: Bookworm visualizes word trends across texts in the HathiTrust corpus. You can quickly chart how often words appear and apply filters such as date range, country of publication, and literary form.
- Examples & Use Cases: Browse this list of projects undertaken using HTRC's text analysis tools.