HathiTrust Help & Tutorials: HathiTrust Research Center (HTRC)

This guide provides information on how to find and access resources in HathiTrust, a digital preservation repository that includes over ten million volumes from partner library collections.

HathiTrust Research Center (HTRC)

KU Libraries is a member of HathiTrust, an institutional cooperative that offers researchers access to millions of digitized titles from around the world. The HathiTrust Research Center (HTRC) is one of many resources available through this membership. It provides tools that let scholars perform computational analysis on both public domain and in-copyright texts in the digital library without compromising copyright.

Video: HathiTrust Research Center Informational Video

Text Mining the HathiTrust Research Center (HTRC)

Text Mining: HathiTrust Research Center Expands Services to Scholars

The HathiTrust Research Center (HTRC), a cooperative service of Indiana University, the University of Illinois, and HathiTrust, has expanded its services to support computational research on the entire HathiTrust collection, one of the world’s largest digital libraries. The collection includes over 14 million digitized volumes, among them more than 7 million books, more than 725,000 US federal government documents, and more than 350,000 serial publications. It is drawn from some of the largest research libraries in North America, including Indiana University and the University of Illinois.

Previously, the HathiTrust Research Center supported analysis of only the public domain subset of the HathiTrust collection. HTRC is now the only place where scholars can perform text mining on the entire collection. In other words, researchers can now explore the full collection, run an algorithm against all 14 million volumes, and make new connections and discoveries in the process.

At first, researchers will be able to access the HTRC collection through its Advanced Collaborative Services grants. This peer-reviewed grant process gives awardees dedicated HTRC staff time.

HTRC expects to make the full collection available through its secure HTRC data capsules in spring 2017. A features data set, derived from the full collection at both volume level and page level, will be released in fall 2016.

“The upcoming release of the extracted features data derived from the full collection will enable researchers to have hands-on access to HT materials allowing scholars to refine their research questions for the corpus in the comfort of their own labs. Another game changing breakthrough for HTRC,” said J. Stephen Downie, the Illinois co-director of HTRC and a professor at the Graduate School of Library and Information Science (GSLIS), University of Illinois.