Sample Datasets
We’re happy to provide sample datasets for research and teaching. These datasets consist of open access content on JSTOR and can be used for research or for teaching and practicing text mining techniques.
Early Journal Content dataset
The Early Journal Content (EJC) on JSTOR consists of public domain journal articles published in the United States before 1923 and in other countries before 1870. It covers discourse and scholarship in the arts and humanities, economics and politics, and mathematics and other sciences. The EJC dataset includes full-text OCR and article-level metadata.
Open Access Ebooks dataset
We have partnered with leading presses on a project to add open access ebooks to JSTOR. Thousands of titles are now available from publishers such as University of California Press, Cornell University Press, NYU Press, and University of Michigan Press; most of these books were published between 2000 and 2017. The Open Access Ebooks dataset includes full-text OCR and title-level metadata.