Start Date
6-10-2018 10:45 AM
End Date
6-10-2018 12:15 PM
Description
Access to research materials is an issue that cuts across disciplines and impacts most researchers as they gather information. For a digital scholar in need of a textual corpus, however, these challenges may be particularly acute. Those studying mid-to-late 20th century works may find themselves in uncertain territory with regard to copyright and licensing. Those studying historically marginalized populations may have trouble finding a pre-compiled corpus, or finding texts at all. Researchers at smaller institutions or in underfunded departments may find that existing datasets are not available to them due to cost, or that they run into copyright and licensing barriers when attempting to compile a large corpus of texts. Even an existing or easily harvested corpus may present structural challenges for our tools. How do we diversify and democratize digital scholarship while also navigating the difficulties of equitable access to information?
Keywords
text analysis, large text corpora, access
Rights
Copyright 2018, Gesina A. Phillips
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Type
Presentation
Session
#s2b, moderator Diane Jakacki
Language
eng
Location
Elaine Langone Center, Center Room
A Critical Look at the Digital Scholarship Corpus: How Access Influences the Questions We (Can) Ask