NSF/JISC Repositories Workshop
Sayeed Choudhury,
Johns Hopkins University
April 9, 2007
My comments reflect experience with two projects led by Johns
Hopkins University that seemingly come from opposite ends of
the spectrum—the Virtual Observatory[1] (VO)
and the Roman de la Rose Project.[2] The
VO represents one of the quintessential cyberinfrastructure
projects, with large, complex datasets being shared and analyzed
by a distributed group of astronomers.
The Rose Project features the development of a digital environment
that will include content and services related to manuscripts
written (and illuminated) in medieval French from the late
13th century to the mid-16th century. At first glance,
one might assume that the Rose Project offers little insight
regarding data-driven scholarly communication. Even a
completely digitized corpus of Rose manuscripts would not come
close to the scale of the VO datasets. However, when
I reflect on the current (and historical) nature
of both disciplines, I see an important relationship between
data and scholarly practices and communication.
There is a widespread belief that humanists work primarily
in isolation, resisting collaborative ventures with other humanists,
whereas scientists work as teams, embracing opportunities to
work with fellow scientists. While there is ample evidence
to support this belief in the present, when one considers the
historical context of the humanities and sciences, the picture
becomes more complex.
Rudolphine Tables = Open Content Alliance?
In his position paper, Michael Nelson mentions the Rudolphine
Tables, making the interesting observation that they might
be considered on par with the Google Book Search[3] or
the Open Content Alliance.[4] Certainly,
the publication of these data inspired major advances in astronomy. Michael
also appropriately mentions that these tables might not have
been published for a host of reasons including “significant
infrastructure costs (in the form of purpose-built observatories),
professional jealousy, intellectual property restrictions,
and political and religious instability.” Even
astronomy, not too long ago, was a discipline defined by a
lone astronomer who would guard her or his data with great
secrecy. In “data poor” times, it seems that
scientists did not readily share data or collaborate.
By the time the Rudolphine Tables had been published, the
Roman de la Rose story had been written, re-written, re-purposed,
recast, illuminated, and shared many times over. While
it might seem like an unorthodox interpretation, this period
represented a “data rich” time for the humanities. Before
the development of scientific instrumentation, “data” consisted
of the spoken word, the written word, illuminations, etc. And,
it seems, in this relatively “data rich” environment
of the Middle Ages, humanists did collaborate. Perhaps
it is human nature, rather than humanists’ nature, that
defines scholarly practice.
Rather than assuming some inherent characteristics of specific
disciplines define their modes of scholarship or communication,
perhaps it is the relative ease or difficulty with which they
can generate, acquire or process data that ultimately influences
scholarship. As an engineering student, I was led to
believe that humanities materials are data poor but, in reality,
they are data rich in several ways. A single Rose manuscript
contains a tremendous amount of textual, visual and semantic
content, which is difficult to extract in meaningful ways. As
our ability to move these data into digital format improves,
I believe humanists will naturally collaborate. Indeed,
large-scale digitization might drive the humanities into a
new age of data-driven scholarship, much as the Rudolphine Tables
inspired astronomers.
A Moment in Time
During the ACRL Conference in Baltimore, I had the pleasure
of meeting with the CLIR Postdoctoral Fellows in Scholarly
Information Resources.[5] During
this conversation, Chuck Henry, the President of CLIR, mentioned
that scholars from the sciences, engineering, social sciences
and humanities have each developed their own cyberinfrastructure
studies and reports, perhaps representing an unprecedented
convergence of interest. There is no doubt, however,
that the sciences and engineering are leading the way for data-driven
scholarship in our current environment.
As our digital library group at Johns Hopkins has learned
more about the VO, we realize that we are not facing a data
deluge: we are facing a data tsunami. Having said this, perhaps
the Roman de la Rose was so popular precisely because it felt
like an overwhelming new mode of interaction with data. Let
me submit a controversial statement that I believe merits some
discussion: putting aside obvious aesthetic differences, scientific
datasets are the modern equivalents of medieval manuscripts.
Roles for NSF and JISC
There are, of course, undeniable differences in our environment
given the scale of data. Bill Arms’ position paper
quotes Greg Crane: “When collections get large, only
the computer reads every word.” We may not need to urge
scholars to consider the “crisis in scholarly communication” or “barriers” to
change: as the amount of data increases, there will probably
be a natural shift toward new methods for publication, collaboration,
etc. that emphasize machine-readable and actionable approaches. As
scientists such as astronomers lead the way, it will be worthwhile
to ascertain whether cyberinfrastructure-related tools, services,
and systems from one discipline could support other scientists,
engineers, social scientists and even humanists. NSF
and JISC can help track the portability of such resources.
NSF and JISC can undeniably influence the environment through
their reward structures. Funding for projects that support
increased data acquisition, integration, processing and analysis
should be encouraged. NSF and JISC are well placed to
fund collaborative efforts within the US and UK, respectively,
but joint funding programs would have the obvious benefit of
bringing together collaborators who share research or teaching
goals yet bring different perspectives.
Finally, NSF and JISC should provide significant funding and
support for digital preservation and data curation. These
largely unaddressed areas of support are essential
for scholarly communication. The Centers of Excellence
in Data Preservation that Eric F. Van de Velde suggests
in his position paper are a worthwhile idea.
Imagine the loss to science and scholarship if we had not
preserved the Rudolphine Tables or the Roman de la Rose manuscripts.
References
[1] http://www.us-vo.org/
[2] http://rose.mse.jhu.edu/
[3] http://books.google.com/
[4] http://www.opencontentalliance.org/
[5] http://www.clir.org/fellowships/postdoc/postdoc.html