This web site is for an invitational workshop on data-driven
science and data-driven scholarship sponsored by the US National
Science Foundation (NSF) and the British Joint Information
Systems Committee (JISC). The workshop will be held in
Phoenix, Arizona, on April 17 to 19, following the meeting of
the Coalition for Networked Information.
Here is some background information. A series of recent
studies and reports have highlighted the ever-growing importance
for all academic fields of data and information in digital
formats. Studies have looked at digital information in
science and in the humanities; at the role of data in Cyberinfrastructure;
at repositories for large-scale digital libraries; and at the
challenges of archiving and preservation of digital information. The
goal of this workshop is to unite these separate studies. The
NSF and JISC share two principal objectives: to develop a road
map for research over the next ten years and to identify what to
support in the near term.
One of the themes of the workshop is
data-driven science and scholarship. Some academic leaders are beginning to recognize
that data-driven science is becoming a new scientific paradigm – ranking
with theory, experimentation, and computational science. Fewer
people appreciate that the combination of large-scale digitization
of books, scholarly journals online, and huge data sets provides
opportunities for new methodologies for scholarship and research
in all academic disciplines. Is this really a fourth
paradigm of science or is it new wine in old bottles? Can
we articulate the importance of this area, so that university
presidents (in the US) and vice-chancellors (in Britain) understand
the potential and challenges?
A second theme is technical. Data-driven scholarship
is technically difficult. Many of the collections are
huge by any standards and they often have complex internal
structure. Organizations such as the National Virtual
Observatory, the Internet Archive, and the Shoah Foundation
have demonstrated the challenges in reconciling these two parameters,
scale and complexity, particularly when the research questions
to be asked of the data are not known in advance.
A third theme is organizational. Collaboration, cooperation,
and standards are needed to exploit heterogeneous sources of
data, but the difficulties of cooperation are often forgotten
and the benefits often fall short of expectations. Many
organizations have expertise in some aspects of data-driven
scholarship: research centers, libraries, supercomputing centers,
archives, Internet companies, and so on. But in almost
every instance such expertise is incidental to the major expertise
of the organization. What is the role of these organizations
and how might they change? We anticipate that new hybrid
organizations will emerge. What is the role of government
agencies, such as the NSF and JISC, in stimulating such developments?
A fourth theme is the changing world
of scholarly communication. This
goes far beyond electronic publications and academic repositories. What
are the best ways to disseminate the results of data-driven
science and scholarship? How can very large data sets
be made available to other researchers? How do we reconcile
traditions of peer review with pre-publication over the Web? How
do concepts of public good and sustainability fit with the
practicalities of research?
Finally, what are the enabling conditions, both human and
technical, for wide adoption by individuals? Large-scale developments
in data-driven science and scholarship depend on the enthusiasm
of individuals. Recent years have seen rapid changes
in the behavior of researchers in some matters (e.g., the dissemination
of research papers and data from personal web sites), and strong
resistance to change in others (e.g., conservatism in publishing
practices, low contributions to institutional repositories). What
are the barriers and incentives to change and how can the NSF
and JISC influence them?
The workshop is bringing together some thirty people chosen
for the variety of their expertise and viewpoints. The
agenda will include plenary discussions of these themes and
small group breakouts to explore them in greater detail.
As co-chairs of the workshop we would like to acknowledge
the help of the planning committee in establishing the agenda. The
members of the committee were Bill Arms, Dave Cook, Bas Cordewener, Steve
Griffin, Sam Gustman, Carl Lagoze, Ron Larsen, Clifford Lynch,
Rick Luce, Norman Wiseman, and Eric Van de Velde. After the
workshop, our task is to organize the ideas and conclusions
into a report that will be of real value to the NSF and JISC.
University of Pittsburgh
This workshop is funded in part by a National Science Foundation
grant and by the Joint Information Systems Committee.
Any opinions, findings, and conclusions or recommendations
expressed do not necessarily reflect the views of the National
Science Foundation or the Joint Information Systems Committee.