April 17 - 19, 2007   
Hyatt Regency Phoenix   
Phoenix, Arizona   

 

Welcome and themes

 

This web site is for an invitational workshop on data-driven science and data-driven scholarship sponsored by the US National Science Foundation (NSF) and the British Joint Information Systems Committee (JISC). The workshop will be held in Phoenix, Arizona on April 17 to 19, following the meeting of the Coalition for Networked Information.

Some background: a series of recent studies and reports have highlighted the ever-growing importance of digital data and information to all academic fields.  Studies have looked at digital information in science and in the humanities, at the role of data in Cyberinfrastructure, at repositories for large-scale digital libraries, and at the challenges of archiving and preserving digital information.  The goal of this workshop is to bring these separate studies together.  The NSF and JISC share two principal objectives: to develop a road map for research over the next ten years, and to identify what to support in the near term.

One of the themes of the workshop is data-driven science and scholarship.  Some academic leaders are beginning to recognize that data-driven science is becoming a new scientific paradigm, ranking with theory, experimentation, and computational science.  Fewer people appreciate that the combination of large-scale digitization of books, scholarly journals online, and huge data sets opens opportunities for new methodologies of scholarship and research in all academic disciplines.  Is this really a fourth paradigm of science, or is it new wine in old bottles?  Can we articulate the importance of this area so that university presidents (in the US) and vice-chancellors (in Britain) understand its potential and its challenges?

A second theme is technical.  Data-driven scholarship is technically difficult.  Many of the collections are huge by any standard, and they often have complex internal structure.  Organizations such as the National Virtual Observatory, the Internet Archive, and the Shoah Foundation have demonstrated how hard it is to reconcile these two dimensions, scale and complexity, particularly when the research questions to be asked of the data are not known in advance.

A third theme is organizational.  Collaboration, cooperation, and standards are needed to exploit heterogeneous sources of data, but the difficulties of cooperation are often forgotten and the benefits often fall short of expectations.  Many organizations have expertise in some aspects of data-driven scholarship: research centers, libraries, supercomputing centers, archives, Internet companies, and so on.  But in almost every instance such expertise is incidental to the organization's primary mission.  What is the role of these organizations, and how might they change?  We anticipate that new hybrid organizations will emerge.  What is the role of government agencies, such as the NSF and JISC, in stimulating such developments?

A fourth theme is the changing world of scholarly communication.  This goes far beyond electronic publications and academic repositories.  What are the best ways to disseminate the results of data-driven science and scholarship?  How can very large data sets be made available to other researchers?  How do we reconcile traditions of peer review with pre-publication over the Web?  How do concepts of public good and sustainability fit with the practicalities of research?

Finally, what are the enabling conditions, both human and technical, for wide adoption by individuals?  Large-scale developments in data-driven science and scholarship depend on the enthusiasm of individuals.  Recent years have seen rapid changes in the behavior of researchers in some matters (e.g., the dissemination of research papers and data from personal web sites), and strong resistance to change in others (e.g., conservatism in publishing practices, low contributions to institutional repositories).  What are the barriers and incentives to change, and how can the NSF and JISC influence them?

The workshop is bringing together some thirty people chosen for the variety of their expertise and viewpoints.  The agenda will include plenary discussions of these themes and small group breakouts to explore them in greater detail. 

As co-chairs of the workshop, we would like to acknowledge the help of the planning committee in establishing the agenda.  The members of the committee were Bill Arms, Dave Cook, Bas Cordewener, Steve Griffin, Sam Gustman, Carl Lagoze, Ron Larsen, Clifford Lynch, Rick Luce, Norman Wiseman, and Eric Van de Velde.  After the workshop, our task is to organize the ideas and conclusions into a report that will be of real value to the NSF and JISC.

William Arms
Cornell University

Ronald Larsen
University of Pittsburgh

Co-chairs

This workshop is funded in part by a National Science Foundation grant and by the Joint Information Systems Committee. Any opinions, findings, and conclusions or recommendations expressed do not necessarily reflect the views of the National Science Foundation or the Joint Information Systems Committee.