Ubiquitous Knowledge Environments - The Cyberinfrastructure Information Ether
The January 2003 NSF report “Revolutionizing Science
and Engineering Through Cyberinfrastructure”1 led off
with the observation that “multiple accelerating trends
are converging and crossing thresholds in ways that show extraordinary
promise … in how we create, disseminate, and preserve
scientific and engineering knowledge.” That study concluded
that the “National Science Foundation should establish
and lead a large-scale, interagency, and internationally coordinated
Advanced Cyberinfrastructure Program (ACP) to create, deploy,
and apply cyberinfrastructure in ways that radically empower
all scientific and engineering research and allied education.” It
envisioned a program “to build more ubiquitous, comprehensive
digital environments that become interactive and functionally
complete for research communities in terms of people, data,
information, tools, and instruments and that operate at unprecedented
levels of computational, storage, and data transfer capacity.”
Were it not that the Panel’s charge limited it to science
and engineering, this observation would no doubt have been
extended to knowledge resulting from all scholarly and creative
pursuits. Digital libraries were envisioned a decade ago as
the answer to networked knowledge environments, and much progress
has been made towards that goal, but the pursuit of the goal
has also extended our view beyond that which could have been
anticipated a decade ago.
What began as an effort to create “digital libraries” has
transformed into something much more dynamic than was originally
envisioned. Certainly, the idea of curated, network-accessible
repositories was (and remains) a fundamental need of scholarly
communication, as was the notion that these repositories should
support information in multiple formats, representations, and
media. But not until serious efforts were made to build such
resources, particularly for non-print digital content (audio,
image, and video, for example) did it become apparent that
this venture would stretch the bounds of computer and information
science, and, indeed, require the articulated confluence of
multiple computer and information science disciplines. It now
appears that these emerging tools and capabilities hold the
potential to transform the conduct of disciplinary research itself,
and even to foster the creation of new areas of investigation
at the interstices of existing disciplines.
Much of the potential for new digital content is only beginning
to be understood. See Tim Rowe’s work on paleoradiology,
recently featured on NPR,2 for an excellent example of an emerging
source of novel data of substantial scientific interest. An
appropriate network-based information infrastructure would
make access to and utilization of such information sources
routine and intuitive. The science, the knowledge that results
from projects such as these, is not fundamentally different
because of the digital medium chosen to express it, however.
Fundamentally, knowledge communicated or expressed in digital
form is the same as that expressed by other means.
Early efforts, largely focused on building digital replicas
of print-based materials, rapidly morphed into a new mode of
communication among scientists. This mode enables rapid dissemination
of new findings and discussion and debate around those findings,
leading to major reductions in the time needed to reach fully vetted results.
It also constitutes a new form of scholarly communication infrastructure that
holds the promise of enabling fuller exploitation of knowledge.
Whereas traditional means of scholarly communication led to
highly compartmentalized knowledge along axes of discipline,
geography, and individual, the networked knowledge paradigm
holds out the potential for exhaustive exploitation of new
and existing knowledge, across all previous barriers. The breakdown
of disciplinary barriers creates a multiplicative effect unrealizable
with traditional approaches. Our present vocabulary falls short
of capturing the essence of this phenomenon. Traditional characterizations
feel passive in light of what is emerging, and traditional
infrastructures such as scholarly journals and research libraries
fall short of delivering on this potential. The terms “repository” and “library” capture
only a small portion of what is emerging.
But developing this new scholarly communication infrastructure
(embedded in the larger cyberinfrastructure advocated in the
referenced report) is no mere software development activity.
Experience has already shown that it engages the best minds
in computer and information science, in fields
including network and systems design, human-computer interaction,
artificial intelligence, information retrieval, information
organization, machine translation, database systems, and complexity
theory. The real challenge is to build systems supporting scholarly
communication that yield new capabilities and capacities so
effectively and efficiently that they are intuitive and transparent
in their operation. Indeed, a serious measure of success may
be how simply the resulting infrastructure appears to operate
to the casual (and serious) user. This is what we refer to
here as the Ubiquitous Knowledge Environment, or the “information
ether.” Underlying this concept is a vision that “information
retrieval” could become a dated notion, a vestige of
an era in which information was difficult to locate, and even
more difficult to acquire.
Realizing such a vision requires not only advances derived
from research but also international coordination, as noted
in the Cyberinfrastructure report. The European Commission
has also taken note of this opportunity and the research needed
to realize it. Its DELOS-initiated “6th Framework Programme”3 calls
for a “new infrastructure and environment … created
by the integration and use of computing, communications, and
digital content on a global scale.” NSF and the DELOS
Network of Excellence on Digital Libraries have organized six
joint working groups “whose aim is to define a research
agenda … for cooperation between the EU and US researchers.”4
Working groups include:
- Spoken-Word Digital Audio Collections
- Information Extraction from Digital Libraries
- Personalization and Recommender Systems in Digital Libraries
- ePhilology: Emerging Language Technologies and Rediscovery of the Past
- Digital Imagery for Significant Cultural and Historical Materials
- Preservation and Archiving
This proposal to the National Science Foundation suggests
that a serious investigation and development of “Cyberinfrastructure” is,
indeed, both timely and strategically important to the United
States, and that the Panel’s recognition of the need
for a new infrastructure supporting scholarly communication
and collaboration is well founded. We propose
to bring together a group of recognized national and international
scholars and researchers to propose a plan for the long-term
research necessary to realize such a scholarly communication
infrastructure, building on the Cyberinfrastructure results
and complementary results of a related study in which NSF was
encouraged to initiate a program of research into the science
of information management.
The Science of Information Management
The November 2002 report “Technology Requirements for
Information Management”5 documents the findings of an NSF/DARPA/NASA
study panel conducted “in support of application
domains of particular government interest, including digital
libraries, mission operations, and scientific research.” Of
particular interest to the federal sponsors was the pursuit
of research and development to support applications typified
as “highly distributed with very large amounts of data
and a high degree of heterogeneity of sources, data, and users.”
What is becoming increasingly clear from these studies is
the universal recognition of the need for a focused research
effort, beyond the development of faster hardware and greater
bandwidth, to reap the benefit of Moore’s Law in the
use of information. Digital library research began this exploration
nearly ten years ago and, while notably successful in what
it achieved, may be more noteworthy for clarifying what remains
beyond reach.
Digital libraries are not the total solution to the problems
identified, nor are they the sole path to the opportunities
envisioned. But they are part of the solution, just as computers,
networks, and operating systems are part of the solution. And
they are on the path, just as the other components are. The
challenge, however, is to understand what else may be on that
path, particularly at a conceptual or theoretical level. This
is what the RIACS report identifies as the science of information
management. “A science of information management would
deal with the underlying principles of information management
and how humans deal with it. By contrast, information technology
provides the tools and systems to achieve the desired functions
and goals.”6
Some of the defining characteristics of the problems to which
a science of information management applies are:
- Heterogeneity of information systems and sources
- Heterogeneity of users and information providers
- Seemingly unbounded scale of data and users
- Need for preservation and records management for an indefinite future and into indeterminate future environments
- Evolving legal and social frameworks
- Varying human and organizational capabilities and behavior
- Curated information
- Real-time operation
- Collaborative, synchronous processes
- Both proactive and reactive uses
- Incomplete and uncertain information
- Enormous range of granularity of data
- Need for comprehensive metadata
- Increasing emphasis on the semantics of and correlative relationships among data
Interoperation is considered a key and well-known requirement,
and one for which the needs are better known than the solutions.
Some characteristics of interoperation for common applications
such as collaboration, visualization, and decision-making include
interoperation:
- of sources, services, and ontologies
- with information about possible futures
- in spite of incompleteness, inconsistency, and uncertainty
- that supports authentication, privacy, and security.
A ubiquitous information infrastructure is envisioned by the
studies cited. The successful realization of such an infrastructure
will be supported by a science of information management that
will yield new generations of knowledge environments that evolve
in pace with advances in computing and communications.
This proposal initiates the detailed development of such a
program with an invited workshop of national and international
researchers and scholars, to be held June 15-17, 2003, at the
Wequassett Inn in Chatham, Massachusetts. Approximately 40
participants are anticipated for the 2.5-day workshop. A report
will be produced for the National Science Foundation following
the workshop that details the participants’ recommendations
and rationales. Although it is not specifically part of this proposal,
a follow-up conference open to a hundred or more participants
is anticipated, where the report would be
presented, critiqued, and vetted, with the intent of providing
NSF the detailed advice and counsel of the scientific community
regarding the creation of a research program to develop a
science of information management.
1. Atkins, Daniel E., et al., “Revolutionizing Science and Engineering Through Cyberinfrastructure: Report of the National Science Foundation Blue Ribbon Advisory Panel on Cyberinfrastructure,” January 2003, available online at http://www.communitytechnology.org/nsf_ci_report/
2. See NPR’s story on Tim Rowe at http://www.npr.org/display_pages/features/feature_1183920.html
3. Schauble, Peter and Alan F. Smeaton, “An International Research Agenda for Digital Libraries,” October 1998, available online at http://www.iei.pi.cnr.it/DELOS/NSF/Brussrep.htm
4. DELOS/NSF Joint Working Groups, http://delos-noe.iei.pi.cnr.it/activities/internationalforum/Joint-WGs/joint-wgs.html
5. Graves, Sara, Craig A. Knoblock, and Larry Lannom, “Technology Requirements for Information Management,” RIACS Technical Report 2.07, November 2002, available online at http://www.riacs.edu/trs/
6. RIACS TR 2.07, see previous footnote.