June 15 - 17, 2003   
Wequassett Inn, Cape Cod   
Chatham, Massachusetts   
NSF/JISC Workshop
 
General
Welcome
Background
Agenda
References
Important Dates
Participants List
   
OUTREACH
China 2004
Bangalore 2005
   
For Contributors
Call For Papers
Papers
Breakout Reports
Final Report
Opening Plenary Session
Supplementary Contributions
   
For Participants
Expense Form
Accommodation
Tourist info
Travel
   
Organization
Sponsors
Contacts
   

 

   
Papers  
   
Enabling the Semantic Web for Scientific Research And Collaboration
 
   
   

Eric Miller, em@w3.org
Semantic Web Activity Lead
W3C World Wide Web Consortium
Download: PDF Version    WORD Version

Introduction

The January 2003 NSF report "Revolutionizing Science and Engineering Through Cyberinfrastructure" led off with the observation that "multiple accelerating trends are converging and crossing thresholds in ways that show extraordinary promise …in how we create, disseminate, and preserve scientific and engineering knowledge. That study concluded that the "National Science Foundation should establish and lead a large-scale, interagency, and internationally coordinated Advanced Cyberinfrastructure Program (ACP) to create, deploy, and apply cyberinfrastructure in ways that radically empower all scientific and engineering research and allied education. It envisioned a program "to build more ubiquitous, comprehensive digital environments that become interactive and functionally complete for research communities in terms of people, data, information, tools, and instruments and that operate at unprecedented levels of computational, storage, and data transfer capacity.

The World Wide Web is an instrument for enhancing and extending human communication. With design choices tailored to the rapidly expanding Internet, the Web has achieved notable success in scientific communication as well as in commercial and business communication. The Web as it exists today has realized only a part of the potential that it is capable of providing. Today's Web is successful because humans have populated it with documents intended primarily for other humans to read; that is, the Web has provided a very social communications infrastructure.

The Semantic Web is an enhancement of the current Web to allow machine-processable data to span application boundaries in the same way that human-readable documents do currently. The Semantic Web extends the deployed infrastructure of HTTP (HyperText Transfer Protocol) [HTTP] and URIs (Universal Resource Identifiers) [URI]. In one of its several possible representations, data in the Semantic Web is exchanged using XML (Extensible Markup Language) [XML]. XML permits a single container to communicate both presentation markup for humans as well as semantic markup for data specific to software systems, including Semantic Web applications.

The goal of the Semantic Web initiative is as broad as that of the Web: to be a universal medium for data. It is envisaged to smoothly interconnect personal information management, enterprise application integration, and the global sharing of commercial, scientific and cultural data. It is in this way that the Semantic Web may be viewed as an infrastructure for supporting the objectives outlined by the January 2003 NSF report.

Research Agenda for Digital Libraries

Digital Libraries are an important component for shaping the infrastructure necessary for supporting the objectives outlined by the January 2003 NSF report. The following are a set of research areas for the Semantic Web which directly relate to the Digital Library community and is hoped will have a significant impact in addressing. This overlap will not be surprising to those familiar with Digital Libraries contribution the early design of the Semantic Web. These areas of overlapping research may be helpful in forming an agenda for Digital Libraries to facilitate the scientific research and collaboration in a global ubiquitous information space.

Enabling an Information Infrastructure

The Semantic Web aims to build on the Web's contribution of Uniform Resource Identifiers [URI] and resilience to failures of closed world assumptions while bringing additional tools to bear on the exercise of communicating human knowledge. From simple descriptive metadata to advanced rules and formulae permitting higher-level reasoning, the Semantic Web is providing the additional infrastructure to express both the context and structure of information as well as relationships among information in a form that supports machine processing.

Making available open, domain and cross-domain taxonomies and ontologies useful for classifying and organizing content is important. Providing persistent URIs for these concepts and making these available via common Web architecture means provides an important backbone for the enabling the Semantic Web.

Creating a Policy Aware Infrastructure

The computational machinery behind the Web is today somewhat like a small child in a shopping mall; it has no mechanism for distinguishing what sources of information to trust, no mechanism for identifying what is considered socially acceptable information to disclose, and only limited mechanism to act as a reliable transporter of information between third parties who have a more highly developed awareness of information provenance, reliability, and decay. The absence of these mechanisms will become a severe limitation on the long-term utility, and perhaps even viability, of Web search engines, including those that perform co-citation analyses to infer nodes of expertise. The introduction of trust awareness and social policy description mechanisms will provide the foundation for a set of services and applications we can only begin to imagine.

The development of a Policy Aware Infrastructure for the Web is required. The Semantic Web will only achieve its potential as an information space for the free flow of scientific and cultural information if its infrastructure supports a full range of fine-grained policy controls over the information contained in the Semantic Web. If we are going to entrust more of our knowledge to the Semantic Web, we must be assured that the Web will respect many more of the social agreements that we enforce in the physical world. For the Semantic Web includes not only freely available information, but also personal information and information available to a person or agent only as a result of its membership in groups. A policy-aware infrastructure -- one that gives information creators and users the types of control over information we have all become accustomed to in the physical world such as the ability to assert and exercise privacy and intellectual property rights -- will make the Semantic Web into a vibrant and humane environment for sharing knowledge and collaborating on wide range of intellectual enterprises.

Evolution and Translation of Vocabularies

An important goal of the Semantic Web is to address the problem that in the course of scientific (or any) endeavor, one changes the vocabularies one uses to organize, discover, and communicate. A given vocabulary may be refined, resulting in a need for migration from old to new. Communication between distinct groups using different vocabularies creates the need to create common vocabularies which optimally suit all involved. Semantic Web techniques should make this difficult process of creating new common vocabularies as easy as possible. The Semantic Web already removes confusion by giving each term a globally unique URI. OWL ontologies and rules languages allow relationships between old and new terms to be expressed. There is, however, little experience with the serious management of such evolution.

The Semantic Web needs to incorporate versioning and provenance within its foundation. Human understanding changes and statements that we once thought were accurate are later described to be inaccurate. However, the original statement should not be deleted from our corpus of human knowledge. The Semantic Web should not be required to forget that a statement was once believed to be a true statement. Versioning is such a common approach to representing discrete states of understanding that it warrants explicit treatment in the Semantic Web.

Web of Trust

Trust in the human social context is based on constantly evolving and adapting information. Two parties may trust each other based on a history of mutual interaction, based on formal contracts that in turn rely on other established systems (e.g. legal and legislative), and based on risk analysis of a failure of any party to perform as agreed. A trust language for the Semantic Web that is capable of representing these complex and evolving relationships will be crucial to our future ability to build software that behaves more in the manner of an intelligent assistant than a rote rules processor.

Data dissemination and usage rights description languages are especially important if intellectual property of any value is to be placed in the Web. Rights description must be aligned with legislative decision-making yet must reflect differences between autonomous communities. Similarly, data dissemination and reuse agreements must be able to express the detailed nuances of acceptable practice that social communities adopt and evolve.

Data access rules should be able to express trust based on peer relationships without having to enumerate peers. The objective is a Web in which anyone can publish information and assert distribution policies over each object at various levels of granularity. The distribution policies can be permitted to change over time, making more of the data available to others if desired. The Web should recognize and allow the expression of privacy concerns and ad-hoc collaboration arrangements. The goal is that more content is recorded once people have confidence that the system will protect their property rights.

While the early policy-capable tools that exist on the Web today were designed primarily for the HTTP/HTML interactions between individual users and Web servers (often in a commercial context), policy awareness on the Semantic Web raises a new set of challenges. These challenges include identifying the privacy interests of those who contribute intellectual content (as opposed to just browsing it), developing a model to represent access control needs of heterogeneous communities of research collaborators, and modeling the special needs of the digital repositories being deployed by academic libraries.

Information Flow, Synchronization, and Collaborative Life

Many tools used with collaborating groups today instrument the flow of data, information, and knowledge. For instance; data in a typical software development environment has dependencies recorded in Makefiles, while an annotated history of the development of code, data, and documentation files is kept in source code control systems. Collaboration involves a series of meetings which have related documents, agendas and minutes. Not only is scientific data important, but the record ideally includes the steps, social and automatic, by which the associated information evolved. These steps may include for example sign-off, delegation, consensus decision and peer review. Semantic Web tools will permit more such information to be captured and will simplify the combination of information for different sources. The Semantic Web will leverage such information, enabling the extraction of existing relationships and the discovery of new relationships which, when available in common Semantic Web form, will allow more powerful analysis of the past, and will support the development of more powerful tools in the future.

One of the challenges we will meet is to strike a balance between requiring authors to do more at the outset to make information machine processable, insisting that everything the machine could use to answer a question be recognized and identified by the (human) questioner, and leaving large quantities of information inaccessible to the machine.

Conclusion

A global ubiquitous information infrastructure is the goal of the Semantic Web. The Semantic Web is the natural evolution of Web technology. It brings richer, more descriptive data to the current Web, which can be used more effectively by software and applications - making people more productive. The successful realization of such an infrastructure will be supported by a science of information management that will yield new generations of knowledge environments. Digital Libraries are an important component for enabling such an infrastructure. The identification of a Digital Library Research Agenda whose goal is to facilitate enabling this infrastructure is important step. I look forward to working with the workshop members in helping establish such an agenda.

References

Atkins, Daniel E., et. al., "Revolutionizing Science and Engineering Through Cyberinfrastructure: Report of the National Science Foundation Blue Ribbon Advisory Panel on Cyberinfrastructure," January 2003, Available at http://www.communitytechnology.org/nsf_ci_report/.

[HTTP] "RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1", J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, June 1999. Available at http://www.ietf.org/rfc/rfc2616.txt.

[URI] "RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax", T. Berners-Lee, R. Fielding, L. Masinter, August 1998. Available at http://www.ietf.org/rfc/rfc2396.txt.

[XML] "Extensible Markup Language (XML) 1.0 (Second Edition)", T. Bray, J. Paoli, C.M. Sperberg-McQueen, E. Maler, 6 October 2000. This W3C Recommendation is available at http://www.w3.org/TR/2000/REC-xml-20001006.

[ACKERMAN] Ackerman, M., Darrell, T., & Weitzner, D. J. (2001). Privacy in context. Human-Computer Interaction, 16, pp. 167-176. (privacy design considerations in pervasive computing environments)

[WEITZNER] Weitzner, D (2000) United States Senate Commerce Committee Hearing on Online Privacy (on P3P)

[BERMAN] Berman, J. & Weitzner, D., Abundance and User Control: Renewing the Democratic Heart of the First Amendment in the Age of Interactive Media, 104 Yale L.J. 1619 (1995) (introducing the idea of giving users control over policy-related aspects of the user experience)