June 15 - 17, 2003   
Wequassett Inn, Cape Cod   
Chatham, Massachusetts   
NSF/JISC Workshop
 
Culture and Cyberinfrastructure: the need for a cultural informatics
 
   
   

Gregory Crane, Tufts University, Gregory.Crane@Tufts.edu

What are we trying to do? – the Challenge of Cultural Communication

I would like to focus on the problem of cultural communication: understanding cultures, our own and those of others very different from us, and communicating our understanding both of ourselves and of others to the rest of the world.

The importance of such cultural communication cannot be overstated. The war on terror is a war about culture — while material causes such as oil or land may contribute to the problems that we face, the acts of violence that we confront are only the most acute (and overtly destructive) symptoms of a broader conflict between cultures in a changing world. Such cultural conflicts extend well beyond the West and Islam (e.g., ethnic terrorism in Rwanda and Congo) and have been, in one form or another, endemic to human history, but the threats to U. S. security have clearly escalated. And if the war on terror has influenced our view of cultures, we must remember that culture stands at the core of human existence. Preserving and disseminating cultural heritage is a core responsibility for any government.

The challenges posed by terrorism have risen high on the U. S. agenda. Programs such as DARPA’s Translingual Information Detection, Extraction and Summarization (TIDES) have acquired a new urgency, as intelligence agencies attempt to extract useful information from vast bodies of data. At the same time, the administration’s proposed “We the People” initiative (http://www.wethepeople.gov/), a $100 million, three-year NEH effort, addresses the problem of articulating American history and culture. On its homepage Bruce Cole, NEH Chairman, explicitly cites the attacks on September 11 as a motivating cause for this effort. Many U. S. government institutions, including the Library of Congress, the Institute for Museum and Library Services, the National Endowment for the Arts, the Smithsonian Institution, the National Park Service and others, constantly labor to improve and disseminate knowledge about American history and culture.

Nevertheless, the rise of information technology has not only opened up new ways of analyzing cultural heritage materials but has also created the need for a new area of study. Possible labels for this new field include “computing and the humanities” and “computational humanities,” but I would suggest “cultural informatics” or some other term that emphasizes culture in the broadest possible sense, including not only the humanities and cultural heritage materials but all forms of current culture as well.

Consider two broad applications:

Cultural awareness: How do we track changes in cultural values and attitudes not only in the past but in real time? Such a task is immense, for it assumes systems that can manage hundreds, if not thousands, of languages and cultural groups. We need to stress the importance of culture and language alike — information extraction may track telephone numbers or military appointments (e.g., General X is now in command of facility Y) but in this case, multiple languages describe trans-cultural objects: the information is the same and the language serves as a semi-transparent delivery medium.
But the values which motivate humans most strongly (virtue, justice, even love and friendship) are not only fluid and resistant to reductive definition in any one culture, but vary substantially from one culture to another. Most inhabitants of the Western world shook their heads when Osama Bin Laden cited a prophetic dream during a recorded video. Those of us who have worked with ancient Near Eastern or even classical Greek literature, however, realized that Bin Laden was following ancient habits of thought. We need instruments to assess culturally specific attitudes that are foreign to the discourse of international technocratic elites. We need to be able to assess how typical a particular sentiment may be (e.g., how commonly do conservative Saudis use dreams in this way? Is Bin Laden typical? Or is he self-consciously striking an archaic pose?).

Since small numbers of determined individuals can potentially wreak immense havoc, we cannot predict where the next threat will originate. Security concerns thus augment broader interests in culture, demanding systems that can automatically analyze cultural materials in radically new ways.

Consider as one particular application of general semantic analysis a visualization system that tracked intensity of feelings about a particular topic across the globe. The system would take a range of textual materials from many different sources and languages and produce a measure of the feelings on a given topic. The system might use temperature gradients as a visualization metaphor, allowing viewers to track hostility towards the United States as it ebbs and flows across the world. Such a system could provide data for strategic planning, allowing the United States to identify problem areas as they emerge and before they generate violent attacks.

Such semantic analysis may be feasible for a constrained set of genres and languages, but a truly scalable system would not only have to manage a dizzying variety of languages but would also have to assess the relative rhetorical intensity of a given source: the same terminology can have radically different significance in a restrained bureaucratic document and in a hyperbolic partisan news story.
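The point about rhetorical intensity can be made concrete with a small sketch. It assumes that some upstream, language-specific scorer (not shown, and not described in this paper) has already assigned each document a raw hostility score; the source names, regions and the 0-10 scale below are invented for illustration. Each document is rescaled against the habitual register of its own source before scores are averaged by region, so that a restrained bulletin that departs from its norm counts for more than a hyperbolic daily behaving as usual.

```python
from collections import defaultdict
from statistics import mean, pstdev


def source_baselines(history):
    """Map each source to the (mean, std dev) of its past raw scores."""
    return {src: (mean(scores), pstdev(scores)) for src, scores in history.items()}


def relative_intensity(raw_score, baseline):
    """Express a raw score in standard deviations from the source's norm."""
    mu, sigma = baseline
    return 0.0 if sigma == 0 else (raw_score - mu) / sigma


def regional_temperature(documents, history):
    """Average the source-relative intensity of documents within each region."""
    baselines = source_baselines(history)
    by_region = defaultdict(list)
    for doc in documents:
        z = relative_intensity(doc["raw_score"], baselines[doc["source"]])
        by_region[doc["region"]].append(z)
    return {region: mean(values) for region, values in by_region.items()}


# Invented example data: raw scores on an arbitrary 0-10 hostility scale.
history = {
    "ministry_bulletin": [1, 2, 3],  # habitually restrained
    "partisan_daily": [7, 8, 9],     # habitually hyperbolic
}
documents = [
    {"source": "ministry_bulletin", "region": "A", "raw_score": 4},
    {"source": "partisan_daily", "region": "B", "raw_score": 8},
]
temperatures = regional_temperature(documents, history)
```

In this toy data, region A comes out "hotter" than region B even though its raw score is lower, because the bulletin is well above its usual register while the daily is exactly at its norm. A per-source baseline is only one possible normalization; genre, language and audience would plausibly need their own adjustments.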

Spatial customization: Computing may become ubiquitous and the capitalism that shapes all of our lives pushes us to transcend our local surroundings, but many, if not most, human beings treasure a strong sense of place. The perceived rootlessness of capitalist and especially American life is arguably one of the most powerful concepts behind opposition to globalization in general and the United States in particular.

Connecting history to place has, historically, been a difficult task. Those who live in long established communities may have only the dimmest understanding about the generations who lived, struggled, suffered, laughed, walked and died in the areas through which they now walk each day. We can, however, imagine systems that would deliver customized information to individuals as they move through a particular space. A system may alert the visitor that an object of interest (a Greek revival house built in 1847, a statue commemorating a hero of the anti-slavery movement, the home of a jazz musician) is only a few steps away, provide a survey of the names and occupations of those who lived in a particular building, historic images of their current location or dynamic VRML representations of the location in a particular period.
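A minimal sketch of the "only a few steps away" alert follows. It assumes a hypothetical catalogue of points of interest with known latitude and longitude and uses the standard haversine great-circle formula to find those within a chosen radius of the visitor. The coordinates, the 50 m radius and the function names are illustrative assumptions, not part of any existing system.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(points, lat, lon, radius_m=50.0):
    """Return the points within radius_m of the visitor, nearest first."""
    measured = [(haversine_m(lat, lon, p["lat"], p["lon"]), p) for p in points]
    measured.sort(key=lambda pair: pair[0])
    return [p for distance, p in measured if distance <= radius_m]


# Invented catalogue entries with made-up coordinates.
points = [
    {"name": "Greek revival house (1847)", "lat": 42.0000, "lon": -71.0000},
    {"name": "Anti-slavery memorial statue", "lat": 42.0100, "lon": -71.0000},
]
alerts = nearby(points, 42.0002, -71.0000, radius_m=50.0)
```

Here only the house, roughly 22 m from the visitor, would trigger an alert. A linear scan is adequate for a small catalogue; a city-scale deployment would presumably need a spatial index, and the harder problem the paper raises, assembling the historical records attached to each place, is untouched by this sketch.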

Such dynamic spatially customized data clearly has broad military and commercial applications, but military and commercial needs, however acute or potentially lucrative, are subsets of cultural analysis. Note that, as with purely textual information extraction, some crucial questions may be relatively easy to identify (e.g., location of possible ambush points, identification of customers for a particular product) but long term success may depend upon access to much more diffuse information about local practice and knowledge.

How do we accomplish cultural communication today?

At present, thousands of researchers in a variety of disciplines study cultures past and present. The vast majority of these researchers distribute the results of their work in diffuse papers (such as this position paper) and monographs. While almost all make use of electronic text processing and search various resources, most content themselves with basic searches of fragmented data sources.

What is new and why do we think it will be successful?

Such venues as the Text Retrieval Conference (TREC), the Document Understanding Conference (DUC), Automatic Content Extraction (ACE), the Cross-Language Evaluation Forum (CLEF), etc. are now documenting the capabilities of a range of emerging language technologies. If we could simply apply to conventional study the state of the art in technologies such as automatic multi- and single-document summarization, question answering, clustering, cross-language information retrieval, named entity and relation detection, sense disambiguation, etc., we could revolutionize the ways in which we analyze culture.

Emerging language technologies have, however, been developed largely by engineers to work with modern genres (e.g., news publications produced and edited by professional journalists). We need to adapt existing techniques and develop new approaches to work with the diffuse genres from the cultures and languages of the world.

Assuming we are successful, what difference does it make?

Increased cross-cultural understanding/exchange can increase communication, reduce the impulse to violence and expand socio-economic connections. Thus, potential benefits include enhanced global peace and prosperity.

How long will it take and how will we know when we get there?

Cultural cyberinfrastructure requires two fundamental components.

First, we need knowledge resources. Some of these exist in digital form and can be federated into a useful resource network. Some digital sources will prove sufficiently inconsistent in form and/or quality that they need to be recreated. Most historical sources remain available only in print form. Many of the most useful potential sources are reference works that are expensive to enter (e.g., city directories) and have rich, but complex formats (e.g., dictionaries, grammars, encyclopedias). Automation can potentially reduce the costs of large-scale scanning to very low levels, but some tasks may require carefully entered and marked-up knowledge sources (c. $500-$1,000 per megabyte). The DARPA TIDES program has studied the problem of boot-strapping new languages. Our initial estimate, based on our own experiences with a handful of languages, leads us to suggest that $100 million would probably allow us to provide consistent initial coverage of 100 major culture/language clusters (thus allowing us to consider Indian/English, UK/English, US/English). Actual costs would depend upon factors such as the complexity of the language, the availability of core resources (e.g., bi-lingual corpora, lexica, etc.) in print and/or electronic form, etc. The time required will depend upon the range of services developed: cross-language information retrieval and very rough machine translation may require relatively little time, but each language/culture cluster will pose its own problems. Nevertheless, three years should be enough for a competent team to provide basic language/culture services. Assuming two years of preparation, three cycles of grant competitions for language/culture teams, and the usual extensions/delays, we should be able to see major results in five years and a solid 100 language/culture infrastructure in 10.
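The arithmetic behind these figures can be laid out explicitly. The sketch below takes the $100 million total, the 100 clusters and the $500-$1,000 per megabyte entry cost from the paragraph above; the share of each cluster's budget spent on careful data entry is a purely hypothetical parameter, since the paper does not break the budget down.

```python
TOTAL_BUDGET_USD = 100_000_000   # from the estimate above
CLUSTERS = 100                   # major culture/language clusters
COST_PER_MB_LOW_USD = 500        # careful entry and markup, low estimate
COST_PER_MB_HIGH_USD = 1_000     # careful entry and markup, high estimate

# An even split is itself an assumption; the text notes that actual costs
# would vary with the language and the resources already available.
per_cluster_usd = TOTAL_BUDGET_USD / CLUSTERS


def affordable_megabytes(budget_usd, share_for_entry):
    """Return (min_mb, max_mb) of carefully entered text a budget could buy.

    share_for_entry is a hypothetical fraction of the budget devoted to
    data entry and markup rather than to software, staff or scanning.
    """
    entry_budget = budget_usd * share_for_entry
    return entry_budget / COST_PER_MB_HIGH_USD, entry_budget / COST_PER_MB_LOW_USD


min_mb, max_mb = affordable_megabytes(per_cluster_usd, share_for_entry=0.5)
```

Under these assumptions each cluster receives $1 million, and if half of that went to careful entry it would buy between 500 and 1,000 megabytes of marked-up text. The point is only to show the order of magnitude implied by the stated figures, not to propose an allocation.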

Second, we need researchers who can assess the latest findings of various cultural disciplines and of emerging information technologies. We do not yet have the institutional framework to train researchers who combine professional training in the analysis of culture with a grasp of the possibilities of emerging technologies, language and otherwise. Cultural disciplines have not developed a strong cultural informatics, and these disciplines may never, as currently constituted, have the resources to support such a sub-discipline.

In an ideal scenario, departments in computer science, information science and various cultural domain disciplines would all train students and support faculty concentrating on various elements of cultural informatics. NSF support for cultural informatics should be aimed at helping this discipline develop the scientific and engineering rigor that it needs. While it will take five to ten years to develop a solid core of new researchers, the results of DLI-2 support suggest that we should begin to see research results within one to two years.