A Post by Michael B. Spring


Scholarship in a Digital World (December 1, 2007)

I just went back to read the report of a workshop on "Building the Infrastructure for CyberScholarship". The workshop was funded by NSF and sets out a roadmap for research for the next decade or so. The work is solid, but it left me feeling the way I sometimes do with my students: good answer, but the wrong question was asked. Let me be a little clearer about what I mean. The findings of the workshop make a lot of sense, but in some ways they are too driven by a shared blurred vision. OK, still not clear. A number of the workshop participants are notable researchers doing great work in particular areas -- and they have been for a decade or so. I get the sense that as the workshop went on, some of the participants were trying to understand the visions of others so as to prepare a plan for what needs to be done next. The problem is that they were talking about different aspects of a big problem and trying to develop solutions that solved all of the problems at once. This is a situation in which I say to my students, "don't just do something, stand there", which is my second favorite piece of advice. You guessed it, the first is "don't just stand there, do something". The secret is knowing which to do first.

OK, let me try to say a little bit about what I am thinking. First of all, we should be talking about scholarship, not cyberscholarship. I would hold that while some aspects of computational scholarship change in a digital environment, this is far from the top of the list of what people are talking about here. In talking about scholarship, what are the new opportunities provided? My guess is that there are about a dozen, and that by segmenting the problem into its component pieces, we have a better chance of building solutions that make sense. Without an effort to be comprehensive, here is my starting list, beginning with the low-hanging fruit:

  1. Large Data Sets
    1. Large Symbolic Data Sets: We are entering an era when scientists have enormous data sets from multiple sources that they want to work with. I would place in this category things like the Human Genome Project. Also in this category would be many of the GIS data sets. These represent data sets of unparalleled size that are manipulable by computer processing.
    2. Aggregated Large Data Sets: I am thinking here of sets that are not necessarily collected as a large data set but that might be usefully processed as aggregate data sets. While the end-use demands of these sets are similar to those of the first category, they pose a different kind of problem on the front end: deciding on common semantics for the data or providing for translations in aggregation.
    3. Large Raster Data Sets: These may provide the largest sets of data, whether they be as esoteric as Hubble Space Telescope images or as mundane as MRIs of knees. There are significant problems related to image processing and normalization of image data that need to be addressed here.
  2. Digital Collections of Results
    1. Live Reports: Readers of research frequently wonder how a given conclusion in an article might change if the data were manipulated differently. It is now possible to have live data associated with a final report such that "what if" questions can be asked.
    2. Undiscovered Public Information: Much of the scholarly literature is located in silos, such that a search in one space fails to find information stored in another space. It should be possible to create massive cross-domain indexes that reduce the potential for public information to go undiscovered.
    3. Collaborative Documents: It is possible in digital environments to have documents serve as the center of collaboration. As new document forms based on XML emerge, it will be possible for researchers to truly collaborate on some activity through the documents that support it.
  3. Aggregate Information
    1. Social Tagging: There are a number of recent developments that allow the astute observer to gather information about artifacts based on distributed and uncontrolled human observations. For example, Flickr allows us to harvest human tags associated with images without any central control point.
    2. Social Rating: From Delicious to LinkedIn, users are expressing their opinions about and their ratings of various resources. These can tell us how people rate materials, both directly and indirectly. This information can be mined and used in new ways.
    3. Social Systems: Social systems themselves provide an opportunity for study of new kinds of relationships and new forms of social behavior.
  4. Your Thing -- what have I left out??
    1. ???:
    2. ???:
    3. ???: