Development of a Parsing System

Since a large proportion of potential balloters in the test had no direct access to CASCADE, a mail parser was developed to handle comments submitted by e-mail. The mail parser performed the administrative function of associating each comment with the location in the source document to which it referred. The invitation to ballot included instructions asking balloters to use a format that an automatic mail parser could process. Each mailnote was to consist of one or more comments, and each comment was to begin with a line identifying the location to which it related. General comments were also allowed and were accumulated in a separate file. Any text other than the header that appeared in a note before the first comment was likewise accumulated in a separate file.
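The ballot format described above can be illustrated with a minimal parsing sketch. The actual marker syntax from the ballot instructions is not given here, so this sketch assumes a hypothetical `Location: ...` line that starts each comment, with `Location: general` marking a general comment; the names `parse_mailnote` and `ParsedNote` are likewise hypothetical. Text between the header and the first marker is kept as a separate preamble, mirroring the separate file mentioned above.

```python
import re
from dataclasses import dataclass, field

# Hypothetical marker line; the real ballot instructions defined their own format.
LOCATION_LINE = re.compile(r"^Location:\s*(.+?)\s*$", re.IGNORECASE)

@dataclass
class ParsedNote:
    header: str
    preamble: str = ""                            # text before the first comment
    comments: list = field(default_factory=list)  # (location, text) pairs
    general: list = field(default_factory=list)   # general comments

def parse_mailnote(note: str) -> ParsedNote:
    # RFC 822 style: the header ends at the first blank line.
    header, _, body = note.partition("\n\n")
    parsed = ParsedNote(header=header)
    current_loc, current_lines, preamble_lines = None, [], []

    def flush():
        # Store the comment collected so far, if any.
        if current_loc is None:
            return
        text = "\n".join(current_lines).strip()
        if current_loc.lower() == "general":
            parsed.general.append(text)
        else:
            parsed.comments.append((current_loc, text))

    for line in body.splitlines():
        m = LOCATION_LINE.match(line)
        if m:
            flush()
            current_loc, current_lines = m.group(1), []
        elif current_loc is None:
            preamble_lines.append(line)   # before the first comment
        else:
            current_lines.append(line)
    flush()
    parsed.preamble = "\n".join(preamble_lines).strip()
    return parsed
```

For a note whose body reads "Thanks for the draft." followed by a `Location: page 3 line 10` comment and a `Location: general` comment, the sketch yields one preamble line, one located comment, and one general comment.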

The mail parser processed comments in five stages:

the mailbox file was read and separated into mailnotes
mailnotes were parsed into comments
comments were analyzed to identify the proper location for insertion
comments were classified by type
comment links were placed in the source files

A mailbox file consists of mailnotes, each of which is composed of a header and one or more comments and/or objections. The mailbox file was broken into its component mailnotes, and each was named and stored in a separate file. Each mailnote was in turn broken into its component comments and objections, each of which was given a unique file name. The name of each comment file identified the sender's username, the arrival time of the note, and the location in the document to which the comment referred.
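The first stage and the file-naming rule can be sketched as follows, assuming the mailbox is in the traditional mbox layout, where each message starts with a line beginning `From `. The exact file-name layout (`<username>.<arrival time>.<location>`) and the helper names `split_mailbox` and `comment_filename` are hypothetical; the text above only states which three pieces of information the name carried.

```python
import re
from email.utils import parseaddr

def split_mailbox(text: str) -> list[str]:
    """Split an mbox-style mailbox into mailnotes on 'From ' separator lines."""
    notes, current = [], []
    for line in text.splitlines():
        # A new envelope line ("From " with a space, not "From:") starts a note.
        if line.startswith("From ") and current:
            notes.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        notes.append("\n".join(current))
    return notes

def comment_filename(from_header: str, arrival: str, location: str) -> str:
    """Hypothetical naming scheme: <username>.<arrival time>.<location>."""
    _, addr = parseaddr(from_header)
    username = addr.split("@", 1)[0] or "unknown"
    stamp = re.sub(r"[^0-9]", "", arrival)                        # digits only
    loc = re.sub(r"[^A-Za-z0-9]+", "-", location).strip("-")      # file-safe
    return f"{username}.{stamp}.{loc}"
```

For example, `comment_filename("Alice <alice@example.com>", "1996-04-23 13:23", "page 3 line 10")` gives `alice.199604231323.page-3-line-10`.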

For the placement stage, section numbers were associated with the appropriate Scribe source file; each file contained one chapter or appendix of the standard. Comments were inserted into the Scribe source files (as Scribe comments) at the section corresponding to the given location, so that the chapter editors could see them while editing the document. The comment anchors were marked as Scribe comments so that updated ASCII versions could be produced without losing the inserted comments.
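A minimal sketch of the insertion step follows. It assumes Scribe's `@Begin(Comment)`/`@End(Comment)` form for the inserted block and `@@` as the escape for a literal `@`; the helpers `insert_scribe_comment` and `insert_all` are hypothetical. When several comments go into one file, the sketch applies them from the highest offset downward, a design choice that keeps offsets computed against the unmodified file valid.

```python
def insert_scribe_comment(source: str, offset: int, author: str, text: str) -> str:
    """Splice a ballot comment into Scribe source as a Scribe comment (sketch)."""
    if not 0 <= offset <= len(source):
        raise ValueError("offset outside source file")

    def esc(s: str) -> str:
        return s.replace("@", "@@")  # assumed escape for a literal '@'

    block = (
        "@Begin(Comment)\n"
        f"Ballot comment from {esc(author)}:\n"
        f"{esc(text)}\n"
        "@End(Comment)\n"
    )
    return source[:offset] + block + source[offset:]

def insert_all(source: str, placements) -> str:
    """placements: iterable of (offset, author, text) against the original file."""
    # Insert from the end backwards so earlier offsets are not shifted.
    for offset, author, text in sorted(placements, key=lambda p: p[0], reverse=True):
        source = insert_scribe_comment(source, offset, author, text)
    return source
```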

The mail parser relates the ASCII file and the Scribe files by:

mapping page number and line number to section, using a table (the page-section index) that records the page and line location of each chapter, section, subsection, etc. of the document. The original ASCII file sent to the balloters, i.e. the one their page and line numbers refer to, has to be kept for this purpose;
mapping chapter or appendix numbers to files via a chapter-file index that maps chapter numbers to Scribe file names;
mapping section, subsection, etc. to specific locations via an index of offsets into each Scribe file that relates section locations to "safe" character offsets in the file.
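The three mappings above can be chained as in the following sketch. The index contents, the `.mss` file names, and the offsets are invented sample data, and `section_for` and `resolve` are hypothetical names. The page-section index is kept sorted by (page, line), so a binary search finds the last section that starts at or before a given position. The chapter is taken as the first component of the section number, which is an assumption about the numbering scheme.

```python
from bisect import bisect_right

# Hypothetical sample indexes; the real ones were built from the ASCII
# ballot copy and the Scribe sources.
PAGE_SECTION_INDEX = [  # (page, line) where each section starts, sorted
    ((1, 1), "1"),
    ((2, 15), "1.1"),
    ((4, 1), "2"),
    ((5, 30), "2.1"),
]
CHAPTER_FILE_INDEX = {"1": "intro.mss", "2": "syntax.mss"}
OFFSET_INDEX = {  # (file, section) -> safe character offset
    ("intro.mss", "1"): 0, ("intro.mss", "1.1"): 812,
    ("syntax.mss", "2"): 0, ("syntax.mss", "2.1"): 1450,
}

def section_for(page: int, line: int):
    """Last section starting at or before (page, line), or None."""
    keys = [pos for pos, _ in PAGE_SECTION_INDEX]
    i = bisect_right(keys, (page, line)) - 1
    return PAGE_SECTION_INDEX[i][1] if i >= 0 else None

def resolve(page: int, line: int):
    """Map an ASCII (page, line) to (Scribe file, section, offset), or None."""
    section = section_for(page, line)
    if section is None:
        return None
    chapter = section.split(".", 1)[0]  # assumed: chapter = first component
    scribe_file = CHAPTER_FILE_INDEX.get(chapter)
    if scribe_file is None:
        return None
    offset = OFFSET_INDEX.get((scribe_file, section))
    if offset is None:
        return None
    return scribe_file, section, offset
```

With this sample data, `resolve(3, 5)` falls inside section 1.1 and returns `("intro.mss", "1.1", 812)`.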

There are two cases in which parts of notes cannot be linked to their corresponding locations: either a location identifier does not comply with the format, or the given location does not exist in the document. In both cases the comment is placed in a "location not found" file.
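The two failure cases can be separated as in this sketch. It assumes a hypothetical identifier format `page <n> line <m>` and a hypothetical `lines_per_page` list giving the line count of each page of the ASCII ballot copy. A malformed identifier (case 1) and an out-of-range position (case 2) both send the comment to the not-found list.

```python
import re

# Hypothetical identifier format: "page <n> line <m>".
LOCATION_RE = re.compile(r"^\s*page\s+(\d+)\s+line\s+(\d+)\s*$", re.IGNORECASE)

def parse_location(identifier: str, lines_per_page: list[int]):
    """Return (page, line) if well formed and present in the document, else None."""
    m = LOCATION_RE.match(identifier)
    if m is None:
        return None  # case 1: does not comply with the format
    page, line = int(m.group(1)), int(m.group(2))
    if not (1 <= page <= len(lines_per_page) and 1 <= line <= lines_per_page[page - 1]):
        return None  # case 2: location does not exist in the document
    return page, line

def file_comments(comments, lines_per_page):
    """Split (identifier, text) pairs into placeable ones and 'location not found'."""
    placed, not_found = [], []
    for identifier, text in comments:
        loc = parse_location(identifier, lines_per_page)
        if loc is None:
            not_found.append((identifier, text))
        else:
            placed.append((loc, text))
    return placed, not_found
```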

The anchors to the comments are represented as buttons. Each button contains the name of the author of the linked comment and an identification of the corresponding document section (see Figure 3).

Michael Spring
Tue Apr 23 13:23:13 EDT 1996