Updating the ICE Annotation System: Tagging, Parsing and Validation

Authors: Deanna Wong, Steve Cassidy and Pam Peters

To appear in Corpora, expected publication in 2012. Manuscript available on request.

The textual markup scheme of the International Corpus of English (ICE) project evolved continuously from 1989 on, more or less independently of the Text Encoding Initiative (TEI). It was intended to standardise the annotation of all the regional ICE corpora, in order to facilitate comparison of their linguistic content. However, this goal has proved elusive because of gradual changes in the ICE annotation system and additions made to it by those working on individual ICE corpora. Further, since the project pre-dates the development of XML-based markup standards, the format of the ICE markup does not match that of many modern corpora and can be difficult to manipulate. As a goal of the original project was interoperability of the various ICE corpora, it is important that the markup of existing and new ICE corpora can be converted into a common format that serves their ongoing needs while allowing older markup to be fully included. This paper describes the most significant variations in annotation and focuses on several points of difficulty inherent in the system, especially the non-hierarchical treatment of the visual and structural elements of written texts and of overlapping speech in spontaneous conversation. We report on our development of a parser to validate the existing ICE markup scheme and convert it to other formats. This tool not only brings the Australian version into line with the current ICE standard but also allows proper validation of all annotation in any of the regional corpora. Once the corpora have been validated, they can be converted easily to a standardised XML format for alternative systems of corpus annotation, such as that developed by the TEI.

An RDF Realisation of LAF in the DADA Annotation Server

The Linguistic Annotation Framework (LAF) defines a generalised graph-based model for annotation data, intended as an interchange format for transferring annotations between tools. The DADA system uses an RDF-based representation of annotation data and provides a web-based annotation store. The annotation model in DADA can be seen as an RDF realisation of the LAF model. This paper describes the relationship between the two models and makes some comments on how the standard might be stated in a more format-neutral way.
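To make the idea of a graph-based annotation model expressed as triples more concrete, here is a minimal sketch in plain Python. It is an illustration only: the `ex:` property names, the `Triple` type and the helper functions are hypothetical and are not the vocabulary or API used by DADA or defined by LAF. Each annotation becomes a node described by (subject, predicate, object) triples, and hierarchy is expressed as further triples linking nodes.

```python
from typing import NamedTuple, Set, List


class Triple(NamedTuple):
    """One RDF-style statement. All names here are illustrative only."""
    subject: str
    predicate: str
    obj: str


def annotation_to_triples(ann_id: str, start: int, end: int,
                          ann_type: str, label: str) -> Set[Triple]:
    """Describe one annotation over a character span as a set of triples."""
    return {
        Triple(ann_id, "ex:start", str(start)),
        Triple(ann_id, "ex:end", str(end)),
        Triple(ann_id, "ex:type", ann_type),
        Triple(ann_id, "ex:label", label),
    }


def objects(store: Set[Triple], subject: str, predicate: str) -> List[str]:
    """Return the sorted objects of every triple matching subject and predicate."""
    return sorted(t.obj for t in store
                  if t.subject == subject and t.predicate == predicate)


# Two word annotations over the text "hello world".
store: Set[Triple] = set()
store |= annotation_to_triples("ex:a1", 0, 5, "word", "hello")
store |= annotation_to_triples("ex:a2", 6, 11, "word", "world")

# A phrase node that groups the two words; the graph edges are just triples.
store.add(Triple("ex:a3", "ex:type", "phrase"))
store.add(Triple("ex:a3", "ex:child", "ex:a1"))
store.add(Triple("ex:a3", "ex:child", "ex:a2"))

children = objects(store, "ex:a3", "ex:child")
```

Because nodes, spans and parent-child links are all stored uniformly as triples, the same store can hold annotations from tools with quite different native formats, which is the property an interchange model is after.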

Download PDF: An RDF Realisation of LAF in the DADA Annotation Server

Ingesting the Auslan Corpus into the DADA Annotation Store

Steve Cassidy and Trevor Johnston.

The DADA system is being developed to support collaborative access to and annotation of language resources over the web. DADA provides a web-accessible annotation store that delivers both a human-browsable version of a corpus and a machine-accessible API for reading and writing annotations. DADA implements an abstract model of annotation suitable for storing many kinds of data from a wide range of language resources. This paper describes the process of ingesting data from a corpus of Australian Sign Language (Auslan) into the DADA system. We describe the format of the RDF data used by DADA and the issues raised in converting the ELAN annotations from the corpus. Once ingested, the data is presented in a simple web interface and also via a JavaScript client that makes use of an alternative interface to the DADA server.

Download PDF: Ingesting the Auslan Corpus into the DADA Annotation Store

A RESTful interface to Annotations on the Web

Annotation data is stored and manipulated in various formats and there have been a number of efforts to build generalised models of annotation to support sharing of data between tools. This work has shown that it is possible to store annotations from many different tools in a single canonical format and allow transformation into other formats as needed. However, moving data between formats is often a matter of importing or exporting from one tool to another. This paper describes a web-based interface to annotation data that makes use of an abstract model of annotation in its internal store but is able to deliver a variety of annotation formats to clients over the web.

Presented at the 2nd Linguistic Annotation Workshop (LAW II) at LREC 2008, Marrakech.
Download PDF

An Evaluation of Portfolio Assessment in an Undergraduate Web Technology Unit

One of the perennial issues raised in student surveys is effective feedback. As part of our ongoing review of teaching, we identified feedback on assessment as a target area for 2007; this paper describes the evaluation of one strategy for improving this feedback, implemented as part of an undergraduate unit.

Paper to be presented at the National UniServe Conference 2007, Sydney, Australia. Download PDF.

Version Control for RDF Triple Stores

RDF, the core data format for the Semantic Web, is increasingly being deployed both from automated sources and via human authoring, either directly or through tools that generate RDF output. As individuals build up large amounts of RDF data and as groups begin to collaborate on authoring knowledge stores in RDF, the need for some kind of version management becomes apparent. While there are many version control systems available for program source code and even for XML data, version control for RDF data is not a widely explored area. This paper examines Darcs, an existing version control system for program source code that is grounded in a semi-formal theory of patches, and proposes an adaptation to directly manage versions of an RDF triple store.
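As a rough illustration of what a patch on a triple store could look like, the sketch below treats a store as a set of triples and a patch as a pair of sets: triples to add and triples to remove. This is an assumption made for illustration, not the design proposed in the paper or the way Darcs represents patches; the `Patch` class, its methods and the `ex:` names are all hypothetical. It shows one property that patch-based systems rely on, namely that every patch has an inverse that undoes it.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# A triple is (subject, predicate, object); names below are illustrative only.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Patch:
    """A hypothetical patch: triples to add and triples to remove."""
    added: FrozenSet[Triple]
    removed: FrozenSet[Triple]

    def apply(self, store: FrozenSet[Triple]) -> FrozenSet[Triple]:
        """Return a new store with the patch applied, checking preconditions."""
        if not self.removed <= store:
            raise ValueError("patch removes triples that are not in the store")
        if self.added & store:
            raise ValueError("patch adds triples that are already in the store")
        return (store - self.removed) | self.added

    def invert(self) -> "Patch":
        """Return the patch that undoes this one."""
        return Patch(added=self.removed, removed=self.added)


draft = ("ex:doc", "ex:title", "Draft")
final = ("ex:doc", "ex:title", "Final")

store_v1 = frozenset({draft})
retitle = Patch(added=frozenset({final}), removed=frozenset({draft}))

store_v2 = retitle.apply(store_v1)            # the title is now "Final"
restored = retitle.invert().apply(store_v2)   # back to the original store
```

The precondition checks in `apply` are a design choice in this sketch: they make a patch fail loudly when it is applied to a store it was not recorded against, rather than silently producing a different result.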

Paper presented at ICSOFT 2007, Barcelona, Spain, July 2007. Download PDF