<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Steve Cassidy</title>
	<atom:link href="http://web.science.mq.edu.au/~cassidy/wordpress/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://web.science.mq.edu.au/~cassidy/wordpress</link>
	<description>The days run away...</description>
	<lastBuildDate>Mon, 07 Nov 2011 23:21:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Updating the ICE Annotation System: Tagging, Parsing and Validation</title>
		<link>http://web.science.mq.edu.au/~cassidy/wordpress/?p=350</link>
		<comments>http://web.science.mq.edu.au/~cassidy/wordpress/?p=350#comments</comments>
		<pubDate>Tue, 01 Mar 2011 00:28:10 +0000</pubDate>
		<dc:creator>Steve Cassidy</dc:creator>
				<category><![CDATA[Language Resources]]></category>
		<category><![CDATA[annotation]]></category>
		<category><![CDATA[publication]]></category>

		<guid isPermaLink="false">http://web.science.mq.edu.au/~cassidy/wordpress/?p=350</guid>
		<description><![CDATA[Authors: Deanna Wong, Steve Cassidy and Pam Peters To appear in Corpora, expected publication in 2012. Manuscript available on request. The textual markup scheme of the International Corpus of English (ICE) corpus project evolved continuously from 1989 on, more or less independent of the Text Encoding Initiative (TEI). It was intended to standardise the annotation [...]]]></description>
			<content:encoded><![CDATA[<p>Authors: Deanna Wong, Steve Cassidy and Pam Peters</p>
<p>To appear in <a href="http://www.euppublishing.com/journal/cor">Corpora</a>, expected publication in 2012. Manuscript available on request.</p>
<p>The textual markup scheme of the International Corpus of English (ICE) corpus project evolved continuously from 1989 on, more or less independent of the Text Encoding Initiative (TEI). It was intended to standardise the annotation of all the regional ICE corpora, in order to facilitate inter-comparisons of their linguistic content. However this goal has proved elusive because of gradual changes in the ICE annotation system, and additions to it made by those working on individual ICE corpora. Further, since the project pre-dates the development of XML-based markup standards, the format of the ICE markup does not match that in many modern corpora and can be difficult to manipulate. As a goal of the original project was interoperability of the various ICE corpora, it is important that the markup of existing and new ICE corpora can be converted into a common format that can serve their ongoing needs, while allowing older markup to be fully included. This paper describes the most significant variations in annotation, and focuses on several points of difficulty inherent in the system: especially the non-hierarchical treatment of the visual and structural elements of written texts, and of overlapping speech in spontaneous conversation. We report on our development of a parser to validate the existing ICE markup scheme and convert it to other formats. The development of this tool not only brings the Australian version into line with the current ICE standard, it also allows for proper validation of all annotation in any of the regional corpora. Once the corpora have been validated, they can be converted easily to a standardised XML format for alternate systems of corpus annotation, such as that developed by the TEI.</p>
]]></content:encoded>
			<wfw:commentRss>http://web.science.mq.edu.au/~cassidy/wordpress/?feed=rss2&#038;p=350</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Notes on Conversion of GrAF to RDF</title>
		<link>http://web.science.mq.edu.au/~cassidy/wordpress/?p=330</link>
		<comments>http://web.science.mq.edu.au/~cassidy/wordpress/?p=330#comments</comments>
		<pubDate>Fri, 18 Feb 2011 07:32:47 +0000</pubDate>
		<dc:creator>Steve Cassidy</dc:creator>
				<category><![CDATA[Language Resources]]></category>
		<category><![CDATA[annotation]]></category>
		<category><![CDATA[dada]]></category>
		<category><![CDATA[rdf]]></category>

		<guid isPermaLink="false">http://web.science.mq.edu.au/~cassidy/wordpress/?p=330</guid>
		<description><![CDATA[The Graph Annotation Format (GrAF) is the XML data exchange format developed for the model of linguistic annotation described in the ISO Linguistic Annotation Framework (LAF). LAF is the abstract model of annotations represented as a graph structure, GrAF is an XML serialisation of the model intended for moving data between different tools. Both were [...]]]></description>
			<content:encoded><![CDATA[<p>The Graph Annotation Format (<a href="http://www.cs.vassar.edu/~ide/papers/LAW.pdf">GrAF</a>) is the XML data exchange format developed for the model of linguistic annotation described in the ISO Linguistic Annotation Framework (LAF).  LAF is the abstract model of annotations represented as a graph structure; GrAF is an XML serialisation of the model intended for moving data between different tools.  Both were developed by Nancy Ide and Keith Suderman at Vassar with input from the community involved in the <a href="http://www.tc37sc4.org/WG1/wg1.htm">ISO standardisation process</a> around linguistic data.<span id="more-330"></span></p>
<p>Like the other candidate universal annotation models (e.g. Annotation Graphs and the model embodied by our own DADA system), LAF is a directed graph model. In this case, the graph connects nodes, each associated with one or more annotations, by edges that represent relations between nodes; by default this is the parent-child relation.  This is almost exactly the same as the DADA model, but the minor differences have been tripping me up for a while as I&#8217;ve tried to understand LAF enough to write conversion filters to ingest data into the DADA system.</p>
<p>A visit to Vassar last week was the ideal opportunity to clear up my understanding.  As a step towards updating our GrAF to DADA ingestion process which is implemented as <a href="https://bitbucket.org/stevecassidy/dada/src/938811302db9/xsl/graf2dada.xsl">an XSL stylesheet</a>, I decided to write a stylesheet to convert GrAF into a fairly literal RDF model.  This allowed me to think about the interpretation of GrAF structures independent of their translation to the current DADA model.  I was concerned mainly with the structural elements of GrAF, rather than the annotation meta-data; this is equally important, but can be dealt with separately.</p>
<p>I armed myself with the latest version of the ISO LAF documentation and a copy of the manually annotated sub-corpus (<a href="http://www.anc.org/MASC/Home.html">MASC</a>) of the American National Corpus. This is a nice sized data set where the automatically generated annotations have been manually checked and corrected.</p>
<p>GrAF is an XML format for standoff markup, meaning that the annotation is stored in a separate file to the source text rather than being embedded in the text as is normal in TEI for example.  A single text has a number of associated XML annotation files, each containing a different kind of annotation. In the MASC corpus these include Penn Treebank, part of speech and named entity annotations.   A single .anc file acts as a master reference and contains pointers to the raw text as well as the other XML files.</p>
<p>GrAF defines five main elements to represent annotation structures: nodes, annotations, edges, links and regions.  The graph structure is made up of nodes and edges while regions define the parts of the source document being annotated.  The link element relates a region to a node and the annotation element defines an annotation structure that can be attached to a node or an edge.</p>
<h2>Identifiers</h2>
<p>One thing that&#8217;s required for an RDF representation is that each entity is denoted by a unique identifier (a URI).  Most but not all of the GrAF elements have identifiers denoted by the xml:id attribute so we can re-use this in the RDF representation prefixed with a suitable base URI.  In choosing a base URI it makes sense to generate one that denotes the collection as a whole, something like http://www.anc.org/MASC/spoken/RindnerBonnie (not a working URI although DADA could make it so).  So the first node in the Penn Treebank annotation for this document which has the xml:id ptb-n00000 would have the RDF identifier http://www.anc.org/MASC/spoken/RindnerBonnie#ptb-n00000.</p>
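<p>As a rough illustration of this identifier scheme, here is a minimal Python sketch. It is not part of the stylesheet; the function name is made up for this example, and using # as the separator simply follows the example URI above.</p>

```python
def rdf_identifier(base_uri: str, xml_id: str) -> str:
    """Join a base URI and a GrAF xml:id into a single RDF identifier."""
    # Drop any trailing '#' so the separator is added exactly once.
    return base_uri.rstrip("#") + "#" + xml_id


base = "http://www.anc.org/MASC/spoken/RindnerBonnie"
print(rdf_identifier(base, "ptb-n00000"))
# prints http://www.anc.org/MASC/spoken/RindnerBonnie#ptb-n00000
```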
<p>An implication of this is that all identifiers need to be unique within the collection of XML files.  The GrAF specification doesn&#8217;t mandate this and the use of xml:id attributes will only ensure that the identifier is unique within the XML file.  As it happens, many of the identifiers in the MASC corpus are made unique by being prefixed by the annotation set name (ptb in the example). Some are not unique however and so to generate useful RDF we need to either generate our own unique identifiers or fix the original data.</p>
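<p>One way to find the clashes before conversion is to collect the xml:id values from every annotation file for a document and report any that turn up in more than one file. The sketch below assumes the ids have already been extracted into a dictionary keyed by file name; the file names and ids are invented for the example.</p>

```python
from collections import defaultdict


def find_duplicate_ids(ids_by_file):
    """Return {xml_id: sorted file names} for ids used in more than one file."""
    files_for_id = defaultdict(set)
    for file_name, ids in ids_by_file.items():
        for xml_id in ids:
            files_for_id[xml_id].add(file_name)
    # Keep only the ids that are shared between files.
    return {
        xml_id: sorted(files)
        for xml_id, files in files_for_id.items()
        if len(files) > 1
    }


# Hypothetical input: ids prefixed with the annotation set name stay unique,
# the bare id "n1" does not.
example = {
    "doc-ptb.xml": ["ptb-n00000", "n1"],
    "doc-vc.xml": ["vc-n0", "n1"],
}
print(find_duplicate_ids(example))
```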
<p>One entity that doesn&#8217;t have an identifier is the annotation element.  Annotations are connected to either a node or an edge and use a &#8216;ref&#8217; attribute to indicate what they are attached to. To represent these in RDF, we generate a unique identifier for each one.</p>
<h2>Types</h2>
<p>We need RDF types for each of the entities being represented. The GrAF XML namespace URI can be repurposed to generate names for the types, e.g. http://www.xces.org/ns/GrAF/1.0/Node, abbreviated as graf:Node.  We use capitalised names for RDF types as per the convention.</p>
<p>A second, more tricky type issue is that of denoting the different kinds of annotation that are used in the corpus.  LAF avoids any reference to types because there is no consensus on what constitutes a type in this context. Instead it has the idea of an annotation set which gives a name to a group of annotations, for example Penn Treebank or Framenet.  Each name has an associated URI defined in an annotationSet definition in either the annotation XML file or the corpus header.  These aren&#8217;t formal namespace URIs, just a URI that would provide some information about the kind of annotation being used.</p>
<p>An annotation has the following form in the XML file:</p>
<pre>    &lt;a label="vchunk" ref="vc-n0" as="xces"&gt;
        &lt;fs&gt;
            &lt;f name="voice" value="active"/&gt;
            &lt;f name="tense" value="SimPre"/&gt;
            &lt;f name="type" value="FVG"/&gt;
        &lt;/fs&gt;
    &lt;/a&gt;</pre>
<p>Something I&#8217;ve been a little confused about is the meaning of the &#8216;label&#8217; attribute.  Following some discussion, it seems that the label is a kind of annotation type and that we can think of it as being within a &#8216;namespace&#8217; defined by the annotation set label (&#8216;xces&#8217; in this case).  The three features listed in the feature set can also be thought of as being in the same namespace. Hence we can translate this to RDF as a resource of type graf:Annotation and introduce a property graf:type to denote the type of annotation:</p>
<pre>&lt;id35803001&gt; a graf:Annotation;
    graf:type xces:vchunk;
    graf:annotates &lt;id35993621&gt;;
    xces:voice "active";
    xces:tense "SimPre";
    xces:type  "FVG" .</pre>
<p>Note that we don&#8217;t translate the feature structure node into an RDF resource. Feature structures map well directly to RDF properties, and there is no sense in which the feature structure element has any status other than as a container for feature value pairs in the XML serialisation.</p>
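<p>The same mapping can be sketched in Python rather than XSLT. This is only an illustration of the translation described above, not the stylesheet itself: the function name, the example.org URIs and the dictionary that maps annotation set labels to namespace URIs are all assumptions made for the example, and the result is a plain list of triples rather than serialised RDF.</p>

```python
import xml.etree.ElementTree as ET

GRAF = "http://www.xces.org/ns/GrAF/1.0/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def annotation_triples(ann, ann_uri, base_uri, set_uris):
    """Flatten one GrAF annotation element into (subject, predicate, object) triples."""
    # The annotation set label selects the namespace for the label and features.
    ns = set_uris[ann.get("as")]
    triples = [
        (ann_uri, RDF_TYPE, GRAF + "Annotation"),
        (ann_uri, GRAF + "type", ns + ann.get("label")),
        (ann_uri, GRAF + "annotates", base_uri + "#" + ann.get("ref")),
    ]
    # Each f element becomes a property on the annotation itself;
    # the enclosing fs element is only a container and is dropped.
    for el in ann.iter():
        if el.tag.rsplit("}", 1)[-1] == "f":
            triples.append((ann_uri, ns + el.get("name"), el.get("value")))
    return triples


# Rebuild the vchunk annotation shown above.
ann = ET.Element("a", {"label": "vchunk", "ref": "vc-n0", "as": "xces"})
fs = ET.SubElement(ann, "fs")
for name, value in [("voice", "active"), ("tense", "SimPre"), ("type", "FVG")]:
    ET.SubElement(fs, "f", {"name": name, "value": value})

for triple in annotation_triples(
    ann,
    "http://example.org/doc#ann1",           # generated annotation id (hypothetical)
    "http://example.org/doc",                # base URI (hypothetical)
    {"xces": "http://example.org/xces/"},    # annotation set namespace (hypothetical)
):
    print(triple)
```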
<p>This all works well in most cases but there are a few instances in the MASC data that cause trouble.  In a small number of files there is no annotation set associated with some annotations (e.g. in data/written/116CUL032-vc.xml). This means that there is no namespace to associate with the feature names. In <a href="http://www.xces.org/ns/GrAF/0.99/graf-0.99.rng">the GrAF schema</a>, the annotation set is marked as an optional attribute, so this is not an error.  However, some way of assigning a default namespace to bare features like this is needed to convert to RDF.   I&#8217;d argue that someone converting annotations to GrAF should be forced to make a decision and give a name (URI) to their annotation set; in this way, the ownership of annotations is clear and we won&#8217;t get confused between two uses of the same feature name by different people.</p>
<p>A second complication comes when a feature name or annotation set label is not a valid QName (XML element name). This makes the conversion to RDF/XML difficult although in some cases the name may still be a valid RDF identifier (URI).  One example in the MASC data is a feature xmlns:xsi (e.g. in data/written/110CYL072-logical.xml), obviously translated literally from an XML instance.   In this case, one could argue that the feature isn&#8217;t really an annotation on the source data and so shouldn&#8217;t be included, but it raises the issue of what a valid identifier should be.  I think there&#8217;s a strong case for requiring all identifiers to be qualified names in the sense described by the <a href="http://www.w3.org/TR/REC-xml-names/#dt-qualname">XML Namespace standard</a>, not just because I want to convert them easily to RDF, but because the concept of URI based names is so powerful in standards like this one.  We already have an emerging data category registry (<a href="http://www.isocat.org/">ISOCat</a>) for names in the linguistic annotation space; this requirement would mesh well with the ISOCat facility to register names and would facilitate sharing of feature names and definitions.</p>
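<p>A quick way to spot problem names before conversion is to test whether each one could serve as the local part of a qualified name, which rules out anything containing a colon. The Python check below is only an ASCII approximation of the name rules in the XML Namespaces recommendation (the real production allows many more Unicode characters), and the function name is invented for the example.</p>

```python
import re

# ASCII-only approximation of an XML local name: starts with a letter or
# underscore, then letters, digits, '.', '_' or '-'. No colon is allowed.
LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_local_name(name: str) -> bool:
    """True if name could be used as the local part of a qualified name."""
    return LOCAL_NAME.fullmatch(name) is not None


for candidate in ["voice", "tense", "xmlns:xsi"]:
    print(candidate, is_local_name(candidate))
```

<p>Under this check voice and tense pass, while xmlns:xsi is rejected because of the colon.</p>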
<p>In the style-sheet I&#8217;m writing now, I gloss over these two issues by generating a fake namespace URI where needed.</p>
<h2>Edges</h2>
<p>In LAF, edges define relations between nodes and represent structural relations, mainly the parent-child relations needed to represent hierarchical structure.  Edges can also have annotations attached to them and the main use-case for this is the need for relationship types other than the default parent-child; a co-reference relationship between two nodes would be represented by an edge with an attached annotation containing the type name as a feature value.  Both of these cases are best represented in RDF by a regular relationship of an appropriate type.  In the MASC corpus, there aren&#8217;t any examples of edges with attached annotations so all edges are converted to child relations by the stylesheet.  As an illustration, a resource of type graf:Edge is also created; an annotation could be attached to this in the same way as it is to a node.</p>
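<p>In Python terms the edge handling amounts to something like the sketch below. It assumes each edge has been read into an (id, from, to) tuple with the from node taken as the parent, and it uses a made-up example.org base URI; how the graf:Edge resource should be linked to the nodes is left open here.</p>

```python
GRAF = "http://www.xces.org/ns/GrAF/1.0/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def edge_triples(edges, base_uri):
    """Turn (edge_id, from_id, to_id) tuples into child relations plus Edge resources."""
    triples = []
    for edge_id, parent_id, child_id in edges:
        parent = base_uri + "#" + parent_id
        child = base_uri + "#" + child_id
        # The default relation: the 'from' node is the parent of the 'to' node.
        triples.append((parent, GRAF + "child", child))
        # A resource for the edge itself, so an annotation could refer to it.
        triples.append((base_uri + "#" + edge_id, RDF_TYPE, GRAF + "Edge"))
    return triples


# Hypothetical edge ids and base URI.
for triple in edge_triples([("ptb-e1", "ptb-n00001", "ptb-n00000")], "http://example.org/doc"):
    print(triple)
```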
<h2>Regions</h2>
<p>Regions are the means by which nodes in the graph are attached to the source media that is being annotated.  All regions in the MASC corpus are defined by two character offsets stored in the <em>anchors</em> attribute.   The main issue with regions is not their representation in RDF but the choice of this kind of means of indicating location.   I&#8217;ll leave that for another discussion as it doesn&#8217;t impact on the choices made here to generate RDF from GrAF.</p>
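<p>For reference, resolving a region against the source text is a one-liner once the anchors are parsed. This sketch assumes the anchors attribute holds two whitespace-separated character offsets, with the start inclusive and the end exclusive; the sample text and function name are invented.</p>

```python
def region_text(source_text: str, anchors: str) -> str:
    """Return the span of source_text selected by a two-offset anchors value."""
    start, end = (int(offset) for offset in anchors.split())
    return source_text[start:end]


print(region_text("The cat sat", "4 7"))  # prints cat
```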
<h2>Results</h2>
<p>The most interesting result of this exercise is some insight into the design of GrAF and a better understanding on my part of the structures used in that format.  However, we can also apply the stylesheet to the data in the MASC corpus to get a set of RDF/XML files.  These can be fed into a triple store and queried with SPARQL.</p>
<p>To give an idea of the size of the data, the original XML files consist of 3,505,944 lines and 108M of text.  This translates to 3,935,634 triples.  I loaded this into a Sesame triple store and was able to browse the data easily using the workbench interface. Just as an illustration, a sample SPARQL query to find Penn Treebank annotations related by the child relation looks like:</p>
<pre>PREFIX PTB:&lt;http://www.cis.upenn.edu/~treebank/&gt;
PREFIX graf:&lt;http://www.xces.org/ns/GrAF/1.0/&gt;
select ?parent ?plabel ?clabel
where {
        ?parent graf:child ?child .
        ?pann graf:annotates ?parent .
        ?pann graf:type PTB:tok .
        ?pann PTB:msd ?plabel .
        ?cann graf:annotates ?child .
        ?cann graf:type PTB:tok .
        ?cann PTB:msd ?clabel .
}</pre>
<p>This runs reasonably quickly via the workbench web interface and returns a long list of results such as:</p>
<table>
<thead>
<tr>
<th>Parent</th>
<th>Plabel</th>
<th>Clabel</th>
</tr>
</thead>
<tbody>
<tr>
<td>&lt;http://example.org/Article247_327/ptb-n00252&gt;</td>
<td>&#8220;PRP$&#8221;</td>
<td>&#8220;NN&#8221;</td>
</tr>
<tr>
<td>&lt;http://example.org/Article247_327/ptb-n00806&gt;</td>
<td>&#8220;PRP$&#8221;</td>
<td>&#8220;JJ&#8221;</td>
</tr>
<tr>
<td>&lt;http://example.org/Article247_327/ptb-n00973&gt;</td>
<td>&#8220;PRP$&#8221;</td>
<td>&#8220;JJ&#8221;</td>
</tr>
<tr>
<td>&lt;http://example.org/Article247_327/ptb-n00973&gt;</td>
<td>&#8220;PRP$&#8221;</td>
<td>&#8220;NNP&#8221;</td>
</tr>
<tr>
<td>&lt;http://example.org/Article247_327/ptb-n00370&gt;</td>
<td>&#8220;PRP$&#8221;</td>
<td>&#8220;JJ&#8221;</td>
</tr>
<tr>
<td>&lt;http://example.org/Article247_327/ptb-n00370&gt;</td>
<td>&#8220;PRP$&#8221;</td>
<td>&#8220;JJ&#8221;</td>
</tr>
</tbody>
</table>
<h2>Summary</h2>
<p>This has been a useful exercise in understanding the structure of GrAF and hopefully illustrating some of the advantages of an RDF translation, in particular the usefulness of proper identifiers for each of the objects being described.  I&#8217;ll take what I&#8217;ve learned here and modify the current GrAF ingestion scripts that are used to load annotations into the DADA triple store. Once that&#8217;s done I should be able to publish a sample DADA linked data interface to the MASC corpus. Watch this space for a link.</p>
<p>The stylesheet can be found in the DADA source tree: <a href="https://bitbucket.org/stevecassidy/dada/raw/ed91bc01605f/xsl/graf2rdf.xsl">graf2rdf.xsl</a> or check the <a href="https://bitbucket.org/stevecassidy/dada/">DADA project on Bitbucket</a> for a more recent version.</p>
]]></content:encoded>
			<wfw:commentRss>http://web.science.mq.edu.au/~cassidy/wordpress/?feed=rss2&#038;p=330</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DADA Project Update</title>
		<link>http://web.science.mq.edu.au/~cassidy/wordpress/?p=323</link>
		<comments>http://web.science.mq.edu.au/~cassidy/wordpress/?p=323#comments</comments>
		<pubDate>Sun, 06 Feb 2011 10:13:24 +0000</pubDate>
		<dc:creator>Steve Cassidy</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[annotation]]></category>
		<category><![CDATA[dada]]></category>

		<guid isPermaLink="false">http://web.science.mq.edu.au/~cassidy/wordpress/?p=323</guid>
		<description><![CDATA[The DADA project is developing software for managing language resources and exposing them on the web. Language resources are digital collections of language as audio, video and text used to study language and build technology systems. The project has been going for a while with some initial funding from the ARC to build the basic infrastructure and later from [...]]]></description>
			<content:encoded><![CDATA[<p>The DADA project is developing software for managing language resources and exposing them on the web. Language resources are digital collections of language as audio, video and text used to study language and build technology systems. The project has been going for a while with some initial funding from the ARC to build the basic infrastructure and later from Macquarie University for some work on the Auslan corpus of Australian Sign Language collected by Trevor Johnston. Recently we have two projects which DADA will be part of, and so the pace of development has picked up a little. <span id="more-323"></span></p>
<p>The <a href="http://www.ausnc.org.au/">Australian National Corpus (AusNC)</a> is an effort to build a centralised collection of resources of language in Australia.  The core idea is to take whatever existing collections we can get permission to publish and make them available under a common technical infrastructure.  Using some funding from <a href="http://www.hcsnet.edu.au/">HCSNet</a> we built a small demonstration site that allowed free text search on two collections: the Australian Corpus of English and the Corpus of Oz Early English. We now have some funding to continue this work and expand both the size of the collection and the capability of the infrastructure that will support it. What we&#8217;ve already done is to separate the text in these corpora from their meta-data (descriptions of each text) and the annotation (denoting things within the texts).  While the pilot allows searching on the text, the next steps will allow search using the meta-data (look for this in texts written after 1900) and the annotation (find this in the titles of articles).  This project is funded by the Australian National Data Service (ANDS) and is a collaboration with <a href="http://www.griffith.edu.au/arts-languages-criminology/school-languages-linguistics/staff/dr-michael-haugh">Michael Haugh</a> at Griffith.</p>
<p>The Big Australian Speech Corpus, more recently renamed <a href="http://austalk.edu.au/">AusTalk</a>, is an ARC funded project to collect speech and video from 1000 Australian speakers for a new freely available corpus.  The project involves many partners around the country each of whom will have a &#8216;black box&#8217; recording station to collect audio and stereo video of subjects reading words and sentences, being interviewed and doing the Map task &#8211; a game designed to elicit natural speech between two people.   Our part of the project is to provide the server infrastructure that will store the audio, video and annotation data that will make up the corpus.  DADA will be part of this solution but the main driver is to be able to provide a secure and reliable store for the primary data as it comes in from the collection sites.  An important feature of the collection is the meta-data that will describe the subjects in the recording.  Some annotation of the data will be done automatically, for example some forced alignment of the read words and sentences.  Later, we will move on to support manual annotation of some of the data &#8211; for example transcripts of the interviews and map task sessions.   All of this will be published via the DADA server infrastructure to create a large, freely available research collection for Australian English.</p>
<p>Since the development of DADA now involves people outside Macquarie, we have started using a <a href="https://bitbucket.org/stevecassidy/dada">public bitbucket repository for the code</a>.  As of this writing the code still needs some tidying and documentation to enable third parties to be able to install and work on it, but we hope to have that done within a month.   The public DADA demo site is down at the moment due to network upgrades at Macquarie (it&#8217;s only visible inside MQ) &#8211; I hope to have that fixed soon with some new sample data sets loaded up for testing. 2011 looks like it will be a significant year for DADA. We hope to end this year with a number of significant text, audio and video corpora hosted on DADA infrastructure and providing useful services to the linguistics and language technology communities.</p>
]]></content:encoded>
			<wfw:commentRss>http://web.science.mq.edu.au/~cassidy/wordpress/?feed=rss2&#038;p=323</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An RDF Realisation of LAF in the DADA Annotation Server</title>
		<link>http://web.science.mq.edu.au/~cassidy/wordpress/?p=300</link>
		<comments>http://web.science.mq.edu.au/~cassidy/wordpress/?p=300#comments</comments>
		<pubDate>Mon, 22 Feb 2010 03:50:18 +0000</pubDate>
		<dc:creator>Steve Cassidy</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[annotation]]></category>
		<category><![CDATA[publication]]></category>
		<category><![CDATA[rdf]]></category>

		<guid isPermaLink="false">http://web.science.mq.edu.au/~cassidy/wordpress/?p=300</guid>
		<description><![CDATA[The Linguistic Annotation Framework defines a generalised graph based model for annotation data intended as an interchange format for transfer of annotations between tools.   The DADA system uses an RDF based representation of annotation data and provides a web based annotation store.  The annotation model in DADA can be seen as an RDF realisation of [...]]]></description>
			<content:encoded><![CDATA[<p>The Linguistic Annotation Framework defines a generalised graph based<br />
model for annotation data intended as an interchange format for transfer<br />
of annotations between tools.   The DADA system uses an RDF based representation<br />
of annotation data and provides a web based annotation store.  The annotation<br />
model in DADA can be seen as an RDF realisation of the LAF model. This paper<br />
describes the relationship between the two models and makes some comments on<br />
how the standard might be stated in a more format-neutral way.</p>
<p>Download PDF: <a href="http://web.science.mq.edu.au/~cassidy/wordpress/wp-content/uploads/2010/02/paper.pdf">An RDF Realisation of LAF in the DADA Annotation Server</a></p>
]]></content:encoded>
			<wfw:commentRss>http://web.science.mq.edu.au/~cassidy/wordpress/?feed=rss2&#038;p=300</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ingesting the Auslan Corpus into the DADA Annotation Store</title>
		<link>http://web.science.mq.edu.au/~cassidy/wordpress/?p=298</link>
		<comments>http://web.science.mq.edu.au/~cassidy/wordpress/?p=298#comments</comments>
		<pubDate>Mon, 22 Feb 2010 03:48:48 +0000</pubDate>
		<dc:creator>Steve Cassidy</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[annotation]]></category>
		<category><![CDATA[auslan]]></category>
		<category><![CDATA[publication]]></category>
		<category><![CDATA[rdf]]></category>

		<guid isPermaLink="false">http://web.science.mq.edu.au/~cassidy/wordpress/?p=298</guid>
		<description><![CDATA[Steve Cassidy and Trevor Johnston. The DADA system is being developed to support collaborative access to and annotation of language resources over the web.  DADA provides a web accessible annotation store that delivers both a human browsable version of a corpus and a machine accessible API for reading and writing annotations.  DADA implements an abstract [...]]]></description>
			<content:encoded><![CDATA[<p>Steve Cassidy and <a href="http://web.mac.com/trevor.a.johnston/Site/Welcome.html">Trevor Johnston</a>.</p>
<p>The DADA system is being developed to support collaborative access to and annotation of language resources over the web.  DADA provides a web accessible annotation store that delivers both a human browsable version of a corpus and a machine accessible API for reading and writing annotations.  DADA implements an abstract model of annotation suitable for storing many kinds of data from a wide range of language resources.  This paper describes the process of ingesting data from a corpus of Australian Sign Language (Auslan) into the DADA system.  We describe the format of the RDF data used by DADA and the issues raised in converting the ELAN annotations from the corpus.  Once ingested, the data is presented in a simple web interface and also via a Javascript client that makes use of an alternate interface to the DADA server.</p>
<p>Download PDF: <a href="http://portal.acm.org/citation.cfm?id=1698409">Ingesting the Auslan Corpus into the DADA Annotation Store</a></p>
]]></content:encoded>
			<wfw:commentRss>http://web.science.mq.edu.au/~cassidy/wordpress/?feed=rss2&#038;p=298</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Arduino &amp; Physical Computing</title>
		<link>http://web.science.mq.edu.au/~cassidy/wordpress/?p=282</link>
		<comments>http://web.science.mq.edu.au/~cassidy/wordpress/?p=282#comments</comments>
		<pubDate>Mon, 25 May 2009 06:57:11 +0000</pubDate>
		<dc:creator>Steve Cassidy</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[arduino]]></category>
		<category><![CDATA[talk]]></category>

		<guid isPermaLink="false">http://web.science.mq.edu.au/~cassidy/wordpress/?p=282</guid>
		<description><![CDATA[I gave a talk last week introducing the Arduino platform to some MQ students and staff. It seemed to go well and there is a bit of interest in carrying on with a regular meetup in the Electronics labs &#8211; more details to come when we organise a time. Meanwhile, here are my slides from [...]]]></description>
			<content:encoded><![CDATA[<p>I gave a talk last week introducing the Arduino platform to some MQ students and staff. It seemed to go well and there is a bit of interest in carrying on with a regular meetup in the Electronics labs &#8211; more details to come when we organise a time.   Meanwhile, here are my slides from the talk, not that they&#8217;re very informative by themselves but I wanted to try out <a href="http://www.slideshare.net/">slideshare</a>.<span id="more-282"></span></p>
<div id="__ss_1474634" style="width: 425px; text-align: left;"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" title="Arduino and Physical Computing" href="http://www.slideshare.net/stevecassidy/arduino-and-physical-computing?type=powerpoint">Arduino and Physical Computing</a><object width="425" height="355" data="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=arduino-090522082625-phpapp01&amp;rel=0&amp;stripped_title=arduino-and-physical-computing" type="application/x-shockwave-flash"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=arduino-090522082625-phpapp01&amp;rel=0&amp;stripped_title=arduino-and-physical-computing" /><param name="allowfullscreen" value="true" /></object>
<div style="font-size: 11px; font-family: tahoma,arial; height: 26px; padding-top: 2px;">View more <a style="text-decoration:underline;" href="http://www.slideshare.net/">Keynote presentations</a> from <a style="text-decoration:underline;" href="http://www.slideshare.net/stevecassidy">Steve Cassidy</a>.</div>
</div>
<p>Some of the projects that we demo&#8217;d on Friday were:</p>
<ul>
<li>James&#8217; tweet powered colour changing LEDs</li>
<li>James&#8217; temperature sensitive LED &#8211; breathe on it and make it glow red.</li>
<li>Steve&#8217;s LCD twitter display &#8211; display your latest twitter post on a small LCD screen.</li>
<li>Steve&#8217;s Lego/Arduino hybrid driving lego motors with an Arduino motor shield</li>
<li>Steve&#8217;s three wheeled bot with IR distance sensors visualised using a processing program on the desktop</li>
<li>The <a href="http://diydrones.com/profiles/blog/show?id=705844%3ABlogPost%3A44817">Blimpduino</a> that floated around the room as I talked &#8211; not finished yet and inflated for the first time on Friday.</li>
</ul>
<p>Send me mail or leave a comment if you&#8217;re interested in playing with some of this stuff. We should have a few boards available when we meet in the lab, so you don&#8217;t need your own gear.</p>
]]></content:encoded>
			<wfw:commentRss>http://web.science.mq.edu.au/~cassidy/wordpress/?feed=rss2&#038;p=282</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Robots to Teach Programming</title>
		<link>http://web.science.mq.edu.au/~cassidy/wordpress/?p=276</link>
		<comments>http://web.science.mq.edu.au/~cassidy/wordpress/?p=276#comments</comments>
		<pubDate>Wed, 28 Jan 2009 06:13:13 +0000</pubDate>
		<dc:creator>Steve Cassidy</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[project]]></category>

		<guid isPermaLink="false">http://www.ics.mq.edu.au/~cassidy/?p=276</guid>
		<description><![CDATA[This is a project idea for an Honours student or similar. Please contact me if you&#8217;d like to follow this up. I&#8217;ve been having fun with Arduino boards lately; these are small single-chip development boards which have input/output lines that can read sensors and control motors etc. They are programmed in Wiring which [...]]]></description>
			<content:encoded><![CDATA[<p>This is a project idea for an Honours student or similar.  Please contact me if you&#8217;d like to follow this up. </p>
<p>I&#8217;ve been having fun with <a href="http://www.arduino.cc/">Arduino</a> boards lately; these are small single-chip development boards which have input/output lines that can read sensors and control motors etc. They are programmed in <a href="http://wiring.org.co/">Wiring</a>, which is really C with some sugar and libraries added.  I&#8217;ve been thinking that the Arduino would make a nice platform to stimulate some interest in beginning programmers, as a break from the usual run of problems that we set them.  This project would focus on developing a set of exercises suitable for a first or second programming class (I&#8217;m thinking COMP125) to develop some of the ideas explored there (data structures, simple algorithms) in a concrete context.  Part of the project would be building a suitable platform (I fancy a <a href="http://diydrones.com/profiles/blog/show?id=705844%3ABlogPost%3A44817">Blimpduino</a>) and then perhaps evaluating the use of the platform with real, live first-year students.</p>
]]></content:encoded>
			<wfw:commentRss>http://web.science.mq.edu.au/~cassidy/wordpress/?feed=rss2&#038;p=276</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A RESTful interface to Annotations on the Web</title>
		<link>http://web.science.mq.edu.au/~cassidy/wordpress/?p=271</link>
		<comments>http://web.science.mq.edu.au/~cassidy/wordpress/?p=271#comments</comments>
		<pubDate>Sun, 14 Sep 2008 12:22:36 +0000</pubDate>
		<dc:creator>Steve Cassidy</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[annotation]]></category>
		<category><![CDATA[publication]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://www.ics.mq.edu.au/~cassidy/?p=271</guid>
		<description><![CDATA[Annotation data is stored and manipulated in various formats and there have been a number of efforts to build generalised models of annotation to support sharing of data between tools. This work has shown that it is possible to store annotations from many different tools in a single canonical format and allow transformation into other [...]]]></description>
			<content:encoded><![CDATA[<p>Annotation data is stored and manipulated in various formats and there have been a number of efforts to build generalised models of annotation to support sharing of data between tools.  This work has shown that it is possible to store annotations from many different tools in a single canonical format and allow transformation into other formats as needed.  However, moving data between formats is often a matter of importing or exporting from one tool to another.  This paper describes a web-based interface to annotation data that makes use of an abstract model of annotation in its internal store but is able to deliver a variety of annotation formats to clients over the web.</p>
<p>Presented at <a href="http://verbs.colorado.edu/LAW2008/">The 2nd Linguistic Annotation Workshop (The LAW II)</a> at LREC2008, Marrakech.<br />
<a href='http://www.ics.mq.edu.au/~cassidy/wordpress/wp-content/uploads/2008/09/paper.pdf'>Download PDF</a></p>
]]></content:encoded>
			<wfw:commentRss>http://web.science.mq.edu.au/~cassidy/wordpress/?feed=rss2&#038;p=271</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sparql Endpoint for Python WSGI</title>
		<link>http://web.science.mq.edu.au/~cassidy/wordpress/?p=269</link>
		<comments>http://web.science.mq.edu.au/~cassidy/wordpress/?p=269#comments</comments>
		<pubDate>Thu, 21 Aug 2008 12:07:14 +0000</pubDate>
		<dc:creator>Steve Cassidy</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.ics.mq.edu.au/~cassidy/?p=269</guid>
		<description><![CDATA[As part of DADA (and yes, that page is a bit out of date) I wanted to provide a Sparql endpoint to allow experimentation with querying the raw RDF annotation data. So far, we&#8217;ve built everything using Redland in Python but it seems there is no existing Sparql endpoint implementation for this combination. The Sparql [...]]]></description>
			<content:encoded><![CDATA[<p>As part of <a href="http://www.clt.mq.edu.au/Research/Projects/dada/">DADA</a> (and yes, that page is a bit out of date) I wanted to provide a <a href="http://www.w3.org/TR/rdf-sparql-protocol/">Sparql endpoint</a> to allow experimentation with querying the raw RDF annotation data.  So far, we&#8217;ve built everything using Redland in Python, but it seems there is no existing Sparql endpoint implementation for this combination.  The Sparql protocol document is long, but as far as I can tell the core of the protocol is a simple GET request with an encoded Sparql query; results are returned as raw XML in the special Sparql result format, or as RDF/XML if the return type is a graph.  This proves to be very easy to implement on top of Redland since its query operator returns exactly those result types.</p>
<p>So, I present <a href='http://www.ics.mq.edu.au/~cassidy/wordpress/wp-content/uploads/2008/08/sparqlendpoint-01.zip'>SparqlEndpoint-0.1</a>, a Python module that provides a WSGI-conformant implementation of a Sparql endpoint for Redland.  It almost certainly doesn&#8217;t implement all of the protocol standard and it can be improved no end, for example by making it independent of the RDF backend it queries (e.g. by using RDFLib).</p>
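<p>To give a feel for how small the core of the protocol is, here is a minimal sketch of a WSGI endpoint along the lines described above. It is not the code in SparqlEndpoint-0.1: the names <code>make_endpoint</code>, <code>respond</code>, <code>call_endpoint</code> and the <code>run_query</code> hook are hypothetical, and the sketch assumes the RDF store is wrapped in a function that takes the query text and returns a pair saying whether the result is a graph, plus the serialised XML. Only the GET form with a single <code>query</code> parameter is handled.</p>
<pre>
```python
from urllib.parse import parse_qs, urlencode
from wsgiref.util import setup_testing_defaults

# Media types for the two kinds of result
RESULTS_XML = "application/sparql-results+xml"  # variable bindings / boolean
GRAPH_XML = "application/rdf+xml"               # graph results as RDF/XML


def respond(start_response, status, content_type, text):
    """Send a UTF-8 text body with the given status and content type."""
    body = text.encode("utf-8")
    start_response(status, [("Content-Type", content_type),
                            ("Content-Length", str(len(body)))])
    return [body]


def make_endpoint(run_query):
    """Build a WSGI app around run_query (a hypothetical hook).

    run_query(query_text) is assumed to return (is_graph, xml_text);
    in practice it would wrap whatever RDF store is in use.
    """
    def app(environ, start_response):
        params = parse_qs(environ.get("QUERY_STRING", ""))
        queries = params.get("query", [])
        if environ.get("REQUEST_METHOD") != "GET" or len(queries) != 1:
            return respond(start_response, "400 Bad Request", "text/plain",
                           "expected a GET request with one 'query' parameter\n")
        is_graph, xml_text = run_query(queries[0])
        content_type = GRAPH_XML if is_graph else RESULTS_XML
        return respond(start_response, "200 OK", content_type, xml_text)
    return app


def call_endpoint(app, query=None):
    """Call the app in-process; returns (status, headers dict, body bytes)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD=GET and friends
    environ["QUERY_STRING"] = "" if query is None else urlencode({"query": query})
    seen = {}

    def start_response(status, headers):
        seen["status"] = status
        seen["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return seen["status"], seen["headers"], body
```
</pre>
<p>Because the store sits behind a single function, swapping the backend means writing a different <code>run_query</code>. The resulting app can be handed to any WSGI server, for example <code>wsgiref.simple_server.make_server</code> from the standard library.</p>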
<p>I&#8217;m not putting up a demo endpoint just yet as I&#8217;m having severe performance issues with my development server in combination with Redland.  The triple store is growing rapidly into the millions of triples and the result is a huge latency (tens of minutes) to perform some queries.  Given some recent discussion on the Redland list, I&#8217;m wondering whether a jump to one of the RDF-specific stores is the thing to do. This would probably mean rewriting my code in Java, but based on the <a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/">Berlin Sparql Benchmark</a> numbers, Sesame and Jena have the kind of performance I need (sub-second query response times on 100M triples).</p>
<p>Well, enough of that. If you are interested in SparqlEndpoint please download and take a look. If there is interest I&#8217;m happy to share it and host development somewhere accessible.</p>
]]></content:encoded>
			<wfw:commentRss>http://web.science.mq.edu.au/~cassidy/wordpress/?feed=rss2&#038;p=269</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An Evaluation of Portfolio Assessment in an Undergraduate Web Technology Unit</title>
		<link>http://web.science.mq.edu.au/~cassidy/wordpress/?p=267</link>
		<comments>http://web.science.mq.edu.au/~cassidy/wordpress/?p=267#comments</comments>
		<pubDate>Mon, 03 Sep 2007 00:04:31 +0000</pubDate>
		<dc:creator>Steve Cassidy</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[publication]]></category>
		<category><![CDATA[teaching]]></category>

		<guid isPermaLink="false">http://www.ics.mq.edu.au/~cassidy/index.php/2007/09/03/an-evaluation-of-portfolio-assessment-in-an-undergraduate-web-technology-unit/</guid>
		<description><![CDATA[One of the perennial issues that is raised in student surveys is that of effective feedback. As part of our ongoing review of teaching, we identified feedback on assessment as a target area for 2007; this paper describes the evaluation of one strategy for improving this feedback that was implemented as part of an undergraduate [...]]]></description>
			<content:encoded><![CDATA[<p>One of the perennial issues that is raised in student surveys is that of effective feedback. As part of our ongoing review of teaching, we identified feedback on assessment as a target area for 2007; this paper describes the evaluation of one strategy for improving this feedback that was implemented as part of an undergraduate unit.</p>
<p>Paper to be presented at the <a href="http://science.uniserve.edu.au/workshop/conference.html">National UniServe Conference 2007</a>, Sydney, Australia. Download <a href="http://www.ics.mq.edu.au/~cassidy/wordpress/wp-content/uploads/2007/09/cassidy-schwitter-final.pdf">PDF</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://web.science.mq.edu.au/~cassidy/wordpress/?feed=rss2&#038;p=267</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

