
Notes on Conversion of GrAF to RDF

The Graph Annotation Format (GrAF) is the XML data exchange format developed for the model of linguistic annotation described in the ISO Linguistic Annotation Framework (LAF). LAF is an abstract model that represents annotations as a graph structure; GrAF is an XML serialisation of that model intended for moving data between different tools. Both were developed by Nancy Ide and Keith Suderman at Vassar College, with input from the community involved in the ISO standardisation process around linguistic data.
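To make the conversion concrete, here is a minimal Python sketch that reads a simplified GrAF-like fragment and emits N-Triples. The fragment omits the GrAF namespace and header, and the `http://example.org/laf#` vocabulary and `http://example.org/corpus#` base URI are hypothetical placeholders, not the vocabulary DADA or GrAF actually use. Each node, edge and annotation simply becomes an RDF resource, and the `f` elements of a feature structure are flattened into one triple per feature.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical vocabulary and base URI, chosen only for this sketch.
LAF = "http://example.org/laf#"
BASE = "http://example.org/corpus#"

# Simplified GrAF-like input: the GrAF namespace and header are omitted.
SAMPLE_GRAF = """<graph>
  <node xml:id="n1"/>
  <node xml:id="n2"/>
  <edge xml:id="e1" from="n1" to="n2"/>
  <a label="tok" ref="n1"><fs><f name="pos" value="NN"/></fs></a>
</graph>"""


def uri(u):
    return "<" + u + ">"


def literal(s):
    # Minimal N-Triples escaping: backslashes and double quotes only.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def triple(s, p, o):
    return f"{s} {p} {o} ."


def graf_to_ntriples(xml_text):
    root = ET.fromstring(xml_text)
    a_type = uri(RDF_TYPE)
    lines = []
    for node in root.iter("node"):
        lines.append(triple(uri(BASE + node.get(XML_ID)), a_type, uri(LAF + "Node")))
    for edge in root.iter("edge"):
        e = uri(BASE + edge.get(XML_ID))
        lines.append(triple(e, a_type, uri(LAF + "Edge")))
        lines.append(triple(e, uri(LAF + "from"), uri(BASE + edge.get("from"))))
        lines.append(triple(e, uri(LAF + "to"), uri(BASE + edge.get("to"))))
    # Annotations get generated identifiers a1, a2, ... in document order.
    for i, ann in enumerate(root.iter("a"), start=1):
        a = uri(BASE + "a" + str(i))
        lines.append(triple(a, a_type, uri(LAF + "Annotation")))
        lines.append(triple(a, uri(LAF + "annotates"), uri(BASE + ann.get("ref"))))
        lines.append(triple(a, uri(LAF + "label"), literal(ann.get("label"))))
        # Flatten the feature structure: one triple per <f name value>.
        for f in ann.iter("f"):
            lines.append(triple(a, uri(LAF + "feature/" + f.get("name")),
                                literal(f.get("value"))))
    return lines
```

Calling `graf_to_ntriples(SAMPLE_GRAF)` yields nine N-Triples lines, one list entry per triple, which could be loaded into any RDF store that reads N-Triples.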

An RDF Realisation of LAF in the DADA Annotation Server

The Linguistic Annotation Framework defines a generalised graph-based
model for annotation data, intended as an interchange format for transferring
annotations between tools. The DADA system uses an RDF-based representation
of annotation data and provides a web-based annotation store. The annotation
model in DADA can be seen as an RDF realisation of the LAF model. This paper
describes the relationship between the two models and comments on
how the standard might be stated in a more format-neutral way.
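One way to test the claim that an RDF model can realise LAF is to check that a LAF graph can be written out as triples and read back without loss. The sketch below does that for a toy in-memory model. The class names and the `laf:`/`feat:` predicate strings are hypothetical and are not taken from DADA, and feature structures are reduced to flat name/value pairs (nested feature structures are not handled).

```python
from dataclasses import dataclass, field

# Hypothetical predicate and class names standing in for an RDF vocabulary.
TYPE, FROM, TO = "rdf:type", "laf:from", "laf:to"
ANNOTATES, LABEL, FEAT = "laf:annotates", "laf:label", "feat:"


@dataclass(frozen=True)
class Edge:
    id: str
    source: str
    target: str


@dataclass(frozen=True)
class Annotation:
    id: str
    ref: str         # id of the annotated node or edge
    label: str
    features: tuple  # sorted, duplicate-free (name, value) pairs


@dataclass
class LafGraph:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)
    annotations: set = field(default_factory=set)


def to_triples(g):
    t = set()
    for n in g.nodes:
        t.add((n, TYPE, "laf:Node"))
    for e in g.edges:
        t |= {(e.id, TYPE, "laf:Edge"), (e.id, FROM, e.source), (e.id, TO, e.target)}
    for a in g.annotations:
        t |= {(a.id, TYPE, "laf:Annotation"), (a.id, ANNOTATES, a.ref),
              (a.id, LABEL, a.label)}
        t |= {(a.id, FEAT + name, value) for name, value in a.features}
    return t


def from_triples(triples):
    # Group objects by subject, then by predicate.
    props = {}
    for s, p, o in triples:
        props.setdefault(s, {}).setdefault(p, []).append(o)
    g = LafGraph()
    for s, ps in props.items():
        kind = ps[TYPE][0]
        if kind == "laf:Node":
            g.nodes.add(s)
        elif kind == "laf:Edge":
            g.edges.add(Edge(s, ps[FROM][0], ps[TO][0]))
        elif kind == "laf:Annotation":
            feats = tuple(sorted((p[len(FEAT):], o)
                                 for p, objs in ps.items() if p.startswith(FEAT)
                                 for o in objs))
            g.annotations.add(Annotation(s, ps[ANNOTATES][0], ps[LABEL][0], feats))
    return g
```

If `from_triples(to_triples(g)) == g` holds for every graph, the triple form carries everything the abstract model does, which is the sense in which the RDF representation is a realisation of LAF rather than an approximation of it.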

Download PDF: An RDF Realisation of LAF in the DADA Annotation Server

Ingesting the Auslan Corpus into the DADA Annotation Store

Steve Cassidy and Trevor Johnston.

The DADA system is being developed to support collaborative access to and annotation of language resources over the web. DADA provides a web-accessible annotation store that delivers both a human-browsable version of a corpus and a machine-accessible API for reading and writing annotations. DADA implements an abstract model of annotation suitable for storing many kinds of data from a wide range of language resources. This paper describes the process of ingesting data from a corpus of Australian Sign Language (Auslan) into the DADA system. We describe the format of the RDF data used by DADA and the issues raised in converting the ELAN annotations from the corpus. Once ingested, the data is presented in a simple web interface and also via a JavaScript client that makes use of an alternative interface to the DADA server.
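Before ELAN data can become RDF, the time-aligned annotations have to be pulled out of the EAF file. The following sketch reads the parts of an EAF document that matter for that step: time slots, time-aligned annotations, and reference annotations that point at a parent annotation. It is an illustration under simplifying assumptions (no controlled vocabularies, linguistic types or media descriptors), not the converter described in the paper, and the tier names and glosses in the sample are invented.

```python
import xml.etree.ElementTree as ET

# Invented two-tier sample; real EAF files also carry a HEADER,
# LINGUISTIC_TYPE definitions and media references.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1850"/>
  </TIME_ORDER>
  <TIER TIER_ID="gloss">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>HOUSE</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="translation" PARENT_REF="gloss">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a2" ANNOTATION_REF="a1">
        <ANNOTATION_VALUE>house</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_eaf(xml_text):
    """Return one dict per annotation: id, tier, start/end (ms), parent, value."""
    root = ET.fromstring(xml_text)
    # TIME_VALUE is optional in EAF, so unaligned slots map to None.
    slots = {}
    for ts in root.iter("TIME_SLOT"):
        value = ts.get("TIME_VALUE")
        slots[ts.get("TIME_SLOT_ID")] = int(value) if value is not None else None
    records = []
    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID")
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            records.append({
                "id": ann.get("ANNOTATION_ID"), "tier": tier_id,
                "start": slots.get(ann.get("TIME_SLOT_REF1")),
                "end": slots.get(ann.get("TIME_SLOT_REF2")),
                "parent": None,
                "value": ann.findtext("ANNOTATION_VALUE", default=""),
            })
        # Reference annotations have no times of their own; keep the link
        # to the parent so it can become an explicit relation in RDF.
        for ann in tier.iter("REF_ANNOTATION"):
            records.append({
                "id": ann.get("ANNOTATION_ID"), "tier": tier_id,
                "start": None, "end": None,
                "parent": ann.get("ANNOTATION_REF"),
                "value": ann.findtext("ANNOTATION_VALUE", default=""),
            })
    return records
```

Keeping the parent link explicit, rather than copying the parent's times onto the child, leaves the choice of how to model ELAN's tier dependencies to the RDF mapping step.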

Download PDF: Ingesting the Auslan Corpus into the DADA Annotation Store

Sparql Endpoint for Python WSGI

As part of DADA (and yes, that page is a bit out of date) I wanted to provide a Sparql endpoint to allow experimentation with querying the raw RDF annotation data. So far, we’ve built everything using Redland in Python, but there seems to be no existing Sparql endpoint implementation for this combination. The Sparql protocol document is long, but as far as I can tell the core of the protocol is a simple GET request carrying an encoded Sparql query; results are returned as raw XML in the special Sparql result format, or as RDF/XML if the result is a graph. This turns out to be very easy to implement on top of Redland, since its query operator returns exactly those result types.
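Both halves of that core protocol are small enough to show directly. In this sketch the client side encodes the query as the `query` parameter of a GET URL, and the server side writes variable bindings in the Sparql Query Results XML Format. The row representation (a dict mapping variable names to `(kind, value)` pairs) is an assumption made for illustration; it is not Redland's result API, and optional details such as `xml:lang` and `datatype` on literals are left out.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

SRX = "http://www.w3.org/2005/sparql-results#"


def query_url(endpoint, query):
    # The protocol passes the query in a "query" parameter; urlencode applies
    # form encoding (spaces become "+", "?" becomes "%3F", and so on).
    # Assumes the endpoint URL has no query string of its own.
    return endpoint + "?" + urlencode({"query": query})


def bindings_to_srx(variables, rows):
    """Serialise result rows as Sparql Query Results XML.

    variables: list of variable names, in order.
    rows: list of dicts mapping a variable name to (kind, value), where kind
    is "uri", "literal" or "bnode"; unbound variables are simply omitted.
    """
    # Set xmlns as a plain attribute so child tags can stay unprefixed.
    sparql = ET.Element("sparql", xmlns=SRX)
    head = ET.SubElement(sparql, "head")
    for name in variables:
        ET.SubElement(head, "variable", name=name)
    results = ET.SubElement(sparql, "results")
    for row in rows:
        result = ET.SubElement(results, "result")
        for name, (kind, value) in row.items():
            binding = ET.SubElement(result, "binding", name=name)
            ET.SubElement(binding, kind).text = value
    return ET.tostring(sparql, encoding="unicode")
```

For graph results (CONSTRUCT and DESCRIBE queries) the endpoint would instead return RDF/XML, so only the SELECT case needs a serialiser like this.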

So, I present SparqlEndpoint-0.1, a Python module that provides a WSGI-conformant implementation of a Sparql endpoint for Redland. It almost certainly doesn’t implement all of the protocol standard, and it can be improved no end, for example by making it independent of the RDF backend it queries (e.g. by using RDFLib).
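The general shape of such a module can be sketched with the standard library alone. This is not the code of SparqlEndpoint-0.1; it is a minimal WSGI application that assumes the RDF backend sits behind a single hypothetical `run_query` callable returning a content type and a response body, which is also one way to decouple the endpoint from Redland. The `toy_backend` below only pretends to answer queries, so the example runs without an RDF store.

```python
from urllib.parse import parse_qs

SRX_TYPE = "application/sparql-results+xml"


def make_sparql_app(run_query):
    """Wrap a backend callable, run_query(query) -> (content_type, body_bytes),
    as a WSGI application implementing the GET form of the protocol."""

    def app(environ, start_response):
        if environ.get("REQUEST_METHOD", "GET") != "GET":
            start_response("405 Method Not Allowed",
                           [("Allow", "GET"), ("Content-Type", "text/plain")])
            return [b"only GET is supported\n"]
        # parse_qs undoes the form encoding applied by the client.
        queries = parse_qs(environ.get("QUERY_STRING", "")).get("query")
        if not queries:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"missing query parameter\n"]
        try:
            content_type, body = run_query(queries[0])
        except ValueError as exc:
            # Treat backend ValueErrors as malformed queries.
            start_response("400 Bad Request",
                           [("Content-Type", "text/plain; charset=utf-8")])
            return [str(exc).encode("utf-8")]
        start_response("200 OK", [("Content-Type", content_type),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app


def toy_backend(query):
    # Hypothetical stand-in for a real store: accepts only SELECT and
    # always returns an empty result set.
    if not query.lstrip().upper().startswith("SELECT"):
        raise ValueError("toy backend only understands SELECT")
    body = ('<sparql xmlns="http://www.w3.org/2005/sparql-results#">'
            "<head/><results/></sparql>").encode("utf-8")
    return SRX_TYPE, body


app = make_sparql_app(toy_backend)
```

To try it locally, `wsgiref.simple_server.make_server("localhost", 8080, app).serve_forever()` serves the application on port 8080; swapping `toy_backend` for a function that queries Redland (or RDFLib) is the only change needed to point it at real data.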

I’m not putting up a demo endpoint just yet, as I’m having severe performance issues with my development server in combination with Redland. The triple store is growing rapidly into the millions of triples, and as a result some queries take tens of minutes to complete. Given some recent discussion on the Redland list, I’m wondering whether a jump to one of the RDF-specific stores is the thing to do. This would probably mean rewriting my code in Java, but based on the Berlin Sparql Benchmark numbers, Sesame and Jena have the kind of performance I need (sub-second query response times on 100M triples).

Well, enough of that. If you are interested in SparqlEndpoint please download and take a look. If there is interest I’m happy to share it and host development somewhere accessible.

Version Control for RDF Triple Stores

RDF, the core data format for the Semantic Web, is increasingly being deployed both from automated sources and via human authoring either directly or through tools that generate RDF output. As individuals build up large amounts of RDF data and as groups begin to collaborate on authoring knowledge stores in RDF, the need for some kind of version management becomes apparent. While there are many version control systems available for program source code and even for XML data, the use of version control for RDF data is not a widely explored area. This paper examines an existing version control system for program source code, Darcs, which is grounded in a semi-formal theory of patches, and proposes an adaptation to directly manage versions of an RDF triple store.
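As a rough illustration of what a patch might look like when the repository is a triple store, the sketch below treats a patch as a set of added and a set of removed triples, gives it an inverse, and treats two patches as commuting when they touch disjoint triples. This is a deliberately simplified model inspired by Darcs, not the formal adaptation the paper proposes; Darcs' actual commutation rules are richer than a disjointness test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """A change to a triple store: triples to add and triples to remove.
    Triples are (subject, predicate, object) tuples."""
    add: frozenset = frozenset()
    remove: frozenset = frozenset()

    def inverse(self):
        # Undoing a patch swaps its additions and removals.
        return Patch(add=self.remove, remove=self.add)

    def touched(self):
        return self.add | self.remove


def apply_patch(store, patch):
    """Return a new store, refusing patches whose context does not hold."""
    if not patch.remove <= store:
        raise ValueError("patch removes triples that are not in the store")
    if patch.add & store:
        raise ValueError("patch adds triples that are already in the store")
    return (store - patch.remove) | patch.add


def commutes(p, q):
    # Simplified rule: patches touching disjoint triples are independent,
    # so applying them in either order gives the same store.
    return not (p.touched() & q.touched())
```

Requiring each patch's context to hold before it is applied mirrors the way Darcs patches only make sense in the repository state they were recorded against; the inverse is what lets a patch be undone or moved past another one.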

Paper presented at ICSOFT 2007, Barcelona, Spain, July 2007. Download PDF

More RDF Query/Path Stuff

The RDF query/manipulation proposals are coming out of the
woodwork on www-rdf-rules:

  • XR does RDF extraction from XML
  • Rx4RDF is “a specification and reference implementation for querying, transforming
    and updating W3C’s RDF by specifying a deterministic mapping of the RDF
    model to the XML data model defined by XPath.”
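To give a feel for what mapping RDF onto the XPath data model buys, the sketch below groups triples by subject into a small XML tree and then answers questions with ElementTree's limited XPath support. The element layout (`resource`, `about`, one child element per predicate) is a hypothetical mapping chosen for this example and is not the mapping Rx4RDF specifies; it also assumes predicate names are valid XML element names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class Ref(str):
    """Marks an object that names another resource rather than a literal."""


def triples_to_xml(triples):
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))
    root = ET.Element("graph")
    # Sorting subjects and properties keeps the mapping deterministic.
    for s in sorted(by_subject):
        res = ET.SubElement(root, "resource", about=s)
        for p, o in sorted(by_subject[s]):
            prop = ET.SubElement(res, p)  # predicate must be a valid XML name
            if isinstance(o, Ref):
                prop.set("resource", o)
            else:
                prop.text = o
    return root


triples = [("alice", "name", "Alice"),
           ("alice", "knows", Ref("bob")),
           ("bob", "name", "Bob")]
root = triples_to_xml(triples)
# Whom does alice know?
known = [e.get("resource") for e in root.findall("./resource[@about='alice']/knows")]
```

Because the mapping is deterministic, the same graph always produces the same tree, so an XPath expression such as `./resource[@about='alice']/knows` has a stable meaning over the RDF data.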