RDF, the core data format for the Semantic Web, is increasingly being deployed both from automated sources and via human authoring either directly or through tools that generate RDF output. As individuals build up large amounts of RDF data and as groups begin to collaborate on authoring knowledge stores in RDF, the need for some kind of version management becomes apparent. While there are many version control systems available for program source code and even for XML data, the use of version control for RDF data is not a widely explored area. This paper examines an existing version control system for program source code, Darcs, which is grounded in a semi-formal theory of patches, and proposes an adaptation to directly manage versions of an RDF triple store.
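To make the idea concrete, here is a minimal sketch (my own illustration, not the paper's actual formalism) of a Darcs-style patch over an RDF triple store: a patch is a set of triples to delete and a set to add; it can be applied to a store, inverted, and two patches commute when they touch disjoint triples.

```python
# Sketch of a Darcs-style patch over an RDF triple store.
# A triple is just a (subject, predicate, object) tuple and a
# store is a set of triples.

class Patch:
    def __init__(self, deletions=(), additions=()):
        self.deletions = frozenset(deletions)
        self.additions = frozenset(additions)

    def apply(self, store):
        """Apply to a store (a set of triples), returning a new store."""
        return (store - self.deletions) | self.additions

    def invert(self):
        """The inverse patch undoes this one."""
        return Patch(deletions=self.additions, additions=self.deletions)

    def commutes_with(self, other):
        """Patches commute trivially when neither touches the other's triples."""
        mine = self.deletions | self.additions
        theirs = other.deletions | other.additions
        return mine.isdisjoint(theirs)

store = {("alice", "knows", "bob")}
p = Patch(additions={("bob", "knows", "carol")})
store2 = p.apply(store)
assert p.invert().apply(store2) == store
```

The interesting part of a real patch theory, of course, is what happens when patches do overlap; the sketch only covers the easy disjoint case.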
I’ve been using screencasts again this year in COMP249 (Web Technology) and have settled on a fairly stable way of producing them using Camtasia on Windows. This post is here as a container for the videos that I’ve produced this year so they can have a life outside of COMP249 as that website is updated.
I have a PhD scholarship available for a project applying Semantic Web technologies (RDF, SPARQL, Annotea) to the linguistic annotation problem. Here’s an outline:
Shared collaborative distributed annotation using semantic web technologies.
The Semantic Web augments the current Web with machine-processable information enabling humans and machines to work in cooperation; in our context, we are using it as the basis of a linguistic annotation system that is used by language researchers to annotate language resources. This project will look at the issues raised when we allow many people to collaborate on authoring these annotations and making shared annotations available to a community of researchers. This crosses a number of existing areas of research including the semantic web and social computing, and extends the range of interactions available to researchers over the web.
Of course, as usual there is scope for variation on this theme. If you’re interested in this problem space and want to pursue a PhD in Australia, please get in touch. The scholarship is open to both Australian and international students.
Update: Unfortunately this scholarship is no longer available, however Macquarie does have an active scholarships program and from time to time new scholarships are available that could cover this research area. Please check the Macquarie scholarships page for current details and feel free to contact me if you’d like to discuss options.
This is just to welcome any COMP249 (Web Technology) students who might visit following my link from the lecture notes. You’re all welcome to look around at my truly random thoughts.
Here’s an excellent video about text and hypertext, touching on the internals of HTML and XML and how Web 2.0 has changed the role of the reader. The web is using us to tag, classify and label the stuff we write so that we can find it.
So today I make my TV debut! A few weeks ago a film crew from Channel 10 came to shoot a segment for the CSIRO/Channel 10 kids science show SCOPE. The episode, on sound, airs today at 4pm.
I had great fun making the segment, I’ve never done anything like this before and it’s amazing how much work goes in to producing such a short piece. I can’t wait to see how it turns out.
If you watched the show and are interested in having a look at speech, you might want to download one of the programs I used in the show: the WaveSurfer tool (get the binary release for Windows from this page). WaveSurfer will let you record your voice and see spectrogram patterns like the ones I was looking at on the show (you’ll need a microphone for your computer; a cheap headset will do). To get a good-looking display, select “New” from the File menu and choose “Demonstration” when asked what configuration to use. Then press the red record button and speak into the microphone.
Here’s an experiment to try: record yourself saying “hid”, “hod”, “head”, “had”. Look at the spectrogram of each word and see if you can tell the difference. Look particularly for the brighter bands in the display — these are called formants and they’re different for every vowel sound.
Another experiment: record two children and an adult saying the same word, for example “SCOPE”. Can you tell the difference between them? Which looks more similar, the children’s voices or one of the children and the adult?
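For any programmers reading along, here’s a rough sketch of what a spectrogram display computes under the hood: short-time Fourier analysis of the waveform, with formants showing up as bands of energy. This is a toy Python example using a synthetic vowel-like signal (with made-up formant frequencies), not WaveSurfer’s actual code.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000                  # sample rate in Hz
t = np.arange(fs) / fs      # one second of signal

# A crude "vowel": harmonics of a 120 Hz pitch, with extra energy
# near two invented formant frequencies (700 Hz and 1200 Hz).
signal = sum(
    np.sin(2 * np.pi * 120 * k * t) *
    (2.0 if abs(120 * k - 700) < 100 or abs(120 * k - 1200) < 100 else 0.2)
    for k in range(1, 30)
)

# Sxx is a (frequency bins, time frames) array of spectral energy --
# exactly what gets painted on screen as a spectrogram.
f, times, Sxx = spectrogram(signal, fs, nperseg=512)

# The strongest frequency band, averaged over time, sits near a "formant".
peak_hz = f[np.argmax(Sxx.mean(axis=1))]
print(peak_hz)
```

On a real recording you would see the same picture WaveSurfer draws: dark horizontal bands at the formant frequencies, moving as the vowel changes.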
Please leave a comment if you’ve seen the show!
Jon Udell is tagging some of his del.icio.us links to podcasts with “transcriptavailable”; the transcripts have been generated manually. This could be a nice source of data for experiments with information retrieval from podcasts.
Sort of relatedly, I just discovered LibriVox, which hosts volunteer recordings of out-of-copyright literary works (e.g. Project Gutenberg books). I sampled War of the Worlds and the quality seems great. Worth a browse.
My brothers are way more productive than me when it comes to generating cool websites. Via various routes we’ve all ended up working on the web, Patrick on web design and more recently selling baby gifts, Mike on online music stores and other bits of new media goodness. Me, I just teach it and build sites and tools for relatively small groups of people. Between the three of us we could take over the world.
This is a potential project idea for an Honours or Masters student. It might also form the core of a PhD project.
I have an ongoing project looking at processing speech recorded in meeting rooms. There are a number of student projects which could be built around this data. Here are some possible projects suitable for Honours:
- Tracking speakers through a series of meetings. Given a series of meetings with a stable but changing group of people, we would like to model the speakers in the meetings and for any meeting decide who is present and mark their speech turns. This would involve working with audio signals and building speaker models as well as working with the results of an existing speech segmentation system.
- Integrating multiple microphone signals to improve speaker segmentation. This will involve quite a bit of low level audio processing so would be suitable for someone who had an interest in numerical algorithms. There is some existing code to build upon so you wouldn’t be starting from scratch.
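To give a flavour of the first project, here’s a toy illustration (my own sketch, not our actual system) of the core idea behind speaker tracking: model each known speaker as a Gaussian over acoustic feature vectors, then label each new speech segment with the speaker whose model gives it the highest log-likelihood. Real systems would use features like MFCCs and richer models; the random vectors here just stand in for them.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_speaker(features):
    """Diagonal-covariance Gaussian model from a (frames, dims) array."""
    return features.mean(axis=0), features.var(axis=0) + 1e-6

def log_likelihood(features, model):
    """Total Gaussian log-likelihood of the frames under the model."""
    mean, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (features - mean) ** 2 / var)
    return ll.sum()

def identify(segment, models):
    """Return the name of the best-matching speaker model."""
    return max(models, key=lambda name: log_likelihood(segment, models[name]))

# Fake "acoustic features" for two speakers.
alice = rng.normal(0.0, 1.0, size=(200, 12))
bob = rng.normal(3.0, 1.0, size=(200, 12))
models = {"alice": fit_speaker(alice), "bob": fit_speaker(bob)}

segment = rng.normal(3.0, 1.0, size=(50, 12))  # drawn from "bob"
print(identify(segment, models))
```

The actual research problems start where this sketch stops: the group of speakers changes between meetings, segment boundaries come from an imperfect segmentation system, and models have to be updated as new meetings arrive.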
Annotation – Spoken Word Services is another project providing web-based annotation of audio recordings, this time in a learning environment.