The Linked Data/RDF/SPARQL Documentation Challenge finally convinced me that there is something to blog about that might actually be of interest to others (although it is rather unlikely that I will be doing this on a regular basis). Now, with this post’s title I do not want to imply that either RDF or SPARQL is trivial, only that it is dead simple to set up a PHP environment to get your feet wet.
I took my first steps with Linked Data using RDFLib for Python, but for some reason I always end up using PHP as my programming language of choice. For me, there are basically two options when it comes to processing RDF with PHP5 (there are obviously more, such as RAP, but I never dug deep into them).
One of them is ARC2, which I used for my first experiments. For several reasons I was never quite happy with it, though. It is closely tied to a MySQL database, and the last time I looked it was not possible to execute SPARQL queries against in-memory graphs, for example. This is because in-memory graphs are basically PHP arrays, which I didn’t like either: I always ended up with far too many isset calls, leading to verbose and ugly code.
Fortunately, at some point I discovered that librdf has language bindings for PHP, which turned out to be plain awesome. I started writing a simple object-oriented wrapper but eventually stumbled across an existing one (the LibRDF package installed below), so I didn’t even have to do that. I contacted the author: he hasn’t worked on it since 2006 and doesn’t plan to in the future. Since I find it extremely useful (and am amazed that it does not seem to be widely used), I created a GitHub repository for the wrapper. So far, it only contains the original code.
Anyway, in the end this is all you need to get started with RDF and SPARQL using PHP5 on Ubuntu (10.10 in my case):
$ sudo apt-get install php5-librdf librdf-storage-mysql
$ pear install http://reallylongword.org/projects/librdf-php/LibRDF-1.0.0.tgz
If you don’t like PEAR, you can of course also manually download the tarball or clone the git repository.
This installs the librdf module for PHP5, a storage backend for MySQL and the object-oriented wrapper. Now all we need is some data. Let’s use, for no particular reason, Richard Cyganiak’s FOAF file. To start, let’s find out who Richard knows by loading that file into a graph and executing a simple SPARQL query on it:
<?php
require_once('LibRDF/LibRDF.php');
// All models, i.e. graphs, reside in a storage. This defaults to
// memory.
$store = new LibRDF_Storage();
$model = new LibRDF_Model($store);
// Load some data into the model. The format must be declared
// explicitly for the parser, although plugging in something like
// ARC's format detector should be easy. In this case we're
// dealing with an RDF/XML document:
$model->loadStatementsFromURI(
    new LibRDF_Parser('rdfxml'),
    'http://richard.cyganiak.de/foaf.rdf');
// Create a SPARQL query
$query = new LibRDF_Query("
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name1 ?name2
    WHERE
    {
        ?person1 foaf:knows ?person2 .
        ?person1 foaf:name ?name1 .
        ?person2 foaf:name ?name2 .
    }
", null, 'sparql');
// Execute the query. The results of a SPARQL SELECT provide
// array access by using the variables used in the query as keys:
$results = $query->execute($model);
foreach ($results as $result) {
    echo $result['name1'] . " knows " . $result['name2'] . "\n";
}
Here is what we get:
Richard Cyganiak knows Roman Szmidt
Richard Cyganiak knows Jan Kretzschmar
Richard Cyganiak knows Jörg Meltzer
Richard Cyganiak knows Chris Bizer
Richard Cyganiak knows Manuel Schulze
(...)
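As a small variation on the query above, here is a minimal sketch that lists every acquaintance only once, sorted by name. It uses only the wrapper calls already shown. DISTINCT and ORDER BY are standard SPARQL, but I am assuming that the query engine behind your installed librdf version handles them:

```php
<?php
require_once('LibRDF/LibRDF.php');

// Same in-memory setup as in the first example.
$store = new LibRDF_Storage();
$model = new LibRDF_Model($store);
$model->loadStatementsFromURI(
    new LibRDF_Parser('rdfxml'),
    'http://richard.cyganiak.de/foaf.rdf');

// Assumption: DISTINCT and ORDER BY are supported by the installed
// query engine. If they are not, drop them and sort in PHP instead.
$query = new LibRDF_Query("
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT DISTINCT ?name
    WHERE
    {
        ?person1 foaf:knows ?person2 .
        ?person2 foaf:name ?name .
    }
    ORDER BY ?name
", null, 'sparql');

// Each result row is accessed by variable name, as before.
foreach ($query->execute($model) as $result) {
    echo $result['name'] . "\n";
}
```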
If you don’t feel like learning SPARQL at all, you’re still not lost, at least for simple queries. librdf provides a few convenience methods for traversing a graph, which the wrapper exposes as findStatements, getSource, getArc and getTarget:
<?php
require_once('LibRDF/LibRDF.php');
$store = new LibRDF_Storage();
$model = new LibRDF_Model($store);
$model->loadStatementsFromURI(
    new LibRDF_Parser('rdfxml'),
    'http://richard.cyganiak.de/foaf.rdf');
$foafKnows = new LibRDF_URINode("http://xmlns.com/foaf/0.1/knows");
$foafName = new LibRDF_URINode("http://xmlns.com/foaf/0.1/name");
$results = $model->findStatements(null, $foafKnows, null);
foreach ($results as $result) {
    $person1 = $result->getSubject();
    $person2 = $result->getObject();
    $name1 = $model->getTarget($person1, $foafName);
    $name2 = $model->getTarget($person2, $foafName);
    echo "$name1 knows $name2\n";
}
This produces the exact same output as with the SPARQL query above.
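The same pattern-matching call works for any predicate. As a minimal sketch, assuming the same FOAF file and using only the methods from the example above, this lists every resource in the graph that has a foaf:name, together with that name. How a node is printed (for example, how blank nodes or literals are rendered) depends on the wrapper's string conversion, so I am not showing any output here:

```php
<?php
require_once('LibRDF/LibRDF.php');

$store = new LibRDF_Storage();
$model = new LibRDF_Model($store);
$model->loadStatementsFromURI(
    new LibRDF_Parser('rdfxml'),
    'http://richard.cyganiak.de/foaf.rdf');

$foafName = new LibRDF_URINode("http://xmlns.com/foaf/0.1/name");

// null acts as a wildcard, so this matches every statement whose
// predicate is foaf:name, whatever its subject and object are.
foreach ($model->findStatements(null, $foafName, null) as $statement) {
    $subject = $statement->getSubject();
    $name = $statement->getObject();
    echo "$subject is named $name\n";
}
```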
So far, we have loaded our sample data from the web right into memory. But what if we wanted to persist the model to our own database? This is also extremely simple: instead of creating the storage in memory, we create it in one of the available persistent backends. Let’s use MySQL:
// Create a new MySQL storage. The second parameter is NOT the
// name of the MySQL database to use, but the name of the
// triplestore. This makes it possible to create several
// triplestores within one MySQL database. The third parameter is
// a string containing the options for the actual MySQL database.
// They should speak for themselves, except for "new='yes'". If
// this option is given, the necessary table structure is created and
// any existing triples are dropped. You probably only want to use
// it in some kind of setup or installation procedure.
$store = new LibRDF_Storage("mysql", "db",
"new='no',
host='localhost',
database='mydatabase',
user='foo',
password='bar'");
Now we can proceed as above. Statements loaded into the model will now end up in the MySQL database, and queries executed against the model will query that database.
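Put together, a complete script might look like the following sketch. The $firstRun switch is a hypothetical helper of mine, not part of the wrapper, and the connection values are the same placeholders as above. Set it to true once to create the tables and import the data, then leave it at false so that later runs query what is already stored:

```php
<?php
require_once('LibRDF/LibRDF.php');

// Hypothetical switch (not part of LibRDF): true only for the very
// first run, because new='yes' drops any existing triples.
$firstRun = false;

// Placeholder credentials, as in the snippet above.
$options = "new='" . ($firstRun ? 'yes' : 'no') . "',"
    . "host='localhost',database='mydatabase',user='foo',password='bar'";

$store = new LibRDF_Storage("mysql", "db", $options);
$model = new LibRDF_Model($store);

if ($firstRun) {
    // Import the sample data into the MySQL-backed triplestore.
    $model->loadStatementsFromURI(
        new LibRDF_Parser('rdfxml'),
        'http://richard.cyganiak.de/foaf.rdf');
}

// From here on, nothing differs from the in-memory example.
$query = new LibRDF_Query("
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name1 ?name2
    WHERE
    {
        ?person1 foaf:knows ?person2 .
        ?person1 foaf:name ?name1 .
        ?person2 foaf:name ?name2 .
    }
", null, 'sparql');

foreach ($query->execute($model) as $result) {
    echo $result['name1'] . " knows " . $result['name2'] . "\n";
}
```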
As challenged, we have achieved the following:
- Install an RDF store from a package management system on a computer running either Apple’s OSX or Ubuntu Desktop.
- Install a code library (again from a package management system) for talking to the RDF store in either PHP, Ruby or Python.
- Programmatically load some real-world data into the RDF datastore using either PHP, Ruby or Python.
- Programmatically retrieve data from the datastore with SPARQL using either PHP, Ruby or Python.
- Convert retrieved data into an object or datatype that can be used by the chosen programming language (e.g. a Python dictionary).
All of this with easy-to-install software that runs on standard LAMP systems and just a couple of lines of straightforward code. Of course there are a lot more details to write about, and I might do so at some point, but I’ll wrap it up for now. Let me know what you think or ask questions in the comments.