Monday, August 2, 2010

Tabular Data from RDF

RDF data usually needs to be transformed into a tabular form for export to traditional relational database systems. While on the surface they may look quite different, RDF data and tabular data are just different ways of organizing the same thing.  Understanding a little about RDF, how it compares to tables, and how to query it is all that's needed to begin experimenting with generating tables from RDF data, which you can do right here from this posting.

RDF
An RDF data set is a collection of statements of interest about something. Each statement consists of three parts: a subject, a predicate, and an object. Together, the three parts are known as a triple. Triples are just like simple natural language statements. In the natural language statement "Jane Smith's haircolor is blonde.", "Jane Smith" is the subject, "haircolor" is the predicate, and "blonde" is the object. Together they make an assertion of fact about the world.
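To make the idea concrete, here is a minimal Python sketch of a triple as a three-field record. It is an illustration only: the name Triple and its field names are made up for this post, and real RDF identifies subjects and predicates with IRIs rather than the plain strings used here.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate, and an object."""
    subject: str
    predicate: str
    object: str


# The example statement from the text, written as a triple.
statement = Triple("Jane Smith", "haircolor", "blonde")

# A data set is then just a collection of such statements.
dataset = {statement}

print(statement.subject, statement.predicate, statement.object)
```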

RDF Compared to Tables
In a table, like those within relational databases, the subject, predicate, and object are modeled as a row, a column, and the intersecting cell, respectively. For example, if we also made the statement "John Doe's haircolor is brown", we could represent both statements in a simple table as:

person       haircolor   
---------    ---------
Jane Smith   blonde   
John Doe     brown  
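The row, column, and cell mapping can be sketched in a few lines of Python. This is only an illustration with plain strings standing in for RDF terms, and the function name triples_to_table is made up. It also assumes one value per subject and predicate; RDF allows several, in which case this sketch would keep only the last one.

```python
# Each statement is a (subject, predicate, object) tuple of plain strings.
triples = [
    ("Jane Smith", "haircolor", "blonde"),
    ("John Doe", "haircolor", "brown"),
]


def triples_to_table(triples):
    """Pivot triples into {subject: {predicate: object}}, one row per subject."""
    table = {}
    for subject, predicate, obj in triples:
        # The subject picks the row, the predicate picks the column,
        # and the object becomes the value of the intersecting cell.
        table.setdefault(subject, {})[predicate] = obj
    return table


table = triples_to_table(triples)
for person, columns in table.items():
    print(f"{person:<12} {columns['haircolor']}")
```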

Queries
A SPARQL SELECT query can be used to transform RDF data into a table. For example, a 'Persons' table can be obtained from some example RDF data by executing the following SPARQL SELECT query:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix upo:  <http://mydomain.org/myupperontology/myversion/>
prefix o:  <http://mydomain.org/myontology/myversion/>

SELECT ?name ?haircolor ?country ?region
FROM <http://www.compass-point.net/data/ontology.ttl>
WHERE
{
    ?id upo:hasReferenceName ?name . 
    ?id rdf:type o:Person . 
    ?id upo:hasLocation ?loc . 
    ?loc rdf:type upo:Country . 
    ?loc upo:hasReferenceName ?country . 
    ?id o:hasHairColor ?hc . 
    ?hc upo:hasReferenceName ?haircolor . 
    ?loc o:countryIn ?reg . 
    ?reg upo:hasReferenceName ?region .
}

When executed against the RDF data, this query results in the following 'Persons' table:

--------------------------------------------------------------
| name         | haircolor | country | region                |
==============================================================
| "Jane Smith" | "Blonde"  | "China" | "Asia Pacific Region" |
| "John Doe"   | "Brown"   | "UK"    | "UK / Europe Region"  |
--------------------------------------------------------------
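Once the data is in this shape, loading it into a relational database is routine. The sketch below uses Python's built-in sqlite3 module with a throwaway in-memory database. The rows are typed in by hand from the result table above, and the table name persons and the variable names are assumptions made for the example.

```python
import sqlite3

# Rows copied by hand from the 'Persons' result table.
rows = [
    ("Jane Smith", "Blonde", "China", "Asia Pacific Region"),
    ("John Doe", "Brown", "UK", "UK / Europe Region"),
]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is saved
conn.execute(
    "CREATE TABLE persons (name TEXT, haircolor TEXT, country TEXT, region TEXT)"
)
conn.executemany("INSERT INTO persons VALUES (?, ?, ?, ?)", rows)

# Ordinary SQL now works on what started out as RDF statements.
uk_names = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM persons WHERE country = ?", ("UK",)
    )
]
print(uk_names)
```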

In the previous query, note the following:
  • The variable declaration, the line "SELECT ?name ?haircolor ?country ?region", defines the columns of the returned table as 'name', 'haircolor', 'country', and 'region'.
  • The portion after the word WHERE specifies a set of triple patterns that a SPARQL processor must match within the RDF data for the variables to be bound. For example, taken by themselves, the first two lines are two triple patterns that a SPARQL processor would use to find all subjects for which both of the following statements have been made:
  1. A statement asserting that it has a reference name (upo:hasReferenceName).
  2. A statement asserting that it is of type Person (o:Person).
  In effect, a SPARQL SELECT query transforms RDF data into a table.
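For readers who like to see the mechanics, here is a toy Python sketch of triple pattern matching. It is not how a real SPARQL processor is implemented, and it ignores prefixes, data types, and everything else in the language. The identifiers p1, p2, and c1 and the function names are made up. The point is only that a shared variable such as ?id must take the same value in every pattern.

```python
# A tiny hand-made data set; the identifiers p1, p2 and c1 are made up.
triples = [
    ("p1", "hasReferenceName", "Jane Smith"),
    ("p1", "type", "Person"),
    ("p2", "hasReferenceName", "John Doe"),
    ("p2", "type", "Person"),
    ("c1", "hasReferenceName", "China"),
    ("c1", "type", "Country"),
]


def match(pattern, triple, binding):
    """Return binding extended so pattern fits triple, or None if it cannot."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant: it must equal the value in the data.
            return None
    return new


def select(variables, patterns, triples):
    """Match every pattern in turn and return one row per surviving binding."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?name"],
    [("?id", "hasReferenceName", "?name"), ("?id", "type", "Person")],
    triples,
)
print(rows)
```

The country c1 has a reference name too, but it drops out because no statement says it is of type Person.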


Kicking the Tires
You can try this out for yourself. Select and copy the following SPARQL SELECT query, paste it into this SPARQL Processor, choose "text output", and then click the "get results" button.

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix upo:  <http://mydomain.org/myupperontology/myversion/>
prefix o:  <http://mydomain.org/myontology/myversion/>

SELECT ?name ?type ?country
FROM <http://www.compass-point.net/data/ontology.ttl>
WHERE
{
    ?id upo:hasReferenceName ?name . 
    ?id rdf:type ?type . 
    ?type rdfs:subClassOf o:Party . 
    ?id upo:hasLocation ?loc . 
    ?loc o:countryIn o:UKR .     
    ?loc upo:hasReferenceName ?country
}

This query results in a "Parties" table populated with only those parties in the UK / Europe Region.


-----------------------------------------------------------------------------------
| name          | type                                                 | country  |
===================================================================================
| "MusPub Inc." | <http://mydomain.org/myontology/myversion/Publisher> | "France" |
| "John Doe"    | <http://mydomain.org/myontology/myversion/Person>    | "UK"     |
-----------------------------------------------------------------------------------
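If you want to carry the text output further, for example into a spreadsheet or a database load script, it can be split back into rows. The following Python sketch assumes the output looks like the pipe-delimited table above; the function name parse_text_table is made up, and the sample is a trimmed two-column copy of that table. Cells are kept exactly as printed, quotes included, and a cell that itself contains a | character would break this simple approach.

```python
def parse_text_table(text):
    """Turn a pipe-delimited text table into a list of dicts, one per row."""
    lines = [line.strip() for line in text.splitlines()]
    # Keep only the lines that hold cells; skip the ---- and ==== rules.
    cell_lines = [line for line in lines if line.startswith("|")]
    cells = [
        [cell.strip() for cell in line.strip("|").split("|")]
        for line in cell_lines
    ]
    header, data = cells[0], cells[1:]
    return [dict(zip(header, row)) for row in data]


# A trimmed, two-column copy of the 'Parties' output.
sample = """
----------------------------
| name          | country  |
============================
| "MusPub Inc." | "France" |
| "John Doe"    | "UK"     |
----------------------------
"""

rows = parse_text_table(sample)
for row in rows:
    print(row["name"], row["country"])
```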

To create other queries, refer to the example RDF data when crafting your query, and then modify both the part after the word SELECT and the part within the braces after the word WHERE. Note that the triple patterns within the WHERE braces must be separated by a period (written here as space-period-space), and the FROM line must stay the same to access the example RDF data.



What's next
Subsequent articles will branch from here down both practical and conceptual paths, covering
  • How to export from "VeroServe Data" to both Reporting and Integration Services, 
  • How to make data bindings to RDF data in a "VeroMark" semantic client user interface,
  • How "VeroServe Data" supports encapsulation of data, data structure, and ultimately business logic in one place through the use of RDF.

1 comment:

  1. Thanks for the explanation, Russell. I ran my first SPARQL query today! :) ~Chris
