How to Publish Linked Data on the Web?


Authors: Chris Bizer (Web-based System Group, Freie Universität Berlin, Germany); Richard Cyganiak (Web-based System Group, Freie Universität Berlin, Germany); Tom Heath (Knowledge Media Institute, The Open University, Milton Keynes, UK)

Version: 2007-07-16 / 10:45
This is an outdated draft! The latest version of the tutorial can be found at http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/LinkedDataTutorial/

Abstract

This document provides a tutorial on how to publish Linked Data on the Web. After a general overview of the concept of Linked Data, we describe several practical recipes for publishing information as Linked Data on the Web.

1. Introduction: Linked Data on the Web
2. Basic Principles
2.1. Web Architecture
2.2. The RDF Data Model
3. How to name data items?
4. Which vocabularies should I use to represent information?
5. What should I return as RDF description for a URI?
6. How to set links to other data items?
7. Recipes for Serving Information as Linked Data?
8. Testing and Debugging Linked Data
9. Discovering Linked Data on the Web
10. Further Reading

Appendix A: Example HTTP Session
Appendix B: How to get Yourself onto the Web of Data
Appendix C: Changes

1. Introduction: Linked Data on the Web

The goal of Linked Data is to enable people to share structured data on the Web as easily as they can share documents today.

The term Linked Data was coined by Tim Berners-Lee in his Linked Data architecture note. The term refers to a set of best practices on how to publish and interlink structured data on the Web. The basic assumption behind Linked Data is that the value and usefulness of data increases the more it is interlinked with other data. Therefore, typed links between data items from different data sources are central to Linked Data.

The basic tenets of Linked Data are to:

use the RDF data model to publish structured data on the Web
use RDF links to interlink data items from different data sources

Applying both principles leads to the creation of a data commons on the Web. This data commons is often called the Web of Data or Semantic Web.

The Web of Data can be accessed using Linked Data browsers, just as the traditional Web of documents is accessed using HTML browsers. However, instead of following links between HTML pages, Linked Data browsers enable users to navigate between different data sources by following RDF links. This allows the user to start off in one data source, and then move through a potentially endless Web of data sources connected by RDF links. For instance, while looking at data about a person from one source, a user might be interested in information about the person's home town. By following an RDF link, the user can navigate into another data set providing information about that town.

The Web of Data can also be crawled by following RDF links, just as the traditional document Web is crawled by following hypertext links. Based on the crawled data, search engines can provide sophisticated query capabilities, similar to the query capabilities provided by relational databases today. As query results are themselves structured data, and not just links to HTML pages, they can be processed within various types of applications, thus enabling a new class of applications based on the Web of Data.

The glue that holds together the traditional document Web is the hypertext links between HTML pages. The glue of the data web is RDF links. An RDF link simply states that one data item has some kind of relationship to another data item, and these relationships can have different types. For instance, an RDF link that connects data about people can state that two people know each other; an RDF link that connects information about a person with information about publications within a bibliographic database might state that a person is the author of a specific paper.

Having provided a background to Linked Data concepts, the rest of this document is structured as follows: Section 2 outlines the basic principles of Linked Data. Section 3 provides practical advice on how to name data items with URI references. Section 4 discusses terms from well-known vocabularies and data sources which should be reused to represent information. Section 5 explains what should be returned as the RDF description for a URI. Section 6 gives practical advice on how to generate RDF links between data from different data sources. Section 7 presents several complete recipes for publishing different types of information as Linked Data on the Web using existing Linked Data publishing tools. Section 8 discusses testing and debugging Linked Data sources. Finally, Section 9 gives an overview of alternative discovery mechanisms for Linked Data on the Web.

2. Basic Principles

This chapter describes the basic principles of Linked Data. As Linked Data is closely aligned with the general architecture of the Web, we first summarize the basic principles of this architecture. Then we give an overview of the RDF data model and recommend how the data model should be used in the Linked Data context.

2.1. Web Architecture

This section summarizes the basic principles of the Web Architecture and introduces terminology such as resource and representation. For more detailed information please refer to the Architecture of the World Wide Web, Volume One W3C Recommendation and the current findings of the W3C Technical Architecture Group (TAG).

Resources

To publish data on the Web, we first have to identify the items of interest in our domain. They are the things whose properties and relation to each other we want to describe in the data. Within Web Architecture terminology, all items of interest are called resources.

Current draft findings of the W3C Technical Architecture Group (TAG) distinguish between two kinds of resources: information resources and non-information resources (also called 'other resources'). This distinction is quite important in a Linked Data context. All the resources we find on the traditional document Web, such as documents, images, and other media files, are information resources. Many of the things we want to share data about, like people, physical products, places, proteins, scientific concepts, and so on, are not information resources and are therefore called non-information resources. As a rule of thumb, all “real-world objects” that exist outside of the Web are non-information resources.

Resource Identifiers

Resources are identified using Uniform Resource Identifiers (URIs). In the context of Linked Data, we restrict ourselves to using HTTP URIs only and avoid other URI schemes such as URNs and DOIs. HTTP URIs make good names for two reasons: they provide a simple way to create globally unique names without centralized management; and they work not just as a name but also as a means of accessing information about a resource over the Web. The choice to prefer HTTP over other URI schemes is discussed at length in the W3C TAG draft finding URNs, Namespaces and Registries.

Representations

Information resources can have representations. A representation is a stream of bytes in a certain format, such as HTML, RDF/XML, or JPEG. For example, an invoice is an information resource. It could be represented as an HTML page, as a printable PDF document, or as an RDF document. A single information resource can have many different representations, e.g. in different formats, resolution qualities, or natural languages.

Dereferencing HTTP URIs

URI dereferencing is the process of looking up a URI on the Web in order to get information about the referenced resource. The W3C TAG draft finding about Dereferencing HTTP URIs introduced a distinction in how URIs identifying information resources and non-information resources are dereferenced:

Information Resources: When a URI identifying an information resource is dereferenced, the server of the URI owner usually generates a new representation, a new snapshot of the information resource's current state, and sends it back to the client using the HTTP response code 200 OK.
Non-Information Resources cannot be dereferenced directly. Therefore Web architecture uses a trick to enable URIs identifying non-information resources to be dereferenced: Instead of sending a representation of the resource, the server sends the client the URI of an information resource which describes the non-information resource, using the HTTP response code 303 See Other. This is called a 303 redirect. In a second step, the client dereferences this new URI and gets a representation describing the original non-information resource.

Note: There are two approaches that data publishers can use to provide clients with URIs of information resources describing a non-information resource: Hash URIs and 303 redirects. This document focuses mostly on the 303 redirect approach. See section 4.3 of Cool URIs for the Semantic Web for a discussion of the tradeoffs between both approaches.

Content Negotiation

As we noted earlier, a resource may be available in different representation formats. HTTP has a powerful mechanism that allows a server to select the most appropriate representation for a client request: content negotiation. Whenever a client dereferences an HTTP URI, it can send along an HTTP Accept header to indicate what representation format it prefers. If the client wants an HTML representation of a resource, it sends an Accept: text/html, application/xhtml+xml header. If the client wants an RDF/XML representation of a resource, it sends an Accept: application/rdf+xml header. The server then selects the best match from the available representations or generates the desired content on demand, and sends it back to the client using the HTTP response code 200 OK.
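The client side of content negotiation can be sketched in a few lines of Python. This is only an illustration: the helper name build_request is our own, and the request is prepared but never sent, so no network access is needed.

```python
from urllib.request import Request

def build_request(uri: str, want_rdf: bool) -> Request:
    """Prepare (but do not send) a GET request carrying an Accept header."""
    if want_rdf:
        accept = "application/rdf+xml"
    else:
        accept = "text/html, application/xhtml+xml"
    return Request(uri, headers={"Accept": accept})

# A Linked Data browser would ask for RDF/XML ...
req = build_request("http://dbpedia.org/resource/Berlin", want_rdf=True)
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing want_rdf=False yields the header an ordinary HTML browser would send for the same URI.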

With content negotiation, URIs identifying non-information resources can be made to work in both HTML browsers and Linked Data browsers. This is helpful when people happen to come across a Linked Data URI in their normal Web browser.

Content negotiation and 303 redirects play together. Therefore, a data source often serves three URIs for each non-information resource, for instance:

http://www4.wiwiss.fu-berlin.de/factbook/resource/Russia (URI reference identifying the non-information resource Russia)
http://www4.wiwiss.fu-berlin.de/factbook/data/Russia (information resource describing the non-information resource Russia and having RDF/XML representation)
http://www4.wiwiss.fu-berlin.de/factbook/page/Russia (information resource describing the non-information resource Russia and having XHTML representation)

The following steps describe how dereferencing an HTTP URI identifying a non-information resource works together with content negotiation:

The client asks to GET a representation of a URI identifying a non-information resource, in our case a vocabulary URI. As the client wants an RDF/XML representation of the resource, it sends an Accept: application/rdf+xml header along with the request.
The server recognizes that the URI identifies a non-information resource. As the server cannot return a representation of this resource, it answers using the HTTP 303 See Other response code and sends the client the URI of an information resource describing the non-information resource, in our case the location of the RDF content.
The client now asks the server to GET a representation of this information resource, again requesting application/rdf+xml.
The server sends the client an RDF/XML document containing a description of the original resource identified by the vocabulary URI.

A complete example of an HTTP session for dereferencing a URI identifying a non-information resource is given in Appendix A.
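The interplay of 303 redirects and content negotiation can also be simulated without a network. The sketch below is a toy model, not the behaviour of any real server: the functions respond and dereference are hypothetical, and the URI layout is taken from the Factbook example above.

```python
BASE = "http://www4.wiwiss.fu-berlin.de/factbook"

def respond(uri: str, accept: str):
    """Toy server: return (status, redirect target or content type)."""
    prefix, _, name = uri.rpartition("/")
    if prefix == BASE + "/resource":
        # A non-information resource has no representation: redirect with 303
        # to the information resource that matches the Accept header.
        kind = "data" if "application/rdf+xml" in accept else "page"
        return 303, f"{BASE}/{kind}/{name}"
    if prefix == BASE + "/data":
        return 200, "application/rdf+xml"
    if prefix == BASE + "/page":
        return 200, "application/xhtml+xml"
    return 404, None

def dereference(uri: str, accept: str):
    """Toy client: follow at most one 303 redirect."""
    status, value = respond(uri, accept)
    if status == 303:
        uri = value
        status, value = respond(uri, accept)
    return status, uri, value

print(dereference(BASE + "/resource/Russia", "application/rdf+xml"))
print(dereference(BASE + "/resource/Russia", "text/html, application/xhtml+xml"))
```

The same resource URI thus leads a Linked Data browser to the /data/ document and an HTML browser to the /page/ document.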

URI Aliases

Within an open environment like the Web it often happens that multiple information providers talk about the same non-information resource, for instance a geographic location or a famous person. As they do not know about each other, they introduce different URIs for identifying the same real-world object. For instance, DBpedia, a data source providing information that has been extracted from Wikipedia, uses the URI http://dbpedia.org/resource/Berlin to identify Berlin. Geonames, a data source providing information about millions of geographic locations, uses the URI http://sws.geonames.org/2950159 to identify Berlin. As both URIs refer to the same non-information resource, they are called URI aliases. URI aliases are common on the Web of Data, as it cannot realistically be expected that all information providers agree on the same URIs to identify a non-information resource. URI aliases provide an important social function to the Web of Data, as they are dereferenced to different descriptions of the same non-information resource and thus allow different views and opinions to be expressed. In order to still be able to track that different information providers speak about the same non-information resource, it is common practice that information providers set owl:sameAs links to URI aliases they know about. This practice is explained in more detail in Section 6.
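Because owl:sameAs is symmetric and transitive, a client can collect the aliases it has seen into groups. The following Python sketch shows one simple way to do this; the function name alias_sets is our own, and the links are the owl:sameAs statements quoted elsewhere in this tutorial.

```python
def alias_sets(same_as_links):
    """Map every URI to the set of all URIs known to be aliases of it."""
    groups = {}
    for a, b in same_as_links:
        # Merge the groups of both URIs and share the result among all members.
        merged = groups.get(a, {a}) | groups.get(b, {b})
        for uri in merged:
            groups[uri] = merged
    return groups

links = [
    ("http://dbpedia.org/resource/Berlin", "http://sws.geonames.org/2950159"),
    ("http://dbpedia.org/resource/Tim_Berners-Lee",
     "http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007"),
    ("http://www.w3.org/People/Berners-Lee/card#i",
     "http://dbpedia.org/resource/Tim_Berners-Lee"),
]

groups = alias_sets(links)
for uri in sorted(groups["http://www.w3.org/People/Berners-Lee/card#i"]):
    print(uri)
```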

Data Items

Within this tutorial we use a new term, which is not part of the standard Web Architecture terminology but useful within the Linked Data context: data item. The term data item refers to the description of a non-information resource that a client obtains by dereferencing a specific URI that identifies this non-information resource. For example: Dereferencing the URI http://dbpedia.org/resource/Berlin asking for application/rdf+xml gives you a data item that is equal to the RDF description of http://dbpedia.org/resource/Berlin within the information resource http://dbpedia.org/data/Berlin. Using this new term makes sense in a Linked Data context as it is common practice to use multiple URI aliases to refer to the same non-information resource and as different URI aliases dereference to different descriptions of the resource. When you interpret the Web of Data as a set of interlinked databases, a data item would equal a record in a specific database.

2.2. The RDF Data Model

When publishing Linked Data on the Web, we represent information about resources using the Resource Description Framework (RDF). RDF provides a data model that is extremely simple on the one hand but strictly tailored towards Web architecture on the other hand.

In RDF, a description of a resource is represented as a number of triples. The three parts of each triple are called its subject, predicate, and object. A triple mirrors the basic structure of a simple sentence. For example: Chris (subject) has the email address (predicate) chris@bizer.de (object).

The subject of a triple is the URI of the described resource. The object can either be a simple literal value, like a string, number, or date; or the URI of another resource that is somehow related to the subject. The predicate, in the middle, indicates what kind of relation exists between subject and object, e.g. is this the name or date of birth (in the case of a literal), or is this the employer or someone the person knows (in the case of another resource). The predicate is a URI too. These predicate URIs come from vocabularies, collections of URIs that can be used to represent information about a certain domain. Please refer to Section 4 for more information about which vocabularies to use in a Linked Data context.

A set of RDF triples can be seen as an RDF graph. The URIs occurring as subject and object URIs are the nodes in the graph, and each triple is a directed arc that connects the subject to the object.

Two principal types of RDF triples can be distinguished: Literal triples and RDF links.

Literal Triples have an RDF literal such as a string, number, or date as object. Literal triples are used to describe the properties of resources. For instance, literal triples are used to describe the name or date of birth of a person.
RDF Links represent typed links between two resources. RDF links consist of three URI references. The URIs in the subject and the object position of the link identify the interlinked resources. The URI in the predicate position defines the type of the link. For instance, an RDF link can state that a person is employed by an organization. Another RDF link can state that the person knows certain other people.
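The distinction between literal triples and RDF links can be made concrete with a small Python sketch. The types URI and Triple and the helper functions are hypothetical names chosen for illustration; the example data uses two FOAF properties and URIs that appear in this tutorial.

```python
from typing import NamedTuple, Union

class URI(str):
    """Marks a string as a URI reference rather than a literal value."""

class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, str, int]

FOAF = "http://xmlns.com/foaf/0.1/"
CYGRI = URI("http://richard.cyganiak.de/foaf.rdf#cygri")
TIMBL = URI("http://www.w3.org/People/Berners-Lee/card#i")

triples = [
    Triple(CYGRI, URI(FOAF + "name"), "Richard Cyganiak"),  # literal triple
    Triple(CYGRI, URI(FOAF + "knows"), TIMBL),              # RDF link
]

def is_rdf_link(t: Triple) -> bool:
    """An RDF link has a URI in the object position; a literal triple does not."""
    return isinstance(t.obj, URI)

def nodes(graph) -> set:
    """Nodes of the RDF graph: all URIs occurring as subject or object."""
    found = {t.subject for t in graph}
    found |= {t.obj for t in graph if is_rdf_link(t)}
    return found

for t in triples:
    print("RDF link" if is_rdf_link(t) else "literal triple", t.predicate)
```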

RDF links form the foundation for the Web of Data as dereferencing the URIs that form the parts of the link will give us descriptions of the interlinked resources. These descriptions contain additional RDF links which consist of other URIs that in turn can also be dereferenced, and so on. This is how individual resource descriptions are woven into the Web of Data. This is also how the Web of Data can be navigated using a Linked Data browser or crawled by the robot of a search engine.

Let's take an RDF browser like Disco or Tabulator as an example. The surfer uses the browser to display information about Richard from his FOAF profile. Richard has identified himself with the URI http://richard.cyganiak.de/foaf.rdf#cygri. The browser dereferences this URI over the Web asking for content type application/rdf+xml and displays the retrieved information. Within his profile, Richard says that he is based near Berlin, using the DBpedia URI http://dbpedia.org/resource/Berlin as URI alias for the non-information resource Berlin. As the surfer is interested in Berlin, he instructs the browser to dereference this URI by clicking on it. The browser now dereferences this URI asking for application/rdf+xml.

[Figure: Dereferencing URIs, Step One]

After being redirected with an HTTP 303 response code, the browser retrieves an RDF graph describing Berlin in more detail. A part of this graph is shown below. The graph contains a literal triple stating that Berlin has 3,405,259 inhabitants and another RDF link to a non-information resource representing a list of German cities.

[Figure: Dereferencing URIs, Step Two]

As both RDF graphs share the URI http://dbpedia.org/resource/Berlin, they naturally merge together, as shown below.

[Figure: Dereferencing URIs, Step Three]
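Merging on shared URIs can be illustrated by treating each RDF graph as a set of triples, so that merging is just set union. This is a minimal sketch: the predicate ex:population is a placeholder of our own and not the property DBpedia actually uses, and the helper outgoing is hypothetical.

```python
RICHARD = "http://richard.cyganiak.de/foaf.rdf#cygri"
BERLIN = "http://dbpedia.org/resource/Berlin"

# Graph retrieved from Richard's FOAF profile.
profile_graph = {(RICHARD, "foaf:based_near", BERLIN)}

# Graph retrieved by dereferencing the Berlin URI.
# "ex:population" is a placeholder predicate used only for this sketch.
berlin_graph = {(BERLIN, "ex:population", 3405259)}

# Both graphs use the same URI for Berlin, so the union connects them.
merged = profile_graph | berlin_graph

def outgoing(graph, node):
    """Return the (predicate, object) arcs that leave a node."""
    return {(p, o) for s, p, o in graph if s == node}

# Follow the link from Richard to Berlin, then read Berlin's properties.
for _, town in outgoing(merged, RICHARD):
    print(town, outgoing(merged, town))
```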

The surfer might also be interested in other German cities. Therefore he lets the browser dereference the URI identifying the list. The retrieved RDF graph contains more RDF links to German cities, for instance Hamburg and München, as shown below.

[Figure: Dereferencing URIs, Step Four]

Seen from a Web perspective, the most valuable links are those that connect a resource to “external” data published by other data sources, because they link up different islands of data into a Web. Technically, such an external RDF link is an RDF triple which has a subject URI from one data source and an object URI from another data source. The box below contains different external RDF links taken from various data sources on the Web.

Examples of External RDF Links

# RDF links taken from DBpedia
<http://dbpedia.org/resource/Berlin> owl:sameAs <http://sws.geonames.org/2950159> .
<http://dbpedia.org/resource/Tim_Berners-Lee> owl:sameAs <http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007> .

# RDF links taken from Tim Berners-Lee's FOAF profile
<http://www.w3.org/People/Berners-Lee/card#i> owl:sameAs <http://dbpedia.org/resource/Tim_Berners-Lee> .
<http://www.w3.org/People/Berners-Lee/card#i> foaf:knows <http://www.w3.org/People/Connolly/#me> .

# RDF links taken from Richard Cyganiak's FOAF profile
<http://richard.cyganiak.de/foaf.rdf#cygri> foaf:knows <http://www.w3.org/People/Berners-Lee/card#i> .
<http://richard.cyganiak.de/foaf.rdf#cygri> foaf:topic_interest <http://dbpedia.org/resource/Semantic_Web> .
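A rough way to spot external RDF links programmatically is to compare the hosts of the subject and object URIs. The Python sketch below makes the simplifying assumption that one host name corresponds to one data source, which will not hold for every publisher; the function name is our own.

```python
from urllib.parse import urlsplit

def is_external_link(subject_uri: str, object_uri: str) -> bool:
    """Heuristic: a link is external if subject and object live on different hosts."""
    return urlsplit(subject_uri).hostname != urlsplit(object_uri).hostname

print(is_external_link("http://dbpedia.org/resource/Berlin",
                       "http://sws.geonames.org/2950159"))
print(is_external_link("http://dbpedia.org/resource/Berlin",
                       "http://dbpedia.org/resource/Semantic_Web"))
```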

Benefits of using the RDF Data Model in the Linked Data Context

The main benefits of using the RDF data model in a Linked Data context are that:

Clients can look up every URI within an RDF graph over the Web to retrieve additional information.
Information from different sources merges naturally.
The data model enables you to set links between data items over the Web.
Combined with schema languages such as RDF-S or OWL, the data model allows you to use as much or as little structure as you need, meaning that you can represent tightly structured data as well as semi-structured data.

RDF Features Best Avoided in the Linked Data Context

In order to make it easier for clients to merge and query your data, we recommend using only a subset of the RDF features rather than the full expressivity of the RDF data model. In particular:

We discourage the use of blank nodes. It is impossible to set external RDF links to a blank node, and merging of data from different sources becomes much more difficult when blank nodes are used. Therefore, all resources of any importance should be named using URI references. Note that the current FOAF specification has also dropped blank nodes in favour of URI references (see rdf:about="#me" within their example, and Tim Berners-Lee's Give yourself a URI post on the topic).
We discourage the use of RDF reification, as the semantics of reification are unclear and as reified statements are rather cumbersome to query with the SPARQL query language. Metadata can be attached to the information resource instead, as described in Section 5.
You should think twice before using RDF collections or RDF containers, as they do not work well together with SPARQL. Does your application really need a collection or a container, or can the information also be expressed using multiple triples having the same predicate? The second option makes SPARQL queries straightforward.

3. How to name data items?

Resources are named with URI references. When publishing Linked Data, you should put some effort into choosing good URIs for your resources.

On the one hand, they should be good names that other publishers can use confidently to link to your resources in their own data. On the other hand, you will have to put technical infrastructure in place to make them dereferenceable, and this may put some constraints on what you can do.

This section lists, in loose order, some things to keep in mind.

Use HTTP URIs for everything. The http:// scheme is the only URI scheme that is widely supported in today's tools and infrastructure. All other schemes require extra effort for resolver web services, dealing with identifier registrars, and so on. The arguments in favour of using HTTP are discussed at length in several places, e.g. in Names and addresses by Norman Walsh, and URNs, Namespaces and Registries (draft) by the W3C TAG.
Define your URIs in an HTTP namespace under your control, where you actually can make them dereferenceable.
Keep implementation cruft out of your URIs. Short, mnemonic names are better. Consider these two examples:
- http://dbpedia.org/resource/Berlin
- http://www4.wiwiss.fu-berlin.de:2020/demos/dbpedia/cgi-bin/resources.php?id=Berlin
Try to keep your URIs stable and persistent. Changing your URIs later on will break existing links, so it may be worth putting some extra thought into them early on.
The URIs you can choose are constrained by your technical environment. If your server is called demo.serverpool.wiwiss.example.org and getting another domain name is not an option, then your URIs will have to begin with http://demo.serverpool.wiwiss.example.org/. If you cannot run your server on port 80, then your URIs may have to begin with http://demo.serverpool.wiwiss.example.org:2020/. If possible, you should clean up those URIs by adding some URI rewriting rules to the configuration of your webserver.
We often end up with three URIs related to a single non-information resource:
1. an identifier for the resource,
2. an identifier for a related information resource suitable to HTML browsers (with a web page representation),
3. an identifier for a related information resource suitable to RDF browsers (with an RDF/XML representation).
Here are several ideas for choosing these related URIs, each listed in the order given above:

1. http://dbpedia.org/resource/Berlin
2. http://dbpedia.org/page/Berlin
3. http://dbpedia.org/data/Berlin

1. http://id.dbpedia.org/Berlin
2. http://pages.dbpedia.org/Berlin
3. http://data.dbpedia.org/Berlin

1. http://dbpedia.org/Berlin
2. http://dbpedia.org/Berlin.html
3. http://dbpedia.org/Berlin.rdf
You will often use some kind of primary key inside your URIs, to make sure that all your URIs are unique. If you can, use a key that is meaningful inside your domain. For example, when dealing with books, making the ISBN part of the URI is better than using the primary key of an internal database table. This also makes equivalence mining to derive RDF links easier.
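As an illustration of the first naming pattern above, the following Python sketch derives the three related URIs from a base namespace and a key. The function related_uris and its treatment of spaces are assumptions made for this example, not a prescribed scheme.

```python
from urllib.parse import quote

def related_uris(base: str, key: str) -> dict:
    """Derive resource, page and data URIs from a namespace and a domain key."""
    # Replace spaces and percent-encode anything unsafe in a path segment.
    local = quote(key.replace(" ", "_"), safe="")
    return {
        "resource": f"{base}/resource/{local}",  # identifies the thing itself
        "page": f"{base}/page/{local}",          # HTML description
        "data": f"{base}/data/{local}",          # RDF/XML description
    }

for role, uri in related_uris("http://dbpedia.org", "Berlin").items():
    print(role, uri)
```

Keeping the key identical across all three URIs makes the 303 redirect from the resource URI to the page or data URI a simple string rewrite.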

Examples of cool URIs:

4. Which vocabularies should I use to represent information?

In order to make it as easy as possible for client applications to process your data, you should reuse terms from well-known vocabularies wherever possible. You should only define new terms yourself if you cannot find the required terms in existing vocabularies.

4.1 Reusing existing terms

A set of well-known vocabularies has evolved in the Semantic Web community. Please check whether your data can be represented using terms from these vocabularies before defining any new terms:

Friend-of-a-Friend (FOAF), vocabulary for describing people.
Dublin Core (DC) defines general metadata attributes. See also their new domains and ranges draft.
Semantically-Interlinked Online Communities (SIOC), vocabulary for representing online communities.
Description of a Project (DOAP), vocabulary for describing projects.
Simple Knowledge Organization System (SKOS), vocabulary for representing taxonomies and loosely structured knowledge.
Music Ontology provides terms for describing artists, albums and tracks.
Review Vocabulary, vocabulary for representing reviews.
Creative Commons (CC), vocabulary for describing license terms.

A more extensive list of well-known vocabularies is maintained by the Linking Open Data community project within the ESW Wiki. A listing of the 100 most common RDF namespaces (August 2006) is provided by UMBC eBiquity Group.

It is common practice to mix terms from different vocabularies. We especially recommend the use of rdfs:label and foaf:depiction properties whenever possible as these terms are supported by many client applications.

If you need URI references for identifying general-purpose concepts like geographic places, research areas, general topics, artists, books or CDs, you should consider using concept URIs from data sources within the W3C Linking Open Data community project, for instance Geonames, DBpedia, Musicbrainz, dbtune or the RDF Book Mashup. The benefits of using concept URIs from these data sources are two-fold:

The URIs are dereferenceable, meaning that a description of the concept can be retrieved from the Web. For instance, using the DBpedia URI http://dbpedia.org/page/Doom to identify the computer game Doom gives you an extensive description of the game including abstracts in 10 different languages and various classifications.
The URIs are interlinked with URIs from other data sources. For instance, you can navigate from the DBpedia URI http://dbpedia.org/resource/Berlin to data about Berlin provided by Geonames and EuroStat. Therefore, by using concept URIs from these datasets, you interlink your data with a rich and fast-growing network of other data sources.

A more extensive list of datasets with dereferenceable URIs is maintained by the Linking Open Data community project within the ESW Wiki.

Good examples of how terms from different well-known vocabularies are mixed in one document and how existing concept URIs are reused are given by the FOAF profiles of Tim Berners-Lee and Ivan Herman.

4.2 How to define your terms?

You should only define terms that are not already defined within well-known vocabularies. In particular this means not defining completely new vocabularies from scratch, but instead extending existing vocabularies to represent your data as required.

You can define vocabularies using the RDF Vocabulary Description Language 1.0: RDF Schema or the Web Ontology Language (OWL). In open environments like the Web, where multiple parties publish vocabulary definitions and refer to vocabulary definitions from other people, it is essential to follow these guidelines:

Provide for both humans and machines. At this stage in the development of the Web of Data, more people will be coming across your code than machines, even though the Web of Data is meant for machines in the first instance. Don't forget to add prose, e.g. rdfs:comment for each term invented. Always provide a label for each term using the rdfs:label property.
Make term URIs dereferenceable. It is essential that term URIs are dereferenceable so that clients can look up the definition of a term. Therefore you should make term URIs dereferenceable following the W3C Best Practice Recipes for Publishing RDF Vocabularies.
Make use of other people's terms. Using other people's terms, or providing mappings to them, helps to promote the level of data interchange on the Web of Data, in the same way that hypertext links built the traditional document Web.
State all important information explicitly. For example, state all ranges and domains explicitly. Remember: humans can often do guesswork, but machines can't. Don't leave important information out!
Do not create over-constrained, brittle models; leave some flexibility for growth. For instance, if you use full-featured OWL to define your vocabulary, you might state things that lead to unintended consequences and inconsistencies when somebody else references your term within a different vocabulary definition. Therefore, unless you know exactly what you are doing, use RDF-Schema to define vocabularies.

The following example contains a definition of a class and a property following the rules above. The example uses the Turtle syntax. Namespace declarations are omitted.

# Definition of the class "Lover"
<http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/LinkedDataTutorial/LoveVocabulary#Lover>
         rdf:type rdfs:Class ;
         rdfs:label "Lover"@en ;
         rdfs:label "Liebender"@de ;
         rdfs:comment "A person who loves somebody."@en ;
         rdfs:comment "Eine Person die Jemanden liebt."@de ;
         rdfs:subClassOf foaf:Person .

# Definition of the property "love"
<http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/LinkedDataTutorial/LoveVocabulary#love>
         rdf:type rdf:Property ;
         rdfs:label "love"@en ;
         rdfs:label "lieben"@de ;
         rdfs:comment "Relation between a lover and a loved person."@en ;
         rdfs:comment "Beziehung zwischen einem Liebenden und einer geliebten Person."@de ;
         rdfs:subPropertyOf foaf:knows ;
         rdfs:range foaf:Person ;
         rdfs:domain <http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/LinkedDataTutorial/LoveVocabulary#Lover> .

5. What should I return as RDF description for a URI?

So, assuming we have expressed all our data in RDF triples: What triples should go into the RDF representation that is returned (after a 303 redirect) in response to dereferencing the URI of a data item?

The description: The representation should include all triples from your dataset that have the resource's URI as the subject. This is the immediate description of the resource.
Backlinks: The representation should also include all triples from your dataset that have the resource's URI as the object. This is redundant, as these triples can already be retrieved from their subject URIs, but it allows us to traverse links in either direction.
Related descriptions: You may include any additional information about related resources that may be of interest in typical usage scenarios. For example, you may want to send information about the author along with information about a book, because many clients interested in the book may also be interested in the author. This should not be done excessively, however: returning a megabyte of RDF can be considered bad in most cases.
Metadata: The representation should contain any metadata you want to attach to your published data, such as a URI identifying the author and licensing information. These should be recorded as RDF descriptions of the information resource that describes a data item; that is, the subject of the RDF triples should be the URI of the information resource. Attaching meta-information to that information resource, rather than attaching it to the described resource itself or to specific RDF statements about the resource (as with RDF reification), plays nicely together with using Named Graphs and the SPARQL query language within Linked Data client applications. In order to enable information consumers to use your data under clear legal terms, each RDF document should contain a license under which the content can be used. Please refer to Creative Commons or Talis for standard licenses.
Syntax: There are various ways to serialize RDF descriptions. Your data source should at least provide RDF/XML descriptions, as RDF/XML is the only official syntax for RDF. As RDF/XML is not very human-readable, your data source could additionally provide Turtle descriptions when asked for MIME-type application/x-turtle. In some rare cases where you think people might want to use your data together with XML technologies such as XSLT or XQuery, you might additionally want to serve a TriX serialization, as TriX works better with these technologies than RDF/XML.

The following example shows the Turtle representation of the information resource http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/LinkedDataTutorial/ChrisAboutRichard describing Richard (namespace declarations are omitted):

# Metadata and Licensing Information
<http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/LinkedDataTutorial/ChrisAboutRichard>
      dc:author <http://www.bizer.de#chris> ;
      dc:date "2007-07-13"^^xsd:date ;
      cc:license <http://web.resource.org/cc/PublicDomain> .
 
# The description
<http://richard.cyganiak.de/foaf.rdf#cygri> 
      foaf:name "Richard Cyganiak" ;
      foaf:topic_interest <http://dbpedia.org/resource/Category:Databases> ;
      foaf:topic_interest <http://dbpedia.org/resource/MacBook_Pro> .
 
# Backlinks
<http://www.bizer.de#chris> foaf:knows <http://richard.cyganiak.de/foaf.rdf#cygri> .
<http://www4.wiwiss.fu-berlin.de/is-group/resource/projects/Project3> doap:developer 
      <http://richard.cyganiak.de/foaf.rdf#cygri> .
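The selection of the description and the backlinks can be sketched in a few lines of Python. This is only an illustration, assuming the dataset is held in memory as plain (subject, predicate, object) string tuples; the function name describe and the toy data are hypothetical and not part of any tool mentioned in this tutorial.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings

def describe(uri: str, dataset: Iterable[Triple]) -> List[Triple]:
    """Return the immediate description of `uri`, followed by its backlinks."""
    triples = list(dataset)
    # The description: all triples with the URI as subject.
    description = [t for t in triples if t[0] == uri]
    # Backlinks: all triples with the URI as object (self-links are not repeated).
    backlinks = [t for t in triples if t[2] == uri and t[0] != uri]
    return description + backlinks

# Toy data for illustration only.
CYGRI = "http://richard.cyganiak.de/foaf.rdf#cygri"
CHRIS = "http://www.bizer.de#chris"
DATASET = [
    (CYGRI, "foaf:name", "Richard Cyganiak"),
    (CHRIS, "foaf:knows", CYGRI),
    (CHRIS, "foaf:name", "Chris Bizer"),
]
result = describe(CYGRI, DATASET)
```

The triple that only describes Chris is left out, because Richard's URI appears in it neither as subject nor as object. Related descriptions and metadata would have to be added separately.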

6. How to set RDF Links to other Data Items?

RDF links enable Linked Data browsers and crawlers to navigate between data sources and to discover additional data.

Which RDF properties are used as predicates within RDF links depends entirely on the application domain. Commonly used linking properties within the domain of describing people are, for instance, foaf:knows, foaf:based_near and foaf:topic_interest. Examples of combining these properties with property values from DBpedia, the DBLP bibliography and the RDF Book Mashup are found in Tim Berners-Lee's and Ivan Herman's FOAF profiles.

It is common practice to use the owl:sameAs property for stating that another data source contains additional information about a specific non-information resource. An owl:sameAs link indicates that two URI references actually refer to the same thing. Therefore, owl:sameAs is used to map between different URI aliases (see section 2.1). An example of using owl:sameAs to indicate that two URIs talk about the same thing is again Tim's FOAF profile, which states that http://www.w3.org/People/Berners-Lee/card#i identifies the same resource as http://www4.wiwiss.fu-berlin.de/bookmashup/persons/Tim+Berners-Lee and http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007. Other usage examples are found within DBpedia and the Berlin DBLP server.

RDF links can be set manually, which is usually the case for FOAF profiles, or they can be generated by automated linking algorithms. The latter approach is usually taken to interlink large datasets consisting of thousands of data items.

6.1 Setting RDF Links Manually

To manually set RDF links you first need an idea about the datasets you want to link to. In order to get an overview of different datasets that can be used as linking targets, please refer to the Linking Open Data Dataset list. Once you have identified particular datasets as suitable linking targets, you can manually search within these for the URI references you want to link to. If a data source doesn't provide a search interface, for instance a SPARQL endpoint or an HTML Web form, you can use Linked Data browsers like Tabulator or Disco to explore the dataset and find the right URIs.

You can use services such as Uriqr or Sindice to search for existing URIs and to choose the most popular one if you find several candidates. Uriqr allows you to find URIs for people you know, simply by searching for their name. Results are ranked according to how heavily a particular URI is referenced in RDF documents on the Web, but you will need to apply a little bit of human intelligence in picking the most appropriate URI to use. Sindice indexes the Semantic Web and can tell you which sources mention a certain URI. Therefore, the service can help you to choose the most popular URI for a concept.

Remember that data sources might use HTTP-303 redirects to redirect clients from URIs identifying non-information resources to URIs identifying information resources that describe the non-information resources. In this case, make sure that you link to the URI reference identifying the non-information resource, and not the document about it.

6.2 Auto-generating RDF Links

The approach described above clearly does not scale to situations where large datasets are to be interlinked, for instance linking 70 000 places in DBpedia to their corresponding entries in Geonames. In such cases, it makes sense to use an automated record linkage algorithm to generate RDF links between data sources.

Record linkage is a well-known problem in the database community. The Linking Open Data Project collects material related to using record linkage algorithms in the Linked Data context on the Equivalence Mining wiki page.

There is still a lack of good, easy-to-use tools to auto-generate RDF links. Therefore it is common practice to implement dataset-specific record linkage algorithms to generate RDF links between data sources. In the following we describe two classes of such algorithms:

Simple pattern-based Algorithms

Within various domains, there are generally accepted naming schemata. For instance, in the publication domain there are ISBN numbers, in the financial domain there are ISIN identifiers. If these identifiers are used as part of HTTP URIs identifying particular data items, it is possible to use extremely simple pattern-based algorithms to generate RDF links between data items.

An example of a data source using ISBN numbers as part of its URIs is the RDF Book Mashup, which for instance uses the URI http://www4.wiwiss.fu-berlin.de/bookmashup/books/0747581088 to identify the book 'Harry Potter and the Half-blood Prince'. Having the ISBN number in these URIs made it easy for DBpedia to generate owl:sameAs links between books within DBpedia and the Book Mashup. DBpedia simply used the following pattern-based algorithm:

Iterate over all books within DBpedia that have an ISBN number.
Create an owl:sameAs link between the URI of a book in DBpedia and the corresponding Book Mashup URI using the following pattern for Book Mashup URIs: http://www4.wiwiss.fu-berlin.de/bookmashup/books/{ISBN number}.

Running this algorithm against all books within DBpedia resulted in 9000 RDF links which were merged with the DBpedia dataset. For instance, the resulting link for the Harry Potter book is:

http://dbpedia.org/resource/Harry_Potter_and_the_Half-Blood_Prince owl:sameAs
http://www4.wiwiss.fu-berlin.de/bookmashup/books/0747581088
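The two steps of this pattern-based algorithm can be sketched as follows. This is a minimal illustration and not the script DBpedia actually ran: it assumes the books are available as a Python dictionary from DBpedia URI to ISBN, it writes the links as N-Triples lines, and the stripping of hyphens and spaces from the ISBN as well as the second, ISBN-less book URI are assumptions made for the example.

```python
BOOK_MASHUP_PATTERN = "http://www4.wiwiss.fu-berlin.de/bookmashup/books/{isbn}"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def same_as_links(books):
    """Yield one N-Triples line per book that has an ISBN."""
    for dbpedia_uri, isbn in books.items():
        if not isbn:
            continue  # step 1: only books that have an ISBN number
        # Assumption: ISBNs may be stored with hyphens or spaces.
        normalized = isbn.replace("-", "").replace(" ", "")
        # Step 2: fill the ISBN into the Book Mashup URI pattern.
        target = BOOK_MASHUP_PATTERN.format(isbn=normalized)
        yield f"<{dbpedia_uri}> <{OWL_SAME_AS}> <{target}> ."

books = {
    "http://dbpedia.org/resource/Harry_Potter_and_the_Half-Blood_Prince": "0-7475-8108-8",
    "http://dbpedia.org/resource/Example_Book_Without_ISBN": None,  # hypothetical entry
}
links = list(same_as_links(books))
```

Only the book with an ISBN yields a link, and that link points at the Book Mashup URI ending in 0747581088.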

More complex property-based Algorithms

In cases where no common identifiers across datasets exist, it is necessary to employ more complex property-based linkage algorithms. We will outline two algorithms below:

Interlinking DBpedia and Geonames. Information about geographic places appears in DBpedia as well as in the Geonames database. In order to identify places that appear in both datasets, the Geonames team uses a property-based heuristic that is based on article title together with semantic information like latitude and longitude, but also country, administrative division, feature type, population and categories. Running this heuristic against both data sources resulted in 70500 correspondences, which were merged as Geonames owl:sameAs links with the DBpedia dataset.
Interlinking Jamendo and MusicBrainz. Please refer to Yves Raimond's blog post about his approach to interlinking Jamendo and MusicBrainz.
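To give a feeling for what a property-based heuristic looks like, here is a deliberately simplified sketch. It is not the heuristic used by the Geonames team: it only compares names and coordinates, the 10 km threshold is an arbitrary assumption, and the records and example.org URIs are made up for illustration.

```python
import math

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    r = 6371.0  # mean Earth radius in kilometres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def match_places(source_a, source_b, max_km=10.0):
    """Pair records whose names agree (ignoring case) and which lie within max_km."""
    pairs = []
    for a in source_a:
        for b in source_b:
            same_name = a["name"].strip().lower() == b["name"].strip().lower()
            if same_name and distance_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km:
                pairs.append((a["uri"], b["uri"]))
    return pairs

# Hypothetical records: two places called Berlin on different continents.
source_a = [{"uri": "http://example.org/a/Berlin", "name": "Berlin", "lat": 52.52, "lon": 13.40}]
source_b = [
    {"uri": "http://example.org/b/1", "name": "berlin", "lat": 52.52, "lon": 13.41},
    {"uri": "http://example.org/b/2", "name": "Berlin", "lat": 44.47, "lon": -71.18},
]
pairs = match_places(source_a, source_b)
```

Comparing the name alone would have linked both candidates; adding the coordinates rules out the second one. Each resulting pair could then be written out as an owl:sameAs link. The nested loop is quadratic, so a real run over tens of thousands of places would need some form of blocking or indexing.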

7. Recipes for Serving Information as Linked Data?

This chapter provides practical recipes for publishing different types of information as Linked Data on the Web. Information has to fulfil the following minimal requirements to be considered "published as Linked Data on the Web":

Things must be identified with dereferenceable HTTP URIs.
If such a URI is dereferenced asking for the MIME type application/rdf+xml, a data source must return an RDF/XML description of the data item.
If a URI which identifies a non-information resource is dereferenced, a data source must return an HTTP response containing an HTTP 303 redirect to an information resource describing the non-information resource.
Besides RDF links to data items within the same data source, RDF descriptions should also contain RDF links to data items provided by other data sources, so that clients can navigate the Web of Data as a whole by following RDF links.
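The second and third requirement can be sketched as a small dispatch function. This is an illustration only, assuming a URI layout in which /resource/ URIs identify non-information resources and /data/ URIs identify their RDF/XML descriptions; the /page/ prefix for HTML descriptions and the function name respond are assumptions of this sketch. For brevity it ignores q values in the Accept: header.

```python
def respond(path, accept):
    """Return (status, headers) for a request to a hypothetical data source."""
    wants_rdf = "application/rdf+xml" in accept
    if path.startswith("/resource/"):
        # Non-information resource: never answer 200, always redirect with 303.
        name = path[len("/resource/"):]
        target = f"/data/{name}" if wants_rdf else f"/page/{name}"
        return 303, {"Location": target}
    if path.startswith("/data/"):
        # Information resource holding the RDF/XML description.
        return 200, {"Content-Type": "application/rdf+xml"}
    if path.startswith("/page/"):
        # Information resource holding the HTML description.
        return 200, {"Content-Type": "text/html"}
    return 404, {}

rdf_redirect = respond("/resource/Berlin", "application/rdf+xml")
html_redirect = respond("/resource/Berlin", "text/html")
```

A client asking for RDF is sent to /data/Berlin, while a Web browser asking for HTML is sent to /page/Berlin; both URIs describe the same non-information resource.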

Which of the following recipes fits your needs depends on various factors, such as:

How much data do you want to serve? If you only want to publish several hundred RDF triples, you might want to serve them as a static RDF file using Recipe 7.1. If your dataset is larger, you might want to load it into a proper RDF store and put the Pubby Linked Data interface in front of it as described in Recipe 7.3.
How is your data currently stored? If your information is stored in a relational database, you can use D2R Server as described in Recipe 7.2. If the information is already available through some other kind of API, you might implement a wrapper around this API as described in Recipe 7.4. If your information is represented in some other format such as BibTEX, CSV, or Microsoft Excel, you will have to convert it to RDF first as described in Recipe 7.3.
How often does your information change? If your information changes quite frequently, you might not want to convert it to RDF, but prefer approaches which generate RDF views on your data, such as D2R Server (Recipe 7.2) or wrappers (Recipe 7.4).

After you have published your information as Linked Data, you should ensure that there are external RDF links pointing at data items from your dataset, so that RDF browsers and crawlers can find your data. There are two basic ways of doing this:

Add several RDF links to your FOAF profile that point at central data items within your dataset. Assuming that somebody else in the world knows you and references your FOAF profile, your new dataset is now reachable by following RDF links.
Convince the owner of a related data source that is already well interlinked to auto-generate RDF links to data items within your dataset. Alternatively, to make it easier for the owner of the other dataset, create the RDF links yourself and send them to her so that she just has to merge them with her dataset. A project that is extremely open to setting RDF links to other data sources is the DBpedia community project. Just announce your data source on the DBpedia mailing list or send a set of RDF links to the list.

7.1 Serving Static RDF Files

This section still needs work.

Recipe depends on:

Size of file/how many resources
talk about existing resources or new non-information resources
how much access you have to your server configuration

Your file should contain: Description, Backlinks, Metainfo ....

... In the following section we will detail how this can be done, and later discuss strategies for publishing larger, more complex data sets from static files.

Choosing URIs and putting your file at the right place on your server

Talk about # vs. /
Talk about location and metainfo in document.

Setting up the right MIME types for HTTP responses

Todo: httpd.conf examples

Setting up content negotiation

Depends on if new non-information resource or not.

Todo: httpd.conf examples

Serving Larger Static RDF Files

Whilst static RDF files are well suited to serving small, simple data sets, they can also be used for serving larger amounts of more complex data. For example, data that changes relatively infrequently may be stored in a relational database from which daily or weekly dumps to a static RDF file are made. If the RDF file you want to serve is bigger than a typical FOAF file (let's say 1 megabyte, for instance) and you expect that information consumers usually don't want the whole file but just some part of it, the software package Pubby can be of assistance. Pubby allows you to serve, as separate RDF documents, smaller portions of a larger data set contained in one RDF file. In addition, it provides content-negotiated URIs that comply with the HTTP 303 redirect convention for such datasets. Pubby is configured to load RDF files into memory and to serve them as Linked Data using the conf:loadRDF Pubby configuration option. Using Pubby to serve Linked Data from larger static files provides a low-cost route to adoption for those with larger but relatively static data sets.

7.2 Serving Relational Databases

If your data is stored in a relational database it is usually a good idea to leave it there and just publish a Linked Data view on your existing legacy database.

A tool for serving Linked Data views on relational databases is D2R Server. D2R Server relies on a declarative mapping between the schemata of the database and the target RDF terms. Based on this mapping, D2R Server serves a Linked Data view on your database and provides a SPARQL endpoint for the database.

There are several D2R Servers online, for example Berlin DBLP Bibliography Server, Hannover DBLP Bibliography Server, Web-based Systems @ FU Berlin Group Server or the EuroStat Countries and Regions Server.

Publishing a relational database on the Web as Linked Data using D2R Server typically involves the following steps:

Download and install the server software as described in the Quick Start section of the D2R Server homepage.
Have D2R Server auto-generate a D2RQ mapping from the schema of your database (see Quick Start).
Customize the mapping by replacing auto-generated terms with terms from well-known and publicly accessible RDF vocabularies.
Add your new data source to the ESW Wiki datasets list in the category Linked Data and SPARQL endpoint list and set several RDF links from your FOAF profile to the URIs of central data items within your new data source so that crawlers can discover your data.
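The idea of a declarative mapping between a database schema and RDF terms can be illustrated without D2R Server. The following sketch is not the D2RQ mapping language; it is a hypothetical, much-reduced stand-in that assigns a URI pattern to a table and a property URI to each column. The table persons, the example.org URI pattern and the sample row are all made up for the example.

```python
import sqlite3

# Hypothetical mapping: one table, one URI pattern, one property per column.
MAPPING = {
    "table": "persons",
    "uri_pattern": "http://example.org/resource/person/{id}",
    "columns": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "homepage": "http://xmlns.com/foaf/0.1/homepage",
    },
}

def rows_to_triples(conn, mapping):
    """Generate (subject, predicate, object) triples from the mapped table."""
    cols = list(mapping["columns"])
    query = f"SELECT id, {', '.join(cols)} FROM {mapping['table']} ORDER BY id"
    triples = []
    for row in conn.execute(query):
        subject = mapping["uri_pattern"].format(id=row[0])
        for col, value in zip(cols, row[1:]):
            if value is not None:  # NULL columns produce no triple
                triples.append((subject, mapping["columns"][col], value))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, homepage TEXT)")
conn.execute("INSERT INTO persons VALUES (1, 'Alice Example', NULL)")
triples = rows_to_triples(conn, MAPPING)
```

Because the triples are generated from the live table on each call, the database stays the single place where the data is maintained, which is the point of publishing a view instead of a converted copy.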

7.3 Serving other Types of Information

If your information is currently represented in formats such as BibTEX, CSV, or Microsoft Excel and you want to serve the information as Linked Data on the Web, it is usually a good idea to do the following:

Convert your data into RDF using an RDFizing tool. There are two locations where such tools are listed: ConverterToRdf maintained in the ESW Wiki, and RDFizers maintained by the SIMILE team.
After conversion, store your data in an RDF repository. A list of RDF repositories is maintained in the ESW Wiki.
Ideally the chosen RDF repository should come with a Linked Data interface which takes care of making your data Web accessible. As many RDF repositories have not implemented Linked Data interfaces yet, you can also choose a repository that provides a SPARQL endpoint and put Pubby as a Linked Data interface in front of your SPARQL endpoint.

The approach described above is taken by the DBpedia project, among others. The project uses PHP scripts to extract structured data from Wikipedia pages. This data is then converted to RDF and stored in an OpenLink Virtuoso repository which provides a SPARQL endpoint. In order to get a Linked Data view, Pubby is put in front of the SPARQL endpoint.
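If no suitable RDFizing tool exists for your format, the conversion step is often simple enough to script. The sketch below converts CSV text into N-Triples lines using only the Python standard library. It is an illustration under several assumptions: the CSV has an id column from which subject URIs are built, every value is written as a plain literal, and the base URI and the column-to-property table are supplied by the caller.

```python
import csv
import io

def csv_to_ntriples(text, base, predicates):
    """Convert CSV text to a list of N-Triples lines with plain literal objects."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = "<" + base + row["id"] + ">"
        for column, predicate in predicates.items():
            value = row.get(column)
            if value:  # skip empty cells
                # Escape backslashes and double quotes for N-Triples.
                escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                lines.append(f'{subject} <{predicate}> "{escaped}" .')
    return lines

sample = "id,title\n1,Linked Data\n2,\n"
ntriples = csv_to_ntriples(
    sample,
    base="http://example.org/doc/",  # hypothetical base URI
    predicates={"title": "http://purl.org/dc/elements/1.1/title"},
)
```

The resulting lines can be loaded into an RDF repository. The row with the empty title produces no triple.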

7.4 Implementing Wrappers around existing Applications or Web APIs

Large numbers of Web applications have started to make their data available on the Web through Web APIs. Examples of data sources providing such APIs include eBay, Amazon, Yahoo, Google and Google Base. A more comprehensive API list is found at Programmable Web. Different APIs provide diverse query and retrieval interfaces and return results using a number of different formats such as XML, JSON or ATOM. This leads to three general limitations of Web APIs:

their content can not be crawled by search engines
Web APIs can not be accessed using generic data browsers
Mashups must be implemented against a fixed number of data sources. [@@SUGGESTION: "Implementing mashups becomes increasingly complex as the number of data sources grows" - is this the intended meaning]

These limitations can be overcome by implementing Linked Data wrappers around APIs. In general, Linked Data wrappers do the following:

They assign HTTP URIs to data items that should be exposed to the Web.
When one of these URIs is dereferenced asking for application/rdf+xml, the wrapper rewrites the client's request into a request against the underlying API.
The results of the API request are transformed to RDF and sent back to the client.
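These three steps can be sketched as follows. The sketch does not reproduce the RDF Book Mashup or any real Web API: fake_book_api is a stand-in that returns a dictionary instead of calling a remote service, the example.org URI space is hypothetical, and the result is written as N-Triples instead of RDF/XML to keep the example short.

```python
BASE = "http://example.org/books/"  # hypothetical URI space of the wrapper
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

def fake_book_api(isbn):
    """Stand-in for a real Web API call; returns a JSON-like dict or None."""
    catalogue = {"0747581088": {"title": "Harry Potter and the Half-Blood Prince"}}
    return catalogue.get(isbn)

def dereference(uri, api=fake_book_api):
    """Answer a request for one of the wrapper's URIs with (status, body)."""
    if not uri.startswith(BASE):
        return 404, ""
    # Step 2: rewrite the client's request into a request against the API.
    isbn = uri[len(BASE):]
    record = api(isbn)
    if record is None:
        return 404, ""
    # Step 3: transform the API result into RDF.
    title = record["title"].replace("\\", "\\\\").replace('"', '\\"')
    return 200, f'<{uri}> <{DC_TITLE}> "{title}" .\n'

status, body = dereference(BASE + "0747581088")
```

Step 1, assigning HTTP URIs to data items, is implicit in the choice of BASE plus the ISBN as the URI pattern. A real wrapper would additionally perform the 303 redirect and content negotiation before returning the description.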

Examples of Linked Data Wrappers include:

RDF Book Mashup

The RDF Book Mashup makes information about books, their authors, reviews, and online bookstores available as RDF on the Web. The RDF Book Mashup assigns an HTTP URI to each book that has an ISBN number. Whenever one of these URIs is dereferenced, the Book Mashup requests data about the book, its author as well as reviews and sales offers from the Amazon API and the Google Base API. This data is then transformed into RDF and returned to the client.

The RDF Book Mashup is implemented as a small PHP script which can be used as a template for implementing similar wrappers around other Web APIs. More information about the Book Mashup and the relationship of Web APIs to Linked Data in general is available in The RDF Book Mashup: From Web APIs to a Web of Data (Slides).

SIOC Exporters for WordPress, Drupal, phpBB

The SIOC project has developed Linked Data wrappers for several popular blogging engines, content management systems and discussion forums. See SIOC Exporters for an up-to-date list of their wrappers. The project also provides a PHP Export API which enables developers to create further SIOC export tools without the need to get into technical details about how information is represented in RDF.

8. Testing and Debugging Linked Data

After you have published information as Linked Data on the Web, you should test whether your information can be accessed correctly.

An easy way of testing is to see whether your information displays correctly within different Linked Data browsers and whether the browsers can follow RDF links within your data. Therefore, take several URIs from your dataset and enter them into the navigation bar of the following Linked Data browsers:

Tabulator. If Tabulator is very slow at displaying your data it might be an indicator that your RDF graphs are too big. Tabulator also does some basic inferencing over Web data. Therefore, if Tabulator shows strange behavior, it might be an indicator that the RDF-S or OWL schemata you are using contain inconsistent rdfs:subClassOf and rdfs:subPropertyOf declarations.
Disco. The Disco browser uses a 2-second time-out when retrieving data from the Web. Therefore, if Disco does not display your data correctly, it might be an indicator that your server is too slow.

If you run into problems, you should do the following:

Test with cURL whether dereferencing your URIs leads to correct HTTP responses. Richard Cyganiak has published a tutorial on Debugging Semantic Web sites with cURL which leads you through the process.
Use the W3C's RDF Validation service to make sure that your service provides valid RDF/XML.
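Once you have noted down the status codes and headers that cURL shows you, the checks themselves are mechanical. The sketch below applies them to a recorded sequence of responses. It is a hypothetical helper, not part of cURL or of any validator: it assumes each response is written down as a (status, headers) pair, that header names are spelled exactly as shown, and that the client asked for application/rdf+xml.

```python
def check_exchange(responses):
    """Return a list of problems found in a recorded list of (status, headers) pairs."""
    if not responses:
        return ["no responses recorded"]
    problems = []
    *redirects, final = responses
    # Every response before the last one should be a 303 with a Location header.
    for status, headers in redirects:
        if status != 303:
            problems.append(f"expected 303 redirect, got {status}")
        elif "Location" not in headers:
            problems.append("303 response lacks a Location header")
    # The last response should deliver RDF/XML with status 200.
    status, headers = final
    if status != 200:
        problems.append(f"final response has status {status}, expected 200")
    elif not headers.get("Content-Type", "").startswith("application/rdf+xml"):
        problems.append("final response is not served as application/rdf+xml")
    return problems

good = [
    (303, {"Location": "http://dbpedia.org/data/Berlin"}),
    (200, {"Content-Type": "application/rdf+xml;charset=utf-8"}),
]
bad = [(200, {"Content-Type": "text/plain"})]
```

For the first recording the function reports no problems; for the second it reports that the final response is not served as application/rdf+xml, a common symptom of a missing MIME type configuration on the server.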

If you can not figure out yourself what is going wrong, ask on the Linking Open Data mailing list for help.

9. Discovering Linked Data on the Web

The standard way of discovering Linked Data on the Web is by following RDF links within data the client already knows. In order to further ease discovery, information providers can decide to support additional discovery mechanisms.

Ping the Semantic Web

Ping the Semantic Web is a registry service for RDF documents on the Web, which is used by several other services and client applications. Therefore, you can improve the discoverability of your data by registering your URIs with Ping The Semantic Web.

HTML LINK Auto-Discovery

In many cases it also makes sense to set links from existing Web pages to RDF data, for instance from your personal home page to your FOAF profile. Such links can be set using the HTML <link> element in the <head> of your HTML page.

<link rel="alternate" type="application/rdf+xml" 
 href="link_to_the_RDF_version" />

HTML <link> elements are used by browser extensions like Piggybank or Semantic Radar to discover RDF data on the Web.
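How a client might perform this auto-discovery can be sketched with the HTML parser from the Python standard library. This is an illustration only and does not reproduce how Piggybank or Semantic Radar are implemented; the class and function names are made up for the example.

```python
from html.parser import HTMLParser

class RdfLinkFinder(HTMLParser):
    """Collect the href values of <link rel="alternate" type="application/rdf+xml">."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        is_rdf = attributes.get("type") == "application/rdf+xml"
        if "alternate" in rels and is_rdf and attributes.get("href"):
            self.links.append(attributes["href"])

def find_rdf_links(html):
    finder = RdfLinkFinder()
    finder.feed(html)
    return finder.links

page = """<html><head>
<link rel="stylesheet" type="text/css" href="style.css" />
<link rel="alternate" type="application/rdf+xml" href="foaf.rdf" />
</head><body>Home page</body></html>"""
rdf_links = find_rdf_links(page)
```

Only the second link is picked up, since the stylesheet link has neither the right rel value nor the RDF MIME type. A relative href such as foaf.rdf still has to be resolved against the URI of the page before it can be dereferenced.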

Dataset List on the ESW Wiki

In order to make it easy not only for machines but also for humans to discover your data, you should add your dataset to the Dataset List on the ESW Wiki. Besides a link to your project, please put some example URIs from your dataset into the Wiki, so that people have starting points for browsing your dataset.

10. Further Reading

For more information about Linked Data please refer to:

Overview Material and Theoretical Background

Tim Berners-Lee: Linked Data (architecture note outlining the basic principles of Linked Data)
Tim Berners-Lee: Browsable Data (slides about Linked Data)
Tim Berners-Lee: Links on the Semantic Web (blog post stating the critical role of RDF links for the Semantic Web).
LinkedData.org
Wikipedia page on Linked Data
ESW Wikipage on Linked Data
W3C Semantic Web Activity Homepage
W3C Semantic Web Frequently Asked Questions
Paul Miller: Linked Data BOF and Linked Data once again (blog posts about the Linked Data sessions at WWW2007)
Paul Miller: Opening the Silos - sustainable models for open data (video from the XTech 2007 - Open Data track)
Li Ding, Tim Finin: Characterizing the Semantic Web on the Web (ISWC2006 paper trying to measure the size of the Semantic Web, a bit outdated in the meantime).

Technical Documentation

Sauermann et al.: Cool URIs for the Semantic Web (tutorial on URI dereferencing and content-negotiation)
Alistair Miles et al.: Best Practice Recipes for Publishing RDF Vocabularies (W3C draft on serving RDF vocabularies according to the Linked Data principles)
Richard Cyganiak: Debugging Semantic Web sites with cURL (tutorial on how to test Semantic Web sites)
Semantic Web Tools (listing of RDF stores and development tools)
Christian Bizer, Daniel Westphal: Developers Guide to Semantic Web Toolkits (another list of RDF development tools)

Projects and Practical Experience with Publishing Linked Data

Christian Bizer et al.: Interlinking Open Data on the Web (poster) (Overview about the W3C SWEO Linking Open Data project)
W3C SWEO Linking Open Data Project (community effort that has interlinked data sources such as Geonames, DBpedia, Musicbrainz, ..., see its List of public Linked Data sources on the Web)
Christian Bizer et al.: The RDF Book Mashup: From Web APIs to a Web of Data (Slides). (SFSW2007 paper)
Christian Bizer et al.: DBpedia - Querying Wikipedia like a Database (Slides). (WWW2007 presentation)
Sören Auer, Jens Lehmann: What have Innsbruck and Leipzig in common? Extracting Semantics from Wiki Content (Paper at ESWC 2007).

Linked Data Clients

Tim Berners-Lee et al.: Tabulator: Exploring and Analyzing Linked Data on the Semantic Web (Slides) (SWUI2007 paper about the Tabulator Linked Data browser and its data retrieval algorithms)
Tabulator RDF Browser (Linked Data browser implemented by Tim Berners-Lee et al.)
DISCO Hyperdata Browser (a simple Linked Data browser provided by Freie Universität Berlin)
OpenLink Data Web Browser (Linked Data browser provided by OpenLink)
Objectviewer (Linked Data browser provided by SemanticWebCentral)
Semantic Web Client Library (Java framework for building Linked Data clients)
Semantic Web client for SWI Prolog (a small Prolog program that lets you view the Semantic Web as a single graph and browse it, similar to the Semantic Web Client Library)
Tabulator AJAR RDF library for Javascript (retrieval engine underlying the Tabulator browser)
OpenLink Ajax Toolkit (includes both a Data Access Layer of RDF, SQL, and XML called Ajax Database Connectivity and a collection of RDF aware controls covering: Graph Visualizers, TimeLines, Tag Clouds, Pivot Tables, and more)
See also List of Linked Data Clients (maintained by the W3C Linking Open Data project)

Appendix A: Example HTTP Session

This is an example HTTP session where a Linked Data browser tries to dereference the URI http://dbpedia.org/resource/Berlin, a URI for the city of Berlin, published by the DBpedia project.

To obtain a representation, the client connects to the dbpedia.org server and issues an HTTP GET request:

GET /resource/Berlin HTTP/1.1
Host: dbpedia.org
Accept: text/html;q=0.5, application/rdf+xml

The Accept: header indicates that the client would take either HTML or RDF; but the q=0.5 quality value for HTML tells the server that it prefers RDF. The server could answer:

HTTP/1.1 303 See Other
Location: http://dbpedia.org/data/Berlin

This is a 303 redirect, which tells us that the requested resource is a non-information resource, and related information can be found at the URI given in the Location: response header. Next we will try to dereference that URI. Note that if our Accept: header had indicated a preference for HTML, we would have been redirected to another URI.

GET /data/Berlin HTTP/1.1
Host: dbpedia.org
Accept: text/html;q=0.5, application/rdf+xml

The server could answer:

HTTP/1.1 200 OK
Content-Type: application/rdf+xml;charset=utf-8

<?xml version="1.0"?>
<rdf:RDF
    xmlns:units="http://dbpedia.org/units/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:geonames="http://www.geonames.org/ontology#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
...

The 200 status code tells us that the response contains the representation of an information resource. The Content-Type: header tells us that the representation is in RDF/XML format. The rest of the message contains the representation. Only the beginning is shown.
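How a server might evaluate the q values in the Accept: header of this session can be sketched as follows. This is a simplified illustration, not a complete implementation of HTTP content negotiation: it ignores wildcards such as */* and all media type parameters other than q, and the function names are made up for the example.

```python
def parse_accept(header):
    """Return (media_type, q) pairs, sorted by descending q value."""
    preferences = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0]
        if not media_type:
            continue
        q = 1.0  # a media type without an explicit q value counts as q=1
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        preferences.append((media_type, q))
    # sorted() is stable, so equally preferred types keep their header order.
    return sorted(preferences, key=lambda pref: -pref[1])

def choose(header, available):
    """Pick the most preferred media type that the server can deliver."""
    for media_type, q in parse_accept(header):
        if q > 0 and media_type in available:
            return media_type
    return None

accept = "text/html;q=0.5, application/rdf+xml"
chosen = choose(accept, ["text/html", "application/rdf+xml"])
```

For the header used in the session above, application/rdf+xml wins because its implicit q value of 1 is higher than the 0.5 given for text/html. If the server could only deliver HTML, the same header would still yield text/html.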

Appendix B: How to get yourself onto the Web of Data

This section still needs work.

A great way to get started with publishing Linked Data on the Web is to serve a static RDF file; this can work well for small amounts of relatively simple data. One common example of this practice is providing a Friend-of-a-Friend (FOAF) file alongside (and interlinked with) your HTML home page.

This section provides step-by-step instructions on how to get yourself onto the Web of Data.

Services such as FOAF-a-Matic [http://www.ldodds.com/foaf/foaf-a-matic] will generate a basic FOAF description for you, which can then be saved as a static file and hosted on your own web space. This provides a great starting point to which you can later add additional data if you choose. After generating a FOAF file using FOAF-a-Matic, you'll need to decide where this will be hosted. One common convention is to name the file foaf.rdf and place it in the same directory as your home page. For example, Richard Cyganiak's FOAF file is located at <http://richard.cyganiak.de/foaf.rdf>. This is the URI of the RDF document; the document describes Richard, who is identified by the URI <http://richard.cyganiak.de/foaf.rdf#cygri>.

Change bNode to URI reference

Todo: cite Tim's post about "You deserve a URI" somewhere here.

This is an example of a hash URI, whereby something (here, Richard) is identified by a URI made up of a fragment identifier appended to the URI of a document [@@just a document? should we say information resource? could it be e.g. an image??]. By default FOAF-a-Matic will generate a hash URI of the form <http://your-chosen-host.tld/foaf.rdf#me> to identify you. This has the advantage of not requiring you to mint an additional URI to identify yourself, beyond that created by FOAF-a-Matic; however, it does bring some disadvantages.

At the most basic level, identifying yourself using a URI that is dependent on the filename of your FOAF file can be problematic if you later decide to move or rename your FOAF file. Secondly, exactly what a hash URI identifies depends on the content type of the resource returned when the URI is dereferenced. If an HTML document is returned then the URI identifies a section of [@@or anchor within?] that document. If an RDF document is returned then the fragment identifies a thing [@@any thing? just non-information resources?]. Whilst this is perfectly valid from a technical point of view, it does require that Semantic Web clients do some additional work to determine what is identified by a hash URI, compared to the other form of URIs: slash URIs.

If you have some degree of control over the server where your data is hosted, you may wish to avoid the drawbacks of hash URIs by using slash URIs and content negotiation. These two concepts are described in some detail in Sections 2.3 and 3 of this document [@@do we need to say more here?]. Basic recipes for performing content negotiation on Apache web servers with mod_rewrite installed are outlined in the document "Best Practice Recipes for Publishing RDF Vocabularies" [http://www.w3.org/TR/swbp-vocab-pub/], which despite the name applies equally to serving RDF instance data and RDF vocabularies. However, it should be noted that these recipes do not provide content negotiation which is sensitive to q values in HTTP Accept: headers [@@as discussed somewhere? (other than FredG's blog post!)], therefore more comprehensive solutions should be implemented [@@refer to example scripts?].

Tell people where you live

Todo: Talk about based_near and DBpedia, Geonames
Talk about geo-coordinates.

How to link to your friends

FOAF-a-Matic enables you to create a number of foaf:knows relations in your FOAF file, stating that you know particular people, or more precisely that you know a thing of type foaf:Person, who is the owner of a certain mailbox. These relationships generally look something like:

...
<foaf:knows>
<foaf:Person>
<foaf:mbox_sha1sum>362ce75324396f0aa2d3e5f1246f40bf3bb44401</foaf:mbox_sha1sum>
<foaf:name>Dan Brickley</foaf:name>
<rdfs:seeAlso rdf:resource="http://danbri.org/foaf.rdf"/>
</foaf:Person>
</foaf:knows>
...

In these foaf:knows relationships generated by FOAF-a-Matic, people you know are represented by blank nodes, as FOAF-a-Matic has no way of knowing the appropriate URIs with which to identify the people you know. This is valid RDF, but isn't good Linked Data. rdfs:seeAlso statements provide pointers to locations where additional RDF data may be found, but the semantics of this relationship are very weak. To overcome this issue and create a Web of Linked Data, the blank nodes in your FOAF file should be replaced by URIrefs that identify the people you know. Therefore our previous example should be reworked as follows:

...
<foaf:knows>
<foaf:Person rdf:about="http://danbri.org/foaf.rdf#danbri">
<foaf:mbox_sha1sum>362ce75324396f0aa2d3e5f1246f40bf3bb44401</foaf:mbox_sha1sum>
<foaf:name>Dan Brickley</foaf:name>
<rdfs:seeAlso rdf:resource="http://danbri.org/foaf.rdf"/>
</foaf:Person>
</foaf:knows>
...
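If your FOAF file lists many people, this replacement can be scripted. The following sketch uses the XML parser from the Python standard library to add an rdf:about attribute to every blank foaf:Person whose foaf:mbox_sha1sum appears in a lookup table. It is an illustration under assumptions: the table from checksum to URI is something you have to compile yourself (for instance with the search services described in Section 6.1), and the function name is made up for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

def name_blank_persons(rdf_xml, uri_by_sha1):
    """Add rdf:about to blank foaf:Person nodes with a known mbox_sha1sum."""
    root = ET.fromstring(rdf_xml)
    for person in root.iter(f"{{{FOAF}}}Person"):
        if person.get(f"{{{RDF}}}about") is not None:
            continue  # already identified by a URI
        sha1 = person.findtext(f"{{{FOAF}}}mbox_sha1sum")
        uri = uri_by_sha1.get(sha1)
        if uri:
            person.set(f"{{{RDF}}}about", uri)
    return root

doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:knows>
      <foaf:Person>
        <foaf:mbox_sha1sum>362ce75324396f0aa2d3e5f1246f40bf3bb44401</foaf:mbox_sha1sum>
        <foaf:name>Dan Brickley</foaf:name>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""

known = {"362ce75324396f0aa2d3e5f1246f40bf3bb44401": "http://danbri.org/foaf.rdf#danbri"}
named = name_blank_persons(doc, known)
```

After the call, the inner foaf:Person carries the URI http://danbri.org/foaf.rdf#danbri, while the outer one, for which no checksum is known, remains a blank node. ET.tostring(named) can be used to write the modified document back to a file.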

How to link to your publications

Todo: Talk about DBLP and Book Mashup and how to set links to them.

Appendix C: Changes

2007-07-14: Moved FOAF example into Appendix B. Updated images in section two.
2007-07-12: Small edits across the document. Sindice added.
2007-07-11: Initial version of this document.