Berlin SPARQL Benchmark (BSBM) - Explore Use Case

Authors:
Chris Bizer (Web-based Systems Group, Freie Universität Berlin, Germany)
Andreas Schultz (Institut für Informatik, Freie Universität Berlin, Germany)
 
This version:
http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/20101129/ExploreUseCase/
Latest version:
http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/ExploreUseCase/
Publication Date: 11/29/2010

Abstract

This document defines the Explore use case of the Berlin SPARQL Benchmark (BSBM) for measuring the performance of storage systems that expose SPARQL endpoints. The benchmark is built around an e-commerce use case, where a set of products is offered by different vendors and different consumers have posted reviews about products. The query mix of the Explore use case illustrates the search and navigation pattern of a consumer looking for a product.


1. Introduction

The SPARQL Query Language for RDF and the SPARQL Protocol for RDF are implemented by a growing number of storage systems and are used within enterprise and open web settings. As SPARQL is taken up by the community there is a growing need for benchmarks to compare the performance of storage systems that expose SPARQL endpoints via the SPARQL protocol. Such systems include native RDF stores, Named Graph stores, systems that map relational databases into RDF, and SPARQL wrappers around other kinds of data sources.

The Berlin SPARQL Benchmark (BSBM) defines a suite of benchmarks for comparing the performance of these systems across architectures. The benchmark is built around an e-commerce use case in which a set of products is offered by different vendors and consumers have posted reviews about products. The benchmark query mix of the Explore use case illustrates the search and navigation pattern of a consumer looking for a product. All queries conform to the SPARQL 1.0 standard.

The Berlin SPARQL Benchmark was designed along three goals. First, the benchmark should allow the comparison of different storage systems that expose SPARQL endpoints across architectures. Second, it should test these systems with realistic workloads of use-case-motivated queries. This is a well-established benchmarking technique in the database field, implemented for instance by the TPC benchmarks, and the Berlin SPARQL Benchmark applies it to systems that expose SPARQL endpoints. Third, as an increasing number of Semantic Web applications do not rely on heavyweight reasoning but focus on the integration and visualization of large amounts of data from autonomous data sources on the Web, the benchmark should not require complex reasoning but should measure the performance of queries against large amounts of RDF data.

The rest of this document is structured as follows: Section 2 points to the definition of the benchmark dataset, that is, its schema and the rules that the data generator uses to populate the dataset according to the chosen scale factor. Section 3 defines the benchmark queries. Section 4 defines how a system under test is verified against the qualification dataset.

2. Benchmark Dataset

The benchmark dataset is described in the BSBM dataset document.

3. Benchmark Queries

This section defines a suite of benchmark queries and a query mix.

The benchmark queries are designed to emulate the search and navigation pattern of a consumer looking for a product. A product search includes the following steps:

  1. Generic search for a given set of generic product properties.
  2. More specific search for products with a given set of product properties.
  3. Find similar products for a given product.
  4. Retrieve detailed information about several specific products.
  5. Retrieve reviews for given products.
  6. Get background information about reviewers.
  7. Retrieve offers for given products.
  8. Check information about vendors and their delivery conditions.
  9. Export the chosen offer into another information system which uses a different schema.

There are three representations of the benchmark query set: one for the triple data model, one for the Named Graphs data model, and a pure SQL version for the relational representation given in section 2.2.4. All query sets have the same semantics.

3.1 Query Mix

Complete Query Mix

The complete query mix consists of 25 queries that simulate a product search by a single consumer. The query sequence is given below:

  1. Query 1: Find products for a given set of generic features.
  2. Query 2: Retrieve basic information about a specific product for display purposes.
  3. Query 2: Retrieve basic information about a specific product for display purposes.
  4. Query 3: Find products for a given more specific set of features.
  5. Query 2: Retrieve basic information about a specific product for display purposes.
  6. Query 2: Retrieve basic information about a specific product for display purposes.
  7. Query 4: Find products matching two different sets of features.
  8. Query 2: Retrieve basic information about a specific product for display purposes.
  9. Query 2: Retrieve basic information about a specific product for display purposes.
  10. Query 5: Find products that are similar to a given product.
  11. Query 7: Retrieve in-depth information about a specific product including offers and reviews.
  12. Query 7: Retrieve in-depth information about a specific product including offers and reviews.
  13. Query 5: Find products that are similar to a given product.
  14. Query 7: Retrieve in-depth information about a specific product including offers and reviews.
  15. Query 7: Retrieve in-depth information about a specific product including offers and reviews.
  16. Query 8: Give me recent reviews in English for a specific product.
  17. Query 9: Get information about a reviewer.
  18. Query 9: Get information about a reviewer.
  19. Query 8: Give me recent reviews in English for a specific product.
  20. Query 9: Get information about a reviewer.
  21. Query 9: Get information about a reviewer.
  22. Query 10: Get offers for a given product which fulfill specific requirements.
  23. Query 10: Get offers for a given product which fulfill specific requirements.
  24. Query 11: Get all information about an offer.
  25. Query 12: Export information about an offer into another schema.
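As an illustration only, the query mix above can be written down as a list of query numbers. The Python sketch below (QUERY_MIX and mix_frequencies are hypothetical names, not part of the BSBM tools) counts how often each query occurs in one mix: Query 2 is the most frequent with six executions, and Query 6 does not occur at all.

```python
from collections import Counter

# Query numbers of the 25 steps of the Explore query mix, in the listed order.
QUERY_MIX = [1, 2, 2, 3, 2, 2, 4, 2, 2, 5, 7, 7, 5, 7, 7,
             8, 9, 9, 8, 9, 9, 10, 10, 11, 12]


def mix_frequencies(mix):
    """Return {query number: occurrences} for one query mix, sorted by query number."""
    return dict(sorted(Counter(mix).items()))


print(len(QUERY_MIX))  # 25
print(mix_frequencies(QUERY_MIX))
# {1: 1, 2: 6, 3: 1, 4: 1, 5: 2, 7: 4, 8: 2, 9: 4, 10: 2, 11: 1, 12: 1}
```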

3.2 SPARQL Queries for the Triple Data Model

Each query is defined by the following components: a use case motivation, the query itself, its parameters, and its query properties.

Query 1: Find products for a given set of generic features.

Use Case Motivation: A consumer is looking for a product and has a general idea about what he wants.

SPARQL Query:


PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?product ?label
WHERE {
?product rdfs:label ?label .
?product a %ProductType% .
?product bsbm:productFeature %ProductFeature1% .
?product bsbm:productFeature %ProductFeature2% .
?product bsbm:productPropertyNumeric1 ?value1 .
FILTER (?value1 > %x%)

}
ORDER BY ?label
LIMIT 10

Parameters:

Parameter Description
%ProductType% A randomly selected Class URI from the class hierarchy (one level above leaf level).
%ProductFeature1%, %ProductFeature2% Two different, randomly selected feature URIs that correspond to the chosen product type.
%x% A number between 1 and 500
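The query texts in this section are templates: before a query is sent to the SUT, each %Name% placeholder has to be replaced by a concrete value as described in the parameter table. The Python sketch below shows one way to perform this substitution. It is a hypothetical helper, not code from the BSBM Test Driver, and the example.org URIs are made-up stand-ins for URIs taken from the generated dataset.

```python
import re

# A placeholder is a name wrapped in percent signs, e.g. %ProductType% or %x%.
PLACEHOLDER = re.compile(r"%(\w+)%")


def instantiate(template: str, params: dict) -> str:
    """Replace every %Name% placeholder in a query template with params["Name"].

    A KeyError is raised for a placeholder without a value, so a partially
    instantiated query is never produced.
    """
    return PLACEHOLDER.sub(lambda m: params[m.group(1)], template)


QUERY1_PATTERN = """?product a %ProductType% .
?product bsbm:productFeature %ProductFeature1% .
?product bsbm:productFeature %ProductFeature2% .
?product bsbm:productPropertyNumeric1 ?value1 .
FILTER (?value1 > %x%)"""

# Illustrative values only: hypothetical URIs and a number between 1 and 500.
params = {
    "ProductType": "<http://example.org/ProductType3>",
    "ProductFeature1": "<http://example.org/ProductFeature17>",
    "ProductFeature2": "<http://example.org/ProductFeature42>",
    "x": "250",
}
print(instantiate(QUERY1_PATTERN, params))
```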

Query Properties:

Query 2: Retrieve basic information about a specific product for display purposes

Use Case Motivation: The consumer wants to view basic information about products found by query 1.

SPARQL Query

PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?label ?comment ?producer ?productFeature ?propertyTextual1 ?propertyTextual2 ?propertyTextual3
 ?propertyNumeric1 ?propertyNumeric2 ?propertyTextual4 ?propertyTextual5 ?propertyNumeric4
WHERE {
%ProductXYZ% rdfs:label ?label .
%ProductXYZ% rdfs:comment ?comment .
%ProductXYZ% bsbm:producer ?p .
?p rdfs:label ?producer .
%ProductXYZ% dc:publisher ?p .
%ProductXYZ% bsbm:productFeature ?f .
?f rdfs:label ?productFeature .
%ProductXYZ% bsbm:productPropertyTextual1 ?propertyTextual1 .
%ProductXYZ% bsbm:productPropertyTextual2 ?propertyTextual2 .
%ProductXYZ% bsbm:productPropertyTextual3 ?propertyTextual3 .
%ProductXYZ% bsbm:productPropertyNumeric1 ?propertyNumeric1 .
%ProductXYZ% bsbm:productPropertyNumeric2 ?propertyNumeric2 .
OPTIONAL { %ProductXYZ% bsbm:productPropertyTextual4 ?propertyTextual4 }
OPTIONAL { %ProductXYZ% bsbm:productPropertyTextual5 ?propertyTextual5 }
OPTIONAL { %ProductXYZ% bsbm:productPropertyNumeric4 ?propertyNumeric4 }
}

Parameters:

Parameter Description
%ProductXYZ% A product URI (randomly selected)

Query Properties:

Query 3: Find products having some specific features and not having one feature.

Use Case Motivation: After looking at information about some products, the consumer has a more specific idea of what he wants. Therefore, he asks for products having several features but not having a specific other feature.

SPARQL Query:

PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?product ?label
WHERE {
?product rdfs:label ?label .
?product a %ProductType% .
?product bsbm:productFeature %ProductFeature1% .
?product bsbm:productPropertyNumeric1 ?p1 .
FILTER ( ?p1 > %x% )
?product bsbm:productPropertyNumeric3 ?p3 .
FILTER (?p3 < %y% )
OPTIONAL {
?product bsbm:productFeature %ProductFeature2% .
?product rdfs:label ?testVar }
FILTER (!bound(?testVar))
}
ORDER BY ?label
LIMIT 10
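SPARQL 1.0 has neither NOT EXISTS nor MINUS (both were introduced in SPARQL 1.1), so Query 3 expresses "does not have %ProductFeature2%" as negation by failure: the OPTIONAL group binds ?testVar only for products that do have the second feature, and FILTER (!bound(?testVar)) discards exactly those. The Python sketch below mimics this evaluation on a hypothetical in-memory toy dataset; it illustrates the semantics only and is not how a SPARQL engine is implemented.

```python
# Hypothetical toy data: product -> (label, set of feature identifiers).
PRODUCTS = {
    "p1": ("Alpha", {"f1", "f2"}),
    "p2": ("Bravo", {"f1"}),
    "p3": ("Charlie", {"f2"}),
}


def with_feature_but_not(products, feature1, feature2):
    """Emulate the required pattern plus OPTIONAL { ... } FILTER (!bound(?testVar))."""
    result = []
    for product, (label, features) in products.items():
        if feature1 not in features:
            continue  # the required pattern on %ProductFeature1% fails
        # OPTIONAL group: ?testVar is bound only if the product has feature2.
        test_var = label if feature2 in features else None
        if test_var is None:  # FILTER (!bound(?testVar)) keeps only unbound rows
            result.append((product, label))
    return sorted(result, key=lambda row: row[1])  # ORDER BY ?label


print(with_feature_but_not(PRODUCTS, "f1", "f2"))  # [('p2', 'Bravo')]
```

SQL Query 3 in Section 3.4 expresses the same condition with a NOT IN subquery.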

Parameters:

Parameter Description
%ProductType% A randomly selected Class URI from the class hierarchy (leaf level).
%ProductFeature1%, %ProductFeature2% Two different, randomly selected product feature URIs that correspond to the chosen product type.
%x%, %y% Two random numbers between 1 and 500

Query Properties:

Query 4: Find products matching two different sets of features.

Use Case Motivation: After looking at information about some products, the consumer has a more specific idea of what he wants. Therefore, he asks for products matching either one set of features or another set.

SPARQL Query:

PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?product ?label ?propertyTextual
WHERE {
{
?product rdfs:label ?label .
?product rdf:type %ProductType% .
?product bsbm:productFeature %ProductFeature1% .
?product bsbm:productFeature %ProductFeature2% .
?product bsbm:productPropertyTextual1 ?propertyTextual .
?product bsbm:productPropertyNumeric1 ?p1 .
FILTER ( ?p1 > %x% )
} UNION {
?product rdfs:label ?label .
?product rdf:type %ProductType% .
?product bsbm:productFeature %ProductFeature1% .
?product bsbm:productFeature %ProductFeature3% .
?product bsbm:productPropertyTextual1 ?propertyTextual .
?product bsbm:productPropertyNumeric2 ?p2 .
FILTER ( ?p2> %y% )
}
}
ORDER BY ?label
OFFSET 5
LIMIT 10
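A product that matches both feature sets is returned by both branches of the UNION, which is why Query 4 uses DISTINCT. After sorting by label, OFFSET 5 skips the first five solutions and LIMIT 10 returns the next ten, that is, solutions 6 to 15. The Python sketch below reproduces these solution modifiers on hypothetical rows.

```python
def distinct_sorted_page(rows, sort_key, offset, limit):
    """DISTINCT, ORDER BY sort_key, then OFFSET/LIMIT, as in Query 4."""
    unique = list(dict.fromkeys(rows))      # DISTINCT: drop duplicate rows
    ordered = sorted(unique, key=sort_key)  # ORDER BY ?label
    return ordered[offset:offset + limit]   # OFFSET 5 LIMIT 10


# Hypothetical (product, label, propertyTextual) rows from the two UNION branches;
# products 8 to 11 match both feature sets and therefore appear twice.
branch1 = [(f"p{i}", f"label{i:02d}", "text") for i in range(0, 12)]
branch2 = [(f"p{i}", f"label{i:02d}", "text") for i in range(8, 20)]

page = distinct_sorted_page(branch1 + branch2, lambda row: row[1], offset=5, limit=10)
print(page[0][1], page[-1][1], len(page))  # label05 label14 10
```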

Parameters:

Parameter Description
%ProductType% A randomly selected Class URI from the class hierarchy (leaf level).
%ProductFeature1%, %ProductFeature2%, %ProductFeature3% Three different, randomly selected product feature URIs that correspond to the chosen product type.
%x%, %y% Two random numbers between 1 and 500

Query Properties:

Query 5: Find products that are similar to a given product.

Use Case Motivation: The consumer has found a product that fulfills his requirements. He now wants to find products with similar features.

SPARQL Query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>

SELECT DISTINCT ?product ?productLabel
WHERE {
?product rdfs:label ?productLabel .
FILTER (%ProductXYZ% != ?product)
%ProductXYZ% bsbm:productFeature ?prodFeature .
?product bsbm:productFeature ?prodFeature .
%ProductXYZ% bsbm:productPropertyNumeric1 ?origProperty1 .
?product bsbm:productPropertyNumeric1 ?simProperty1 .
FILTER (?simProperty1 < (?origProperty1 + 120) && ?simProperty1 > (?origProperty1 - 120))
%ProductXYZ% bsbm:productPropertyNumeric2 ?origProperty2 .
?product bsbm:productPropertyNumeric2 ?simProperty2 .
FILTER (?simProperty2 < (?origProperty2 + 170) && ?simProperty2 > (?origProperty2 - 170))
}
ORDER BY ?productLabel
LIMIT 5
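The two FILTER expressions of Query 5 define open intervals around the given product's values: a candidate's productPropertyNumeric1 must lie strictly within 120 of the original value, and its productPropertyNumeric2 strictly within 170, so values exactly on a bound are excluded. In addition, the candidate must share at least one product feature with the given product and must not be that product itself. The following Python sketch applies these conditions to hypothetical in-memory data.

```python
def is_similar(orig, cand):
    """Numeric part of Query 5: both ranges are open (strict < and >)."""
    return (orig["num1"] - 120 < cand["num1"] < orig["num1"] + 120
            and orig["num2"] - 170 < cand["num2"] < orig["num2"] + 170)


def similar_products(products, target_id, limit=5):
    """Products sharing a feature with target_id and lying within both numeric ranges."""
    target = products[target_id]
    hits = []
    for pid, cand in products.items():
        if pid == target_id:
            continue  # FILTER (%ProductXYZ% != ?product)
        if not target["features"] & cand["features"]:
            continue  # no shared ?prodFeature
        if is_similar(target, cand):
            hits.append((cand["label"], pid))
    return [pid for _, pid in sorted(hits)][:limit]  # ORDER BY ?productLabel LIMIT 5


# Hypothetical products; num1/num2 stand for productPropertyNumeric1/2.
PRODUCTS = {
    "p0": {"label": "Target", "num1": 200, "num2": 300, "features": {"a", "b"}},
    "p1": {"label": "Bravo", "num1": 250, "num2": 400, "features": {"a"}},
    "p2": {"label": "Charlie", "num1": 320, "num2": 300, "features": {"b"}},  # on the open bound
    "p3": {"label": "Delta", "num1": 210, "num2": 310, "features": {"c"}},    # no shared feature
    "p4": {"label": "Alpha", "num1": 100, "num2": 200, "features": {"b"}},
}
print(similar_products(PRODUCTS, "p0"))  # ['p4', 'p1']
```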

Parameters:

Parameter Description
%ProductXYZ% A product URI (randomly selected)

Query Properties:

Query 6: Find products having a label that contains a specific string. (Not used in the query mix anymore)

Use Case Motivation: The consumer remembers parts of a product name from former searches. He wants to find the product again by searching for the parts of the name that he remembers.

SPARQL Query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>

SELECT ?product ?label
WHERE {
?product rdfs:label ?label .
?product rdf:type bsbm:Product .
FILTER regex(?label, "%word1%")
}

Parameters:

Parameter Description
%word1% A word from the list of words that were used in the dataset generation.

Query Properties:

Query 7: Retrieve in-depth information about a specific product including offers and reviews.

Use Case Motivation: The consumer has found a product that fulfills his requirements. Now he wants in-depth information about this product, including offers from German vendors and product reviews if any exist.

SPARQL Query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?productLabel ?offer ?price ?vendor ?vendorTitle ?review ?revTitle
?reviewer ?revName ?rating1 ?rating2
WHERE {
%ProductXYZ% rdfs:label ?productLabel .
OPTIONAL {
?offer bsbm:product %ProductXYZ% .
?offer bsbm:price ?price .
?offer bsbm:vendor ?vendor .
?vendor rdfs:label ?vendorTitle .
?vendor bsbm:country <http://downlode.org/rdf/iso-3166/countries#DE> .
?offer dc:publisher ?vendor .
?offer bsbm:validTo ?date .
FILTER (?date > %currentDate% )
}
OPTIONAL {
?review bsbm:reviewFor %ProductXYZ% .
?review rev:reviewer ?reviewer .
?reviewer foaf:name ?revName .
?review dc:title ?revTitle .
OPTIONAL { ?review bsbm:rating1 ?rating1 . }
OPTIONAL { ?review bsbm:rating2 ?rating2 . }
}
}
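The two OPTIONAL groups of Query 7 share no variables, so when a product has m matching German offers and n reviews, the result contains m x n rows, one for each offer/review combination. If one group finds nothing, its variables are simply left unbound and the rows from the other group are kept. A hypothetical Python sketch of this behaviour, with None standing for an unbound variable:

```python
from itertools import product as cartesian


def q7_rows(label, offers, reviews):
    """Emulate Query 7: a required product label plus two independent OPTIONAL groups."""
    offer_part = offers or [None]    # no matching offer: keep the row, offer unbound
    review_part = reviews or [None]  # no review: keep the row, review unbound
    # The groups share no variables, so every offer combines with every review.
    return [(label, offer, review) for offer, review in cartesian(offer_part, review_part)]


rows = q7_rows("Widget", ["offer1", "offer2"], ["rev1", "rev2", "rev3"])
print(len(rows))                  # 6
print(q7_rows("Widget", [], []))  # [('Widget', None, None)]
```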

Parameters:

Parameter Description
%ProductXYZ% A product URI (randomly selected)
%currentDate% A date within the validFrom-validTo range of the offers (same date for all queries within a run).

Query Properties:

Query 8: Give me recent reviews in English for a specific product.

Use Case Motivation: The consumer wants to read the 20 most recent English language reviews about a specific product.

SPARQL Query:

PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?title ?text ?reviewDate ?reviewer ?reviewerName ?rating1 ?rating2 ?rating3 ?rating4
WHERE {
?review bsbm:reviewFor %ProductXYZ% .
?review dc:title ?title .
?review rev:text ?text .
FILTER langMatches( lang(?text), "EN" )
?review bsbm:reviewDate ?reviewDate .
?review rev:reviewer ?reviewer .
?reviewer foaf:name ?reviewerName .
OPTIONAL { ?review bsbm:rating1 ?rating1 . }
OPTIONAL { ?review bsbm:rating2 ?rating2 . }
OPTIONAL { ?review bsbm:rating3 ?rating3 . }
OPTIONAL { ?review bsbm:rating4 ?rating4 . }
}
ORDER BY DESC(?reviewDate)
LIMIT 20
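lang(?text) returns the language tag of the review text (the empty string for a literal without one), and langMatches compares it with the language range "EN" using the basic filtering scheme of RFC 4647: the comparison is case-insensitive, and the range also matches tags that extend it after a hyphen. The Python function below is a sketch of that matching rule, not a library API.

```python
def lang_matches(tag: str, lang_range: str) -> bool:
    """Basic filtering (RFC 4647, section 3.3.1), the rule used by SPARQL's langMatches."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""  # "*" matches any non-empty language tag
    # Exact match, or the range is a prefix that ends at a "-" boundary.
    return tag == lang_range or tag.startswith(lang_range + "-")


for tag in ["en", "en-US", "EN-gb", "eng", "de", ""]:
    print(repr(tag), lang_matches(tag, "EN"))
```

So "en", "en-US" and "EN-gb" match the range "EN", while "eng", "de" and an untagged literal do not. The SQL version of this query instead compares a stored language column with 'en'.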

Parameters:

Parameter Description
%ProductXYZ% A product URI (randomly selected)

Query Properties:

Query 9: Get information about a reviewer.

Use Case Motivation: In order to decide whether to trust a review, the consumer asks for any kind of information that is available about the reviewer.

SPARQL Query:

PREFIX rev: <http://purl.org/stuff/rev#>

DESCRIBE ?x
WHERE { %ReviewXYZ% rev:reviewer ?x }

Parameters:

Parameter Description
%ReviewXYZ% A review URI (randomly selected)

Query Properties:

Query 10: Get offers for a given product which fulfill specific requirements.

Use Case Motivation: The consumer wants to buy from a vendor in the United States that is able to deliver within 3 days and is looking for the cheapest offer that fulfills these requirements.

SPARQL Query:

PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT DISTINCT ?offer ?price
WHERE {
?offer bsbm:product %ProductXYZ% .
?offer bsbm:vendor ?vendor .
?offer dc:publisher ?vendor .
?vendor bsbm:country <http://downlode.org/rdf/iso-3166/countries#US> .
?offer bsbm:deliveryDays ?deliveryDays .
FILTER (?deliveryDays <= 3)
?offer bsbm:price ?price .
?offer bsbm:validTo ?date .
FILTER (?date > %currentDate% )
}
ORDER BY xsd:double(str(?price))
LIMIT 10
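Query 10 sorts by xsd:double(str(?price)): str() takes the lexical form of the price literal and the cast turns it into a number, so the cheapest offers come first regardless of how the price literal is typed (presumably the reason str() is applied before the cast). Comparing the lexical forms as strings would give a different order, as this Python illustration with made-up prices shows:

```python
prices = ["100.5", "9.99", "25"]  # hypothetical lexical forms of ?price

lexical_order = sorted(prices)             # compares character by character
numeric_order = sorted(prices, key=float)  # like ORDER BY xsd:double(str(?price))

print(lexical_order)  # ['100.5', '25', '9.99']
print(numeric_order)  # ['9.99', '25', '100.5']
```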

Parameters:

Parameter Description
%ProductXYZ% A product URI (randomly selected)
%currentDate% A date within the validFrom-validTo range of the offers (same date for all queries within a run).

Query Properties:

Query 11: Get all information about an offer.

Use Case Motivation: After deciding on a specific offer, the consumer wants to get all information that is directly related to this offer.

SPARQL Query:

SELECT ?property ?hasValue ?isValueOf
WHERE {
{ %OfferXYZ% ?property ?hasValue }
UNION
{ ?isValueOf ?property %OfferXYZ% }
}
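Query 11 returns every triple in which the offer is the subject (first UNION branch, binding ?hasValue) or the object (second branch, binding ?isValueOf); in each row, the variable that belongs to the other branch stays unbound. A hypothetical Python sketch over a list of triples, with None standing for an unbound variable:

```python
def offer_rows(triples, offer):
    """Rows (property, hasValue, isValueOf) produced by the two UNION branches of Query 11."""
    # { %OfferXYZ% ?property ?hasValue }
    outgoing = [(p, o, None) for s, p, o in triples if s == offer]
    # { ?isValueOf ?property %OfferXYZ% }
    incoming = [(p, None, s) for s, p, o in triples if o == offer]
    return outgoing + incoming


# Hypothetical triples; ex:mentions is an invented property for illustration only.
TRIPLES = [
    ("offer1", "bsbm:price", "42.50"),
    ("offer1", "bsbm:vendor", "vendor1"),
    ("review9", "ex:mentions", "offer1"),
    ("offer2", "bsbm:price", "7.00"),
]
for row in offer_rows(TRIPLES, "offer1"):
    print(row)
```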

Parameters:

Parameter Description
%OfferXYZ% An offer URI (randomly selected)

Query Properties:

Query 12: Export information about an offer into another schema.

Use Case Motivation: After deciding on a specific offer, the consumer wants to save information about this offer on his local machine using a different RDF schema.

SPARQL Query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX bsbm-export: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/export/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

CONSTRUCT { %OfferXYZ% bsbm-export:product ?productURI .
%OfferXYZ% bsbm-export:productlabel ?productlabel .
%OfferXYZ% bsbm-export:vendor ?vendorname .
%OfferXYZ% bsbm-export:vendorhomepage ?vendorhomepage .
%OfferXYZ% bsbm-export:offerURL ?offerURL .
%OfferXYZ% bsbm-export:price ?price .
%OfferXYZ% bsbm-export:deliveryDays ?deliveryDays .
%OfferXYZ% bsbm-export:validuntil ?validTo }
WHERE { %OfferXYZ% bsbm:product ?productURI .
?productURI rdfs:label ?productlabel .
%OfferXYZ% bsbm:vendor ?vendorURI .
?vendorURI rdfs:label ?vendorname .
?vendorURI foaf:homepage ?vendorhomepage .
%OfferXYZ% bsbm:offerWebpage ?offerURL .
%OfferXYZ% bsbm:price ?price .
%OfferXYZ% bsbm:deliveryDays ?deliveryDays .
%OfferXYZ% bsbm:validTo ?validTo }
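The CONSTRUCT template of Query 12 maps each value matched in the WHERE clause to a property of the export vocabulary. Because the WHERE clause contains no OPTIONAL patterns, an offer whose vendor has, for example, no foaf:homepage produces no solution and hence an empty result graph. The Python sketch below models one WHERE solution as a dictionary, where a missing key stands for a pattern that did not match; the values are made up.

```python
# Export property -> WHERE-clause variable, following the CONSTRUCT template of Query 12.
EXPORT_TEMPLATE = [
    ("bsbm-export:product", "productURI"),
    ("bsbm-export:productlabel", "productlabel"),
    ("bsbm-export:vendor", "vendorname"),
    ("bsbm-export:vendorhomepage", "vendorhomepage"),
    ("bsbm-export:offerURL", "offerURL"),
    ("bsbm-export:price", "price"),
    ("bsbm-export:deliveryDays", "deliveryDays"),
    ("bsbm-export:validuntil", "validTo"),
]


def construct_export(offer, solution):
    """Instantiate the CONSTRUCT template for one solution of the WHERE clause.

    A missing variable stands for a required pattern that did not match;
    then there is no solution and therefore no triples.
    """
    if any(var not in solution for _, var in EXPORT_TEMPLATE):
        return []
    return [(offer, prop, solution[var]) for prop, var in EXPORT_TEMPLATE]


# Made-up values for illustration only.
solution = {
    "productURI": "product1",
    "productlabel": "Widget",
    "vendorname": "Example Shop",
    "vendorhomepage": "http://shop.example.org/",
    "offerURL": "http://shop.example.org/offer1",
    "price": "42.50",
    "deliveryDays": "3",
    "validTo": "2008-06-20",
}
print(len(construct_export("offer1", solution)))  # 8

incomplete = {k: v for k, v in solution.items() if k != "vendorhomepage"}
print(construct_export("offer1", incomplete))  # []
```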

Parameters:

Parameter Description
%OfferXYZ% An offer URI (randomly selected)

Query Properties:

3.3 SPARQL Queries for the Named Graph Data Model

The queries for the Named Graphs data model have the same semantics as the queries for the triple data model. The queries do not specify the IRIs of the named graphs in the RDF Dataset using the FROM NAMED clause, but assume that the query is executed against the complete RDF Dataset.

This is still work in progress ...
Todo: Rewrite all queries for Named Graphs. Two examples are given below:

Query 2: Retrieve basic information about a specific product for display purposes

SPARQL Query

PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?label ?comment ?producer ?productFeature ?propertyTextual1 ?propertyTextual2 ?propertyTextual3
 ?propertyNumeric1 ?propertyNumeric2 ?propertyTextual4 ?propertyTextual5 ?propertyNumeric4
WHERE {
GRAPH ?graph {
%ProductXYZ% rdfs:label ?label .
%ProductXYZ% rdfs:comment ?comment .
%ProductXYZ% bsbm:producer ?p .
?p rdfs:label ?producer .
%ProductXYZ% bsbm:productFeature ?f .
?f rdfs:label ?productFeature .
%ProductXYZ% bsbm:productPropertyTextual1 ?propertyTextual1 .
%ProductXYZ% bsbm:productPropertyTextual2 ?propertyTextual2 .
%ProductXYZ% bsbm:productPropertyTextual3 ?propertyTextual3 .
%ProductXYZ% bsbm:productPropertyNumeric1 ?propertyNumeric1 .
%ProductXYZ% bsbm:productPropertyNumeric2 ?propertyNumeric2 .
OPTIONAL { %ProductXYZ% bsbm:productPropertyTextual4 ?propertyTextual4 }
OPTIONAL { %ProductXYZ% bsbm:productPropertyTextual5 ?propertyTextual5 }
OPTIONAL { %ProductXYZ% bsbm:productPropertyNumeric4 ?propertyNumeric4 }
}
GRAPH localhost:provenanceData {
?graph dc:publisher ?p .
}
}
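In the Named Graph version of Query 2, GRAPH ?graph binds the name of a graph that contains the product description, including the producer ?p, and the pattern on localhost:provenanceData only accepts that graph if the provenance data name ?p as its dc:publisher. The Python sketch below models the dataset as hypothetical (subject, predicate, object, graph) quads and covers only the label and producer patterns; the graph name "provenance" stands in for localhost:provenanceData.

```python
# Hypothetical quads (subject, predicate, object, graph name).
QUADS = {
    ("product1", "rdfs:label", "Widget", "g1"),
    ("product1", "bsbm:producer", "producer1", "g1"),
    ("product1", "rdfs:label", "Widget (copy)", "g2"),
    ("product1", "bsbm:producer", "producer1", "g2"),
    ("g1", "dc:publisher", "producer1", "provenance"),
    ("g2", "dc:publisher", "producer2", "provenance"),
}


def labels_from_producer_graphs(quads, product, provenance="provenance"):
    """Join GRAPH ?graph { label, producer ?p } with GRAPH provenance { ?graph dc:publisher ?p }."""
    rows = []
    for s, p, label, graph in sorted(quads):
        if s != product or p != "rdfs:label":
            continue
        # ?p: the producer stated in the same graph ?graph.
        producers = {o for s2, p2, o, g2 in quads
                     if g2 == graph and s2 == product and p2 == "bsbm:producer"}
        for producer in sorted(producers):
            # The provenance graph must state that ?graph was published by ?p.
            if (graph, "dc:publisher", producer, provenance) in quads:
                rows.append((label, producer, graph))
    return rows


print(labels_from_producer_graphs(QUADS, "product1"))  # [('Widget', 'producer1', 'g1')]
```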

Query 7: Retrieve in-depth information about a specific product including offers and reviews.

SPARQL Query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?productLabel ?offer ?price ?vendor ?vendorTitle ?review ?revTitle
?reviewer ?revName ?rating1 ?rating2
WHERE {
GRAPH ?producerGraph {
%ProductXYZ% rdfs:label ?productLabel .
}
OPTIONAL {
GRAPH ?vendorGraph {
?offer bsbm:product %ProductXYZ% .
?offer bsbm:price ?price .
?offer bsbm:vendor ?vendor .
?vendor rdfs:label ?vendorTitle .
?offer bsbm:validTo ?date .
FILTER (?date > %currentDate% )
}
}
OPTIONAL {
GRAPH ?ratingSiteGraph {
?review bsbm:reviewFor %ProductXYZ% .
?review rev:reviewer ?reviewer .
?reviewer foaf:name ?revName .
?review dc:title ?revTitle .
OPTIONAL { ?review bsbm:rating1 ?rating1 . }
OPTIONAL { ?review bsbm:rating2 ?rating2 . }
}
}
GRAPH localhost:provenanceData {
?vendorGraph dc:publisher ?vendor .
}
}

3.4 SQL Queries for the Relational Data Model

This section contains an SQL representation of the benchmark queries, so that the performance of stores that expose SPARQL endpoints can be compared with the performance of classic SQL-based RDBMS. Since some SPARQL-specific query forms such as DESCRIBE have no exact SQL counterpart, the SQL queries are not completely semantically equivalent to the SPARQL queries.

Query 1: Find products for a given set of generic features.

Use Case Motivation: A consumer is looking for a product and has a general idea about what he wants.

SQL Query:

SELECT distinct nr, label
FROM product p, producttypeproduct ptp
WHERE p.nr = ptp.product AND ptp.productType=@ProductType@
AND propertyNum1 > @x@
AND p.nr IN (SELECT distinct product FROM productfeatureproduct WHERE productFeature=@ProductFeature1@)
AND p.nr IN (SELECT distinct product FROM productfeatureproduct WHERE productFeature=@ProductFeature2@)
ORDER BY label
LIMIT 10;

Parameters:

Parameter Description
@ProductType@ A randomly selected Class ID from the class hierarchy (one level above leaf level).
@ProductFeature1@, @ProductFeature2@ Two different, randomly selected feature IDs that correspond to the chosen product type.
@x@ A number between 1 and 500
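To try SQL Query 1 outside a full BSBM installation, the sketch below loads a tiny hand-made dataset into an in-memory SQLite database using Python's sqlite3 module. The three tables contain only the columns that the query references; this is an assumed minimal subset for illustration, not the benchmark's full relational schema. The @...@ placeholders become sqlite3 "?" parameters.

```python
import sqlite3

# Assumed minimal subset of the relational schema: only what SQL Query 1 uses.
SCHEMA = """
CREATE TABLE product (nr INTEGER PRIMARY KEY, label TEXT, propertyNum1 INTEGER);
CREATE TABLE producttypeproduct (product INTEGER, productType INTEGER);
CREATE TABLE productfeatureproduct (product INTEGER, productFeature INTEGER);
"""

# SQL Query 1 with @ProductType@, @x@, @ProductFeature1@, @ProductFeature2@ as parameters.
QUERY1 = """
SELECT DISTINCT nr, label
FROM product p, producttypeproduct ptp
WHERE p.nr = ptp.product AND ptp.productType = ?
AND propertyNum1 > ?
AND p.nr IN (SELECT DISTINCT product FROM productfeatureproduct WHERE productFeature = ?)
AND p.nr IN (SELECT DISTINCT product FROM productfeatureproduct WHERE productFeature = ?)
ORDER BY label
LIMIT 10
"""


def run_query1(conn, product_type, x, feature1, feature2):
    return conn.execute(QUERY1, (product_type, x, feature1, feature2)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "alpha", 300), (2, "beta", 100), (3, "gamma", 450)])
conn.executemany("INSERT INTO producttypeproduct VALUES (?, ?)",
                 [(1, 7), (2, 7), (3, 7)])
conn.executemany("INSERT INTO productfeatureproduct VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 10), (2, 11), (3, 10)])

# Product 2 fails propertyNum1 > 200, product 3 lacks feature 11.
print(run_query1(conn, 7, 200, 10, 11))  # [(1, 'alpha')]
```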

Query 2: Retrieve basic information about a specific product for display purposes

Use Case Motivation: The consumer wants to view basic information about products found by query 1.

SQL Query

SELECT pt.label, pt.comment, pt.producer, productFeature, propertyTex1, propertyTex2, propertyTex3, 
propertyNum1, propertyNum2, propertyTex4, propertyTex5, propertyNum4
FROM product pt, producer pr, productfeatureproduct pfp
WHERE pt.nr=@ProductXYZ@ AND pt.nr=pfp.product AND pt.producer=pr.nr;

Parameters:

Parameter Description
@ProductXYZ@ A product ID (randomly selected)

Query 3: Find products having some specific features and not having one feature.

Use Case Motivation: After looking at information about some products, the consumer has a more specific idea of what he wants. Therefore, he asks for products having several features but not having a specific other feature.

SQL Query:

SELECT p.nr, p.label
FROM product p, producttypeproduct ptp
WHERE p.nr=ptp.product
AND productType=@ProductType@
AND propertyNum1>@x@
AND propertyNum3<@y@
AND @ProductFeature1@ IN (SELECT productFeature FROM productfeatureproduct WHERE product=p.nr)
AND @ProductFeature2@ NOT IN (SELECT productFeature FROM productfeatureproduct WHERE product=p.nr)
ORDER BY p.label
LIMIT 10;

Parameters:

Parameter Description
@ProductType@ A randomly selected Class ID from the class hierarchy (leaf level).
@ProductFeature1@, @ProductFeature2@ Two different, randomly selected product feature IDs that correspond to the chosen product type.
@x@, @y@ Two random numbers between 1 and 500

Query 4: Find products matching two different sets of features.

Use Case Motivation: After looking at information about some products, the consumer has a more specific idea of what he wants. Therefore, he asks for products matching either one set of features or another set.

SQL Query:

SELECT distinct p.nr, p.label, p.propertyTex1
FROM product p, producttypeproduct ptp
WHERE p.nr=ptp.product AND ptp.productType=@ProductType@
AND p.nr IN (SELECT distinct product FROM productfeatureproduct WHERE productFeature=@ProductFeature1@)
AND ((propertyNum1>@x@ AND p.nr IN (SELECT distinct product FROM productfeatureproduct WHERE productFeature=@ProductFeature2@)
) OR (propertyNum2>@y@ AND p.nr IN (SELECT distinct product FROM productfeatureproduct WHERE productFeature=@ProductFeature3@)))
ORDER BY label
LIMIT 10
OFFSET 5;

Parameters:

Parameter Description
@ProductType@ A randomly selected Class ID from the class hierarchy (leaf level).
@ProductFeature1@, @ProductFeature2@, @ProductFeature3@ Three different, randomly selected product feature IDs that correspond to the chosen product type.
@x@, @y@ Two random numbers between 1 and 500

Query 5: Find products that are similar to a given product.

Use Case Motivation: The consumer has found a product that fulfills his requirements. He now wants to find products with similar features.

SQL Query:

SELECT distinct p.nr, p.label
FROM product p, product po,
(Select distinct pfp1.product FROM productfeatureproduct pfp1, (SELECT productFeature FROM productfeatureproduct WHERE product=@ProductXYZ@) pfp2 WHERE pfp2.productFeature=pfp1.productFeature) pfp
WHERE p.nr=pfp.product AND po.nr=@ProductXYZ@ AND p.nr!=po.nr
AND p.propertyNum1<(po.propertyNum1+120) AND p.propertyNum1>(po.propertyNum1-120)
AND p.propertyNum2<(po.propertyNum2+170) AND p.propertyNum2>(po.propertyNum2-170)
ORDER BY label
LIMIT 5;

Parameters:

Parameter Description
@ProductXYZ@ A product ID (randomly selected)

Query 6: Find products having a label that contains a specific string.

Use Case Motivation: The consumer remembers parts of a product name from former searches. He wants to find the product again by searching for the parts of the name that he remembers.

SQL Query:

SELECT nr, label
FROM product
WHERE label LIKE '%@word1@%';

Parameters:

Parameter Description
@word1@ A word from the list of words that were used in the dataset generation.

Query 7: Retrieve in-depth information about a specific product including offers and reviews.

Use Case Motivation: The consumer has found a product that fulfills his requirements. Now he wants in-depth information about this product, including offers from German vendors and product reviews if any exist.

SQL Query:

SELECT *
FROM (select label from product where nr=@ProductXYZ@) p left join
((select o.nr as onr, o.price, v.nr as vnr, v.label from offer o, vendor v where @ProductXYZ@=o.product AND
o.vendor=v.nr AND v.country='DE' AND o.validTo>'@currentDate@') ov right join
(select r.nr as rnr, r.title, pn.nr as pnnr, pn.name, r.rating1, r.rating2 from review r, person pn where r.product=@ProductXYZ@ AND
r.person=pn.nr) rpn on (1=1)) on (1=1);

Parameters:

Parameter Description
@ProductXYZ@ A product ID (randomly selected)
@currentDate@ A date within the validFrom-validTo range of the offers (same date for all queries within a run).

Query 8: Give me recent reviews in English for a specific product.

Use Case Motivation: The consumer wants to read the 20 most recent English language reviews about a specific product.

SQL Query:

SELECT r.title, r.text, r.reviewDate, p.nr, p.name, r.rating1, r.rating2, r.rating3, r.rating4
FROM review r, person p
WHERE r.product=@ProductXYZ@ AND r.person=p.nr
AND r.language='en'
ORDER BY r.reviewDate desc
LIMIT 20;

Parameters:

Parameter Description
@ProductXYZ@ A product ID (randomly selected)

Query 9: Get information about a reviewer.

Use Case Motivation: In order to decide whether to trust a review, the consumer asks for any kind of information that is available about the reviewer.

SQL Query:

SELECT p.nr, p.name, p.mbox_sha1sum, p.country, r2.nr, r2.product, r2.title
FROM review r, person p, review r2
WHERE r.nr=@ReviewXYZ@ AND r.person=p.nr AND r2.person=p.nr;

Parameters:

Parameter Description
@ReviewXYZ@ A review ID (randomly selected)

Query 10: Get offers for a given product which fulfill specific requirements.

Use Case Motivation: The consumer wants to buy from a vendor in the United States that is able to deliver within 3 days and is looking for the cheapest offer that fulfills these requirements.

SQL Query:

SELECT distinct o.nr, o.price
FROM offer o, vendor v
WHERE o.product=@ProductXYZ@
AND o.deliveryDays<=3 AND v.country='US'
AND o.validTo>'@currentDate@' AND o.vendor=v.nr
ORDER BY o.price
LIMIT 10;

Parameters:

Parameter Description
@ProductXYZ@ A product ID (randomly selected)
@currentDate@ A date within the validFrom-validTo range of the offers (same date for all queries within a run).

Query 11: Get all information about an offer.

Use Case Motivation: After deciding on a specific offer, the consumer wants to get all information that is directly related to this offer.

SQL Query:

Select product, producer, vendor, price, validFrom, validTo, deliveryDays, offerWebpage, publisher, publishDate
from offer
where nr=@OfferXYZ@;

Parameters:

Parameter Description
@OfferXYZ@ An offer ID (randomly selected)

Query 12: Export information about an offer into another schema.

Use Case Motivation: After deciding on a specific offer, the consumer wants to save information about this offer on his local machine using a different RDF schema.

SQL Query:

Select p.nr As productNr, p.label As productlabel, v.label As vendorname, v.homepage As vendorhomepage,
o.offerWebpage As offerURL, o.price As price, o.deliveryDays As deliveryDays, o.validTo As validTo
From offer o, product p, vendor v
Where o.nr=@OfferXYZ@ AND o.product=p.nr AND o.vendor=v.nr;

Parameters:

Parameter Description
@OfferXYZ@ An offer ID (randomly selected)

4. Qualification Dataset and Tests

Before the performance of a system under test (SUT) is measured, it has to be verified that the SUT returns correct results for the benchmark queries.

For testing whether a SUT returns correct results, the BSBM benchmark provides a qualification dataset and a qualification tool which compares the query results of a SUT with the correct query results. At the moment, the qualification tool verifies only the results of SELECT queries. The results of DESCRIBE and CONSTRUCT queries (queries 9 and 12) are not checked.

A BSBM qualification test is conducted in the two-step procedure described below:

  1. Load the qualification dataset into the SUT (the command for generating it is given below), then run a qualification test against the SUT. The Test Driver supports the qualification test when it is run with the -q parameter. Example:
    $ java -cp bin:lib/* benchmark.testdriver.TestDriver -q http://SUT/sparql

    where http://SUT/sparql specifies the SPARQL endpoint.
    This creates a qualification file named "run.qual" (a different file name can be specified with the "-qf" parameter), which is used in step 2. In addition, if logging is set to "ALL" in the log4j.xml file, run.log contains all queries with their full result text, so individual queries can be examined later.
  2. Use the Qualification tool from the benchmark.qualification package. It has the following options:
Option Description
-rc Only check the number of results returned, not the result content.
-ql <qualification log file name> Specify the file name to write the qualification test results into.
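How the Qualification tool compares results internally is not described here. The hypothetical Python helper below only illustrates the difference between a content comparison and the -rc mode, which checks the number of results alone; whether row order is significant is an assumption of the sketch and is therefore exposed as a parameter.

```python
from collections import Counter


def results_match(expected, actual, count_only=False, ordered=False):
    """Compare one query's result rows with the reference rows.

    count_only: compare only the number of rows (the idea behind -rc).
    ordered: also require the same row order, e.g. for ORDER BY queries.
    Otherwise the rows are compared as multisets.
    """
    if count_only:
        return len(expected) == len(actual)
    if ordered:
        return list(expected) == list(actual)
    return Counter(expected) == Counter(actual)


expected = [("p1", "Alpha"), ("p2", "Bravo")]
actual = [("p2", "Bravo"), ("p1", "Alpha")]
print(results_match(expected, actual))                # True
print(results_match(expected, actual, ordered=True))  # False
print(results_match(expected, [("x", "y"), ("z", "w")], count_only=True))  # True
```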

You also need a correct qualification file against which to test your own qualification file. At the moment, we only offer a qualification file that was generated for a 100M dataset. The 100M dataset is generated with the following command:
$ ./generate -fc -pc 284826 (Unix, Cygwin)

or

java -cp bin;lib\* -Xmx256M benchmark.generator.Generator -fc -pc 284826 (Windows)
This will also generate the test driver data in the "td_data" directory, which needs to be in place when the test driver is run against the 100M dataset. If there is a need to also qualify against smaller datasets, please contact us and we will gladly add more options.

Then run the Qualification test. Example:
$ java -cp bin:lib/* benchmark.qualification.Qualification correct_100m.qual run.qual

where run.qual is the qualification file generated by the Test Driver in qualification mode.
By default, this generates a log file called "qual.log" that reports the variations between the two qualification files. A variation does not always indicate an error, for example if the SUT returns an xsd:dateTime value in a different (but correct) format than expected. Variations should therefore always be inspected in the verbose log file generated by the Test Driver (run.log). For comparison, a Test Driver log file of a correct run is available.

5. References

For more information about RDF and SPARQL Benchmarks please refer to:

ESW Wiki Page about RDF Benchmarks

Other SPARQL Benchmarks

Papers about RDF and SPARQL Benchmarks

Appendix A: Changes

Appendix B: Acknowledgements

The work on the BSBM Benchmark Version 3 is funded through the LOD2 project.