Berlin SPARQL Benchmark (BSBM) Specification - V2.0

Authors:: Chris Bizer (Web-based Systems Group, Freie Universität Berlin, Germany); Andreas Schultz (Institut für Informatik, Freie Universität Berlin, Germany)
This version:: http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/20080912/
Latest version:: http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec
Publication Date: 09/12/2008

Abstract

This document defines the Berlin SPARQL Benchmark (BSBM) for measuring the performance of storage systems that expose SPARQL endpoints. Such systems include native RDF stores, Named Graph stores, systems that map relational databases into RDF, and SPARQL wrappers around other kinds of data sources. The benchmark is built around an e-commerce use case, where a set of products is offered by different vendors and different consumers have posted reviews about products. The benchmark query mix illustrates the search and navigation pattern of a consumer looking for a product.

1. Introduction
2. Benchmark Dataset
3. Benchmark Queries
4. Performance Metrics
5. Rules for Running the Benchmark and Reporting Results
6. Benchmark Scenarios
7. Reporting Results
8. Qualification Dataset and Tests
9. Data Generator and Test Driver
Appendix A: Changes
Appendix B: Thanks

1. Introduction

The SPARQL Query Language for RDF and the SPARQL Protocol for RDF are implemented by a growing number of storage systems and are used within enterprise and open web settings. As SPARQL is taken up by the community there is a growing need for benchmarks to compare the performance of storage systems that expose SPARQL endpoints via the SPARQL protocol. Such systems include native RDF stores, Named Graph stores, systems that map relational databases into RDF, and SPARQL wrappers around other kinds of data sources.

The Berlin SPARQL Benchmark (BSBM) defines a suite of benchmarks for comparing the performance of these systems across architectures. The benchmark is built around an e-commerce use case in which a set of products is offered by different vendors and consumers have posted reviews about products. The benchmark query mix illustrates the search and navigation pattern of a consumer looking for a product.

The Berlin SPARQL Benchmark (BSBM) consists of:

benchmark dataset, which is scalable to different sizes based on a scale factor. There are three representations of the benchmark dataset: The first version represents the scenario data using the RDF triple data model, the second version represents the data using the Named Graphs data model, the third version represents the data uses the relational data model. All three representations have the same semantics.
12 use-case-motivated benchmark queries. There are three representations of the queries: One for the RDF data model, one for the Named Graphs data model and one for the relational data model.
2 query mixes, consisting of use-case-motivated sequences of queries.
performance metrics together with rules on how to run the benchmark and measure the metrics .
a data generator and a test driver for the benchmark. Both available under GNU license.

The Berlin SPARQL Benchmark was designed along three goals: First, the benchmark should allow the comparison of different storage systems that expose SPARQL endpoints across architectures. Testing storage systems with realistic workloads of use case motivated queries is a well established benchmarking technique in the database field and is for instance implemented by the TPC benchmarks. The Berlin SPARQL Benchmark should apply this technique to systems that expose SPARQL endpoints. As an increasing number of Semantic Web applications do not rely on heavyweight reasoning but focus on the integration and visualization of large amounts of data from autonomous data sources on the Web, the Berlin SPARQL Benchmark should not be designed to require complex reasoning but to measure the performance of queries against large amounts of RDF data.

This document defines version 2 of the BSBM Benchmark. Compared to version 1, which has been released in September 2008, version 2 contains two additional benchmark queries, a relational representation of the benchmark dataset and a SQL representation of the benchmark queries. In addition, the data generation rules and the benchmark queries were fine-tuned based on the experience from running version 1 against various stores.

The rest of this document is structured as follows: Section 2 defines the schema of benchmark dataset and describes the rules that are used by the data generator for populating the dataset according to the chosen scale factor. Section 3 defines the benchmark queries. Section 4 describes the performance metrics that are calculated by the test driver. Section 5 defines the rules that must be met when running the benchmark. Section 6 describes different benchmarking scenarios. Section 7 specifies the content of the BSBM full-disclosure report. Sections 8 and 9 define how a system under test is verified against the qualification dataset and describe the usage of the data generator and test driver.

2. Benchmark Dataset

This section defines the logical schema of the BSBM benchmark dataset (2.1) and the RDF triple, Named Graphs and relational representation of this schema. Section 2.3 defines the data generation rules that are used by the data generator to populate the dataset according to a given scale factor.

2.1. Logical Schema

This section defines the logical schema for the benchmark dataset. The dataset is based on an e-commerce use case, where a set of products is offered by different vendors and different consumers have posted reviews about these products on various review sites.

2.1.1 Namespaces

Prefix	Namespace
rdf:	http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs:	http://www.w3.org/2000/01/rdf-schema#
foaf:	http://xmlns.com/foaf/0.1/
dc:	http://purl.org/dc/elements/1.1/
xsd:	http://www.w3.org/2001/XMLSchema#
rev:	http://purl.org/stuff/rev#
bsbm:	http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/
bsbm-inst:	http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/

2.2.2 Classes and Properties

Meta data Properties

dc:publisher (Resource: Vendor, Producer, ...)
dc:date (literal: xsd:date)

The meta data properties are used to capture the information source and the publication date of each instance.

Class Product

rdfs:label (literal: String)
rdfs:comment (literal: String)
rdf:type (resource: ProductType)
bsbm:producer (resource: Producer)
bsbm:productPropertyTextualX (literal: String, there are different productPropertyTextual properties, some are optional)
bsbm:productPropertyNumericX (literal: Number there are different productPropertyNumeric properties, some are optional)
bsbm:productFeature (Resource: ProductFeature)

Comment: Products are described by different sets of product properties and product features.

Example RDF Instance:

dataFromProducer001411:Product00001435443 
 rdf:type bsbm:Product;
 rdf:type bsbm-inst:ProductType001342;
 rdfs:label "Canon Ixus 20010" ;
 rdfs:comment "Mit ihrer hochwertigen Verarbeitung, innovativen Technologie und faszinierenden Erscheinung 
 verkörpern Digital IXUS Modelle die hohe Kunst des Canon Design." ;
 bsbm:producer bsbm-inst:Producer001411 ;
 bsbm:productFeature bsbm-inst:ProductFeature003432 ;
 bsbm:productFeature bsbm-inst:ProductFeature103433 ;
 bsbm:productFeature bsbm-inst:ProductFeature990433 ;
 bsbm:productPropertyTextual1 "New this year." ;
 bsbm:productPropertyTextual2 "Special Lens with special focus." ;
 bsbm:productPropertyNumeric1 "1820"^^xsd:Integer ;
 bsbm:productPropertyNumeric2 "140"^^xsd:Integer ;
 bsbm:productPropertyNumeric3 "17"^^xsd:Integer ;
 dc:publisher dataFromProducer001411:Producer001411 ;
 dc:date "2008-02-13"^^xsd:date .

Class ProductType

rdfs:label (literal: String)
rdfs:comment (literal: String)
rdfs:subClassOf (resource: ProductType)

Comment: Product types form an irregular subsumption hierarchy (depth 3-5).

Example RDF Instance:

bsbm-inst:ProductType011432
 rdf:type bsbm:ProductType ;
 rdfs:label "Digital Camera" ;
 rdfs:comment "A camera that records pictures electronically rather than on film." ;
 rdfs:subClassOf bsbm-inst:ProductType011000
 dc:publisher bsbm-inst:StandardizationInstitution01 ;
 dc:date "2008-02-13"^^xsd:date .

Class ProductFeature

rdfs:label (literal: String)
rdfs:comment (literal: String)

Comment: The set of possible product features for a specific product depends on the product type. Each product type in the hierarchy has a set of associated product features, which leads to some features being very generic and others being more specific.

Example RDF Instance:

bsbm-inst:ProductFeature103433
 rdf:type bsbm:ProductFeature ;
 rdfs:label "Wide Screen TFT-Display" ;
 rdfs:comment "Wide Screen TFT-Display." ;
 dc:publisher bsbm-inst:StandardizationInstitution01 ;
 dc:date "2008-02-13"^^xsd:date .

Class Producer

rdfs:label (literal: String)
rdfs:comment (literal: String)
foaf:homepage (URL)
bsbm:country (ISO3166 country URI)

Example RDF Instance:

dataFromProducer001411:Producer001411
 rdf:type bsbm:Producer ;
 rdfs:label "Canon" ;
 rdfs:comment "Canon is a world leader in imaging products and solutions for the digital home and office." ;
 foaf:homepage <http://www.canon.com/>
 bsbm:country <http://downlode.org/rdf/iso-3166/countries#US> ; 
 dc:publisher dataFromProducer001411:Producer001411 ;
 dc:date "2008-02-13"^^xsd:date .

Class Vendor

rdfs:label (literal: String)
rdfs:comment (literal: String)
foaf:homepage (URL)
bsbm:country (ISO3166 country URI)

Example RDF Instance:

dataFromVendor001400:Vendor001400
 rdf:type bsbm:Vendor ;
 rdfs:label "Cheap Camera Place" ;
 rdfs:comment "We sell the cheapest cameras." ;
 foaf:homepage <http://www.cameraplace.com/>
 bsbm:country <http://downlode.org/rdf/iso-3166/countries#GB> ; 
 dc:publisher dataFromVendor001400:Vendor001400 ;
 dc:date "2008-02-03"^^xsd:date .

Class Offer

bsbm:product (resource: Product)
bsbm:vendor (resource: Vendor)
bsbm:price (literal: price with currency data type)
bsbm:validFrom (literal: Date)
bsbm:validTo (literal: Date)
bsbm:deliveryDays (Literal: business days)
bsbm:offerWebpage (URL of vendor's HTML page containing the offer)

Example RDF Instance:

dataFromVendor001400:Offer2413
 rdf:type bsbm:Offer ;
 bsbm:product dataFromProducer001411:Product00001435443 ; 
 bsbm:vendor dataFromVendor001400:Vendor001400 ;
 bsbm:price "31.99"^^bsbm:USD ;
 bsbm:validFrom "2008-02-12"^^xsd:date ;
 bsbm:validTo "2008-02-20"^^xsd:date ; 
 bsbm:deliveryDays "7"^^xsd:Integer ;
 bsbm:offerWebpage <http://vendor001400.com/offers/Offer2413> 
 dc:publisher dataFromVendor001400:Vendor001400 ;
 dc:date "2008-02-13"^^xsd:date .

Class Person

foaf:name (literal: String)
foaf:mbox_sha1sum (literal: email address)
bsbm:country (ISO3166 country URI)

Example RDF Instance:

dataFromRatingSite0014:Reviewer1213
 rdf:type foaf:Person ;
 foaf:name "Jenny324" ; 
 foaf:mbox_sha1sum "4749d7c44dc4c0adf66c1319d42b89e18df6df76" ;
 bsbm:country <http://downlode.org/rdf/iso-3166/countries#DE> ; 
 dc:publisher dataFromRatingSite0014:RatingSite0014 ;
 dc:date "2007-10-13"^^xsd:date .

Class Review

bsbm:reviewFor (resource: Product)
rev:reviewer (resource: foaf:Person)
bsbm:reviewDate (literal: Date datatype)
dc:title (literal: String)
rev:text (literal: String)
bsbm:rating1 (literal: Number ranging from 1 to 10, optional property)
bsbm:rating2 (literal: Number ranging from 1 to 10, optional property)
bsbm:rating3 (literal: Number ranging from 1 to 10, optional property)
bsbm:rating4 (literal: Number ranging from 1 to 10, optional property)

Example RDF Instance:

dataFromRatingSite0014:Review022343
 rdf:type rev:Review ;
 bsbm:reviewFor dataFromProducer001411:Product00001435443 ; 
 rev:reviewer dataFromRatingSite0014:Reviewer1213 ;
 bsbm:reviewDate "2007-10-10"^^xsd:date ; 
 dc:title "This is a nice small camera"@en ;
 rev:text "Open your wallet, take out a credit card. No, I'm not going to ask you to order one just yet ..."@en 
 bsbm:rating1 "5"^^xsd:Integer ;
 bsbm:rating2 "4"^^xsd:Integer ; 
 bsbm:rating3 "3"^^xsd:Integer ;
 bsbm:rating4 "4"^^xsd:Integer ;
 dc:publisher dataFromRatingSite0014:RatingSite0014 ;
 dc:date "2007-10-13"^^xsd:date .

2.2. Triple, Named Graphs and Relational Representation

In order to compare the performance of systems that expose SPARQL endpoints, but use different internal data models, there are three different representations of the benchmark dataset as well as different versions of the benchmark queries.

pure RDF triple representation
Named Graphs representation
relational representation

2.2.1 Triple Representation

Within the triple representation of the dataset, the publisher and the publication data is captured for each instance by a dc:publisher and a dc:date triple.

Examples:

dataFromVendor001400:Offer2413
 rdf:type bsbm:Offer ;
 bsbm:product dataFromProducer001411:Product00001435443 ; 
 bsbm:vendor dataFromVendor001400:Vendor001400 ;
 bsbm:price "31.99"^^bsbm:USD ;
 bsbm:validFrom "2008-02-12"^^xsd:date ;
 bsbm:validTo "2008-02-20"^^xsd:date ; 
 bsbm:deliveryDays "7"^^xsd:Integer ;
 bsbm:offerWebpage <http://vendor001400.com/offers/Offer2413> 
 dc:publisher dataFromVendor001400:Vendor001400 ;
 dc:date "2008-02-13"^^xsd:date .
dataFromVendor001400:Offer2414
 rdf:type bsbm:Offer ;
 bsbm:product dataFromProducer001411:Product00001435444 ; 
 bsbm:vendor dataFromVendor001400:Vendor001400 ;
 bsbm:price "23.99"^^bsbm:USD ;
 bsbm:validFrom "2008-02-10"^^xsd:date ;
 bsbm:validTo "2008-02-22"^^xsd:date ; 
 bsbm:deliveryDays "7"^^xsd:Integer ;
 bsbm:offerWebpage <http://vendor001400.com/offers/Offer2414> 
 dc:publisher dataFromVendor001400:Vendor001400 ;
 dc:date "2008-02-13"^^xsd:date .

2.2.2. Named Graphs Representation

Within the Named Graph version of the dataset, all information that originates from a specific producer, vendor or rating site is put into a distinct named graph. There is one additional graph that contains provenance information (dc:publisher, dc:date) for all other graphs.

Example (using the TriG syntax):

dataFromVendor001400:Graph-2008-02-13 {

 dataFromVendor001400:Offer2413
 rdf:type bsbm:Offer ;
 bsbm:product dataFromProducer001411:Product00001435443 ; 
 bsbm:vendor dataFromVendor001400:Vendor001400 ;
 bsbm:price "31.99"^^bsbm:USD ;
 bsbm:validFrom "2008-02-12"^^xsd:date ;
 bsbm:validTo "2008-02-20"^^xsd:date ; 
 bsbm:deliveryDays "7"^^xsd:Integer ;
 bsbm:offerWebpage <http://vendor001400.com/offers/Offer2413> 

 dataFromVendor001400:Offer2414
 rdf:type bsbm:Offer ;
 bsbm:product dataFromProducer001411:Product00001435444 ; 
 bsbm:vendor dataFromVendor001400:Vendor001400 ;
 bsbm:price "23.99"^^bsbm:USD ;
 bsbm:validFrom "2008-02-10"^^xsd:date ;
 bsbm:validTo "2008-02-22"^^xsd:date ; 
 bsbm:deliveryDays "7"^^xsd:Integer ;
 bsbm:offerWebpage <http://vendor001400.com/offers/Offer2414> 
}

localhost:provenanceData {
 dataFromVendor001400:Graph-2008-02-13 dc:publisher dataFromVendor001400:Vendor001400 ;
 dataFromVendor001400:Graph-2008-02-13 dc:date "2008-02-13"^^xsd:date .
}

2.2.3 Relational Representation

In order to benchmark systems that map relational databases to RDF and rewrite SPARQL queries into SQL queries against an application specific relational data model, the BSBM data generator is also able to output the benchmark dataset as an MySQL dump.

This dump uses the following relational schema:


ProductFeature(nr, label, comment, publisher, publishDate)
ProductType(nr, label, comment, parent, publisher, publishDate)
Producer(nr, label, comment, homepage, country, publisher, publishDate)
Product(nr, label, comment, producer, propertyNum1, propertyNum2, propertyNum3, propertyNum4, propertyNum5, 
 propertyNum6, propertyTex1, propertyTex2, propertyTex3, propertyTex4, propertyTex5, propertyTex6, 
 publisher, publishDate)
ProductTypeProduct(product, productType)
ProductFeatureProduct(product, productFeature)
Vendor(nr, label, comment, homepage, country, publisher, publishDate)
Offer(nr, product, producer, vendor, price, validFrom, validTo, deliveryDays, offerWebpage, publisher, publishDate)
Person(nr, name, mbox_sha1sum, country, publisher, publishDate)
Review(nr, product, producer, person, reviewDate, title, text, language, rating1, rating2, rating3, rating4, 
 publisher, publishDate)

2.3. Scaling and Dataset Population

This section defines the rules for generating benchmark data for a given scale factor.

The benchmark is scaled by the number of products.

The table below gives an overview about the characteristics of BSBM datasets with different scale factors.

Scale Factor	666	2,785	70,812	284,826
Number of RDF Triples	250K	1M	25M	100M
Number of Producers	14	60	1422	5,618
Number of Product Features	2,860	4,745	23,833	47,884
Number of Product Types	55	151	731	2011
Number of Vendors	8	34	722	2,854
Number of Offers	13,320	55,700	1,416,240	5,696,520
Number of Reviewers	339	1432	36,249	146,054
Number of Reviews	6,660	27,850	708,120	2,848,260
Total Number of Instances	23,922	92,757	2,258,129	9,034,027
Exact Total Number of Triples	250,030	1,000,313	25,000,244	100,000,112
File Size Turtle (unzipped)	22 MB	86 MB	2.1 GB	8.5 GB

The BSBM data generator is described in Section 8.

2.3.1 Class: Product

Products have product types and are described with various properties. There are products with several different product property combinations (many properties, less properties).

Rules for data generation:

Label: String of 1-3 words, dictionary 1
Comment: String of 50-150 words, dictionary 2
productPropertyTextualX: Literal of 3-15 words, dictionary 2
productPropertyNumericX: Integer, range 1-2000, values of the normal distribution (mean value: 0, standard deviation: 1) are mapped from the range 0-2 to the range 1-2000.
productFeature: Every Product has about 10 -20 features.
publishDate: randomly chosen from 2000-09-20 to 2006-12-23.

There are three types of product descriptions. The table below lists the textual and numeric properties for each type.

	Textual Properties	Numeric Properties
Description Type 1	PropertyTextual1 to PropertyTextual5	PropertyNumeric1 to PropertyNumeric5	40%
Description Type 2	PropertyTextual1 to PropertyTextual3 + optional PropertyTextual4 (50%) + optional PropertyTextual5 (25%)	PropertyNumeric1 to PropertyNumeric3 + optional PropertyNumeric4 (50%) + optional PropertyNumeric5 (25%)	20%
Description Type 3	PropertyTextual1 to PropertyTextual3 + optional PropertyTextual5 (25%) + optional PropertyTextual6 (50%)	PropertyNumeric1 to PropertyNumeric3 + optional PropertyNumeric5 (25%) + optional PropertyNumeric6 (50%)	40%

Relation: Product-Producer

Every Product has one producer.
One producer is generated for 50 products on average .
The number of products per producer is taken randomly from the normal distribution (mean value: 50, standard deviation: 16.6) from the range 1 - unlimit.

Relation: Product-ProductType

Every Product is in one leaf of the product-type hierarchy.
Products are randomly assigned to the product types (leaf level) whereas the range 0-2 of the normal distribution (mean value: 1, std. deviation: 1) is mapped to the range 1 - number of (leaf) product types.

Relation: Product-ProductFeature

The set of possible product features for a product results from the product type and its superclasses.
Every feature for this set is chosen with a probability of 25%.

Relation: Product-Offer

Products are offered by multiple vendors.
Offers are randomly assigned to products whereas the range 0-4 of the normal distribution (mean value: 2, std. deviation: 1) is mapped to the range 1 - number of products.

Relation: Product-Review

One Product has 10 reviews on average.
Reviews are randomly assigned to products whereas the range 0-4 of the normal distribution (mean value: 2, std. deviation: 1) is mapped to the range 1 - number of products.

2.3.2 Class ProductType

Irregular subsumption hierarchy (depth 2-6). Number of classes increases with the number of products (around 4^{log₁₀#Products}).

The branching factor for every node on the same level is equal and gets calculated for arbitrary scale factors. The table below illustrates the relationship between number of products and branching factors for every level:

	root level	level 1	level 2	level 3	level 4
100 products	4	4
1 000 products	6	8	2
10 000 products	8	8	4
100 000 products	10	8	8	2
1 000 000 products	12	8	8	4
10 000 000 products	14	8	8	8	2
100 000 000 products	16	8	8	8	4

As can be seen the depth increases by one everytime the product count grows by a factor of 100.

Rules for data generation:

Label: String of 1-3 words, dictionary 1
Comment: String of 20-50 words, dictionary 2
publishDate: randomly chosen from 2000-05-20 to 2000-06-23.

2.3.4 Class Product Feature

Each feature is assigned to a product type in the type hierarchy, which leads to some features being very generic and other being more specific.

Rules for data generation:

Label: String of 1-3 words, dictionary 1
Comment: String of 20-50 words, dictionary 2
The distribution of Product Features among the ProductType hierarchy is done like this: In the root product type there are always 5 product features. In the remaining hierarchy bounds for the number of product features are calculatedfor all depth levels , so that nodes nearer to the root have more product features than nodes which lie deeper in the hierarchy. After that, for every node a random value between the upper and lower bound is chosen as the number of product features for that node. Product features for that node are then generated accordingly.
publishDate: randomly chosen from 2000-05-20 to 2000-06-23.

2.3.5 Class Producer

Rules for data generation:

Label: String of 1-3 words, dictionary 1
Comment: String of 20-50 words, dictionary 2
foaf:homepage: URI within the namespace of the producer
country: ISO3166 (US 40%, UK 10%, JP 10%, CN 10%, 5% DE, 5% FR, 5% ES, 5% RU, 5% KR, 5% AT)
publishDate: randomly chosen from 2000-07-20 to 2005-06-23.

Per 1000 products, there are 20 producers generated on average.

2.3.6 Class Vendor

Rules for data generation:

Label: String of 1-3 words, dictionary 1
Comment: String of 20-50 words, dictionary 2
foaf:homepage: URI within the namespace of the vendor
country: ISO3166 (US 40%, UK 10%, JP 10%, CN 10%, 5% DE, 5% FR, 5% ES, 5% RU, 5% KR, 5% AT)
publishDate: randomly chosen from 2000-09-20 to 2006-12-23.

Per 1000 products, there is 0.5 vendor generated on average.

Relation: Vendor-Offer

Every offer belongs to a vendor.
The number of offers per vendor is taken randomly from the Normal distribution (mean value: 2000, standard deviation: 666) from the range 1 - unlimit.

2.3.7 Class Offer

Rules for data generation:

price: random US-$ value between 5 and 10000
validFrom, validTo: date range between 7 and 180 days overlapping with the publication date of the offer.

validFrom ranges from 0-180 days before the publication date.
validTo ranges from 7-180 days after the publication date.
this means that about half of the offers are not valid anymore.

deliveryDays: Integer between 1-21
offerWebpage: URI within the namespace of the producer
publishDate: randomly chosen from (today - 97 days) to today.

Per 1000 products, there are 20000 offers generated.

2.3.8 Class Person

Rules for data generation:

Name: String of 2-4 words, dictionary 3
mbox_sha1sum: random sha1 value
country: ISO3166 (US 40%, UK 10%, JP 10%, CN 10%, 5% DE, 5% FR, 5% ES, 5% RU, 5% KR, 5% AT)
publishDate: randomly chosen from 2008-5-20 to 2008-8-23.

Per 1000 products, there are on average 500 persons generated.

2.3.9 Class Review

Rules for data generation:

Title: String of 4-15 words, dictionary 2
Text: String of 50-200, dictionary 2, lang: (EN 50%, JA 10%, ZH 10%, 5% DE, 5% FR, 5% ES, 5% RU, 5% KR, 5% AT)
Review Date: Random date within the last year
RatingX: Reviews might include up to 4 types of ratings. The likelihood that a review has a rating of type X is 70%. The values of the ratings range from 1 to 10.
publishDate: randomly chosen from Review Date to today (which is set to 2008-06-20).

Per 1000 products, there are 10000 reviews generated.

Relation: Review-Person

Every Review has one author.
The number of reviews per person is taken randomly from the Normal distribution (mean value: 20, standard deviation: 6.6) from the range 1 - unlimit.
On average there is a new person generated every 20 reviews.

Relation: Review-Ratingsite

Every Review belongs to one rating site.
The number of reviews per rating site is taken randomly from the Normal distribution (mean value: 10 000, standard deviation: 333) from the range 1 - unlimit.
On average there is a new rating site generated every 10000 reviews.

Dictionaries

Dictionary 1: Words from set of product names (around 90.000 words)

Dictionary 2: Words from English text (todo: look for a corpus with English sentences, currently dictionary 1 is used)

Dictionary 3: Names of persons (around 90.000 names)

3. Benchmark Queries

This section defines a suite of benchmark queries and a query mix.

The benchmark queries are designed to emulate the search and navigation pattern of a consumer looking for a product. A product search includes the following steps:

Generic search for a given set of generic product properties.
More specific search for products with a given set of product properties.
Find similar products for a given product.
Retrieve detailed information about several specific products.
Retrieve reviews for given products.
Get background information about reviewers.
Retrieve offers for given products.
Check information about vendors and their delivery conditions.
Export the chosen offer into another information system which uses a different schema.

There are three representations of the benchmark query set: One for the Triple and one for the Named Graphs data model as well as a pure SQL version for the relational representation given in section 2.2.4. All query sets have the same semantics.

3.4 Query Mix

There are two variations of the BSBM query mix:

the complete query mix and
the reduced query mix.

Complete Query Mix

The complete query mix consists of 25 queries that simulate a product search by a single consumer. The query sequenze is given below:

Query 1: Find products for a given set of generic features.
Query 2: Retrieve basic information about a specific product for display purposes.
Query 2: Retrieve basic information about a specific product for display purposes.
Query 3: Find products for a given more specific set of features.
Query 2: Retrieve basic information about a specific product for display purposes.
Query 2: Retrieve basic information about a specific product for display purposes.
Query 4: Find products matching two different sets of features.
Query 2: Retrieve basic information about a specific product for display purposes.
Query 2: Retrieve basic information about a specific product for display purposes.
Query 5: Find products that are similar to a given product.
Query 7: Retrieve in-depth information about a specific product including offers and reviews.
Query 7: Retrieve in-depth information about a specific product including offers and reviews.
Query 6: Find products having a label that contains a specific string.
Query 7: Retrieve in-depth information about a specific product including offers and reviews.
Query 7: Retrieve in-depth information about a specific product including offers and reviews.
Query 8: Give me recent German reviews for a specific product.
Query 9: Get information about a reviewer.
Query 9: Get information about a reviewer.
Query 8: Give me recent German reviews for a specific product.
Query 9: Get information about a reviewer.
Query 9: Get information about a reviewer .
Query 10: Get offers for a given product which fulfill specific requirements.
Query 10: Get offers for a given product which fulfill specific requirements.
Query 11: Get all information about an offer.
Query 12: Export information about an offer into another schemata.

Reduced Query Mix

The reduced query mix consists of the same sequence as the complete query mix but without:

10. Query 5: Find products that are similar to a given product.
13. Query 6: Find products having a label that contains a specific string.

Query 5 and 6 are excluded from the reduced query mix, as they consume together about 80% of the overall execution time on current stores when executed against larger datasets. As of 2008, we therefore recomment to report performance figures for the reduced query mix in addition to figures for the complete query mix, if the benchmark is run against datasets above 25 million triples.

3.2 SPARQL Queries for the Triple Data Model

Each query is defined by the following components:

The motivation for the query within the e-commerce use case, which illustrates the context in which the query could be used;
A functional query definition, which specifies the function to be performed by the query;
The substitution parameters, that are needed to complete the query;
Information about the characteristics of the query.

Query 1: Find products for a given set of generic features.

Use Case Motivation: A consumer is looking for a product and has a general idea about what he wants.

SPARQL Query:


PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?product ?label
WHERE { 
 ?product rdfs:label ?label .
 ?product a %ProductType% .
 ?product bsbm:productFeature %ProductFeature1% . 
 ?product bsbm:productFeature %ProductFeature2% . 
?product bsbm:productPropertyNumeric1 ?value1 . 
	FILTER (?value1 > %x%) 
	}
ORDER BY ?label
LIMIT 10

Parameters:

Parameter	Description
%ProductType%	A randomly selected Class URI from the class hierarchy (one level above leaf level).
%ProductFeature1% %ProductFeature2%	Two different, randomly selected feature URIs that correspond to the chosen product type.
%x%	A number between 1 and 500

Query Properties:

touches subsumption hierarchy on product types.
touches a large amount of data.
uses ORDER BY and LIMIT.

Query 2: Retrieve basic information about a specific product for display purposes

Use Case Motivation: The consumer wants to view basic information about products found by query 1.

SPARQL Query

PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?label ?comment ?producer ?productFeature ?propertyTextual1 ?propertyTextual2 ?propertyTextual3
 ?propertyNumeric1 ?propertyNumeric2 ?propertyTextual4 ?propertyTextual5 ?propertyNumeric4 
WHERE {
 %ProductXYZ% rdfs:label ?label .
	%ProductXYZ% rdfs:comment ?comment .
	%ProductXYZ% bsbm:producer ?p .
	?p rdfs:label ?producer .
 %ProductXYZ% dc:publisher ?p . 
	%ProductXYZ% bsbm:productFeature ?f .
	?f rdfs:label ?productFeature .
	%ProductXYZ% bsbm:productPropertyTextual1 ?propertyTextual1 .
	%ProductXYZ% bsbm:productPropertyTextual2 ?propertyTextual2 .
 %ProductXYZ% bsbm:productPropertyTextual3 ?propertyTextual3 .
	%ProductXYZ% bsbm:productPropertyNumeric1 ?propertyNumeric1 .
	%ProductXYZ% bsbm:productPropertyNumeric2 ?propertyNumeric2 .
	OPTIONAL { %ProductXYZ% bsbm:productPropertyTextual4 ?propertyTextual4 }
 OPTIONAL { %ProductXYZ% bsbm:productPropertyTextual5 ?propertyTextual5 }
 OPTIONAL { %ProductXYZ% bsbm:productPropertyNumeric4 ?propertyNumeric4 }
}

Parameters:

Parameter	Description
%ProductXYZ%	A product URI (randomly selected)

Query Properties:

Query touches only a small amount of data (single product).
Larger set of triple patterns
Query uses OPTIONAL

Query 3: Find products having some specific features and not having one feature.

Use Case Motivation: After looking at information about some products, the consumer has a more specific idea what we wants. Therefore, he asks for products having several features but not having a specific other feature.

SPARQL Query:

PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?product ?label
WHERE {
 ?product rdfs:label ?label .
 ?product a %ProductType% .
	?product bsbm:productFeature %ProductFeature1% .
	?product bsbm:productPropertyNumeric1 ?p1 .
	FILTER ( ?p1 > %x% ) 
	?product bsbm:productPropertyNumeric3 ?p3 .
	FILTER (?p3 < %y% )
 OPTIONAL { 
 ?product bsbm:productFeature %ProductFeature2% .
 ?product rdfs:label ?testVar }
 FILTER (!bound(?testVar)) 
}
ORDER BY ?label
LIMIT 10

Parameters:

Parameter	Description
%ProductType%	A randomly selected Class URI from the class hierarchy (leaf level).
%ProductFeature1% %ProductFeature2%	Three different, randomly selected product feature URI that correspond to the chosen product type.
%x% %y%	Two random numbers between 1 and 500

Query Properties:

Uses negation
Uses Order BY and LIMIT

Query 4: Find products matching two different sets of features.

Use Case Motivation: After looking at information about some products, the consumer has a more specific idea what we wants. Therefore, he asks for products matching either one set of features or another set.

SPARQL Query:

PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?product ?label ?propertyTextual
WHERE {
 { 
 ?product rdfs:label ?label .
 ?product rdf:type %ProductType% .
 ?product bsbm:productFeature %ProductFeature1% .
	?product bsbm:productFeature %ProductFeature2% .
 ?product bsbm:productPropertyTextual1 ?propertyTextual .
	?product bsbm:productPropertyNumeric1 ?p1 .
	FILTER ( ?p1 > %x% )
 } UNION {
 ?product rdfs:label ?label .
 ?product rdf:type %ProductType% .
 ?product bsbm:productFeature %ProductFeature1% .
	?product bsbm:productFeature %ProductFeature3% .
 ?product bsbm:productPropertyTextual1 ?propertyTextual .
	?product bsbm:productPropertyNumeric2 ?p2 .
	FILTER ( ?p2> %y% ) 
 } 
}
ORDER BY ?label
OFFSET 5
LIMIT 10

Parameters:

Parameter	Description
%ProductType%	A randomly selected Class URI from the class hierarchy (leaf level).
%ProductFeature1% %ProductFeature2% %ProductFeature3%	Three different, randomly selected product feature URI that correspond to the chosen product type.
%x% %y%	Two random numbers between 1 and 500

Query Properties:

Uses UNION
Uses ORDER BY, LIMIT and OFFSET

Query 5: Find product that are similar to a given product.

Use Case Motivation: The consumer has found a product that fulfills his requirements. He now wants to find products with similar features.

SPARQL Query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>

SELECT DISTINCT ?product ?productLabel
WHERE { 
	?product rdfs:label ?productLabel .
 FILTER (%ProductXYZ% != ?product)
	%ProductXYZ% bsbm:productFeature ?prodFeature .
	?product bsbm:productFeature ?prodFeature .
	%ProductXYZ% bsbm:productPropertyNumeric1 ?origProperty1 .
	?product bsbm:productPropertyNumeric1 ?simProperty1 .
	FILTER (?simProperty1 < (?origProperty1 + 120) && ?simProperty1 > (?origProperty1 – 120))
	%ProductXYZ% bsbm:productPropertyNumeric2 ?origProperty2 .
	?product bsbm:productPropertyNumeric2 ?simProperty2 .
	FILTER (?simProperty2 < (?origProperty2 + 170) && ?simProperty2 > (?origProperty2 – 170))
}
ORDER BY ?productLabel
LIMIT 5

Parameters:

Parameter	Description
%ProductXYZ%	A product URI (randomly selected)

Query Properties:

touches a lot of data
uses more complex FILTER clauses
uses LIMIT

Query 6: Find products having a label that contains a specific string.

Use Case Motivation: The consumer remembers parts of a product name from former searches. He wants to find the product again by searching for the parts of the name that he remembers.

SPARQL Query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>

SELECT ?product ?label
WHERE {
	?product rdfs:label ?label .
 ?product rdf:type bsbm:Product .
	FILTER regex(?label, "%word1%")
}

Parameters:

Parameter	Description
%word1%	A word from the list of words that were used in the dataset generation.

Query Properties:

uses REGEX

Query 7: Retrieve in-depth information about a specific product including offers and reviews.

Use Case Motivation: The consumer has found a products which fulfills his requirements. Now he wants in-depth information about this product including offers from German vendors and product reviews if existent.

SPARQL Query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?productLabel ?offer ?price ?vendor ?vendorTitle ?review ?revTitle 
 ?reviewer ?revName ?rating1 ?rating2
WHERE { 
	%ProductXYZ% rdfs:label ?productLabel .
 OPTIONAL {
 ?offer bsbm:product %ProductXYZ% .
	?offer bsbm:price ?price .
	?offer bsbm:vendor ?vendor .
	?vendor rdfs:label ?vendorTitle .
 ?vendor bsbm:country <http://downlode.org/rdf/iso-3166/countries#DE> .
 ?offer dc:publisher ?vendor . 
 ?offer bsbm:validTo ?date .
 FILTER (?date > %currentDate% )
 }
 OPTIONAL {
	?review bsbm:reviewFor %ProductXYZ% .
	?review rev:reviewer ?reviewer .
	?reviewer foaf:name ?revName .
	?review dc:title ?revTitle .
 OPTIONAL { ?review bsbm:rating1 ?rating1 . }
 OPTIONAL { ?review bsbm:rating2 ?rating2 . } 
 }
}

Parameters:

Parameter	Description
%ProductXYZ%	A product URI (randomly selected)
%currentDate%	A date within the validFrom validTo range of the offers (same date for all queries within a run).

Query Properties:

Touches a lot of data including products, offers, vendors, reviews and reviewers.
Uses OPTIONAL

Query 8: Give me recent reviews in English for a specific product.

Use Case Motivation: The consumer wants to read the 20 most recent English language reviews about a specific product.

SPARQL Query:

PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?title ?text ?reviewDate ?reviewer ?reviewerName ?rating1 ?rating2 ?rating3 ?rating4 
WHERE { 
	?review bsbm:reviewFor %ProductXYZ% .
	?review dc:title ?title .
	?review rev:text ?text .
	FILTER langMatches( lang(?text), "EN" ) 
	?review bsbm:reviewDate ?reviewDate .
	?review rev:reviewer ?reviewer .
	?reviewer foaf:name ?reviewerName .
	OPTIONAL { ?review bsbm:rating1 ?rating1 . }
	OPTIONAL { ?review bsbm:rating2 ?rating2 . }
	OPTIONAL { ?review bsbm:rating3 ?rating3 . }
	OPTIONAL { ?review bsbm:rating4 ?rating4 . }
}
ORDER BY DESC(?reviewDate)
LIMIT 20

Parameters:

Parameter	Description
%ProductXYZ%	A product URI (randomly selected)

Query Properties:

uses langMatches() filter
uses ORDER BY
uses LIMIT

Query 9: Get information about a reviewer.

Use Case Motivation: In order to decide whether to trust a review, the consumer asks for any kind of information that is available about the reviewer.

SPARQL Query:

PREFIX rev: <http://purl.org/stuff/rev#>

DESCRIBE ?x
WHERE { %ReviewXYZ% rev:reviewer ?x }

Parameters:

Parameter	Description
%ReviewXYZ%	A review URI (randomly selected)

Query Properties:

use DESCRIBE

Query 10: Get offers for a given product which fulfill specific requirements.

Use Case Motivation: The consumer wants to buy from a vendor in the United States that is able to deliver within 3 days and is looking for the cheapest offer that fulfills these requirements.

SPARQL Query:

PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT DISTINCT ?offer ?price
WHERE {
	?offer bsbm:product %ProductXYZ% .
	?offer bsbm:vendor ?vendor .
 ?offer dc:publisher ?vendor .
	?vendor bsbm:country <http://downlode.org/rdf/iso-3166/countries#US> .
	?offer bsbm:deliveryDays ?deliveryDays .
	FILTER (?deliveryDays <= 3)
	?offer bsbm:price ?price .
 ?offer bsbm:validTo ?date .
 FILTER (?date > %currentDate% )
}
ORDER BY xsd:double(str(?price))
LIMIT 10

Parameters:

Parameter	Description
%ProductXYZ%	A product URI (randomly selected)
%currentDate%	A date within the validFrom-validTo range of the offers (same date for all queries within a run).

Query Properties:

uses DISTINCT
uses ORDER BY and LIMIT

Query 11: Get all information about an offer.

Use Case Motivation: After deciding on a specific offer, the consumer wants to get all information that is directly related to this offer.

SPARQL Query:

SELECT ?property ?hasValue ?isValueOf
WHERE {
 { %OfferXYZ% ?property ?hasValue }
 UNION
 { ?isValueOf ?property %OfferXYZ% }
}

Parameters:

Parameter	Description
%OfferXYZ%	An offer URI (randomly selected)

Query Properties:

query contains an unbound predicate
uses UNION

Query 12: Export information about an offer into another schemata.

Use Case Motivation: After deciding on a specific offer, the consumer wants to save information about this offer on his local machine using a different RDF schema.

SPARQL Query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX bsbm-export: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/export/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

CONSTRUCT { %OfferXYZ% bsbm-export:product ?productURI .
 %OfferXYZ% bsbm-export:productlabel ?productlabel .
 %OfferXYZ% bsbm-export:vendor ?vendorname .
 %OfferXYZ% bsbm-export:vendorhomepage ?vendorhomepage . 
 %OfferXYZ% bsbm-export:offerURL ?offerURL .
 %OfferXYZ% bsbm-export:price ?price .
 %OfferXYZ% bsbm-export:deliveryDays ?deliveryDays .
 %OfferXYZ% bsbm-export:validuntil ?validTo } 
WHERE { %OfferXYZ% bsbm:product ?productURI .
 ?productURI rdfs:label ?productlabel .
 %OfferXYZ% bsbm:vendor ?vendorURI .
 ?vendorURI rdfs:label ?vendorname .
 ?vendorURI foaf:homepage ?vendorhomepage .
 %OfferXYZ% bsbm:offerWebpage ?offerURL .
 %OfferXYZ% bsbm:price ?price .
 %OfferXYZ% bsbm:deliveryDays ?deliveryDays .
 %OfferXYZ% bsbm:validTo ?validTo }

Parameters:

Parameter	Description
%OfferXYZ%	An offer URI (randomly selected)

Query Properties:

Touches a lot of data including products, offers, vendors, reviews and reviewers.
Uses OPTIONAL

3.3 SPARQL Queries for the Named Graph Data Model

The queries for the Named Graphs data model have the same semantics as the queries for the triple data model. The queries do not specify the IRIs of the named graphs in the RDF Dataset using the FROM NAMED clause, but assume that the query is executed against the complete RDF Dataset.

This is still work in progress ...
Todo: Rewrite all queries for Named Graphs. Two examples are already found below:

Query 2: Retrieve basic information about a specific product for display purposes

SPARQL Query

PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?label ?comment ?producer ?productFeature ?propertyTextual1 ?propertyTextual2 
 ?propertyNumeric1 ?propertyNumeric2 ?propertyTextual4 ?propertyTextual5 ?propertyNumeric4 
WHERE {
 GRAPH ?graph {
 %ProductXYZ% rdfs:label ?label .
	%ProductXYZ% rdfs:comment ?comment .
	%ProductXYZ% bsbm:producer ?p .
 ?p rdfs:label ?producer .
	%ProductXYZ% bsbm:productFeature ?f .
	?f rdfs:label ?productFeature .
	%ProductXYZ% bsbm:productPropertyTextual1 ?propertyTextual1 .
	%ProductXYZ% bsbm:productPropertyTextual2 ?propertyTextual2 .
	%ProductXYZ% bsbm:productPropertyNumeric1 ?propertyNumeric1 .
	%ProductXYZ% bsbm:productPropertyNumeric2 ?propertyNumeric2 .
	OPTIONAL { %ProductXYZ% bsbm:productPropertyTextual4 ?propertyTextual4 }
 OPTIONAL { %ProductXYZ% bsbm:productPropertyTextual5 ?propertyTextual5 }
 OPTIONAL { %ProductXYZ% bsbm:productPropertyNumeric4 ?propertyNumeric4 }
 }
 GRAPH localhost:provenanceData {
 ?graph dc:publisher ?p .
 }
}

Query 7: Retrieve in-depth information about a specific product including offers and reviews.

SPARQL Query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?productLabel ?offer ?price ?vendor ?vendorTitle ?review ?revTitle 
 ?reviewer ?revName ?rating1 ?rating2
WHERE { 
 GRAPH ?producerGraph {
	%ProductXYZ% rdfs:label ?productLabel .
 }
 OPTIONAL {
 GRAPH ?vendorGraph {
 ?offer bsbm:product %ProductXYZ% .
	?offer bsbm:price ?price .
	?offer bsbm:vendor ?vendor .
	?vendor rdfs:label ?vendorTitle .
 ?offer bsbm:validTo ?date .
 FILTER (?date > %currentDate% ) 
 }
 }
 OPTIONAL {
 GRAPH ?ratingSiteGraph {
	?review bsbm:reviewFor %ProductXYZ% .
	?review rev:reviewer ?reviewer .
	?reviewer foaf:name ?revName .
	?review dc:title ?revTitle .
 OPTIONAL { ?review bsbm:rating1 ?rating1 . }
 OPTIONAL { ?review bsbm:rating2 ?rating2 . }
 } 
 }
 GRAPH localhost:provenanceData {
 ?vendorGraph dc:publisher ?vendor .
 }
}

3.4 SQL Queries for the Relational Data Model

This section will contain a SQL representation of the benchmark queries in order to be able to compare the performance of stores that expose SPARQL endpoints to the performance of classic SQL-based RDBMS. Since there is no exact counterpart to some SPARQL specific query forms like DESCRIBE, the SQL queries aren't semantically completely equivalent.

Query 1: Find products for a given set of generic features.

Use Case Motivation: A consumer is looking for a product and has a general idea about what he wants.

SQL Query:

SELECT distinct nr, label
FROM product p, producttypeproduct ptp
WHERE p.nr = ptp.product AND ptp.productType=@ProductType@
	AND propertyNum1 > @x@
	AND p.nr IN (SELECT distinct product FROM productfeatureproduct WHERE productFeature=@ProductFeature1@)
	AND p.nr IN (SELECT distinct product FROM productfeatureproduct WHERE productFeature=@ProductFeature2@)
ORDER BY label
LIMIT 10;

Parameters:

Parameter	Description
@ProductType@	A randomly selected Class ID from the class hierarchy (one level above leaf level).
@ProductFeature1@ @ProductFeature2@	Two different, randomly selected feature IDs that correspond to the chosen product type.
@x@	A number between 1 and 500

Query 2: Retrieve basic information about a specific product for display purposes

Use Case Motivation: The consumer wants to view basic information about products found by query 1.

SQL Query

SELECT pt.label, pt.comment, pt.producer, productFeature, propertyTex1, propertyTex2, propertyTex3, 
	propertyNum1, propertyNum2, propertyTex4, propertyTex5, propertyNum4
FROM product pt, producer pr, productfeatureproduct pfp
WHERE pt.nr=@ProductXYZ@ AND pt.nr=pfp.product AND pt.producer=pr.nr;

Parameters:

Parameter	Description
@ProductXYZ@	A product ID (randomly selected)

Query 3: Find products having some specific features and not having one feature.

SQL Query:

SELECT p.nr, p.label
FROM product p, producttypeproduct ptp
WHERE p.nr=ptp.product
	AND productType=@ProductType@
	AND propertyNum1>@x@
	AND propertyNum3<@y@
	AND @ProductFeature1@ IN (SELECT productFeature FROM productfeatureproduct WHERE product=p.nr)
	AND @ProductFeature2@ NOT IN (SELECT productFeature FROM productfeatureproduct WHERE product=p.nr)
ORDER BY p.label
LIMIT 10;

Parameters:

Parameter	Description
@ProductType@	A randomly selected Class ID from the class hierarchy (leaf level).
@ProductFeature1@ @ProductFeature2@	Three different, randomly selected product feature ID that correspond to the chosen product type.
@x@ @y@	Two random numbers between 1 and 500

Query 4: Find products matching two different sets of features.

SQL Query:

SELECT distinct p.nr, p.label, p.propertyTex1
FROM product p, producttypeproduct ptp
WHERE p.nr=ptp.product AND ptp.productType=@ProductType@
	AND p.nr IN (SELECT distinct product FROM productfeatureproduct WHERE productFeature=@ProductFeature1@)
	AND ((propertyNum1>@x@ AND p.nr IN (SELECT distinct product FROM productfeatureproduct WHERE productFeature=@ProductFeature2@)
) OR (propertyNum2>@y@ AND p.nr IN (SELECT distinct product FROM productfeatureproduct WHERE productFeature=@ProductFeature3@)))
ORDER BY label
LIMIT 10
OFFSET 5;

Parameters:

Parameter	Description
@ProductType@	A randomly selected Class ID from the class hierarchy (leaf level).
@ProductFeature1@ @ProductFeature2@ @ProductFeature3@	Three different, randomly selected product feature IDs that correspond to the chosen product type.
@x@ @y@	Two random numbers between 1 and 500

Query 5: Find product that are similar to a given product.

Use Case Motivation: The consumer has found a product that fulfills his requirements. He now wants to find products with similar features.

SQL Query:

SELECT distinct p.nr, p.label
FROM product p, product po,
 (Select distinct pfp1.product FROM productfeatureproduct pfp1, (SELECT productFeature FROM productfeatureproduct WHERE product=@ProductXYZ@) pfp2 WHERE pfp2.productFeature=pfp1.productFeature) pfp
	WHERE p.nr=pfp.product AND po.nr=@ProductXYZ@ AND p.nr!=po.nr
	AND p.propertyNum1<(po.propertyNum1+120) AND p.propertyNum1>(po.propertyNum1-120)
	AND p.propertyNum2<(po.propertyNum2+170) AND p.propertyNum2>(po.propertyNum2-170)
ORDER BY label
LIMIT 5;

Parameters:

Parameter	Description
@ProductXYZ@	A product ID (randomly selected)

Query 6: Find products having a label that contains a specific string.

Use Case Motivation: The consumer remembers parts of a product name from former searches. He wants to find the product again by searching for the parts of the name that he remembers.

SQL Query:

SELECT nr, label
FROM product
WHERE label like "%@word1@%";

Parameters:

Parameter	Description
@word1@	A word from the list of words that were used in the dataset generation.

Query 7: Retrieve in-depth information about a specific product including offers and reviews.

SQL Query:

SELECT *
FROM (select label from product where nr=@ProductXYZ@) p left join 
((select o.nr as onr, o.price, v.nr as vnr, v.label from offer o, vendor v where @ProductXYZ@=o.product AND
 o.vendor=v.nr AND v.country='DE' AND o.validTo>'@currentDate@') ov right join
 (select r.nr as rnr, r.title, pn.nr as pnnr, pn.name, r.rating1, r.rating2 from review r, person pn where r.product=@ProductXYZ@ AND
 r.person=pn.nr) rpn on (1=1)) on (1=1);

Parameters:

Parameter	Description
@ProductXYZ@	A product ID (randomly selected)
@currentDate@	A date within the validFrom validTo range of the offers (same date for all queries within a run).

Query 8: Give me recent reviews in English for a specific product.

Use Case Motivation: The consumer wants to read the 20 most recent English language reviews about a specific product.

SQL Query:

SELECT r.title, r.text, r.reviewDate, p.nr, p.name, r.rating1, r.rating2, r.rating3, r.rating4
FROM review r, person p
WHERE r.product=@ProductXYZ@ AND r.person=p.nr
	AND r.language='en'
ORDER BY r.reviewDate desc
LIMIT 20;

Parameters:

Parameter	Description
@ProductXYZ@	A product ID (randomly selected)

Query 9: Get information about a reviewer.

Use Case Motivation: In order to decide whether to trust a review, the consumer asks for any kind of information that is available about the reviewer.

SQL Query:

SELECT p.nr, p.name, p.mbox_sha1sum, p.country, r2.nr, r2.product, r2.title
FROM review r, person p, review r2
WHERE r.nr=@ReviewXYZ@ AND r.person=p.nr AND r2.person=p.nr;

Parameters:

Parameter	Description
@ReviewXYZ@	A review ID (randomly selected)

Query 10: Get offers for a given product which fulfill specific requirements.

Use Case Motivation: The consumer wants to buy from a vendor in the United States that is able to deliver within 3 days and is looking for the cheapest offer that fulfills these requirements.

SQL Query:

SELECT distinct o.nr, o.price
FROM offer o, vendor v
WHERE o.product=@ProductXYZ@ 
	AND o.deliveryDays<=3 AND v.country='US'
	AND o.validTo>'@currentDate@' AND o.vendor=v.nr
Order BY o.price
LIMIT 10;

Parameters:

Parameter	Description
@ProductXYZ@	A product ID (randomly selected)
@currentDate@	A date within the validFrom-validTo range of the offers (same date for all queries within a run).

Query 11: Get all information about an offer.

Use Case Motivation: After deciding on a specific offer, the consumer wants to get all information that is directly related to this offer.

SQL Query:

Select product, producer, vendor, price, validFrom, validTo, deliveryDays, offerWebpage, publisher, publishDate
from offer
where nr=@OfferXYZ@;

Parameters:

Parameter	Description
@OfferXYZ@	An offer ID (randomly selected)

Query 12: Export information about an offer into another schemata.

Use Case Motivation: After deciding on a specific offer, the consumer wants to save information about this offer on his local machine using a different RDF schema.

SQL Query:

Select p.nr As productNr, p.label As productlabel, v.label As vendorname, v.homepage As vendorhomepage,
	o.offerWebpage As offerURL, o.price As price, o.deliveryDays As deliveryDays, o.validTo As validTo
From offer o, product p, vendor v
Where o.nr=@OfferXYZ@ AND o.product=p.nr AND o.vendor=v.nr;

Parameters:

Parameter	Description
@OfferXYZ@	An offer ID (randomly selected)

4. Performance Metrics

The three fundamental performance metrics of the BSBM are:

Queries per Second (QpS)
Query Mixes per Hour (QMpH)
Overall Runtime (oaRT)

4.1.1 Metrics for Single Queries

Average Query Execution Time (aQET): Average time for executing an individual query of type x multiple times with different parameters against the SUT.

Queries per Second (QpS): Average amount of queries of type x that were executed per second.

Min/Max Query Execution Time (minQET, maxQET): A lower and upper bound execution time for queries of type x.

4.1.2 Metrics for Query Mixes

Queries Mixes per Hour (QMpH): Number of query mixes with different parameters that are executed per hour against the SUT.

Overall Runtime (oaRT): Overall time it took the test driver to execute a certain amount of query mixes against the SUT.

Composite Query Execution Time (cQET): Average time for executing the query mix multiple times with different parameters.

Average Query Execution Time over all Queries (aQEToA): Overall time to run 50 query mixes devided by the number of queries (25*50=1250).

4.1.3 Price/Performance Metric for the Complete System under Test (SUT)

The Price/Performance Metric defined as $ / QMpH.

Where $ is the total system cost over 5 years in the specified currency. The total system cost over 5 years is calculated according to the TPC Pricing Specification. If compute on demand infrastructure is used, the costing will be $/QMpH/day.

5. Rules for running the Benchmark and reporting Results

When running the BSBM benchmark and reporting BSBM benchmark results, you should obey to the following rules:

The benchmark must run on standard hardware and should run released software, in order to make it possible to replicate results.
If the benchmark is run against pre-release versions of a store, than these versions must be publicly accessible, for instance via SVN or CVS.
Any hardware/software combination, including single machines, clusters, clusters rented from computer providers like Amazon EC2 are eligible. The exact hardware/software combination and setup must be disclosed.
The configuration of a SUT is left to the test sponsor, meaning any kind of indexing, use of property tables, materialized views, and so forth is allowed. Any special tuning or parameter-setting must be disclosed.
The benchmark dataset must be generated using the data generator described in section 9.1. The test sponsor is not allowed to change the data generator.
The test runs must be executed using the test driver described in section 9.2. The test sponsor is not allowed to change the test driver.
The results of the benchmark should be reported as described in section 7 of this specification.

6. Benchmark Scenarios

The Berlin SPARQL Benchmark is designed to support the comparison of native RDF stores, native Named Graph stores, systems that map relational databases into RDF, and SPARQL-wrappers around other kinds of data sources.

There are the following benchmark scenarios:

Single native RDF repository using the triple data model.
Single native RDF repository using the named graph data model.
Other type of data source that exposes triple-based SPARQL endpoint
Other type of data source that exposes named graph-based SPARQL endpoint

6.1 Single native RDF repository using the triple data model.

Scenario for benchmarking native triple stores.

Acronym : NTR = Native Triple Repository

Setup: One or more clients ask SPARQL queries over the SPARQL protocol against a single store which exposes a triple data model.

Use Case Motivation: An e-shop selling different products provides a SPARQL endpoint over its data.

6.2 Single native RDF repository using the named graph data model.

Scenario for benchmarking native Named Graphs stores.

Acronym: NNGR = Native Named Graphs Repository

Setup: One or more clients ask SPARQL queries over the SPARQL protocol against a single store which exposes a Named Graphs data model.

Use Case Motivation: An electronic market which integrates information from different producers, vendors and rating sites provides a SPARQL endpoint over its data.

6.3 Other type of data source which exposes triple-based SPARQL endpoint

Scenario for benchmarking other types of data sources that expose triple-based SPARQL endpoints. For instance, RDF-mapped relational databases, wrappers around WebAPIs and the like.

Acronym: MTS = Mapped Triple Source

Setup: One or more clients ask SPARQL queries over the SPARQL protocol against a store which exposes a triple data model.

Use Case Motivation: An e-shop selling different products provides a SPARQL endpoint over its data.

Benchmark Dataset:

The data generator can export data in a generic XML format and as MySQL dump, which can be imported into different stores, for instance relational data bases.

6.4 Other type of data source which exposes named graph-based SPARQL endpoint

Scenario for benchmarking other types of data sources that expose named graph-based SPARQL endpoints. For instance, RDF-mapped relational databases, wrappers around WebAPIs or the like.

Acronym: MNGS = Mapped Named Graph Source

Setup: One or more clients ask SPARQL queries over the SPARQL protocol against a store which exposes a named graph-based data model.

Use Case Motivation: An electronic market which integrates information from different producers, vendors and rating sites provides a SPARQL endpoint over its data.

Benchmark Dataset:

The data generator can export data in a generic XML format and as MySQL dump, which can be imported into different stores, for instance relational data bases.

7. Reporting Results

This section defines formats for reporting benchmark results.

7.1 Reporting Single Results

Benchmark results are named according to the scenario, the scale factor of the dataset and the number of concurrent clients.
For example:

NTP(1000,5) means

benchmark against a native triple store
containing data about 1000 products
queries have been concurrently executed by 5 clients

23.7 QPS(2)-NTS(10000,1)

means that on average 23.7 queries of type 2 were executed per second by a single client stream against a Native Triple Store containing data about 10,000 products.

7.2 Full Disclosure Report

To guarantee an intelligible interpretation of benchmark reports/results as well as to allow for efficient and even automated handling/comparisons, all necessary information shall be represented in XML. Furthermore we opt for full disclosure policy of the SUT, configuration, pricing etc. to give all information needed for replicating any detail of the system, thus enabling anyone to achieve similar benchmark results.

Full Disclosure Report Contents

A statement identifying the benchmark sponsor and other companies and organizations involved.
Hardware:
1. Overall number of nodes
2. CPUs, cores with associated type description (brand of CPU, GHz per core, size of second level cache etc.)
3. Size of allocated memory
4. Number and type of harddisk storage units, disk controller type and if applied, the RAID configuration;
5. Number of LAN connections and all network hardware included in the SUT that was used during the test (network adapters, routers, workstations, cabling etc.).
Software:
1. Type and run-time execution location of software components
  (eg. RDF-Store, DBMS, drivers, middleware components, query processing software etc.).
2. Query test results of the Qualification Dataset;
Configuration:
1. RDF-Store and DBMS (if DB-backed): All configuration settings that were changed from the defaults
2. Operating system: All configuration settings that were changed from the defaults;
3. Amount of main memory assigned to the RDF-Store and DBMS (if DB-backed);
4. Configuration of any other software component participating in the SUT;
5. Query optimizer: All configuration settings that were changed from the defaults
6. For any recompiled software the compiler optimization options
Database design:
1. Physical organisation of the working set, e.g. different types of clustering;
2. Details about all indices;
3. Data replication for performance or other reasons;
Data generator and Test Driver related items:
1. Version of both software components used in the benchmark;
Scaling and dataset population:
1. Scaling information as defined in section 4;
2. Partitioning of triple set among different disk drives;
3. Database load time for the benchmark dataset;
4. Number of triples after initial load;
5. Number of triples after the test run;
6. Warm up time for the working set needed to enter a stable performance state.
Performance metrics:
1. The timing interval for every query must be reported;
2. The number of concurrent streams for the test run;
3. Start and finish time for each query stream;
4. The total time for the whole measurement period;
5. All metrics.
SUT implementation related items:
1. Optionally: If an RDF store is built on top of a DBMS, the CPU usage between these two can be reported;
2. Server side application logic (if not preset, discuss).

Todo: Define an XML format for the Full Disclosure Report
Todo: Implement a nice tool which generates HTML reports including nice graphics from XML benchmark results in order to motivate people to use the reporting format.

8. Qualification Dataset and Tests

Before the performance of a SUT is measured, it has to be verified that the SUT returns correct results for the benchmark queries.

For testing whether a SUT returns correct results, the BSBM benchmark provides a qualification dataset and a qualification tool which compares the query results of a SUT with the correct query results. At the moment, the qualification tool verifies only the results of SELECT queries. The results of DESCRIBE and CONSTRICT queries (queries 9 and 12) are not checked.

A BSBM qualification test is conducted in the two-step procedure described below:

Load the qualification dataset into the SUT, fetch the Test Driver data for the qualification dataset and unzip the "td_data" directory into the root directory of the BSBM tools, then run a qualification test against the SUT. The qualification test is supported by the Test Driver by running it with the -q parameter. Example:
```
$ java -cp bin:lib/* benchmark.testdriver.TestDriver -q http://SUT/sparql

where http://SUT/sparql specifies the SPARQL endpoint
```
This will create a qualification file named "run.qual" (different file name can be specified with the "-qf" parameter) which is used in step 2. Also the run.log (if logging is set to "ALL" in the log4j.xml file) contains all queries with full result text, so single queries can be examined later on.
Use the Qualification tool from the benchmark.qualification package. It has the following options:

Option	Description
-rc	Only check the amount of results returned and not the result content.
-ql <qualification log file name>	Specify the file name to write the qualification test results into.

You also need a correct qualification file to test your own qualification file against. Then run the Qualification test. Example:

$ java -cp bin:lib/* benchmark.qualification.Qualification correct.qual run.qual

where run.qual is the qualification file generated by the Test Driver in qualification mode

This generates by default a log file called "qual.log" with the following content:

All variations of every single query are logged at the beginning, if any.
At the end of the file is an overview of all queries.

A variation does not always mean an error. For example if the SUT returns an xsd:dateTime value in another (correct) format than expected. So variations should always be inspected in the verbose log file which was generated by the Test Driver (run.log). For comparison means a Test Driver log file is available of a correct run.

9. Data Generator and Test Driver

There is a Java (at least JVM 1.5 needed) implementation of a data generator and a test driver for the BSBM benchmark.

The source code of the data generator and the test driver can be downloaded from Sourceforge BSBM tools.
The code is licensed under the terms of the GNU General Public License.

9.1 Data Generator

The BSBM data generator can be used to create benchmark datasets of different sizes. Data generation is deterministic.

The data generator supports the following output formats:

Format	Option
N-Triples	-s nt
Turtle	-s ttl
XML	-s xml
(My-)SQL dump	-s sql

Next on the todo list: Implement TriG output format for benchmarking Named Graph stores.

Configuration options:

Option	Description
-s <output format>	For the dataset there are several output formats supported. See upper table for details. Default: nt
-pc <number of products>	Scale factor: The dataset is scaled via the number of products. For example: 91 products make about 50K triples. Default: 100
-fc	The data generator by default adds rdf:type statements for all types of a product to the dataset. If the SUT supports RDFS reasoning, the option -fc can be used to exclude these statements and leave generating them to the inference engine of the store. Default: disabled
-dir	The output directory for all the data the Test Driver uses for its runs. Default: "td_data"
-fn	The file name for the generated dataset (suffix is added according to the output format). Default: "dataset"

The following example command creates a Turtle benchmark dataset with the scale factor 1000 and forward chaining enabled:

$ java -cp bin:lib/ssj.jar benchmark.generator.Generator -fc -pc 1000 -s ttl

9.1 Test Driver

The test driver works against a SPARQL endpoint over the SPARQL protocol.

Configuration options:

Option	Description
-runs <number of runs>	The number of query mix runs. Default: 50
-idir <directory>	The input parameter directory which was created by the Data Generator. Default: "td_data"
-w <number of warm up runs>	Number of runs executed before the actual test to warm up the store. Default: 10
-o <result XML file>	The output file containing the aggregated result overview. Default: "benchmark_result.xml"
-dg <default graph URI>	Specify a default graph for the queries. Default: null
-mt <number of clients>	Benchmark with multiple concurrent clients.
-seed <Long value>	Set the seed for the random number generator used for the parameter generation.
-t <Timeout in ms>	If for a specific query the complete result is not read after the specified timeout, the client disconnects and reports a timeout to the Test Driver. This is also the maximum runtime a query can contribute to the metrics.
-q	Turn on qualification mode. For more information, see the qualification chapter.
-qf <qualification file name>	Change the qualification file name, also see the qualification chapter.

In addition to these options a SPARQL-endpoint must be given.

A detailed run log is generated for log level 'ALL' containing information about every executed query.

The following example command runs 128 query mixes (plus 32 for warm-up) against a SUT which provides a SPARQL-endpoint at http://localhost/sparql:

$ java -cp bin:lib/* benchmark.testdriver.TestDriver http://localhost/sparql

Or, if your java version does not support the asterisk in the classpath definition, you can write:

java -cp bin:lib/ssj.jar:lib/log4j-1.2.15.jar benchmark.testdriver.TestDriver http://localhost/sparql

The following example runs 1024 query mixes plus 128 warm up mixes with 4 clients against a SUT which provides a SPARQL-endpoint. The timeout per query is set to 30s.

$ java -cp bin:lib/ssj.jar:lib/log4j-1.2.15.jar benchmark.testdriver.TestDriver -runs 1024 -w 128 -mt 4 -t 30000 http://localhost/sparql

10. References

For more information about RDF and SPARQL Benchmarks please refer to:

ESW Wiki Page about RDF Benchmarks

http://esw.w3.org/topic/RdfStoreBenchmarking

Other SPARQLBenchmarks

Papers about RDF and SPARQL Benchmarks

Y. Guo, Z. Pan, and J. Heflin: LUBM: A Benchmark for OWL Knowledge Base Systems. Journal of Web Semantics 3(2), 2005, pp158-182
Yuanbo Guo et al: A Requirements Driven Framework for Benchmarking Semantic Web Knowledge Base Systems
Li Ma et al.: Towards a Complete OWL Ontology Benchmark (UOBM)
Timo Weithöner et al.: What's Wrong with OWL Benchmarks?

Appendix A: Changes

2008-10-17: Added documentation about the qualitfication dataset and tools (Andreas).
2008-09-15: Editorial fine-tuning for the V2 release (Chris).
2008-09-10: Added the SQL representation of the benchmark queries and the relational representation of the data set (Andreas)
2008-07-30: Update data generation rules and add relational schemata (Andreas, Chris)
2008-07-19: Added data generator and test driver documentation (Andreas)
2008-06-25: Updated queries after first test runs (Chris, Andreas).
2008-06-06: Added description of the product hierarchy to section 2.3.2. Fixed some typos (Andreas).
2008-05-25: Updated document with suggestions from Michael Schmidt (Chris).
2008-05-11: Added sections about Named Graphs (Chris).
2008-05-09: Changed query definitions and added parameter descriptions (Chris, Andreas)
2008-04-15: Extended section about data generation rules and reporting format (Andreas)
2008-02-09: Extended datamodel and scenario descriptions. Minor updates through the whole document (Chris)
2008-02-06: Updates through the whole document (Chris)
2008-02-04: Initial version of this document (Chris)

Appendix B: Thanks

Lots of thanks to

Eli Lilly and Company and especially Susie Stephens for making this work possible with a research grant.
Orri Erling, Andy Seaborne, Arjohn Kampman, Michael Schmidt, Richard Cyganiak, Ivan Mikhailov, Patrick van Kleef, and Christian Becker for their feedback on the benchmark design and the benchmark experiment.

Berlin SPARQL Benchmark (BSBM) Specification - V2.0

Abstract

Table of Contents

2.1. Logical Schema

2.1.1 Namespaces

2.2.2 Classes and Properties

2.2. Triple, Named Graphs and Relational Representation

2.2.1 Triple Representation

2.2.2. Named Graphs Representation

2.2.3 Relational Representation

2.3. Scaling and Dataset Population

2.3.1 Class: Product

2.3.2 Class ProductType

2.3.4 Class Product Feature

2.3.5 Class Producer

2.3.6 Class Vendor

2.3.7 Class Offer

2.3.8 Class Person

2.3.9 Class Review

3.2 SPARQL Queries for the Triple Data Model

Query 1: Find products for a given set of generic features.

Query 2: Retrieve basic information about a specific product for display purposes

Query 3: Find products having some specific features and not having one feature.

Query 4: Find products matching two different sets of features.

Query 5: Find product that are similar to a given product.

Query 6: Find products having a label that contains a specific string.

Query 7: Retrieve in-depth information about a specific product including offers and reviews.

Query 8: Give me recent reviews in English for a specific product.

Query 9: Get information about a reviewer.

Query 10: Get offers for a given product which fulfill specific requirements.

Query 11: Get all information about an offer.

Query 12: Export information about an offer into another schemata.

Query 2: Retrieve basic information about a specific product for display purposes

Query 7: Retrieve in-depth information about a specific product including offers and reviews.

Query 1: Find products for a given set of generic features.

Query 2: Retrieve basic information about a specific product for display purposes

Query 3: Find products having some specific features and not having one feature.

Query 4: Find products matching two different sets of features.

Query 5: Find product that are similar to a given product.

Query 6: Find products having a label that contains a specific string.

Query 7: Retrieve in-depth information about a specific product including offers and reviews.

Query 8: Give me recent reviews in English for a specific product.

Query 9: Get information about a reviewer.

Query 10: Get offers for a given product which fulfill specific requirements.

Query 11: Get all information about an offer.

Query 12: Export information about an offer into another schemata.

6.1 Single native RDF repository using the triple data model.

6.2 Single native RDF repository using the named graph data model.

6.3 Other type of data source which exposes triple-based SPARQL endpoint

6.4 Other type of data source which exposes named graph-based SPARQL endpoint

7.1 Reporting Single Results

7.2 Full Disclosure Report

9.1 Data Generator

9.1 Test Driver

ESW Wiki Page about RDF Benchmarks

Other SPARQLBenchmarks

Papers about RDF and SPARQL Benchmarks