This version 1 of the BSBM specification has been superseded by version 2 of the specification.
This document defines the Berlin SPARQL Benchmark (BSBM) for measuring the performance of storage systems that expose SPARQL endpoints. Such systems include native RDF stores, Named Graph stores, systems that map relational databases into RDF, and SPARQL wrappers around other kinds of data sources. The benchmark is built around an e-commerce use case, where a set of products is offered by different vendors and different consumers have posted reviews about products. The benchmark query mix illustrates the search and navigation pattern of a consumer looking for a product.
The SPARQL Query Language for RDF and the SPARQL Protocol for RDF are implemented by a growing number of storage systems and are used within enterprise and open web settings. As SPARQL is taken up by the community there is a growing need for benchmarks to compare the performance of storage systems that expose SPARQL endpoints via the SPARQL protocol. Such systems include native RDF stores, Named Graph stores, systems that map relational databases into RDF, and SPARQL wrappers around other kinds of data sources.
The Berlin SPARQL Benchmark (BSBM) defines a suite of benchmarks for comparing the performance of these systems across architectures. The benchmark is built around an e-commerce use case in which a set of products is offered by different vendors and consumers have posted reviews about products. The benchmark query mix illustrates the search and navigation pattern of a consumer looking for a product.
The Berlin SPARQL Benchmark (BSBM) consists of:
The Berlin SPARQL Benchmark was designed with three goals in mind. First, the benchmark should allow the comparison of different storage systems that expose SPARQL endpoints across architectures. Second, it should test these systems with realistic workloads of use-case-motivated queries, a well-established benchmarking technique in the database field that is implemented, for instance, by the TPC-H benchmark. Third, as an increasing number of Semantic Web applications do not rely on heavyweight reasoning but focus on the integration and visualization of large amounts of data from autonomous data sources on the Web, the benchmark should not require complex reasoning but should measure the performance of queries against large amounts of RDF data.
The rest of this document is structured as follows: Section 2 defines the schema of the benchmark dataset and describes the rules that are used by the data generator for populating the dataset according to the chosen scale factor. Section 3 defines the benchmark queries. Section 4 describes the performance metrics that are calculated by the test driver. Section 5 describes the different benchmark scenarios that are covered by BSBM and provides rules for running the benchmark in these scenarios. Section 6 specifies the content of the BSBM full-disclosure report. Sections 7 and 8 define how a system under test is verified against the qualification dataset and describe the usage of the data generator and test driver.
This section defines the schema of the BSBM benchmark dataset (2.1) and describes the data generation rules that are used by the data generator to populate the dataset according to the chosen scale factor (2.2).
This section provides the schema for the benchmark dataset. The dataset is based on an e-commerce use case, where a set of products is offered by different vendors and different consumers have posted reviews about these products on various review sites.
Prefix | Namespace |
---|---|
rdf: | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
rdfs: | http://www.w3.org/2000/01/rdf-schema# |
foaf: | http://xmlns.com/foaf/0.1/ |
dc: | http://purl.org/dc/elements/1.1/ |
xsd: | http://www.w3.org/2001/XMLSchema-datatypes/ |
rev: | http://purl.org/stuff/rev# |
bsbm: | http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/ |
bsbm-inst: | http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ |
Metadata Properties
The metadata properties are used to capture the information source and the publication date of each instance.
Class Product
Comment: Products are described by different sets of product properties and product features.
Example Instance:
dataFromProducer001411:Product00001435443
rdf:type bsbm:Product;
rdf:type bsbm-inst:ProductType001342;
rdfs:label "Canon Ixus 20010" ;
rdfs:comment "Mit ihrer hochwertigen Verarbeitung, innovativen Technologie und faszinierenden Erscheinung verkörpern Digital IXUS Modelle die hohe Kunst des Canon Design." ;
bsbm:producer bsbm-inst:Producer001411 ;
bsbm:productFeature bsbm-inst:ProductFeature003432 ;
bsbm:productFeature bsbm-inst:ProductFeature103433 ;
bsbm:productFeature bsbm-inst:ProductFeature990433 ;
bsbm:productPropertyTextual1 "New this year." ;
bsbm:productPropertyTextual2 "Special Lens with special focus." ;
bsbm:productPropertyNumeric1 "1820"^^xsd:integer ;
bsbm:productPropertyNumeric2 "140"^^xsd:integer ;
bsbm:productPropertyNumeric3 "17"^^xsd:integer ;
dc:publisher dataFromProducer001411:Producer001411 ;
dc:date "2008-02-13"^^xsd:date .
Class ProductType
Comment: Product types form an irregular subsumption hierarchy (depth 3-5).
Example Instance:
bsbm-inst:ProductType011432
rdf:type bsbm:ProductType ;
rdfs:label "Digital Camera" ;
rdfs:comment "A camera that records pictures electronically rather than on film." ;
rdfs:subClassOf bsbm-inst:ProductType011000 ;
dc:publisher bsbm-inst:StandardizationInstitution01 ;
dc:date "2008-02-13"^^xsd:date .
Class ProductFeature
Comment: The set of possible product features for a specific product depends on the product type. Each product type in the hierarchy has a set of associated product features, which leads to some features being very generic and others being more specific.
Example Instance:
bsbm-inst:ProductFeature103433
rdf:type bsbm:ProductFeature ;
rdfs:label "Wide Screen TFT-Display" ;
rdfs:comment "Wide Screen TFT-Display." ;
dc:publisher bsbm-inst:StandardizationInstitution01 ;
dc:date "2008-02-13"^^xsd:date .
Class Producer
Example Instance:
dataFromProducer001411:Producer001411
rdf:type bsbm:Producer ;
rdfs:label "Canon" ;
rdfs:comment "Canon is a world leader in imaging products and solutions for the digital home and office." ;
foaf:homepage <http://www.canon.com/> ;
bsbm:country <http://downlode.org/rdf/iso-3166/countries#US> ;
dc:publisher dataFromProducer001411:Producer001411 ;
dc:date "2008-02-13"^^xsd:date .
Class Vendor
Example Instance:
dataFromVendor001400:Vendor001400
rdf:type bsbm:Vendor ;
rdfs:label "Cheap Camera Place" ;
rdfs:comment "We sell the cheapest cameras." ;
foaf:homepage <http://www.cameraplace.com/> ;
bsbm:country <http://downlode.org/rdf/iso-3166/countries#GB> ;
dc:publisher dataFromVendor001400:Vendor001400 ;
dc:date "2008-02-03"^^xsd:date .
Class Offer
Example Instance:
dataFromVendor001400:Offer2413
rdf:type bsbm:Offer ;
bsbm:product dataFromProducer001411:Product00001435443 ;
bsbm:vendor dataFromVendor001400:Vendor001400 ;
bsbm:price "31.99"^^bsbm:USD ;
bsbm:validFrom "2008-02-12"^^xsd:date ;
bsbm:validTo "2008-02-20"^^xsd:date ;
bsbm:deliveryDays "7"^^xsd:integer ;
bsbm:offerWebpage <http://vendor001400.com/offers/Offer2413> ;
dc:publisher dataFromVendor001400:Vendor001400 ;
dc:date "2008-02-13"^^xsd:date .
Class Person
Example Instance:
dataFromRatingSite0014:Reviewer1213
rdf:type foaf:Person ;
foaf:name "Jenny324" ;
foaf:mbox_sha1sum "4749d7c44dc4c0adf66c1319d42b89e18df6df76" ;
bsbm:country <http://downlode.org/rdf/iso-3166/countries#DE> ;
dc:publisher dataFromRatingSite0014:RatingSite0014 ;
dc:date "2007-10-13"^^xsd:date .
Class Review
Example Instance:
dataFromRatingSite0014:Review022343
rdf:type rev:Review ;
bsbm:reviewFor dataFromProducer001411:Product00001435443 ;
rev:reviewer dataFromRatingSite0014:Reviewer1213 ;
bsbm:reviewDate "2007-10-10"^^xsd:date ;
dc:title "This is a nice small camera"@en ;
rev:text "Open your wallet, take out a credit card. No, I'm not going to ask you to order one just yet ..."@en ;
bsbm:rating1 "5"^^xsd:integer ;
bsbm:rating2 "4"^^xsd:integer ;
bsbm:rating3 "3"^^xsd:integer ;
bsbm:rating4 "4"^^xsd:integer ;
dc:publisher dataFromRatingSite0014:RatingSite0014 ;
dc:date "2007-10-13"^^xsd:date .
In order to compare the performance of systems that expose SPARQL endpoints, but use different internal data models, there are three different versions of the benchmark dataset as well as different versions of the benchmark queries.
Within the triple version of the dataset, the publisher and the publication date are captured for each instance by a dc:publisher and a dc:date triple.
Examples:
dataFromVendor001400:Offer2413
rdf:type bsbm:Offer ;
bsbm:product dataFromProducer001411:Product00001435443 ;
bsbm:vendor dataFromVendor001400:Vendor001400 ;
bsbm:price "31.99"^^bsbm:USD ;
bsbm:validFrom "2008-02-12"^^xsd:date ;
bsbm:validTo "2008-02-20"^^xsd:date ;
bsbm:deliveryDays "7"^^xsd:integer ;
bsbm:offerWebpage <http://vendor001400.com/offers/Offer2413> ;
dc:publisher dataFromVendor001400:Vendor001400 ;
dc:date "2008-02-13"^^xsd:date .
dataFromVendor001400:Offer2414
rdf:type bsbm:Offer ;
bsbm:product dataFromProducer001411:Product00001435444 ;
bsbm:vendor dataFromVendor001400:Vendor001400 ;
bsbm:price "23.99"^^bsbm:USD ;
bsbm:validFrom "2008-02-10"^^xsd:date ;
bsbm:validTo "2008-02-22"^^xsd:date ;
bsbm:deliveryDays "7"^^xsd:integer ;
bsbm:offerWebpage <http://vendor001400.com/offers/Offer2414> ;
dc:publisher dataFromVendor001400:Vendor001400 ;
dc:date "2008-02-13"^^xsd:date .
Within the Named Graph version of the dataset, all information that originates from a specific producer, vendor or rating site is put into a distinct named graph. There is one additional graph that contains provenance information (dc:publisher, dc:date) for all other graphs.
Example (using the TriG syntax):
dataFromVendor001400:Graph-2008-02-13 {
dataFromVendor001400:Offer2413
rdf:type bsbm:Offer ;
bsbm:product dataFromProducer001411:Product00001435443 ;
bsbm:vendor dataFromVendor001400:Vendor001400 ;
bsbm:price "31.99"^^bsbm:USD ;
bsbm:validFrom "2008-02-12"^^xsd:date ;
bsbm:validTo "2008-02-20"^^xsd:date ;
bsbm:deliveryDays "7"^^xsd:integer ;
bsbm:offerWebpage <http://vendor001400.com/offers/Offer2413> .
dataFromVendor001400:Offer2414
rdf:type bsbm:Offer ;
bsbm:product dataFromProducer001411:Product00001435444 ;
bsbm:vendor dataFromVendor001400:Vendor001400 ;
bsbm:price "23.99"^^bsbm:USD ;
bsbm:validFrom "2008-02-10"^^xsd:date ;
bsbm:validTo "2008-02-22"^^xsd:date ;
bsbm:deliveryDays "7"^^xsd:integer ;
bsbm:offerWebpage <http://vendor001400.com/offers/Offer2414> .
}
localhost:provenanceData {
dataFromVendor001400:Graph-2008-02-13 dc:publisher dataFromVendor001400:Vendor001400 .
dataFromVendor001400:Graph-2008-02-13 dc:date "2008-02-13"^^xsd:date .
}
In order to benchmark systems that map relational databases to RDF and rewrite SPARQL queries into SQL queries against an application-specific relational data model, the BSBM data generator is also able to output the dataset as a MySQL dump.
The dump uses the following relational schema:
ProductFeature(nr, label, comment, publisher, publishDate)
ProductType(nr, label, comment, parent, publisher, publishDate)
Producer(nr, label, comment, homepage, country, publisher, publishDate)
Product(nr, label, comment, producer, propertyNum1, propertyNum2, propertyNum3, propertyNum4, propertyNum5,
propertyNum6, propertyTex1, propertyTex2, propertyTex3, propertyTex4, propertyTex5, propertyTex6,
publisher, publishDate)
ProductTypeProduct(product, productType)
ProductFeatureProduct(product, productFeature)
Vendor(nr, label, comment, homepage, country, publisher, publishDate)
Offer(nr, product, producer, vendor, price, validFrom, validTo, deliveryDays, offerWebpage, publisher, publishDate)
Person(nr, name, mbox_sha1sum, country, publisher, publishDate)
Review(nr, product, producer, person, reviewDate, title, text, language, rating1, rating2, rating3, rating4,
publisher, publishDate)
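As a rough illustration of how this schema might be loaded for experiments, the sketch below creates two of the tables in an in-memory SQLite database and joins them. The column types, the key constraints, the numeric publisher values and the use of SQLite instead of MySQL are all assumptions made for this example; the specification only lists the column names.

```python
import sqlite3

# Column names follow the relational schema; types and constraints are assumed.
DDL = """
CREATE TABLE Vendor (
    nr INTEGER PRIMARY KEY,
    label TEXT, comment TEXT, homepage TEXT, country TEXT,
    publisher INTEGER, publishDate TEXT
);
CREATE TABLE Offer (
    nr INTEGER PRIMARY KEY,
    product INTEGER, producer INTEGER,
    vendor INTEGER REFERENCES Vendor(nr),
    price REAL, validFrom TEXT, validTo TEXT,
    deliveryDays INTEGER, offerWebpage TEXT,
    publisher INTEGER, publishDate TEXT
);
"""

def create_schema() -> sqlite3.Connection:
    """Create the two example tables in a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn

conn = create_schema()
# Sample rows mirror the Vendor and Offer example instances of Section 2.1.
conn.execute(
    "INSERT INTO Vendor VALUES (?, ?, ?, ?, ?, ?, ?)",
    (1400, "Cheap Camera Place", "We sell the cheapest cameras.",
     "http://www.cameraplace.com/", "GB", 1400, "2008-02-03"),
)
conn.execute(
    "INSERT INTO Offer VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (2413, 1435443, 1411, 1400, 31.99, "2008-02-12", "2008-02-20", 7,
     "http://vendor001400.com/offers/Offer2413", 1400, "2008-02-13"),
)
rows = conn.execute(
    "SELECT o.nr, v.label, o.price FROM Offer o JOIN Vendor v ON o.vendor = v.nr"
).fetchall()
```

The join at the end corresponds to following bsbm:vendor from an offer to its vendor in the RDF versions of the dataset.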
This section defines the rules for generating benchmark data for a given scale factor.
The benchmark is scaled by the number of products.
The data generator is described in Section 8.
Products have product types and are described with various properties. Products come with several different combinations of product properties (many properties, fewer properties).
Rules for data generation:
There are three types of product descriptions. The table below lists the textual and numeric properties for each type.
Description Type | Textual Properties | Numeric Properties | Percentage |
---|---|---|---|
Description Type 1 | PropertyTextual1 to PropertyTextual5 | PropertyNumeric1 to PropertyNumeric5 | 40% |
Description Type 2 | PropertyTextual1 to PropertyTextual3 | PropertyNumeric1 to PropertyNumeric3 + optional PropertyNumeric4 (50%) + optional PropertyNumeric5 (25%) | 20% |
Description Type 3 | PropertyTextual1 to PropertyTextual3 | PropertyNumeric1 to PropertyNumeric3 + optional PropertyNumeric5 (25%) + optional PropertyNumeric6 (50%) | 40% |
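The table above can be read as a small sampling procedure. The following sketch is one possible reading, not the actual data generator: it assumes that the last column gives the share of products that receive each description type and that the percentages in parentheses are independent probabilities for the optional properties. The function and variable names are hypothetical.

```python
import random

# Per description type: (assumed share of products, number of textual
# properties, number of mandatory numeric properties,
# {optional numeric property: probability}).
DESCRIPTION_TYPES = {
    1: (0.40, 5, 5, {}),
    2: (0.20, 3, 3, {4: 0.50, 5: 0.25}),
    3: (0.40, 3, 3, {5: 0.25, 6: 0.50}),
}

def sample_description(rng):
    """Pick a description type and the property names a product would get."""
    kinds = list(DESCRIPTION_TYPES)
    weights = [DESCRIPTION_TYPES[k][0] for k in kinds]
    kind = rng.choices(kinds, weights=weights, k=1)[0]
    _, n_textual, n_numeric, optional = DESCRIPTION_TYPES[kind]
    numeric = list(range(1, n_numeric + 1))
    # Each optional property is included independently with its probability.
    numeric += [i for i, p in optional.items() if rng.random() < p]
    return {
        "type": kind,
        "textual": [f"PropertyTextual{i}" for i in range(1, n_textual + 1)],
        "numeric": [f"PropertyNumeric{i}" for i in numeric],
    }
```

Calling `sample_description(random.Random(1))` returns a dictionary with the chosen type and the textual and numeric property names for one product.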
Relation: Product-Producer
Relation: Product-ProductType
Relation: Product-ProductFeature
Relation: Product-Offer
Relation: Product-Review
Irregular subsumption hierarchy (depth 2-6). Number of classes increases with the number of products (around 2log10#Products).
The branching factor for every node on the same level is equal and gets calculated for arbitrary scale factors. The table below illustrates the relationship between number of products and branching factors for every level:
Number of products | root level | level 1 | level 2 | level 3 |
---|---|---|---|---|
100 | 4 | 2 | | |
1 000 | 6 | 4 | | |
10 000 | 8 | 8 | | |
100 000 | 10 | 8 | 4 | |
1 000 000 | 12 | 16 | 8 | |
10 000 000 | 14 | 32 | 16 | |
100 000 000 | 16 | 32 | 16 | 4 |
As can be seen, the depth increases by one every time the product count grows by a factor of 1000.
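To make the effect of the branching factors concrete, the sketch below derives the number of product types per level from the table values. It assumes a single root type and that every node on a level has exactly the listed number of children; the function names are hypothetical and the code is not taken from the data generator.

```python
# Branching factors per level (root level first), copied from the table.
BRANCHING = {
    100: [4, 2],
    1_000: [6, 4],
    10_000: [8, 8],
    100_000: [10, 8, 4],
    1_000_000: [12, 16, 8],
    10_000_000: [14, 32, 16],
    100_000_000: [16, 32, 16, 4],
}

def types_per_level(factors):
    """Number of product types on each level below the root."""
    counts, nodes = [], 1
    for factor in factors:
        nodes *= factor          # every node on this level has `factor` children
        counts.append(nodes)
    return counts

def total_types(factors):
    """Assumes one root type plus all of its descendants."""
    return 1 + sum(types_per_level(factors))
```

Under these assumptions, 100 products yield 4 types below the root and 8 leaf types, and the number of levels below the root grows from two to three at 100 000 products and to four at 100 000 000 products.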
Rules for data generation:
Each feature is assigned to a product type in the type hierarchy, which leads to some features being very generic and others being more specific.
Rules for data generation:
Rules for data generation:
Per 1000 products, there are 20 producers generated on average.
Per 1000 products, there is 1 vendor generated on average.
Relation: Vendor-Offer
Rules for data generation:
Per 1000 products, there are 20000 offers generated.
Rules for data generation:
Per 1000 products, there are on average 1250 persons generated.
Rules for data generation:
Per 1000 products, there are 25000 reviews generated.
Relation: Review-Person
Relation: Review-Ratingsite
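The per-1000-products ratios stated in the rules above determine the approximate size of a dataset for a given scale factor. The sketch below simply multiplies them out. Note that the figures for producers, vendors and persons are averages, so the real generator output can differ; rounding down to whole instances is an assumption of this example.

```python
# Instances generated per 1000 products, as stated in the generation rules.
PER_1000_PRODUCTS = {
    "producers": 20,      # on average
    "vendors": 1,         # on average
    "offers": 20_000,
    "persons": 1_250,     # on average
    "reviews": 25_000,
}

def expected_counts(products: int) -> dict:
    """Approximate number of instances per class for a given product count."""
    counts = {"products": products}
    for name, per_thousand in PER_1000_PRODUCTS.items():
        counts[name] = products * per_thousand // 1000  # rounded down (assumed)
    return counts
```

For example, `expected_counts(10_000)` gives 200 producers, 10 vendors, 200 000 offers, 12 500 persons and 250 000 reviews.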
Dictionary 1: Words from set of product names (around 90,000 words)
Dictionary 2: Words from English text (todo: look for a corpus with English sentences, currently dictionary 1 is used)
Dictionary 3: Names of persons (around 90,000 names)
This section defines a suite of benchmark queries and a query mix.
The benchmark queries are designed to emulate the search and navigation pattern of a consumer looking for products. A product search includes the following steps:
There are three representations of the benchmark query set: one for the triple data model, one for the Named Graphs data model, and a pure SQL version against the relational schema given in Section 2.2.3. All query sets have the same semantics.
Each query is defined by the following components:
Use Case Motivation: A consumer is looking for a product and has a general idea about what he wants.
SPARQL Query:
PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?product ?label
WHERE {
?product rdfs:label ?label .
?product rdf:type %ProductType% .
?product bsbm:productFeature %ProductFeature1% .
?product bsbm:productFeature %ProductFeature2% .
?product bsbm:productPropertyNumeric1 ?value1 .
FILTER (?value1 > %x%)
}
ORDER BY ?label
LIMIT 10
Parameters:
Parameter | Description |
---|---|
%ProductType% | A randomly selected Class URI from the class hierarchy (one level above leaf level). |
%ProductFeature1% %ProductFeature2% | Two different, randomly selected feature URIs that correspond to the chosen product type. |
%x% | A number between 1 and 500 |
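The `%name%` placeholders in the query text are replaced with concrete values before a query is sent to the system under test. The sketch below shows one straightforward way to do this; it is an illustration and not the test driver's implementation. The helper name `instantiate` and the sample parameter values are hypothetical, and the PREFIX declarations are omitted for brevity.

```python
import re

PLACEHOLDER = re.compile(r"%(\w+)%")

def instantiate(template: str, params: dict) -> str:
    """Replace every %name% placeholder in the template with params[name]."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value for parameter %{name}%")
        return str(params[name])
    return PLACEHOLDER.sub(substitute, template)

template = """SELECT DISTINCT ?product ?label
WHERE {
  ?product rdfs:label ?label .
  ?product rdf:type %ProductType% .
  ?product bsbm:productFeature %ProductFeature1% .
  ?product bsbm:productFeature %ProductFeature2% .
  ?product bsbm:productPropertyNumeric1 ?value1 .
  FILTER (?value1 > %x%)
}
ORDER BY ?label
LIMIT 10"""

# Illustrative values only; a real run draws them at random as described above.
query = instantiate(template, {
    "ProductType": "bsbm-inst:ProductType011432",
    "ProductFeature1": "bsbm-inst:ProductFeature003432",
    "ProductFeature2": "bsbm-inst:ProductFeature103433",
    "x": 250,
})
```

Raising an error for a missing parameter keeps a half-instantiated query from reaching the endpoint.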
Query Properties:
Use Case Motivation: The consumer wants to view basic information about products found by query 1.
SPARQL Query
PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?label ?comment ?producer ?productFeature ?propertyTextual1 ?propertyTextual2 ?propertyTextual3
?propertyNumeric1 ?propertyNumeric2 ?propertyTextual4 ?propertyTextual5 ?propertyNumeric4
WHERE {
%ProductXYZ% rdfs:label ?label .
%ProductXYZ% rdfs:comment ?comment .
%ProductXYZ% bsbm:producer ?p .
?p rdfs:label ?producer .
%ProductXYZ% dc:publisher ?p .
%ProductXYZ% bsbm:productFeature ?f .
?f rdfs:label ?productFeature .
%ProductXYZ% bsbm:productPropertyTextual1 ?propertyTextual1 .
%ProductXYZ% bsbm:productPropertyTextual2 ?propertyTextual2 .
%ProductXYZ% bsbm:productPropertyTextual3 ?propertyTextual3 .
%ProductXYZ% bsbm:productPropertyNumeric1 ?propertyNumeric1 .
%ProductXYZ% bsbm:productPropertyNumeric2 ?propertyNumeric2 .
OPTIONAL { %ProductXYZ% bsbm:productPropertyTextual4 ?propertyTextual4 }
OPTIONAL { %ProductXYZ% bsbm:productPropertyTextual5 ?propertyTextual5 }
OPTIONAL { %ProductXYZ% bsbm:productPropertyNumeric4 ?propertyNumeric4 }
}
Parameters:
Parameter | Description |
---|---|
%ProductXYZ% | A product URI (randomly selected) |
Query Properties:
Use Case Motivation: After looking at information about some products, the consumer has a more specific idea of what he wants. Therefore, he asks for products having several features but not having a specific other feature.
SPARQL Query:
PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?product ?label
WHERE {
?product rdfs:label ?label .
?product rdf:type %ProductType% .
?product bsbm:productFeature %ProductFeature1% .
?product bsbm:productPropertyNumeric1 ?p1 .
FILTER ( ?p1 > %x% )
?product bsbm:productPropertyNumeric3 ?p3 .
FILTER (?p3 < %y% )
OPTIONAL {
?product bsbm:productFeature %ProductFeature2% .
?product rdfs:label ?testVar }
FILTER (!bound(?testVar))
}
ORDER BY ?label
LIMIT 10
Parameters:
Parameter | Description |
---|---|
%ProductType% | A randomly selected Class URI from the class hierarchy (leaf level). |
%ProductFeature1% %ProductFeature2% | Two different, randomly selected product feature URIs that correspond to the chosen product type. |
%x% %y% | Two random numbers between 1 and 500 |
Query Properties:
Use Case Motivation: After looking at information about some products, the consumer has a more specific idea of what he wants. Therefore, he asks for products matching either one set of features or another set.
SPARQL Query:
PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?product ?label
WHERE {
{
?product rdfs:label ?label .
?product rdf:type %ProductType% .
?product bsbm:productFeature %ProductFeature1% .
?product bsbm:productFeature %ProductFeature2% .
?product bsbm:productPropertyNumeric1 ?p1 .
FILTER ( ?p1 > %x% )
} UNION {
?product rdfs:label ?label .
?product rdf:type %ProductType% .
?product bsbm:productFeature %ProductFeature1% .
?product bsbm:productFeature %ProductFeature3% .
?product bsbm:productPropertyNumeric2 ?p2 .
FILTER ( ?p2> %y% )
}
}
ORDER BY ?label
OFFSET 10
LIMIT 10
Parameters:
Parameter | Description |
---|---|
%ProductType% | A randomly selected Class URI from the class hierarchy (leaf level). |
%ProductFeature1% %ProductFeature2% %ProductFeature3% | Three different, randomly selected product feature URIs that correspond to the chosen product type. |
%x% %y% | Two random numbers between 1 and 500 |
Query Properties:
Use Case Motivation: The consumer has found a product that fulfills his requirements. He now wants to find products with similar features.
SPARQL Query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
SELECT DISTINCT ?product ?productLabel
WHERE {
?product rdfs:label ?productLabel .
%ProductXYZ% rdf:type ?prodtype.
?product rdf:type ?prodtype .
FILTER (%ProductXYZ% != ?product)
%ProductXYZ% bsbm:productFeature ?prodFeature .
?product bsbm:productFeature ?prodFeature .
%ProductXYZ% bsbm:productPropertyNumeric1 ?origProperty1 .
?product bsbm:productPropertyNumeric1 ?simProperty1 .
FILTER (?simProperty1 < (?origProperty1 + 150) && ?simProperty1 > (?origProperty1 - 150))
%ProductXYZ% bsbm:productPropertyNumeric2 ?origProperty2 .
?product bsbm:productPropertyNumeric2 ?simProperty2 .
FILTER (?simProperty2 < (?origProperty2 + 220) && ?simProperty2 > (?origProperty2 - 220))
}
ORDER BY ?productLabel
LIMIT 5
Parameters:
Parameter | Description |
---|---|
%ProductXYZ% | A product URI (randomly selected) |
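The conditions of this query can be restated procedurally, which may help when checking a store's results by hand. The sketch below is such a restatement over plain Python dictionaries. The dictionary layout and the function name are hypothetical, and each product is assumed to have exactly one label.

```python
def similar_products(original, candidates, limit=5):
    """Return up to `limit` candidates that satisfy the query's conditions."""
    hits = []
    for c in candidates:
        if c["uri"] == original["uri"]:
            continue  # FILTER (%ProductXYZ% != ?product)
        if not set(c["types"]) & set(original["types"]):
            continue  # both products share at least one rdf:type
        if not set(c["features"]) & set(original["features"]):
            continue  # both products share at least one bsbm:productFeature
        if abs(c["numeric1"] - original["numeric1"]) >= 150:
            continue  # productPropertyNumeric1 strictly within +/- 150
        if abs(c["numeric2"] - original["numeric2"]) >= 220:
            continue  # productPropertyNumeric2 strictly within +/- 220
        hits.append(c)
    hits.sort(key=lambda p: p["label"])  # ORDER BY ?productLabel
    return hits[:limit]                  # LIMIT 5
```

Both numeric filters use strict inequalities, so a candidate whose value differs by exactly 150 (or 220) is excluded.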
Query Properties:
Use Case Motivation: The consumer remembers parts of a product name from former searches. He wants to find the product again by searching for the parts of the name that he remembers.
SPARQL Query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
SELECT ?product ?label
WHERE {
?product rdfs:label ?label .
?product rdf:type bsbm:Product .
FILTER regex(?label, "%word1%|%word2%|%word3%")
}
Parameters:
Parameter | Description |
---|---|
%word1% %word2% %word3% | Three different words from the list of words that were used in the dataset generation. |
Query Properties:
Use Case Motivation: The consumer has found a product that fulfills his requirements. Now he wants in-depth information about this product, including offers from German vendors and product reviews, if any exist.
SPARQL Query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?productLabel ?offer ?price ?vendor ?vendorTitle ?review ?revTitle
?reviewer ?revName ?rating1 ?rating2
WHERE {
%ProductXYZ% rdfs:label ?productLabel .
OPTIONAL {
?offer bsbm:product %ProductXYZ% .
?offer bsbm:price ?price .
?offer bsbm:vendor ?vendor .
?vendor rdfs:label ?vendorTitle .
?vendor bsbm:country <http://downlode.org/rdf/iso-3166/countries#DE> .
?offer dc:publisher ?vendor .
?offer bsbm:validTo ?date .
FILTER (?date > %currentDate% )
}
OPTIONAL {
?review bsbm:reviewFor %ProductXYZ% .
?review rev:reviewer ?reviewer .
?reviewer foaf:name ?revName .
?review dc:title ?revTitle .
OPTIONAL { ?review bsbm:rating1 ?rating1 . }
OPTIONAL { ?review bsbm:rating2 ?rating2 . }
}
}
Parameters:
Parameter | Description |
---|---|
%ProductXYZ% | A product URI (randomly selected) |
%currentDate% | A date within the validFrom-validTo range of the offers (same date for all queries within a run). |
Query Properties:
Use Case Motivation: The consumer wants to read the 20 most recent English language reviews about a specific product.
SPARQL Query:
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?title ?text ?reviewDate ?reviewer ?reviewerName ?rating1 ?rating2 ?rating3 ?rating4
WHERE {
?review bsbm:reviewFor %ProductXYZ% .
?review dc:title ?title .
?review rev:text ?text .
FILTER langMatches( lang(?text), "EN" )
?review bsbm:reviewDate ?reviewDate .
?review rev:reviewer ?reviewer .
?reviewer foaf:name ?reviewerName .
OPTIONAL { ?review bsbm:rating1 ?rating1 . }
OPTIONAL { ?review bsbm:rating2 ?rating2 . }
OPTIONAL { ?review bsbm:rating3 ?rating3 . }
OPTIONAL { ?review bsbm:rating4 ?rating4 . }
}
ORDER BY DESC(?reviewDate)
LIMIT 20
Parameters:
Parameter | Description |
---|---|
%ProductXYZ% | A product URI (randomly selected) |
Query Properties:
Use Case Motivation: In order to decide whether to trust a review, the consumer asks for any kind of information that is available about the reviewer.
SPARQL Query:
PREFIX rev: <http://purl.org/stuff/rev#>
DESCRIBE ?x
WHERE { %ReviewXYZ% rev:reviewer ?x }
Parameters:
Parameter | Description |
---|---|
%ReviewXYZ% | A review URI (randomly selected) |
Query Properties:
Use Case Motivation: The consumer wants to buy from a vendor in a specific country that is able to deliver within 3 days and is looking for the cheapest offer that fulfills these requirements.
SPARQL Query:
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT DISTINCT ?offer ?price
WHERE {
?offer bsbm:product %ProductXYZ% .
?offer bsbm:vendor ?vendor .
?offer dc:publisher ?vendor .
?vendor bsbm:country %CountryXYZ% .
?offer bsbm:deliveryDays ?deliveryDays .
FILTER (?deliveryDays <= 3)
?offer bsbm:price ?price .
?offer bsbm:validTo ?date .
FILTER (?date > %currentDate% )
}
ORDER BY ?price
LIMIT 10
Parameters:
Parameter | Description |
---|---|
%ProductXYZ% | A product URI (randomly selected) |
%CountryXYZ% | A random country URI from the set |
%currentDate% | A date within the validFrom-validTo range of the offers (same date for all queries within a run). |
Query Properties:
The queries for the Named Graphs data model have the same semantics as the queries for the triple data model. The queries do not specify the IRIs of the named graphs in the RDF Dataset using the FROM NAMED clause, but assume that the query is executed against the complete RDF Dataset.
This is still work in progress ...
Todo: Rewrite all queries for Named Graphs. Two examples are already found below:
SPARQL Query
PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?label ?comment ?producer ?productFeature ?propertyTextual1 ?propertyTextual2
?propertyNumeric1 ?propertyNumeric2 ?propertyTextual4 ?propertyTextual5 ?propertyNumeric4
WHERE {
GRAPH ?graph {
%ProductXYZ% rdfs:label ?label .
%ProductXYZ% rdfs:comment ?comment .
%ProductXYZ% bsbm:producer ?p .
?p rdfs:label ?producer .
%ProductXYZ% bsbm:productFeature ?f .
?f rdfs:label ?productFeature .
%ProductXYZ% bsbm:productPropertyTextual1 ?propertyTextual1 .
%ProductXYZ% bsbm:productPropertyTextual2 ?propertyTextual2 .
%ProductXYZ% bsbm:productPropertyNumeric1 ?propertyNumeric1 .
%ProductXYZ% bsbm:productPropertyNumeric2 ?propertyNumeric2 .
OPTIONAL { %ProductXYZ% bsbm:productPropertyTextual4 ?propertyTextual4 }
OPTIONAL { %ProductXYZ% bsbm:productPropertyTextual5 ?propertyTextual5 }
OPTIONAL { %ProductXYZ% bsbm:productPropertyNumeric4 ?propertyNumeric4 }
}
GRAPH localhost:provenanceData {
?graph dc:publisher ?p .
}
}
SPARQL Query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?productLabel ?offer ?price ?vendor ?vendorTitle ?review ?revTitle
?reviewer ?revName ?rating1 ?rating2
WHERE {
GRAPH ?producerGraph {
%ProductXYZ% rdfs:label ?productLabel .
}
OPTIONAL {
GRAPH ?vendorGraph {
?offer bsbm:product %ProductXYZ% .
?offer bsbm:price ?price .
?offer bsbm:vendor ?vendor .
?vendor rdfs:label ?vendorTitle .
?offer bsbm:validTo ?date .
FILTER (?date > %currentDate% )
}
}
OPTIONAL {
GRAPH ?ratingSiteGraph {
?review bsbm:reviewFor %ProductXYZ% .
?review rev:reviewer ?reviewer .
?reviewer foaf:name ?revName .
?review dc:title ?revTitle .
OPTIONAL { ?review bsbm:rating1 ?rating1 . }
OPTIONAL { ?review bsbm:rating2 ?rating2 . }
}
}
GRAPH localhost:provenanceData {
?vendorGraph dc:publisher ?vendor .
}
}
This section will contain a SQL representation of the benchmark queries in order to be able to compare the performance of stores that expose SPARQL endpoints with the performance of classic SQL-based RDBMSs.
This is still work in progress ...
A BSBM query mix consists of 25 queries that simulate a product search by a single consumer. The query sequence is given below:
The three fundamental performance metrics of the BSBM are:
4.1.1 Metrics for Single Queries
Average Query Execution Time (AQET): Average time for executing an individual query of type x multiple times with different parameters.
Queries per Second (QPS): Average number of queries of type x that are executed per second.
Min/Max Query Execution Time (minQET, maxQET): A lower and upper bound execution time for queries of type x.
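The single-query metrics can be computed from a list of measured execution times. The sketch below is a minimal illustration under two assumptions that the specification does not state: times are given in seconds, and the queries are executed one after another by a single client, so that QPS is the reciprocal of AQET. The function name is hypothetical.

```python
from statistics import mean

def single_query_metrics(times_s):
    """times_s: execution times in seconds for runs of one query type."""
    aqet = mean(times_s)
    return {
        "AQET": aqet,
        "QPS": 1.0 / aqet,       # assumes sequential execution by one client
        "minQET": min(times_s),
        "maxQET": max(times_s),
    }
```

With several concurrent clients, QPS would instead have to be derived from the number of completed queries and the elapsed wall-clock time.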
4.1.2 Metrics for Query Mixes
Query Mixes per Hour (QMpH): Number of query mixes with different parameters that are executed per hour.
Composite Query Execution Time (CQET): Average time for executing the query mix multiple times with different parameters.
Average Query Execution Time over all Queries (AQEToA): Overall time to run 50 query mixes divided by the number of queries (25*50=1250).
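The query mix metrics follow from the wall-clock time of each executed mix. The sketch below works through the arithmetic, assuming times in seconds and sequential execution by a single client; the function and constant names are hypothetical.

```python
QUERIES_PER_MIX = 25  # a BSBM query mix consists of 25 queries

def mix_metrics(mix_times_s):
    """mix_times_s: wall-clock seconds for each executed query mix."""
    total = sum(mix_times_s)
    runs = len(mix_times_s)
    return {
        "CQET": total / runs,                       # average time per mix
        "QMpH": runs * 3600.0 / total,              # mixes per hour
        "AQEToA": total / (runs * QUERIES_PER_MIX), # average time per query
    }
```

For 50 mixes that take 2 seconds each, this gives a CQET of 2 s, a QMpH of 1800 and an AQEToA of 100 s / 1250 = 0.08 s.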
4.1.3 Price/Performance Metric for the Complete System under Test (SUT)
The Price/Performance Metric is defined as $ / QMpH.
Where $ is the total system cost in the specified currency. The components are priced according to the TPC Pricing Specification.
If compute on demand infrastructure is used, the costing will be $/QMpH/day.
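As a small worked example of the price/performance calculation, the sketch below divides a cost figure by a QMpH value. The function name and the numbers in the usage note are made up for illustration; for on-demand infrastructure, the cost passed in is assumed to be the cost per day.

```python
def price_performance(total_cost: float, qmph: float) -> float:
    """Cost per unit of throughput: $ / QMpH."""
    if qmph <= 0:
        raise ValueError("QMpH must be positive")
    return total_cost / qmph
```

A system costing 9000 in the chosen currency that reaches 1800 QMpH would be reported with a price/performance of 5 per QMpH.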
The Berlin SPARQL Benchmark is designed to support the comparison of native RDF stores, native Named Graph stores, systems that map relational databases into RDF, and SPARQL-wrappers around other kinds of data sources.
There are the following benchmark scenarios:
Scenario for benchmarking native triple stores.
Acronym : NTR = Native Triple Repository
Setup: One or more clients ask SPARQL queries over the SPARQL protocol against a single store which exposes a triple data model.
Use Case Motivation: An e-shop selling different products provides a SPARQL endpoint over its data.
Rules for Running the Benchmark:
Scenario for benchmarking native Named Graphs stores.
Acronym: NNGR = Native Named Graphs Repository
Setup: One or more clients ask SPARQL queries over the SPARQL protocol against a single store which exposes a Named Graphs data model.
Use Case Motivation: An electronic market which integrates information from different producers, vendors and rating sites provides a SPARQL endpoint over its data.
Rules for Running the Benchmark:
Scenario for benchmarking other types of data sources that expose triple-based SPARQL endpoints. For instance, RDF-mapped relational databases, wrappers around WebAPIs and the like.
Acronym: MTS = Mapped Triple Source
Setup: One or more clients ask SPARQL queries over the SPARQL protocol against a store which exposes a triple data model.
Use Case Motivation: An e-shop selling different products provides a SPARQL endpoint over its data.
Rules for Running the Benchmark:
Benchmark Dataset:
The data generator will be able to export data in a generic XML format, which can be imported into different stores, for instance relational databases.
Scenario for benchmarking other types of data sources that expose named graph-based SPARQL endpoints. For instance, RDF-mapped relational databases, wrappers around WebAPIs or the like.
Acronym: MNGS = Mapped Named Graph Source
Setup: One or more clients ask SPARQL queries over the SPARQL protocol against a store which exposes a named graph-based data model.
Use Case Motivation: An electronic market which integrates information from different producers, vendors and rating sites provides a SPARQL endpoint over its data.
Rules for Running the Benchmark:
Benchmark Dataset:
The data generator will be able to export data in a generic XML format, which can be imported into different stores, for instance relational databases.
This section defines formats for reporting benchmark results.
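Benchmark results are named according to the scenario, the scale factor of the dataset and the number of concurrent clients.
For example:
NTR(1000,5) means a run of the Native Triple Repository scenario against a dataset with scale factor 1000 (1,000 products) using 5 concurrent clients.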
23.7 QPS(2)-NTS(10000,1)
means that on average 23.7 queries of type 2 were executed per second by a single client stream against a Native Triple Store containing data about 10,000 products.
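The naming scheme lends itself to automated processing. The following is a minimal sketch of a parser for labels such as `23.7 QPS(2)-NTS(10000,1)`. The regular expression and the names `ResultLabel` and `parse_label` are assumptions inferred from the example above; they are not part of the specification.

```python
import re
from typing import NamedTuple, Optional


class ResultLabel(NamedTuple):
    value: Optional[float]      # measured value, e.g. 23.7
    metric: Optional[str]       # metric name, e.g. "QPS"
    query_type: Optional[int]   # query number the metric refers to, if any
    scenario: str               # scenario acronym
    scale_factor: int           # number of products in the dataset
    clients: int                # number of concurrent clients


# Optional "<value> <metric>(<query>)-" prefix, then "<scenario>(<scale>,<clients>)".
_LABEL = re.compile(
    r"^(?:(?P<value>\d+(?:\.\d+)?)\s+(?P<metric>[A-Za-z]+)"
    r"(?:\((?P<query>\d+)\))?-)?"
    r"(?P<scenario>[A-Z]+)\((?P<scale>\d+),\s*(?P<clients>\d+)\)$"
)


def parse_label(label: str) -> ResultLabel:
    """Split a result label into its components."""
    m = _LABEL.match(label.strip())
    if m is None:
        raise ValueError(f"unrecognised result label: {label!r}")
    return ResultLabel(
        value=float(m.group("value")) if m.group("value") else None,
        metric=m.group("metric"),
        query_type=int(m.group("query")) if m.group("query") else None,
        scenario=m.group("scenario"),
        scale_factor=int(m.group("scale")),
        clients=int(m.group("clients")),
    )


print(parse_label("23.7 QPS(2)-NTS(10000,1)"))
```

A label without a metric prefix, consisting only of scenario, scale factor and client count, is parsed by the same function; the `value`, `metric` and `query_type` fields are then `None`.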
To make benchmark reports easy to interpret and to allow efficient, even automated, processing and comparison of results, all necessary information shall be represented in XML. Furthermore, we opt for a full disclosure policy covering the SUT, its configuration, pricing and so on, so that every detail of the system can be replicated and anyone can achieve similar benchmark results.
Full Disclosure Report Contents
Todo: Define an XML format for the Full Disclosure Report
Todo: Implement a nice tool which generates HTML reports including nice graphics from XML benchmark results in order to motivate people to use the reporting format.
This section will provide a qualification dataset and tests that the SUT has to pass before running the actual benchmark.
Todo: Implement a test driver for validating the SUT.
A Java implementation of a data generator and a test driver for the BSBM benchmark is available. It requires JVM 1.5 or later.
The source code of the data generator and the test driver can be downloaded here.
The code is licensed under the terms of the GNU General Public License.
The BSBM data generator can be used to create benchmark datasets of different sizes. Data generation is deterministic.
The data generator supports the following output formats:
Format | Option |
---|---|
N-Triples | -s nt |
XML | -s xml |
(My-)SQL dump | -s sql |
Next on the todo list: Implement TriG output format for benchmarking Named Graph stores.
Configuration options:
Option | Description |
---|---|
-s <output format> | The output format of the dataset. See the table above for the supported formats. Default: nt |
-pc <number of products> | Scale factor: The dataset is scaled via the number of products. For example: 91 products make about 50K triples. Default: 100 |
-fc | The data generator by default adds rdf:type statements for all types of a product to the dataset. If the SUT supports RDFS reasoning, the option -fc can be used to exclude these statements and leave generating them to the inference engine of the store. Default: disabled |
-dir | The output directory for all the data the Test Driver uses for its runs. Default: "td_data" |
-fn | The file name for the generated dataset (suffix is added according to the output format). Default: "dataset" |
The following example command creates an N-Triples benchmark dataset with the scale factor 1000:
$ java -cp bin:lib/ssj.jar benchmark.generator.Generator -pc 1000 -s nt
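When datasets of several sizes are needed, the invocation can be scripted. The following sketch assembles the generator command line from the options listed above. The function `generator_command` and its parameter names are hypothetical, and passing a value after `-dir` and `-fn` is an assumption based on the defaults shown in the options table.

```python
from typing import List, Optional

SUPPORTED_FORMATS = {"nt", "xml", "sql"}  # values of -s from the format table


def generator_command(
    products: int = 100,
    output_format: str = "nt",
    exclude_inferable_types: bool = False,  # maps to -fc
    out_dir: Optional[str] = None,          # maps to -dir (assumed to take a value)
    file_name: Optional[str] = None,        # maps to -fn (assumed to take a value)
    classpath: str = "bin:lib/ssj.jar",
) -> List[str]:
    """Build the argument list for one data generator run."""
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    if products < 1:
        raise ValueError("the number of products must be at least 1")
    cmd = [
        "java", "-cp", classpath, "benchmark.generator.Generator",
        "-pc", str(products), "-s", output_format,
    ]
    if exclude_inferable_types:
        cmd.append("-fc")
    if out_dir is not None:
        cmd += ["-dir", out_dir]
    if file_name is not None:
        cmd += ["-fn", file_name]
    return cmd


# Reproduces the example command above.
print(" ".join(generator_command(products=1000)))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the directory that contains `bin` and `lib`.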
The test driver works against a SPARQL endpoint over the SPARQL protocol.
Configuration options:
Option | Description |
---|---|
-runs <number of runs> | The number of query mix runs. Default: 50 |
-idir <directory> | The input parameter directory which was created by the Data Generator. Default: "td_data" |
-w <number of warm up runs> | Number of runs executed before the actual test to warm up the store. Default: 10 |
-o <result XML file> | The output file containing the aggregated result overview. Default: "benchmark_result.xml" |
-dg <default graph URI> | Specify a default graph for the queries. Default: null |
In addition to these options, the URL of a SPARQL endpoint must be given.
When the log level is set to 'ALL', a detailed run log containing information about every executed query is generated.
The following example command runs 50 query mixes against a SUT which provides a SPARQL-endpoint at http://localhost/sparql:
$ java -cp bin:lib/* benchmark.testdriver.TestDriver http://localhost/sparql
Or, if your Java version does not support the asterisk in the classpath definition (classpath wildcards require Java 6 or later), you can list the JAR files explicitly:
$ java -cp bin:lib/ssj.jar:lib/log4j-1.2.15.jar benchmark.testdriver.TestDriver http://localhost/sparql
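To illustrate what a query sent over the SPARQL protocol looks like on the wire, the following sketch builds an HTTP GET request for an endpoint. It uses the `query` and `default-graph-uri` parameters defined by the SPARQL Protocol; the function name `build_query_request` is hypothetical, and the sketch does not claim to mirror how the test driver itself issues requests.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(
    endpoint: str, query: str, default_graph: Optional[str] = None
) -> Request:
    """Build a SPARQL protocol GET request without sending it."""
    params = [("query", query)]
    if default_graph is not None:
        # Corresponds to a default graph such as the one given via -dg.
        params.append(("default-graph-uri", default_graph))
    url = endpoint + "?" + urlencode(params)
    # Ask for the standard SPARQL XML result format.
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


request = build_query_request("http://localhost/sparql", "ASK { ?s ?p ?o }")
print(request.full_url)
```

The request can be sent with `urllib.request.urlopen(request)` once an endpoint is running at the given URL.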
For more information about RDF and SPARQL Benchmarks please refer to:
Lots of thanks to