This document defines the Business Intelligence use case of the Berlin SPARQL Benchmark (BSBM) - Version 3 for measuring the performance of storage systems that expose SPARQL endpoints. The benchmark is built around an e-commerce use case in which a set of products is offered by different vendors and different consumers have posted reviews about products. The benchmark query mix is composed of queries that represent analytical questions posed by different stakeholders, such as vendors, customers, or the owners of the e-commerce portal.
The SPARQL Query Language for RDF and the SPARQL Protocol for RDF are implemented by a growing number of storage systems and are used within enterprise and open web settings. As SPARQL is taken up by the community there is a growing need for benchmarks to compare the performance of storage systems that expose SPARQL endpoints via the SPARQL protocol. Such systems include native RDF stores, Named Graph stores, systems that map relational databases into RDF, and SPARQL wrappers around other kinds of data sources.
The Berlin SPARQL Benchmark (BSBM) defines a suite of benchmarks for comparing the performance of these systems across architectures. The benchmark is built around an e-commerce use case in which a set of products is offered by different vendors and consumers have posted reviews about products. This variation of the BSBM focuses on features of the SPARQL 1.1 Working Draft that are already implemented in various RDF stores, namely grouping, aggregates and sub-queries.
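As a minimal illustration of these features (a sketch for orientation only, not one of the benchmark queries), the following query combines a sub-query, an aggregate and grouping over the vocabulary used throughout this document:

prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>

# Count the reviews per product in a sub-query, group by product and
# return the five most-reviewed products.
Select ?product ?reviewCount
{
  { Select ?product (count(?review) As ?reviewCount)
    {
      ?review bsbm:reviewFor ?product .
    }
    Group By ?product
  }
}
Order By desc(?reviewCount)
Limit 5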
The rest of this document is structured as follows: Section 2 defines the schema of the benchmark dataset and describes the rules used by the data generator to populate the dataset according to the chosen scale factor. Section 3 defines the benchmark queries. Section 4 defines how a system under test is verified against the qualification dataset. All three scenarios use the same benchmark dataset. The dataset is built around an e-commerce use case, where a set of products is offered by different vendors and different consumers have posted reviews about products. The content and the production rules for the dataset are described in the BSBM Dataset Specification.
This section defines a suite of benchmark queries and a query mix.
The benchmark queries are designed to emulate independent analytical queries over the dataset. The complete query mix consists of 15 queries that simulate analytical questions posed by different stakeholders such as vendors, producers, the portal owner and customers. The query sequence is given below:
Each query is defined by the following components: a use case motivation, the SPARQL query, and its parameters.
Use Case Motivation: A vendor wants to find out which product categories receive the most attention from people in a certain country.
SPARQL Query:
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix rev: <http://purl.org/stuff/rev#>
Select ?productType ?reviewCount
{
{ Select ?productType (count(?review) As ?reviewCount)
{
?productType a bsbm:ProductType .
?product a ?productType .
?product bsbm:producer ?producer .
?producer bsbm:country %Country1% .
?review bsbm:reviewFor ?product .
?review rev:reviewer ?reviewer .
?reviewer bsbm:country %Country2% .
}
Group By ?productType
}
}
Order By desc(?reviewCount) ?productType
Limit 10
Parameters:
Parameter | Description |
---|---|
%Country1% | A randomly selected Country URI. |
%Country2% | A randomly selected Country URI. |
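The %...% placeholders are substituted with concrete values before each query is executed; the same applies to the parameters of all remaining queries. For illustration, with hypothetical country URIs (the exact URI scheme shown here is an assumption; the concrete URIs come from the generated dataset), the two patterns of the query above that use the parameters would read:

# Illustrative substitution of %Country1% and %Country2% only; the concrete
# country URIs are taken from the generated dataset.
?producer bsbm:country <http://downlode.org/rdf/iso-3166/countries#US> .
?reviewer bsbm:country <http://downlode.org/rdf/iso-3166/countries#GB> .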
Use Case Motivation: A consumer wants to list products similar to the product they are currently viewing.
SPARQL Query:
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
SELECT ?otherProduct ?sameFeatures
{
?otherProduct a bsbm:Product .
FILTER(?otherProduct != %Product%)
{
SELECT ?otherProduct (count(?otherFeature) As ?sameFeatures)
{
%Product% bsbm:productFeature ?feature .
?otherProduct bsbm:productFeature ?otherFeature .
FILTER(?feature=?otherFeature)
}
Group By ?otherProduct
}
}
Order By desc(?sameFeatures) ?otherProduct
Limit 10
Parameters:
Parameter | Description |
---|---|
%Product% | A randomly selected Product URI. |
Use Case Motivation: A stakeholder wants to get a list of the products with the highest increase in popularity over a certain period, for further investigation.
SPARQL Query:
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
prefix rev: <http://purl.org/stuff/rev#>
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
Select ?product (xsd:float(?monthCount)/?monthBeforeCount As ?ratio)
{
{ Select ?product (count(?review) As ?monthCount)
{
?review bsbm:reviewFor ?product .
?review dc:date ?date .
Filter(?date >= "%ConsecutiveMonth_1%"^^<http://www.w3.org/2001/XMLSchema#date> && ?date < "%ConsecutiveMonth_2%"^^<http://www.w3.org/2001/XMLSchema#date>)
}
Group By ?product
} {
Select ?product (count(?review) As ?monthBeforeCount)
{
?review bsbm:reviewFor ?product .
?review dc:date ?date .
Filter(?date >= "%ConsecutiveMonth_0%"^^<http://www.w3.org/2001/XMLSchema#date> && ?date < "%ConsecutiveMonth_1%"^^<http://www.w3.org/2001/XMLSchema#date>)
}
Group By ?product
Having (count(?review)>0)
}
}
Order By desc(xsd:float(?monthCount) / ?monthBeforeCount) ?product
Limit 10
Parameters:
Parameter | Description |
---|---|
%ConsecutiveMonth_X% | The date of the first day of a randomly selected month (index 0). With index X, the date is set X months after the picked date. |
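For example (hypothetical values): if the randomly picked month starts on 2008-03-01, then %ConsecutiveMonth_0% is 2008-03-01, %ConsecutiveMonth_1% is 2008-04-01 and %ConsecutiveMonth_2% is 2008-05-01, and the filter of the first sub-query becomes:

# Illustrative substitution: counts the reviews written during April 2008
Filter(?date >= "2008-04-01"^^<http://www.w3.org/2001/XMLSchema#date> && ?date < "2008-05-01"^^<http://www.w3.org/2001/XMLSchema#date>)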
Use Case Motivation: A customer wants to find out which features have the most impact on the product price, in order to get hints on how to narrow down a subsequent product search.
SPARQL Query:
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
Select ?feature (?withFeaturePrice/?withoutFeaturePrice As ?priceRatio)
{
{ Select ?feature (avg(xsd:float(xsd:string(?price))) As ?withFeaturePrice)
{
?product a %ProductType% ;
bsbm:productFeature ?feature .
?offer bsbm:product ?product ;
bsbm:price ?price .
}
Group By ?feature
}
{ Select ?feature (avg(xsd:float(xsd:string(?price))) As ?withoutFeaturePrice)
{
{ Select distinct ?feature {
?p a %ProductType% ;
bsbm:productFeature ?feature .
} }
?product a %ProductType% .
?offer bsbm:product ?product ;
bsbm:price ?price .
FILTER NOT EXISTS { ?product bsbm:productFeature ?feature }
}
Group By ?feature
}
}
Order By desc(?withFeaturePrice/?withoutFeaturePrice) ?feature
Limit 10
Parameters:
Parameter | Description |
---|---|
%ProductType% | A randomly selected Class URI from the class hierarchy (except root category). |
Use Case Motivation: For advertising purposes, the owners of the e-commerce platform want to generate profiles along the two dimensions product type and customer country.
SPARQL Query:
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
prefix rev: <http://purl.org/stuff/rev#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
Select ?country ?product ?nrOfReviews ?avgPrice
{
{ Select ?country (max(?nrOfReviews) As ?maxReviews)
{
{ Select ?country ?product (count(?review) As ?nrOfReviews)
{
?product a %ProductType% .
?review bsbm:reviewFor ?product ;
rev:reviewer ?reviewer .
?reviewer bsbm:country ?country .
}
Group By ?country ?product
}
}
Group By ?country
}
{ Select ?country ?product (avg(xsd:float(xsd:string(?price))) As ?avgPrice)
{
?product a %ProductType% .
?offer bsbm:product ?product .
?offer bsbm:price ?price .
}
Group By ?country ?product
}
{ Select ?country ?product (count(?review) As ?nrOfReviews)
{
?product a %ProductType% .
?review bsbm:reviewFor ?product .
?review rev:reviewer ?reviewer .
?reviewer bsbm:country ?country .
}
Group By ?country ?product
}
FILTER(?nrOfReviews=?maxReviews)
}
Order By desc(?nrOfReviews) ?country ?product
Parameters:
Parameter | Description |
---|---|
%ProductType% | A randomly selected Class URI from the class hierarchy (except root category). |
Use Case Motivation: The stakeholders representing the e-commerce platform want to find potential spam reviewers, who rate products by a specific producer much higher than the average.
SPARQL Query:
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
prefix rev: <http://purl.org/stuff/rev#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
Select ?reviewer (avg(xsd:float(?score)) As ?reviewerAvgScore)
{
{ Select (avg(xsd:float(?score)) As ?avgScore)
{
?product bsbm:producer %Producer% .
?review bsbm:reviewFor ?product .
{ ?review bsbm:rating1 ?score . } UNION
{ ?review bsbm:rating2 ?score . } UNION
{ ?review bsbm:rating3 ?score . } UNION
{ ?review bsbm:rating4 ?score . }
}
}
?product bsbm:producer %Producer% .
?review bsbm:reviewFor ?product .
?review rev:reviewer ?reviewer .
{ ?review bsbm:rating1 ?score . } UNION
{ ?review bsbm:rating2 ?score . } UNION
{ ?review bsbm:rating3 ?score . } UNION
{ ?review bsbm:rating4 ?score . }
}
Group By ?reviewer
Having (avg(xsd:float(?score)) > min(?avgScore) * 1.5)
Parameters:
Parameter | Description |
---|---|
%Producer% | A producer URI (randomly selected) |
Use Case Motivation: A vendor wants information
about potential market niches to offer new products in the vendor's
country of origin.
SPARQL Query:
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
Select ?product
{
{ Select ?product
{
{ Select ?product (count(?offer) As ?offerCount)
{
?product a %ProductType% .
?offer bsbm:product ?product .
}
Group By ?product
}
}
Order By desc(?offerCount)
Limit 1000
}
FILTER NOT EXISTS
{
?offer bsbm:product ?product .
?offer bsbm:vendor ?vendor .
?vendor bsbm:country ?country .
FILTER(?country=%Country%)
}
}
Parameters:
Parameter | Description |
---|---|
%Country% | A country URI (randomly selected) |
%ProductType% | A random product type (all levels) |
Use Case Motivation: A vendor or a customer wants to find "discounter" vendors, for competitor analysis and procurement respectively.
SPARQL Query:
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
Select ?vendor (xsd:float(?belowAvg)/?offerCount As ?cheapExpensiveRatio)
{
{ Select ?vendor (count(?offer) As ?belowAvg)
{
{ ?product a %ProductType% .
?offer bsbm:product ?product .
?offer bsbm:vendor ?vendor .
?offer bsbm:price ?price .
{ Select ?product (avg(xsd:float(xsd:string(?price))) As ?avgPrice)
{
?product a %ProductType% .
?offer bsbm:product ?product .
?offer bsbm:vendor ?vendor .
?offer bsbm:price ?price .
}
Group By ?product
}
} .
FILTER (xsd:float(xsd:string(?price)) < ?avgPrice)
}
Group By ?vendor
}
{ Select ?vendor (count(?offer) As ?offerCount)
{
?product a %ProductType% .
?offer bsbm:product ?product .
?offer bsbm:vendor ?vendor .
}
Group By ?vendor
}
}
Order By desc(xsd:float(?belowAvg)/?offerCount) ?vendor
Limit 10
Parameters:
Parameter | Description |
---|---|
%ProductType% | A randomly selected Class URI from the class hierarchy (all levels). |
TODO: Adapt to Use Case
Before the performance of a SUT is measured, it has to be verified
that the SUT returns correct results for the benchmark queries.
For testing whether a SUT returns correct results, the BSBM
benchmark provides a qualification dataset and a qualification tool
which compares the query results of a SUT with the correct query
results. At the moment, the qualification tool verifies only the
results of SELECT queries. The results of DESCRIBE and CONSTRUCT
queries (queries 9 and 12) are not checked.
A BSBM qualification test is conducted in the two-step procedure
described below:
$ java -cp bin:lib/* benchmark.testdriver.TestDriver -q http://SUT/sparql

where http://SUT/sparql specifies the SPARQL endpoint of the SUT.

This will create a qualification file named "run.qual" (a different file name can be specified with the "-qf" parameter), which is used in step 2. In addition, run.log (if logging is set to "ALL" in the log4j.xml file) contains all queries with their full result text, so single queries can be examined later on.
Option | Description |
---|---|
-rc | Only check the amount of results returned and not the result content. |
-ql <qualification log file name> | Specify the file name to write the qualification test results into. |
$ java -cp bin:lib/* benchmark.qualification.Qualification correct.qual run.qual

where run.qual is the qualification file generated by the Test Driver in qualification mode and correct.qual is the file containing the correct query results.

By default, this generates a log file called "qual.log" with the results of the qualification test.
For more information about RDF and SPARQL Benchmarks please refer to:
The work on the BSBM Benchmark Version 3 is funded through the LOD2 project.