This document defines the Update use case of the Berlin SPARQL Benchmark (BSBM) for measuring the performance of storage systems that expose SPARQL endpoints. The benchmark is built around an e-commerce use case in which a set of products is offered by different vendors and consumers have posted reviews about products. The query mix of the Update use case simulates update activity on the dataset by adding or deleting product, review, and offer data.
The SPARQL Query Language for RDF and the SPARQL Protocol for RDF are implemented by a growing number of storage systems and are used within enterprise and open web settings. As SPARQL is taken up by the community, there is a growing need for benchmarks to compare the performance of storage systems that expose SPARQL endpoints via the SPARQL protocol. Such systems include native RDF stores, Named Graph stores, systems that map relational databases into RDF, and SPARQL wrappers around other kinds of data sources.
The Berlin SPARQL Benchmark (BSBM) defines a suite of benchmarks for comparing the performance of these systems across architectures. The benchmark is built around an e-commerce use case in which a set of products is offered by different vendors and consumers have posted reviews about products. The query mix of this use case represents update activity on the dataset. All queries conform to the SPARQL 1.1 Update draft.
The Berlin SPARQL Benchmark was designed along three goals: First, the benchmark should allow the comparison of different storage systems that expose SPARQL endpoints across architectures. Second, testing storage systems with realistic workloads of use case motivated queries is a well-established benchmarking technique in the database field, implemented for instance by the TPC benchmarks; the Berlin SPARQL Benchmark should apply this technique to systems that expose SPARQL endpoints. Third, as an increasing number of Semantic Web applications do not rely on heavyweight reasoning but focus on the integration and visualization of large amounts of data from autonomous data sources on the Web, the Berlin SPARQL Benchmark is not designed to require complex reasoning but to measure the performance of queries against large amounts of RDF data.
The rest of this document is structured as follows: Section 2 defines the schema of the benchmark dataset and describes the rules used by the data generator to populate the dataset according to the chosen scale factor. Section 3 defines the benchmark queries. Section 4 defines how a system under test is verified against the qualification dataset.
All three scenarios use the same Benchmark Dataset. The dataset is built around an e-commerce use case, where a set of products is offered by different vendors and different consumers have posted reviews about products. The content and the production rules for the dataset are described in the BSBM Dataset Specification.
This section defines a suite of benchmark queries and a query mix. The query mix is not meant to be run on its own; instead, it should be combined with the query mix of the Explore Use Case to measure the impact of updates on performance.
The benchmark queries are designed to emulate the update behaviour of the e-commerce portal operator. An update operation is either the insertion of new data into the dataset or the deletion of existing data from it.
The complete query mix consists of 5 queries that simulate the update behaviour of the e-commerce portal. The query sequence is given below:
Use Case Motivation: New, previously unknown product data with related reviews and offers is inserted into the dataset.
SPARQL Query:
INSERT DATA {
%updateData% # product data with associated reviews and offers
# A product has 10 reviews on average
# A product has 20 offers on average
# Altogether one product with all its reviews and offers consists of about 300 triples
}
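For illustration, a minimal sketch of an instantiated Query 1 is shown below. The ex: instance URIs and the label are placeholders invented for this sketch; in an actual benchmark run, %updateData% is filled with the roughly 300 generator-produced triples describing one product together with its reviews and offers.

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX ex:   <http://example.com/instances/>

INSERT DATA {
  # the new product (placeholder URI and label)
  ex:Product12345 rdf:type bsbm:Product ;
                  rdfs:label "Example product" .
  # one of its reviews (a full update carries about 10)
  ex:Review67890 bsbm:reviewFor ex:Product12345 .
  # one of its offers (a full update carries about 20)
  ex:Offer24680 bsbm:product ex:Product12345 .
}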
Use Case Motivation: Outdated or erroneous offers are deleted, so they won't show up on the customer side.
SPARQL Query:
DELETE WHERE
{ %Offer% ?p ?o }
Parameters:
Parameter | Description |
---|---|
%Offer% | An offer URI (randomly selected) |
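As a sketch, with %Offer% instantiated to a randomly selected offer URI (the URI below is a placeholder), the executed update reads:

DELETE WHERE
{ <http://example.com/instances/Offer13579> ?p ?o }

With DELETE WHERE, the graph pattern serves both to match and to delete, so all triples that have the given offer as subject are removed in a single operation.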
The queries for the Named Graphs data model have the same semantics as the queries for the triple data model. The queries do not specify the IRIs of the named graphs in the RDF Dataset using the FROM NAMED clause, but assume that each query is executed against the complete RDF Dataset.
This is still work in progress. Todo: Rewrite all queries for Named Graphs.
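As a placeholder until the official rewrites are available, one possible Named Graphs formulation of Query 2 is sketched below. The use of a graph variable is an assumption of this sketch, not part of the specification; it deletes the offer's triples from whichever named graph contains them, in line with executing against the complete RDF Dataset:

DELETE WHERE
{
  GRAPH ?g { %Offer% ?p ?o }
}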
This section will contain a SQL representation of the benchmark queries, in order to compare the performance of stores that expose SPARQL endpoints with the performance of classic SQL-based RDBMS.
TODO: Write equivalent SQL queries once the SPARQL queries are confirmed.
Use Case Motivation: A consumer is looking for a product and has a general idea about what he wants.
SQL Query:
Parameters:
Parameter | Description |
---|---|
Use Case Motivation: The
SQL Query:
Parameters:
Parameter | Description |
---|---|
The work on the BSBM Benchmark Version 3 is funded through the LOD2 project.