This document gives an overview of the Berlin SPARQL Benchmark (BSBM) for measuring the performance of storage systems that expose SPARQL endpoints. The benchmark suite is built around an e-commerce use case in which a set of products is offered by different vendors and different consumers have posted reviews about the products. The BSBM benchmark suite defines three use case motivated query mixes which focus on different aspects of the SPARQL query language.
The SPARQL Query Language for RDF, SPARQL Update and the SPARQL Protocol for RDF are implemented by a growing number of storage systems and are used within enterprise and open web settings. As SPARQL is taken up by the community, there is a growing need for benchmarks to compare the performance of storage systems that expose SPARQL endpoints via the SPARQL Protocol. Such systems include native RDF stores, Named Graph stores, systems that map relational databases into RDF, and SPARQL wrappers around other kinds of data sources.
The Berlin SPARQL Benchmark (BSBM) defines a suite of benchmarks for comparing the performance of these systems across architectures. The benchmark is built around an e-commerce use case in which a set of products is offered by different vendors and consumers have posted reviews about products.
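To give a flavor of such use case motivated queries, the following is a minimal sketch of a SPARQL query that retrieves products together with review titles and ratings. The namespaces and property names (bsbm:reviewFor, bsbm:rating1) are assumptions based on the BSBM vocabulary; the authoritative schema is given in the BSBM Dataset Specification.

    # Sketch: products together with their reviews (vocabulary assumed).
    PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dc:   <http://purl.org/dc/elements/1.1/>

    SELECT ?product ?label ?reviewTitle ?rating
    WHERE {
      ?product rdfs:label ?label .
      ?review  bsbm:reviewFor ?product ;
               dc:title ?reviewTitle .
      # Ratings are optional: not every review need carry one.
      OPTIONAL { ?review bsbm:rating1 ?rating }
    }
    ORDER BY ?label
    LIMIT 10

A query of this kind can be sent to any SPARQL endpoint over the SPARQL Protocol, which is exactly how the benchmark exercises a System Under Test.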
The Berlin SPARQL Benchmark (BSBM) consists of: a benchmark dataset together with a data generator that populates it according to a chosen scale factor, three use case motivated query mixes, a test driver that executes the query mixes against a System Under Test (SUT), and a set of performance metrics for reporting the results.
The Berlin SPARQL Benchmark was designed along three goals: First, the benchmark should allow the comparison of different storage systems that expose SPARQL endpoints across architectures. Second, testing storage systems with realistic workloads of use case motivated queries is a well established benchmarking technique in the database field, implemented for instance by the TPC benchmarks; the Berlin SPARQL Benchmark should apply this technique to systems that expose SPARQL endpoints. Third, as an increasing number of Semantic Web applications do not rely on heavyweight reasoning but focus on the integration and visualization of large amounts of data from autonomous data sources on the Web, the Berlin SPARQL Benchmark should not require complex reasoning but should measure the performance of queries against large amounts of RDF data.
This document defines Version 3 of the BSBM Benchmark. Compared to BSBM Version 2, which was released in September 2008, Version 3 is split into three use case scenarios: Explore, Explore and Update, and Business Intelligence.
The rest of this document is structured as follows: Section 2 defines the schema of the benchmark dataset and describes the rules that are used by the data generator for populating the dataset according to the chosen scale factor. Section 3 defines the different use case scenarios that can be tested individually or in combination; each use case is defined in its own document. Section 4 describes how to carry out proper test runs and how to report benchmark results.
All three scenarios use the same benchmark dataset. The dataset is built around an e-commerce use case in which a set of products is offered by different vendors and different consumers have posted reviews about products. The content and the production rules for the dataset are described in the BSBM Dataset Specification.
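As a rough, hypothetical sketch of the dataset's shape, the following SPARQL Update request inserts one product, one vendor offer for it, and one consumer review. The instance URIs under ex: are invented and the property names are assumptions based on the BSBM vocabulary; the actual classes, properties, and production rules are defined in the BSBM Dataset Specification.

    # Hypothetical instance data illustrating the e-commerce schema.
    PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX rev:  <http://purl.org/stuff/rev#>
    PREFIX dc:   <http://purl.org/dc/elements/1.1/>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
    PREFIX ex:   <http://example.org/instances/>   # invented namespace

    INSERT DATA {
      # A product made by some producer.
      ex:Product1 a bsbm:Product ;
                  rdfs:label "Example product" ;
                  bsbm:producer ex:Producer1 .

      # A vendor's offer for that product.
      ex:Offer1   a bsbm:Offer ;
                  bsbm:product ex:Product1 ;
                  bsbm:vendor ex:Vendor1 ;
                  bsbm:price "42.90"^^xsd:double .

      # A consumer's review of the product.
      ex:Review1  bsbm:reviewFor ex:Product1 ;
                  rev:reviewer ex:Reviewer1 ;
                  dc:title "Solid product" ;
                  bsbm:rating1 8 .
    }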
This section defines a suite of benchmark use cases, each with a different focus and thus a different query mix. Right now we have devised three different use cases: the Explore use case, which simulates a consumer navigating through product information with read-only queries; the Explore and Update use case, which interleaves the Explore query mix with SPARQL Update operations that modify the dataset; and the Business Intelligence use case, which consists of analytical queries that touch large portions of the dataset.
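To illustrate how the foci differ, the following is a sketch of an analytical query in the spirit of the Business Intelligence scenario: instead of navigating to individual resources as in the Explore mix, it aggregates over all reviews in the dataset. Property names are again assumptions based on the BSBM vocabulary; the actual query mixes are specified in the individual use case documents.

    # Sketch: an analytical, Business-Intelligence-style aggregation.
    PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>

    SELECT ?product (COUNT(?review) AS ?reviewCount) (AVG(?rating) AS ?avgRating)
    WHERE {
      ?review bsbm:reviewFor ?product ;
              bsbm:rating1 ?rating .
    }
    GROUP BY ?product
    ORDER BY DESC(?reviewCount)
    LIMIT 10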
All the information about running the BSBM benchmark, such as doing test runs and reporting results, is described in its own document. Here we give a short overview.
Benchmark metrics are the units used to represent the benchmark results of a test run against a System Under Test (SUT). These metrics are defined in the Performance Metrics section.
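As a plausible formalization (the authoritative definitions live in the Performance Metrics section), the two central metrics reported by earlier BSBM versions, query mixes per hour (QMpH) and queries per second per query type (QpS), can be written as:

    \text{QMpH} = \frac{3600 \cdot m}{t_{\mathrm{total}}}
    \qquad\qquad
    \text{QpS}(q) = \frac{n_q}{\sum_{i=1}^{n_q} t_{q,i}}

where m is the number of query mixes executed against the SUT, t_total is the total runtime of the test run in seconds, n_q is the number of executions of query type q, and t_{q,i} is the runtime of the i-th execution of q.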
The rules for carrying out benchmark test runs and for reporting the results are described here.
In order to compare benchmark results generated by different parties, the results have to be published together with accompanying information about the test run. How to do proper reporting is described in the reporting section.
The section about the data generator and the test driver describes how to pick the right options for dataset generation and how to run the test driver.