This document defines the performance metrics and benchmark rules of the Berlin SPARQL Benchmark (BSBM), describes how to report results, and gives an overview of the data generator and the test driver.
The BSBM defines three groups of fundamental performance metrics:
1.1 Metrics for Single Queries
Average Query Execution Time (aQET): The average time for executing a query of type x, measured over multiple executions with different parameters against the SUT.
Queries per Second (QpS): The average number of queries of type x executed per second.
Min/Max Query Execution Time (minQET, maxQET): The lowest and highest execution time observed for queries of type x.
1.2 Metrics for Query Mixes
Query Mixes per Hour (QMpH): The number of query mixes with different parameters executed per hour against the SUT.
Overall Runtime (oaRT): The overall time it took the test driver to execute a certain number of query mixes against the SUT.
Composite Query Execution Time (cQET): The average time for executing the query mix, measured over multiple executions with different parameters.
Average Query Execution Time over all Queries (aQEToA): The overall time to run 50 query mixes divided by the number of executed queries (25 * 50 = 1250).
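For illustration (hypothetical figures, not benchmark results): if the test driver needs an overall runtime of oaRT = 625 seconds for 50 query mixes of 25 queries each, then cQET = 625 s / 50 = 12.5 s per query mix, aQEToA = 625 s / 1250 = 0.5 s per query, and QMpH = 50 / (625 s / 3600 s) = 288 query mixes per hour.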
1.3 Price/Performance Metric for the Complete System under Test (SUT)
The Price/Performance Metric is defined as $ / QMpH, where $ is the total system cost over 5 years in the specified currency. The total system cost over 5 years is calculated according to the TPC Pricing Specification. If compute-on-demand infrastructure is used, the cost metric is $ / QMpH / day.
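For illustration (hypothetical figures, not benchmark results): a SUT with a total 5-year system cost of 60,000 USD that achieves 12,000 QMpH would be reported as 60,000 USD / 12,000 QMpH = 5 USD per QMpH.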
When running the BSBM benchmark and reporting BSBM benchmark results, the following rules should be obeyed:
This section defines formats for reporting benchmark results.
Benchmark results are named according to the scenario, the
scale factor of the dataset and the number of concurrent clients.
For example: NTS(1000,5) denotes the Native Triple Store scenario run against a dataset with 1,000 products using 5 concurrent clients.
23.7 QPS(2)-NTS(10000,1) means that on average 23.7 queries of type 2 were executed per second by a single client stream against a Native Triple Store containing data about 10,000 products.
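Similarly (hypothetical figures), a result such as 4,500 QMpH-NTS(1000,5) would state that 4,500 query mixes per hour were achieved against a Native Triple Store containing data about 1,000 products, using 5 concurrent clients.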
To guarantee an intelligible interpretation of benchmark reports and results, and to allow for efficient and even automated handling and comparison, all necessary information shall be represented in XML. Furthermore, we opt for a full disclosure policy covering the SUT, configuration, pricing, etc., so that all information needed to replicate any detail of the system is given, enabling anyone to achieve similar benchmark results.
Full Disclosure Report Contents
Todo: Define an XML format for the Full Disclosure Report.
Todo: Implement a tool that generates HTML reports, including graphics, from the XML benchmark results, in order to motivate people to use the reporting format.
TODO: add new options when finished with the new implementation
There is a Java implementation (requiring at least JVM 1.5) of a data generator and a test driver for the BSBM benchmark. The source code of the data generator and the test driver can be downloaded from the Sourceforge project BSBM tools. The code is licensed under the terms of the GNU General Public License.
The BSBM data generator can be used to create benchmark datasets of different sizes. Data generation is deterministic.
The data generator supports the following output formats:
Format | Option |
---|---|
N-Triples | -s nt |
Turtle | -s ttl |
XML | -s xml |
(My-)SQL dump | -s sql |
Next on the todo list: Implement TriG output format for benchmarking Named Graph stores.
Configuration options:
Option | Description |
---|---|
-s <output format> | The serialization format of the generated dataset. See the table above for the supported formats. Default: nt |
-pc <number of products> | Scale factor: the dataset is scaled via the number of products. For example, 91 products result in about 50K triples. Default: 100 |
-fc | By default, the data generator adds one rdf:type statement for the most specific type of a product to the dataset. This only works for SUTs that support RDFS reasoning and can infer the remaining type relations. If the SUT does not support RDFS reasoning, the option -fc can be used to also include the statements for the more general classes (forward chaining). Default: disabled |
-dir | The output directory for all the data the Test Driver uses for its runs. Default: "td_data" |
-fn | The file name for the generated dataset (a suffix is added according to the output format). Default: "dataset" |
The following example command creates a Turtle benchmark dataset with the scale factor 1000 and forward chaining enabled:
$ java -cp bin:lib/ssj.jar benchmark.generator.Generator -fc -pc 1000 -s ttl
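As a further illustration, here is a minimal sketch of an invocation that generates a small N-Triples dataset of roughly 50K triples (91 products, see the scale factor note above) and spells out the default output directory and file name explicitly; it assumes the same directory layout as the example above:
$ java -cp bin:lib/ssj.jar benchmark.generator.Generator -pc 91 -s nt -dir td_data -fn dataset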
The test driver works against a SPARQL endpoint over the SPARQL protocol.
Configuration options:
Option | Description |
---|---|
-runs <number of runs> | The number of query mix runs. Default: 50 |
-idir <directory> | The input parameter directory which was created by the Data Generator. Default: "td_data" |
-w <number of warm up runs> | Number of runs executed before the actual test to warm up the store. Default: 10 |
-o <result XML file> | The output file containing the aggregated result overview. Default: "benchmark_result.xml" |
-dg <default graph URI> | Specify a default graph for the queries. Default: null |
-mt <number of clients> | Benchmark with multiple concurrent clients. |
-seed <Long value> | Set the seed for the random number generator used for the parameter generation. |
-t <timeout in ms> | If the complete result of a query is not read within the specified timeout, the client disconnects and reports a timeout to the Test Driver. This is also the maximum runtime a query can contribute to the metrics. |
-q | Turn on qualification mode. For more information, see the qualification chapter of the use case. |
-qf <qualification file name> | Change the qualification file name; also see the qualification chapter of the use case. |
In addition to these options, a SPARQL endpoint must be given.
At log level 'ALL', a detailed run log containing information about every executed query is generated.
The following example command runs 128 query mixes (plus 32 for warm-up) against a SUT which provides a SPARQL endpoint at http://localhost/sparql:
$ java -cp bin:lib/* benchmark.testdriver.TestDriver -runs 128 -w 32 http://localhost/sparql
If your Java version does not support the asterisk in the classpath definition, you can list the jars explicitly:
$ java -cp bin:lib/ssj.jar:lib/log4j-1.2.15.jar benchmark.testdriver.TestDriver -runs 128 -w 32 http://localhost/sparql
The following example runs 1024 query mixes plus 128 warm-up mixes with 4 clients against a SUT which provides a SPARQL endpoint. The timeout per query is set to 30 seconds:
$ java -cp bin:lib/ssj.jar:lib/log4j-1.2.15.jar benchmark.testdriver.TestDriver -runs 1024 -w 128 -mt 4 -t 30000 http://localhost/sparql
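As a further sketch (the run counts, seed value, output file name and endpoint URL are arbitrary placeholders), a reproducible single-client run that fixes the parameter seed and writes the aggregated results to a custom file could look like this:
$ java -cp bin:lib/* benchmark.testdriver.TestDriver -runs 500 -w 50 -seed 9834533 -o benchmark_result_run1.xml http://localhost/sparql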
For more information about RDF and SPARQL Benchmarks please refer to:
The work on the BSBM Benchmark Version 3 is funded through the LOD2 project.