This document defines the performance metrics and benchmark rules of the Berlin SPARQL Benchmark (BSBM), describes how results should be reported, and gives an overview of the data generator and the test driver.
The BSBM defines performance metrics on three levels: for single queries, for query mixes, and for the complete system under test.
1.1 Metrics for Single Queries
Average Query Execution Time (aQET): Average time needed to execute an individual query of type x multiple times with different parameters against the SUT.
Queries per Second (QpS): Average number of queries of type x that were executed per second.
Min/Max Query Execution Time (minQET, maxQET): The lowest and highest execution times observed for queries of type x.
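The following minimal Java sketch illustrates how these single-query metrics can be derived from a set of measured execution times. The class, variable names, and sample values are illustrative assumptions and are not part of the BSBM test driver.

```java
// Illustrative sketch (not part of the BSBM tools): deriving the single-query
// metrics for one query type from a list of measured execution times.
public class SingleQueryMetrics {
    public static void main(String[] args) {
        // Execution times (in seconds) of query type x, run with different parameters.
        double[] qet = {0.042, 0.051, 0.038, 0.047};

        double sum = 0, min = Double.MAX_VALUE, max = 0;
        for (double t : qet) {
            sum += t;
            if (t < min) min = t;
            if (t > max) max = t;
        }
        double aQET = sum / qet.length;   // average query execution time
        double qps  = qet.length / sum;   // queries of type x executed per second

        System.out.printf("aQET=%.3fs QpS=%.1f minQET=%.3fs maxQET=%.3fs%n",
                aQET, qps, min, max);
    }
}
```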
1.2 Metrics for Query Mixes
Query Mixes per Hour (QMpH): Number of query mixes with different parameters that are executed per hour against the SUT.
Overall Runtime (oaRT): Overall time it took the test driver to execute a certain number of query mixes against the SUT.
Composite Query Execution Time (cQET): Average time for executing the query mix multiple times with different parameters.
Average Query Execution Time over all Queries (aQEToA): Overall time to run 50 query mixes divided by the number of executed queries (25 queries per mix * 50 mixes = 1250 queries).
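The query-mix metrics follow directly from the overall runtime of a run. The sketch below shows one way to compute them; the run length, mix size, and runtime are example values, not measurements.

```java
// Illustrative sketch (not part of the BSBM tools): query-mix metrics derived
// from the overall runtime of a benchmark run. Numbers are examples only.
public class QueryMixMetrics {
    public static void main(String[] args) {
        int mixes = 50;            // query mixes executed (default run length)
        int queriesPerMix = 25;    // queries in one BSBM query mix
        double oaRT = 180.0;       // overall runtime in seconds measured by the test driver

        double qmph   = mixes / oaRT * 3600.0;           // query mixes per hour
        double cQET   = oaRT / mixes;                     // average time per query mix
        double aQEToA = oaRT / (mixes * queriesPerMix);   // 180 s / 1250 queries

        System.out.printf("QMpH=%.1f cQET=%.2fs aQEToA=%.4fs%n", qmph, cQET, aQEToA);
    }
}
```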
1.3 Price/Performance Metric for the Complete System under Test (SUT)
The Price/Performance Metric is defined as $ / QMpH, where $ is the total system cost over 5 years in the specified currency. The total system cost over 5 years is calculated according to the TPC Pricing Specification. For instance, a hypothetical SUT with a total 5-year cost of 60,000 USD that achieves 6,000 QMpH scores 10 $/QMpH. If compute-on-demand infrastructure is used, the costing will be $ / QMpH / day.
When running the BSBM benchmark and reporting BSBM benchmark results, the following rules should be observed:
This section defines formats for reporting benchmark results.
Benchmark results are named according to the scenario, the
scale factor of the dataset and the number of concurrent clients.
For example:
NTS(1000,5) means that the benchmark was run against a Native Triple Store loaded with a dataset scaled to 1,000 products, with queries issued by 5 concurrent clients.
23.7 QPS(2)-NTS(10000,1) means that on average 23.7 queries of type 2 were executed per second by a single client stream against a Native Triple Store containing data about 10,000 products.
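As a small illustration of the naming scheme, the following sketch composes such a result label; the class and the values are examples only and not part of the BSBM tools.

```java
// Illustrative sketch (not part of the BSBM tools): composing a result label
// according to the naming scheme described above. Values are examples only.
public class ResultLabel {
    public static void main(String[] args) {
        double qps = 23.7;       // measured queries per second
        int queryType = 2;       // BSBM query type
        String scenario = "NTS"; // Native Triple Store
        int scaleFactor = 10000; // number of products in the dataset
        int clients = 1;         // concurrent client streams

        System.out.printf("%.1f QPS(%d)-%s(%d,%d)%n",
                qps, queryType, scenario, scaleFactor, clients);
    }
}
```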
To guarantee an intelligible interpretation of benchmark reports and results, and to allow for efficient and even automated handling and comparison, all necessary information shall be represented in XML. Furthermore, we opt for a full disclosure policy covering the SUT, configuration, pricing, etc., providing all information needed to replicate any detail of the system and thus enabling anyone to achieve similar benchmark results.
Full Disclosure Report Contents
Todo: Define an XML format for the Full Disclosure Report.
Todo: Implement a tool that generates HTML reports with graphics from XML benchmark results, in order to motivate people to use the reporting format.
There is a Java implementation (JVM 1.5 or later required) of a data generator and a test driver for the BSBM benchmark.
The source code of the data generator and the test driver can be downloaded from the BSBM tools project on SourceForge.
The code is licensed under the terms of the GNU General Public
License.
The BSBM data generator can be used to create benchmark datasets of different sizes. Data generation is deterministic.
The data generator supports the following output formats:
Format | Option |
---|---|
N-Triples | -s nt |
Turtle | -s ttl |
XML | -s xml |
TriG | -s trig |
(My-)SQL dump | -s sql |
Configuration options:
Option | Description |
---|---|
-s <output format> | The output format for the dataset. See the table above for the supported formats. Default: nt |
-pc <number of products> | Scale factor: the dataset is scaled via the number of products. For example, 91 products produce about 50K triples. Default: 100 |
-fc | By default the data generator adds one rdf:type statement for the most specific type of a product to the dataset. This is sufficient only for SUTs that support RDFS reasoning and can infer the remaining type relations. If the SUT does not support RDFS reasoning, the -fc option can be used to also include the statements for the more general classes. Default: disabled |
-dir | The output directory for all the data the Test Driver uses for its runs. Default: "td_data" |
-fn | The file name for the generated dataset (a suffix is added according to the output format). Default: "dataset" |
-nof <number of files> | The number of output files. This option splits the generated dataset into several files. |
-ud | Enables the generation of an update dataset for update transactions. The dataset file is named 'dataset_update.nt' and is in N-Triples format with special comments that separate the update transactions. |
-tc <number of update transactions> | Specifies for how many update transactions update data has to be written. Default: 1000 |
-ppt <nr of products per transaction> | Specifies how many products with their corresponding data (offers, reviews) are generated per update transaction. Default: 1. Note: the product count has to be at least as high as the product of the values given with the -tc and -ppt options. |
The following example command creates a Turtle benchmark dataset with the scale factor 1000 and forward chaining enabled:
$ java -cp lib/* benchmark.generator.Generator -fc -pc 1000 -s ttl
The following example command creates an N-Triples benchmark dataset with the scale factor 2000, forward chaining enabled, and an update dataset with the default values. The dataset file name will be "scale2000.nt" and the update dataset will be "dataset_update.nt":
$ ./generate -fc -pc 2000 -ud -fn scale2000
The test driver works against a SPARQL endpoint over the SPARQL protocol.
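In essence this means the test driver submits each query as the 'query' parameter of an HTTP request to the endpoint, as defined by the SPARQL protocol. The following minimal sketch shows such a request; it is not the test driver's code, and the endpoint URL and query string are placeholders.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Illustrative sketch: sending one SPARQL query over the SPARQL protocol.
// Endpoint URL and query are example values.
public class SparqlProtocolExample {
    public static void main(String[] args) throws Exception {
        String endpoint = "http://localhost/sparql";
        String query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";

        // The query is URL-encoded and passed as the 'query' parameter of an HTTP GET.
        URL url = new URL(endpoint + "?query=" + URLEncoder.encode(query, "UTF-8"));
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestProperty("Accept", "application/sparql-results+xml");

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```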
Configuration options:
Option | Description |
---|---|
-runs <number of runs> | The number of query mix runs. Default: 50 |
-idir <directory> | The input parameter directory which was created by the Data Generator. Default: "td_data" |
-ucf <path to use case file> | Specifies the use case, which in turn defines the combination of queries from one or more query mixes. The different use cases are found under the "usecase" directory. |
-w <number of warm up runs> | Number of runs executed before the actual test to warm up the store. Default: 10 |
-o <result XML file> | The output file containing the aggregated result overview. Default: "benchmark_result.xml" |
-dg <default graph URI> | Specifies a default graph for the queries. Default: null |
-mt <number of clients> | Benchmark with multiple concurrent clients. |
-sql | Use a JDBC connection to an RDBMS. Instead of a SPARQL endpoint, the test driver needs a JDBC URL as argument. Default: not set |
-dbdriver <DB-Driver Class Name> | The JDBC driver class name. Default: com.mysql.jdbc.Driver |
-seed <Long value> | Sets the seed for the random number generator used for the parameter generation. |
-t <Timeout in ms> | If for a specific query the complete result is not read after the specified timeout, the client disconnects and reports a timeout to the Test Driver. This is also the maximum runtime a query can contribute to the metrics. |
-q | Turns on qualification mode. For more information, see the qualification chapter of the use case. |
-qf <qualification file name> | Changes the qualification file name; also see the qualification chapter of the use case. |
-rampup | Runs the test driver in ramp-up/warm-up mode. The test driver will execute randomized queries until it is stopped - ideally when the store has reached a steady state and is not improving any more. |
-u <Service endpoint URI for SPARQL Update> | If update queries are run in the tests, this option defines where the SPARQL Update service endpoint can be found. |
-udataset <file name> | The file name of the update dataset. |
-uqp <update query parameter> | The form parameter name for the SPARQL Update query string. Default: update |
In addition to these options, a SPARQL endpoint must be given.
A detailed run log is generated for log level 'ALL' containing information about every executed query if the -q option is enabled.
The following example command runs 128 query mixes (plus 32 for warm-up) against a SUT which provides a SPARQL endpoint at http://localhost/sparql:
$ java -cp bin:lib/* benchmark.testdriver.TestDriver -runs 128 -w 32 http://localhost/sparql
Or, if your Java version does not support the asterisk in the classpath definition, you can write:
$ java -cp bin:lib/ssj.jar:lib/log4j-1.2.15.jar benchmark.testdriver.TestDriver http://localhost/sparql
Under Windows, replace the colons in the classpath with semicolons.
The following example runs 1024 query mixes plus 128 warm-up mixes with 4 clients against a SUT which provides a SPARQL endpoint. The timeout per query is set to 30s.
$ java -cp bin:lib/ssj.jar:lib/log4j-1.2.15.jar benchmark.testdriver.TestDriver -runs 1024 -w 128 -mt 4 -t 30000 http://localhost/sparql
The following example runs the 'explore and update' use case against
the SUT. The update dataset is "dataset_update.nt" and the service
endpoint for SPARQL Update is http://localhost/update.
$ ./testdriver -ucf usecases/explore/sparql.txt -u http://localhost/update -udataset dataset_update.nt http://localhost/sparql
For more information about RDF and SPARQL benchmarks, please refer to:
The work on the BSBM Benchmark Version 3 is funded through the LOD2 project.