Chris Bizer

Andreas Schultz

Contents

  1. Introduction
  2. Benchmark Datasets
  3. Benchmark Machine
  4. Benchmark Results
    1. Jena TDB
    2. BigOWLIM
    3. Virtuoso - Triple Store
  5. Store Comparison
  6. Thanks


Document Version: 1.0
Publication Date: 11/30/2009


 

1. Introduction

The Berlin SPARQL Benchmark (BSBM) is a benchmark for comparing the performance of storage systems that expose SPARQL endpoints. Such systems include native RDF stores, Named Graph stores, systems that map relational databases into RDF, and SPARQL wrappers around other kinds of data sources. The benchmark is built around an e-commerce use case, where a set of products is offered by different vendors and consumers have posted reviews about products.

This document presents the results of a November 2009 BSBM experiment in which the Berlin SPARQL Benchmark was used to measure the performance of three RDF stores: Jena TDB (over Joseki), BigOWLIM 3.1, and the Virtuoso Open-Source Edition v5.0.11 triple store.

The stores were benchmarked with datasets of 100 million and 200 million triples.

This November 2009 BSBM experiment complements the March 2009 BSBM experiment, which compared triple stores, relational database-to-RDF wrappers, and SQL database management systems. The results of the March 2009 experiment are found here. In the November 2009 experiment, BigOWLIM was tested for the first time; TDB was measured again because of significant speed-ups since March 2009; and Virtuoso was included for comparison, having been the fastest triple store in the previous BSBM experiment.

 

 

2. Benchmark Datasets

We ran the benchmark using the Triple version of the BSBM dataset (benchmark scenario NTR). The benchmark was run for different dataset sizes. The datasets were generated using the BSBM data generator and fulfill the characteristics described in the BSBM specification.

Details about the benchmark datasets are summarized in the following table:

                                 Number of Triples
                                 100M          200M
Number of Products               284,826       570,000
Number of Producers              5,618         11,240
Number of Product Features       47,884        94,259
Number of Product Types          2,011         3,949
Number of Vendors                2,854         5,710
Number of Offers                 5,696,520     11,400,000
Number of Reviewers              146,054       292,271
Number of Reviews                2,848,260     5,700,000
Total Number of Instances        9,034,027     18,077,429
Exact Total Number of Triples    100,000,112   200,031,413
File Size Turtle (unzipped)      8.5 GB        18 GB

Note: All datasets were generated with the -fc option for forward chaining.

There is an RDF triple representation and a relational representation of the benchmark datasets. Both representations can be downloaded below:

Download Turtle Representation of the Benchmark Datasets

Important: Test Driver data for all datasets:
(If you generate the datasets yourself, the Test Driver data is generated automatically in the directory "td_data".)

        Download Test Driver data

 


 

3. Benchmark Machine

The benchmarks were run on the same machine as the March 2009 experiments. This machine has the following specification:

 


 

4. Benchmark Results

This section reports the results of running the BSBM benchmark against three RDF stores.

Test Procedure

The load performance of the systems was measured by loading the Turtle representation of the BSBM datasets into the triple stores. The loaded datasets were forward chained and contained all rdf:type statements for product types. Thus the systems under test did not have to do any inferencing.

The query performance of the systems was measured by running 500 BSBM query mixes (12,500 queries altogether, as each mix contains 25 queries) against the systems over the SPARQL protocol. The test driver and the system under test (SUT) ran on the same machine in order to reduce the influence of network latency. In order to measure the sustainable performance of the SUTs, a ramp-up period was executed before the actual test runs.

We applied the following test procedure to each store:

  1. Load data into the store.
  2. Shut down the store, clear OS caches, restart the store.
  3. Run ramp-up.
  4. Execute single-client test run (500 query mixes, randomizer seed 808080).
  5. Execute multiple-client test run (4 clients, 500 query mixes, randomizer seed 863528).
  6. Execute test run with the reduced query mix (repeat steps 2 to 4 with the reduced query mix and a different randomizer seed, 919191).

The different runs use distinct randomizer seeds for choosing query parameters. This ensures that the test driver produces distinctly parameterized queries over all runs and makes it harder for the stores to apply query caching.
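To illustrate the measurement setup, the following minimal sketch times a single-client run over the SPARQL protocol. It is not the BSBM test driver; the endpoint URL, the placeholder query mix, and the omission of error handling are assumptions for illustration only (Java 11+).

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class SparqlTimingSketch {

    // Placeholder endpoint URL; each SUT exposes its own SPARQL endpoint.
    private static final String ENDPOINT = "http://localhost:8890/sparql";

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Placeholder "query mix"; the real BSBM mix contains 25 parameterized
        // queries drawn from the 12 query templates.
        List<String> queryMix = List.of(
                "SELECT * WHERE { ?s ?p ?o } LIMIT 10");

        int mixes = 500;   // number of query mixes, as in the benchmark runs
        long start = System.nanoTime();
        for (int i = 0; i < mixes; i++) {
            for (String query : queryMix) {
                String url = ENDPOINT + "?query="
                        + URLEncoder.encode(query, StandardCharsets.UTF_8);
                HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                        .header("Accept", "application/sparql-results+xml")
                        .GET()
                        .build();
                // Responses are read but not inspected; only timing matters here.
                client.send(request, HttpResponse.BodyHandlers.ofString());
            }
        }
        double hours = (System.nanoTime() - start) / 3.6e12;
        System.out.printf("QMpH: %.1f%n", mixes / hours);
    }
}

The actual test driver additionally draws query parameters using the randomizer seed and, for the multiple-client runs, issues query mixes from several clients in parallel.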

An overview of the load times for the SUTs and the different datasets is given in the following table (in hh:mm:ss):

SUT          100M       200M
Jena TDB     01:42:45   06:14:41
BigOWLIM     00:33:47   01:18:18
VirtuosoTS   07:43:39   48:41:11


4.1 TDB over Joseki


Jena TDB homepage

4.1.1 Configuration

The following changes were made to the default configuration of the software:


4.1.2 Load Time

The table below summarizes the load times of the Turtle files (in hh:mm:ss):

100M       200M
01:42:45   06:14:41
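As a rough sketch (not necessarily the loading procedure used in the experiment), a Turtle dump can be bulk-loaded into a TDB dataset through the Jena API. Directory and file names below are placeholders, and package names follow current Apache Jena releases rather than the 2009 TDB distribution.

import org.apache.jena.query.Dataset;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.tdb.TDBFactory;

public class TdbLoadSketch {
    public static void main(String[] args) {
        // Placeholder directory for the persistent TDB store.
        Dataset dataset = TDBFactory.createDataset("/data/tdb-bsbm");
        long start = System.currentTimeMillis();
        // Placeholder file name for the generated BSBM Turtle dump.
        RDFDataMgr.read(dataset.getDefaultModel(), "dataset_100m.ttl");
        dataset.close();
        System.out.printf("Load time: %d s%n",
                (System.currentTimeMillis() - start) / 1000);
    }
}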



4.1.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


  100m 200m
Query 1 69.4 29.2
Query 2 46.7 32.3
Query 3 62.1 30.6
Query 4 44.1 20.0
Query 5 1.2 0.7
Query 6 0.3 0.1
Query 7 5.8 3.0
Query 8 9.9 6.1
Query 9 1.9 1.0
Query 10 10.6 5.9
Query 11 20.0 15.1
Query 12 1.9 1.0

 4.1.4 Benchmark Overall results: QMpH for the 100M and 200M datasets for all runs

For the 100M and 200M datasets we ran test runs with one and with four clients. The results are given in Query Mixes per Hour (QMpH); larger numbers are better.
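For interpretation: QMpH counts how many complete query mixes the store processes per hour of wall-clock time, roughly QMpH = 3600 * (query mixes completed by all clients) / (total runtime in seconds); the exact accounting (ramp-up exclusion, aggregation across clients) follows the BSBM specification. As a rough cross-check against the table below, a single client completing 500 mixes in about 74 minutes corresponds to 500 / (74/60) ≈ 405 QMpH, consistent with the 100M single-client figure.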


Clients   1     4
100M      407   524
200M      210   250

4.1.5 Result Summaries


4.1.6 Run Logs (detailed information)

 

4.2 BigOWLIM 3.1


BigOWLIM homepage

4.2.1 Configuration

The following changes were made to the default configuration of the software:

Store Config File
JAVA_OPTS = ... -Xmx6144m ...
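For reference, -Xmx6144m raises the maximum Java heap available to BigOWLIM to 6144 MB (6 GB).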

 

4.2.2 Load Time

The table below summarizes the load times of the Turtle files (in hh:mm:ss):

100M       200M
00:33:47   01:18:18

 

4.2.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


  100m 200m
Query 1 42.4 14.2
Query 2 71.0 29.0
Query 3 52.5 17.6
Query 4 31.6 12.2
Query 5 1.8 1.0
Query 6 0.4 0.2
Query 7 6.7 3.7
Query 8 7.8 4.2
Query 9 32.0 11.9
Query 10 19.9 9.2
Query 11 13.6 10.5
Query 12 21.7 15.2

4.2.4 Benchmark Overall results: QMpH for the 100M and 200M datasets for all runs

For the 100M and 200M datasets we ran test runs with one and with four clients. The results are given in Query Mixes per Hour (QMpH); larger numbers are better.


Clients   1     4
100M      835   1486
200M      416   709

 

4.2.5 Result Summaries

4.2.6 Run Logs (detailed information)

 

4.3 Virtuoso Open-Source Edition v5.0.11 (Triple Store)


Virtuoso homepage


4.3.1 Configuration

The following changes were made to the default configuration of the software:

MaxCheckpointRemap = 1000000
NumberOfBuffers = 520000
MaxMemPoolSize = 0
StopCompilerWhenXOverRunTime = 1
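For orientation, Virtuoso caches the database in 8 KB page buffers, so NumberOfBuffers = 520000 corresponds to roughly 520,000 x 8 KB ≈ 4 GB of RAM dedicated to the page cache.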

4.3.2 Load Time

The table below summarizes the load times of the Turtle files (in hh:mm:ss):

100M       200M
07:43:39   48:41:11

4.3.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


  100m 200m
Query 1 144.6 104.8
Query 2 42.4 35.2
Query 3 136.7 104.4
Query 4 49.6 40.0
Query 5 6.0 3.4
Query 6 0.5 0.3
Query 7 4.3 2.3
Query 8 11.6 5.8
Query 9 53.6 31.0
Query 10 6.4 3.8
Query 11 37.8 26.9
Query 12 33.2 23.4

4.3.4 Benchmark Overall results: QMpH for the 100M and 200M datasets for all runs

For the 100M and 200M datasets we ran test runs with one and with four clients. The results are given in Query Mixes per Hour (QMpH); larger numbers are better.


Clients   1     4
100M      936   1914
200M      495   914


4.3.5 Result Summaries


4.3.6 Run Logs (detailed information)  

 

5. Store Comparison

This section compares the SPARQL query performance of the different stores.

5.1 Query Mixes per Hour

Running 500 query mixes against the different stores resulted in the following performance numbers (in QMpH). The best performance figure for each dataset size is set in bold in the tables.


5.1.1 QMpH: Complete Query Mix

The complete query mix is given here.

  Jena TDB BigOWLIM VirtuosoTS
100m 406.9 834.9 936.4
200m 209.5 416.2 495.9

A much more detailed view of the results for the complete query mix is given under Detailed Results For The Complete-Query-Mix Benchmark Run.

5.1.2 QMpH: Reduced Query Mix

The reduced query mix consists of the same query sequence as the complete mix but without queries 5 and 6. The two queries were excluded as they alone consumed a large portion of the overall query execution time for bigger dataset sizes.

  Jena TDB BigOWLIM VirtuosoTS
100m 968.5 2822.4 1957.1
200m 431.2 1397.9 1122.2

A much more detailed view of the results for the reduced query mix is given under Detailed Results For The Reduced-Query-Mix Benchmark Run.

5.2 Detailed Results For The Complete-Query-Mix Benchmark Run

The details of running the complete query mix are given here. There are two different views:

5.2.1 Queries per Second by Query and Dataset Size

Running 500 query mixes against the different stores led to the following query throughput for each type of query over all 500 runs (in Queries per Second). The best performance figure for each dataset size is set in bold in the tables.


Query 1

  Jena TDB BigOWLIM VirtuosoTS
100m 69.4 42.4 144.6
200m 29.2 14.2 104.8

Query 2

  Jena TDB BigOWLIM VirtuosoTS
100m 46.7 71.0 42.4
200m 32.3 29.0 35.2

Query 3

  Jena TDB BigOWLIM VirtuosoTS
100m 62.1 52.5 136.7
200m 30.6 17.6 104.4

Query 4

  Jena TDB BigOWLIM VirtuosoTS
100m 44.1 31.6 49.6
200m 20.0 12.2 40.0

Query 5

  Jena TDB BigOWLIM VirtuosoTS
100m 1.2 1.8 6.0
200m 0.7 1.0 3.4

Query 6

  Jena TDB BigOWLIM VirtuosoTS
100m 0.3 0.4 0.5
200m 0.1 0.2 0.3

Query 7

  Jena TDB BigOWLIM VirtuosoTS
100m 5.8 6.7 4.3
200m 3.0 3.7 2.3

Query 8

  Jena TDB BigOWLIM VirtuosoTS
100m 9.9 7.8 11.6
200m 6.1 4.2 5.8

Query 9

  Jena TDB BigOWLIM VirtuosoTS
100m 1.9 32.0 53.6
200m 1.0 11.9 31.0

Query 10

  Jena TDB BigOWLIM VirtuosoTS
100m 10.6 19.9 6.4
200m 5.9 9.2 3.8

Query 11

  Jena TDB BigOWLIM VirtuosoTS
100m 20.0 13.6 37.8
200m 15.1 10.5 26.9

Query 12

  Jena TDB BigOWLIM VirtuosoTS
100m 1.9 21.7 33.2
200m 1.0 15.2 23.4

5.2.2 Queries per Second by Dataset Size and Query

Running 500 query mixes against the different stores led to the following query throughput for each type of query over all 500 runs (in Queries per Second). The best performance figure for each query is set in bold in the tables.

 

100M Triples Dataset

  Jena TDB BigOWLIM VirtuosoTS
Query 1 69.4 42.4 144.6
Query 2 46.7 71.0 42.4
Query 3 62.1 52.5 136.7
Query 4 44.1 31.6 49.6
Query 5 1.2 1.8 6.0
Query 6 0.3 0.4 0.5
Query 7 5.8 6.7 4.3
Query 8 9.9 7.8 11.6
Query 9 1.9 32.0 53.6
Query 10 10.6 19.9 6.4
Query 11 20.0 13.6 37.8
Query 12 1.9 21.7 33.2

200M Triples Dataset

  Jena TDB BigOWLIM VirtuosoTS
Query 1 29.2 14.2 104.8
Query 2 32.3 29.0 35.2
Query 3 30.6 17.6 104.4
Query 4 20.0 12.2 40.0
Query 5 0.7 1.0 3.4
Query 6 0.1 0.2 0.3
Query 7 3.0 3.7 2.3
Query 8 6.1 4.2 5.8
Query 9 1.0 11.9 31.0
Query 10 5.9 9.2 3.8
Query 11 15.1 10.5 26.9
Query 12 1.0 15.2 23.4

 

5.3 Detailed Results For The Reduced-Query-Mix Benchmark Run

The details of running the reduced query mix are given here. There are two different views:

5.3.1 Queries per Second by Query and Dataset Size (reduced query mix)

Running 500 query mixes against the different stores led to the following query throughput for each type of query over all 500 runs (in Queries per Second). The best performance figure for each dataset size is set in bold in the tables.

Query 1

  Jena TDB BigOWLIM VirtuosoTS
100m 21.9 21.2 37.2
200m 8.4 8.1 14.4

Query 2

  Jena TDB BigOWLIM VirtuosoTS
100m 40.9 58.1 34.9
200m 19.7 25.5 23.9

Query 3

  Jena TDB BigOWLIM VirtuosoTS
100m 24.1 30.8 33.5
200m 7.7 10.4 6.7

Query 4

  Jena TDB BigOWLIM VirtuosoTS
100m 12.7 15.8 13.8
200m 5.0 5.6 5.5

Query 5

Not executed.

Query 6

Not executed.

Query 7

  Jena TDB BigOWLIM VirtuosoTS
100m 6.4 7.9 4.4
200m 3.1 4.2 2.6

Query 8

  Jena TDB BigOWLIM VirtuosoTS
100m 11.2 9.1 12.0
200m 6.4 5.5 6.3

Query 9

  Jena TDB BigOWLIM VirtuosoTS
100m 2.1 42.1 55.8
200m 0.9 14.3 34.4

Query 10

  Jena TDB BigOWLIM VirtuosoTS
100m 12.0 21.6 6.0
200m 6.5 11.1 4.1

Query 11

  Jena TDB BigOWLIM VirtuosoTS
100m 20.2 14.4 39.3
200m 12.4 10.5 30.5

Query 12

  Jena TDB BigOWLIM VirtuosoTS
100m 2.0 22.8 31.3
200m 0.9 14.3 20.7

5.3.2 Queries per Second by Dataset Size and Query (reduced query mix)

Running 500 query mixes against the different stores led to the following query throughput for each type of query over all 500 runs (in Queries per Second). The best performance figure for each query is set in bold in the tables.

100m

  Jena TDB BigOWLIM VirtuosoTS
Query 1 21.9 21.2 37.2
Query 2 40.9 58.1 34.9
Query 3 24.1 30.8 33.5
Query 4 12.7 15.8 13.8
Query 5 not executed not executed not executed
Query 6 not executed not executed not executed
Query 7 6.4 7.9 4.4
Query 8 11.2 9.1 12.0
Query 9 2.1 42.1 55.8
Query 10 12.0 21.6 6.0
Query 11 20.2 14.4 39.3
Query 12 2.0 22.8 31.3

200m

  Jena TDB BigOWLIM VirtuosoTS
Query 1 8.4 8.1 14.4
Query 2 19.7 25.5 23.9
Query 3 7.7 10.4 6.7
Query 4 5.0 5.6 5.5
Query 5 not executed not executed not executed
Query 6 not executed not executed not executed
Query 7 3.1 4.2 2.6
Query 8 6.4 5.5 6.3
Query 9 0.9 14.3 34.4
Query 10 6.5 11.1 4.1
Query 11 12.4 10.5 30.5
Query 12 0.9 14.3 20.7


 


6. Thanks

Lots of thanks to

Please send comments and feedback about the benchmark to Chris Bizer and Andreas Schultz.