Chris Bizer

Andreas Schultz

Contents

  1. Introduction
  2. Benchmark Datasets
  3. Benchmark Machine
  4. Benchmark Results for the Explore Use Case
    1. 4store
    2. BigData
    3. BigOwlim
    4. TDB
    5. Virtuoso
  5. Benchmark Results for the Explore and Update Use Case
    1. 4store
    2. BigOwlim
    3. TDB
  6. Store Comparison
  7. Experiences with the Business Intelligence use case
  8. Thanks


Document Version: 1.0
Publication Date: 02/22/2011


 

1. Introduction

The Berlin SPARQL Benchmark (BSBM) is a benchmark for comparing the performance of storage systems that expose SPARQL endpoints. Such systems include native RDF stores, Named Graph stores, systems that map relational databases into RDF, and SPARQL wrappers around other kinds of data sources. The benchmark is built around an e-commerce use case, where a set of products is offered by different vendors and consumers have posted reviews about products.

This document presents the results of a February 2011 BSBM experiment in which the Berlin SPARQL Benchmark Version 3 was used to measure the performance of 4store, BigData, BigOwlim, TDB, and Virtuoso.

The stores were benchmarked with datasets of 100 million and 200 million triples.

 

 

2. Benchmark Datasets

We ran the benchmark using the Triple version of the BSBM dataset at two dataset sizes. The datasets were generated with the BSBM data generator and fulfill the characteristics described in the BSBM specification.

Details about the benchmark datasets are summarized in the following table:

 

Number of Triples                   100M            200M
Number of Products                  284,826         570,000
Number of Producers                 5,618           11,240
Number of Product Features          47,884          94,259
Number of Product Types             2,011           3,949
Number of Vendors                   2,896           5,758
Number of Offers                    5,696,520       11,400,000
Number of Reviewers                 146,093         292,095
Number of Reviews                   2,848,260       5,700,000
Total Number of Instances           9,034,108       18,077,301
Exact Total Number of Triples       100,000,748     200,031,975
File Size Turtle (unzipped)         8.7 GB          18 GB

Note: All datasets were generated with the -fc option for forward chaining.

The BSBM dataset generator and test driver can be downloaded from SourceForge.

The RDF representation of the benchmark datasets can be generated in the following way:

To generate the 100M dataset as a Turtle file, type the following command in the BSBM directory:

./generate -fc -s ttl -fn dataset_100m -pc 284826

Variations:

* Generate N-Triples instead of Turtle: use -s nt instead of -s ttl.

* Generate the update dataset for the Explore and Update use case: add -ud.

* Generate multiple files instead of one, for example 100 files: add -nof 100.

* Write the test driver data to a different directory (default is td_data), for example for the 100M dataset: add -dir td_data_100m.
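
These options can be combined. As an illustrative example that is not part of the original benchmark runs, the following command should generate the 200M dataset as N-Triples split into 200 files, together with the update dataset, and write the test driver data into its own directory (the product count of 570,000 corresponds to the 200M dataset in the table above):

./generate -fc -s nt -fn dataset_200m -pc 570000 -ud -nof 200 -dir td_data_200m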
 

 

3. Benchmark Machine

We used a machine with the following specification for the benchmark experiment:

 


 

4. Benchmark Results for the Explore Use Case

This section reports the results of running the Explore use case of the BSBM benchmark against 4store, BigData, BigOwlim, TDB, and Virtuoso.


Test Procedure

The load performance of the systems was measured by loading the Turtle representation of the BSBM datasets into the triple stores. The loaded datasets were forward chained and contained all rdf:type statements for product types. Thus the systems under test did not have to do any inferencing.

The query performance of the systems was measured by running 500 BSBM query mixes (altogether 12,500 queries for the Explore use case) against the systems over the SPARQL protocol. The test driver and the system under test (SUT) were running on the same machine in order to reduce the influence of network latency. In order to measure sustainable performance of the SUTs, a ramp-up period is executed before the actual test runs.

We applied the following test procedure to each store:

  1. Load the data into the store.
  2. Shut down the store (optional: clear OS caches and swap), then restart the store.
  3. Run ramp-up (randomizer seed: 1212123, at least 8000 query mixes):

     ./testdriver -rampup -seed 1212123 http://sparql-endpoint

  4. Execute a single-client test run (500 query mixes performance measurement, randomizer seed: 9834533):

     ./testdriver -seed 9834533 http://sparql-endpoint

  5. Execute multi-client runs (4, 8 and 64 clients; randomizer seeds: 8188326, 9175932 and 4187411). For each run, use two times the number of clients as the number of warm-up query mixes. For example, for a run with 4 clients execute:

     ./testdriver -seed 8188326 -w 8 -mt 4 -o results_4clients.xml http://sparql-endpoint

The different runs use distinct randomizer seeds for choosing query parameters. This ensures that the test driver produces distinctly parameterized queries over all runs and makes it harder for the stores to apply query caching.
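
For illustration, the three multi-client runs of step 5 can be issued as a small shell script like the following sketch. The seeds and the warm-up rule (two query mixes per client) are taken from the procedure above; the output file names for the 8- and 64-client runs are chosen by analogy to the 4-client example and are therefore assumptions:

#!/bin/sh
# Multi-client Explore runs: 4, 8 and 64 clients, warm-up = 2 x number of clients.
ENDPOINT=http://sparql-endpoint
./testdriver -seed 8188326 -w 8   -mt 4  -o results_4clients.xml  $ENDPOINT
./testdriver -seed 9175932 -w 16  -mt 8  -o results_8clients.xml  $ENDPOINT
./testdriver -seed 4187411 -w 128 -mt 64 -o results_64clients.xml $ENDPOINT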

An overview of the load times for the SUTs and the different datasets is given in the following table (in hh:mm:ss):

SUT           100M          200M
4store        26:42*        1:12:04*
BigData       1:03:47       3:24:25
BigOwlim      17:22         38:36
TDB           1:14:48       2:45:13
Virtuoso      1:49:26**     3:59:38**

* The N-Triples version of the dataset was used.
** The dataset was split into 100 and 200 Turtle files, respectively, and loaded consecutively with the DB.DBA.TTLP function.

4.1 4store


4store homepage

4.1.1 Configuration

The following changes were made to the default configuration of the software:


4.1.2 Load Time

The table below summarizes the load times for the N-Triples files (in hh:mm:ss):

100M          200M
26:42         1:12:04



4.1.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


  100m 200m
Query 1 117.6 145.3
Query 2 49.0 55.7
Query 3 102.4 122.9
Query 4 43.4 62.9
Query 5 7.8 5.0
Query 6 not executed not executed
Query 7 41.3 49.3
Query 8 49.1 57.1
Query 9 233.0 117.3
Query 10 49.2 52.8
Query 11 145.3 33.3
Query 12 46.5 40.4

 4.1.4 Benchmark Overall results: QMpH for the 100M and 200M datasets for all runs

For the 100M and 200M datasets we ran tests with 1, 4, 8 and 64 clients. The results are given in Query Mixes per Hour (QMpH); larger numbers are better. For 4store, only the single-client results are reported, as we ran into technical problems while testing with multiple clients (see Section 6.2).

                 Single client
100M             5589
200M             4593

4.1.5 Result Summaries


4.1.6 Run Logs (detailed information)


4.2 BigData


  BigData homepage

4.2.1 Configuration

The following changes were made to the default configuration of the software:


4.2.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

100M          200M
1:03:47       3:24:25



4.2.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


  100m 200m
Query 1 64.2 64.4
Query 2 33.6 35.3
Query 3 12.4 14.4
Query 4 38.4 40.0
Query 5 2.3 1.7
Query 6 not executed not executed
Query 7 31.3 21.9
Query 8 48.5 14.1
Query 9 54.8 44.6
Query 10 61.6 28.8
Query 11 43.8 26.7
Query 12 54.8 31.1

 4.2.4 Benchmark Overall results: QMpH for the 100M and 200M datasets for all runs

For the 100M and 200M datasets we ran tests with 1, 4, 8 and 64 clients. The results are given in Query Mixes per Hour (QMpH); larger numbers are better.

Number of clients        1         4         8         64
100M                     2428      4153      4286      4136
200M                     1795      3040      3167      2689


4.2.5 Result Summaries


4.2.6 Run Logs (detailed information)


4.3 BigOwlim


Owlim homepage

4.3.1 Configuration

The following changes were made to the default configuration of the software:


4.3.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

100M          200M
17:22         38:36



4.3.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


  100m 200m
Query 1 112.5 45.0
Query 2 159.3 111.6
Query 3 125.0 48.1
Query 4 97.9 37.8
Query 5 3.0 1.8
Query 6 not executed not executed
Query 7 32.6 11.0
Query 8 38.0 12.6
Query 9 141.8 67.9
Query 10 48.5 22.4
Query 11 51.3 18.7
Query 12 65.4 29.1

 4.3.4 Benchmark Overall results: QMpH for the 100M and 200M datasets for all runs

For the 100M and 200M datasets we ran tests with 1, 4, 8 and 64 clients. The results are given in Query Mixes per Hour (QMpH); larger numbers are better.

Number of clients        1         4         8         64
100M                     3534      9349      12798     15285
200M                     1795      3713      4041      3622


4.3.5 Result Summaries


4.3.6 Run Logs (detailed information)

BigOwlim run logs for the 200M dataset (single-client and multi-client runs) are available for download.


4.4 TDB


TDB homepage

Fuseki homepage

4.4.1 Configuration

The following changes were made to the default configuration of the software:


4.4.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

100M          200M
1:14:48       2:45:13



4.4.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


  100m 200m
Query 1 75.1 62.2
Query 2 41.0 44.1
Query 3 82.2 66.1
Query 4 62.1 47.3
Query 5 2.0 1.2
Query 6 not executed not executed
Query 7 22.6 15.0
Query 8 24.4 15.9
Query 9 124.6 97.1
Query 10 33.5 26.4
Query 11 30.0 23.4
Query 12 33.3 28.3

 4.4.4 Benchmark Overall results: QMpH for the 100M and 200M datasets for all runs

For the 100M and 200M datasets we ran tests with 1, 4, 8 and 64 clients. The results are given in Query Mixes per Hour (QMpH); larger numbers are better.

Number of clients        1         4         8         64
100M                     2274      4065      3035      2242
200M                     1443      2206      1474      *

* TDB crashed for the 200M dataset with 64 clients because of a bug that has since been fixed.

4.4.5 Result Summaries


4.4.6 Run Logs (detailed information)

TDB run logs for the 200M dataset (single-client and multi-client runs) are available for download.


4.5 Virtuoso


Virtuoso homepage

4.5.1 Configuration

The following changes were made to the default configuration of the software:


4.5.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

100M          200M
1:49:26       3:59:38



4.5.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


  100m 200m
Query 1 200.7 163.0
Query 2 71.1 73.8
Query 3 201.4 195.4
Query 4 103.9 94.8
Query 5 15.2 9.3
Query 6 not executed not executed
Query 7 24.9 15.4
Query 8 54.0 22.5
Query 9 379.1 160.1
Query 10 113.7 69.9
Query 11 73.6 39.5
Query 12 68.0 39.8

 4.5.4 Benchmark Overall results: QMpH for the 100M and 200M datasets for all runs

For the 100M and 200M datasets we ran tests with 1, 4, 8 and 64 clients. The results are given in Query Mixes per Hour (QMpH); larger numbers are better.

Number of clients        1         4         8         64
100M                     7352      25194     36269     18008
200M                     4669      13265     18264     16564

4.5.5 Result Summaries


4.5.6 Run Logs (detailed information)

Virtuoso run logs for the 200M dataset (single-client and multi-client runs) are available for download.


5. Benchmark Results for the Explore and Update Use Case


This section reports the results of running the Explore and Update use case of the BSBM benchmark against 4store, BigOwlim, and TDB.

BigData and Virtuoso are not listed here for different reasons: BigData does not yet provide all SPARQL features required to run the update query mix, and when executing the update mix on Virtuoso we ran into technical problems that we are still working to solve together with the OpenLink team.

Test Procedure

The load performance of the systems was measured by loading the Turtle or N-Triples representation of the BSBM datasets into the triple stores. The loaded datasets were forward chained and contained all rdf:type statements for product types. Thus the systems under test did not have to do any inferencing.

The query performance of the systems was measured by running 500 BSBM query mixes (altogether 15,000 queries from the Explore and Update query mixes) against the systems over the SPARQL protocol. The test driver and the system under test (SUT) were running on the same machine in order to reduce the influence of network latency. In order to measure sustainable performance of the SUTs, a ramp-up period is executed before the actual test runs.

We applied the following test procedure to each store:

  1. Load the data into the store.
  2. Shut down the store, clear OS caches and swap, restart the store.
  3. Run ramp-up (randomizer seed: 1212123, at least 8000 query mixes):

     ./testdriver -ucf usecases/exploreAndUpdate/sparql.txt -rampup -seed 1212123 http://sparql-endpoint

  4. Execute a single-client test run (500 query mixes performance measurement, randomizer seed: 9834533):

     ./testdriver -ucf usecases/exploreAndUpdate/sparql.txt -udataset dataset_update.nt -seed 9834533 -u http://sparql-endpoint-update-service http://sparql-endpoint

The different runs use distinct randomizer seeds for choosing query parameters. This ensures that the test driver produces distinctly parameterized queries over all runs and makes it harder for the stores to apply query caching.
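
For illustration only, steps 2 to 4 might be scripted roughly as follows. The cache-clearing commands are an assumption about how "clear OS caches and swap" could be done on a Linux machine (they require root), the store restart is store-specific and therefore left as a comment, and the test driver invocations are taken verbatim from the procedure above:

#!/bin/sh
# Sketch of the Explore and Update test procedure after the data has been loaded.
sync && echo 3 > /proc/sys/vm/drop_caches    # drop the OS page cache (assumption, Linux only)
swapoff -a && swapon -a                      # clear swap (assumption)
# restart the store here (store-specific command)

# Ramp-up run (randomizer seed 1212123):
./testdriver -ucf usecases/exploreAndUpdate/sparql.txt -rampup -seed 1212123 http://sparql-endpoint

# Measured single-client run (500 query mixes, randomizer seed 9834533):
./testdriver -ucf usecases/exploreAndUpdate/sparql.txt -udataset dataset_update.nt -seed 9834533 -u http://sparql-endpoint-update-service http://sparql-endpoint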

5.1 4store


  4store homepage

5.1.1 Configuration

The following changes were made to the default configuration of the software:

Raptor2 version: 2.0.0

Rasqal version: 0.9.24
Update Query 2 in queries/update/query2.txt was changed to the following because the DELETE WHERE syntax was not supported:

DELETE
{ %Offer% ?p ?o }
WHERE
{ %Offer% ?p ?o }
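
For comparison, the unmodified BSBM update query presumably uses the SPARQL 1.1 DELETE WHERE shorthand, which merges the delete template and the matching pattern into a single clause:

DELETE WHERE { %Offer% ?p ?o }

The rewrite above expresses the same deletion with an explicit DELETE template and WHERE clause.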


5.1.2 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


  100m
Upd. Query 1 20.3
Upd. Query 2 70.4
Exp. Query 1 116.4
Exp. Query 2 66.7
Exp. Query 3 141.7
Exp. Query 4 59.6
Exp. Query 5 8.1
Exp. Query 6 not executed
Exp. Query 7 53.0
Exp. Query 8 67.1
Exp. Query 9 351.5
Exp. Query 10 68.8
Exp. Query 11 192.6
Exp. Query 12 62.1

5.1.3 Benchmark Overall result: QMpH for the 100M dataset

The result is in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

5311.1 QMpH

5.1.4 Result Summaries


5.1.5 Run Logs (detailed information)


5.2 BigOwlim


  Owlim homepage

5.2.1 Configuration

The following changes were made to the default configuration of the software:

Modified heap size:

CATALINA_OPTS="-Xms512m -Xmx8G"

Modified database location:

JAVA_OPTS='-Dinfo.aduna.platform.appdata.basedir=/database/tomcat'
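
These settings might, for example, be placed in Tomcat's bin/setenv.sh; this placement is an assumption, since the exact mechanism depends on how the Tomcat instance hosting the Sesame/BigOwlim web application is started:

#!/bin/sh
# Hypothetical bin/setenv.sh for the Tomcat instance hosting BigOwlim.
# Heap settings used for the benchmark run:
CATALINA_OPTS="-Xms512m -Xmx8G"
# Move the Aduna/Sesame application data directory to the database disk:
JAVA_OPTS='-Dinfo.aduna.platform.appdata.basedir=/database/tomcat'
export CATALINA_OPTS JAVA_OPTS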
Config files:



5.2.2 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


  100m
Upd. Query 1 19.4
Upd. Query 2 30.5
Exp. Query 1 222.0
Exp. Query 2 57.8
Exp. Query 3 278.6
Exp. Query 4 196.2
Exp. Query 5 2.6
Exp. Query 6 not executed
Exp. Query 7 44.3
Exp. Query 8 51.6
Exp. Query 9 292.6
Exp. Query 10 65.6
Exp. Query 11 64.4
Exp. Query 12 63.6

 5.2.3 Benchmark Overall result: QMpH for the 100M dataset

The result is in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

2809.2 QMpH

5.2.4 Result Summaries


5.2.5 Run Logs (detailed information)


5.3 TDB


TDB homepage

Fuseki homepage

5.3.1 Configuration

The following changes were made to the default configuration of the software:

5.3.2 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


  100m
Upd. Query 1 0.7
Upd. Query 2 2.4
Exp. Query 1 55.5
Exp. Query 2 39.1
Exp. Query 3 84.5
Exp. Query 4 56.7
Exp. Query 5 2.0
Exp. Query 6 not executed
Exp. Query 7 68.9
Exp. Query 8 82.3
Exp. Query 9 141.0
Exp. Query 10 125.3
Exp. Query 11 78.5
Exp. Query 12 112.6

 5.3.3 Benchmark Overall result: QMpH for the 100M dataset

The result is in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

680.8 QMpH

5.3.4 Result Summaries


5.3.5 Run Logs (detailed information)



6. Store Comparison

This section compares the SPARQL query performance of the different stores.

6.1 Query Mixes per Hour for Single Clients

Running 500 query mixes against the different stores resulted in the following performance numbers (in QMpH). The best performance figure for each dataset size is set bold in the tables.


6.1.1 QMpH: Explore use case

The complete Explore query mix is defined in the BSBM specification.


  100m 200m
4store 5589 4593
BigData 2428 1795
BigOwlim 3534 1795
TDB 2274 1443
Virtuoso 7352 4669

A much more detailed view of the results for the Explore use case is given under Detailed Results For The Explore-Query-Mix Benchmark Run.

6.1.2 QMpH: Explore and Update use case

The Explore and Update query mix consists of the Update query mix (queries 1 and 2) and the Explore query mix (queries 3 to 14).


  100m
4store 5311
BigOwlim 2809
TDB 680

A much more detailed view of the results for the Explore and Update use case is given under Detailed Results For The Explore-And-Update-Query-Mix Benchmark Run.


6.2 Query Mixes per Hour for Multiple Clients (Explore query mix only)


Dataset Size 100M                         Number of clients
                    1         4         8         64
4store              5589      *         *         *
BigData             2428      4153      4286      4136
BigOwlim            3534      9349      12798     15285
TDB                 2274      4065      3035      2242
Virtuoso            7352      25194     36269     18008

 

Dataset Size 200M                         Number of clients
                    1         4         8         64
4store              4593      *         *         *
BigData             1795      3040      3167      2689
BigOwlim            1795      3713      4041      3622
TDB                 1443      2206      1474      **
Virtuoso            4669      13265     18264     16564

* We ran into technical problems while testing 4store with multiple clients.

** TDB crashed for the 200M dataset with 64 clients because of a bug that has since been fixed.

6.3 Detailed Results For The Explore-Query-Mix Benchmark Run

The detailed results for the Explore query mix are presented in two different views:

6.3.1 Queries per Second by Query and Dataset Size

Running 500 query mixes against the different stores led to the following query throughput for each type of query over all 500 runs (in Queries per Second). The best performance figure for each dataset size is set bold in the tables.


Query 1

  4store BigData BigOwlim TDB Virtuoso
100m 117.6 64.2 112.5 75.1 200.7
200m 145.3 64.4 45.0 62.2 163.0

Query 2

  4store BigData BigOwlim TDB Virtuoso
100m 49.0 33.6 159.3 41.0 71.1
200m 55.7 35.3 111.6 44.1 73.8

Query 3

  4store BigData BigOwlim TDB Virtuoso
100m 102.4 12.4 125.0 82.2 201.4
200m 122.9 14.4 48.1 66.1 195.4

Query 4

  4store BigData BigOwlim TDB Virtuoso
100m 43.4 38.4 97.9 62.1 103.9
200m 62.9 40.0 37.8 47.3 94.8

Query 5

  4store BigData BigOwlim TDB Virtuoso
100m 7.8 2.3 3.0 2.0 15.2
200m 5.0 1.7 1.8 1.2 9.3

Query 6

Not Executed.

Query 7

  4store BigData BigOwlim TDB Virtuoso
100m 41.3 31.3 32.6 22.6 24.9
200m 49.3 21.9 11.0 15.0 15.4

Query 8

  4store BigData BigOwlim TDB Virtuoso
100m 49.1 48.5 38.0 24.4 54.0
200m 57.1 14.1 12.6 15.9 22.5

Query 9

  4store BigData BigOwlim TDB Virtuoso
100m 233.0 54.8 141.8 124.6 379.1
200m 117.3 44.6 67.9 97.1 160.1

Query 10

  4store BigData BigOwlim TDB Virtuoso
100m 49.2 61.6 48.5 33.5 113.7
200m 52.8 28.8 22.4 26.4 69.9

Query 11

  4store BigData BigOwlim TDB Virtuoso
100m 145.3 43.8 51.3 30.0 73.6
200m 33.3 26.7 18.7 23.4 39.5

Query 12

  4store BigData BigOwlim TDB Virtuoso
100m 46.5 54.8 65.4 33.3 68.0
200m 40.4 31.1 29.1 28.3 39.8

6.3.2 Queries per Second by Dataset Size and Query

Running 500 query mixes against the different stores led to the following query throughput for each type of query over all 500 runs (in Queries per Second). The best performance figure for each query is set bold in the tables.


100m

  4store BigData BigOwlim TDB Virtuoso
Query 1 117.6 64.2 112.5 75.1 200.7
Query 2 49.0 33.6 159.3 41.0 71.1
Query 3 102.4 12.4 125.0 82.2 201.4
Query 4 43.4 38.4 97.9 62.1 103.9
Query 5 7.8 2.3 3.0 2.0 15.2
Query 6 not executed not executed not executed not executed not executed
Query 7 41.3 31.3 32.6 22.6 24.9
Query 8 49.1 48.5 38.0 24.4 54.0
Query 9 233.0 54.8 141.8 124.6 379.1
Query 10 49.2 61.6 48.5 33.5 113.7
Query 11 145.3 43.8 51.3 30.0 73.6
Query 12 46.5 54.8 65.4 33.3 68.0

200m

  4store BigData BigOwlim TDB Virtuoso
Query 1 145.3 64.4 45.0 62.2 163.0
Query 2 55.7 35.3 111.6 44.1 73.8
Query 3 122.9 14.4 48.1 66.1 195.4
Query 4 62.9 40.0 37.8 47.3 94.8
Query 5 5.0 1.7 1.8 1.2 9.3
Query 6 not executed not executed not executed not executed not executed
Query 7 49.3 21.9 11.0 15.0 15.4
Query 8 57.1 14.1 12.6 15.9 22.5
Query 9 117.3 44.6 67.9 97.1 160.1
Query 10 52.8 28.8 22.4 26.4 69.9
Query 11 33.3 26.7 18.7 23.4 39.5
Query 12 40.4 31.1 29.1 28.3 39.8

 

6.4 Detailed Results For The Explore-and-Update-Query-Mix Benchmark Run

The detailed results for the Explore and Update query mix are presented in two different views:

6.4.1 Queries per Second by Query and Dataset Size (Explore and Update query mix)

Running 500 query mixes against the different stores led to the following query throughput for each type of query over all 500 runs (in Queries per Second). The best performance figure for each dataset size is set bold in the tables.

Upd. Query 1

  4store BigOwlim TDB
100m 20.3 19.4 0.7

Upd. Query 2

  4store BigOwlim TDB
100m 70.4 30.5 2.4

Exp. Query 1

  4store BigOwlim TDB
100m 116.4 222.0 55.5

Exp. Query 2

  4store BigOwlim TDB
100m 66.7 57.8 39.1

Exp. Query 3

  4store BigOwlim TDB
100m 141.7 278.6 84.5

Exp. Query 4

  4store BigOwlim TDB
100m 59.6 196.2 56.7

Exp. Query 5

  4store BigOwlim TDB
100m 8.1 2.6 2.0

Exp. Query 6

Not executed.

Exp. Query 7

  4store BigOwlim TDB
100m 53.0 44.3 68.9

Exp. Query 8

  4store BigOwlim TDB
100m 67.1 51.6 82.3

Exp. Query 9

  4store BigOwlim TDB
100m 351.5 292.6 141.0

Exp. Query 10

  4store BigOwlim TDB
100m 68.8 65.6 125.3

Exp. Query 11

  4store BigOwlim TDB
100m 192.6 64.4 78.5

Exp. Query 12

  4store BigOwlim TDB
100m 62.1 63.6 112.6

6.4.2 Queries per Second by Dataset Size and Query (Explore and Update query mix)

Running 500 query mixes against the different stores led to the following query throughput for each type of query over all 500 runs (in Queries per Second). The best performance figure for each query is set bold in the tables.

100m

  4store BigOwlim TDB
Upd. Query 1 20.3 19.4 0.7
Upd. Query 2 70.4 30.5 2.4
Exp. Query 1 116.4 222.0 55.5
Exp. Query 2 66.7 57.8 39.1
Exp. Query 3 141.7 278.6 84.5
Exp. Query 4 59.6 196.2 56.7
Exp. Query 5 8.1 2.6 2.0
Exp. Query 6 not executed not executed not executed
Exp. Query 7 53.0 44.3 68.9
Exp. Query 8 67.1 51.6 82.3
Exp. Query 9 351.5 292.6 141.0
Exp. Query 10 68.8 65.6 125.3
Exp. Query 11 192.6 64.4 78.5
Exp. Query 12 62.1 63.6 112.6






7. Experiences with the Business Intelligence use case


BigData and 4store currently do not provide all SPARQL features that are required to run the BI query mix. We therefore tried to run the Business Intelligence use case of the Berlin SPARQL Benchmark only against Virtuoso, TDB and BigOwlim. However, we ran into several technical problems that prevented us from finishing the tests and from publishing meaningful results. We therefore decided to give the store vendors more time to fix and optimize their stores and will run the BI query mix experiment again in about four months (July 2011). For the next test runs we will also modify query 4, because its quadratic complexity dominates the benchmark for larger datasets. We will circulate the updated BI query mix specification via the SPARQL developers mailing list in May 2011 and ask the vendors for feedback on the specification.


 


8. Thanks


Thanks a lot to Orri Erling for proposing the Business Intelligence use case and for providing the initial queries for the query mix. Lots of thanks also go to Ivan Mikhailov for his in-depth review of the Business Intelligence query mix and for finding several bugs in the queries. We also want to thank Peter Boncz and Hugh Williams for feedback on the new version of the BSBM benchmark.

We want to thank the store vendors and implementers for helping us set up and configure their stores for the experiment. Lots of thanks to Andy Seaborne, Ivan Mikhailov, Hugh Williams, Zdravko Tashev, Atanas Kiryakov, Barry Bishop, Bryan Thompson, Mike Personick and Steve Harris.

The work on the BSBM Benchmark Version 3 is funded through the LOD2 - Creating Knowledge out of Linked Data project.

 

Please send comments and feedback about the benchmark to Chris Bizer and Andreas Schultz.