Chris Bizer, Freie Universität Berlin
Richard Cyganiak, Freie Universität Berlin
Oliver Maresch, Technische Universität Berlin
Tobias Gauss, Freie Universität Berlin

The WIQA Browser V0.3

Information Quality Assessment enabled Semantic Web Browser

The WIQA browser is a general-purpose RDF browser that supports users in exploring RDF datasets containing information from multiple sources. Information can be filtered using a wide range of user-definable information quality assessment policies. To help users understand the filtering decisions, the browser can create explanations of why displayed information fulfils a selected policy.




Table of Contents

  1. Introduction
  2. Use Case: Exploring Financial Information
    1. Using Context-based Assessment Policies
    2. Using Rating-based Assessment Policies
  3. How does it work?
    1. Provenance Tracking using Named Graphs
    2. The WIQA-PL Information Quality Assessment Policy Language
  4. Download
  5. Credits
  6. References


1. Introduction

Information providers on the Web have different levels of knowledge, different views of the world and different intentions. Thus, the information they provide may be wrong, biased, inconsistent or outdated. Before information from the Web is used to accomplish a specific task, its quality should be assessed according to task-specific criteria [Nau02][Wang96][BiOl04].

In everyday life, we use a wide range of policies to assess the quality of information: we might accept information from a friend on restaurants, but distrust him on computers; regard scientific papers as relevant only if they have been published within the last two years; or believe foreign news only when it is reported by several independent sources. Which policy we choose depends on the task at hand, our subjective preferences and the availability of quality-related meta-information, such as ratings or background information about the information provider.

The WIQA Browser demonstrates how a similar wide range of information quality assessment policies can be used on the Web.

The browser allows users to extract structured information together with provenance meta-information from Web pages. Information from different Web pages is stored in a local repository and can be browsed, sorted and searched together. Collected information can be filtered using a wide range of user-definable information filtering policies. To help users understand the filtering decisions, the browser can create explanations of why displayed information fulfils a selected policy.

The WIQA browser is based on the Piggy Bank [HuynhMazzocchi05] plug-in for the Firefox browser developed by the SIMILE project. The WIQA browser uses Piggy Bank functionality to extract structured information from Web pages and to display and navigate extracted information. The WIQA browser extends Piggy Bank with the ability to filter information using user-definable information quality assessment policies and to explain its filtering decisions.

The WIQA Browser employs the NG4J - Named Graphs API for Jena to store information together with provenance meta-information as a set of Named Graphs. The browser uses the WIQA - Filtering and Explanation Engine to filter information and to generate explanations about filtering decisions.

The following screenshot shows the user interface of the WIQA browser. The two visible extensions to Piggy Bank are marked in red.

Policy Selection

The policy selection box allows users to select a filtering policy from the policy suite that is currently loaded. After a policy has been selected, all information in the local repository is filtered using this policy. Policies are specified using WIQA-PL, the Information Quality Assessment Policy Language described in Section 3.2.

Oh, yeah? Buttons

There is an "Oh, yeah?" button [Berners-Lee97] next to each piece of information. Pressing this button opens a new browser window containing an explanation of why the piece of information fulfils the selected policy.

The following screenshot shows an explanation of why an analyst report matches the policy "Use only information that has been asserted by German analysts".

The links in the explanation allow further exploration of background information:

There is a "WIQA Browser" menu item in the Firefox "Tools" menu ("Extras" in German versions of Firefox) which allows users to:

 


2. Use Case: Exploring Financial Information

This section demonstrates how the WIQA Browser can be used to explore Semantic Web information using different quality-based information filtering policies.

As a running example, we use an investor who subscribes to a financial information service. This service collects business news, postings from business-related newsgroups, stock quotes and information provider ratings from various information sources and sends them to the investor as a set of Named Graphs, which contain the information together with provenance information about the original sources.

Example dataset: FinancialScenarioDataIncludingSchema.trig

Financial Data RDFS Schema used in the example dataset: FinancialScenarioSchema.n3
The example uses the Semantic Web Publishing Vocabulary.

Policy Suite for the example: FinancialScenarioPolicies.wiqa

2.1. Using Context-based Assessment Policies

Context-based information quality assessment policies use meta-information about the circumstances in which information has been claimed, e.g. who said what, when and why, as quality indicators.

Let's assume that our investor has found an article saying that Intel has run into serious problems. In order to decide whether he should trust the article, he investigates the article's background. He finds out that the article was written by Peter Smith and queries for further information about the author.

Our investor is puzzled to find that Peter Smith is described as both American and German. He therefore asks his browser to explain the provenance of both statements.

The investor finds out that the statement that Peter Smith is German was asserted by the Information Aggregation Service, whereas the statement that he is American was made by Peter Smith himself. As our investor thinks that people know their own nationality best, he decides to believe that Peter is American.

The investor then decides to check whether there is news about Intel from other information sources. As he has a subjective preference for analysts from the UK, he changes the information quality assessment policy of his browser to "Information from UK-based analysts" and gets the following explanation for a news article:

After looking at the news, the investor checks the postings of two financial discussion forums for related information. As he is especially interested in the opinion of Intel employees about the problem, he decides to filter the postings using the policy "Use only information from Intel employees."
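Context-based policies such as "Information from UK-based analysts" or "Use only information from Intel employees" judge a graph by facts about whoever asserted it. The following Python sketch illustrates the idea on a hypothetical in-memory model; the NamedGraph class, the context_filter function, the fin:country property and all ex: names are assumptions made for illustration and are not part of the WIQA code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedGraph:
    """Hypothetical in-memory view of one saved graph plus its provenance."""
    name: str
    triples: tuple   # (subject, predicate, object) triples as strings
    authority: str   # who asserted the graph
    saved_from: str  # page the information was extracted from
    date: str        # ISO 8601 timestamp of the visit


def context_filter(graphs, background, required):
    """Keep graphs whose asserting authority satisfies every
    (predicate, object) pair in `required`, according to `background`."""
    facts = {}
    for s, p, o in background:
        facts.setdefault(s, set()).add((p, o))
    # A graph passes if the required facts are a subset of the authority's facts.
    return [g for g in graphs if required <= facts.get(g.authority, set())]


if __name__ == "__main__":
    # Illustrative background information about two (hypothetical) authorities.
    background = [
        ("ex:AnnaLee", "rdf:type", "fin:Analyst"),
        ("ex:AnnaLee", "fin:country", "UK"),
        ("ex:BobRay", "rdf:type", "fin:Analyst"),
        ("ex:BobRay", "fin:country", "US"),
    ]
    graphs = [
        NamedGraph("urn:g1", (("ex:Intel", "fin:outlook", "negative"),),
                   "ex:AnnaLee", "http://example.org/a", "2005-06-14T17:18:10+02:00"),
        NamedGraph("urn:g2", (("ex:Intel", "fin:outlook", "positive"),),
                   "ex:BobRay", "http://example.org/b", "2005-06-14T17:20:00+02:00"),
    ]
    uk_analysts = {("rdf:type", "fin:Analyst"), ("fin:country", "UK")}
    print([g.name for g in context_filter(graphs, background, uk_analysts)])
    # prints ['urn:g1']
```

Swapping the required pairs, for example to an affiliation with Intel, gives the employee-based policy; the filtering logic stays the same.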

 

2.2. Using Rating-based Assessment Policies

Rating-based information quality assessment policies rely on ratings provided by domain experts or other information consumers to assess information quality.
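As a rough illustration of a simple rating-based metric like the eBay-style one used below, the following Python sketch scores an information provider by its number of positive ratings minus its number of negative ratings, in the way eBay's feedback score is computed. Ratings are modelled as (rater, predicate, rated) triples. The predicate fin:positiveRating appears in the example policy in Section 3.2; fin:negativeRating, the threshold default and the function names are assumptions made for this sketch.

```python
from collections import Counter


def ebay_score(ratings, provider):
    """eBay-style score: positive ratings minus negative ratings for `provider`.

    `ratings` is an iterable of (rater, predicate, rated) triples.
    fin:negativeRating is a hypothetical predicate assumed for this sketch.
    """
    counts = Counter(pred for _rater, pred, rated in ratings if rated == provider)
    # Counter returns 0 for predicates that never occur.
    return counts["fin:positiveRating"] - counts["fin:negativeRating"]


def accept_provider(ratings, provider, threshold=3):
    """Accept information from `provider` if its score reaches `threshold`
    (the default of 3 is an arbitrary, illustrative choice)."""
    return ebay_score(ratings, provider) >= threshold
```

With two positive ratings and one negative rating, ebay_score returns 1, so accept_provider rejects the provider under the default threshold. A score computed from only a handful of ratings says little, which is the weakness the investor runs into below.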

Our investor might also decide to use a simple eBay-style assessment metric to filter the articles about Intel. Using this policy, he gets the following explanation for a news article about the company:

Our investor isn't convinced because of the small number of ratings available for this information provider using the eBay rating scheme. He thus changes his policy to the more sophisticated TidalTrust metric [Gol05], which uses another (hopefully bigger) set of ratings. Using this metric, he gets the following explanation for another article:
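TidalTrust [Gol05] infers how much one person trusts another from a network of direct ratings. The Python sketch below follows the outline of the algorithm as described in [Gol05]: only shortest paths from source to sink are used, and each node only considers neighbours it rates at or above a threshold max, the highest path strength, where a path's strength is the minimum rating along it excluding the final rating of the sink. It is a compact illustration for a small in-memory graph, not the implementation used by the WIQA browser; the dictionary input format and the function name are assumptions.

```python
def tidal_trust(ratings, source, sink):
    """Infer source's trust in sink; `ratings` maps rater -> {rated: value}.

    Returns None when the sink is unreachable from the source.
    """
    if source == sink:
        raise ValueError("source and sink must differ")

    # 1. Breadth-first search, level by level, until the sink is reached.
    depth = {source: 0}
    levels = [[source]]
    while sink not in depth and levels[-1]:
        next_level = []
        for node in levels[-1]:
            for neighbour in ratings.get(node, {}):
                if neighbour not in depth:
                    depth[neighbour] = len(levels)
                    next_level.append(neighbour)
        levels.append(next_level)
    if sink not in depth:
        return None
    d = depth[sink]

    # 2. Path strength = minimum rating along a shortest path, excluding the
    #    final rating of the sink. Propagate the best strength level by level.
    strength = {source: float("inf")}
    for k in range(1, d):
        for node in levels[k]:
            candidates = [min(strength[p], ratings[p][node])
                          for p in levels[k - 1]
                          if node in ratings.get(p, {})]
            strength[node] = max(candidates)

    # Raters of the sink on shortest paths; the strongest path defines max.
    last = [n for n in levels[d - 1] if sink in ratings.get(n, {})]
    max_threshold = max(strength[n] for n in last)

    # 3. Walk back towards the source: each node averages its neighbours'
    #    inferred trust, weighted by its own ratings of them, using only
    #    neighbours it rates at or above max_threshold.
    trust = {n: float(ratings[n][sink]) for n in last}
    for k in range(d - 2, -1, -1):
        for node in levels[k]:
            num = den = 0.0
            for child in levels[k + 1]:
                r = ratings.get(node, {}).get(child)
                if r is not None and r >= max_threshold and child in trust:
                    num += r * trust[child]
                    den += r
            if den > 0:
                trust[node] = num / den
    return trust.get(source)
```

For example, tidal_trust({"A": {"B": 9, "C": 4}, "B": {"D": 8}, "C": {"D": 2}}, "A", "D") returns 8.0: the strongest path runs through B, so A ignores its weaker rating of C.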


3. How does it work?

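The WIQA browser is based on two main building blocks: Named Graphs, which are used to track the provenance of information (Section 3.1), and the WIQA-PL policy language, which is used to express information filtering policies (Section 3.2).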

 

3.1 Provenance Tracking using Named Graphs

The WIQA browser uses Named Graphs as its internal data model and the Semantic Web Publishing Vocabulary (SWP) [CaBiHaSt05] to capture provenance meta-information. SWP also provides terms to indicate whether a graph is asserted or quoted and to attach digital signatures to it.

Whenever the WIQA browser saves information from a Web page into the local repository, it creates a new named graph for this visit of the page and stores the current timestamp, the URL of the page and the authority (website URL) together with the actual information. The following TriG document contains two graphs: one created by saving information about a person from one Web page, and one created by saving the descriptions of two papers from another Web page.

 
@prefix swp: <http://www.w3.org/2004/03/trix/swp-2/> .
@prefix ns0: <http://www.ontoweb.org/ontology/1#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<urn:uuid:8c845860-dce7-11d9-b9c0-00112ff60c7f> {
    [] a foaf:Person ;
       foaf:depiction <http://www.wiwiss.fu-berlin.de/suhl/ueber_uns/team/Fotos/Radek.jpg> ;
       foaf:family_name "Oldakowski" ;
       foaf:givenname "Radoslaw" ;
       foaf:phone <tel:+49-30-838-52760> ;
       foaf:workplaceHomepage
           <http://www.wiwiss.fu-berlin.de/suhl/ueber_uns/team/radoslaw_oldakowski.htm> .

    <urn:uuid:8c845860-dce7-11d9-b9c0-00112ff60c7f>
       dc:date "2005-06-14T17:18:10+02:00" ;
       swp:assertedBy <urn:uuid:8c845860-dce7-11d9-b9c0-00112ff60c7f> ;
       swp:authority <http://www.wiwiss.fu-berlin.de> ;
       swp:savedFrom <http://www.wiwiss.fu-berlin.de/suhl/ueber_uns/hauptseite_ueber_uns.htm> .
}

<urn:uuid:a2baaf80-dce7-11d9-b9c0-00112ff60c7f> {
    <http://www.wiwiss.fu-berlin.de/suhl/bizer/pub/Bizer-D2RQ-ISWC2004.pdf> a ns0:Misc ;
       ns0:author "Christian Bizer, Andy Seaborne" ;
       ns0:title "D2RQ - Treating Non-RDF Databases as Virtual RDF Graphs" ;
       ns0:url "http://www.wiwiss.fu-berlin.de/suhl/bizer/pub/Bizer-D2RQ-ISWC2004.pdf" ;
       ns0:year "2004" .

    <http://www.wiwiss.fu-berlin.de/suhl/bizer/pub/Bizer-NG4J-ESWC2005.pdf> a ns0:Misc ;
       ns0:author "Christian Bizer, Richard Cyganiak, Rowland Watkins" ;
       ns0:title "NG4J – Named Graphs API for Jena" ;
       ns0:url "http://www.wiwiss.fu-berlin.de/suhl/bizer/pub/Bizer-NG4J-ESWC2005.pdf" ;
       ns0:year "2005" .

    <urn:uuid:a2baaf80-dce7-11d9-b9c0-00112ff60c7f>
       dc:date "2005-06-14T17:18:57+02:00" ;
       swp:assertedBy <urn:uuid:a2baaf80-dce7-11d9-b9c0-00112ff60c7f> ;
       swp:authority <http://www.wiwiss.fu-berlin.de> ;
       swp:savedFrom <http://www.wiwiss.fu-berlin.de/suhl/forschung/publikationen.htm> .
}

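The bookkeeping described in this section can be sketched in a few lines of Python. The helper names save_page and to_trig are hypothetical, the sketch writes plain strings instead of using NG4J, and it derives the authority by keeping only the scheme and host of the page URL, which matches the example above but is an assumption about how the browser computes it. Like the graph names in the example, the generated names are version 1 (time-based) UUIDs.

```python
import uuid
from datetime import datetime, timezone
from urllib.parse import urlsplit

PREFIXES = (
    "@prefix swp: <http://www.w3.org/2004/03/trix/swp-2/> .\n"
    "@prefix dc: <http://purl.org/dc/elements/1.1/> .\n"
)


def save_page(triples, page_url):
    """Put the triples extracted during one page visit into a new named graph
    and add provenance statements about that graph (hypothetical helper)."""
    graph = f"urn:uuid:{uuid.uuid1()}"  # time-based UUID, as in the example
    parts = urlsplit(page_url)
    authority = f"{parts.scheme}://{parts.netloc}"  # assumed: scheme + host
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    g = f"<{graph}>"
    provenance = [
        (g, "dc:date", f'"{now}"'),
        (g, "swp:assertedBy", g),  # the graph points to itself, as in the example
        (g, "swp:authority", f"<{authority}>"),
        (g, "swp:savedFrom", f"<{page_url}>"),
    ]
    return graph, list(triples) + provenance


def to_trig(graph, triples):
    """Serialize one named graph as a TriG document; terms must already be
    written in Turtle syntax."""
    body = "\n".join(f"    {s} {p} {o} ." for s, p, o in triples)
    return f"{PREFIXES}\n<{graph}> {{\n{body}\n}}\n"


if __name__ == "__main__":
    name, stored = save_page(
        [("<http://example.org/paper>", "<http://purl.org/dc/elements/1.1/title>",
          '"D2RQ - Treating Non-RDF Databases as Virtual RDF Graphs"')],
        "http://www.wiwiss.fu-berlin.de/suhl/forschung/publikationen.htm")
    print(to_trig(name, stored))
```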
 

3.2 The WIQA-PL Information Quality Assessment Policy Language

Information filtering policies are expressed using the WIQA-PL policy language. WIQA-PL policies can combine different content-, context- and rating-based assessment metrics. The basic idea of the language is to represent policies as a set of graph patterns which are matched against the graph set to be filtered. As information quality assessment often requires domain-specific assessment metrics, WIQA-PL provides an extension mechanism that enables domain-specific assessment metrics to be included in policies. WIQA-PL policies may contain explanation templates, which the WIQA framework uses to generate natural language as well as RDF explanations of filtering decisions.

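To make the idea of matching graph patterns against a graph set concrete, here is a minimal Python sketch of a pattern matcher over quads (graph, subject, predicate, object). Terms starting with ? are variables, ANY in the graph position matches every graph, and a graph counts as accepted when all patterns match with ?GRAPH bound to its name. This is a simplification written for illustration: it omits FILTER expressions, explanation templates and WIQA-PL's extension functions, and the names match, evaluate and accepted_graphs are hypothetical.

```python
def is_var(term):
    return term.startswith("?")


def match(pattern, quad, binding):
    """Extend `binding` so that `pattern` equals `quad`, or return None."""
    result = dict(binding)
    for position, (term, value) in enumerate(zip(pattern, quad)):
        if position == 0 and term == "ANY":
            continue  # GRAPH ANY: any graph name is acceptable
        if is_var(term):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None
    return result


def evaluate(patterns, quads, binding=None):
    """Return every binding that satisfies all patterns (a nested-loop join)."""
    bindings = [binding or {}]
    for pattern in patterns:
        extended = []
        for old in bindings:
            for quad in quads:
                new = match(pattern, quad, old)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return bindings


def accepted_graphs(patterns, quads):
    """Names of graphs for which the patterns match with ?GRAPH bound to them."""
    names = sorted({q[0] for q in quads})
    return [g for g in names if evaluate(patterns, quads, {"?GRAPH": g})]
```

With patterns such as ("fd:GraphFromAggregator", "?GRAPH", "swp:assertedBy", "?warrant") followed by ("ANY", "?authority", "rdf:type", "fin:Analyst"), each pattern narrows the set of bindings produced by the previous ones, which is how a policy combines several conditions.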
The WIQA-PL Language Specification describes the WIQA-PL language constructs and explains how the language is used to formulate information filtering policies.

The EBNF grammar of the WIQA-PL policy language is available here. The grammar is based on the grammar of the SPARQL query language, so that people who already know SPARQL can learn WIQA-PL more easily.

The following example shows the WIQA-PL policy "Accept only information that has been asserted by analysts who have received at least 3 positive ratings."

 
NAME "Asserted by analysts with at least 3 positive ratings."
DESCRIPTION "Accept only information that has been asserted by analysts who have received at least 3 positive ratings."
PATTERNS {

    GRAPH fd:GraphFromAggregator
    { ?GRAPH swp:assertedBy ?warrant .
      ?warrant swp:authority ?authority .
      EXPL "it was asserted by " ?authority " and " . }

    GRAPH ?graph2
    { ?authority rdf:type fin:Analyst . }

    GRAPH fd:GraphFromAggregator
    { ?graph2 swp:assertedBy ?warrant2 .
      ?warrant2 swp:authority ?authority2 .
      EXPL ?authority2 " claims that " ?authority " is an analyst." . }

    GRAPH ANY
    { ?rater fin:positiveRating ?authority .
      FILTER (wiqa:count(?rater) > 2) .
      EXPL ?authority " has received positive ratings from " . }

    GRAPH fd:BackgroundInformation
    { ?rater fin:affiliation ?company .
      EXPL ?rater " who works for " ?company . }
}

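The effect of this policy, including its explanation templates, can be traced with the hand-written Python sketch below, which evaluates the policy directly over quads (graph, subject, predicate, object) instead of using a generic pattern matcher. It is an illustration under stated assumptions, not the WIQA engine: the FILTER is read as "at least three distinct raters", the statement that the authority is an analyst must itself be asserted by someone, a rater's affiliation is treated as optional, and the function names and the ex: data are hypothetical.

```python
AGG = "fd:GraphFromAggregator"
BACKGROUND = "fd:BackgroundInformation"


def objects(quads, graph, subject, predicate):
    """Objects of matching quads; graph=None stands for GRAPH ANY."""
    return [o for g, s, p, o in quads
            if (graph is None or g == graph) and s == subject and p == predicate]


def check_policy(graph, quads, min_raters=3):
    """Return explanation lines if `graph` passes the example policy, else None."""
    for warrant in objects(quads, AGG, graph, "swp:assertedBy"):
        for authority in objects(quads, AGG, warrant, "swp:authority"):
            # Named graphs (any of them) stating that the authority is an analyst.
            claim_graphs = [g for g, s, p, o in quads
                            if s == authority and p == "rdf:type" and o == "fin:Analyst"]
            # Assumed required: the analyst claim must be asserted by someone.
            claims = [f"{a2} claims that {authority} is an analyst."
                      for g2 in claim_graphs
                      for w2 in objects(quads, AGG, g2, "swp:assertedBy")
                      for a2 in objects(quads, AGG, w2, "swp:authority")]
            # FILTER (wiqa:count(?rater) > 2), read as at least 3 distinct raters.
            raters = sorted({s for g, s, p, o in quads
                             if p == "fin:positiveRating" and o == authority})
            if not claims or len(raters) < min_raters:
                continue
            lines = [f"it was asserted by {authority} and", *claims,
                     f"{authority} has received positive ratings from"]
            for rater in raters:
                # Assumed optional: raters without background data are listed bare.
                companies = objects(quads, BACKGROUND, rater, "fin:affiliation")
                lines.append(f"{rater} who works for {', '.join(companies)}"
                             if companies else rater)
            return lines
    return None
```

Each returned line corresponds to one EXPL template of the policy, filled with the values bound during evaluation; this is the kind of text an "Oh, yeah?" explanation is built from.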
 

The WIQA Financial Example Policy Suite contains several policies that an investor could use to filter financial information.

 


4. Download

In order to run the WIQA browser, you have to install the WIQA-Browser.xpi extension into your Firefox browser. The extension works with Firefox versions up to 1.6, but not with Firefox 2.

To test-drive the WIQA browser, you can load the example data from Section 2.

Version history:

V0.3 (released 2006-08-20): Replaced the old TriQL.P engine with the new WIQA framework. Moved to the current version of Piggy Bank.
V0.2 (released 2005-08-17): Replaced MySQL with the HSQLDB database in order to make the installation easier. Moved to the Piggy Bank 2.0.4 code.
V0.1 (released 2005-07-15): Initial release.

The source code of the WIQA Browser can be downloaded here: WIQABrowserSourceCode.zip

The WIQA browser is licensed under the terms of the GNU General Public License (GNU-GPL).

DOAP description of the project: doap-WIQA-browser.rdf


5. Credits

The following people have contributed to the development of the WIQA browser:

We are very interested in your opinion about the browser and would be grateful for additional ideas. Please send comments to:

Chris Bizer
chris@bizer.de
http://www.bizer.de/


6. References

[Named-Graphs]
Named Graphs, http://www.w3.org/2004/03/trix/
[TriG]
The TriG Syntax, Christian Bizer, 2004. http://www.wiwiss.fu-berlin.de/suhl/bizer/TriG/
[Nau02]
Felix Naumann: Quality-Driven Query Answering for Integrated Information Systems, Springer, 2002. http://www.springerlink.com/content/bfevlgwdjcl2/
[Wang96]
Richard Wang and Diane Strong. Beyond Accuracy: What Data Quality Means to Data Consumers. Journal of Management Information Systems, 12(4):5–33, 1996. http://citeseer.ist.psu.edu/context/381786/0
[Gol05]
Jennifer Golbeck. Computing and Applying Trust in Web-based Social Networks. PhD thesis, University of Maryland, 2005. http://trust.mindswap.org/papers/GolbeckDissertation.pdf
[RDF-SYNTAX]
RDF/XML Syntax Specification (Revised), Beckett D. (Editor), W3C Recommendation, 10 February 2004. This version is http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/. The latest version is http://www.w3.org/TR/rdf-syntax-grammar/.
[BuSaKh01]
Peter Buneman, Sanjeev Khanna, Wang-Chiew Tan: Why and Where: A Characterization of Data Provenance. http://db.cis.upenn.edu/DL/whywhere.pdf
[GoHePa03]
J. Golbeck, J. Hendler, B. Parsia: Trust Networks on the Semantic Web. http://mindswap.org/papers/Trust.pdf
[RiAgDo03]
M. Richardson, R. Agrawal, P. Domingos: Trust Management for the Semantic Web. http://www.cs.washington.edu/homes/mattr/doc/iswc2003/iswc2003.pdf
[Berners-Lee97]
Tim Berners-Lee: Cleaning up the User Interface, Section - The "Oh, yeah?"-Button. http://www.w3.org/DesignIssues/UI.html
[BiOl04]
Christian Bizer, Radoslaw Oldakowski: Using Context- and Content-Based Trust Policies on the Semantic Web. WWW2004, New York, May 2004. http://www.wiwiss.fu-berlin.de/suhl/bizer/SWTSGuide/p747-bizer.pdf
[CaBiHaSt05]
J. Carroll, C. Bizer, P. Hayes, P. Strickler: Named Graphs, Provenance and Trust. http://www.wiwiss.fu-berlin.de/suhl/bizer/pub/Carroll_etall-WWW2005.pdf
[Marchiori04]
Massimo Marchiori: W5: The Five W's of the World Wide Web. http://www.w3.org/People/Massimo/papers/2004/w5_04.pdf
[CuWi99]
Yingwei Cui, Jennifer Widom: Practical Lineage Tracing in Data Warehouses. http://citeseer.nj.nec.com/cache/papers/cs/1704/http:zSzzSzwww-db.stanford.eduzSzpubzSzpaperszSztrace.pdf/cui99practical.pdf
[McGuinnessDaSilva03]
D. L. McGuinness, P. Pinheiro da Silva: Infrastructure for Web Explanations. ISWC 2003. http://www.cs.toronto.edu/semanticweb/resource/reference/iswc03bestpapers/iswc03-infrastructure-web-explanations.pdf
[HuynhMazzocchi05]
David Huynh, Stefano Mazzocchi, and David Karger: Piggy Bank: Experience the Semantic Web Inside Your Web Browser. Submitted to the International Semantic Web Conference 2005. http://simile.mit.edu/papers/iswc05.pdf
[Wang95]
Richard Y. Wang, M. P. Reddy, and Henry B. Kon: Toward Quality Data: An Attribute-based Approach. Decision Support Systems 13, 1995, pp. 349-372. http://web.mit.edu/tdqm/www/tdqmpub/Toward%20Quality%20Data.pdf

More references to resources about the trust and security issues arising from the Semantic Web can be found in the Semantic Web Trust and Security Resource Guide.


$Date: 2006/12/10 12:33:00 $
$Id: index.html,v 1.3 2006/12/10 12:33:00 bizer Exp $