Is Catalonia an Independent Country? – Tracking Implicit Biases in Crowdsourced Knowledge Graphs
Author(s): Gianluca Demartini
Full text: submitted version
Abstract: Collaborative creation of knowledge is an approach that has been successfully demonstrated by crowdsourcing projects such as Wikipedia. Similar techniques have recently been adopted for the creation of collaboratively generated Knowledge Graphs such as, for example, Wikidata. While such an approach enables the creation of high-quality structured content, it also comes with the challenge of introducing contributors' implicit biases into the generated Knowledge Graph.
In this paper, we investigate how paid crowdsourcing can be used to understand contributor biases for controversial facts to be included in collaborative Knowledge Graphs. We propose methods to trace the provenance of crowdsourced fact checking, thus enabling bias transparency rather than aiming at eliminating bias from the Knowledge Graph.
Keywords: bias; crowdsourcing; fact checking
Review 1 (by anonymous reviewer)
(RELEVANCE TO ESWC) The paper is relevant to ESWC, as more and more people in the community use crowdsourcing as a means to gather information in larger quantities. Therefore, it is important that possible bias is kept in mind. (NOVELTY OF THE PROPOSED SOLUTION) The proposed model is a straightforward but novel solution, which (to the best of my knowledge) has not been published by someone else earlier. (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The proposed model, to keep track of biased data, was not evaluated. (EVALUATION OF THE STATE-OF-THE-ART) Other approaches from the area of bias were introduced in the related-work section but not evaluated in more detail, which is acceptable for this form of paper, as the author clearly sets the focus on the experiments done with Amazon MTurk. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) See previous section. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The studies presented in Section 3 should be reproducible, because the author clearly describes the tasks as well as metadata such as payment etc. (OVERALL SCORE) The author of this paper ran different experiments in order to show that crowdsourced knowledge is strongly biased by the group of people adding the data. As a use case, the author analyses the results of three different tasks, which were run on Amazon MTurk. The main contributions of this paper are an analysis of the work of 600 crowd workers as well as a model to track bias in crowdsourced knowledge graphs. Strong points of this paper are: 1) Clear structure of the paper, including a clear statement of the main contributions. 2) A focused and well-defined related-work section on bias, crowdsourcing, and provenance tracking in RDF. 3) A well-described experiment section (Section 3). Weaker points of this paper are: 1) Although the author does provide a conclusion section, it is not clear what to do with the results presented in this paper. In particular, how can one avoid having a biased Knowledge Graph, and how can we support crowd workers in focusing on facts and reducing bias? 2) The introduction of the model is rather short. I would have liked to have seen more information about how the proposed scores are calculated and what kind of metadata should be modelled when extending a knowledge graph with this kind of information. 3) While there were evaluations in Section 3, I would have expected an evaluation of the model in terms of how end users of the crowdsourced knowledge graph work with the information that they are possibly looking at biased data.
Review 2 (by Melanie Courtot)
(RELEVANCE TO ESWC) This paper is relevant to the Social Web & Web Science track, as it pertains to collaborative creation of knowledge, crowdsourcing, and more specifically to investigating bias on the web. (NOVELTY OF THE PROPOSED SOLUTION) This paper doesn't provide a solution to the problem. Section 4 proposes a potential approach to be considered towards a solution. (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) There is no description or trial of the proposed approach. There is a statement to the effect that this may generate quite a lot of churn, but no actual implementation has been tested. (EVALUATION OF THE STATE-OF-THE-ART) There is only a brief mention that reification in RDF could be used, citing the paper by Nguyen et al., which itself proposes to use a Singleton Property instead of reification, so this reference seems odd. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) There is no demonstration of the proposed solution; this paper rather exemplifies the current issues. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The experimental study is well described and should be reproducible, though arguably one would expect slight differences reflecting the actual workers involved in the study. (OVERALL SCORE) This is an interesting paper which nicely showcases the issue of bias. It is well written and exposes issues to be considered when using crowdsourcing, such as gender, age, or task-duration bias. I'm not sure the ranking of the results is entirely pertinent, as presumably different people would get results ranked differently based on their location, for example? Maybe it is nevertheless acceptable as a proxy measure, though it would be useful to clarify. However, the paper doesn't provide explanations as to why those specific biases should be considered; are there others that need to be taken into account? For example, the author uses familiarity with the Ellen show to justify US workers being more familiar with the Pope, but couldn't it be linked to religion instead? More importantly, there is no proposed solution implemented with respect to tracking and evaluating bias, and as such this paper falls, in my opinion, short of the acceptable level.
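The modelling distinction this review raises (standard RDF reification vs. the Singleton Property of Nguyen et al.) can be sketched concretely. The following is a minimal illustration with triples as plain (s, p, o) tuples; all identifiers and the 58% metadata value are hypothetical shorthand, not taken from the paper under review.

```python
# Sketch contrasting standard RDF reification with the singleton-property
# modelling of Nguyen et al. Triples are modelled as plain (s, p, o)
# tuples; all IRIs here are invented shorthand for illustration only.

FACT = ("ex:Catalonia", "ex:instanceOf", "ex:SovereignState")

# Standard reification: four triples describe the statement, and the
# provenance/bias metadata hangs off the statement node. Note that the
# fact itself is never asserted directly.
reified = [
    ("ex:stmt1", "rdf:type", "rdf:Statement"),
    ("ex:stmt1", "rdf:subject", FACT[0]),
    ("ex:stmt1", "rdf:predicate", FACT[1]),
    ("ex:stmt1", "rdf:object", FACT[2]),
    ("ex:stmt1", "ex:supportPercentage", "58"),  # hypothetical metadata
]

# Singleton property: mint a unique predicate instance, assert the fact
# with it, and attach the same metadata directly to that predicate.
singleton = [
    ("ex:instanceOf#1", "rdf:singletonPropertyOf", FACT[1]),
    (FACT[0], "ex:instanceOf#1", FACT[2]),
    ("ex:instanceOf#1", "ex:supportPercentage", "58"),
]

print(len(reified), len(singleton))  # reification needs more triples per fact
```

The sketch also shows why the citation matters: the two mechanisms differ in triple count and in whether the fact is asserted directly, so citing Nguyen et al. as support for reification is indeed odd.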
Review 3 (by Phillip Lord)
(RELEVANCE TO ESWC) It's about semantics and Europe, so it's clearly relevant. (NOVELTY OF THE PROPOSED SOLUTION) Storing provenance is not novel. (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The solution is very partial. It's not clear what to do with the additional data the author proposes storing. (EVALUATION OF THE STATE-OF-THE-ART) The background material is well covered. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Again, the author does not say what to do with the data he proposes to store. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) See below. (OVERALL SCORE) This paper discusses a technique for addressing implicit bias in crowd-sourced semantic data. It's an interesting idea, with a clearly written paper and a snappy title. The actual meat of the paper is, however, very thin, and this significantly limits its applicability. Specifically, the main idea is just to represent a set of metrics about the crowd-sourced individuals that were used to collect the data; this would then leave the consumers of the data to "deal with it as they deem appropriate". As well as passing the buck to someone else, this also means that the consumer would need to have extensive knowledge of the forms of bias that the provenance information is likely to cause. It's hard to see how they would get this without carrying out extensive crowd-sourcing studies, which leaves a chicken-and-egg situation. I am also not convinced by the experimental data presented. The author has simply carried out a number of different chi-squared tests. There is no analysis of whether the different categories are themselves linked (for example, age and gender), nor is a multiple-test correction applied. It is hard to say what the implications of these would be, since the author does not report the significance values, just that they are < 0.01. Minor issues: "The Capital of Israel less controversial than Catalonia?" He uses the closeness to an even distribution of pros and cons as a measure of controversy. This makes no sense; by this logic, the culinary appeal of Marmite is more controversial than the capital of Israel. "Chi-squared test shows a not significant effect" should read "there is no significant difference".
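The reviewer's point about running several chi-squared tests without a multiple-comparison correction can be made concrete. The sketch below computes the Pearson statistic and p-value for 2x2 tables (1 degree of freedom) from scratch and applies a Bonferroni-adjusted threshold; the counts and attribute names are invented for illustration, not data from the paper.

```python
# Sketch: several chi-squared tests over worker attributes, with a
# Bonferroni correction applied. All counts below are made up.
from math import erfc, sqrt

def chi2_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table (1 df)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For chi2 with 1 df, the survival function is erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))

# Hypothetical yes/no answer counts split by three worker attributes.
tables = {
    "gender": [[40, 60], [55, 45]],
    "age":    [[48, 52], [50, 50]],
    "nation": [[70, 30], [35, 65]],
}

alpha = 0.01
bonferroni = alpha / len(tables)  # corrected per-test threshold
for name, t in tables.items():
    stat, p = chi2_2x2(t)
    print(f"{name}: chi2={stat:.2f} p={p:.4f} "
          f"significant after correction: {p < bonferroni}")
```

With three tests, a raw p < 0.01 is not enough; each test must clear 0.01/3, which is exactly the kind of detail the reviewer says the paper omits by reporting only "< 0.01".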
Review 4 (by anonymous reviewer)
(RELEVANCE TO ESWC) The paper addresses the problem of tracking implicit biases in crowdsourced KGs. The problem is very interesting and relevant to the Semantic Web community. (NOVELTY OF THE PROPOSED SOLUTION) The paper proposes a model to keep track of bias information in crowdsourced KGs. It suggests keeping information about the crowd workers (nationality, age, gender) as well as about the validation process (supporting reference (URL), search engine used to retrieve the reference, search query used to retrieve the reference, position of the reference in the ranked list of results). (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The paper proposes the enrichment of factual statements about controversial facts in KGs with provenance information, with the aim of enabling bias transparency in KGs. The results of a crowdsourcing study are presented in which the crowd workers are asked to provide an answer to a controversial question as well as evidence supporting their claims. However, in my opinion, the experimental setup is quite strange and not correct. For questions like "Is Catalonia an independent country?" and "What is the capital of Israel?", there are unambiguous correct answers (like the information provided in Wikidata and Wikipedia). Adding to a KG the information that "Catalonia is a sovereign state (according to 58% of contributors)" (Figure 10), or showing to end users the information that Catalonia is a "Country in Europe (according to 58%)" (Figure 11), is just disinformation, since this is a wrong statement! The same applies to the question about the capital city of Israel. In addition, storing the information that "independent.co.uk" is the main source of evidence for the fact "Catalonia - instanceOf - Sovereign state" is misleading and should be avoided (one may think that this online newspaper believes/supports this!). In any case, the full URL of the supporting reference should be provided.
For such controversial topics, one could instead ask opinion/viewpoint-like questions, for example: "Should Catalonia be an independent country?" or "What should be the capital of Israel?" (EVALUATION OF THE STATE-OF-THE-ART) I was expecting to see works on "Bias in KBs/KGs" (origins, causes; to show that the problem exists) and "Bias Representation" (are there any vocabularies/ontologies for bias representation?). The paper discusses works that deal with quality issues in crowdsourcing as well as works that exploit crowdsourcing for Semantic Web-related problems; however, these are not much related to the problem addressed in this paper. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Several aspects are not clear: - No discussion is provided on how the *search engine* and the *search query* (used to retrieve the supporting reference) can be exploited for tracking implicit biases. Moreover, there is no explanation of why such information is useful to keep and how it can be exploited, while Section 3 does not show any results about these aspects. Likewise, it is not clear why the rank position is important and how one can exploit such information. - Regarding the 3rd crowdsourcing task (verification of fake news), it is not clear what statement/fact someone could add to a KG. Again, there is an unambiguous correct answer here (the video is fake). However, according to the paper's approach, we could store the information that this video is NOT fake according to 53.5%! - It would be interesting to provide information about conflicting answers, i.e., cases where different users provide the same URL to support different answers. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The experimental study is a crowdsourcing experiment with no clear setup (see "Correctness and Completeness of the Proposed Solution"). A new study with the same setup may provide quite different results.
(OVERALL SCORE) Summary of the paper: - The paper proposes the enrichment of factual statements about controversial facts in KGs with provenance information, with the aim of enabling bias transparency in KGs. The results of a crowdsourcing study are presented in which the crowd workers are asked to provide an answer to a controversial question as well as evidence supporting their claims. Strong points: - Bias in KGs is an interesting topic - The paper is easy to read Weak points: - The experimental setup does not seem correct - Several aspects are not clearly presented - No related works on "Bias in KBs/KGs" and "Bias representation" (Please see my detailed comments under the evaluation criteria). I would suggest that the author change the direction/focus of this work towards "tracking the provenance for opinion questions", where, given a controversial topic, the objective is to track implicit biases in topic-related opinion questions.
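The conflicting-answers analysis this review asks for is straightforward to operationalise: group worker responses by supporting URL and flag URLs cited in support of more than one answer. The sketch below uses invented placeholder records, not data from the study.

```python
# Sketch of the analysis Review 4 suggests: find cases where different
# workers cite the same URL to support conflicting answers. The response
# records below are hypothetical placeholders.
from collections import defaultdict

responses = [
    {"worker": "w1", "answer": "yes", "url": "https://example.org/a"},
    {"worker": "w2", "answer": "no",  "url": "https://example.org/a"},
    {"worker": "w3", "answer": "no",  "url": "https://example.org/b"},
]

answers_by_url = defaultdict(set)
for r in responses:
    answers_by_url[r["url"]].add(r["answer"])

# A URL is "conflicting" when it backs more than one distinct answer.
conflicting = {url for url, ans in answers_by_url.items() if len(ans) > 1}
print(conflicting)
```

Such conflicts are interesting precisely because they show the same evidence being read in opposite ways, which is itself a signal of contributor bias.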
Review 5 (by Miriam Fernandez)
(RELEVANCE TO ESWC) This paper targets the problem of documenting provenance (and possible biases) in knowledge graphs when generated via crowdsourcing. While data quality and provenance are very relevant problems for the Semantic Web community, most of the paper is not really dedicated to analysing and proposing solutions to these problems, but to illustrating the existence of possible biases when creating statements via crowdsourcing. While this is a very relevant problem (particularly in the era of misinformation), I don't think this analysis - Section 3 - is particularly relevant for the Semantic Web community. (NOVELTY OF THE PROPOSED SOLUTION) The paper claims as a main contribution a method to trace the provenance of crowdsourced facts in knowledge graphs. However, the proposed solution (using reification to store additional metadata) is not particularly novel. (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The paper proposes the use of five statements to store provenance information about three crowdsourced facts. This basically represents a proof of concept, but the idea has not really been tested in a real-world use case to study, e.g., which biases emerge and whether those statements are sufficient to encapsulate and document those biases. (EVALUATION OF THE STATE-OF-THE-ART) The state of the art is discussed in sufficient detail and appropriate references are provided. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) As mentioned earlier, the paper proposes a method to trace provenance, but the method has not really been tested and evaluated. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The proposed model is reproducible. (OVERALL SCORE) While the paper targets an important problem (that of documenting biases during the creation of knowledge graphs generated via crowdsourcing), the work seems to be at an initial stage and does not address the problem in detail.
It would have been very interesting to see a deep analysis of the different biases that can be introduced by crowdsourcing when creating knowledge graphs, the different requirements for documenting those biases, and an investigation of how that information could be used and applied in real scenarios (challenges, advantages, limitations, etc.). Given the current contributions, the paper seems more adequate for a workshop than for the research track.
Metareview by Harald Sack
The paper addresses and discusses a technique for tracking implicit bias in crowd-sourced semantic data. Although the reviewers agree on the significance of the topic, the presented work seems to be still at an early stage, since it does not provide an in-depth analysis and evaluation. The experimental setup is criticized, and explanations as to why those specific biases should be considered and how they should be exploited are missing.