“We Regret to Inform You” – Managing Reactions to Rejection in Crowdsourcing
Author(s): Ujwal Gadiraju, Gianluca Demartini
Full text: submitted version
Abstract: In current microtask crowdsourcing systems on the Web, requesters typically exercise the power to decide whether or not to accept the tasks completed by crowd workers. Rejecting work has a direct impact on workers: (i) they may not be rewarded for work they have actually done or for the effort they have exerted, and (ii) rejection affects worker reputation and may limit access to future work opportunities.
This paper presents a comprehensive study that aims to understand how workers react to rejections in microtask crowdsourcing. We investigate the affect of the mood of workers on their performance, as well as the interaction of their moods with their reactions to rejection. Finally, we explore techniques such as social comparison that can be used to foster positive reactions. Our findings bear important implications on maintaining positive interactions between workers and requesters in microtask crowdsourcing systems, thereby improving the effectiveness of the paradigm.
Keywords: Crowdsourcing; Microtasks; Rejection Sensitivity; Mood; Workers; Social Comparison; Performance
Review 1 (by anonymous reviewer)
(RELEVANCE TO ESWC) There is no relation to ontologies or the Semantic Web in this paper. The fact that crowdsourcing techniques *could* be used for SW tasks is not really sufficient; the paper does not even consider such tasks, although that in itself would be barely sufficient.
(NOVELTY OF THE PROPOSED SOLUTION) All the findings have long been known from psychological studies, so there is nothing new there. The method is thus not really interesting, because it does not go beyond confirming what is already known.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The studies are really too small to be considered effective.
(EVALUATION OF THE STATE-OF-THE-ART) This work ignores much of the literature on good practice in crowdsourcing and, more importantly, studies in psychology that have shown the same results.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The authors do try to understand and discuss the results of the approach, though not really its properties.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The study is described in enough detail that it could be replicated without too much effort.
(OVERALL SCORE) The paper describes experiments to determine how happy people feel before and after doing crowdsourcing tasks, depending on whether or not their work is considered acceptable. The results show that people who are happy before the task tend to perform better (unsurprisingly) and that those whose work gets rejected feel unhappier (especially the narrow misses), again unsurprisingly. Strong points: the experiments are described in detail and the analysis is thorough. The findings support existing psychological theory. The work could be interesting if it were taken a bit further. Weak points: there is no relation to the Semantic Web in the work, so it is not relevant for this conference. The experiments are rather small, and because they only assess people's reactions right after the experiment, they do not show any longer-term emotional state, which would be more interesting. Having read the rebuttal, I maintain my assertion that the paper is not relevant, since there is no Semantic Web content either in the work itself or in the crowdsourcing task being evaluated. I also still find the results completely unsurprising, and maintain my concerns about some of the experiment methodology.
Review 2 (by anonymous reviewer)
(RELEVANCE TO ESWC) See detail below.
(NOVELTY OF THE PROPOSED SOLUTION) See detail below.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) See detail below.
(EVALUATION OF THE STATE-OF-THE-ART) See detail below.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) See detail below.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) See detail below.
(OVERALL SCORE) The paper tackles an interesting topic that is not SW-specific but is of interest to the community. Based on a set of hypotheses, the authors measure the impact of fairly/unfairly rejecting work done through crowdsourcing, along with the impact of mood and feedback (or lack of it) on performance. The approach is not necessarily novel, but this does not detract from the paper as is. The paper is overall easy to follow. I would suggest providing a bit more detail (see the related point below) on the decisions made about which elements of mood and sensitivity to address: as noted above, the topic is not SW-specific, and the target reader must be able to understand what fed into setting up the study without having to do a separate background study. While I think this would make a good contribution wrt discussing where other fields could feed into work in the SW community, doing so requires a bit more work, as you cannot assume what might be basic knowledge. This is the key reason I chose weak rather than regular accept. The discussion highlights unexpected results wrt the impact of feedback, but the conclusions do not present anything unexpected/unusual.
***
Three levels of difficulty, starting from the lowest, were used. Overall, would the task be considered easy or difficult compared to what would be typical for the selected target worker? Put simply, this would impact workers' assessment of fairness. If this cannot be determined, that is fine, but I would suggest addressing it, as otherwise it is a confounding factor.
I ask this partly because the sensitivity scores in Fig 4 are widely scattered, and yet an average is reported without a useful indication of reliability; the SD in this case is really not a good measure. The closest thing to analysis here is the admission that there is no significant correlation, which only reinforces my point. On what basis were the three conditions in Fig 5 chosen, and what other options were available? The study cited is outside the field of the typical ESWC attendee; a brief explanation is necessary to provide the context the reader does not have, so that they can judge whether the conclusions reached on this basis are valid. What is the range/top score? Looking at both Figs 5 & 6, I cannot see that the levels that map to joy unambiguously fit what is reported in the text; the reader has no way of telling where medium and high lie. If this is in the text I missed it, and on a quick scan I could not find it, so even if it is presented it is not obvious. In any case, I would suggest clearly marking both on the graphs. Even in Fig 5, the box plot shows the variation for each bar, not overall, and this would not necessarily map to the range in the questionnaire(s). Returning to the discussion: how easy would it be for a non-expert to administer and interpret the results of a mood test? How much additional effort would doing so require? Put simply, people using crowdsourcing are typically on a tight budget (time and/or cost), and increasing the effort at either end reduces the cost-benefit tradeoff of even a useful exercise. What is the current level of feedback given, in terms of given/none and detailed/minimal? I would surmise this weighs heavily toward none and minimal; based on the results, this would be more harmful than useful. For the same reasons as above, this would mean most would therefore prefer to move from minimal to none.
While examining this in detail may well be outside the scope of the paper, the open question begs an answer, or at least a brief discussion of some way forward. Wrt ethics: did the targets know they were the subjects of an experiment rather than a regular task? This would not necessarily bias the results if properly set up, as participants would not know which group they fell in. If not, were they at least informed of this after the fact and compensated accordingly? Setting aside the fact that "workers" may be fairly/unfairly treated in regular tasks, this WAS intentionally set up to be unfair/misleading, and it therefore breaks basic ethics rules unless this was planned for and managed accordingly. Whatever was done in the end, the paper needs to address this issue.
******
English reads L-R and T-B, so I'd suggest ordering Fig 6 a-c across, then d-f down, in line with convention, unless you have a specific reason for the order that allows comparison in a specific way, in which case this should be stated. Please run an auto-grammar check and proof-read; there are a large number of errors that would be identified quite easily. "work" without an 's': work is uncountable. "We investigate the affect of the mood of workers on their performance": do you mean effect? The same word is used this way again later in the paper.
*******
Please check and correct the capitalisation of proper names and acronyms in the references, and consider ordering grouped references in natural order; this makes looking up references easier for the reader.
Review 3 (by Valeria Fionda)
(RELEVANCE TO ESWC) The paper studies how rejections affect the mood and performance of workers in crowdsourcing; thus, it could be of some interest to the community, but in my opinion its relevance is marginal.
(NOVELTY OF THE PROPOSED SOLUTION) The performed analysis is clear and detailed. The authors try to give a sense of the results they obtained.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The performed analysis is clear and detailed. The authors try to give a sense of the results they obtained.
(EVALUATION OF THE STATE-OF-THE-ART) The authors review the related work on the concepts of mood and emotion. However, they do not review work on how mood has been exploited in crowdsourcing environments, for example:
- Han Yu, Zhiqi Shen, Simon Fauvel, Lizhen Cui. Efficient scheduling in crowdsourcing based on workers' mood. 2017 IEEE International Conference on Agents (ICA).
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The paper discusses in detail the analysis conducted and the results obtained.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The experimental study was performed with real people (250) using a real crowdsourcing platform. Thus, the study is very general.
(OVERALL SCORE) The paper discusses how rejections affect the mood and performance of workers in crowdsourcing. After defining the basic concepts and describing the experimental setup, the authors describe the obtained results and draw some conclusions. The paper is well written and the methodology used is explained in detail. Many examples are given to help the reader understand the setting used. The related work section could be expanded by also considering related work that exploits the mood of workers in crowdsourcing environments (e.g., Han Yu, Zhiqi Shen, Simon Fauvel, Lizhen Cui. Efficient scheduling in crowdsourcing based on workers' mood. 2017 IEEE International Conference on Agents (ICA)).
The quality of the figures is not adequate. The text in Fig. 1 and Fig. 2 cannot be read. The background of Fig. 4 must be white.
Strong Points:
- Well written
- The experimental analysis is explained in detail with examples
Weak Points:
- Related Work section
- Figures of poor quality
- Marginally interesting for the community
Typos:
- abstract line 11: "the affect of the mood" --> "the effect of the mood"
Metareview by Harald Sack
The authors present a study on how rejections affect the mood and performance of workers in crowdsourcing. The results show that people who are happy before the task tend to perform better and that those whose work gets rejected feel unhappier (as one would assume). From a psychological or crowdsourcing point of view, the work can in general be considered interesting. However, no connection at all to the Semantic Web is pointed out. The reviewers therefore agree that the paper's relevance for this conference is only marginal to non-existent.