FairGRecs: Fair Group Recommendations by Exploiting Personal Health Information & Semantics
Author(s): Maria Stratigi, Haridimos Kondylakis, Kostas Stefanidis
Full text: submitted version
Abstract: Nowadays, the number of people who search for information related
to health has significantly increased, while the time of health professionals
for recommending online useful sources of information has
been reduced to a great extent. FairGRecs aims to offer an effective
approach that provides valuable information to users, in the form
of suggestions, via their caregivers, and improve as such the opportunities
that users have to inform themselves online about health
problems and possible treatments. Specifically, we propose a model
for group recommendations incorporating the notion of fairness,
following the collaborative filtering approach. For computing similarities
between users, we define a novel measure that is based on the
semantic distance between users’ health problems. Our special focus
is on providing valuable suggestions to a caregiver who is responsible
for a group of users. We interpret valuable suggestions as ones
that are both highly related and fair to the users of the group. As
such, we introduce a new aggregation design, incorporating fairness,
and we compare it with the current state of the art. Our experiments
demonstrate the advantages of both the semantic similarity measure
and the fair aggregation design.
Keywords: Group Recommendations; Semantic Recommendations; Collaborative Filtering
Review 1 (by anonymous reviewer)
(RELEVANCE TO ESWC) This is a relevant work in the field of group recommendations in health, and it is relevant to ESWC because it uses semantics to solve a problem.
(NOVELTY OF THE PROPOSED SOLUTION) The proposed algorithm is new for this field; however, fairness is not a new concept.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The proposed solution is clearly explained throughout the paper and mathematically proved.
(EVALUATION OF THE STATE-OF-THE-ART) The state of the art is adequate and recent.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The experiments developed to test the algorithm are sufficient; however, the work would benefit from a real data set and a real implementation.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The approach is easy to reproduce from the information in the paper, especially because the authors provide the information needed to create the data set used.
(OVERALL SCORE)
Summary of the Paper
The article focuses on recommending online health documents to groups of patients who share the same caregiver, in an attempt to address the problem of people looking for health information online. The described recommender algorithm is based on collaborative filtering, using the similarity between users to provide the recommendations. To produce the recommendations, the authors calculate the similarity between users with respect to the rating information of each item, and the similarity between users’ health problems (semantic information). Since the goal was to provide a list of recommendations for each group of patients, the authors proposed a fair aggregation design and tested its results against other aggregation methods. The algorithm was tested using an artificial data set developed for the purpose of this work, with 10,000 chimeric patient profiles. The results show that the proposed similarity function gives better results than traditional similarity functions based on ratings.
The main contributions of this work are:
* A recommendation algorithm for group recommendations
* A corpus of data which may be used for further research.
Strong Points (SPs)
1. A strong topic: when patients seek information online, it is important to know what is relevant and what is not, because there is a lot of misleading information that may do more harm than good.
2. The algorithm is well described: all the steps are clear, and the addition of definitions is a strong positive point.
3. The use of fairness to create the list of recommendations for the group: using this method, all users will find at least one document interesting.
Weak Points (WPs)
1. It is not clear whether the documents to be recommended are online documents, since the introduction mentions that patients search for information online [“80% of all adults in US were estimated to have looked online for health information, whereas”], or documents provided by the doctor in paper or digital format.
2. In the introduction, a description of the sections of the document would be valuable, for a global view.
3. The corpus of simulated data is a positive point and a good contribution; however, it would be preferable to test the system on a real-world dataset.
Questions to the Authors (QAs)
1. In a real-world application, how would users rate the documents?
2. Health data is highly sensitive with respect to privacy. In this case, what could be done to protect users’ information?
3. It is not clear whether the recommendations are given to the caregiver, who then passes them to the patients, or whether the patients have access to the recommendations directly, for example on an online platform.
Review 2 (by anonymous reviewer)
(RELEVANCE TO ESWC) The semantic contribution is very limited.
(NOVELTY OF THE PROPOSED SOLUTION) The topics discussed in this paper are not particularly novel with respect to the state of the art.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) As explained in the complete review, there are some issues that should be addressed.
(EVALUATION OF THE STATE-OF-THE-ART) Links with some relevant works are missing.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Some parts are not very clear and should be better presented.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The data are not available, and the generation procedure has not been detailed enough to be reproducible.
(OVERALL SCORE) The paper presents an approach for recommending documents based on users' profiles. The medical domain has been used to demonstrate the effectiveness of the proposed approach, where personal health information is used for selecting relevant documents. Even if the addressed problem is interesting, my impression is that this work should be improved before being considered for publication. I see three main issues.
The first one concerns the link with the literature. Much work has been done on the aggregation of different criteria for producing document scores within an information retrieval system. An example is the one presented in:
- Célia da Costa Pereira, Mauro Dragoni, Gabriella Pasi: Multidimensional relevance: Prioritized aggregation in a personalized Information Retrieval setting. Inf. Process. Manage. 48(2): 340-357 (2012)
which is based on the OWA operator theory proposed by Yager in:
- R.R. Yager: Modeling prioritized multicriteria decision making. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 34 (6) (2004), pp. 2396-2404
- R.R. Yager: Prioritized aggregation operators. International Journal of Approximate Reasoning, 48 (1) (2008), pp. 263-274
I think that these works should be taken into consideration for refining the score aggregation model.
The second issue is related to the very limited contribution of semantic information. Here, simple taxonomy information has been used for computing the similarity of patient profiles. The effectiveness of this solution has already been demonstrated for general document collections as well, not only for domain-specific ones. Examples of papers supporting this statement are:
- Mauro Dragoni, Célia da Costa Pereira, Andrea Tettamanzi: A conceptual representation of documents and queries for information retrieval systems by using light ontologies. Expert Syst. Appl. 39(12): 10376-10388 (2012)
- Mustapha Baziz, Mohand Boughanem, Gabriella Pasi, Henri Prade: An Information Retrieval Driven by Ontology: from Query to Document Expansion. RIAO 2007
This fact strongly affects the novelty of the presented contribution.
Finally, the third issue concerns the evaluation. It is not clear how the document collection is generated or what the content of each document is. A link to the generated data (documents and profiles) would be really helpful for getting a clearer idea of the data used. Considering that everything has been generated from scratch, there would be no copyright problems. I would also expect some precision/recall measures, considering that the paper discusses a system for recommending documents. My impression is that these issues should definitely be addressed before considering this paper for publication.
---------------------
I thank the authors for their effort in preparing the rebuttal. After reading their reply, I confirm the score given earlier.
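As a point of reference for the aggregation operators this review mentions, the following is a minimal sketch of a plain ordered weighted averaging (OWA) operator in Python. It is not the prioritized operator of the cited papers and not the aggregation used in FairGRecs; the function name `owa` and the example scores are illustrative assumptions. It only shows that two familiar group aggregations, the minimum and the average of the members' scores, can be written as OWA weight vectors.

```python
def owa(scores, weights):
    """Ordered weighted average (illustrative helper, not from the paper).

    The weights are applied to the scores after sorting them in
    descending order, so weights[0] multiplies the largest score.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


# Hypothetical relevance scores of one document for three group members.
member_scores = [4.0, 2.0, 5.0]
n = len(member_scores)

minimum = owa(member_scores, [0.0, 0.0, 1.0])  # all weight on the smallest score
average = owa(member_scores, [1.0 / n] * n)    # uniform weights
maximum = owa(member_scores, [1.0, 0.0, 0.0])  # all weight on the largest score
```

Weight vectors between these extremes interpolate between a "least happy member" strategy and a plain average.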
Review 3 (by Francesco Ronzano)
(RELEVANCE TO ESWC) Focused on the evaluation of group recommendation approaches with respect to:
- the exploitation of semantic descriptions of users (relevant to ESWC)
- the introduction of fairness in recommendation
(NOVELTY OF THE PROPOSED SOLUTION) The introduction of an ICD10 code-based semantic user similarity, even if domain specific, is a novel element proposed by the paper, apart from the fair group recommendation aggregation metric.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Complete evaluation over an automatically generated dataset.
(EVALUATION OF THE STATE-OF-THE-ART) SOA review included.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Clear discussion of the approach.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) It would be great if the authors shared the dataset used for evaluation purposes, even if most of the details of the generation procedure are explained.
(OVERALL SCORE)
** Summary of the Paper **
This paper proposes an approach to recommend useful sources of health information to (the caregiver of) a group of users. Such recommendations are generated by relying on collaborative filtering and comparing two user similarity measures: a "more classical one" based on the Pearson correlation among the item (i.e. source of health information) ratings given by a pair of users, and a new one based on the availability of semantic information describing the health status / problems of a single user, modeled by means of the set of ICD10 codes associated with that user. After a brief overview of group recommendation and content recommendation in the health domain, the authors introduce both user similarity metrics. The metric based on the set of ICD10 codes associated with a user relies on the fact that such codes are organized in a directed acyclic hierarchy.
Given a pair of ICD10 codes (describing for instance two diseases of two users), the similarity of such codes is computed by relying on the path to the lowest common ancestor. The ICD10 code-based semantic similarity between a pair of users is equal to the average of the similarity scores of each disease of the first user paired with the most similar disease of the second user. By relying on these two user similarity metrics, the authors define the relevance of an item for a group of users as the aggregation of the relevance values of the same item for each user in the group. Several approaches to aggregate single-user relevance values are explored: two relevance score-based measures (the minimum user relevance (UR) and the average UR) and two item rank-based measures (the Borda count and the Fair method). A Round Robin selection of the items to recommend to the group of users (the top-ranked item for each user of the group) is also considered. The Fair method is introduced in this paper to boost fairness in group recommendations. In order to quantify the fairness of group recommendations, a fairness metric is defined, averaging among all users the percentage of overlap between user recommendations and group recommendations. In order to balance the quality and the fairness of recommendations, the value metric is also defined. The authors evaluate their recommendation approach (relying on the Pearson similarity and the proposed semantic user similarity measure) over an automatically generated set of patients, each of them associated with a set of health problems (ICD10 codes) and a set of rated documents. The proposed semantic user similarity measure generates better recommendations than the Pearson similarity of ratings if we consider the single-user recommendation scenario.
As far as group recommendation is concerned, the considered approaches to aggregate single-user relevance values are compared with respect to Kendall / Spearman, but also with respect to the fairness and value of the recommendations for the whole group of users. Different group sizes are also considered to evaluate how this factor influences recommendation.
** Strong Points (SPs) **
The paper is clear and well written. Interesting approach both to exploit semantic descriptions of user health problems in health information recommendation and to increase fairness in group recommendation.
** Weak Points (WPs) **
Possible issues concerning the automatically generated dataset used for evaluation (see below).
** Questions to the Authors (QAs) **
In your experiments with the automatically generated datasets, the recommendations based on the semantic similarity between ICD10 codes of users turn out to work better than the ones based on the Pearson correlation of user ratings. When you generate the dataset you state that, considering the totality of item ratings expressed by the users, 20% of such ratings are related to items describing the health problems of the users and 80% of the ratings are given to randomly selected documents not associated with the health problems of the user. Could the better recommendation performance of the ICD10 code-based semantic user similarity be influenced by this assignment of rated documents to users?
In the creation of the dataset you generate user ratings randomly. Then in Table 3 you show that the distribution of user scores is 20% of items assigned to each of the scores 1, 4 and 5, 10% of items assigned to the score 2, and 30% of items assigned to the score 3. Could you explain why you use this bias towards the score 3, penalizing the presence of items rated with score 2?
It would be great if you could share the automatically generated dataset used for evaluation purposes.
Table 2: in the last column, the use of parentheses could appear incorrect with respect to operator precedence to some readers: 1 - (0.2 + 0.2/3) = 0.87 could be better written 1 - (0.2 + 0.2)/3 = 0.87 (if / has precedence over +).
Figures 3 and 4 are really difficult to read; it would be great to split them in two parts and make them bigger.
Typo: Section 4 - Aggregation Designs: ......(i.e., less than out targeted k)... --> less than ouR targeted
--------------------
** After rebuttal **
Many thanks to the authors for answering the issues raised by the review. After reading their answers, I still have some doubts about the biases introduced by the process of automated generation of the evaluation dataset. My final score remains unchanged.
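The similarity and fairness measures summarized in this review can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the tiny hierarchy in `PARENT` is a made-up toy tree rather than real ICD10 content, the normalization `1 - distance / (2 * MAX_DEPTH)` is an assumed way to turn the path through the lowest common ancestor into a score, and the fairness denominator (the size of the group list) is likewise an assumption. Only the "average of best matches" rule and the "average overlap across users" idea come from the summary above.

```python
# Toy code hierarchy (hypothetical, NOT real ICD10 codes): child -> parent.
PARENT = {
    "A": "root", "B": "root",
    "A1": "A", "A2": "A", "B1": "B",
    "A1.0": "A1", "A1.1": "A1", "A2.0": "A2", "B1.0": "B1",
}
MAX_DEPTH = 3  # depth of the leaves in the toy tree


def ancestors(code):
    """Path from a code up to the root, starting with the code itself."""
    path = [code]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def code_similarity(a, b):
    """Assumed formula: 1 - (edges on the path via the lowest common ancestor) / (2 * MAX_DEPTH)."""
    steps_from_b = {node: i for i, node in enumerate(ancestors(b))}
    for steps_from_a, node in enumerate(ancestors(a)):
        if node in steps_from_b:  # first shared node is the lowest common ancestor
            distance = steps_from_a + steps_from_b[node]
            return 1.0 - distance / (2 * MAX_DEPTH)
    raise ValueError("codes do not share a root")


def user_similarity(problems_u, problems_v):
    """Average, over u's problems, of the best match among v's problems (both lists non-empty)."""
    best = [max(code_similarity(p, q) for q in problems_v) for p in problems_u]
    return sum(best) / len(best)


def fairness(individual_lists, group_list):
    """Average share of the (non-empty) group list that also appears in each user's own list."""
    group = set(group_list)
    shares = [len(group & set(items)) / len(group) for items in individual_lists.values()]
    return sum(shares) / len(shares)


siblings = code_similarity("A1.0", "A1.1")        # lowest common ancestor is A1
unrelated = code_similarity("A1.0", "B1.0")       # lowest common ancestor is the root
group_fairness = fairness({"u1": ["d1", "d2"], "u2": ["d3", "d4"]}, ["d1", "d3"])
```

Note that `user_similarity` as written is not symmetric: swapping the two users can change the result, because the best matches are taken from the first user's point of view.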
Review 4 (by Tommaso Di Noia)
(RELEVANCE TO ESWC) This paper presents a work which may be too specific to the health domain; however, it presents an interesting analysis of aggregation and fairness which paves the way for further experiments in group recommendations.
(NOVELTY OF THE PROPOSED SOLUTION) This paper is among the first works investigating aggregation methods and fairness in recommendation scenarios.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The proposed strategies are well explained.
(EVALUATION OF THE STATE-OF-THE-ART) Related work can be extended by looking at the literature recently published by the RecSys community.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Details on the proposed approach are provided by the authors.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The application domain is too specific; it deals with patients' health, for which a public dataset is not available. Hence, the authors provide details on how to generate a dedicated dataset like the one they built for their experiments.
(OVERALL SCORE)
---------------------------Summary of the paper--------------------------------
The authors propose a novel method to provide group recommendations in the health domain. Their aim is to support a caregiver in selecting relevant documents about health problems that affect the group of users for which she is responsible. Since a single recommendation list suitable for all users of the same group has to be provided, they focus their attention on aggregation and fairness techniques. Furthermore, a so-called semantic similarity is proposed, which is based upon users' medical profiles and the ICD10 taxonomy. Since datasets with patient profiles and ratings on health documents are not available, the authors provide a way to generate a synthetic dataset with the needed information. Results gathered with several similarities are compared with each other, in order to find the best one.
---------------------------Strong Points--------------------------------------
- The paper is about group recommendations, a novel field which has recently been attracting the attention of the RecSys community.
- An interesting overview of several aggregation methods and fairness is provided.
- The authors explain in detail how they generated the dataset used, allowing readers to better understand their approach and to carry out further experiments.
---------------------------Weak Points----------------------------------------
- Comparisons with state-of-the-art methods are limited; obviously, the main problem regarding the provided recommendations is the novelty of the approach and the lack of a state-of-the-art group recommender. However, further investigations of well-known similarities may be carried out.
- The discussion of the recommender results should be improved.
- The dataset is synthetic; how closely it resembles real situations is unknown.
---------------------------Questions to the Authors---------------------------
1. Why are only MSE and RMSE used to evaluate recommendations? How exactly are they used in the evaluation protocol? Can they be used differently? Even if it is clear why accuracy is mainly taken into account, other metrics may also be exploited.
---------------------------------After Rebuttal-------------------------------
I thank the authors for their effort in preparing the rebuttal. After reading their reply, I actually lower my score. MAE and RMSE have recently been identified as wrong measures to evaluate a recommendation result, as they give the same importance to errors in the top results as to errors in the bottom ones. It is quite understandable that in a recommendation scenario, errors in the top results carry more weight. This is the main reason why precision, recall, nDCG, etc. are usually adopted to evaluate a recommender system.
Moreover, other than accuracy, when evaluating a recommender system other measures such as novelty and diversity of results should be taken into account.
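The point about rating-error metrics versus ranking metrics can be illustrated with a small Python sketch. The ratings below are hypothetical and are not taken from the paper's dataset; the nDCG variant shown uses linear gains with a log2 discount, which is one common formulation among several. Two predictions with the same RMSE are constructed: one misorders the two best items, the other misorders the two worst.

```python
import math


def rmse(true_ratings, predicted):
    """Root mean squared error over all items, regardless of their rank."""
    squared = [(t - p) ** 2 for t, p in zip(true_ratings, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def dcg(gains):
    """Discounted cumulative gain with linear gains and a log2 position discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg(true_ratings, predicted):
    """Rank items by predicted score, then compare the true gains to the ideal order."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    ranked_gains = [true_ratings[i] for i in order]
    ideal_gains = sorted(true_ratings, reverse=True)
    return dcg(ranked_gains) / dcg(ideal_gains)


# Hypothetical true ratings for five documents.
true_ratings = [5, 4, 3, 2, 1]
top_error = [4, 5, 3, 2, 1]     # swaps the two best documents
bottom_error = [5, 4, 3, 1, 2]  # swaps the two worst documents

same_rmse = math.isclose(rmse(true_ratings, top_error), rmse(true_ratings, bottom_error))
top_is_worse = ndcg(true_ratings, top_error) < ndcg(true_ratings, bottom_error)
```

In this example RMSE cannot tell the two predictions apart, while nDCG penalizes the mistake at the top of the list more heavily.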
Metareview by Maribel Acosta
This work tackles the problem of fair group recommendations in the context of Personal Health. The authors propose FairGRecs, an approach that relies on collaborative filtering and identifies relevant documents based on the descriptions of users (groups of patients). Empirical results on a synthetic dataset indicate that FairGRecs is able to increase the utility and fairness of the recommendations. The reviewers agreed that the research problem addressed in this paper is interesting for the Semantic Web. Nonetheless, the reviewers identified several major issues in this work. Mainly, the novelty of this work is unclear, as the proposed approach is not properly positioned with respect to solutions in the areas of aggregation models or applications of taxonomical information. Further, the reviewers raised concerns about the dataset and the metrics used in the experimental study. In summary, the direction of this work is promising and relevant to the Life Sciences domain. Yet, in its current state the paper is not ready for publication. We encourage the authors to address the reviewers' comments to improve the overall quality of this work.