Paper 111 (In-Use track)

Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation

Author(s): Joydeep Mondal, Sarthak Ahuja, Kushal Mukherjee, Sudhanshu Singh, Gyana Parija

Full text: submitted version

camera ready version

Decision: accept

Abstract: Most solutions providing hiring analytics involve mapping provided job descriptions to a standard job framework, thereby requiring computation of a document similarity score between two job descriptions. Finding semantic similarity between a pair of documents is a problem that is yet to be solved satisfactorily over all possible domains/contexts. Most document similarity calculation exercises require a large corpus of data for training the underlying models.
In this paper we compare three methods of document similarity for job descriptions – topic modeling (LDA), doc2vec, and a novel part-of-speech tagging based document similarity (POSDC) calculation method. LDA and doc2vec require a large corpus of data to train, while POSDC exploits a domain-specific property of descriptive documents (such as job descriptions) that enables us to compare two documents in isolation. The POSDC method is based on an "action-object-attribute" representation of documents that allows meaningful comparisons. We use Stanford CoreNLP and NLTK WordNet to do a multilevel semantic match between the actions and corresponding objects. We use sklearn for topic modeling and gensim for doc2vec. We compare the results from these three methods based on the IBM Kenexa Talent Frameworks job taxonomy.

Keywords: Lexical semantics; Document similarity; NLP; Collaborative Cognition


Review 1 (by Sabrina Kirrane)


This paper examines mechanisms that can be used to match job candidates with job descriptions, based on semantic similarity between documents. In essence the paper compares topic modeling (LDA) and doc2vec, with the proposed part-of-speech tagging based document similarity (POSDC) calculation method.
Although the work is guided by a real-life problem and the evaluation is performed over real data, the use of semantic technologies and the impact for the Semantic Web community is low. 
Additionally, the paper reads more like a research paper than an in-use paper, focusing primarily on demonstrating the effectiveness of the proposed approach as opposed to discussing the challenges and limitations that are inherent in real world data.
-	The paper is easy to understand and generally flows well
-	The provided definitions and algorithms help to ensure reusability
-	The proposed approach is evaluated over real world data
-	The paper does not employ semantic technologies (in the traditional sense i.e. RDF, ontologies etc..)
-	It is hard to assess the generality of the proposed approach as it is evaluated over a single dataset 
-	The effectiveness of the proposed approach in terms of the overall use case has not been addressed. For example, what are the strengths and weaknesses of the proposed approach? Do other NLP packages exhibit similar results? What are the inherent challenges when dealing with real data?
Many thanks for the clarifications provided in the rebuttal. Given that the paper is for the In-Use track, I think it is important to demonstrate the effectiveness of the approach over more than one dataset and to clearly highlight the issues that arise when dealing with real data. I will therefore keep my original score.


Review 2 (by anonymous reviewer)


The paper presents an interesting comparison between two state-of-the-art document-to-document comparison methods and a new approach that is based on syntactic dependency parsing and does not require training data. On a given document genre, job descriptions, the proposed approach achieves results comparable or superior to those of an LDA approach, depending on the size of the training corpus, and outperforms a doc2vec-based approach. The approach is proposed to help match job offers to descriptions of previous job experience.
The paper is well structured, well written and provides a detailed description of several of their algorithms, which is interesting with respect to reproducibility. One shortcoming of the approach is mentioned in the future work section, namely that it is suitable to a document genre that is focused on simple sentences aimed at describing factual experience or requirements. It would be interesting to see the same method applied to different document genres and have an idea of the performance. 
My main questions are around the semantic expansion that their method implements using WordNet:
-	The authors state that word vector methods create different vectors for synonyms (as one vector is created per string, with a baseline setting) and gather the different meanings/contexts of homonyms into one vector. This is a fair point; it is indeed a risk. However, how do the authors map the individual words of their dictionary unambiguously to the correct WordNet synset? WordNet is known for drawing very fine-grained distinctions between senses and for providing a different synset for each sense it identifies. And, with respect to the shortcoming that they identified for word2vec, how do the authors avoid aggregating senses for homonyms? Do they ever encounter cases where words that are synonyms in the context of their corpus get matched to different WordNet synsets? If so, how do they solve the problem?
-	The authors match individual words to WordNet, if my understanding of the paper is correct; is it because WordNet has mostly single-word entries? Is there a measure of how many of these decomposed multi-word terms get a semantic drift by being decomposed at matching time? For example “time machine” can be correctly matched to “time” and “machine” for gathering synonyms, but “Human Resources”  is not accurately represented by the synonyms of “human” and “resources” taken independently. 
-	WordNet is a generic resource, how do the authors cope with the comparison of new jobs and very specific domains that are not covered by WordNet? 
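The sense-selection question raised in the first point above can be made concrete with a small sketch. It is not the authors' method: the sense inventory, the domain labels, and the function names are all hypothetical stand-ins for WordNet, chosen only to show what happens if word similarity is taken as the maximum over all sense pairs without any disambiguation step.

```python
# Hypothetical sense inventory (a stand-in for WordNet synsets).
# Each sense is written as "lemma.domain"; the labels are invented for illustration.
SENSES = {
    "bank": ["bank.finance", "bank.river"],
    "lender": ["lender.finance"],
    "shore": ["shore.river"],
}


def sense_sim(s1, s2):
    """Toy sense similarity: 1.0 if both senses share a domain label, else 0.0."""
    return 1.0 if s1.split(".")[1] == s2.split(".")[1] else 0.0


def word_sim_max_over_senses(w1, w2):
    """Word similarity as the best score over every pair of senses.

    No disambiguation is performed, so a homonym contributes all of its senses.
    """
    return max(sense_sim(a, b) for a in SENSES[w1] for b in SENSES[w2])


# "bank" matches both "lender" and "shore", although those two words
# have nothing in common with each other.
bank_lender = word_sim_max_over_senses("bank", "lender")
bank_shore = word_sim_max_over_senses("bank", "shore")
lender_shore = word_sim_max_over_senses("lender", "shore")
```

Under this assumption a homonym is treated as similar to words from unrelated domains, which is the same sense aggregation the authors criticise in word vectors. Whether the paper's method behaves this way depends on how it selects synsets, which is what the question asks.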
The state of the art section does not mention research using dependency analysis for document similarity computation; the field is rather large though and this approach has been used since the early days of comparable documents alignment across languages, to identify matching candidate translations.
Finally, a minor comment: part of a sentence seems to be missing at the end of page 5: "We assume all the sentences in job description documents were in ".
The arguments in the rebuttal (particularly around WordNet) address some of my questions and would be, in my humble opinion, a good addition to the paper. Because of the possibly limited applicability of the paper's method, I would still rate it with 2.


Review 3 (by anonymous reviewer)


I'm not an expert in this area, but the paper is well written and the method is described in clear terms, including references to alternative approaches.


Review 4 (by Astrid van Aggelen)


***update after rebuttal and open review procedure***
I would like to thank the authors for writing a rebuttal. Given the overall agreement of the reviewers that the presented method addresses a real-life problem, and is interesting and substantial, I have changed the score of my review and I support accepting the paper. While I still think the paper itself has shortcomings, to make it feasible to have it revised in time, I would suggest that the authors prioritise adapting the state of the art section and including a discussion of the challenges and limitations of the method.
***original review***
This paper presents an unsupervised method to formalise the similarity between documents of job descriptions. Unlike many previous approaches, the proposed method does not rely on a background model based on a large corpus, as it models documents as sets of (dependency-based) triples of actions, objects, and attributes. To compare pairs of documents, their sets of triples are first aligned in a one-on-one fashion, by maximising the overall similarity score of the document pair. The comparison of any two triples takes into account the similarities of their constituent words, i.e. the pairwise noun-noun, verb-verb, and adjective-adjective similarities, in the form of WordNet graph distances. These word similarities are weighted by their importance (e.g. verbs are given more weight than attributes) to build up to a similarity measure for triples and then documents.
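The pipeline summarised above (weighted word similarities, then triple similarity, then a one-to-one alignment that maximises the document score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the weights, the synonym table standing in for WordNet graph distances, the brute-force alignment, and the normalisation by the longer document are all hypothetical choices.

```python
from itertools import permutations

# Hypothetical weights for (action, object, attribute); the paper's values are not given here.
WEIGHTS = (0.5, 0.3, 0.2)

# Hypothetical synonym table standing in for a WordNet graph-distance measure.
SYNONYMS = {
    frozenset({"build", "develop"}),
    frozenset({"software", "application"}),
}


def word_sim(a, b):
    """Toy word similarity: 1.0 for identical words, 0.8 for listed synonyms, else 0.0."""
    if a == b:
        return 1.0
    if frozenset({a, b}) in SYNONYMS:
        return 0.8
    return 0.0


def triple_sim(t1, t2):
    """Weighted sum of slot-wise similarities of two (action, object, attribute) triples."""
    return sum(w * word_sim(x, y) for w, x, y in zip(WEIGHTS, t1, t2))


def doc_sim(doc1, doc2):
    """Best one-to-one alignment score, normalised by the size of the longer document.

    Brute force over permutations, so only suitable for very small triple sets.
    """
    short, long_ = sorted((doc1, doc2), key=len)
    if not long_:
        return 0.0
    best = max(
        sum(triple_sim(s, l) for s, l in zip(short, perm))
        for perm in permutations(long_, len(short))
    )
    return best / len(long_)


# Two toy job descriptions, already reduced to triples (hypothetical data).
doc_a = [("develop", "software", "scalable"), ("manage", "team", "small")]
doc_b = [("manage", "team", "large"), ("build", "application", "scalable")]
score = doc_sim(doc_a, doc_b)
```

For realistically sized documents the exhaustive search would have to be replaced by a proper assignment algorithm such as the Hungarian method; the brute-force version is used here only to keep the sketch dependency-free.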
The presented method might be valid and interesting, but this position cannot be formed based on the paper, which does not do it justice. It puts weight on matters plain and simple while rushing over important or difficult parts. My main concerns are as follows (in random order):
1) The literature section lacks focus. It does not list very closely related work; surely there are tons of examples of document classification based on a similar formal representation of their content (as dependencies or RDF triples, augmented with a taxonomy)?
2) The motivation for why (or when) structured approaches are more suited than bag-of-words is unclear. You might need to craft some examples to make this point.
3) The formalisations are not handled well. For the most part the listed algorithms are trivial. For instance, it suffices to say that you used tokenization and dependency parsing. Where they are not trivial, I doubt the usefulness of inserting them as algorithms, as they need to be explained in text anyway (which is not currently the case).
4) The method section is very unclear, and would be much better if it treated word-word, triple-triple, and document-document similarity separately. The definitions in section 3.4 need a word of explanation: what are these formulas based on?
5) Description of the corpus is insufficient. On average, how long are the documents? How many triples are they represented as? What are the job families that you evaluate on, and can you give an example of each? Without this information it is difficult to assess, for instance, how keyword-cued these job groups are, and hence whether LDA might have an advantage over your approach. Given two job families “IT” and “business”, I can imagine two jobs across fields (e.g. IT product owner and business manager) could share more keywords than some jobs within a field (e.g. database manager and IT product owner).
6) A discussion of the results is lacking.
7) The "small experiment" from section 3.2 is unscientific and not reproducible: fine as a small sense-making test behind the scenes, but not suited for publication.
8) The paper is full of spelling and grammar errors.
In all, I think the publication needs revisions that are too extensive for the given occasion, and I advise against accepting it for publication. I do think the work is interesting and hope the authors will resubmit it on a later occasion.


Review 5 (by Anna Tordai)


This is a metareview for the paper that summarizes the opinions of the individual reviewers.
The work is guided by a real life problem and the approach is compared against multiple alternatives in an evaluation on real-world data. Nonetheless, the suitability of this paper for the In-use track is not undisputed, as the impact of the technology has not been demonstrated. The reviewers point out that the paper lacks detail in the method section, which has mostly been addressed by the authors in the rebuttal. 
Laura Hollink & Anna Tordai

