Towards Context-Aware Syntax Parsing and Tagging
Author(s): Alaa Mohasseb, Mohamed Bader-El-Den, Mihaela Cocea
Full text: submitted version
Abstract: Information retrieval (IR) has become one of the most popular Natural Language Processing (NLP) applications. IR approaches try to improve the technology used in finding relevant results, but many difficulties are still faced because of the continuous increase in the amount of web content. Part of speech (POS) parsing and tagging plays an important role in IR systems. A broad range of POS parsing and tagging tools have been proposed with the aim of helping to find a solution for the information retrieval problems, but most of these tools are based on generic NLP tags which do not capture domain-related information.
Moreover, most parsing and tagging methods do not take into consideration the syntactic structure of the text. In this research, we present a domain-specific parsing and tagging approach that uses not only generic POS tags but also domain-specific POS tags, grammatical rules, and domain knowledge. In addition, a tag-set that contains more than 10,000 words that could be used in different IR domains has been created. Experimental results show that our approach has a good level of accuracy when applied to different domains.
Keywords: Natural Language Processing; POS Tagging; POS Parsing; Machine Learning; Text Mining
Review 1 (by anonymous reviewer)
(RELEVANCE TO ESWC) There is literally no relation to the Semantic Web in this work. (NOVELTY OF THE PROPOSED SOLUTION) Combining the different tags together is new, but there is no justification. The method (and even the goal of the work) is not described at all clearly, so it's quite hard to understand what the authors are even trying to do. (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The solution provided is quite confusing. Conflating annotations at three different levels (word, phrase and "domain-specific") makes no real sense to me - I would expect to see a clear justification for this, and some proof in the evaluation. The classes chosen for the domain-specific tags seem a bit arbitrary (e.g. "celebrity") and I cannot understand why these are conflated with POS tags at the same level. Surely finding the POS tags first is helpful to finding the NEs (and from there, some specific types of NEs perhaps). The terminology is very confusing - why do you not distinguish between a Noun (N) which is a standard POS tag, and Noun Phrase (NP) which is a standard parsing or chunking tag, as is standard linguistic practice? There seem to be many arbitrary decisions which are not justified or explained. (EVALUATION OF THE STATE-OF-THE-ART) By conflating the ability to produce correct semantic tags with the ability to classify queries, the experiments are not really evaluating and comparing the different approaches. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Again, many decisions are not explained or justified, both in the specific components and details of the approach, and on a wider scale of why the elements are conflated. There is no discussion of different tag sets, training data or their performance in the state of the art section. The section on parsing is just a random collection of parsing methods with no coherent analysis or structure. I would expect to see some explanation about domain adaptivity etc.
The tool could be compared for its tagging quality against baseline or existing tools, but it isn’t. Many things about the proposed approach are not clear, such as why a new POS tag set was necessary instead of reusing an established existing one, of which there are several. What benefit is the new POS tag set? Conflating the POS tag set and search category seems a very arbitrary approach that is not really justified. It's not clear if and how multiple categorisation is allowed. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The evaluation uses a standard dataset, which is useful. However, since there are many unclear details of the approach, it could be hard to reproduce. (OVERALL SCORE) Summary: the paper describes an approach combining POS tags and some other kinds of semantic information in order to help classify queries into different types for IR. Strong points: - the method is evaluated on a standard dataset and performs better in some areas. - the work could be interesting if it were properly explained and justified (though not in a Semantic Web conference). Weak points: - Overall, the paper is confusing in its description. - the evaluation doesn't actually show how good the parsing/tagging ability is. - It's not clear how and why many critical decisions were taken. - there is no relation at all to the Semantic Web.
Review 2 (by anonymous reviewer)
(RELEVANCE TO ESWC) There seems to be no direct reference to semantic web technologies. (NOVELTY OF THE PROPOSED SOLUTION) Little novelty, mostly a straightforward application of classification methods. (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The reference tag set has been produced by the paper’s authors and is otherwise unknown in the NLP community. The proposed solution does not seem to be correct. (EVALUATION OF THE STATE-OF-THE-ART) In the background section, the paper cites state of the art solutions, but since the target tag set has not been used and is even unknown in the NLP area, the comparison is not convincing. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Each element of the proposed framework is demonstrated and explained, but the feature set that is used in the classification task (tagging) is not explained. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The reproducibility is limited, since the queries (randomly selected from publicly available sets) are not identified. (OVERALL SCORE) **Short description of the problem tackled in the paper, main contributions, and results** In this paper, the authors try to propose a domain-adapted part of speech tagging. They suggest a framework that starts with grammar generation, based on a context free grammar (CFG). Then they extract phrases from the syntactic structures that were generated in the first step, referred to as “Parsing”. Finally, they use classification techniques like SVM and Naïve Bayes to map each part of the sentence to specific POS tags. -Strong Points (SPs) ** Enumerate and explain at least three Strong Points of this work** The paper does not present any novel techniques or a radically new approach -Weak Points (WPs) ** Enumerate and explain at least three Weak Points of this work** I would like to point to some essential issues in this paper: 1-There is no clear definition of domain.
It has been mentioned that search engines and question answering (Q/A) are domains as well as social networks. However, we can name social media/network content as a new domain since the utterance there is quite different than narrative content, but Q/A and search engines are rather special tasks that can be applied to any domain. 2-POS tagging is a preliminary task in natural language processing. This means that it comes before generating the syntactic structure of shallow parsing. However, in the framework it comes after grammar generation, which means that during the grammar generation task, there is another internal POS tagging task which impacts performance. 3-The paper refers to a POS tag set that is not common in the NLP community, and it contains syntactic and semantic tags. The POS tagging task mainly targets syntactic tags, not semantic ones. The papers defining the POS tag set were written by the same authors as this paper. 4-In the classification task, there is no indication of which feature set has been used or what the errors are. The error analysis is always interesting in POS tagging tasks.
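The ordering described in point 2 (tagging first, shallow parsing second) can be sketched as follows. This is only a toy illustration of the conventional pipeline order: the lexicon, the tag names, and the single noun-phrase rule are hypothetical and are not taken from the paper under review.

```python
# Toy illustration only: TOY_LEXICON, the tag names, and the NP rule are
# hypothetical assumptions, not the tag set or grammar of the reviewed paper.
TOY_LEXICON = {
    "the": "DET", "a": "DET",
    "cheap": "ADJ", "best": "ADJ",
    "flights": "NOUN", "hotel": "NOUN", "london": "NOUN",
    "to": "ADP", "in": "ADP",
    "find": "VERB", "book": "VERB",
}


def pos_tag(tokens):
    """Step 1: assign one tag per token (unknown words default to NOUN)."""
    return [(tok, TOY_LEXICON.get(tok.lower(), "NOUN")) for tok in tokens]


def chunk_noun_phrases(tagged):
    """Step 2: group DET? ADJ* NOUN+ runs into NP chunks, using only the tags."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DET":  # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "ADJ":  # any adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] == "NOUN":  # one or more nouns
            k += 1
        if k > j:
            # At least one noun was found, so tokens i..k-1 form an NP.
            chunks.append(("NP", [tok for tok, _ in tagged[i:k]]))
            i = k
        else:
            # No NP starts here; emit the single token under its own tag.
            chunks.append((tagged[i][1], [tagged[i][0]]))
            i += 1
    return chunks


tagged = pos_tag("find cheap flights to London".split())
chunks = chunk_noun_phrases(tagged)
```

The chunker never looks at the words themselves, only at the tags produced in step 1, which is why the tagger has to run first in this arrangement.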
Review 3 (by Petya Osenova)
(RELEVANCE TO ESWC) The title of the paper, as given by the authors, is suitable for ESWC. However, in my opinion there is a discrepancy between the title and/within paper content. Introduction and Related work report on POS tagging and Parsing, while the body parts have nothing to do with these. They rather provide pattern-based annotations of a focused shallow grammar. I do not agree that the general POS tags are not useful in domains, but the 10 000 words in the so-called tagset are. (NOVELTY OF THE PROPOSED SOLUTION) I understand that the authors try to combine morphosyntactic and semantic information in pursuing their task of classifying search queries and questions, but I do not see any novelty here. The state-of-the-art is not coherently presented with respect to the authors' tasks. (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) First of all, I think that POS tagging and Parsing (as used in NLP) have not been used in this very sense here in the paper. The authors seem to use some mixture of POS tagging/bag-of-words with chunking and NE recognition. To me it seems that all the NLP notions are not used correctly in the paper. For example, a POS tagset cannot be just words or combination of words with categorization tags; parsing can rarely come before tagging, etc. Maybe the terminology should be made more precise and straightforward. (EVALUATION OF THE STATE-OF-THE-ART) The state-of-the-art that is provided is not directly related to the approaches that the authors use in their experimental set-up. Some question answering tasks are called 'domains'. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Analyses of the sentence structure and domain knowledge are notoriously complex to get. I do not see information on how exactly these challenges are faced. More precisely, how can a successful change of the processing chain from one domain to another be secured?
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) I am not sure how reproducible the reported results are, given the fact that the authors do not work with real domains, such as finance, medical domain, news, etc. How can the domain specific grammar be easily integrated? Is there any evaluation of the effort required when changing from one domain to another? (OVERALL SCORE) The paper claims that it reports on domain-specific parsing and tagging. However, I do not think that classification of search queries and classification of questions qualify as domains. It seems to be rather a cross-domain approach, which reports very high results. Three strong points: - authors are aware of the latest developments in NLP processing - authors use semantic categorization features in the processing - valuable discussion section Three weak points: - there is a discrepancy between the claimed NLP methods and the real experiments - a strange framework in Fig. 1, which uses tagging after parsing and where there is a grammar component (seems shallow syntax and semantics) that is not well described - there is no direct comparability with the state-of-the-art approaches. Questions: - what are 'action verbs'? They are opposed to what other kind of verbs? - why are words considered non-terminal symbols? - how are the domain specific grammatical categories adapted from domain to domain?
Review 4 (by Anett Hoppe)
(RELEVANCE TO ESWC) The paper makes no reference to Semantic Web technologies or other topics central to ESWC. Content-wise, I would rather see it in a more IR-focussed venue. (NOVELTY OF THE PROPOSED SOLUTION) According to the cited related work, the approach seems to be rather innovative and is based on the authors’ own preliminary work [24, 26]. The distinction from existing works could be stated more clearly. (CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) See below (demonstration and discussion of the proposed approach), I have trouble comprehending the approach based on the given specifications. While it might be a relevant contribution, I cannot (a) see how you generate and use the domain specificity; (b) see how your work relates to the state of the art. (EVALUATION OF THE STATE-OF-THE-ART) Comprehensive list of related references, ad-hoc search could not reveal additional resources. Anyhow, it is not always 100% clear how your work distinguishes itself from existing approaches. Thus, Sections 2.1 and 2.2 read like a list of existing approaches and could be improved by clearly outlining (a) the shortcomings of these existing approaches and (b) how your work solves some of them. Given the number of referenced works, a table comparing core features of the approaches could be helpful. (DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The meaning of the term “domain” should be stated clearly, perhaps using examples (such as the later used Q/A application case). The introduction could be improved by a clear statement on how the domain specificity will improve the performance in a certain target application (without the reader referring to [31,20]).
Perhaps in the form of some examples of (a) generic tags which do not contribute to performance in IR tasks, (b) specialized tags which have proven to do so (then perhaps refer to more detailed description of tag set in 3.1). Section 3.1: Could be improved by a short summary or some examples of how the domain-specific tag set differs from a generic one (without forcing the reader to search through the table in the Appendix). Section 3.2: I have some trouble understanding how exactly things are done here. How is the domain-specific grammar generated (indeed, this seems the core interest of the paper, so some more details are desirable)? How is the parsing done? Are existing frameworks used, which? After all the knowledge the reader is presumed to have, Figure 2 seems unnecessary. (REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) It should be clearly stated which of the evaluated methods is the newly developed one and which of them is to be considered the baseline (if there is one). As far as I can see, there is no comparison against the state of the art (and the reasoning for this missing element, presented in the discussion, is rather thin). Would it be possible to compare the performance of one classifier, once using domain-specific tagging and once a generic method? (OVERALL SCORE) The paper proposes the usage of application-specific tagging and parsing for the classification of queries in Information Retrieval tasks. The adaptation herein does not have a topical focus, but aims to provide specific part of speech tags for specific IR applications such as question answering and query classification. Good performance with respect to other existing approaches is claimed. Strengths: The covered related work seems extensive, so does the discussion. The idea that domain-specific tagging and parsing could improve performance seems sound, but could be presented more convincingly (see above, perhaps in the form of examples).
Weaknesses: The novelty of the presented work is not always clear and could be highlighted better. Anyhow, the major weakness of the paper is the lack of a clear description of the actual approach – the reader is not able to comprehend and reproduce _how_ exactly the methods are implemented, thus a reproduction of the results seems hardly possible. No source code is provided. Furthermore, from the evaluation, the quality of the novel approach is not clear – there is no evaluation against methods which do not use domain-specific tagging (or it is not clearly stated) and the argumentation for not comparing against existing methods is not convincing.
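The comparison asked for above (one classifier, run once with generic tags and once with domain-specific tags) could be set up roughly as follows. This is a minimal sketch under stated assumptions: the two tag lexicons, the tag names, the class labels, and the handful of queries are all hypothetical placeholders rather than the paper's tag set or data, and a small hand-written multinomial Naive Bayes stands in for whichever classifier the authors actually used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical lexicons: neither tag inventory comes from the reviewed paper.
GENERIC_TAGS = {"who": "PRON", "where": "ADV", "is": "VERB", "wrote": "VERB",
                "painted": "VERB", "buy": "VERB", "download": "VERB",
                "cheap": "ADJ", "free": "ADJ", "used": "ADJ"}
DOMAIN_TAGS = {"who": "QW_PERSON", "where": "QW_PLACE",
               "buy": "AV_TRANSACT", "download": "AV_TRANSACT"}


def generic_features(query):
    """Map each token to a generic tag (unknown words default to NOUN)."""
    return [GENERIC_TAGS.get(t, "NOUN") for t in query.lower().split()]


def domain_features(query):
    """Prefer a domain-specific tag, falling back to the generic one."""
    return [DOMAIN_TAGS.get(t, GENERIC_TAGS.get(t, "NOUN"))
            for t in query.lower().split()]


def train_nb(samples):
    """Fit multinomial Naive Bayes counts from (features, label) pairs."""
    label_counts = Counter(label for _, label in samples)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, label in samples:
        feat_counts[label].update(feats)
        vocab.update(feats)
    return label_counts, feat_counts, vocab


def predict_nb(model, feats):
    """Return the label with the highest Laplace-smoothed log score."""
    label_counts, feat_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        denom = sum(feat_counts[label].values()) + len(vocab)
        score = math.log(label_counts[label] / total)
        for f in feats:
            if f in vocab:  # features unseen in training are ignored
                score += math.log((feat_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(train, test, featurize):
    """Train on `train` and score on `test` with the same featurizer."""
    model = train_nb([(featurize(q), y) for q, y in train])
    hits = sum(predict_nb(model, featurize(q)) == y for q, y in test)
    return hits / len(test)


# Placeholder queries and labels, for illustration only.
TRAIN = [("who wrote hamlet", "informational"),
         ("where is paris", "informational"),
         ("buy cheap laptop", "transactional"),
         ("download free music", "transactional")]
TEST = [("who painted guernica", "informational"),
        ("buy used phone", "transactional")]

acc_generic = accuracy(TRAIN, TEST, generic_features)
acc_domain = accuracy(TRAIN, TEST, domain_features)
```

Because the classifier, the training split, and the test split are held fixed, any difference between `acc_generic` and `acc_domain` can be attributed to the tag set alone. On a toy sample of this size the numbers carry no meaning; the point is only the shape of the experiment.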
Metareview by Valentina Presutti
The paper proposes an approach to perform domain specific parsing and tagging. The main criticism raised independently by all reviewers is that the paper is not appropriate for this conference. Although there is an NLP track, papers submitted here should show how to use semantic web technologies for improving NLP or how NLP can improve semantic web technologies, otherwise their natural target should be NLP conferences. Besides relevance to the conference, the paper fails to provide the necessary information to make the reviewers appreciate the proposed approach. The problem is not clearly defined and the overall approach and its evaluation are not supported by appropriate justifications and explanations.