Author(s): Alaa Mohasseb, Mohamed Bader-El-Den, Mihaela Cocea
Abstract: Information retrieval (IR) has become one of the most popular Natural Language Processing (NLP) applications. IR approaches aim to improve the technology used to find relevant results, but many difficulties remain because of the continuous growth in the amount of web content. Part-of-speech (POS) parsing and tagging play an important role in IR systems. A broad range of POS parsers and taggers has been proposed to help address information retrieval problems, but most of these tools are based on generic NLP tags, which do not capture domain-related information.
Moreover, most parsing and tagging methods do not take the syntactic structure of the text into consideration. In this research, we present a domain-specific parsing and tagging approach that uses not only generic POS tags but also domain-specific POS tags, grammatical rules, and domain knowledge. In addition, a tag-set containing more than 10,000 words that can be used in different IR domains has been created. Experimental results show that our approach achieves a good level of accuracy when applied to different domains.
Keywords: Natural Language Processing; POS Tagging; POS Parsing; Machine Learning; Text Mining
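The idea described in the abstract, pairing a generic POS tag with a domain-specific tag for each word, might be sketched as follows. This is a minimal illustration only and not the authors' implementation: the lexicons `GENERIC_TAGS` and `DOMAIN_TAGS`, the tag names such as `PLACE_CITY`, and the functions `tokenize` and `tag` are all hypothetical, and the grammatical rules mentioned in the abstract are not modelled here.

```python
import re

# Hypothetical generic POS lexicon (Penn-Treebank-style labels, for illustration only).
GENERIC_TAGS = {
    "cheap": "JJ",
    "flights": "NNS",
    "to": "TO",
    "weather": "NN",
    "in": "IN",
}

# Hypothetical domain-specific tag-set; the tag names are assumptions,
# not the tags defined in the paper.
DOMAIN_TAGS = {
    "flights": "TRAVEL_SERVICE",
    "paris": "PLACE_CITY",
    "london": "PLACE_CITY",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag(text):
    """Return (token, generic_tag, domain_tag) triples.

    Words missing from the generic lexicon get "UNK"; words with no
    domain-specific entry get None as their domain tag.
    """
    tagged = []
    for token in tokenize(text):
        generic = GENERIC_TAGS.get(token, "UNK")
        domain = DOMAIN_TAGS.get(token)
        tagged.append((token, generic, domain))
    return tagged


if __name__ == "__main__":
    for triple in tag("Cheap flights to Paris"):
        print(triple)
```

In this sketch, a word such as "paris" that is unknown to the generic lexicon still receives a domain tag, which is the kind of domain-related information that generic tags alone would miss.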