Author(s): Emrah İnan, Vahab Mostafapour, Burak Yonyul, Oguz Dikenelli
Abstract: Knowledge bases are used in different semantic data mining algorithms such as semantic search, question answering, and information extraction.
A vast number of open-domain knowledge bases are available for these algorithms. However, it is difficult to automatically generate annotated datasets for domain-specific tasks. This study presents SIEge, a multilingual, domain-specific semantic embeddings generator for semantic information extraction tasks such as entity linking and relation extraction. SIEge also generates evaluation datasets for specific domains using Wikipedia and DBpedia. Wikipedia category pages and the DBpedia taxonomy are used to adapt annotated text generation to a given domain. In this study, we publish a semantic embeddings model and evaluation datasets for entity linking and relation extraction in the movie domain, all of which are publicly available.
Keywords: Evaluation Dataset; Semantic Embeddings; DBpedia; Wikipedia