Paper 226 (Resources track)

Build a corpus of scientific articles with semantic representation

Author(s): Jean-Claude Moissinac

Abstract: As part of the SemBib project, we built a semantic representation of the scientific production of Telecom Paristech. Beyond its internal objectives, this enriched corpus serves as a basis for experimentation and as a teaching resource. The work relies on text mining methods to build knowledge graphs, and then on producing analyses from these graphs. The main proposal is a methodology for producing disjoint graphs, each with a clearly identified role. This separation allows differentiated uses and, in particular, the comparison of methods for producing and exploiting the graphs. This article is above all a methodological proposal for organizing the semantic representation of publications using text mining methods. The proposed method supports the progressive enrichment of representations, with the possibility of evaluating each step.

Keywords: semantic publishing; publication; Linked Data; SPARQL

