Paper 187 (Research track)

Improved Categorization of Computer-assisted Ontology Construction Systems - Focus on Bootstrapping Capabilities

Author(s): Omar Qawasmeh, Antoine Zimmermann, Maxime Lefrançois, Pierre Maret

Full text: submitted version

Abstract: In this research we investigate the problem of ontology construction using both automatic and semi-automatic approaches. Four categories have been defined in the literature to classify such approaches: 1. Conversion or translation, 2. Mining based, 3. External knowledge based, and 4. Frameworks. In this paper we present an updated state of the art of such approaches using this classification. We also propose an additional classification of existing work according to features that we newly introduce: 1. reusability, 2. type of extracted data, 3. bootstrapping capabilities. These features are crucial for ontology construction because they address two key issues: the blank page problem (i.e. starting the development of an ontology from a blank page) and the limited availability of domain experts. We finally describe an approach for ontology construction that includes the bootstrapping feature. For this feature we take advantage of external knowledge bases. We report on a comparative study between our system and existing ones on the wine ontology.

Keywords: Ontology; Knowledge base; Ontology Bootstrapping

Decision: reject

Review 1 (by Anna Kaspzik)

(RELEVANCE TO ESWC) The construction of high-quality ontologies and its (semi-)automation is highly relevant to ESWC.
(NOVELTY OF THE PROPOSED SOLUTION) The first main contribution of the paper is a survey and classification of existing approaches by criteria taken from another paper ([7]). Concerning its other two contributions, the three new criteria and the additional approach that are introduced seem useful but not out of the ordinary (except for the use of NELL in addition to DBpedia and WikiData) so that in my opinion the strength of the paper cannot be attributed to novel contents. (Suggestion: Maybe one step towards more novelty could have been the use of an ontology for an actual use case instead of the old evergreen example of a wine ontology, as suggested in Section 7 for future work.)
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Since two of the three contributions of the paper consist of a survey and classification of existing approaches, this category can only be applied to the third contribution, an additional bootstrapping functionality in the ontology development process based on large public knowledge bases. This functionality does not require a very high degree of formality and seems correct and complete. The algorithm is given explicitly, and the steps are motivated and explained in sufficient detail.
(EVALUATION OF THE STATE-OF-THE-ART) (A remark up front: The statement that since 2007 there has not been any other survey of the SotA seems to be a bold claim -- a quick Google Scholar search yields for example a paper by Subhashini & Akilandeswari (2011), and there may be others?)
For the State-of-the-Art, the authors adopt the criteria given by the main reference for the SotA ([7]) and classify 13 new approaches according to those criteria. Since the presentation of the SotA is actually one of the main contributions of this paper, it takes up a big portion of the document. Accordingly, the SotA is fairly extensive; however, due to the number of approaches that are classified, each approach is addressed in a few lines only, and one has to refer to the table and references for the actual properties of the respective approach. Since the three new criteria proposed by the authors are actually part of the contribution of the paper, they cannot be judged in this category (normally, the SotA is given in order to prepare the ground for the contributions of a paper) and will therefore be discussed in the next one.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Concerning the three new criteria for the classification of ontology construction methods and their impact on the evaluation of those methods: Although the criteria seem useful they are only listed at the beginning of Section 4 without further motivation for their introduction. The conclusions drawn at the end of Section 4 seem reasonable although they could refer more explicitly to the three newly introduced criteria.
Concerning the proposed additional bootstrapping functionality in the ontology development process: The solution and its prerequisites and resources are explained in sufficient detail. The properties of the approach are demonstrated via the experimental study and discussed very briefly, in two sentences, in the conclusion of the paper.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The experimental study is general since it uses three large public knowledge bases (and a very well-known ontology as an example). The experiment is described in acceptable detail and is thus reproducible although it is modeled after another reference ([27]) and one probably has to refer to [27] in order to understand the rationale of the conducted study in more depth.
(OVERALL SCORE) The paper presents an updated State-of-the-Art for ontology construction methods and defines additional criteria for their classification. The authors also propose an additional bootstrapping functionality in the ontology development process based on large public knowledge bases.
Strong points:
- The State-of-the-Art is based on sound criteria and adds clarity to the general classification of ontology construction methods.
- The three new criteria seem reasonable, and the additional bootstrapping functionality is explained in sufficient detail.
- The use of NELL for ontology population is an interesting approach.
Weak points:
- Maybe the strength of the paper suffers a bit from the fact that it makes two smaller contributions of which one consists in a survey of existing approaches.
- The classification of existing approaches is mainly based on criteria taken from another paper, and the three new criteria are not out of the ordinary.
- The advantages of additional functionality could have been motivated by a real use case (you state at the beginning of Section 6 that this is hard to do but I really think that it would have made your approach more appealing for various applications, and in Section 7 you do mention similar ideas for future work).
Questions to the authors:
- What is the additional value of populating the generated ontology immediately with instances by using NELL? Maybe you could motivate that in more detail in order to underline the strength of your approach.
- Also, can you motivate why you use DBpedia for the extraction of general information and WikiData for classes and relations although superficially both resources provide information in RDF?
- On page 10 you state that you also extract types from DBpedia but in your algorithm I cannot find a variable for that, only for abstracts, labels, and URIs. Where are the types? (See the sketch below for the kind of query output I have in mind.)
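For concreteness, the kind of DBpedia result I would expect the types to come from is sketched below (my own illustration only, assuming the public DBpedia SPARQL endpoint is queried via SPARQLWrapper; the seed resource dbr:Wine and the variable names are not taken from the paper):

```python
# Sketch (reviewer's illustration): fetching abstract, label, and rdf:type
# values for a seed resource from DBpedia. The seed IRI is illustrative.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setQuery("""
    PREFIX dbr:  <http://dbpedia.org/resource/>
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?abstract ?label ?type WHERE {
        dbr:Wine dbo:abstract ?abstract ;
                 rdfs:label   ?label ;
                 rdf:type     ?type .
        FILTER (lang(?abstract) = "en" && lang(?label) = "en")
    }
""")
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()

for row in results["results"]["bindings"]:
    print(row["type"]["value"])  # where do these types appear in Algorithm 1?
```

If the types are collected in this way, Algorithm 1 should expose a corresponding variable alongside the abstracts, labels, and URIs.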
Typos and small mistakes you might want to correct:
p.2: NELL (_Never ... ); 13 _items of_ more recent work
p.3: These problem_s
p.4: up_coming; _a_ relational database schema to _an_ RDF-OWL ... ; able to construct _an_ ontology from _ raw text; 54%_._
p.6: to define _a_ hierarchy; 68.5% of _the_ knowledge concepts; 2. match_ and analyze_ [etc. -- use the infinitive consistently]; There finding _is that_; size of the dictionary of concepts; to _ manually annotated
p.7: [First sentences: maybe mention that the target domain was wine?]; terms and relation_s_ _are_; _the_ Alzheimer glossary; papers on Alzheimer_'s_ disease; a system that process_es_; a list of concepts from them [-- from what?]; _the_ users_'_; approaches_._; life-cycle [consistency]
p.8: e.g. list_s_ of concepts; availability of such _a_ dictionary; offers _a_ better scientific
p.9: _the_ NELL knowledge base; As shown _in_ Figure 2; repeated and _thus_ provide_s_ 
p.10: from _the_ Wikipedia page _for wine_; different queries over _ Wikidata; returns _ the IDs
p.11: web pages_; _the_ NELL knowledge base 
p.12: Recall _that the_ authors in; _The_ authors in [27] use _the_ keyword; comparison i_s_ fair; instances _that_ our system suggests 
p.13: than [27] _which_ is based on WordNet


Review 2 (by anonymous reviewer)

(RELEVANCE TO ESWC) The paper tackles the problem of ontology construction via (semi-)automatic approaches. The problem  is well-known in the Semantic Web community. Thus, the submission is relevant to the conference.
(NOVELTY OF THE PROPOSED SOLUTION) The novelty of the solution seems rather limited although there are some aspects that could be interesting for the Semantic Web community. The paper illustrates two contributions.
Concerning the first contribution, the authors describe various methods for ontology construction and their limits. Such a discussion is really interesting. However, the proposed classification of the state-of-the-art solutions is not really new (as admitted by the authors). As regards the second contribution, an approach for bootstrapping ontologies is proposed, but it is not clear how the described method tries to overcome the drawbacks of the other methods and what benefits a user obtains by adopting the authors' system. In this perspective, the authors should better explain the novelty of the second contribution.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) As regards the survey of the state-of-the-art solutions, the authors describe many methods for ontology construction, although important references are missing (e.g. Völker et al., ESWC 2011).
Focusing on the correctness and completeness of the method for extracting information from DBpedia, WikiData, and NELL, some information is missing. In Sect. 5 the authors should give more details illustrating how the system returns an ontology as an output and, in particular, how the system integrates the extracted information. For instance, the paper does not explain whether the algorithm keeps any information useful to link DBpedia types denoted by URIs (which should represent DBpedia classes) to the concepts/classes extracted from WikiData (e.g. does the user manually add owl:sameAs relationships between DBpedia types and Wikidata classes?).
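To make the question concrete, the kind of link I would expect to see in the output, whether generated automatically or added by the user, is sketched below (my own illustration using rdflib; the DBpedia type and Wikidata class IRIs are merely examples, not taken from the paper, and owl:equivalentClass may be the more appropriate property when both resources denote classes):

```python
# Sketch (reviewer's illustration): materializing a link between a DBpedia
# type and a Wikidata class in the generated ontology. IRIs are examples only.
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

g = Graph()
dbpedia_type   = URIRef("http://dbpedia.org/ontology/Wine")      # type extracted from DBpedia
wikidata_class = URIRef("http://www.wikidata.org/entity/Q282")   # "wine" class extracted from Wikidata

# owl:sameAs as mentioned in the question; owl:equivalentClass would arguably
# be the better choice for linking two classes.
g.add((dbpedia_type, OWL.sameAs, wikidata_class))
print(g.serialize(format="turtle"))
```

The paper should state whether such statements are produced by the system, added manually by the user, or simply omitted.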
(EVALUATION OF THE STATE-OF-THE-ART) The state-of-the-art methods have been widely discussed throughout the paper, but the evaluation seems to consider them only marginally. Moreover, the authors do not give insight into the outcomes obtained by the solutions considered in the evaluation.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The properties of the approach have not been sufficiently discussed. It is hard to understand the motivation for the proposed method (described in Sect. 5) and how the solution overcomes the drawbacks of the existing methods. To improve the paper, the authors could discuss many aspects, such as the introduction of inconsistency cases in the resulting knowledge base, the efficiency of the approach, etc.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The experimental study is sufficiently reproducible: a reader can easily re-implement the system (even if the paper must clarify some aspects of the algorithm) and the outcomes of the evaluation are available. However, I suggest also publishing the source code (e.g. in a GitHub repository).
Concerning the generality of the experimental study, the authors tested the method considering only one keyword as a seed. This is fair with respect to the evaluation of the other methods but, as a consequence, drawing general conclusions about the strengths and weaknesses of the authors' system is hard. To deal with this problem, the authors should test the method considering different seeds and/or different external knowledge bases. Finally, the authors should motivate the choice of the parameters adopted in the experiments, such as the threshold value used for retrieving instances from NELL.
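On the threshold in particular, a simple sensitivity check along the following lines would already be informative (a sketch only; it assumes the relevant NELL beliefs are available locally as a tab-separated dump with a confidence column, and the file name and column name are illustrative):

```python
# Sketch (reviewer's illustration): how the number of accepted NELL instances
# varies with the confidence threshold. File and column names are illustrative.
import csv

def count_accepted(path, threshold):
    accepted = 0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if float(row["Probability"]) >= threshold:
                accepted += 1
    return accepted

for t in (0.80, 0.90, 0.94, 0.99):
    print(t, count_accepted("nell_wine_beliefs.tsv", t))
```

Reporting such counts for a few threshold values would justify the choice of 0.94 much better than a single fixed setting.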
(OVERALL SCORE) *** Summary  of the paper***
The paper proposes a short survey of the state-of-the-art methods adopted for ontology construction and a further approach based on external knowledge bases. The authors use the method for building a new wine ontology and compare it (in terms of the number of classes, properties and instances) against the W3C's wine ontology and another version obtained using WordNet as an external knowledge source.
****Strong and Weak points:****
The strong and weak points of the paper are summarized in the following:
1) Strong points:
1-  Clarity: The paper is well-written and organized.
2- Good survey of the state-of-the-art methods: the paper describes many methods and classifies them according to different dimensions. The limits of the state-of-the-art methods are discussed.
3- Reproducibility: a reader can reproduce the experiments with only a limited effort (although I suggest publishing the source code).
2) Weak Points:
1- Limited novelty: The authors state that the paper has two contributions. Concerning the first one, the classification of the various methods proposed in the literature is not really new (as admitted by the authors). As regards the second contribution, throughout the paper the differences between the method illustrated in the paper and the existing solutions are not clear. In particular, the authors report various drawbacks of the existing solutions but do not discuss which problem they attempted to solve.
2- Limited evaluation: The evaluation is at a very early stage. The paper requires modifications to provide better insight into the results (see "Demonstration and Discussion of the Properties of the Proposed Approach" and "Reproducibility and Generality of the Experimental Study").
3- Some missing references: For instance:
Völker et al. (ESWC 2011), Statistical Schema Induction: the paper describes an approach to elicit an ontology schema through association rule mining. This is close to both the mining-based approaches and the ontology development by conversion described by the authors.
****Questions to the Authors (QAs)****
The questions to the authors are reported below:
QA1- How does the method illustrated in Sect. 5 differ from the existing solutions, and which advantages (efficiency, efficacy) does a user obtain?
QA2- How is the information retrieved from the various knowledge sources integrated to build the ontology (e.g. using owl:sameAs)?
QA3- In the experiments, the authors adopted a threshold of 0.94 for retrieving instances from NELL. How did the authors choose this value?
QA4- Did the authors test the procedure while varying the keywords and the threshold value for building the ontologies?


Review 3 (by Valentina Janev)

(RELEVANCE TO ESWC) The paper is relevant because it addresses a fundamental issue in the Semantic Web - ontology construction.
(NOVELTY OF THE PROPOSED SOLUTION) Two contributions:
- presents a state of the art of automatic ontology construction approaches, which could be further improved in a journal paper
- introduces an algorithm for ontology bootstrapping based on the use of three external knowledge bases
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) I see problems with the objectives of the paper. The paper has two parts: the state of the art (Sections 2, 3 and 4) and the algorithm with the experiment (Sections 5 and 6).
The title reflects the first part - Improved Categorization of Computer-assisted Ontology Construction Systems.
The authors base their work on a categorization developed in 2007 by I. Bedini and B. Nguyen. They slightly improve the categorization - therefore the contribution is not really an "Improved Categorization...".
The system designed by the authors is not fully tested, i.e. there is only one experiment in a single domain.
(EVALUATION OF THE STATE-OF-THE-ART) In Section 3, the authors use four categories to identify the state-of-the-art ontology construction approaches/tools developed in the last 10 years: 1. conversion or translation based, 2. mining based, 3. based on external knowledge integration, and 4. based on frameworks.
In my opinion, the authors missed the eXtreme Design methodology, see Engineering Ontologies with Patterns - The eXtreme Design Methodology (E. Blomqvist, K. Hammar, V. Presutti, 2016).
The work in Section 5 (the authors' system) is not included in the state-of-the-art analysis (page 5).
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Here, I evaluate just the work in Sections 5 and 6. The system is still in the experimental phase.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) It is expected to show some advancement over the existing methods, but that is not proven in this paper.
(OVERALL SCORE) Weak Points (WPs) 
- the state of the art could be further improved
- no experiments with real data: it seems that just one experiment has been conducted. The authors compared the results of this experiment with results from 2006 (H. Kong, M. Hwang, and P. Kim, "Design of the automatic ontology building system about the specific domain knowledge," in Advanced Communication Technology, ICACT 2006, The 8th International Conference, vol. 2, IEEE, 2006) and the W3C wine ontology (2003)
Strong Points:
- the state-of-the-art survey in the selected domain, which in my opinion should be further improved


Review 4 (by Vojtěch Svátek)

(RELEVANCE TO ESWC) Ontology construction is definitely one of the major ESWC topics.
(NOVELTY OF THE PROPOSED SOLUTION) The authors' own approach might be somewhat novel (in the sense of the blend of background sources used), though far from groundbreaking. The comparison with related research is unconvincing; similar methods have been previously proposed. An example (relying on DBpedia and OpenCyC) is
Albert Weichselbraun, Gerhard Wohlgenannt, Arno Scharl, Refining non-taxonomic relation labels with external structured data to support ontology learning, Data & Knowledge Engineering, Volume 69, Issue 8, 2010, Pages 763-778.
But there are definitely many more.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The original OL lifecycle model and the categorization of OL approaches in [7] already seem a bit flawed to me. But the way the authors textually reinterpret them in the current paper seems even more so.
"Extraction: defining the type of the input..." - extraction is clearly not about just defining the type of input!
"Generation: merging the generated ontology..." - you call 'generation' what actually uses a generated ontology as input - this is at least confusing.
"Conversion or translation: ... does not really address the problem of ontology construction." Why? If the source (say, XML or UML) is semantically sound, I believe a solid core of an ontology can be built this way.
"Frameworks: approaches that integrate different modules..." Perhaps it would make sense to speak about a 'hybrid' category encompassing multiple of those above. But the 'framework' aspect has little to do with the actual nature of the OL process. 
As regards the reuse and extension of the categorization by the authors 10 years later:
- Validation, as presented in the table, seems (even more than in the old survey) to reflect the particular setting of a study in the reviewed paper rather than an inherent feature of the approach.
- Reusability as treated here is mostly an aspect of an implemented tool (source code availability or the like) rather than of the approach as such.
- "Classes of extracted data", presented as "classes, properties, instances, relations, and classification of objects", is also a dubious category. Especially in OWL, properties are nearly the same as 'relations', and similarly for 'instances' and 'objects'. The actual content of the fields does not provide much clue either.
- The notion of bootstrapping is not explained. This term is clearly used in a broader sense than in Machine Learning literature. OTOH, if it just means that a system automatically creates a part of the ontology, thus saving the effort of a human designer, then nearly all of OL falls under this category, and the 'type of bootstrapping' is rather 'type of OL'. 
As a survey paper, the work is also strongly flawed by the absence of a clear literature review protocol. The authors merely state that they "present here the most recent approaches (13 new research approaches)". Actually, some are not even that recent (from 2008), and they are not 13, since two of the reported papers (by Kong, and Kietz) have already been in the previous survey.
The individual descriptions of the approaches are verbose and heterogeneous, and even the categorization is sometimes unintuitive. For example, the summary of [12] does not mention any kind of 'framework'.
The description of the authors' own proposed method is inconsistent. While Fig.2 positions the use of NELL after the user input, in Algo 1 it is the opposite. The former also contains an iteration, while the latter doesn't. The two representations of the approach are indeed hard to match.
I also miss formal expressions (SPARQL templates?) of the queries posed to the external resources.
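For instance, even a template of the following shape for the Wikidata step would remove the ambiguity (my own sketch of what I would expect, not taken from the paper; it assumes the seed keyword is resolved to a Wikidata item such as Q282 for wine and uses wdt:P279, "subclass of"):

```python
# Sketch (reviewer's illustration): a parameterized SPARQL template for
# retrieving subclasses of a seed item from the Wikidata endpoint.
WIKIDATA_SUBCLASS_TEMPLATE = """
SELECT ?class ?classLabel WHERE {{
    ?class wdt:P279 wd:{seed_qid} .
    SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
"""

print(WIKIDATA_SUBCLASS_TEMPLATE.format(seed_qid="Q282"))
```

Analogous templates (or query descriptions) for the DBpedia and NELL calls would make the extraction step reproducible without guesswork.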
Item 3 on p.10 speaks about 'relations', but Table 3 does not list any.
(EVALUATION OF THE STATE-OF-THE-ART) The proposed system is not compared with its predecessors working on similar principles. The only direct comparison is made with a 12-year-old approach [27]. The landscape of available external knowledge bases has changed tremendously in the meantime, and newer systems definitely use them, too. (Some of them that cite [27] can be found, e.g., using Google Scholar.)
As regards the survey part of the paper, aside [7] on which the authors build, a couple of other surveys have recently addressed the ontology learning field, see, e.g.,
Petasis G., Karkaletsis V., Paliouras G., Krithara A., Zavitsanos E. (2011) Ontology Population and Enrichment: State of the Art. In: Paliouras G., Spyropoulos C.D., Tsatsaronis G. (eds) Knowledge-Driven Multimedia Information Extraction and Ontology Evolution. Lecture Notes in Computer Science, vol 6050. Springer, Berlin, Heidelberg
Wilson Wong, Wei Liu, and Mohammed Bennamoun. 2012. Ontology learning from text: A look back and into the future. ACM Comput. Surv. 44, 4, Article 20 (September 2012), 36 pages. DOI=http://dx.doi.org/10.1145/2333112.2333115
Drumond, L. & Girardi, R. (2008). A Survey of Ontology Learning Procedures. In F. L. G. de Freitas, H. Stuckenschmidt, H. S. Pinto, A. Malucelli & Ó. Corcho (eds.), WONTO, CEUR-WS.org.
Barforoush, Ahmad & Rahnama, Ali. (2012). Ontology Learning: Revisited. Journal of Web Engineering (JWE). 11. 269-289.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The authors' own approach is not sufficiently demonstrated.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) There is only a tiny experimental study (on the wine ontology). The results are available, but the authors' system cannot be accessed, thus the study cannot be reproduced.
The study also contains the rather subjective steps of 
- judging whether a class is "already part of the W3C’s wine ontology" (actually, the supplementary Google table only speaks about "Similar Classes between the output from our system and the W3C's wine ontology"!)
- judging the additional classes as "relevant for a Wine ontology".
There is no rigorous quality evaluation step for the overall resulting ontology. By the Google page, actually, some classes might be marginal (such as "Swedish wine") and some instances are even completely erroneous.
(OVERALL SCORE) Summary of the paper
====================
It is actually two shorter papers in one. The first is a survey of some published OL methods, attempting to categorize them using some invented criteria. The second is a short statement on the authors' own system, including a small experimental study. Aside from the overall OL topic and the unifying (arguable) message that better ontologies can be obtained from structured data than from texts, the two parts are not clearly connected.
Strong points
====================
- Attempt to cover substantially different OL approaches in one comparative framework
Weak points
====================
- The two parts of the paper do not form a coherent picture. The authors' approach is not even included in the survey table!
- The categorization (reused+)proposed for the survey is arguable.
- The survey lacks a literature retrieval protocol.
- The authors' own approach is not sufficiently explained, never mind thoroughly compared to its nearest kin (the only comparison is made to a 12-year-old competitor, which could only use WordNet!).
- The evaluation of the system is rather tiny and the results not particularly convincing.
- There are also (minor) flaws in English and typography.
Questions
====================
If the perceived discrepancy between your system's graphical workflow and detailed algorithm is only my sloppy reading then please explain this to me.
Response to the rebuttal
====================
I appreciate that the authors reacted to some of my comments. While I see the direction of the paper as promising, the rebuttal does not, however, change my assessment of the paper in its current form.


Metareview by H. Sofia Pinto

The paper deals with the (semi-)automatic construction of high-quality ontologies, a topic that is relevant to the ESWC community. The paper was perceived by reviewers as having two clearly different blocks: a survey and classification of existing approaches, and a system on the same topic. The survey seems to have some problems (detailed in the reviews), and the system lacks comparison and proper evaluation.
Therefore, after rebuttal and discussions among the reviewers the paper could not be recommended for acceptance. However, the authors are strongly encouraged to work on the comments provided by the reviewers and resubmit as a poster for presentation at the conference. We hope to see you at ESWC 2018.

