# Paper 80 (Research track)

Semantic labeling for quantitative data using Wikidata

Author(s): Phuc Nguyen, Hideaki Takeda

Full text: submitted version

Abstract: Semantic labeling for quantitative data is the process of matching numeric columns in tabular data to a schema or an ontology. It benefits table search, table extension, and knowledge augmentation. Matching quantitative data poses several challenges, for example, varied data ranges and distributions and, especially, differing measurement units. Previous systems use several similarity metrics to match numeric column values to corresponding semantic labels; however, a lack of measurement units can lead to incorrect labeling, and the attribute columns of different tables may be measured in different units. In this paper, we tackle the problem of semantic labeling across measurement units and scales using a Wikidata background knowledge base (WBKB). We apply hierarchical clustering to build the WBKB from numeric data taken from Wikidata. The structure of the WBKB follows the natural taxonomy of Wikidata and also contains rich information about units of measurement. We consider two transformation methods: z-score-tran, based on standard normalization, and unit-tran, based on the restricted measurement units of each semantic label in the WBKB. We test both transformation methods with six similarity metrics to find the most robust metric for Wikidata quantitative data. Our experimental results show that combining unit-tran with the ks-test metric can effectively find corresponding semantic labels even when numeric columns are expressed in different units.
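The pipeline the abstract sketches (transform the column's values, then compare distributions with a similarity metric) can be illustrated in a few lines of plain Python. This is a hedged reconstruction, not the authors' code: the function names, the conversion factor, and the toy height data are all invented for illustration, and the KS statistic is implemented directly rather than taken from the paper.

```python
# Illustrative sketch (plain Python, no external libraries) of the two
# transformations and the KS-based similarity named in the abstract.
# All function names and the toy height data are hypothetical.

def z_score(values):
    # z-score-tran: standard normalization, (x - mean) / std
    n = len(values)
    mean = sum(values) / n
    std = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    return [(x - mean) / std for x in values]

def unit_tran(values, factor):
    # unit-tran: convert to a canonical unit via a known conversion
    # factor, e.g. centimetres -> metres with factor 0.01
    return [x * factor for x in values]

def ks_statistic(a, b):
    # Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    # between the empirical CDFs of the two samples.
    d = 0.0
    for p in sorted(set(a) | set(b)):
        cdf_a = sum(1 for x in a if x <= p) / len(a)
        cdf_b = sum(1 for x in b if x <= p) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# A column in centimetres vs. reference heights stored in metres.
column_cm = [172.0, 165.0, 180.0, 158.0]
reference_m = [1.72, 1.65, 1.80, 1.58]

raw_d = ks_statistic(column_cm, reference_m)        # units disagree: gap is large
unit_d = ks_statistic(unit_tran(column_cm, 0.01),
                      reference_m)                  # gap nearly vanishes
zscore_gap = max(abs(x - y) for x, y in
                 zip(z_score(column_cm),
                     z_score(reference_m)))         # z-score is scale-invariant
```

Under this toy setup, comparing the raw centimetre column with the metre-valued reference gives the maximal KS distance, while either transformation makes the two distributions comparable, which mirrors the abstract's claim about pairing unit-tran with the ks-test metric.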

Keywords: semantic labeling; quantity; unit of measurement; tabular data; LOD; Wikidata

Decision: reject

Review 1 (by Dagmar Gromann)

(RELEVANCE TO ESWC) While the overall idea of schema annotation and the contribution of an instantiated unit of measurement knowledge base might be relevant to the conference, in the way this is presented I am not even sure that this is what the paper is trying to achieve.
(NOVELTY OF THE PROPOSED SOLUTION) The only novelty over the reference implementation seems to be the use of a different knowledge base, that is, Wikidata instead of DBpedia.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) It is hard to understand what exactly the proposed solution is.
(EVALUATION OF THE STATE-OF-THE-ART) The results are presented in tables without any proper description or discussion.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The properties are not discussed and some points mentioned earlier as part of the method are never actually described in the paper.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) With the provided description of the approach and its results it would not be feasible to reproduce the study.
(OVERALL SCORE) SUMMARY
If understood correctly, this paper seeks to provide an approach for annotating numeric data stored in tables with units of measurement from the Wikidata schema, thereby contributing a resource entitled Wikidata background knowledge base (WBKB).
From the abstract I am left wondering what the paper exactly describes. Which data are labeled using which methods? Only after looking at Figure 1 does it become somewhat clearer what the paper is trying to propose. Unfortunately, this issue with the mode of presentation persists throughout the paper. It is hard to understand what the authors are trying to say. Take for instance the following sentences: "In DBpedia, data extract from template matching from Wikipedia InfoBox." or "It clear that using the sample which is different scale with WNKB is really hard for making the correct labeling."
Furthermore, if I understood correctly, the only innovative part in comparison to Neumaier et al.'s approach is the use of Wikidata instead of DBpedia and testing some additional similarity metrics. Neither the Wikidata queries nor the well-known similarity metrics require such a lengthy description. Instead, the results should be properly described and then also discussed. While the abstract and introduction claim to be using hierarchical clustering to build a knowledge graph, this is not described properly in the paper.
STRONG POINTS:
1) The overall idea and discussed problem might be relevant to this community
WEAK POINTS:
1) This paper is barely legible and highly unclear
2) Parts of the method are discussed in the introduction but never mentioned again anywhere in the paper, e.g., hierarchical clustering
3) Neither the results nor the evaluation are properly described or discussed
Formatting:
The abstract is very long and quite unclear. There should be a heading "Introduction" numbered 1 after the abstract, and the Related Work section should be numbered with an integer, i.e., 2, rather than "0.1". There are typesetting problems, such as p_v_alue (where only the v is subscript), restrictedunits, KullbackLeibler, etc. Introduced acronyms suddenly change, e.g., WBKB becomes WNKB on page 8 and following. Figures are referred to inconsistently as "figure" and "Figure". Figure 2 is barely readable and its axes should be labeled.
Overall evaluation:
I advise rejecting this paper because the proposed approach is, first, presented in a barely understandable way and, second, of little innovation. It applies already existing work to Wikidata, whereas the previous work used DBpedia.
I would like to thank the authors for addressing the questions raised in the review. I still do not think the paper is ready to be published in its current state.

Review 2 (by anonymous reviewer)

(RELEVANCE TO ESWC) The paper moderately overlaps the scope of the conference.
Matching (a very specific form of) existing knowledge to standard ontologies is certainly relevant in general.
The paper should be better motivated and better connected to the Semantic Web literature.
An effort should be made to convince on the value of the class of presented methods in the context of the conference topics.
(NOVELTY OF THE PROPOSED SOLUTION) The approach appears only moderately original:
most of the framework is modeled on the cited work [2].
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) It is very difficult to judge, also due to the problems with the quality of writing
(EVALUATION OF THE STATE-OF-THE-ART) In its brevity, the paper does not discuss related work sufficiently.
Most of the method is referred to [2].
An ad hoc section could be added, given the brevity of the current version of the paper.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Because of the mentioned problem with the quality of writing, it is difficult to judge the technical quality of the proposal.
The purpose of the two transformation methods should be better and more formally presented also in comparison to the state-of-the-art.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The experiments, like the rest of the paper, rely on the setup and resources derived from the targeted system [2].
The experiments do not seem to be comparative, e.g., with respect to the framework in [2], which is repeatedly referred to as a source of inspiration for this work.
I suspect that the provided details would not be sufficient for replicating the experiment.
A comparative experiment would support the significance of this piece of work.
At the current stage it appears to be a case study on a limited problem related to a single data source / knowledge base.
(OVERALL SCORE) The paper presents a method for matching numeric table contents from Wikidata to their units as encoded by WBKB.
- the current version demands a total rewriting and a revision of the layout/organization
- marginally relevant problem to the conference topics
- not convincing about novelty and effectiveness w.r.t. state of the art
In general, it was difficult to read the paper due to the poor quality of writing: a large number of grammar errors and linguistic problems should be resolved before a re-submission.
I’m afraid that the poor presentation does compromise the appreciation of the possible merits of the presented method.
There are too many comments to be made and typos to be listed.
- the first section is missing its title, which is why the subsection numbering starts with a 0.
- the problems with open datasets should be better stated from the very beginning (first section).
- please revise the presentation of the notation in Sect. 1.1 and possibly provide examples of the notions you introduce (there is a lot of margin to extend the paper length)
- NKB == WNKB ?
- Please provide a better explanation for Fig. 2
I would suggest a thorough revision before a re-submission to a later conference.
I also suggest broadening the scope of applicability of the method showing it can compete with existing ones also on at least two other open datasets.
=== AFTER REBUTTAL
I'd like to thank the authors for their answers.

Review 3 (by anonymous reviewer)

(RELEVANCE TO ESWC) The paper is related to the conference, as it targets semantic annotation of numerical data; however, it is not novel.
(NOVELTY OF THE PROPOSED SOLUTION) The paper is a replica of Neumaier et al.
The only novel part is to use unit-conversion.
The algorithm used for constructing the WBKB is also taken from Neumaier et al.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) A lot of details are missing, especially for the WBKB construction and the schema mapping algorithm.
(EVALUATION OF THE STATE-OF-THE-ART) The paper states the previous state-of-the-art research; however, it does not compare against any of it, not even the work of Neumaier et al. that this work is based on.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) The paper does not discuss many details related to the hierarchical clustering algorithm.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) I doubt that the results of the paper can be replicated with the level of details given.
(OVERALL SCORE) ** Summary of the Paper **
The paper proposes a method for constructing a background knowledge base for numerical data based on Wikidata, the Wikidata Background Knowledge Base (WBKB). The authors use the WBKB for semantic labelling of tabular data with quantities. The inputs of the semantic labelling algorithm are a list of numbers, a unit, a property label, and a context description. The expected outputs are a measure, a unit, and an object of measurement, for example: height, meter, person. However, the paper is a replica of [1], and the authors omit many details, referring back to [1].
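To make the interface this summary describes concrete, the inputs and outputs could be modeled as in the following sketch. Every name and example value here is hypothetical, chosen only to mirror the review's description, and nothing is taken from the paper's code.

```python
# Illustrative only: the input/output interface Review 3 describes,
# with hypothetical names and example values (not from the paper's code).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LabelingInput:
    values: List[float]            # numbers from one table column
    unit: Optional[str]            # e.g. "meter"; may be missing in practice
    property_label: Optional[str]  # e.g. "height"
    context: Optional[str]         # surrounding table description

@dataclass
class LabelingOutput:
    measure: str                   # e.g. "height"
    unit: str                      # canonical unit, e.g. "meter"
    object_of_measurement: str     # e.g. "person"

example_in = LabelingInput([1.72, 1.65, 1.80], "meter", "height",
                           "table of athletes")
example_out = LabelingOutput("height", "meter", "person")
```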
** Strong Points **
* Using the unit conversion in the matching stage of column values to the WBKB, which is the main contribution of the paper.
**Weak Points: **
* The paper is poorly written, with many details omitted. Section 2.1 misses the details of the hierarchical clustering algorithm used and only mentions a set of SPARQL queries. It was confusing to understand the construction of the knowledge base without referring back to [1]. The experimental setting is also not clear, and the dataset is not well defined. It is also unclear why a training dataset is needed at all.
* The paper is a replica of the work presented in [1]. I see no novelty in the paper except for using Wikidata as the source for constructing the WBKB.
* The authors did not provide any diagrams or examples of the ontology of the WBKB
* Many sections refer back to [1] without giving further details.
*** Questions to the Authors ***
* How do you use Hierarchical Clustering in the construction of WBKB?
* Why do you need a training dataset? what are the parameters you are training? What is your training procedure?
* Is your dataset only based on Wikidata?
* The only difference I can see between your work and [1] is using Wikidata and the unit conversion part, is that true?
[1] Sebastian Neumaier, Jürgen Umbrich, Josiane Xavier Parreira, Axel Polleres: Multi-level Semantic Labelling of Numerical Values.
Many thanks for addressing the comments raised in the review. Unfortunately, in its current state the paper is not mature enough to be accepted.

Review 4 (by anonymous reviewer)

(RELEVANCE TO ESWC) Semantic annotation of tabular data is a very relevant topic to lift CSV files into knowledge graphs. Annotation of numerical values has been addressed in previous work, but the problem is far from being solved yet.
(NOVELTY OF THE PROPOSED SOLUTION) The novelty of the proposed approach consists in two main contributions:
- the consideration of unit diversity and conversion among units (unit conversion) and the introduction of a normalization function before computing the similarity between two numerical sets.
- the use of Wikidata as a reference knowledge base for annotation of numerical values (WDKB)
Otherwise, the approach extends previous work, in the sense that the paper does not propose a radically new approach to compute the similarity between sets of numerical values.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The paper presents an end-to-end solution to build the reference knowledge base and use it to annotate numerical values. The approach is principled and considers a reference knowledge base that covers a very large number of numerical data types.
(EVALUATION OF THE STATE-OF-THE-ART) To the best of my knowledge, the discussion of related work is complete.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Discussion of experiment results is rather shallow. For example, it would be interesting to provide some examples that demonstrate when the approach is correct (or, better, improves on Neumaier et al.) and when it is not. This is particularly important because absolute performance is not very high, which would require a much more in-depth discussion.
In addition, the authors compare several approaches to compute the similarity between sets of numerical values but do not explain why it is not possible to use all (or a subset) of them to collect more evidence. Are there efficiency issues in adopting such a combinatorial approach?
Finally, some insights on execution times would be useful. The efficiency of the proposed approach is not discussed at all in the paper.
Overall, to add the above-mentioned explanation, you could reduce the space dedicated to similarity metrics, which are taken from previous work (maybe keeping just the few that perform better).
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) Code of the proposed approach is not shared.
The paragraph describing the construction of the three sets used in the evaluation is not very clear. For example, what does "we shuffle random select maximum 50 leave nodes" mean? Since you are constructing your own dataset, I suggest providing some examples used as ground truth.
I also checked supplementary material on GitHub and it is not very well organized. For example, in addition to property identifiers, you could add property labels. Also, the different datasets could be made available in a format that is easier to inspect (e.g., CSV files).
(OVERALL SCORE) SUMMARY
The paper proposes two main technical contributions:
-	the use of WBKB as background knowledge to annotate sets of numerical values with their type
-	the use of unit transformation and normalization functions to improve results, while using similarity metrics proposed in state of the art.
STRONG POINTS
-	Using WDKB is very interesting for annotation of numerical values, because of its large coverage of types and units.
-	The main contributions of the approach are (relatively) well explained (despite the very large number of grammatical errors and typos)
-	Handling unit transformations is an important topic, and using normalization techniques is a principled approach to compare different distributions of values
-	The approach seems to provide better results than a state-of-the-art approach (arguably the best one available as of today for this specific problem)
WEAK POINTS
-	Evaluating the impact of one contribution of the paper, i.e., normalization and unit transformation, also on other datasets used in related work would have made the results more conclusive; for example, why not use the data used in Neumaier et al.?
-	The absolute performance of the approach is still rather low (e.g., 0.11 on type inference when different units are considered; see Table 2); the improvement on Neumaier et al. is also quite limited in terms of absolute numbers (e.g., +0.07 on top-k prop; +0.04 on top-k type; see Table 3). This can be explained by the use of WBKB, which has a very large number of numerical types, but this is a further argument for which an evaluation of unit_tran_ks_test_d also on the same dataset used by Neumaier et al. would have been useful to provide more conclusive results.
- An in-depth discussion of the results is missing; the description of the datasets used in the evaluation should also be improved to ensure repeatability
-	The paper contains a very large number of typos and grammatical errors, and requires a thorough proof-check before being accepted for publication
-	The authors compared their work only with Neumaier et al.; a better argument for choosing only this approach for comparison should be provided (I agree with the choice, but readers less familiar with the topic should be informed)
QUESTIONS
Are there efficiency issues in adopting such a combinatorial approach? Why not combine different similarity measures to improve the results?
Can you better explain such low absolute performance numbers and discuss the slight improvement on Neumaier?
What does "we shuffle random select maximum 50 leave nodes" mean?
What does "second layer is called as a p-o hierarchy which is sub-nodes of type nodes" mean? (you can make an example)
Can you collect transformation rules automatically or did you need to implement the transformation yourselves?
Can you better specify what you mean with "We modify the type measure to the top k neighbors contain the correct type path" (Sec. 2.3)
*** Typos and grammatical errors are literally too many to be listed: the paper requires a careful proof-check by a native English speaker before it can be published ***
Section 1.1
A definition of WBKB is missing; in particular, define a node in the WBKB
v_q \in R --> you may want to use the standard symbol for the real numbers instead of R (i.e., \mathbb{R})
"Semantic labeling system perform K-nearest neighbor to find a corresponding node p in WBKB with property label lvp ," -->  should be better explained
"Each node has the information about the canonical unit and other restrictedunits or scales." --> specify which nodes (all WBKB nodes? only numerical ones?)
In Query 1.2, why not use a variable instead of wdt:P2237? This would show the generality of your query given an input property identifier. The same suggestion applies to Queries 1.3 and 1.4, where you want to emphasize variables.
Section 1.4
Eq. 7 has undefined variables
Section 2.1
"We select the most of 50 properties for building WBKB" --> the most what? Please rewrite
"In total, [...] dif-set" --> not clear, please explain better
* After rebuttal *
I thank the authors for their replies. I think that the work is not yet mature enough to be accepted for publication. However, I encourage the authors to improve the experimental evaluation (possibly testing combinations of the different measures) and the presentation, and to re-submit the paper.

Metareview by Oscar Corcho

As pointed out by the reviewers, the paper is generally difficult to follow and the relationship with some approaches in the state of the art is not completely clear.
