Where is my URI?
Author(s): Andre Valdestilhas, Tommaso Soru, Markus Nentwig, Edgard Marx, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo
Full text: submitted version
Abstract: One of the Semantic Web foundations is the possibility to dereference URIs to let applications negotiate their semantic content.
However, this exploitation is often infeasible as the availability of such information depends on the reliability of networks, services, and human factors.
Moreover, it has been shown that around 90% of the information published as Linked Open Data is available as data dumps and 84% of endpoints are offline.
To this end, we propose a Web service called “Where is my URI?” (WIMU).
Our service indexes URIs and their use so that Linked Data consumers can find the RDF data sources that describe a URI when this information cannot be retrieved from the URI alone.
We rank the corresponding datasets following the rationale that a dataset contributes to the definition of a URI in proportion to the number of literals it provides for that URI.
Finally, we describe potential use cases of applications that can immediately benefit from our simple yet useful service.
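The indexing-and-ranking idea in the abstract can be illustrated with a minimal in-memory sketch. It assumes each dataset arrives as a list of (subject, predicate, object, object_is_literal) tuples and reads "in proportion to the number of literals" as a plain count of literal-valued triples whose subject is the URI. The names `build_index` and `lookup` and the sample data are hypothetical; this is not the WIMU implementation, which works over dumps from LODStats and LOD Laundromat at a far larger scale.

```python
from collections import Counter, defaultdict

# Assumed input shape: {dataset_id: [(subject, predicate, object, object_is_literal), ...]}


def build_index(datasets):
    """Map every URI to a Counter {dataset_id: score}.

    Hypothetical scoring rule: a dataset's score for a URI is the number of
    literal-valued triples in that dataset whose subject is the URI.
    """
    index = defaultdict(Counter)
    for dataset_id, triples in datasets.items():
        for s, _p, o, o_is_literal in triples:
            # Record that the dataset mentions the subject (score 0 so far).
            index[s][dataset_id] += 0
            if o_is_literal:
                # A literal attached to s adds to this dataset's description of s.
                index[s][dataset_id] += 1
            else:
                # A URI object is recorded as mentioned, without literal credit.
                index[o][dataset_id] += 0
    return index


def lookup(index, uri):
    """Return [(dataset_id, score), ...], highest score first, ties broken by name."""
    counts = index.get(uri, Counter())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up example data.
datasets = {
    "dump-a": [("ex:A", "rdfs:label", '"A"', True),
               ("ex:A", "ex:knows", "ex:B", False)],
    "dump-b": [("ex:A", "rdfs:label", '"A (b)"', True),
               ("ex:A", "rdfs:comment", '"about A"', True),
               ("ex:B", "rdfs:label", '"B"', True)],
}
index = build_index(datasets)
print(lookup(index, "ex:A"))  # [('dump-b', 2), ('dump-a', 1)]
```

Keeping zero-score entries means a dataset that only mentions a URI (for example, as an object) is still returned, just ranked last.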
Keywords: Link Discovery; Linked Data; Dumps; URI; Dereferenceable
Review 1 (by Amelie Gyrard)
I have read the authors' rebuttal. The demo was running today: http://18.104.22.168:8080/LinkLion2_WServ/. The link https://dice-group.github.io/wimu/ should be added to the web site.
---
Summary: The authors provide a Web service called Where is my URI (WIMU), since around 90% of the information published as Linked Open Data is available as data dumps and 84% of endpoints are offline. The authors explain the index creation, the web interface, and the data processing. When a URI is provided by multiple data sources, the datasets are ranked. The authors claim that they process more than 58 billion unique triples from more than 660,000 datasets obtained from LODStats and LOD Laundromat. Three use cases are explained: (1) data quality and data interlinking, (2) finding class axioms, and (3) statistics about the dataset.
Advantages:
• We need such tools.
Drawbacks:
• The use case section is not clear enough.
• The resource link was dead when tested.
Resources:
• The web service https://w3id.org/where-is-my-uri/ (when tested on 22 January, it did not work: “This site can’t be reached”)
• The source code is available online at https://github.com/dice-group/wimu under the GNU Affero General Public License 3.0.
Suggestions for improvements:
• “around 21% of the information published as Linked Open Data is available as data dumps” –> prove that.
• “more than 58% of endpoints are offline“ –> prove that. Did you learn that from those projects (SPORTAL [2], SPARQLES [3])?
• “We also rank the data sources in case a single URI is provided by multiple data sources” -> check the Linked Open Vocabularies (LOV) project and its journal publication [1], since LOV counts the number of times an ontology is used by other ontologies.
• Page 3: “:hasURI, :hasDataset, :hasScore” -> which ontology has been used? Did you design your own ontology? More explanations are expected.
• Page 5: “we present two use-cases” -> but you have three subsections.
• Page 5: concise bounded descriptions (CBDs) -> Concise Bounded Descriptions (CBDs)
• Page 5: “Linked Data Lifecycle” -> add a reference to the LOD2 project?
• Page 6: the figure is not well explained and is not clear enough.
• Check the Semantic Web Best Practices project [4]. Similar studies were done there, recording the errors encountered when loading ontologies.
[1] Linked Open Vocabularies (LOV): a gateway to reusable semantic vocabularies on the Web [Vandenbussche et al. 2017]
[2] http://www.sportalproject.org/
[3] http://sparqles.ai.wu.ac.at/?
[4] http://perfectsemanticweb.appspot.com/
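To make the question about `:hasURI`, `:hasDataset` and `:hasScore` concrete, here is a hedged sketch of how a ranked index entry could be serialized as N-Triples, with one blank node per (URI, dataset, score) entry. The namespace `http://example.org/wimu#`, the blank-node layout, the dump URL and the score are all placeholders; the paper does not state which ontology these properties belong to.

```python
# Placeholder namespace: which ontology defines these properties is not stated.
WIMU = "http://example.org/wimu#"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def entries_to_ntriples(entries):
    """Serialize (uri, dataset, score) entries as N-Triples, one blank node per entry."""
    lines = []
    for i, (uri, dataset, score) in enumerate(entries):
        node = f"_:entry{i}"
        lines.append(f"{node} <{WIMU}hasURI> <{uri}> .")
        lines.append(f"{node} <{WIMU}hasDataset> <{dataset}> .")
        # Scores are typed as xsd:integer literals.
        lines.append(f'{node} <{WIMU}hasScore> "{int(score)}"^^<{XSD_INTEGER}> .')
    return "\n".join(lines)


# Hypothetical entry: made-up dump URL and score.
out = entries_to_ntriples([
    ("http://sws.geonames.org/4896861/", "http://example.org/dumps/geonames.nt", 3),
])
print(out)
```

A blank node per entry keeps each (URI, dataset, score) triple of values grouped; a published index would more likely mint stable IRIs for entries so they can be referenced.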
Review 2 (by Alasdair Gray)
Review 3 (by Vinh Nguyen)
This paper describes a service that can find the datasets for a given URI. The service is conceptually useful because endpoints are often unavailable and not everyone can afford to host many LOD datasets locally. The service is fully available online and could potentially be reused in scenarios that require some checks before downloading entire dataset files, which are sometimes big.
Reasons to accept:
- The web interface and REST service basically work; I tested them all.
- The code is available on GitHub, and instructions are provided to reproduce it.
- The service indexes the datasets from LODStats and LOD Laundromat and uses these indices to look up the datasets that contain a given URI.
- The returned datasets are heuristically ranked by the number of literals they contain for the given URI, which makes sense.
Reasons to reject:
- Although the authors describe some use cases where the service could be utilized, it has not yet been used by any application or community. It may have only a few applications.
- The input URI must exactly match the entire URI in the datasets; otherwise, the dataset cannot be found. For example, the URI http://sws.geonames.org/4896861/ gives some results, while the URI http://sws.geonames.org/4896861 gives NOTHING! The two URIs are not much different from a human point of view! It took me quite a long time to realize that I got different results because I had given different input strings. This is likely to confuse other users too.
- In addition to URI lookups, what if the input is an entity without a full URI? Can this service search for datasets given such an entity, e.g. BarackObama, if someone is interested in this entity without knowing its full URI or the datasets containing information about it? I think the impact may increase if this were supported. The exact matching of URIs mentioned above rules out many applications of entity lookups.
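Both lookup limitations raised in this review could be eased at query time. The sketch below is hypothetical and not part of the WIMU code: it assumes an index mapping URIs to `{dataset: score}` dictionaries, retries a full URI with its trailing slash toggled, and treats an input without a scheme as a bare entity name matched against the last path or fragment segment of the indexed URIs. The function names and the example index are made up for illustration.

```python
import re


def local_name(uri):
    """Last '/'- or '#'-separated segment, ignoring a trailing slash."""
    return re.split(r"[/#]", uri.rstrip("/"))[-1]


def tolerant_lookup(index, query):
    """index: {uri: {dataset_id: score}}. Returns {matched_uri: {dataset_id: score}}."""
    query = query.strip()
    if "://" in query:
        # Full URI: try it verbatim first, then with the trailing slash toggled.
        toggled = query[:-1] if query.endswith("/") else query + "/"
        for candidate in (query, toggled):
            if candidate in index:
                return {candidate: index[candidate]}
        return {}
    # Bare entity name such as 'BarackObama': linear scan over local names.
    return {uri: hits for uri, hits in index.items() if local_name(uri) == query}


# Made-up index for illustration.
index = {
    "http://sws.geonames.org/4896861/": {"geonames-dump": 3},
    "http://dbpedia.org/resource/BarackObama": {"dbpedia-dump": 12},
    "http://example.org/people/BarackObama": {"example-dump": 1},
}
print(tolerant_lookup(index, "http://sws.geonames.org/4896861"))
# {'http://sws.geonames.org/4896861/': {'geonames-dump': 3}}
print(sorted(tolerant_lookup(index, "BarackObama")))
# ['http://dbpedia.org/resource/BarackObama', 'http://example.org/people/BarackObama']
```

Treating the two slash variants as one is a heuristic: in RDF they are distinct identifiers, which is why the sketch returns the URI that actually matched. At the scale of billions of triples, the linear scan would have to be replaced by a secondary index keyed on local names.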
Review 4 (by anonymous reviewer)
The authors present a repository/service of indexed URIs and corresponding RDF data resources. The service addresses a current problem that we still have and is a quite nice addition in terms of URI/RDF resolution tools. The paper is nicely written and easy to follow. Links to the respective implementations are provided. Some detailed comments:
1) Do not use fancy adjectives to describe your work unless you have already shown that they apply. For example, the introduction mentions “scalable and time-efficient deployment of SW applications”. Where does this come from? Why time-efficient, and why scalable? Introduction, page 2: “efficient, low cost and scalable service”. Are you developing a paid service? What costs are you talking about? How is the service scalable if you are publishing updates only once a month? I am not against making big claims, but avoid sounding too marketing-like without actual support for the statements.
2) Section 3.1, steps 3 and 4, especially 4: I am quite sure that some people from the community would immediately ask why step 3 and then step 4, and how exactly you do that. Consider adding a paragraph on that, just to be on the safe side.
3) One obvious question that comes to mind is: what is the innovation, if this is actually only an indexed merge of LODStats and LOD Laundromat? How easy is it to include further sources? Devote some space to clearly explaining that the benefits are much greater and that it is not just a merge of two repositories.
4) Section 4.3: formatting problem in the first paragraph. You can use \sloppy to fix that.