Representing Network Knowledge Using Provenance-Aware Formalisms for Cyber Situational Awareness
Author(s): Leslie F. Sikos, Markus Stumptner, Wolfgang Mayer, Catherine Howard, Shaun Voigt, Dean Philp
Full text: submitted version
Abstract: Cyber situational awareness is required for a wide range of applications, such as network monitoring, management, vulnerability assessment and defense. Due to the amount of network data available, formal knowledge representation, fusion and reasoning techniques are required to support network analysts’ cyber situational awareness. To this end, Semantic Web technologies have been used to formally represent network data and knowledge. While Semantic Web standards support the level of task automation required, capturing the provenance of RDF statements using Semantic Web standards, while taking scalability into account, is non-trivial. This paper proposes a formally grounded model for representing the semantics of complex communication network concepts, along with data provenance, using terms of the Cyber Situational Awareness Ontology. This novel ontology enables the formal, unified representation of complex network concepts independent of the type of data source so that network analysts can represent expert knowledge and query network data fused from disparate sources.
Keywords: RDF Provenance; Cyber Situational Awareness Ontology; Network Knowledge Discovery; Network Ontology
Review 1 (by anonymous reviewer)
(RELEVANCE TO ESWC) The work presents the design of a Cyber Situation Awareness Ontology using OWL DL. It is directly related to ESWC and a good fit for the Ontology and Schema Track.
(NOVELTY OF THE PROPOSED SOLUTION) The proposed ontology is very domain-specific and can potentially support some nice query scenarios. However, the work itself does not introduce any novelty to the existing knowledge about ontology design patterns. The novelty of the proposed solution is further weakened by the lack of an evaluation of the choice to apply a full DL representation to the ontology, and by the lack of a demonstration of the benefit of SW technologies in this domain (see later).
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) Although the work is nicely motivated in general in the introduction section, the set of concrete requirements (if any) that motivated the design of the ontology is less clear. Much of Section 2 is focused on explaining the formalisation behind the ontology and the choice of ontology design patterns. However, it gives insufficient detail about the general design process of the ontology (e.g. how it might be motivated by requirements or engagement with users), its high-level functions, and the justification of the choices. In contrast, the requirements for different levels of provenance information representation are better discussed. I would say this part of the work is largely based upon existing work, and probably less domain-specific. This is fine, but the authors could make this clearer, and make the domain-specific part of the contributions stronger.
(EVALUATION OF THE STATE-OF-THE-ART) The paper has a brief mention of related ontologies in Section 2, but no in-depth discussion. The authors should also consider including some related standardisation efforts in the application domain, so as to justify the choice of concept names in the ontology.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) On page 12, the authors provide an interesting discussion of the case study. However, the discussion is only one paragraph long and far too brief. Furthermore, the evaluation does not show how full DL reasoning might help with the queries from the given domain, or whether the formalisms might cause any challenges for the queries. If this part of the work is extended, it could considerably strengthen the proposed work.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) No queries or actual data sets were provided.
(OVERALL SCORE) The paper proposes a domain-specific Cyber Situation Awareness ontology using OWL DL. The work is nicely motivated, and the authors provide detailed descriptions of the choice of ontology design patterns. The work is motivated by some domain-specific queries, which might benefit from clearer semantics about the data. The authors have clearly put a lot of thought into the conceptualisation and formalisation of the domain, and they provide some interesting lessons learnt from working with the application domain.
Strong points:
- a novel application domain
- nice formalisation of the domain
- interesting lessons from practical applications
Weak points:
- lack of details about the design process and justifications
- lack of sufficient evaluation
- lack of details about the datasets and queries used for the evaluation
Questions to authors:
- How were the requirements that motivate the ontology design collected, and who was involved?
- Figure 1 is really hard to read.
- What is the set of high-level functions that the ontology is designed to support?
- What kind of reasoning engine and query infrastructure was used to generate the results on page 12?
I thank the authors for the rebuttal. However, not all my questions were answered, and the authors did not propose solutions to fix the major issues of the paper.
Review 2 (by Javier D. Fernández)
(RELEVANCE TO ESWC) The authors propose an ontology to represent the information and provenance of communication networks, of special interest for cyber situational awareness scenarios. The topic could be of interest for ESWC, although the paper lacks the clarity needed to understand the concrete research challenges it addresses.
(NOVELTY OF THE PROPOSED SOLUTION) If I understood correctly, the main contribution of the authors is the cyber situational awareness ontology. However, besides the modelling challenge per se (which could be seen as a resource paper), I failed to see the concrete research contribution of the authors, in particular regarding provenance (besides using named graphs). A clarification from the authors in the rebuttal could perhaps help.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The solution seems correct, but the paper is hard to read and most of the complex examples are not explained (e.g. on page 6). It is not clear why the authors need the extensive formalisation in Section 2.1, which is mostly state of the art and not really needed in the following sections. In the same section, the authors write (s,o,p) for a triple instead of the typical (s,p,o), and the meaning of get (c) is not clear (GET of which resource? What is c?).
(EVALUATION OF THE STATE-OF-THE-ART) The authors do not have an appropriate state-of-the-art section, and most of the concepts referred to in the text (such as singleton properties) are not properly cited. Some acronyms (such as SWRL) are not explained either.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) In general, the paper is really unorganised. I would recommend that the authors refocus the paper with clarity in mind, clearly stating the contributions of the work and explaining every step with the appropriate level of detail. If the main goal is the definition of the ontology, the resource track might be more suitable, but then a digestible summary and statistics of the ontology must be provided.
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The paper presents a case study in which the provenance information represented via named graphs is checked. However, the purpose is unclear and the evaluation is limited to a single dataset.
(OVERALL SCORE) I thank the authors for their responses. Nonetheless, I still think that the work is interesting but requires a major revision to provide the clarity and formalism needed to understand the actual contribution. The paper addresses the challenge of representing and integrating information for cyber situational awareness applications. To do so, the authors propose a new ontology to represent the information and provenance of communication networks. Besides presenting theoretical foundations, the authors discuss different mechanisms to represent provenance (where named graphs are used), and a case study is briefly presented. As I mentioned in my comments above, in general the paper is a bit immature and lacks the structure and clarity necessary to understand the full extent of the work. As stated, I would recommend that the authors reshape the explanation of the work with clarity in mind.
Strong points:
- Interesting topic
- Little work exists on ontologies for cyber situational awareness
- Efficient provenance representation is still an open challenge
Weak points:
- The research contributions of the paper are unclear
- The paper lacks clarity and structure
- Formalization and examples are hard to process
Questions for the authors:
- What are the concrete research contributions of the paper?
- If provenance is managed with named graphs, how does this differ from the state of the art?
Review 3 (by anonymous reviewer)
(RELEVANCE TO ESWC) The paper introduces an ontology which captures properties of communication networks. A certain focus is put on the provenance of RDF statements.
(NOVELTY OF THE PROPOSED SOLUTION) The key novel aspect is to model in one ontology various network aspects which so far exist in different formats and at different levels of expressiveness. The aspect of provenance has been studied exhaustively; it is not fully clear which requirements cannot be handled by existing approaches (including PROV) or by extending existing approaches.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) The paper introduces the theoretical properties and the expressiveness of the proposed formalisation. However, important aspects are missing, e.g. the answers to the questions "which questions should the ontology be able to answer?" and "how well does the ontology answer these questions?"
(EVALUATION OF THE STATE-OF-THE-ART) Some existing approaches are outlined, e.g. in the introduction section. However, the paper lacks a dedicated state-of-the-art section with a structured comparison to other work.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) Key aspects needed to understand the value of the ontology are missing, in particular an explanation of the modelled content in addition to the theoretical properties of the modelling decisions, as well as answers to the questions "what has been modelled?", "why has it been modelled this way?", etc. According to the paper, 52 classes and 123 properties have been modelled. Only a small fraction of these modelled constructs are explained in the paper. In addition, Figure 1 is hardly readable (this also holds for Figure 2).
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The ontology is provided as a download link. The usefulness of the ontology should be shown, e.g. by providing evidence from usage in practical settings.
(OVERALL SCORE) I thank the authors for the rebuttal answers!
The paper presents a specific domain ontology.
- Strong points: The ontology aims to integrate various aspects of network properties of the internet infrastructure to facilitate the analysis of network structures by making them queryable. The theoretical properties of the formal approach are covered, thus indicating, for example, the level of expressivity.
- Weak points: Many modelling decisions remain unclear, in particular what has been modelled and why it has been modelled in this way. The ontology is available via a download link; however, no experimental study confirms the usefulness of the modelled ontology. No documented adoption or reuse of the ontology by other parties has been shown.
Review 4 (by anonymous reviewer)
(RELEVANCE TO ESWC) The paper employs semantic web technologies.
(NOVELTY OF THE PROPOSED SOLUTION) Novel combination of elements, but not convincing.
(CORRECTNESS AND COMPLETENESS OF THE PROPOSED SOLUTION) This review criterion is problematic. In what sense is a proposed ontology "correct" or "complete"? (A set of inference rules, e.g., a tableau calculus, can be correct and/or complete - but a specific ontology??)
(EVALUATION OF THE STATE-OF-THE-ART) Some high-level comparisons were given. The review criterion of "evaluation of state-of-the-art" sounds problematic.
(DEMONSTRATION AND DISCUSSION OF THE PROPERTIES OF THE PROPOSED APPROACH) No convincing application of the framework was presented. What specific questions or applications are enabled by this?
(REPRODUCIBILITY AND GENERALITY OF THE EXPERIMENTAL STUDY) The experimental study didn't contain specific questions or queries. It's not clear how a "good" (e.g., general) study would be distinguished from a "bad" (e.g., less general) study.
(OVERALL SCORE) I would have argued "weak reject" but there is no such category. Some detailed comments below:
* Abstract: The paper proposes a "formally grounded model" for representing the semantics of complex communication network concepts, along with data provenance, using terms of the Cyber Situational Awareness Ontology. It is not immediately clear whether the framework (and the ontology) can be accessed. More importantly, no application seems to be available: Are there no example questions that have been implemented in the framework?
* Section 1: What is cyber situational awareness? This isn't explained (although the example questions provide some idea of what is meant). "Because these sources are disparate and heterogeneous, data integration requires syntactic and semantic interoperability, which can be achieved by formal knowledge representation standards, such as RDF." This suggests that RDF is a solution to the integration of disparate and heterogeneous data.
However, RDF is no magic bullet for any of the large number of specific data integration problems. (See e.g. Lenzerini, M. Data integration: A theoretical perspective, PODS 2002, for an overview of data integration from a database angle.)
* Section 2: Presents a "formal grounding" (Sec. 2.1), the Cyber Situational Awareness Ontology (Sec. 2.2), and a provenance-aware network knowledge representation (Sec. 2.3). Sec. 2.1 contains the usual formalities / preliminaries. There are some minor issues, e.g., "... based on three countably finite ... sets". Did you mean to say *infinite* here? (Finite sets are obviously countable.) The formulas "\exists R.C" and "\forall R.C" are not quantifiers (instead they are quantified formulas and *contain* the quantifiers "\exists" and "\forall"). Sec. 2.2 contains the details of the ontology, but also some statements that are vague and hard or impossible to verify (or falsify), e.g. ".., our model prefers DL axioms over SWRL rules whenever possible to ensure decidability." Are there cases when you have to use SWRL rules? Why and where? Do you then lose decidability? Earlier in the paper you mention that you need a "specific vocabulary" rather than a "generic" one. But I see no specific concepts defined. Quite the contrary: the terms "Type1" through "Type11" sound really generic. Of course they might be defined elsewhere, but the reader does not gain an understanding of what the specific concepts mean. In Sec. 2.3 you use terms such as "issues", "not appealing", and "not ideal". The corresponding statements are rather vague.
* Section 3 presents a "case study", but there is really just some repetition of terminology, some syntactic variants, and unfortunately no concrete application of the ontology: What domain questions, queries, and inferences are supported by your framework? As a result, it's also not clear why a more traditional approach (e.g., the relational model) couldn't be used.
Metareview by Hsofia Pinto
The authors propose an ontology to represent the information and provenance of communication networks, of special interest for cyber situational awareness scenarios. The topic is interesting and within the scope of ESWC. However, the submission lacks sufficient detail and a proper evaluation. After the rebuttal comments from the authors, the reviewers maintain their view that the paper does not yet meet the standards required for this venue. Therefore, at this time, the paper is not recommended for acceptance.