Author(s): Hajer Nabli, Raoudha Bendjemaa, Ikram Amous
Abstract: The enormous amount of Web-based information has a great effect on focused crawlers that aim to provide effective Cloud services. Amid this explosion of information, it is a challenge for focused crawlers to search only for URLs that are relevant to Cloud services. To address this problem, this paper contributes a semantic focused crawler for Cloud services. In particular, we introduce a topic-model-based semantic similarity measure that integrates both semantic and syntactic methods for computing the similarity between texts. First, URLs are ranked in descending order of their semantic priority scores. Then, a Latent Dirichlet Allocation (LDA) topic model is applied to compute the topical similarity between the URL document and the concepts document, which contains a set of keywords related to the given Cloud service category. Moreover, to automatically discover and categorize Cloud services, we present a Cloud Service Ontology (CSOnt) that contains a set of concepts defining Cloud service categories. Experimental results show that the proposed approach improves the performance of focused crawlers and outperforms a focused crawler based on the Best-First approach. In conclusion, the proposed focused crawler offers an efficient way to parse the Web and collect Web pages relevant to Cloud services.
Keywords: Cloud Service Discovery; Cloud Service Ontology; Focused Crawler; LDA Model; TF-IDF; Semantic Similarity
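The abstract does not state which measure compares the LDA topic distributions or how that score is combined with TF-IDF. The minimal Python sketch below therefore illustrates only the two steps it names, under stated assumptions. It assumes topic distributions for each URL document and for the concepts document have already been inferred by a trained LDA model. It uses cosine similarity as a stand-in topical similarity measure and keeps the crawl frontier in descending score order with a priority queue. The names `cosine_similarity`, `Frontier`, and `rank_urls`, the three-topic vectors, and the `example.com` URLs are all hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine similarity between two topic distributions of equal length.

    Assumption: a stand-in measure; the paper's exact measure is not given here.
    """
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


class Frontier:
    """Hypothetical crawl frontier that always yields the highest-scoring URL."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = 0  # tie-breaker: earlier insertions win on equal scores

    def push(self, url: str, score: float) -> None:
        # heapq is a min-heap, so the score is negated to pop the maximum first.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self) -> Tuple[str, float]:
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self) -> int:
        return len(self._heap)


def rank_urls(url_topics: Dict[str, Sequence[float]],
              concept_topics: Sequence[float]) -> List[Tuple[str, float]]:
    """Score each URL document against the concepts document and return
    (url, score) pairs in descending order of topical similarity."""
    frontier = Frontier()
    for url, theta in url_topics.items():
        frontier.push(url, cosine_similarity(theta, concept_topics))
    return [frontier.pop() for _ in range(len(frontier))]


if __name__ == "__main__":
    # Hypothetical 3-topic distributions (not data from the paper).
    concept = [0.7, 0.2, 0.1]
    pages = {
        "https://example.com/storage": [0.6, 0.3, 0.1],
        "https://example.com/news": [0.1, 0.1, 0.8],
        "https://example.com/compute": [0.5, 0.4, 0.1],
    }
    for url, score in rank_urls(pages, concept):
        print(f"{score:.3f}  {url}")
```

A heap keeps each insertion and extraction at O(log n), which suits a frontier that grows as new links are discovered; the insertion counter makes the ordering deterministic when two URLs receive the same score.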