New releases 2014 (as of 2020-02-01)
Yan Wang
Query Selection in Deep Web Crawling
Help your crawler efficiently retrieve data from the largest data sources in the Web
2014. 152 pp. 220 mm
Publisher/Year: SCHOLAR'S PRESS 2014
ISBN: 3-639-71245-5 (3639712455)
New ISBN: 978-3-639-71245-2 (9783639712452)
The deep web is content that is dynamically generated from data sources such as databases or file systems. Unlike the surface web, where pages are collected by following the hyperlinks embedded in pages already collected, data in a deep web data source is guarded by a search interface and can be retrieved only by issuing queries. The amount of data in the deep web far exceeds that of the surface web. This calls for deep web crawlers that excavate the data so that it can be used, indexed, and searched in an integrated environment. Crawling the deep web is the process of collecting data from search interfaces by issuing queries. One of the major challenges is selecting the queries so that most of the data can be retrieved at a low cost. This work first gives a comprehensive introduction to state-of-the-art query selection techniques for crawling, then analyzes in depth the remaining problems, such as the cold start problem and the return limit problem, and finally presents a novel technique to address them.
Yan Wang is an assistant professor of computer science at the Central University of Finance and Economics in China. He studied computer science at the University of Windsor in Canada and received his PhD and Master's degrees in 2012 and 2007, respectively. His research interests are in the areas of the deep web, online social networks, and web analytics.