Crawler software

Crawler software is commonly used to collect large amounts of information; a crawler that harvests information by exploiting vulnerabilities is called a malicious crawler. A web crawler is a program that automatically retrieves web pages. It downloads pages from the World Wide Web for search engines and is an important component of them.

A traditional crawler starts from the URLs of one or more seed pages and extracts the URLs found on those pages. As it crawls, it continuously extracts new URLs from the current page and puts them into a queue until the system's stop conditions are met.

The workflow of a focused crawler is more complex. Using a web page analysis algorithm, it filters out links that are irrelevant to the topic, keeps the useful links, and places them in the queue of URLs waiting to be fetched. It then selects the next URL from the queue according to a search strategy and repeats this process until a system condition is reached. In addition, all pages fetched by the crawler are stored, analyzed, filtered, and indexed by the system for later query and retrieval; for focused crawlers, the analysis results obtained in this process may also feed back into and guide future crawling.
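The queue-driven workflow described above can be illustrated with a short sketch. The following Python example is only a minimal illustration under stated assumptions, not production software: the seed URL, the `max_pages` stop condition, and the keyword-based `is_relevant` filter (a hypothetical stand-in for a real page analysis algorithm) are all assumptions made for this example.

```python
# Minimal breadth-first crawler sketch (Python standard library only).
# Seed URL, stop condition, and relevance test are illustrative assumptions.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def is_relevant(url, text, keyword="crawler"):
    # Hypothetical stand-in for a page analysis algorithm: a real focused
    # crawler would use a far more sophisticated topic model than this test.
    return keyword in text.lower()


def crawl(seed, max_pages=10, focused=False):
    queue = deque([seed])          # URLs waiting to be fetched
    visited = set()                # URLs already fetched
    store = {}                     # fetched pages kept for later indexing

    while queue and len(store) < max_pages:   # stop condition of the system
        url = queue.popleft()                 # breadth-first search strategy
        if url in visited:
            continue
        visited.add(url)
        try:
            with urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue                          # skip unreachable pages

        # A focused crawler filters out pages irrelevant to the topic.
        if focused and not is_relevant(url, html):
            continue

        store[url] = html                     # store for analysis/indexing

        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)     # resolve relative links
            if absolute.startswith("http") and absolute not in visited:
                queue.append(absolute)        # new URLs join the queue

    return store


if __name__ == "__main__":
    pages = crawl("https://example.com/", max_pages=5)
    print(f"Fetched {len(pages)} page(s)")
```

In the default breadth-first variant every extracted link joins the queue; passing `focused=True` adds the relevance filter that, in this sketch, plays the role a web page analysis algorithm plays in a focused crawler.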