Simple-Web-Crawler | A simple web crawler based on C#/.NET, with support for asynchronous concurrency, proxy switching, Cookie handling, and Gzip acceleration.
kandi X-RAY | Simple-Web-Crawler Summary
A simple web crawler based on C#/.NET, with support for asynchronous concurrency, proxy switching, Cookie handling, and Gzip acceleration.
Simple-Web-Crawler Key Features
Asynchronous, concurrent crawling
Proxy switching
Cookie handling
Gzip acceleration
Simple-Web-Crawler Examples and Code Snippets
Community Discussions
Trending Discussions on Simple-Web-Crawler
QUESTION
I'm building a simple web crawler to automate a newsletter, which means I only need to scrape a set number of pages. In this example it is not a big deal, because the script will only crawl 3 extra pages. But for a different case, this would be hugely inefficient.
So my question is: would there be a way to stop executing request() in this forEach loop? Or would I need to change my approach and crawl pages one by one, as outlined in this guide?
Script ...
ANSWER
Answered 2018-May-01 at 22:31
There's no way of stopping a forEach. You can simulate a stop by checking a flag inside the forEach, but that will still loop through all the elements. By the way, using a loop for an I/O operation is not optimal.
As you have stated, the best way to process a growing set of data is to do it one by one, but I'll add a twist: threaded one-by-one.
NOTE: By "thread" I don't mean an actual thread. Take it more as a definition of "multiple lines of work". Since I/O operations don't block the main thread, while one or more requests are waiting for data, another "line of work" can run the JavaScript that processes the data already received, because JavaScript is single-threaded (not talking about WebWorkers).
It's as easy as having an array of pages, which receives pages to be crawled on the fly, and one function that reads a page from that array, processes the result, and then returns to the starting point (loading the next page from the array and processing its result).
Now you just call that function as many times as the number of "threads" you want to run, and you're done. Pseudo-code:
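The answer's pseudo-code is not included in this excerpt; below is a minimal sketch of the threaded one-by-one pattern it describes, assuming the callback-style request library. The names pagesToCrawl, processBody, crawlNext, and THREADS are illustrative, not part of the original answer.

```javascript
const request = require('request');

// Pages waiting to be crawled; more URLs can be pushed on the fly.
const pagesToCrawl = ['http://example.com/page1', 'http://example.com/page2'];
const THREADS = 3; // how many "lines of work" to run concurrently

function processBody(body) {
  // Parse the page, extract the data you need, and optionally
  // push newly discovered URLs onto pagesToCrawl.
}

function crawlNext() {
  const page = pagesToCrawl.shift(); // take the next page, if any
  if (!page) return;                 // queue drained: this "thread" stops

  request(page, (err, res, body) => {
    if (!err) processBody(body);
    crawlNext(); // return to the starting point: load the next page
  });
}

// Start as many "threads" as you want to run.
for (let i = 0; i < THREADS; i++) crawlNext();
```

Each crawlNext call is one "line of work": a new request is only issued after the previous response has been handled, so THREADS bounds the number of requests in flight, and every "thread" stops on its own once pagesToCrawl is drained.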
QUESTION
ANSWER
Answered 2017-Feb-13 at 05:35
I found the issue, but couldn't resolve it. What I was trying to do was scrape info from a webpage showing the results of a specific search. The issue was that the website is somehow not letting me connect from my Java application using jsoup, probably to protect its contents. That's why the elements I needed were missing: they're actually not there. The website offers an open API for a charge, so I decided to use other websites.
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported