puppeteer-cluster | Puppeteer Pool, run a cluster of instances in parallel | Automation library
kandi X-RAY | puppeteer-cluster Summary
Create a cluster of puppeteer workers. This library spawns a pool of Chromium instances via Puppeteer and helps to keep track of jobs and errors. This is helpful if you want to crawl multiple pages or run tests in parallel. Puppeteer Cluster takes care of reusing Chromium and restarting the browser in case of errors.
Community Discussions
Trending Discussions on puppeteer-cluster
QUESTION
package.json:
ANSWER
Answered 2021-Dec-30 at 14:44
The peer dependency error is coming from puppeteer-cluster. It seems that the puppeteer version puppeteer-cluster expects is outdated compared to the puppeteer version you're running at the root of the project (a mismatch between 13.0.1 and 13.0.0).
puppeteer-cluster has its own puppeteer version. You don't need to install both packages (even though the documentation says so).
npm uninstall puppeteer
npm uninstall puppeteer-cluster
npm i puppeteer-cluster
If that doesn't fix it, you could always force the puppeteer-cluster install with npm i puppeteer-cluster --force.
QUESTION
I need to get every client in a table so that I can iterate through them, and use Puppeteer to crawl some data. I need the MySQL query because I gotta pass some params through the querystring.
I'm using Puppeteer, Puppeteer-cluster (due to the hundreds of rows), and MySQL driver.
...ANSWER
Answered 2021-May-21 at 22:37
damn boy, i have things to say :)
- I think the main cause of your issue is the interaction between loops, callbacks, and the cluster. Here is an example to clarify my point on loops:
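The answer's own example is not reproduced here. As a minimal sketch of the kind of loop and callback interaction it refers to, the snippet below contrasts forEach with an async callback (which does not wait) against for...of with await. fetchRow is a hypothetical stand-in for a MySQL query or a queued cluster job; none of these names come from the original question.

```javascript
// Hypothetical stand-in for an asynchronous step such as a MySQL query
// or a page visit; it resolves with the id after a short delay.
function fetchRow(id) {
  return new Promise((resolve) => setTimeout(() => resolve(id), 10));
}

// forEach ignores the promises returned by its callback, so this function
// returns before any row has been collected.
function collectWithForEach(ids) {
  const rows = [];
  ids.forEach(async (id) => {
    rows.push(await fetchRow(id));
  });
  return rows; // still empty at this point
}

// for...of inside an async function waits for each step in turn.
async function collectWithForOf(ids) {
  const rows = [];
  for (const id of ids) {
    rows.push(await fetchRow(id));
  }
  return rows;
}
```

The same reasoning applies when rows from a query callback are queued into a cluster: code placed after the loop runs before the callbacks have finished unless each step is awaited.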
QUESTION
I am trying to use args in my code to use a proxy service I have. If I remove the args altogether things run fine but if I have them in there I get an error stating: Error: Unable to restart chrome. I checked multiple examples and copied the same to my code but it seems to fail. Any ideas on how to implement this correctly?
Code:
...ANSWER
Answered 2021-Mar-19 at 18:55
I played around a bit and discovered that it works fine once the --single-process arg is removed.
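The original code is not shown, so the following is only a sketch of the fix under stated assumptions: the proxy is passed with Chromium's --proxy-server flag, and the proxy address is a placeholder. buildPuppeteerOptions is a hypothetical helper that drops --single-process from the argument list.

```javascript
// Hypothetical helper: builds the options object that puppeteer-cluster
// forwards to Puppeteer, leaving out the flag that caused the restart error.
function buildPuppeteerOptions(proxyUrl, extraArgs = []) {
  const args = [`--proxy-server=${proxyUrl}`, ...extraArgs]
    .filter((arg) => arg !== '--single-process');
  return { headless: true, args };
}

// Placeholder proxy address, not taken from the question.
const puppeteerOptions = buildPuppeteerOptions('http://127.0.0.1:8080', [
  '--no-sandbox',
  '--single-process', // removed by the helper
]);
```

With puppeteer-cluster installed, the result would be passed as the puppeteerOptions setting, for example Cluster.launch({ concurrency: Cluster.CONCURRENCY_CONTEXT, maxConcurrency: 2, puppeteerOptions }).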
QUESTION
Long story short, I've made an app for web scraping, and in order for it to run more than 1 process at a time (more than 1 Chromium opened), I used puppeteer-cluster. I've got it to run several processes at once, but the cluster won't stop afterwards; it runs permanently. Along the way, I've encountered the following error (1)
...ANSWER
Answered 2021-Mar-02 at 01:27
Cluster.launch returns a Promise. If you just call const cluster = Cluster.launch, then cluster is a Promise. When you call (await cluster).close();, (await cluster) resolves to a Cluster instance, so it works.
Let’s use cluster as a Cluster instance instead of a Promise object:
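The answer's code is not reproduced above. A minimal sketch of the idea, assuming the usual launch, task, queue, idle, close flow of puppeteer-cluster: await the result of Cluster.launch once, then work with the resolved instance. Passing the Cluster class in as a parameter is a choice made for this sketch, and crawl is a hypothetical name.

```javascript
// ClusterClass is expected to be the Cluster export of puppeteer-cluster.
async function crawl(ClusterClass, urls) {
  // Awaiting here makes `cluster` a Cluster instance, not a Promise.
  const cluster = await ClusterClass.launch({
    concurrency: ClusterClass.CONCURRENCY_CONTEXT,
    maxConcurrency: 2,
  });

  const titles = [];
  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);
    titles.push(await page.title());
  });

  for (const url of urls) {
    cluster.queue(url);
  }

  await cluster.idle();  // wait until the queue is empty
  await cluster.close(); // shut down the browser so the process can exit
  return titles;
}
```

It could be called as crawl(require('puppeteer-cluster').Cluster, ['https://example.com']). Calling idle() and then close() on the instance is what lets the process stop once all queued jobs are done.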
QUESTION
I'm trying to load a page with a canvas and then save it as an image.
For example, this page. On Chrome, I can right click the canvas with a circle on the upper right side of the page and click save image. I want to do this exact same thing but through NodeJS and Puppeteer. Is this possible?
So far I'm trying to select it via
...ANSWER
Answered 2021-Jan-27 at 22:48
In your example, the canvas is inside an iframe. So you need to get the frame first; then you will be able to transfer the data URL string:
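The code from the answer is not included above. A minimal sketch of the described approach, assuming the page embeds an iframe that contains a canvas element: look through page.frames() for a frame with a canvas, read the data URL with toDataURL() inside that frame, and decode it to a Buffer in Node. The function names and the default canvas selector are assumptions.

```javascript
// Decodes a base64 data URL such as "data:image/png;base64,...." to a Buffer.
function dataUrlToBuffer(dataUrl) {
  const base64 = dataUrl.slice(dataUrl.indexOf(',') + 1);
  return Buffer.from(base64, 'base64');
}

// `page` is a Puppeteer Page. Returns the PNG bytes of the first canvas
// found in any frame, or null when no frame contains a matching element.
async function readCanvasFromFrames(page, selector = 'canvas') {
  for (const frame of page.frames()) {
    const canvas = await frame.$(selector);
    if (canvas) {
      // toDataURL runs in the browser; only the string crosses over to Node.
      const dataUrl = await frame.$eval(selector, (el) => el.toDataURL('image/png'));
      return dataUrlToBuffer(dataUrl);
    }
  }
  return null;
}
```

The returned Buffer can then be written to disk, for example with fs.writeFileSync('canvas.png', buffer).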
QUESTION
I use puppeteer-cluster with Node.js and I am new to it. I need to get an XHR response from a site. I am listening to the page, but I cannot write the resulting value to a variable, and I need to use that value in another part of the code. How can I wait for the function in the listener to execute and write the result to a variable?
...ANSWER
Answered 2020-Nov-05 at 17:49
Something like this:
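The answer's snippet is missing here. One way to approach it, sketched under the assumption that the request can be identified by a substring of its URL: wrap the response listener in a Promise and await it, so the value is available to the code that follows. waitForXhrJson and urlPart are hypothetical names.

```javascript
// `page` is a Puppeteer Page (or any object with on/off for 'response').
// Resolves with the parsed JSON body of the first matching response.
function waitForXhrJson(page, urlPart, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      page.off('response', onResponse);
      reject(new Error(`No response matching "${urlPart}" within ${timeoutMs} ms`));
    }, timeoutMs);

    async function onResponse(response) {
      if (!response.url().includes(urlPart)) return;
      clearTimeout(timer);
      page.off('response', onResponse);
      try {
        resolve(await response.json());
      } catch (err) {
        reject(err);
      }
    }

    page.on('response', onResponse);
  });
}
```

Inside a cluster task, the promise would be created before calling page.goto and awaited afterwards, so the listener is attached before the request fires. Puppeteer's built-in page.waitForResponse covers the same need and may be simpler where it fits.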
QUESTION
So I have seen the puppeteer-cluster package, but it has very manual examples and my situation is very dynamic, so I'll try my best to explain.
OK, so I have an app in which users schedule posts. Once the time of posting arrives, puppeteer runs, goes to the site, logs in the user using creds from my app's db, and posts the content. Fairly simple.
Now the problem arises when, say, 20 users all decide to post today at 1pm. Now puppeteer spawns 25 chromium instances, which messes with the server because of limited RAM. What I am asking basically is how I can achieve the following: 1). Limit puppeteer's concurrency to 10 instances. Any more than that and it should do it in batches, like do 10 first, then close them and start 10 again, etc. 2). If fewer than 10, then just keep normal functionality.
I know this seems like I m giving you homework but trust me i just need some guidance a little help or pointing me in the right direction would suffice. or if you could tell me how to use this: puppeteer-cluster dynamically to suit my needs. Many thanks!
...ANSWER
Answered 2020-Oct-10 at 20:09
- First of all, you need an advanced message queueing system, such as Kafka or RabbitMQ, to capture all the incoming concurrent requests.
- Take the messages in chunks of 10 requests and run a for loop over these chunks, with each iteration creating one cluster per chunk.
- The following code shows how you can accomplish this; it addresses all the questions you listed.
Code snippet:
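The snippet from the answer is not included. A minimal sketch of the batching step it describes, with no message queue and no browser: split the pending jobs into chunks of at most 10 and run each chunk to completion before starting the next. The worker argument is a hypothetical stand-in for the Puppeteer work of one scheduled post.

```javascript
// Splits an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Runs `worker` on every job, at most `batchSize` at a time.
// A batch must finish (successfully or not) before the next one starts.
async function runInBatches(jobs, batchSize, worker) {
  const results = [];
  for (const batch of chunk(jobs, batchSize)) {
    const settled = await Promise.allSettled(
      batch.map(async (job) => worker(job))
    );
    results.push(...settled);
  }
  return results;
}
```

Note that puppeteer-cluster's maxConcurrency setting already caps the number of parallel workers, so a single cluster launched with maxConcurrency: 10 and fed through cluster.queue may be enough without manual batching.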
QUESTION
I wanted to know if anyone using Puppeteer-Cluster could elaborate on how the Cluster.Launch({settings}) protects against sharing of cookies and web data between pages in different context.
Do the browser contexts here, actually block cookies and user-data is not shared or tracked? Browserless' now infamous page seems to think no, here and that .launch({}) should be called on the task, not ahead of the queue.
So my question is, how do we know if puppeteer-cluster is sharing cookies / data between queued tasks? And what kind of options are in the library to lower the chances of being labeled a bot?
Setup: I am using page.authenticate with a proxy service, random user agent, and still getting blocked(403) occasionally by the site which I'm performing the test.
...ANSWER
Answered 2020-Jan-16 at 16:01
Author of puppeteer-cluster here. The library does not actively block cookies, but makes use of browser.createIncognitoBrowserContext():
Creates a new incognito browser context. This won't share cookies/cache with other browser contexts.
In addition, the docs state that "Incognito browser contexts don't write any browsing data to disk" (source), so that restarting the browser cannot reuse any cookies from disk as there were no data written.
Regarding the library, this means when a job is executed, a new incognito context is created, which does not share any data (cookies, etc.) with other contexts. So as long as Chromium properly implements the incognito browser contexts, there is no data shared between the jobs.
The page you linked only talks about browser.newPage() (which shares cookies between pages) and not about incognito contexts.
Some websites will still block you, because they use different measures to detect bots. There are headless browser detection tests as well as fingerprinting libraries that might report you as a bot if the user agent does not match the browser fingerprint. You might be interested in this answer by me that provides a more detailed explanation of how these fingerprints work.
You can try to use a library like puppeteer-extra that comes with a stealth plugin to help you solve the problem. However, this is basically a cat-and-mouse game. The fingerprinting tests might be changed, or other sites might use a different "detection" mechanism. All in all, there is no way to guarantee that a website does not detect you.
In case you want to use puppeteer-extra, be aware that you can use it in conjunction with puppeteer-cluster (example code).
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported