puppeteer-cluster | Puppeteer Pool, run a cluster of instances in parallel | Automation library

by thomasdondorf | TypeScript | Version: 0.24.0 | License: MIT

kandi X-RAY | puppeteer-cluster Summary

puppeteer-cluster is a TypeScript library typically used in Automation and Node.js applications. It has no bugs, no vulnerabilities, a permissive license, and medium support. You can download it from GitHub.

Create a cluster of puppeteer workers. This library spawns a pool of Chromium instances via Puppeteer and helps to keep track of jobs and errors. This is helpful if you want to crawl multiple pages or run tests in parallel. Puppeteer Cluster takes care of reusing Chromium and restarting the browser in case of errors.

            kandi-support Support

              puppeteer-cluster has a medium active ecosystem.
              It has 2798 star(s) with 272 fork(s). There are 39 watchers for this library.
              It had no major release in the last 12 months.
              There are 97 open issues and 139 have been closed. On average issues are closed in 110 days. There are 21 open pull requests and 0 closed requests.
              It has a neutral sentiment in the developer community.
              The latest version of puppeteer-cluster is 0.24.0

            kandi-Quality Quality

              puppeteer-cluster has 0 bugs and 0 code smells.

            kandi-Security Security

              puppeteer-cluster has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              puppeteer-cluster code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              puppeteer-cluster is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            kandi-Reuse Reuse

              puppeteer-cluster releases are available to install and integrate.
              Installation instructions, examples and code snippets are available.

            Community Discussions

            QUESTION

            puppeteer-cluster@0.20.0 requires a peer of puppeteer@^1.5.0 || ^2.0.0 but none is installed. You must install peer dependencies yourself
            Asked 2021-Dec-30 at 14:44

            package.json:

            ...

            ANSWER

            Answered 2021-Dec-30 at 14:44

            The peer error is coming from puppeteer-cluster. It seems that the puppeteer-cluster puppeteer version is outdated compared to the puppeteer version you're running at the root of the project (mismatch between 13.0.1 and 13.0.0).

            puppeteer-cluster has its own puppeteer version. You don't need to install both packages (even though the documentation says so).

            1. npm uninstall puppeteer
            2. npm uninstall puppeteer-cluster
            3. npm i puppeteer-cluster

            If that doesn't fix it you could always force puppeteer-cluster install through npm i puppeteer-cluster --force.

            Source https://stackoverflow.com/questions/70530700

            QUESTION

            Async/await - passing data to Puppeteer in a MySQL callback
            Asked 2021-May-21 at 22:37

            I need to get every client in a table so that I can iterate through them, and use Puppeteer to crawl some data. I need the MySQL query because I gotta pass some params through the querystring.

            I'm using Puppeteer, Puppeteer-cluster (due to the hundreds of rows), and MySQL driver.

            ...

            ANSWER

            Answered 2021-May-21 at 22:37

            damn boy, i have things to say :)

            1. I think the main cause of your issue is the interaction between loops, callbacks, and the cluster. Here is an example to clarify my point on loops:
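The answer's original example was not captured on this page. As a minimal sketch of the loops-and-callbacks point, the code below wraps a callback-style query in a Promise and then iterates the rows with `for...of` and `await`. The names `queryClients`, `ClientRow`, and `crawl` are hypothetical stand-ins, not the MySQL driver or puppeteer-cluster API.

```typescript
// Hypothetical row shape and callback-style query; stand-ins for a real MySQL driver.
type ClientRow = { id: number; site: string };
type QueryCallback = (err: Error | null, rows: ClientRow[]) => void;

function queryClients(callback: QueryCallback): void {
  // Deliver the rows asynchronously, as a database driver would.
  Promise.resolve().then(() =>
    callback(null, [
      { id: 1, site: "https://example.com/a" },
      { id: 2, site: "https://example.com/b" },
    ])
  );
}

// Wrap the callback API in a Promise so it can be awaited.
function queryClientsAsync(): Promise<ClientRow[]> {
  return new Promise((resolve, reject) => {
    queryClients((err, rows) => (err ? reject(err) : resolve(rows)));
  });
}

// Stand-in for the crawl step (for example, queueing a URL on the cluster).
async function crawl(row: ClientRow): Promise<string> {
  await Promise.resolve();
  return `crawled ${row.site}?client=${row.id}`;
}

async function crawlAllClients(): Promise<string[]> {
  const rows = await queryClientsAsync();
  const results: string[] = [];
  // for...of with await waits for each crawl; forEach with an async callback would not.
  for (const row of rows) {
    results.push(await crawl(row));
  }
  return results;
}
```

Once the query is awaited at the top of an async function, the rows are ordinary values and nothing has to be done inside the driver's callback.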

            Source https://stackoverflow.com/questions/67644228

            QUESTION

            Correct way to pass args in puppeteer-cluster via puppeteerOptions
            Asked 2021-Mar-19 at 18:55

            I am trying to use args in my code to use a proxy service I have. If I remove the args altogether things run fine but if I have them in there I get an error stating: Error: Unable to restart chrome. I checked multiple examples and copied the same to my code but it seems to fail. Any ideas on how to implement this correctly?

            Code:

            ...

            ANSWER

            Answered 2021-Mar-19 at 18:55

            I played around a bit and discovered by removing the arg --single-process then it works fine.
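A minimal sketch of what such an options object might look like with `--single-process` filtered out. The proxy address, the `LaunchSketch` interface, and `buildLaunchOptions` are assumptions for illustration; only the `maxConcurrency` and `puppeteerOptions` keys and the `args` array mirror the real launch options.

```typescript
// Shape of the relevant options only; the real library accepts more fields.
interface LaunchSketch {
  maxConcurrency: number;
  puppeteerOptions: { headless: boolean; args: string[] };
}

// Hypothetical proxy address, for illustration only.
const PROXY = "http://proxy.example.com:8080";

function buildLaunchOptions(extraArgs: string[]): LaunchSketch {
  // Drop --single-process, which the answer above found to break browser restarts.
  const args = [`--proxy-server=${PROXY}`, ...extraArgs].filter(
    (arg) => arg !== "--single-process"
  );
  return { maxConcurrency: 2, puppeteerOptions: { headless: true, args } };
}

const options = buildLaunchOptions(["--no-sandbox", "--single-process"]);
```

An object of this shape is what would be passed to `Cluster.launch`.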

            Source https://stackoverflow.com/questions/66713742

            QUESTION

            Puppeteer cluster.close() "crashes" after calling cluster.queue()
            Asked 2021-Mar-02 at 01:27

            Long story short, I've made an app for web scraping, and in order for it to run more than 1 process at a time (more than 1 Chromium opened), I used puppeteer-cluster. I've got it to run several processes at once, but the cluster won't stop afterwards; it runs permanently. Along the way, I've encountered the following error (1)

            ...

            ANSWER

            Answered 2021-Mar-02 at 01:27

            Cluster.launch returns a Promise. If you just write const cluster = Cluster.launch(...), then cluster is a Promise. When you call (await cluster).close(), the expression (await cluster) evaluates to a Cluster instance, which is why that works.

            Let’s use cluster as a Cluster instance instead of a Promise object:
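The original snippet is not reproduced on this page. The sketch below illustrates the difference with a hypothetical `FakeCluster` class that mimics only the async shape of `Cluster.launch` and `close`; it is not the library itself.

```typescript
// Hypothetical stand-in that mimics only the async shape of Cluster.launch and close.
class FakeCluster {
  closed = false;

  static async launch(): Promise<FakeCluster> {
    return new FakeCluster();
  }

  async close(): Promise<void> {
    this.closed = true;
  }
}

async function demoLaunch(): Promise<boolean> {
  // Without await, the variable holds a Promise, which has no close() method.
  const pending = FakeCluster.launch();
  const promiseHasClose = "close" in pending;

  // With await, the variable holds the instance itself.
  const cluster = await FakeCluster.launch();
  await cluster.close();

  return !promiseHasClose && cluster.closed;
}
```

With the real library the same rule applies: await the launch once, keep the resulting instance, and call `queue`, `idle`, and `close` on that instance.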

            Source https://stackoverflow.com/questions/66424297

            QUESTION

            How to save a canvas as an image using puppeteer?
            Asked 2021-Jan-27 at 22:48

            I'm trying to load a page with a canvas and then save it as an image.

            For example, this page. On Chrome, I can right click the canvas with a circle on the upper right side of the page and click save image. I want to do this exact same thing but through NodeJS and Puppeteer. Is this possible?

            So far I'm trying to select it via

            ...

            ANSWER

            Answered 2021-Jan-27 at 22:48

            In your example, the canvas is inside an iframe. So you need to get the frame first; then you will be able to transfer the string with the data URL:
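The answer's code is not included here. As a small sketch of the last step only, the function below splits a base64 data URL (the kind of string `canvas.toDataURL()` returns) into its MIME type and payload. `parseDataUrl` and `ParsedDataUrl` are hypothetical names, and the frame lookup itself is not shown.

```typescript
interface ParsedDataUrl {
  mimeType: string;
  base64: string;
}

// Split a base64 data URL into its MIME type and base64 payload.
function parseDataUrl(dataUrl: string): ParsedDataUrl {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (match === null) {
    throw new Error("Not a base64 data URL");
  }
  return { mimeType: match[1], base64: match[2] };
}

// Sample payload for illustration; a real canvas would produce a much longer string.
const parsed = parseDataUrl("data:image/png;base64,iVBORw0KGgo=");
// In Node.js the payload could then be decoded with Buffer.from(parsed.base64, "base64")
// and written to disk with fs.writeFileSync.
```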

            Source https://stackoverflow.com/questions/65914988

            QUESTION

            Get result from listener async
            Asked 2020-Nov-05 at 17:49

            I use puppeteer-cluster with Node.js, and I am new to it. I need to get an XHR response from a site. I am listening to the page, but I cannot write the resulting value to a variable, and I need to use the value in another part of the code. How can I wait for the function in a listener to execute and write its result to a variable?

            ...

            ANSWER

            Answered 2020-Nov-05 at 17:49
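No answer text was captured for this question. One common pattern for this kind of problem, sketched here under assumptions, is to wrap the listener in a Promise and await it. `TinyEmitter` and `waitForEvent` are hypothetical stand-ins for a page that fires response events; Puppeteer's own `page.waitForResponse` packages a similar idea.

```typescript
type Listener<T> = (payload: T) => void;

// Minimal hypothetical emitter standing in for a page that fires "response" events.
class TinyEmitter<T> {
  private listeners: Listener<T>[] = [];

  on(listener: Listener<T>): void {
    this.listeners.push(listener);
  }

  off(listener: Listener<T>): void {
    this.listeners = this.listeners.filter((l) => l !== listener);
  }

  emit(payload: T): void {
    // Iterate over a copy so listeners can unsubscribe while being called.
    for (const l of [...this.listeners]) l(payload);
  }
}

// Resolve with the first payload that satisfies the predicate, then unsubscribe.
function waitForEvent<T>(
  emitter: TinyEmitter<T>,
  predicate: (payload: T) => boolean
): Promise<T> {
  return new Promise((resolve) => {
    const handler: Listener<T> = (payload) => {
      if (predicate(payload)) {
        emitter.off(handler);
        resolve(payload);
      }
    };
    emitter.on(handler);
  });
}

async function demoListener(): Promise<string> {
  const page = new TinyEmitter<{ url: string; body: string }>();
  const pending = waitForEvent(page, (r) => r.url.endsWith("/api/data"));
  page.emit({ url: "https://example.com/style.css", body: "" });
  page.emit({ url: "https://example.com/api/data", body: '{"ok":true}' });
  const response = await pending; // the value is now usable outside the listener
  return response.body;
}
```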

            QUESTION

            Puppeteer Chromium instance management
            Asked 2020-Oct-10 at 20:09

            I have seen the puppeteer-cluster package, but its examples are very manual and my situation is very dynamic, so I'll try my best to explain.

            I have an app in which users schedule posts. Once the time of posting arrives, puppeteer runs, goes to the site, logs in the user using creds from my app's db, and posts the content. Fairly simple.

            Now the problem arises when, say, 20 users all decide to post today at 1pm. Now puppeteer spawns 25 chromium instances, which messes with the server because of limited RAM. What I am asking is how I can achieve the following: 1). Limit puppeteer's concurrency to 10 instances. Any more than that and it should work in batches: do 10 first, then close them and start 10 again, etc. 2). If there are fewer than 10, just keep the normal functionality.

            I know this seems like I'm giving you homework, but trust me, I just need some guidance; a little help or a pointer in the right direction would suffice. Or you could tell me how to use puppeteer-cluster dynamically to suit my needs. Many thanks!

            ...

            ANSWER

            Answered 2020-Oct-10 at 20:09
            1. First of all, you need an advanced message queueing system, such as Kafka or RabbitMQ, to capture all the incoming concurrent requests.
            2. Get the messages in chunks of 10 requests and run a for loop over these chunks, with each iteration creating one cluster per chunk.
            3. The following code explains how you can accomplish it; this piece of code answers all the questions you listed.

            Code snippet:
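The answer's snippet is not reproduced on this page. The sketch below is a different, simpler technique than the per-chunk clusters described above: a rolling worker pool that keeps at most `limit` jobs in flight. `runWithLimit` and `demoLimit` are hypothetical names, and the jobs are stand-ins for browser work. puppeteer-cluster's own `maxConcurrency` launch option serves the same purpose inside the library.

```typescript
// Run async jobs with at most `limit` in flight at once (limit is assumed to be >= 1).
async function runWithLimit<T>(
  jobs: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(jobs.length);
  let next = 0;

  // Each worker pulls the next job index until none are left.
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const index = next++;
      results[index] = await jobs[index]();
    }
  }

  const workerCount = Math.min(limit, jobs.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Demo: 25 hypothetical "post" jobs, tracking the peak number running at once.
async function demoLimit(): Promise<{ done: number; peak: number }> {
  let active = 0;
  let peak = 0;
  const jobs = Array.from({ length: 25 }, (_, i) => async () => {
    active++;
    peak = Math.max(peak, active);
    await Promise.resolve(); // stand-in for the browser work
    active--;
    return i;
  });
  const results = await runWithLimit(jobs, 10);
  return { done: results.length, peak };
}
```

Compared with fixed batches of 10, a rolling pool starts the next job as soon as any slot frees up, so one slow job does not hold back the other nine slots.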

            Source https://stackoverflow.com/questions/64286507

            QUESTION

            Is Puppeteer-Cluster Stealthy enough to pass bot tests?
            Asked 2020-May-22 at 21:26

            I wanted to know if anyone using Puppeteer-Cluster could elaborate on how Cluster.launch({settings}) protects against sharing of cookies and web data between pages in different contexts.

            Do the browser contexts here actually block cookies, so that user data is not shared or tracked? Browserless' now infamous page (here) seems to think not, and says that .launch({}) should be called on the task, not ahead of the queue.

            So my question is, how do we know if puppeteer-cluster is sharing cookies / data between queued tasks? And what kind of options are in the library to lower the chances of being labeled a bot?

            Setup: I am using page.authenticate with a proxy service and a random user agent, and I am still occasionally getting blocked (403) by the site on which I'm performing the test.

            ...

            ANSWER

            Answered 2020-Jan-16 at 16:01
            Direct answer

            Author of puppeteer-cluster here. The library does not actively block cookies, but makes use of browser.createIncognitoBrowserContext():

            Creates a new incognito browser context. This won't share cookies/cache with other browser contexts.

            In addition, the docs state that "Incognito browser contexts don't write any browsing data to disk" (source), so restarting the browser cannot reuse any cookies from disk, as no data was written.

            Regarding the library, this means when a job is executed, a new incognito context is created, which does not share any data (cookies, etc.) with other contexts. So as long as Chromium properly implements the incognito browser contexts, there is no data shared between the jobs.

            The page you linked only talks about browser.newPage() (which shares cookies between pages) and not about incognito contexts.

            Why websites might identify you as a bot

            Some websites will still block you, because they use different measures to detect bots. There are headless browser detection tests as well as fingerprinting libraries that might report you as a bot if the user agent does not match the browser fingerprint. You might be interested in this answer by me that provides a more detailed explanation of how these fingerprints work.

            You can try to use a library like puppeteer-extra that comes with a stealth plugin to help you solve the problem. However, this is basically a cat-and-mouse game. The fingerprinting tests might change, or other sites might use a different "detection" mechanism. All in all, there is no way to guarantee that a website does not detect you.

            In case you want to use puppeteer-extra, be aware that you can use it in conjunction with puppeteer-cluster (example code).

            Source https://stackoverflow.com/questions/59672126

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install puppeteer-cluster

            Install using your favorite package manager.

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on the Stack Overflow community page.

            Install
          • npm

            npm i puppeteer-cluster

          • CLONE
          • HTTPS

            https://github.com/thomasdondorf/puppeteer-cluster.git

          • CLI

            gh repo clone thomasdondorf/puppeteer-cluster

          • sshUrl

            git@github.com:thomasdondorf/puppeteer-cluster.git
