crawlpy | Python web spider/crawler

 by mdevboss | Python | Version: Current | License: No License

kandi X-RAY | crawlpy Summary

crawlpy is a Python library. It has no reported bugs or vulnerabilities, and it has low support. However, no build file is available for crawlpy. You can download it from GitHub.

Whether the site you want to crawl runs on http or https.

The domain or subdomain you want to spider. Nothing outside this domain/subdomain will be touched.

0: Crawl indefinitely until every subpage has been reached.
1: Only crawl links on the initial page.
2: Crawl links on the initial page and everything found on the links of that page.
Note: When you do a login, scrapy itself already counts the login page as one level of depth. This is rewritten internally to subtract that depth again, so your output will not show the extra depth.

Each array string element is treated as a substring (no regex) and is checked against an FQDN. If any of the specified substrings is found in that URL, it will not be crawled.
Note: When you log in somewhere, it makes sense to ignore the logout page, as well as other pages that might delete or disable your current user, so that you are not kicked out of your login session during the crawl.

By default, scrapy ignores pages with a status code other than 2xx, so if you know that a 403 page contains actual content with links, add that code here.
Note: There is no need to specify 200, as scrapy crawls those pages by default.

true: Do a login prior to crawling.
false: Do not log in.
Note: When login is set to false, you do not need to fill in the rest of the variables inside the login section.

Method required to execute the login.

Relative login page (from the base domain, including the leading slash) where the POST or GET request will go.

A string that is found on the login page when the login fails.

POST or GET params required to log in. Examples: username, password, hidden-field-name.

true: The login page has a dynamic CSRF token that you want to read out and submit along with the normal submit data.
false: The login does not require a CSRF token to be submitted.
Note: If the login has a static (never-changing) CSRF field, just add the data to the fields section.
Note: Read below about the built-in automatic CSRF detection and leave this off at first.

The name of the input field which holds the CSRF token.

true: Save webpages to disk.
false: Do not save webpages to disk.

Absolute or relative path to store webpages to disk.
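The domain, ignore-list, and depth rules described above can be illustrated with a small filter function. This is only a sketch of the described behavior, not crawlpy's actual code: the function name `should_crawl` and all of its parameter names are hypothetical, the substring check is applied to the full URL, and treating subdomains of the configured domain as in scope is an assumption.

```python
from urllib.parse import urlparse


def should_crawl(url, allowed_domain, ignores, depth, max_depth):
    """Return True if `url` passes the domain, ignore-list, and depth checks.

    Hypothetical helper: names and signature are illustrative only.
    """
    host = urlparse(url).hostname or ""
    # Stay inside the configured domain (assumption: its subdomains count too).
    if host != allowed_domain and not host.endswith("." + allowed_domain):
        return False
    # Plain substring match (no regex); any hit excludes the URL.
    if any(fragment in url for fragment in ignores):
        return False
    # A max_depth of 0 means "no limit"; otherwise links found on the
    # initial page are at depth 1, links on those pages at depth 2, etc.
    return max_depth == 0 or depth <= max_depth
```

For example, passing `["/logout"]` as the ignore list keeps the crawler away from a logout link so the login session survives the crawl.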
            kandi-support Support

              crawlpy has a low active ecosystem.
              It has 0 star(s) with 0 fork(s). There are no watchers for this library.
              It had no major release in the last 6 months.
              crawlpy has no issues reported. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of crawlpy is current.

            kandi-Quality Quality

              crawlpy has no bugs reported.

            kandi-Security Security

              crawlpy has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            kandi-License License

              crawlpy does not have a standard license declared.
              Check the repository for any license declaration and review the terms closely.
              Without a license, all rights are reserved, and you cannot use the library in your applications.

            kandi-Reuse Reuse

              crawlpy releases are not available. You will need to build from source code and install.
              crawlpy has no build file. You will need to create the build yourself to build the component from source.
              Installation instructions are not available. Examples and code snippets are available.

            crawlpy Key Features

            No Key Features are available at this moment for crawlpy.

            crawlpy Examples and Code Snippets

            No Code Snippets are available at this moment for crawlpy.

            Community Discussions

            No Community Discussions are available at this moment for crawlpy. Refer to the Stack Overflow page for discussions.

            Vulnerabilities

            No vulnerabilities reported

            Install crawlpy

            You can download it from GitHub.
            You can use crawlpy like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on the Stack Overflow community page.

            CLONE
          • HTTPS

            https://github.com/mdevboss/crawlpy.git

          • CLI

            gh repo clone mdevboss/crawlpy

          • SSH

            git@github.com:mdevboss/crawlpy.git
