crawlpy | Python web spider/crawler
kandi X-RAY | crawlpy Summary
crawlpy is a Python web spider/crawler library. crawlpy has no bugs and no vulnerabilities, and it has low support. However, its build file is not available. You can download it from GitHub.
crawlpy is driven by a configuration file with the following options:

- Protocol: is the site you want to crawl running on http or https?
- Domain: the domain or subdomain you want to spider. Nothing outside this domain/subdomain will be touched.
- Depth: 0 crawls indefinitely until every subpage has been reached; 1 only crawls links on the initial page; 2 crawls links on the initial page and everything found on the links of that page. Note: when you do a login, the login page already counts as one level of depth in scrapy itself, but this is rewritten internally to subtract that depth again, so your output will not show that extra depth.
- Ignore list: each array string element is treated as a substring (no regex) and is checked against a FQDN. If any of the specified substrings is found in a URL, it will not be crawled. Note: when you log in somewhere, it makes sense to ignore the logout page, as well as other pages that might delete or disable your current user, so you will not be kicked out of your login session during crawl time.
- Extra HTTP status codes: by default scrapy ignores pages with a status code other than 2xx, so if you know that a 403 page contains actual content with links, just add that code here. Note: there is no need to specify 200, as scrapy crawls 2xx pages by default.
- Login: true does a login prior to crawling; false does not. Note: when login is set to false, you do not need to fill in the rest of the variables inside the login section.
- Login method: the method required to execute the login.
- Login page: the relative login page (from the base domain, including the leading slash) where the POST or GET will go to.
- Login failure string: a string that is found on the login page when the login fails.
- Login fields: the POST or GET params required to log in. Examples: username, password, hidden-field-name.
- CSRF: true if the login page has a dynamic CSRF token that you want to read out and submit along with the normal submit data; false if the login does not require a CSRF token to be submitted. Note: if the login has a static (never-changing) CSRF field, just add the data to the fields section. Note: read below about the built-in automatic CSRF detection and leave this off at first.
- CSRF field: the name of the input field which holds the CSRF token.
- Store: true saves webpages to disk; false does not.
- Store path: absolute or relative path to store webpages on disk.
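The summary above lists the options without showing an actual file. Purely as an illustration of how these options fit together, a configuration could look roughly like the following Python dict; every key name here is an assumption made for this sketch, not crawlpy's actual schema:

```python
# Illustrative only: key names are invented for this sketch and are not
# crawlpy's real configuration schema. Check the repository for the
# actual file name and format.
config = {
    "proto": "https",           # http or https
    "domain": "example.com",    # nothing outside this (sub)domain is touched
    "depth": 2,                 # 0 = crawl everything, 1 = initial page only,
                                # 2 = initial page plus one level further
    "ignores": ["logout", "delete-account"],  # plain substrings, no regex
    "httpstatus": [403],        # extra non-2xx codes that still hold content
    "login": {
        "enabled": True,
        "method": "post",                  # method used to execute the login
        "action": "/login",                # relative page, with leading slash
        "failure": "Invalid credentials",  # string seen when the login fails
        "fields": {"username": "user", "password": "secret"},
        "csrf": {"enabled": True, "field": "csrf_token"},
    },
    "store": {"enabled": True, "path": "./output"},  # save pages to disk
}
```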
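The login and CSRF behaviour described above is standard scrapy territory. As a rough sketch of that flow, assuming a hypothetical form field name csrf_token and failure string "Invalid credentials" (this is not crawlpy's actual source), a plain scrapy spider could do it like this:

```python
import scrapy


class LoginSpider(scrapy.Spider):
    """Sketch of the login-then-crawl flow described above."""

    name = "login_sketch"
    start_urls = ["https://example.com/login"]  # base domain + relative login page

    def parse(self, response):
        # Read the dynamic CSRF token out of the login form, if one exists.
        token = response.xpath('//input[@name="csrf_token"]/@value').get()
        formdata = {"username": "user", "password": "secret"}
        if token:
            formdata["csrf_token"] = token  # submit it with the normal data
        # from_response() reuses the form's action URL and hidden fields.
        yield scrapy.FormRequest.from_response(
            response, formdata=formdata, callback=self.after_login
        )

    def after_login(self, response):
        # Check for the configured failure string on the returned page.
        if b"Invalid credentials" in response.body:
            self.logger.error("login failed")
            return
        # A real crawler would start following links here, restricted to the
        # configured domain, depth, and ignore list.
```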
Support
crawlpy has a low-activity ecosystem.
It has 0 stars, 0 forks, and no watchers.
It had no major release in the last 6 months.
crawlpy has no issues reported. There are no pull requests.
It has a neutral sentiment in the developer community.
The latest version of crawlpy is current.
Quality
crawlpy has no bugs reported.
Security
crawlpy has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
License
crawlpy does not have a standard license declared.
Check the repository for any license declaration and review the terms closely.
Without a license, all rights are reserved, and you cannot use the library in your applications.
Reuse
crawlpy releases are not available. You will need to build from source code and install.
crawlpy has no build file. You will need to create the build yourself to build the component from source.
Installation instructions, examples, and code snippets are not available.
Top functions reviewed by kandi - BETA
kandi's functional review helps you automatically verify the functionalities of the libraries and avoid rework.
Currently covering the most popular Java, JavaScript and Python libraries. See a Sample of crawlpy
crawlpy Key Features
No Key Features are available at this moment for crawlpy.
crawlpy Examples and Code Snippets
No Code Snippets are available at this moment for crawlpy.
Community Discussions
No Community Discussions are available at this moment for crawlpy. Refer to the Stack Overflow page for discussions.
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install crawlpy
You can download it from GitHub.
You can use crawlpy like any standard Python library. You will need a development environment consisting of a Python distribution (including header files), a compiler, pip, and git. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.
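Since installation instructions are not provided, the conventional way to try a Python project from GitHub looks roughly like this. The repository URL is a placeholder, and the requirements file is an assumption; as noted above there is no build file, so there may be nothing to pip-install directly:

```sh
# Create and activate an isolated environment, as recommended above.
python -m venv .venv
. .venv/bin/activate
python -m pip install --upgrade pip setuptools wheel

# Placeholder URL: substitute the actual repository owner.
git clone https://github.com/<owner>/crawlpy.git
cd crawlpy

# If the repository ships a requirements file, install the dependencies;
# otherwise install scrapy manually, since crawlpy builds on it.
pip install -r requirements.txt || pip install scrapy
```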
Support
For any new features, suggestions, or bugs, create an issue on GitHub.
If you have any questions, check and ask them on the Stack Overflow community page.