rcrawl | Rcrawl is a web crawler written entirely in ruby. It's limited right now by the fact that it will s
kandi X-RAY | rcrawl Summary
rcrawl is a Ruby library. It has no reported bugs or vulnerabilities, a permissive license, and low support. You can download it from GitHub.
1. Remove an absolute URL from the URL Server.
2. Download the corresponding document from the internet, grabbing and processing robots.txt first, if available.
3. Feed the document into a rewind input stream (ris) to be read and re-read as needed.
4. Based on MIME type, invoke the process method of the processing module associated with that MIME type. For example, a link extractor or tag counter module for text/html MIME types, or a gif stats module for image/gif.
5. By default, all text/html documents pass through the link extractor. Each link is converted to an absolute URL and tested against a (ideally user-supplied) URL filter to determine whether it should be downloaded.
6. If the URL passes the filter (currently hard-coded as Same Domain?), call the URL-seen? test: has the URL been seen before? Namely, is it in the URL Server or has it been downloaded already?
7. If the URL is new, add it to the URL Server.
8. Go back to step 1 and repeat until the URL Server is empty.
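The crawl loop described above can be sketched in a few dozen lines of Ruby. This is a minimal illustration under stated assumptions, not rcrawl's actual code or API: the class name `SketchCrawler`, the `fetcher:` parameter, and every method name are hypothetical. The fetcher is injected so the loop can run against in-memory pages; robots.txt handling and the rewind input stream are left out, and links are pulled with a simple regular expression rather than a real HTML parser.

```ruby
require "set"
require "uri"

# Hypothetical sketch of the crawl loop; none of these names come from rcrawl.
class SketchCrawler
  attr_reader :visited, :link_counts

  # fetcher: any callable that takes a URL string and returns
  # [mime_type, body], or nil when the document cannot be retrieved.
  def initialize(start_url, fetcher:)
    @start_host = URI.parse(start_url).host
    @queue = [start_url]            # stands in for the "URL Server"
    @seen = Set.new([start_url])    # URLs queued or already downloaded
    @visited = []
    @link_counts = {}
    @fetcher = fetcher
    # MIME type => processing module (here just a method reference).
    @processors = { "text/html" => method(:extract_links) }
  end

  def run
    until @queue.empty?
      url = @queue.shift                 # remove a URL from the URL Server
      mime, body = @fetcher.call(url)    # download (robots.txt step omitted)
      next if body.nil?
      @visited << url
      processor = @processors[mime]      # dispatch on MIME type
      processor&.call(url, body)
    end
    @visited
  end

  private

  # Link extractor: absolutize each href, apply the filter, then URL-seen?.
  def extract_links(base_url, html)
    links = html.scan(/href\s*=\s*["']([^"']+)["']/i).flatten
    @link_counts[base_url] = links.size
    links.each do |href|
      absolute = absolutize(base_url, href)
      next if absolute.nil?
      next unless same_domain?(absolute)   # URL filter
      next unless @seen.add?(absolute)     # add? returns nil if already seen
      @queue << absolute
    end
  end

  def absolutize(base_url, href)
    uri = URI.join(base_url, href)
    uri.fragment = nil                     # "#section" does not change the document
    uri.to_s
  rescue URI::Error
    nil
  end

  def same_domain?(url)
    URI.parse(url).host == @start_host
  end
end

# Usage with in-memory pages instead of real HTTP requests.
pages = {
  "http://example.test/"      => ["text/html", '<a href="/a">A</a> <a href="http://other.test/">X</a>'],
  "http://example.test/a"     => ["text/html", '<a href="/">home</a> <a href="b.gif">img</a>'],
  "http://example.test/b.gif" => ["image/gif", "GIF89a"]
}
crawler = SketchCrawler.new("http://example.test/", fetcher: ->(url) { pages[url] })
crawler.run
# crawler.visited now lists the three example.test URLs in crawl order;
# http://other.test/ was rejected by the same-domain filter.
```

Keeping the processors in a hash keyed by MIME type mirrors the "processing module per MIME type" idea: supporting image/gif would mean registering one more entry rather than changing the loop.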
Support
rcrawl has a low active ecosystem.
It has 0 stars and 0 forks. There is 1 watcher for this library.
It had no major release in the last 6 months.
rcrawl has no issues reported. There are no pull requests.
It has a neutral sentiment in the developer community.
The latest version of rcrawl is current.
Quality
rcrawl has no bugs reported.
Security
rcrawl has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
License
rcrawl is licensed under the MIT License. This license is Permissive.
Permissive licenses have the least restrictions, and you can use them in most projects.
Reuse
rcrawl releases are not available. You will need to build from source code and install.
Installation instructions are not available. Examples and code snippets are available.
rcrawl Key Features
No Key Features are available at this moment for rcrawl.
rcrawl Examples and Code Snippets
No Code Snippets are available at this moment for rcrawl.
Community Discussions
No Community Discussions are available at this moment for rcrawl. Refer to the Stack Overflow page for discussions.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install rcrawl
You can download it from GitHub.
On a UNIX-like operating system, using your system’s package manager is easiest. However, the packaged Ruby version may not be the newest one. There is also an installer for Windows. Version managers help you switch between multiple Ruby versions on your system. Installers can be used to install one specific Ruby version or several. Please refer to ruby-lang.org for more information.
Support
For any new features, suggestions, and bugs, create an issue on GitHub.
If you have any questions, check the Stack Overflow community page and ask there.