muffet | A perl/moose spider. | Regex library
kandi X-RAY | muffet Summary
muffet is a web spider written in Perl and Moose. The problem I was trying to solve was to spider newint.org to make a Xapian index file, so it is targeted at that usage. However, it can also emit XML for a Google sitemap, or raw text for debugging.

Bear in mind that it doesn't respect robots.txt. However, you can use an xpath_noindex in pages you want nofollowed. You can also specify the skip_urls parameter, which does a regex match and skips matching URLs.

I don't have the time to offer any kind of support, but I occasionally fix bugs or add features.

usage: muffet.pl [-?] [long options...]
    -? --usage --help   prints this usage information.
    --user              user to attempt http auth with
    --pass              password to attempt http auth with
    --format            output in this format, currently xapian, sitemap or raw
    --xpath_body        where to look for our body data as an xpath expression
    --xpath_sample      where to look for our summary data as an xpath expression
    --xpath_category    where to look for our title as an xpath expression
    --xpath_tags        where to look for our tags as an xpath expression
    --xpath_title       where to look for our title in html docs as an xpath expression
    --xpath_modified    where to look for our modification time as an xpath expression
    --xpath_noindex     where to look for our noindex elements as an xpath expression
    --xap_db_file       database file for xapian output
    --xap_tmp_file      temp file for xapian output defaults to
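The skip_urls behaviour described above (a regex match that drops matching URLs) can be illustrated with a minimal sketch. This is written in Python rather than Perl and is not muffet.pl's actual code; the function name filter_urls, the example URLs, and the example patterns are all assumptions made for illustration.

```python
import re


def filter_urls(urls, skip_patterns):
    """Return the URLs that match none of the skip patterns.

    Hypothetical helper: it only illustrates the idea of a regex-based
    skip list, not how muffet.pl implements skip_urls.
    """
    compiled = [re.compile(pattern) for pattern in skip_patterns]
    # re.search matches anywhere in the URL, not only at the start.
    return [
        url for url in urls
        if not any(regex.search(url) for regex in compiled)
    ]


# Illustrative input only; these URLs and patterns are made up.
candidates = [
    "http://example.org/a",
    "http://example.org/login",
    "http://example.org/b.pdf",
]
kept = filter_urls(candidates, [r"/login", r"\.pdf$"])
```

Whether the real tool matches anywhere in the URL or only from the start is not stated in the summary, so the use of a search-style match here is a guess.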
Community Discussions
Trending Discussions on muffet
QUESTION
I have a documentation project where I have to test a massive amount of URLs and links. To do that I was using linkchecker, a Python 2 tool. I upgraded to Django 2.2 and Python 3.6, and I am now using a Go binary called muffet (https://github.com/raviqqe/muffet).
linkchecker was gentle with the server; muffet, on the other hand, is more brutal (even with timeout options and other settings).
The problem is that after some time the requests time out and the local Django server crashes.
I heard about some kind of queue or cache for the local Django server.
Does anyone know how to increase the Django limit so that I do not DDoS myself while running my tests before deployment? (This tool is not running in production.)
Or is there any out-of-the-box thinking that would solve this?
Just so you know, I run the server in the background and call the tool on the localhost URL from another terminal.
Thanks
Edit: https://github.com/django/django/blob/fba5d3b6e63fe4a07b1aa133186f997eeebf9aeb/django/core/servers/basehttp.py#L58 this seems like something I can play with?
ANSWER
Answered 2020-Feb-26 at 10:40
It seems that using a proper server within the container and running muffet with a command like
muffet -t 30 -c 30 http://127.0.0.1
solves the issue.
Thanks to @Antwane for pointing me in the right direction ;)
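For a pre-deployment test script, the command from the answer could be wrapped in a small Python helper. This is only a sketch: the function names build_muffet_command and run_muffet are hypothetical, the -t and -c values are the ones quoted in the answer, and the exact meaning of those flags should be checked against muffet --help for the installed version.

```python
import shutil
import subprocess


def build_muffet_command(url, timeout=30, concurrency=30):
    """Build the argv list for the invocation quoted in the answer.

    Hypothetical helper; the defaults mirror `muffet -t 30 -c 30 <url>`.
    """
    if timeout <= 0 or concurrency <= 0:
        raise ValueError("timeout and concurrency must be positive")
    return ["muffet", "-t", str(timeout), "-c", str(concurrency), url]


def run_muffet(url, timeout=30, concurrency=30):
    """Run muffet and return its exit code, or None if it is not installed."""
    if shutil.which("muffet") is None:
        # Assumption: the binary is expected on PATH under the name "muffet".
        return None
    command = build_muffet_command(url, timeout, concurrency)
    return subprocess.run(command).returncode


# Building the command has no side effects; nothing is executed here.
command = build_muffet_command("http://127.0.0.1")
```

Calling run_muffet("http://127.0.0.1") against a server that is already running would then return the checker's exit code, which a test script can compare against zero.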
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported