scrapyd | A service daemon to run Scrapy spiders | Continuous Deployment library
kandi X-RAY | scrapyd Summary
A service daemon to run Scrapy spiders
Top functions reviewed by kandi - BETA
- Creates an application instance
- Get the value of a method
- Creates an instance of ScrapydResourceWrapper
- Render a list of jobs
- Convert a dict to native string representation
- Convert unicode to native str
- Render a spider
- Get a list of spiders
- Spawn a new process
- Returns a list of command line arguments for the crawler
- Context manager for Scrapy
- Activate scraper
- List installed egg versions
- Put message into queue
- Start the scraper service
- List spiders
- Render a project
- Called when the process finishes
- Return an application instance
- Render the document
- Delete a project
- Render a resource
- Return the maximum number of CPU cores
- Upload a new egg
- Render a node
- Mark a process finished
scrapyd Key Features
scrapyd Examples and Code Snippets
# start the service
$ gunicorn --config gunicorn.conf.py spider_admin_pro.run:app
# -*- coding: utf-8 -*-
"""
$ gunicorn --config gunicorn.conf.py spider_admin_pro.run:app
"""
import multiprocessing
import os
from gevent import monkey
monkey.patch_all()
# configuration priority: yaml config file > env environment variables > default config
# flask service configuration
PORT = 5002
HOST = '127.0.0.1'
# login username and password
USERNAME = 'admin'
PASSWORD = '123456'
JWT_KEY = 'FU0qnuV4t8rr1pvg93NZL3DLn6sHrR1sCQqRzachbo0='
# token expiry time, in days
EXPIRES = 7
# scrapyd address, no trailing slash
SCRAPYD_SERVER =
[scrapyd]
application = scrapyd_mongodb.application.get_application
...
[scrapyd]
mongodb_name = scrapyd_mongodb
mongodb_host = 127.0.0.1
mongodb_port = 27017
mongodb_user = custom_user # (Optional)
mongodb_pass = custompwd # (Optional)
...
# init.py
import io
import os

PORT = os.environ['PORT']

with io.open("scrapyd.conf", 'r+', encoding='utf-8') as f:
    f.read()  # advance to the end of the file so the write appends
    f.write(u'\nhttp_port = %s\n' % PORT)
SCRAPYD_SERVERS: "scrapyd_node_1:6800,scrapyd_node_2:6801,scrapyd_node_3:6802"
SCRAPYD_SERVERS: "scrapyd_node_1:6800,scrapyd_node_2:6800,scrapyd_node_3:6800"
ports:
- "6801:6800"
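The two SCRAPYD_SERVERS layouts above correspond to different port mappings. For the second layout (every node listening on 6800 inside its own container, reachable by container name), a hypothetical docker-compose fragment might look like this; the image name and host port choices are assumptions:

```yaml
services:
  scrapyd_node_1:
    image: vimagick/scrapyd   # image name is an assumption
    ports:
      - "6800:6800"           # host:container
  scrapyd_node_2:
    image: vimagick/scrapyd
    ports:
      - "6801:6800"
  scrapyd_node_3:
    image: vimagick/scrapyd
    ports:
      - "6802:6800"
```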
driver.execute_script(
    "grecaptcha.render('dCF_input', {"
    "sitekey: '6LdC3UgUAAAAAJIcyA3Ym4j_nCP-ainSgf1NoFku', "
    "callback: distilCallbackGuard('distilCaptchaDoneCallback')})"
)
from os import listdir, mkdir

def make_folder(name):
    try:
        mkdir(name)
    except FileExistsError:
        pass

# alternative version:
def make_folder(name):
    # `path` is assumed to be defined elsewhere as the parent directory
    if name not in listdir(path):
        mkdir(name)
import sys

for arg in sys.argv:
    print(arg)
sudo ufw status
sudo ufw allow 6800/tcp
sudo ufw reload
bind_address=0.0.0.0
bind_address=127.x.x.x
nohup scrapyd >& /dev/null &
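The bind_address lines above belong in scrapyd's configuration file. A minimal sketch, assuming the documented [scrapyd] section format (the value choice depends on whether remote access is wanted):

```ini
[scrapyd]
# 0.0.0.0 listens on all interfaces; use 127.0.0.1 to restrict to localhost
bind_address = 0.0.0.0
http_port    = 6800
```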
AttributeError: module 'importlib._bootstrap' has no attribute 'SourceFileLoader'
Community Discussions
Trending Discussions on scrapyd
QUESTION
I created a simple python app on Heroku to launch scrapyd. The scrapyd service starts, but it launches on port 6800. Heroku requires you to bind it to the $PORT variable, and I was able to run the Heroku app locally. The logs from the process are included below. I looked at a package scrapy-heroku, but wasn't able to install it due to errors. The code in app.py of this package seems to provide some clues as to how it can be done. How can I implement this as a python command to start scrapyd on the port provided by Heroku?
Procfile:
...ANSWER
Answered 2022-Jan-24 at 06:29

You just need to read the PORT environment variable and write it into your scrapyd config file. You can check out this code that does the same.
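A minimal sketch of that approach, assuming a scrapyd.conf in the working directory (the file name and fallback port are assumptions):

```python
import os

def write_scrapyd_port(conf_path="scrapyd.conf"):
    """Append Heroku's $PORT to a scrapyd config file as http_port."""
    port = os.environ.get("PORT", "6800")  # Heroku injects PORT at runtime
    with open(conf_path, "a", encoding="utf-8") as f:
        f.write("\nhttp_port = %s\n" % port)
    return port
```

Running this before starting scrapyd (for example from the Procfile) makes scrapyd bind to the port Heroku assigns.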
QUESTION
I am trying to change a setting from the command line while starting a scrapy crawler (Python 3.7). Therefore I am adding an __init__ method, but I could not figure out how to change the class variable "delay" from within the __init__ method.
Example minimal:
...ANSWER
Answered 2021-Nov-08 at 20:06

"I am trying to change setting from command line while starting a scrapy crawler (Python 3.7). Therefore I am adding a init method..."
...
scrapy crawl test -a delay=5
According to the scrapy docs (Settings / Command line options section), it is required to use the -s parameter to update a setting:

scrapy crawl test -s DOWNLOAD_DELAY=5

It is not possible to update settings during runtime in spider code from __init__ or other methods (details in the related discussion on GitHub: Update spider settings during runtime #4196).
QUESTION
I'm trying to scrape a specific website. The code I'm using to scrape it is the same as that being used to scrape many other sites successfully.
However, the resulting response.body looks completely corrupt (segment below):
ANSWER
Answered 2021-May-12 at 12:48

Thanks to Serhii's suggestion, I found that the issue was due to "accept-encoding": "gzip, deflate, br": I accepted compressed responses but did not handle them in scrapy. Adding scrapy.downloadermiddlewares.httpcompression or simply removing the accept-encoding line fixes the issue.
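For illustration, a hedged settings.py fragment that keeps the compression middleware enabled; the priority value 590 matches Scrapy's documented default for this middleware, but treat it as an assumption for your version:

```python
# settings.py fragment: explicitly enable HttpCompressionMiddleware so that
# gzip/deflate/br response bodies are decoded before reaching the spider.
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 590,
}
```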
QUESTION
I tried to run a couple of scrapyd services to have a simple cluster on my localhost, but only the first node works. For 2 others I get the following error
...ANSWER
Answered 2020-Nov-17 at 08:07

The problem is in line:
QUESTION
The problem I had is that I can't upload my .egg file to scrapyd using

curl http://127.0.0.1:6800/addversion.json -F project=scraper_app -F version=r1 egg=@scraper_app-0.0.1-py3.8.egg

It's returning an error message like this:
{"node_name": "Workspace", "status": "error", "message": "b'egg'"}
So I'm using Django and Scrapy in the same project, and I had this folder structure:
ANSWER
Answered 2020-May-31 at 14:55

After I googled it more and tried the scrapyd-client, I found there are lots of problems with Windows; it isn't easy to use scrapyd-deploy. But I found a video on YouTube that showed me the correct way to install the scrapyd-client.

So here is the correct way to install it.

Make sure you are inside a virtualenv, and then install the scrapyd-client with pip install git+https://github.com/scrapy/scrapyd.git. It doesn't show any error or any difficulties to install, and then you can just run scrapyd-deploy in the scrapy project folder.
QUESTION
I use scrapy to parse the site (Scrapy version 2.1.0). When I try to make an additional request:
...ANSWER
Answered 2020-May-19 at 13:32

I think the problem here is that you are passing cb_kwargs to Request, which Request in turn doesn't accept. From what I understand, cb_kwargs is new in Scrapy version 1.7, so you should check whether scrapyd in your case is working with a version of Scrapy >= 1.7. Alternatively, to pass data to your callback, you could use Request's meta attribute.
QUESTION
It appears that either the documentation of scrapyd is wrong or that there is a bug. I want to retrieve the list of spiders from a deployed project. The docs tell me to do it this way:
...ANSWER
Answered 2020-May-09 at 07:53

Maybe the URL needs to be wrapped in double quotes. Try:
QUESTION
I am trying to deploy with scrapyd-deploy to a remote scrapyd server, which fails without an error message:
...ANSWER
Answered 2020-May-08 at 05:47

Moved from comment to answer: in a similar-sounding issue with the same error code, the problem was that Twisted versions earlier than 18.9 don't support python-3.7. If you are using python-3.7 and your twisted version is below 18.9, try upgrading twisted to at least version 18.9:
QUESTION
I am trying to deploy a scrapy project via scrapyd-deploy to a remote scrapyd server. The project itself is functional and works perfectly on my local machine and on the remote server when I deploy it via git push prod to the remote server.
With scrapyd-deploy I get this error:
...% scrapyd-deploy example -p apo
ANSWER
Answered 2020-May-06 at 05:06

os.mkdir might be failing because it cannot create nested directories. You can use os.makedirs(img_dir, exist_ok=True) instead.

os.path.dirname(__file__) will point to /tmp/... under scrapyd. Not sure if this is what you want. I would use an absolute path without calling os.path.dirname(__file__) if you don't want images to be downloaded under /tmp.
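A small sketch of the suggested fix; the directory names are placeholders:

```python
import os
import tempfile

# os.makedirs creates intermediate directories; exist_ok=True makes the
# call idempotent, so repeated runs under scrapyd don't raise.
img_dir = os.path.join(tempfile.gettempdir(), "scrapy_demo", "full")
os.makedirs(img_dir, exist_ok=True)
os.makedirs(img_dir, exist_ok=True)  # second call is a no-op
```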
QUESTION
I am trying to schedule a scrapy 2.1.0 spider with the help of scrapyd 1.2
...ANSWER
Answered 2020-May-05 at 04:02

Before you can launch your spider with scrapyd, you'll have to deploy your spider first. You can do this by:
- Using addversion.json (https://scrapyd.readthedocs.io/en/latest/api.html#addversion-json)
- Using scrapyd-deploy (https://github.com/scrapy/scrapyd-client)
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install scrapyd
You can use scrapyd like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.