scrapyd | A service daemon to run Scrapy spiders | Continuous Deployment library
kandi X-RAY | scrapyd Summary
A service daemon to run Scrapy spiders
Top functions reviewed by kandi - BETA
- Creates an application instance
- Get the value of a method
- Creates an instance of ScrapydResourceWrapper
- Render a list of jobs
- Convert a dict to native string representation
- Convert unicode to native str
- Render a spider
- Get a list of spiders
- Spawn a new process
- Returns a list of command line arguments for the crawler
- Context manager for Scrapy
- Activate scraper
- List installed egg versions
- Put message into queue
- Start the scraper service
- List spiders
- Render a project
- Called when the process finishes
- Return an application instance
- Render the document
- Delete a project
- Render a resource
- Return the maximum number of CPU cores
- Upload a new egg
- Render a node
- Mark a process finished
scrapyd Key Features
scrapyd Examples and Code Snippets
# start the service
$ gunicorn --config gunicorn.conf.py spider_admin_pro.run:app
# -*- coding: utf-8 -*-
"""
$ gunicorn --config gunicorn.conf.py spider_admin_pro.run:app
"""
import multiprocessing
import os
from gevent import monkey
monkey.patch_all()
# configuration priority: yaml config file > env environment variables > default config
# flask service configuration
PORT = 5002
HOST = '127.0.0.1'
# login username and password
USERNAME = 'admin'
PASSWORD = '123456'
JWT_KEY = 'FU0qnuV4t8rr1pvg93NZL3DLn6sHrR1sCQqRzachbo0='
# token expiry time, in days
EXPIRES = 7
# scrapyd address, no trailing slash
SCRAPYD_SERVER =
[scrapyd]
application = scrapyd_mongodb.application.get_application
...
[scrapyd]
mongodb_name = scrapyd_mongodb
mongodb_host = 127.0.0.1
mongodb_port = 27017
mongodb_user = custom_user # (Optional)
mongodb_pass = custompwd # (Optional)
...
# init.py
import io
import os

PORT = os.environ['PORT']

with io.open("scrapyd.conf", 'r+', encoding='utf-8') as f:
    f.read()  # advance to the end of the file so the write appends
    f.write(u'\nhttp_port = %s\n' % PORT)
SCRAPYD_SERVERS: "scrapyd_node_1:6800,scrapyd_node_2:6801,scrapyd_node_3:6802"
SCRAPYD_SERVERS: "scrapyd_node_1:6800,scrapyd_node_2:6800,scrapyd_node_3:6800"
ports:
- "6801:6800"
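The two SCRAPYD_SERVERS layouts above correspond to different port mappings. For the second layout (every node listening on 6800 inside its own container, reachable by container name), a hypothetical docker-compose fragment might look like this; the image name and host port choices are assumptions:

```yaml
services:
  scrapyd_node_1:
    image: vimagick/scrapyd   # image name is an assumption
    ports:
      - "6800:6800"           # host:container
  scrapyd_node_2:
    image: vimagick/scrapyd
    ports:
      - "6801:6800"
  scrapyd_node_3:
    image: vimagick/scrapyd
    ports:
      - "6802:6800"
```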
driver.execute_script(
    "grecaptcha.render('dCF_input', {"
    "sitekey: '6LdC3UgUAAAAAJIcyA3Ym4j_nCP-ainSgf1NoFku', "
    "callback: distilCallbackGuard('distilCaptchaDoneCallback')})"
)
from os import listdir, mkdir

def make_folder(name):
    try:
        mkdir(name)
    except FileExistsError:
        pass

# alternative version:
def make_folder(name):
    # `path` is assumed to be defined elsewhere as the parent directory
    if name not in listdir(path):
        mkdir(name)
import sys

for arg in sys.argv:
    print(arg)
sudo ufw status
sudo ufw allow 6800/tcp
sudo ufw reload
bind_address=0.0.0.0
bind_address=127.x.x.x
nohup scrapyd >& /dev/null &
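The bind_address lines above belong in scrapyd's configuration file. A minimal sketch, assuming the documented [scrapyd] section format (the value choice depends on whether remote access is wanted):

```ini
[scrapyd]
# 0.0.0.0 listens on all interfaces; use 127.0.0.1 to restrict to localhost
bind_address = 0.0.0.0
http_port    = 6800
```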
AttributeError: module 'importlib._bootstrap' has no attribute 'SourceFileLoader'
Community Discussions
Trending Discussions on scrapyd
QUESTION
I created a simple python app on Heroku to launch scrapyd. The scrapyd service starts, but it launches on port 6800. Heroku requires you to bind it to the $PORT variable, and I was able to run the Heroku app locally. The logs from the process are included below. I looked at a package scrapy-heroku, but wasn't able to install it due to errors. The code in app.py of this package seems to provide some clues as to how it can be done. How can I implement this as a python command to start scrapyd on the port provided by Heroku?
Procfile:
...ANSWER
Answered 2022-Jan-24 at 06:29

You just need to read the PORT environment variable and write it into your scrapyd config file. You can check out this code that does the same.
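A minimal sketch of that approach, assuming a scrapyd.conf in the working directory (the file name and fallback port are assumptions):

```python
import os

def write_scrapyd_port(conf_path="scrapyd.conf"):
    """Append Heroku's $PORT to a scrapyd config file as http_port."""
    port = os.environ.get("PORT", "6800")  # Heroku injects PORT at runtime
    with open(conf_path, "a", encoding="utf-8") as f:
        f.write("\nhttp_port = %s\n" % port)
    return port
```

Running this before starting scrapyd (for example from the Procfile) makes scrapyd bind to the port Heroku assigns.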
QUESTION
I am trying to change a setting from the command line while starting a scrapy crawler (Python 3.7). Therefore I am adding an __init__ method, but I could not figure out how to change the class variable "delay" from within the __init__ method.
Example minimal:
...ANSWER
Answered 2021-Nov-08 at 20:06

"I am trying to change setting from command line while starting a scrapy crawler (Python 3.7). Therefore I am adding a init method..."
...
scrapy crawl test -a delay=5
According to the scrapy docs (Settings / Command line options section), it is required to use the -s parameter to update a setting:

scrapy crawl test -s DOWNLOAD_DELAY=5

It is not possible to update settings during runtime in spider code from __init__ or other methods (details in the related discussion on GitHub: Update spider settings during runtime #4196).
QUESTION
I'm trying to scrape a specific website. The code I'm using to scrape it is the same as that being used to scrape many other sites successfully.
However, the resulting response.body looks completely corrupt (segment below):
ANSWER
Answered 2021-May-12 at 12:48

Thanks to Serhii's suggestion, I found that the issue was due to "accept-encoding": "gzip, deflate, br": I accepted compressed responses but did not handle them in scrapy. Adding scrapy.downloadermiddlewares.httpcompression or simply removing the accept-encoding line fixes the issue.
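For illustration, a hedged settings.py fragment that keeps the compression middleware enabled; the priority value 590 matches Scrapy's documented default for this middleware, but treat it as an assumption for your version:

```python
# settings.py fragment: explicitly enable HttpCompressionMiddleware so that
# gzip/deflate/br response bodies are decoded before reaching the spider.
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 590,
}
```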
QUESTION
I tried to run a couple of scrapyd services to have a simple cluster on my localhost, but only the first node works. For 2 others I get the following error
...ANSWER
Answered 2020-Nov-17 at 08:07

The problem is in line:
QUESTION
The problem I had is that I can't upload my .egg file to scrapyd using

curl http://127.0.0.1:6800/addversion.json -F project=scraper_app -F version=r1 egg=@scraper_app-0.0.1-py3.8.egg

It's returning an error message like this:
{"node_name": "Workspace", "status": "error", "message": "b'egg'"}
So I'm using Django and Scrapy in the same project, and I had this folder structure:
ANSWER
Answered 2020-May-31 at 14:55

After I googled it more and tried the scrapyd-client, I found there are lots of problems with Windows; it isn't easy to use scrapyd-deploy. But I found a video on YouTube that showed me the correct way to install the scrapyd-client.

So here is the correct way to install it.

Make sure you are inside a virtualenv, and then install the scrapyd-client with pip install git+https://github.com/scrapy/scrapyd.git. It doesn't show any error or any difficulties to install, and then you can just run scrapyd-deploy in the scrapy project folder.
QUESTION
I use scrapy to parse the site (Scrapy version 2.1.0). When I try to make an additional request:
...ANSWER
Answered 2020-May-19 at 13:32

I think the problem here is that you are passing cb_kwargs to Request, which Request in turn doesn't accept. From what I understand, cb_kwargs is new in Scrapy version 1.7, so you should check whether scrapyd in your case is working with a version of Scrapy >= 1.7. Alternatively, to pass data to your callback, you could use Request's meta attribute.
QUESTION
It appears that either the documentation of scrapyd is wrong or that there is a bug. I want to retrieve the list of spiders from a deployed project. The docs tell me to do it this way:
...ANSWER
Answered 2020-May-09 at 07:53

Maybe the URL needs to be wrapped in double quotes. Try:
QUESTION
I am trying to deploy with scrapyd-deploy to a remote scrapyd server, which fails without an error message:
...ANSWER
Answered 2020-May-08 at 05:47

Moved from comment to answer: in a similar-sounding issue with the same error code, the problem was that Twisted versions earlier than 18.9 don't support python-3.7. If you are using python-3.7 and your twisted version is below 18.9, try upgrading twisted to at least version 18.9:
QUESTION
I am trying to deploy a scrapy project via scrapyd-deploy to a remote scrapyd server. The project itself is functional and works perfectly on my local machine and on the remote server when I deploy it via git push prod to the remote server.
With scrapyd-deploy I get this error:
...% scrapyd-deploy example -p apo
ANSWER
Answered 2020-May-06 at 05:06

os.mkdir might be failing because it cannot create nested directories. You can use os.makedirs(img_dir, exist_ok=True) instead.

os.path.dirname(__file__) will point to /tmp/... under scrapyd. Not sure if this is what you want. I would use an absolute path without calling os.path.dirname(__file__) if you don't want images to be downloaded under /tmp.
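A small sketch of the suggested fix; the directory names are placeholders:

```python
import os
import tempfile

# os.makedirs creates intermediate directories; exist_ok=True makes the
# call idempotent, so repeated runs under scrapyd don't raise.
img_dir = os.path.join(tempfile.gettempdir(), "scrapy_demo", "full")
os.makedirs(img_dir, exist_ok=True)
os.makedirs(img_dir, exist_ok=True)  # second call is a no-op
```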
QUESTION
I am trying to schedule a scrapy 2.1.0 spider with the help of scrapyd 1.2
...ANSWER
Answered 2020-May-05 at 04:02

Before you can launch your spider with scrapyd, you'll have to deploy your spider first. You can do this by:
- Using addversion.json (https://scrapyd.readthedocs.io/en/latest/api.html#addversion-json)
- Using scrapyd-deploy (https://github.com/scrapy/scrapyd-client)
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install scrapyd
You can use scrapyd like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.