scrapyd | A service daemon to run Scrapy spiders | Continuous Deployment library

 by scrapy | Python | Version: 1.4.3 | License: BSD-3-Clause

kandi X-RAY | scrapyd Summary

scrapyd is a Python library typically used in DevOps, Continuous Deployment, and Docker applications. scrapyd has no reported bugs or vulnerabilities, has a build file available, has a permissive license, and has high support. You can install it with 'pip install scrapyd' or download it from GitHub or PyPI.

A service daemon to run Scrapy spiders

            Support

              scrapyd has a highly active ecosystem.
              It has 2,656 stars, 558 forks, and 90 watchers.
              It has had no major release in the last 12 months.
              There are 23 open issues and 254 closed issues. On average, issues are closed in 500 days. There are 5 open pull requests and 0 closed pull requests.
              It has a negative sentiment in the developer community.
              The latest version of scrapyd is 1.4.3.

            Quality

              scrapyd has 0 bugs and 0 code smells.

            Security

              scrapyd has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              scrapyd code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              scrapyd is licensed under the BSD-3-Clause License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              scrapyd releases are available to install and integrate.
              A deployable package is available on PyPI.
              A build file is available, so you can build the component from source.
              scrapyd saves you 671 person-hours of effort in developing the same functionality from scratch.
              It has 2,142 lines of code, 268 functions, and 41 files.
              It has medium code complexity. Code complexity directly impacts the maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed scrapyd and identified the following as its top functions. This is intended to give you an instant insight into the functionality scrapyd implements, and to help you decide if it suits your requirements.
            • Creates an application instance
            • Get the value of a method
            • Creates an instance of ScrapydResourceWrapper
            • Render a list of jobs
            • Convert a dict to native string representation
            • Convert unicode to native str
            • Render a spider
            • Get a list of spiders
            • Spawn a new process
            • Returns a list of command line arguments for the crawler
            • Context manager for Scrapy
            • Activate scraper
            • List installed egg versions
            • Put message into queue
            • Start the scraper service
            • List spiders
            • Render a project
            • Called when the process finished
            • Return an application instance
            • Render the document
            • Delete a project
            • Render a resource
            • Return the maximum number of CPU cores
            • Update a new egg
            • Render a node
            • Mark a process finished

            scrapyd Key Features

            No Key Features are available at this moment for scrapyd.

            scrapyd Examples and Code Snippets

            Spider Admin Pro, deployment optimization
            Python | Lines of Code: 86 | License: No License
            # Start the service
            $ gunicorn --config gunicorn.conf.py spider_admin_pro.run:app
            
            # -*- coding: utf-8 -*-
            
            """
            $ gunicorn --config gunicorn.conf.py spider_admin_pro.run:app
            """
            
            import multiprocessing
            import os
            
            from gevent import monkey
            
            monkey.patch_all()
            
            # 日  
            Spider Admin Pro, configuration parameters
            Python | Lines of Code: 57 | License: No License
            Priority: YAML config file > env environment variables > default config
            
            
            # Flask service configuration
            PORT = 5002
            HOST = '127.0.0.1'
            
            # Login username and password
            USERNAME = admin
            PASSWORD = "123456"
            JWT_KEY = FU0qnuV4t8rr1pvg93NZL3DLn6sHrR1sCQqRzachbo0=
            
            # Token expiry time, in days
            EXPIRES = 7
            
            # scrapyd address; do not add a trailing slash
            SCRAPYD_SERVER =   
            Scrapyd-mongodb, Config
            Python | Lines of Code: 13 | License: Permissive (MIT)
            
            [scrapyd]
            application = scrapyd_mongodb.application.get_application
            ...
            
            
            [scrapyd]
            mongodb_name = scrapyd_mongodb
            mongodb_host = 127.0.0.1
            mongodb_port = 27017
            mongodb_user = custom_user  # (Optional)
            mongodb_pass = custompwd  # (Optional)
            ...
            
              
            How to bind Heroku port to scrapyd
            Python | Lines of Code: 9 | License: Strong Copyleft (CC BY-SA 4.0)
            # init.py
            import os
            import io
            
            PORT = os.environ['PORT']
            with io.open("scrapyd.conf", 'r+', encoding='utf-8') as f:
                f.read()
                f.write(u'\nhttp_port = %s\n' % PORT)
            
            ScrapydWeb: Connection refused within docker-compose
            Python | Lines of Code: 7 | License: Strong Copyleft (CC BY-SA 4.0)
            # before (connection refused): nodes addressed on host-mapped ports
            SCRAPYD_SERVERS: "scrapyd_node_1:6800,scrapyd_node_2:6801,scrapyd_node_3:6802"

            # after: inside the compose network, every node listens on 6800
            SCRAPYD_SERVERS: "scrapyd_node_1:6800,scrapyd_node_2:6800,scrapyd_node_3:6800"

                ports:
                  - "6801:6800"
            
            Unable to download scrapyd on ec2
            Python | Lines of Code: 5 | License: Strong Copyleft (CC BY-SA 4.0)
            # the script must be a triple-quoted string to span multiple lines
            driver.execute_script("""grecaptcha.render('dCF_input', {
                                    sitekey: '6LdC3UgUAAAAAJIcyA3Ym4j_nCP-ainSgf1NoFku',
                                    callback: distilCallbackGuard('distilCaptchaDoneCallback')})"""
            )
            
            Scrapyd: How to write data to json file?
            Python | Lines of Code: 10 | License: Strong Copyleft (CC BY-SA 4.0)
            from os import listdir, mkdir

            def make_folder(name):
                try:
                    mkdir(name)
                except FileExistsError:
                    pass

            # alternative: check first ('path' is the parent directory to scan)
            def make_folder(name):
                if name not in listdir(path):
                    mkdir(name)
            
            How to pass command-line keyword argument to class variable in scrapyd?
            Python | Lines of Code: 4 | License: Strong Copyleft (CC BY-SA 4.0)
            import sys
            for arg in sys.argv:
                print(arg)
            
            I can't access scrapyd port 6800 from browser
            Python | Lines of Code: 11 | License: Strong Copyleft (CC BY-SA 4.0)
            # check the firewall and open scrapyd's port
            sudo ufw status
            sudo ufw allow 6800/tcp
            sudo ufw reload

            # in scrapyd.conf, listen on all interfaces:
            bind_address=0.0.0.0
            # instead of the loopback-only default:
            bind_address=127.x.x.x

            # run scrapyd detached in the background
            nohup scrapyd >& /dev/null &
            Push project to Heroku has been rejected if I change Python version
            Python | Lines of Code: 2 | License: Strong Copyleft (CC BY-SA 4.0)
            AttributeError: module 'importlib._bootstrap' has no attribute 'SourceFileLoader'
            

            Community Discussions

            QUESTION

            How to bind Heroku port to scrapyd
            Asked 2022-Jan-24 at 06:29

            I created a simple Python app on Heroku to launch scrapyd. The scrapyd service starts, but it launches on port 6800. Heroku requires you to bind it to the $PORT variable, and I was able to run the Heroku app locally. The logs from the process are included below. I looked at the scrapy-heroku package, but wasn't able to install it due to errors. The code in app.py of this package seems to provide some clues as to how it can be done. How can I implement this as a Python command to start scrapyd on the port provided by Heroku?

            Procfile:

            ...

            ANSWER

            Answered 2022-Jan-24 at 06:29

            You just need to read the PORT environment variable and write it into your scrapyd config file, as in the sketch below.
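
            A minimal sketch of that idea, mirroring the "How to bind Heroku port to scrapyd" snippet earlier on this page (it assumes scrapyd.conf sits in the app root):

            # init.py -- run this before launching scrapyd (e.g. from the Procfile)
            import os

            port = os.environ["PORT"]  # assigned by Heroku when the dyno starts
            with open("scrapyd.conf", "a", encoding="utf-8") as f:
                f.write("\nhttp_port = %s\n" % port)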

            Source https://stackoverflow.com/questions/70829576

            QUESTION

            How to set class variable through __init__ in Python?
            Asked 2021-Nov-08 at 20:06

            I am trying to change a setting from the command line while starting a Scrapy crawler (Python 3.7). Therefore I am adding an __init__ method, but I could not figure out how to change the class variable "delay" from within the __init__ method.

            Example minimal:

            ...

            ANSWER

            Answered 2021-Nov-08 at 20:06

            I am trying to change a setting from the command line while starting a Scrapy crawler (Python 3.7). Therefore I am adding an __init__ method...
            ...
            scrapy crawl test -a delay=5

            1. According to the Scrapy docs (Settings / Command line options section), you are required to use the -s parameter to update a setting:
              scrapy crawl test -s DOWNLOAD_DELAY=5

            2. It is not possible to update settings at runtime from __init__ or other spider methods (details in the related discussion on GitHub: Update spider settings during runtime #4196). A per-spider workaround is sketched below.
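
            If the goal is only a per-spider delay, one workaround is to pass it as a spider argument and set the download_delay spider attribute, which Scrapy's downloader honors per spider. A minimal sketch (the spider name is an assumption):

            import scrapy

            class TestSpider(scrapy.Spider):
                name = "test"

                def __init__(self, delay=None, *args, **kwargs):
                    super().__init__(*args, **kwargs)
                    if delay is not None:
                        # spider-level attribute; overrides DOWNLOAD_DELAY for
                        # this spider only (invoked as: scrapy crawl test -a delay=5)
                        self.download_delay = float(delay)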

            Source https://stackoverflow.com/questions/69882916

            QUESTION

            Scrapyd corrupting response?
            Asked 2021-May-12 at 12:48

            I'm trying to scrape a specific website. The code I'm using to scrape it is the same as that being used to scrape many other sites successfully.

            However, the resulting response.body looks completely corrupt (segment below):

            ...

            ANSWER

            Answered 2021-May-12 at 12:48

            Thanks to Serhii's suggestion, I found that the issue was due to "accept-encoding": "gzip, deflate, br": I accepted compressed responses but did not handle them in Scrapy.

            Adding scrapy.downloadermiddlewares.httpcompression or simply removing the accept-encoding line fixes the issue; both options are sketched below.
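
            A minimal sketch of both options (the spider name and URL are placeholders): re-enable Scrapy's HttpCompressionMiddleware explicitly, or ask the server for an uncompressed body by overriding the Accept-Encoding header.

            import scrapy

            class ExampleSpider(scrapy.Spider):
                name = "example"
                custom_settings = {
                    # on by default; listing it re-enables it in case the
                    # project settings disabled it
                    "DOWNLOADER_MIDDLEWARES": {
                        "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 590,
                    },
                }

                def start_requests(self):
                    # alternative: request an uncompressed response
                    yield scrapy.Request(
                        "https://example.com",
                        headers={"Accept-Encoding": "identity"},
                    )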

            Source https://stackoverflow.com/questions/67434926

            QUESTION

            ScrapydWeb: Connection refused within docker-compose
            Asked 2020-Nov-17 at 08:07

            I tried to run a couple of scrapyd services to have a simple cluster on my localhost, but only the first node works. For the two others I get the following error:

            ...

            ANSWER

            Answered 2020-Nov-17 at 08:07

            The problem is in the SCRAPYD_SERVERS line: inside the compose network each scrapyd container listens on its internal port 6800, and the 6801/6802 ports exist only as host-side mappings (see the corresponding snippet earlier on this page).

            Source https://stackoverflow.com/questions/64870964

            QUESTION

            Can't add a .egg file to scrapyd addversion.json
            Asked 2020-May-31 at 14:55

            The problem I have is that I can't upload my .egg file to scrapyd using

            curl http://127.0.0.1:6800/addversion.json -F project=scraper_app -F version=r1 egg=@scraper_app-0.0.1-py3.8.egg

            It returns an error message like this:

            {"node_name": "Workspace", "status": "error", "message": "b'egg'"}

            So I'm using Django and Scrapy in the same project, and I had this folder structure

            ...

            ANSWER

            Answered 2020-May-31 at 14:55

            After googling more and trying scrapyd-client, I found there are lots of problems on Windows and that scrapyd-deploy isn't easy to use there, but I found a video on YouTube that showed me the correct way to install scrapyd-client.

            So here is the correct way to install it.

            Make sure you are inside a virtualenv, then install scrapyd-client with pip install git+https://github.com/scrapy/scrapyd-client.git. That way it installs without any errors or difficulties.

            And then you can just run scrapyd-deploy in the Scrapy project folder.
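
            For reference, a minimal sketch of uploading the egg straight to scrapyd's addversion.json endpoint with the requests library (project, version, and filename are taken from the question); note that the egg must be sent as a file field:

            import requests

            with open("scraper_app-0.0.1-py3.8.egg", "rb") as egg:
                resp = requests.post(
                    "http://127.0.0.1:6800/addversion.json",
                    data={"project": "scraper_app", "version": "r1"},
                    files={"egg": egg},  # file field, not a plain form value
                )
            print(resp.json())  # {"status": "ok", ...} on success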

            Source https://stackoverflow.com/questions/62115241

            QUESTION

            Scrapy error: TypeError: __init__() got an unexpected keyword argument 'cb_kwargs'
            Asked 2020-May-19 at 13:32

            I use Scrapy (version 2.1.0) to parse the site. When I try to make an additional request:

            ...

            ANSWER

            Answered 2020-May-19 at 13:32

            I think the problem here is that you are passing cb_kwargs to a Request that doesn't accept it. From what I understand, cb_kwargs is new in Scrapy 1.7, so you should check whether scrapyd in your case is running with Scrapy >= 1.7. Alternatively, to pass data to your callback, you can use the Request's meta attribute, as sketched below.
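
            A minimal sketch of the meta-based alternative (spider name and URLs are placeholders), for setups where scrapyd runs a Scrapy older than 1.7:

            import scrapy

            class MySpider(scrapy.Spider):
                name = "my_spider"
                start_urls = ["https://example.com"]

                def parse(self, response):
                    yield scrapy.Request(
                        "https://example.com/next",
                        callback=self.parse_next,
                        meta={"page_title": "example"},  # data rides on the request
                    )

                def parse_next(self, response):
                    title = response.meta["page_title"]  # read it back here
                    self.logger.info("carried over: %s", title)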

            Source https://stackoverflow.com/questions/61887797

            QUESTION

            Scrapyd: How to retrieve spiders or version of a scrapyd project?
            Asked 2020-May-09 at 07:53

            It appears that either the documentation of scrapyd is wrong or there is a bug. I want to retrieve the list of spiders from a deployed project. The docs tell me to do it this way:

            ...

            ANSWER

            Answered 2020-May-09 at 07:53

            Maybe the URL needs to be wrapped in double quotes. Try:
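
            Equivalently, a minimal sketch of the same call from Python, which sidesteps shell quoting entirely (the project name is a placeholder):

            import requests

            resp = requests.get(
                "http://localhost:6800/listspiders.json",
                params={"project": "myproject"},
            )
            print(resp.json())  # e.g. {"status": "ok", "spiders": ["spider1", ...]}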

            Source https://stackoverflow.com/questions/61692350

            QUESTION

            scrapyd-deploy with "deploy failed (400)"
            Asked 2020-May-08 at 05:47

            I am trying to deploy with scrapyd-deploy to a remote scrapyd server, which fails without an error message:

            ...

            ANSWER

            Answered 2020-May-08 at 05:47

            Moved from comment to answer:
            In a similar-sounding issue with the same error code, the problem was that Twisted versions earlier than 18.9 don't support Python 3.7. If you are using Python 3.7 and your Twisted version is below 18.9, try upgrading Twisted to at least version 18.9:
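
            A quick sketch to check the installed Twisted version before retrying the deploy:

            import twisted

            print(twisted.version)  # e.g. [twisted, version 18.9.0]
            # if it is older than 18.9: pip install --upgrade "Twisted>=18.9"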

            Source https://stackoverflow.com/questions/61666296

            QUESTION

            Scrapyd-Deploy: Errors due to using os path to set directory
            Asked 2020-May-06 at 05:06

            I am trying to deploy a scrapy project via scrapyd-deploy to a remote scrapyd server. The project itself is functional and works perfectly on my local machine and on the remote server when I deploy it via git push prod to the remote server.

            With scrapyd-deploy I get this error:

            % scrapyd-deploy example -p apo

            ...

            ANSWER

            Answered 2020-May-06 at 05:06
            • os.mkdir might be failing because it cannot create nested directories. You can use os.makedirs(img_dir, exist_ok=True) instead.
            • os.path.dirname(__file__) will point to /tmp/... under scrapyd, which is probably not what you want. Use an absolute path instead of os.path.dirname(__file__) if you don't want images to be downloaded under /tmp. A short sketch follows this list.
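
            A short sketch combining both suggestions (the directory path is a placeholder):

            import os

            # absolute path, independent of where scrapyd unpacks the egg
            img_dir = "/home/user/project/images"
            os.makedirs(img_dir, exist_ok=True)  # creates parents; no error if it exists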

            Source https://stackoverflow.com/questions/61620407

            QUESTION

            Scrapyd: No active project - How to schedule spiders with scrapyd
            Asked 2020-May-05 at 04:02

            I am trying to schedule a scrapy 2.1.0 spider with the help of scrapyd 1.2

            ...

            ANSWER

            Answered 2020-May-05 at 04:02

            Before you can launch your spider with scrapyd, you'll have to deploy your spider first. You can do this by:
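
            A minimal sketch of the usual flow, assuming scrapyd-client is installed and the project/spider names are placeholders: deploy the project with scrapyd-deploy, then schedule the spider through scrapyd's schedule.json API.

            # in the shell, from the Scrapy project folder:
            #   scrapyd-deploy default -p myproject
            import requests

            resp = requests.post(
                "http://localhost:6800/schedule.json",
                data={"project": "myproject", "spider": "myspider"},
            )
            print(resp.json())  # {"status": "ok", "jobid": "..."}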

            Source https://stackoverflow.com/questions/61602283

            Community Discussions and Code Snippets contain sources from the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install scrapyd

            You can install scrapyd using 'pip install scrapyd' or download it from GitHub or PyPI.
            You can use scrapyd like any standard Python library. Make sure you have a development environment consisting of a Python distribution (including header files), a compiler, pip, and git, and that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.

            Support

            For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask them on the Stack Overflow community page.
            Install
          • PyPI

            pip install scrapyd

          • CLONE
          • HTTPS

            https://github.com/scrapy/scrapyd.git

          • CLI

            gh repo clone scrapy/scrapyd

          • SSH

            git@github.com:scrapy/scrapyd.git
