scrapy-redis | Redis-based components for Scrapy | Crawler library

 by rmax · Python · Version: 0.7.3 · License: MIT

kandi X-RAY | scrapy-redis Summary

scrapy-redis is a Python library typically used in Automation and Crawler applications. It has no reported bugs or vulnerabilities, has a build file available, is released under a permissive license, and has medium support. You can download it from GitHub.

Redis-based components for Scrapy.

            Support

              scrapy-redis has a medium active ecosystem.
              It has 5279 star(s) with 1578 fork(s). There are 277 watchers for this library.
              It had no major release in the last 12 months.
              There are 40 open issues and 141 closed issues. On average, issues are closed in 600 days. There is 1 open pull request and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of scrapy-redis is 0.7.3.

            Quality

              scrapy-redis has 0 bugs and 0 code smells.

            Security

              scrapy-redis has no reported vulnerabilities, and neither do its dependent libraries.
              scrapy-redis code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              scrapy-redis is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              scrapy-redis releases are available to install and integrate.
              Build file is available. You can build the component from source.
              scrapy-redis saves you 473 person hours of effort in developing the same functionality from scratch.
              It has 1203 lines of code, 144 functions and 30 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed scrapy-redis and identified the functions below as its top functions. This is intended to give you an instant insight into the functionality scrapy-redis implements, and to help you decide whether it suits your requirements; a short usage sketch follows the list.
            • Creates a new DUPE filter from the given settings
            • Return a redis instance from the given settings
            • Return an instance of the Redis client
            • Get stats for a given spider
            • Generate a key for the stats key
            • Convert bytes to str
            • Create an object from a crawler
            • Setup redis connection
            • Serialize an item
            • Return the key for a spider
            • Push a request to the queue
            • Encode the request
            • Remove the item from the queue
            • Decode a request
            • Create a RedisClient from a spider
            • Close the stream
            • Remove an item from the server
            • Remove an item from the queue
            • Read rst file
            • Close a spider
            • Return an instance of redis client
            • Process items from the Redis queue
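            Many of the functions above revolve around feeding a spider from a Redis queue, serializing and encoding requests, and deduplicating them. As a quick orientation, here is a minimal sketch of how the library is typically wired into a spider; the spider name, Redis key, and parsing logic are hypothetical:

            # A spider that pops its start URLs from a Redis list instead of start_urls
            from scrapy_redis.spiders import RedisSpider

            class MySpider(RedisSpider):
                name = 'myspider'
                # Hypothetical key; the spider waits for URLs pushed onto this list
                redis_key = 'myspider:start_urls'

                def parse(self, response):
                    # Hypothetical extraction logic
                    yield {'url': response.url, 'title': response.css('title::text').get()}

            With the spider running, pushing a URL (for example, redis-cli lpush myspider:start_urls https://example.com) makes it pick up the request.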

            scrapy-redis Key Features

            No Key Features are available at this moment for scrapy-redis.

            scrapy-redis Examples and Code Snippets

            Usage
            Python · 34 lines of code · no license

            $ git clone https://github.com/KDF5000/RSpider.git

            # Replace Scrapy's default scheduler with the scrapy-redis scheduler, so the crawl queue is read from the Redis cache
            SCHEDULER = "scrapy_redis.scheduler.Scheduler"

            # Persist scheduler state: do not clear the Redis cache, so the crawl can be paused and resumed
            SCHEDULER_PERSIST = True

            # Use a priority queue for request scheduling (the default)
            #SCHEDULER_QUEUE_CLASS = 's
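            The snippet above only swaps in the scheduler. For reference, here is a minimal settings.py sketch combining the settings commonly documented for scrapy-redis; the Redis URL is an assumption for a local default instance:

            # Route scheduling and request deduplication through Redis
            SCHEDULER = "scrapy_redis.scheduler.Scheduler"
            DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

            # Keep the queue and dupefilter in Redis between runs so the crawl can be paused and resumed
            SCHEDULER_PERSIST = True

            # Where to reach Redis (assumed local instance on the default port)
            REDIS_URL = "redis://localhost:6379"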
            Runtime environment:
            Python · 28 lines of code · no license

            CREATE TABLE `house` (
              `id` int(11) NOT NULL AUTO_INCREMENT,
              `name` varchar(50) DEFAULT NULL,
              `price` varchar(50) DEFAULT NULL,
              `open_date` varchar(50) DEFAULT NULL,
              `address` varchar(255) DEFAULT NULL,
              `lon_lat` varchar(50) DEFAULT NULL,
            Quick start
            Python · 24 lines of code · no license

            python3 -m pip install scrapy-redis-expiredupefilter

            # Scheduler that supports the TTL dupefilter
            SCHEDULER = 'scrapy_redis_expiredupefilter.scheduler.Scheduler'
            # Dupefilter with TTL support
            DUPEFILTER_CLASS = 'scrapy_redis_expiredupefilter.dupefilter.RFPDupeFilter'
            # Redis connection

            Community Discussions

            QUESTION

            Where should I bind the db/redis connection to on scrapy?
            Asked 2020-Jul-14 at 05:02

            Sorry to disturb you guys. This is a bad question; it seems what really confused me is how ItemPipeline works in Scrapy. I'll close it and start a new question.

            Where should I bind the db/redis connection in Scrapy: the Spider or the Pipeline?

            In the Scrapy documentation, the MongoDB connection is bound to the Pipeline. But it could also be bound to the Spider (which is what the scrapy-redis extension does). The latter approach has the benefit that the connection is accessible wherever the spider is, not just in the pipeline, for example in middlewares.

            So, which is the better way to do it?

            I'm also confused by the claim that pipelines are run in parallel (this is what the docs say). Does that mean there are multiple instances of MyCustomPipeline?

            Also, is a connection pool for redis/db preferred?

            I just lack the field experience to make the decision. Need your help. Thanks in advance.

            As the docs say, the ItemPipeline is run in parallel. How? Are there duplicate instances of the ItemPipeline running in threads? (I noticed FilesPipeline uses a deferred thread to save files into S3.) Or is there only one instance of each pipeline, running in the main event loop? If it's the latter, a connection pool doesn't seem to help, because a Redis connection blocks while in use: only one connection can be used at a time.


            ANSWER

            Answered 2020-Jul-11 at 13:07

            The best practice would be to bind the connection in the pipelines, in order to follow the separation-of-concerns principle.

            Scrapy uses the same parallelism infrastructure for executing requests and processing items; as your spider yields items, Scrapy calls the process_item method on the pipeline instance.

            A single instance of every pipeline is instantiated during the spider instantiation.

            Also, is a connection pool for redis/db preferred?

            Sorry, don't think I can help with this one.

            Source https://stackoverflow.com/questions/62839567
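            For illustration, here is a minimal sketch of the pipeline-bound approach the answer recommends, using the redis client; the class name, setting name, and Redis key are hypothetical:

            import json
            import redis

            class RedisWriterPipeline:
                """Hypothetical pipeline that owns its own Redis connection."""

                def __init__(self, redis_url):
                    self.redis_url = redis_url
                    self.client = None

                @classmethod
                def from_crawler(cls, crawler):
                    # Read the connection URL from the project settings; MY_REDIS_URL is an assumed setting name
                    return cls(redis_url=crawler.settings.get("MY_REDIS_URL", "redis://localhost:6379"))

                def open_spider(self, spider):
                    # Scrapy creates a single pipeline instance per crawl, so one client here is enough
                    self.client = redis.from_url(self.redis_url)

                def close_spider(self, spider):
                    self.client.close()

                def process_item(self, item, spider):
                    # Called for every item the spider yields
                    self.client.rpush("items:%s" % spider.name, json.dumps(dict(item)))
                    return item

            Enable it in settings.py with ITEM_PIPELINES = {'myproject.pipelines.RedisWriterPipeline': 300} (the module path is hypothetical).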

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install scrapy-redis

            You can download it from GitHub.
            You can use scrapy-redis like any standard Python library. You will need a development environment with a Python distribution (including header files), a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages into a virtual environment to avoid changes to the system.
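            A typical installation, assuming a Unix-like shell and that the package is installed from PyPI:

            # Create and activate an isolated environment, then install scrapy-redis
            python3 -m venv .venv
            source .venv/bin/activate
            python -m pip install --upgrade pip setuptools wheel
            python -m pip install scrapy-redis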

            Support

            For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.



            Consider Popular Crawler Libraries

            scrapy by scrapy
            cheerio by cheeriojs
            winston by winstonjs
            pyspider by binux
            colly by gocolly

            Try Top Libraries by rmax

            dirbot-mysql by rmax (Python)
            django-dummyimage by rmax (Python)
            scrapy-boilerplate by rmax (Python)
            scrapydo by rmax (Jupyter Notebook)