scrapy-redis | Redis-based components for Scrapy | Crawler library
kandi X-RAY | scrapy-redis Summary
Redis-based components for Scrapy.
Top functions reviewed by kandi - BETA
- Create a new dupefilter from the given settings
- Return a Redis instance from the given settings
- Return an instance of the Redis client
- Get stats for a given spider
- Generate the stats key for a spider
- Convert bytes to str
- Create an object from a crawler
- Set up the Redis connection
- Serialize an item
- Return the key for a spider
- Push a request onto the queue
- Encode a request
- Remove an item from the queue
- Decode a request
- Create a Redis client from a spider
- Close the stream
- Remove an item from the server
- Remove an item from the queue
- Read an .rst file
- Close a spider
- Return an instance of the Redis client
- Process items from the Redis queue
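Several of the functions above ("Encode a request", "Decode a request") cover request serialization for the Redis queue. scrapy-redis pickles a dict of request attributes; the sketch below shows that round trip with illustrative field names (the exact dict keys used by the library are an assumption here):

```python
import pickle

def encode_request(request_dict):
    # scrapy-redis serializes a dict of request attributes with pickle
    # (via its picklecompat helpers); field names below are illustrative.
    return pickle.dumps(request_dict, protocol=-1)

def decode_request(data):
    # Restore the request dict pushed by encode_request.
    return pickle.loads(data)

req = {"url": "https://example.com", "method": "GET", "priority": 0}
assert decode_request(encode_request(req)) == req
```

Pickle keeps arbitrary Python values intact across the Redis round trip, which is why it is used over JSON here.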
scrapy-redis Key Features
scrapy-redis Examples and Code Snippets
$ git clone https://github.com/KDF5000/RSpider.git
# Replace Scrapy's default scheduler with the scrapy-redis scheduler,
# which reads the request queue from the Redis cache
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
# Persist scheduler state: do not clear the Redis cache, so the crawl
# can be paused and resumed
SCHEDULER_PERSIST = True
# Use a priority queue for request scheduling (the default)
#SCHEDULER_QUEUE_CLASS = 's
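The commented-out queue setting refers to scrapy-redis's priority queue, which stores encoded requests in a Redis sorted set scored so that higher-priority requests pop first. Those semantics can be sketched in plain Python, with a heap standing in for the sorted set (no Redis server needed; class and method names here are illustrative, not the library's):

```python
import heapq

class PrioritySketch:
    # Minimal stand-in for scrapy-redis's priority queue: the real queue
    # keeps encoded requests in a Redis sorted set scored by -priority;
    # a heap mimics "highest priority pops first" locally.
    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker that preserves insertion order

    def push(self, request, priority=0):
        heapq.heappush(self._heap, (-priority, self._count, request))
        self._count += 1

    def pop(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

q = PrioritySketch()
q.push("low-priority request", priority=0)
q.push("high-priority request", priority=10)
assert q.pop() == "high-priority request"
```

Negating the priority is the same trick the Redis-backed queue uses, since both heaps and sorted sets pop the smallest score first.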
CREATE TABLE `house` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(50) DEFAULT NULL,
  `price` varchar(50) DEFAULT NULL,
  `open_date` varchar(50) DEFAULT NULL,
  `address` varchar(255) DEFAULT NULL,
  `lon_lat` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`)
);
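A storage pipeline would insert each scraped item into this table. The sketch below uses an in-memory SQLite database as a stand-in for MySQL (types adapted, since the `AUTO_INCREMENT` syntax differs), with made-up example data:

```python
import sqlite3

# SQLite stand-in for the MySQL `house` table above; column names match,
# types are adapted to SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE house (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT, price TEXT, open_date TEXT,
        address TEXT, lon_lat TEXT
    )
""")

# Hypothetical scraped item; named-parameter binding avoids SQL injection.
item = {"name": "Riverside Court", "price": "12000",
        "open_date": "2020-08-01", "address": "Example Rd. 1",
        "lon_lat": "120.1,30.2"}
conn.execute(
    "INSERT INTO house (name, price, open_date, address, lon_lat) "
    "VALUES (:name, :price, :open_date, :address, :lon_lat)", item)

row = conn.execute("SELECT name, price FROM house").fetchone()
assert row == ("Riverside Court", "12000")
```

The same parameterized `INSERT` works against MySQL via a driver such as `pymysql`, with `%(name)s`-style placeholders instead.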
python3 -m pip install scrapy-redis-expiredupefilter
# Use the scheduler that supports a TTL dupefilter
SCHEDULER = 'scrapy_redis_expiredupefilter.scheduler.Scheduler'
# Dupefilter with TTL
DUPEFILTER_CLASS = 'scrapy_redis_expiredupefilter.dupefilter.RFPDupeFilter'
# Redis connection
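The point of a TTL dupefilter is that a request fingerprint blocks duplicates only for a limited time, after which the URL can be recrawled. The toy model below captures that behavior with a dict mapping fingerprint to expiry timestamp; the real package stores fingerprints in Redis with a TTL, so class and method names here are illustrative:

```python
import time

class TTLDupeFilterSketch:
    # Toy model of a dupefilter whose fingerprints expire: a dict maps
    # fingerprint -> expiry timestamp instead of Redis keys with a TTL.
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.seen = {}

    def request_seen(self, fingerprint, now=None):
        now = time.time() if now is None else now
        expiry = self.seen.get(fingerprint)
        if expiry is not None and expiry > now:
            return True          # duplicate: fingerprint still fresh
        self.seen[fingerprint] = now + self.ttl
        return False             # new (or expired) fingerprint

f = TTLDupeFilterSketch(ttl_seconds=60)
assert f.request_seen("abc", now=0) is False    # first sighting
assert f.request_seen("abc", now=30) is True    # within TTL: filtered
assert f.request_seen("abc", now=100) is False  # expired: crawled again
```

This is useful for monitoring-style crawls where the same pages must be revisited periodically but not hammered within a window.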
Community Discussions
Trending Discussions on scrapy-redis
QUESTION
Sorry to disturb you guys. This is a bad question; what really confused me is how the item pipeline works in Scrapy. I'll close it and start a new question.
Where should I bind the db/redis connection in Scrapy: on the Spider or on the Pipeline?
In the Scrapy documentation, the MongoDB connection is bound on the Pipeline. But it could also be bound to the Spider (which is what the scrapy-redis extension does). The latter approach has the benefit that the spider is accessible in more places than the pipeline, such as middlewares.
So, which is the better way to do it?
I'm confused that pipelines are run in parallel (this is what the docs say). Does that mean there are multiple instances of MyCustomPipeline?
Besides, is a connection pool for redis/db preferred?
I just lack the field experience to make the decision. Need your help. Thanks in advance.
...As the docs say, the item pipeline runs in parallel. How? Are there duplicate instances of the pipeline running in threads? (I noticed FilesPipeline uses a deferred thread to save files to S3.) Or is there only one instance of each pipeline, running in the main event loop? In the latter case a connection pool doesn't seem to help, because a redis connection blocks while in use: only one connection can be used at a time.
ANSWER
Answered 2020-Jul-11 at 13:07
The best practice is to bind the connection in the pipeline, following the separation-of-concerns principle.
Scrapy uses the same parallelism infrastructure for executing requests and processing items; as your spider yields items, Scrapy calls the process_item method of the pipeline instance.
A single instance of every pipeline is instantiated during the spider instantiation.
Besides, is a connection pool for redis/db preferred?
Sorry, don't think I can help with this one.
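The answer's point (one pipeline instance per crawler, created via `from_crawler`, holding its own connection) can be sketched as follows. A dict-backed fake replaces `redis.Redis` so the example runs without a server; the class and setting names are illustrative:

```python
class FakeRedis:
    # Stand-in for redis.Redis so the sketch runs without a server.
    def __init__(self, url):
        self.url, self.store = url, []

    def rpush(self, key, value):
        self.store.append((key, value))

class RedisConnectionPipeline:
    # The connection is bound in the pipeline, not the spider,
    # keeping storage concerns out of the crawling code.
    def __init__(self, client):
        self.client = client

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this once per crawler, so exactly one pipeline
        # instance (and one connection) exists for the whole crawl.
        url = crawler.settings.get("REDIS_URL", "redis://localhost:6379")
        return cls(client=FakeRedis(url))

    def process_item(self, item, spider):
        self.client.rpush("items", repr(item))
        return item

class FakeCrawler:
    settings = {"REDIS_URL": "redis://example:6379"}

pipeline = RedisConnectionPipeline.from_crawler(FakeCrawler())
pipeline.process_item({"name": "x"}, spider=None)
assert pipeline.client.url == "redis://example:6379"
assert pipeline.client.store == [("items", "{'name': 'x'}")]
```

Because `process_item` is invoked on this single instance for every item, the one client it holds is shared across the crawl, which is what makes binding it here safe and simple.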
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install scrapy-redis
You can use scrapy-redis like any standard Python library. You will need a development environment consisting of a Python distribution with header files, a compiler, pip, and git installed. Make sure your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.