ershoufang | use scrapy to crawl second-hand house prices | Crawler library
kandi X-RAY | ershoufang Summary
This project can help you crawl second-hand house data from [lianjia.com](https://lianjia.com). It is a Scrapy project; Scrapy is an open-source crawling framework and is easy to use. XPath is used to extract the second-hand house data from the HTML source code.

Before you run the code, you must install Scrapy on your machine; you can install it very quickly with pip: pip install scrapy. Use this command to start the crawl and export the data in CSV format: scrapy crawl ershoufang -o item.csv. Here ershoufang is the name of the spider. There are actually three spiders in this project, ershoufang, yanjiao, and ershoufanghz, corresponding to the second-hand house spiders for Beijing, Beijing Yanjiao, and Hangzhou. You can also create your own spider; you only need to change the URL at line 8 of the file ershoufang/spider/ershoufang.py.

To avoid being banned by lianjia, I configured some things in settings.py, e.g. disabled cookies, changed the bot_name, crawled only once per minute, and used a different user_agent (I commented it out). I don't use a proxy because I have no proxy IP source.
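A minimal sketch of what that anti-ban section of settings.py might look like; the concrete values (the bot name, the exact delay, the user-agent string) are assumptions for illustration, not copied from the repository:

    # settings.py -- illustrative anti-ban configuration (values assumed)
    BOT_NAME = "ershoufang"      # changed bot name
    COOKIES_ENABLED = False      # disable cookies
    DOWNLOAD_DELAY = 60          # crawl only once per minute
    # A custom user agent could be set here; the author left theirs commented out:
    # USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"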
Top functions reviewed by kandi - BETA
- Parse the meteostation response.
- Return a generator of start requests.
- Process response results.
- Called when an exception is raised.
- Adds a random user-agent to the request (see the sketch after this list).
- Process an item.
- Initialize client.
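The user-agent entry above suggests a Scrapy downloader middleware that attaches a random user agent. A minimal sketch of how such a middleware is commonly written, assuming a hypothetical class name and agent list; this is not the project's actual code:

    # middlewares.py -- hypothetical random user-agent middleware
    import random

    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    ]

    class RandomUserAgentMiddleware:
        def process_request(self, request, spider):
            # Attach a randomly chosen user agent to each outgoing request.
            request.headers["User-Agent"] = random.choice(USER_AGENTS)

        def process_response(self, request, response, spider):
            # Pass responses through unchanged.
            return response

        def process_exception(self, request, exception, spider):
            # Log the failure and fall back to Scrapy's default handling.
            spider.logger.warning("Request failed: %s", exception)

Such a middleware would be enabled through the DOWNLOADER_MIDDLEWARES setting in settings.py.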
Community Discussions
Trending Discussions on ershoufang
QUESTION
Given a link from here:
I would like to loop over all the counties and then all the commercial districts, then save them as a txt file in JSON format as follows:
...ANSWER
Answered 2021-Mar-28 at 16:10

Is this what you want?
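The answer's code snippet was not preserved on this page. A minimal sketch of one way to collect district and sub-district names from a lianjia listing page and write them out as JSON; the city URL and both XPath expressions are assumptions about the page structure, not taken from the original answer:

    # districts_sketch.py -- illustrative only; selectors are guesses
    import json
    import requests
    from lxml import html

    BASE = "https://bj.lianjia.com"  # assumed city site
    tree = html.fromstring(requests.get(BASE + "/ershoufang/", timeout=10).text)

    result = {}
    # Assumed XPath for the district (county) filter links on the listing page.
    for a in tree.xpath('//div[@data-role="ershoufang"]/div[1]/a'):
        district = a.text_content().strip()
        sub_tree = html.fromstring(
            requests.get(BASE + a.get("href"), timeout=10).text
        )
        # Assumed XPath for the commercial-district links inside a district page.
        result[district] = [
            b.text_content().strip()
            for b in sub_tree.xpath('//div[@data-role="ershoufang"]/div[2]/a')
        ]

    with open("districts.txt", "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)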
QUESTION
When executing the first yield, it does not go into the function parse_url, and when executing the second yield, it does not go back to the function parse; it just ends. During the whole process there are no exceptions. I don't know how to deal with this problem; I need help.
...ANSWER
Answered 2017-Jun-24 at 12:49

If you looked carefully at the logs, you might have noticed that scrapy filtered offsite domain requests. This means that when scrapy tried to ping short.58.com and jxjump.58.com, it did not follow through. You can add those domains to the allowed_domains filter in your Spider class and you will see the requests being sent.
Replace:
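The snippet that followed "Replace:" was not preserved on this page. A minimal sketch of the suggested change, assuming the spider originally allowed only bj.58.com (the asker's actual class name and domain list are not shown here):

    # sketch of the fix -- class name and original domains are assumed
    import scrapy

    class FangSpider(scrapy.Spider):
        name = "fang"
        # Include the redirect domains so the offsite middleware
        # no longer filters requests to them.
        allowed_domains = ["bj.58.com", "short.58.com", "jxjump.58.com"]

        def parse(self, response):
            # ... original parsing logic unchanged ...
            pass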
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install ershoufang
You can use ershoufang like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
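For example, a typical setup following these recommendations might look like the shell session below; the repository URL is not given on this page, so a placeholder is used:

    python -m venv venv
    source venv/bin/activate                        # Windows: venv\Scripts\activate
    python -m pip install --upgrade pip setuptools wheel
    pip install scrapy
    git clone <repository-url> ershoufang
    cd ershoufang
    scrapy crawl ershoufang -o item.csv             # run the Beijing spider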