scrapa | Python 3 AsyncIO powered scraping framework with batteries
kandi X-RAY | scrapa Summary
kandi X-RAY | scrapa Summary
Python 3 AsyncIO powered scraping framework with batteries included
Top functions reviewed by kandi - BETA
Currently covering the most popular Java, JavaScript and Python libraries. See a Sample of scrapa
scrapa Key Features
scrapa Examples and Code Snippets
Community Discussions
Trending Discussions on scrapa
QUESTION
I am basically trying to start an HTTP server which will respond with content from a website which I can crawl using Scrapy. In order to start crawling the website I need to login to it and to do so I need to access a DB with credentials and such. The main issue here is that I need everything to be fully asynchronous and so far I am struggling to find a combination that will make everything work properly without many sloppy implementations.
I already got Klein + Scrapy working but when I get to implementing DB accesses I get all messed up in my head. Is there any way to make PyMongo asynchronous with twisted or something (yes, I have seen TxMongo but the documentation is quite bad and I would like to avoid it. I have also found an implementation with adbapi but I would like something more similar to PyMongo).
Trying to think things through the other way around I'm sure aiohttp has many more options to implement async
db accesses and stuff but then I find myself at an impasse with Scrapy integration.
I have seen things like scrapa, scrapyd and ScrapyRT but those don't really work for me. Are there any other options?
Finally, if nothing works, I'll just use aiohttp and instead of Scrapy I'll do the requests to the websito to scrap manually and use beautifulsoup or something like that to get the info I need from the response. Any advice on how to proceed down that road?
Thanks for your attention, I'm quite a noob in this area so I don't know if I'm making complete sense. Regardless, any help will be appreciated :)
...ANSWER
Answered 2018-Jul-25 at 18:55Is there any way to make pymongo asynchronous with twisted
No. pymongo is designed as a synchronous library, and there is no way you can make it asynchronous without basically rewriting it (you could use threads or processes, but that is not what you asked, also you can run into issues with thread-safeness of the code).
Trying to think things through the other way around I'm sure aiohttp has many more options to implement async db accesses and stuff
It doesn't. aiohttp
is a http library - it can do http asynchronously and that is all, it has nothing to help you access databases. You'd have to basically rewrite pymongo on top of it.
Finally, if nothing works, I'll just use aiohttp and instead of scrapy I'll do the requests to the websito to scrap manually and use beautifulsoup or something like that to get the info I need from the response.
That means lots of work for not using scrapy, and it won't help you with the pymongo issue - you still have to rewrite pymongo!
My suggestion is - learn txmongo
! If you can't and want to rewrite it, use twisted.web
to write it instead of aiohttp
since then you can continue using scrapy
!
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install scrapa
No Installation instructions are available at this moment for scrapa.Refer to component home page for details.
Support
If you have any questions vist the community on GitHub, Stack Overflow.
Reuse Trending Solutions
Find, review, and download reusable Libraries, Code Snippets, Cloud APIs from over 650 million Knowledge Items
Find more librariesStay Updated
Subscribe to our newsletter for trending solutions and developer bootcamps
Share this Page