scrapa | Python 3 AsyncIO powered scraping framework with batteries included

by stefanw | Python Version: Current | License: MIT

kandi X-RAY | scrapa Summary


Python 3 AsyncIO powered scraping framework with batteries included


            scrapa Key Features

            No Key Features are available at this moment for scrapa.

            scrapa Examples and Code Snippets

            No Code Snippets are available at this moment for scrapa.

            Community Discussions

            QUESTION

Async HTTP server with Scrapy and MongoDB in Python
            Asked 2018-Jul-26 at 03:46

I am basically trying to start an HTTP server which will respond with content from a website that I can crawl using Scrapy. In order to start crawling the website I need to log in to it, and to do so I need to access a DB with credentials and such. The main issue here is that I need everything to be fully asynchronous, and so far I am struggling to find a combination that will make everything work properly without many sloppy implementations.

I already got Klein + Scrapy working, but when I get to implementing DB accesses I get all messed up in my head. Is there any way to make PyMongo asynchronous with Twisted or something? (Yes, I have seen TxMongo, but the documentation is quite bad and I would like to avoid it. I have also found an implementation with adbapi, but I would like something more similar to PyMongo.)
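To illustrate the Twisted side of that setup, here is a bare-bones Klein endpoint; the route and port are placeholders, not taken from the question. Klein handlers may return Deferreds, which is what lets asynchronous database lookups or crawl results slot in later.

from klein import Klein
from twisted.internet import defer

app = Klein()

@app.route("/scrape")
def scrape(request):
    # Klein waits on any Deferred returned here before writing the response,
    # so an async DB lookup or crawl result can be returned directly.
    return defer.succeed(b"queued")

app.run("localhost", 8080)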

Trying to think things through the other way around, I'm sure aiohttp has many more options for implementing async DB accesses and such, but then I find myself at an impasse with Scrapy integration.

            I have seen things like scrapa, scrapyd and ScrapyRT but those don't really work for me. Are there any other options?

Finally, if nothing works, I'll just use aiohttp, and instead of Scrapy I'll make the requests to the website to scrape manually and use BeautifulSoup or something like that to get the info I need from the response. Any advice on how to proceed down that road?
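A rough sketch of that aiohttp + BeautifulSoup fallback (the URL and selector are placeholders):

import asyncio
import aiohttp
from bs4 import BeautifulSoup

async def scrape(url):
    # Fetch the page asynchronously, then hand the HTML to BeautifulSoup.
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            html = await resp.text()
    soup = BeautifulSoup(html, "html.parser")
    return [a.get("href") for a in soup.find_all("a")]

links = asyncio.get_event_loop().run_until_complete(scrape("https://example.com"))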

Thanks for your attention; I'm quite a noob in this area, so I don't know if I'm making complete sense. Regardless, any help will be appreciated :)

            ...

            ANSWER

            Answered 2018-Jul-25 at 18:55

            Is there any way to make pymongo asynchronous with twisted

No. pymongo is designed as a synchronous library, and there is no way you can make it asynchronous without basically rewriting it (you could use threads or processes, but that is not what you asked, and you can also run into thread-safety issues in the code).
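For the threads option mentioned in passing, one possible shape (the collection name and query here are hypothetical) is to push each blocking pymongo call onto Twisted's thread pool with deferToThread:

from pymongo import MongoClient
from twisted.internet import defer
from twisted.internet.threads import deferToThread

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string

@defer.inlineCallbacks
def get_credentials(site):
    # find_one() blocks, so run it on a worker thread and yield the Deferred.
    doc = yield deferToThread(client.mydb.credentials.find_one, {"site": site})
    defer.returnValue(doc)

This keeps the reactor unblocked, but it is a workaround rather than a truly asynchronous driver.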

            Trying to think things through the other way around I'm sure aiohttp has many more options to implement async db accesses and stuff

It doesn't. aiohttp is an HTTP library: it can do HTTP asynchronously and that is all; it has nothing to help you access databases. You'd have to basically rewrite pymongo on top of it.

Finally, if nothing works, I'll just use aiohttp, and instead of Scrapy I'll make the requests to the website to scrape manually and use BeautifulSoup or something like that to get the info I need from the response.

That means a lot of work just to avoid using Scrapy, and it won't help you with the pymongo issue: you still have to rewrite pymongo!

My suggestion is: learn txmongo! If you can't, and you want to rewrite it yourself, build it on twisted.web instead of aiohttp, since then you can keep using Scrapy!
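A minimal sketch of the txmongo route recommended here (the database and collection names are hypothetical, and the exact API may vary between txmongo versions):

from twisted.internet import defer
import txmongo

@defer.inlineCallbacks
def fetch_credentials():
    # MongoConnection() returns a Deferred that fires with a connection
    # (localhost:27017 by default); collections expose pymongo-like methods
    # that return Deferreds instead of blocking.
    connection = yield txmongo.MongoConnection()
    creds = yield connection.mydb.credentials.find_one({"site": "example.com"})
    yield connection.disconnect()
    defer.returnValue(creds)

Because everything stays in Deferreds, this plugs into Klein routes and Scrapy's Twisted reactor without extra glue.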

            Source https://stackoverflow.com/questions/51525645

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install scrapa

No installation instructions are available at this moment for scrapa. Refer to the component home page for details.

            Support

For feature suggestions or bug reports, create an issue on GitHub.
If you have any questions, visit the community on GitHub or Stack Overflow.
            CLONE
          • sshUrl

            git@github.com:stefanw/scrapa.git
