GoodreadsScraper | Scrape book and author data from Goodreads using Scrapy and Selenium | Crawler library

 by havanagrawal | Python | Version: Current | License: MIT

kandi X-RAY | GoodreadsScraper Summary


GoodreadsScraper is a Python library typically used in Automation, Crawler, and Selenium applications. GoodreadsScraper has no reported bugs or vulnerabilities, a build file is available, it has a permissive license, and it has low support. You can download it from GitHub.

This is a Python + Scrapy (+ Selenium) based web crawler that fetches book and author data from Goodreads. This can be used for collecting a large data set in a short period of time, for a data analysis/visualization project. With appropriate controls, the crawler can collect metadata for ~50 books per minute (~3000 per hour). If you want to be more aggressive (at the risk of getting your IP blocked by Goodreads), you can set the DOWNLOAD_DELAY to a smaller value in settings.py, but this is not recommended.
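As a sketch of the throttling knobs mentioned above, a hypothetical settings.py fragment might look like the following. The 1.0-second delay is an illustrative value chosen to match the ~50 books/minute figure, not necessarily the project's actual default; all setting names are standard Scrapy settings.

```python
# settings.py (fragment) -- illustrative throttling configuration.
# A 1.0 s delay between requests works out to roughly 50-60 pages
# per minute; lowering it speeds up the crawl but risks an IP block.
DOWNLOAD_DELAY = 1.0                 # seconds between requests to the same domain
RANDOMIZE_DOWNLOAD_DELAY = True      # jitter the delay (0.5x-1.5x) to look less robotic
AUTOTHROTTLE_ENABLED = True          # let Scrapy back off when the server slows down
CONCURRENT_REQUESTS_PER_DOMAIN = 1   # keep requests to Goodreads strictly sequential
```

With AUTOTHROTTLE_ENABLED, Scrapy adjusts the effective delay based on observed response latency, so the crawl degrades gracefully instead of hammering a slow server.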

            kandi-support Support

              GoodreadsScraper has a low active ecosystem.
              It has 82 stars and 19 forks. There is 1 watcher for this library.
              It has had no major release in the last 6 months.
              There are 2 open issues and 6 closed issues. On average, issues are closed in 44 days. There is 1 open pull request and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of GoodreadsScraper is current.

            kandi-Quality Quality

              GoodreadsScraper has 0 bugs and 0 code smells.

            kandi-Security Security

              GoodreadsScraper has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              GoodreadsScraper code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              GoodreadsScraper is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            kandi-Reuse Reuse

              GoodreadsScraper releases are not available. You will need to build from source code and install.
              Build file is available. You can build the component from source.
              Installation instructions, examples and code snippets are available.
              It has 537 lines of code, 52 functions and 14 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed GoodreadsScraper and discovered the below as its top functions. This is intended to give you an instant insight into GoodreadsScraper implemented functionality, and help decide if they suit your requirements.
            • Get a list of books
            • Crawl a spider
            • Add a task to the progress bar
            • Extract dates from maybe_dates
            • Parse date
            • Try to get Amazon Kindle BookDetails
            • Gets an Amazon BookDetail for a given URL
            • Parse a BeautifulSoup response
            • Parse the author
            • Get Amazon Kindle BookDetail
            • Parse command line arguments
            • Crawl your books
            • Crawl for all authors
            • Convert publish_date to date
            • One - hot encode genres
            • Creates a chrome browser
            • Run a crawl
            • Replace missing list columns
            Get all kandi verified functions for this library.

            GoodreadsScraper Key Features

            No Key Features are available at this moment for GoodreadsScraper.

            GoodreadsScraper Examples and Code Snippets

            No Code Snippets are available at this moment for GoodreadsScraper.

            Community Discussions

            QUESTION

            Scraping URLs with Python and selenium
            Asked 2019-Sep-08 at 22:21

            I am trying to get a python selenium script working that should do the following:

            1. Take a text file, BookTitle.txt, that is a list of book titles.

            2. Using Python/Selenium, search the site GoodReads.com for each title.

            3. Take the URL of the result and write a new .CSV file with column 1 = book title and column 2 = site URL.

            4. I hope we can get this working; please help me with step-by-step instructions to run it.

            ...

            ANSWER

            Answered 2019-Sep-08 at 21:57

            There are a couple of errors I can see for now:

            1) You have to uncomment the Chrome options and comment out the Firefox ones, as you're passing the chromedriver later in the code.
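The deterministic part of the asker's pipeline (titles file in, CSV of title/URL out) can be sketched without a browser. The function names below are illustrative, not from the asker's script; a real Selenium run would load each search URL and record the first result's URL instead.

```python
import csv
from urllib.parse import quote_plus

def goodreads_search_url(title):
    """Build the Goodreads search URL for one book title."""
    return "https://www.goodreads.com/search?q=" + quote_plus(title)

def titles_to_csv(titles, out_path):
    """Write one row per title: column 1 = book title, column 2 = site URL."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "url"])
        for title in titles:
            writer.writerow([title, goodreads_search_url(title)])
```

With Selenium in the loop, you would driver.get() each search URL, locate the first result link, and write driver.current_url (or the link's href) in place of the search URL.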

            Source https://stackoverflow.com/questions/57845827

            QUESTION

            Web scraping python error (NameError: name 'reload' is not defined)
            Asked 2019-Sep-03 at 00:45

            Trying to do some web scraping with python and getting an error.

            I am not sure what this traceback error means. I am running it in Python 3; can anyone help?

            Traceback (most recent call last):
              File "/home/l/gDrive/AudioBookReviews/WebScraping/GoodreadsScraper.py", line 3, in <module>
                reload(sys)
            NameError: name 'reload' is not defined

            ...

            ANSWER

            Answered 2019-Sep-03 at 00:45

            reload is no longer a built-in in Python 3.

            You should remove these lines
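A minimal illustration of the fix: the reload(sys)/setdefaultencoding idiom is a Python 2 leftover and can simply be deleted in Python 3; if reload() is genuinely needed, it must be imported explicitly.

```python
# In Python 2, scripts often began with:
#   reload(sys); sys.setdefaultencoding("utf8")
# In Python 3 strings are Unicode by default, so those lines can simply
# be deleted. If you genuinely need reload(), import it explicitly:
from importlib import reload  # available since Python 3.4

import json
reload(json)  # re-executes the module's code in place and returns the module
```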

            Source https://stackoverflow.com/questions/57742724

            QUESTION

            Python web scraping, getting a FileNotFound error
            Asked 2019-Sep-01 at 11:21

            I am trying to run the following script to get some book data from goodreads.com starting with just a list of titles. I have had this code working recently but am now getting the following error:

            ...

            ANSWER

            Answered 2019-Sep-01 at 01:47

            As the error states: [Errno 2] No such file or directory.

            It could be a permissions problem, or your path is wrong.
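A small diagnostic sketch (the helper name is hypothetical) to distinguish the two causes before open() fails:

```python
import os
from pathlib import Path

def diagnose_open_failure(path_str):
    """Return a short reason opening the file for reading might fail, or 'ok'."""
    p = Path(path_str).expanduser()
    if not p.exists():
        # [Errno 2]: the path does not resolve to a file from the current directory.
        return "missing: {} (cwd is {})".format(p, Path.cwd())
    if not os.access(p, os.R_OK):
        # [Errno 13]: the file exists but is not readable by this user.
        return "unreadable (permissions): {}".format(p)
    return "ok"
```

Printing Path.cwd() alongside the missing path catches the most common case: running the script from a different directory than the one holding the input file.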

            Source https://stackoverflow.com/questions/57742671

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install GoodreadsScraper

            For crawling, install the dependencies listed in requirements.txt (pip install -r requirements.txt).

            Support

            Fixes and improvements are more than welcome, so raise an issue or send a PR!
            CLONE
          • HTTPS: https://github.com/havanagrawal/GoodreadsScraper.git
          • CLI: gh repo clone havanagrawal/GoodreadsScraper
          • SSH: git@github.com:havanagrawal/GoodreadsScraper.git


            Consider Popular Crawler Libraries
          • scrapy by scrapy
          • cheerio by cheeriojs
          • winston by winstonjs
          • pyspider by binux
          • colly by gocolly

            Try Top Libraries by havanagrawal
          • c2c2017 by havanagrawal (Java)
          • clomask by havanagrawal (Jupyter Notebook)
          • wikidata-toolkit by havanagrawal (Python)
          • learning-ml by havanagrawal (Jupyter Notebook)
          • triovision by havanagrawal (Java)