GoodreadsScraper | Scrape book data from Goodreads using Scrapy and Selenium | Crawler library
kandi X-RAY | GoodreadsScraper Summary
This is a Python + Scrapy (+ Selenium) based web crawler that fetches book and author data from Goodreads. It can be used to collect a large data set quickly, for example for a data analysis or visualization project. With appropriate rate limiting, the crawler can collect metadata for ~50 books per minute (~3000 per hour). If you want to be more aggressive (at the risk of getting your IP blocked by Goodreads), you can lower DOWNLOAD_DELAY in settings.py, but this is not recommended.
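The rate quoted above is governed by Scrapy's throttling settings in settings.py. The fragment below is a hedged sketch using real Scrapy setting names; the values are illustrative assumptions, not the project's actual defaults. It also shows the arithmetic behind the quoted throughput: ~50 books per minute corresponds to roughly 1.2 seconds between requests, assuming one request per book (real throughput is lower if each book needs several pages).

```python
# settings.py (fragment): polite crawl settings.
# The values are illustrative assumptions, not GoodreadsScraper's shipped defaults.

# Seconds Scrapy waits between consecutive requests to the same website.
# 60 s / 50 books = 1.2 s per request, assuming one request per book page.
DOWNLOAD_DELAY = 1.2

# When True (Scrapy's default), the actual wait is a random value between
# 0.5 * DOWNLOAD_DELAY and 1.5 * DOWNLOAD_DELAY, which looks less robotic.
RANDOMIZE_DOWNLOAD_DELAY = True

# Allow only one in-flight request per domain so the delay really spaces requests out.
CONCURRENT_REQUESTS_PER_DOMAIN = 1
```

Lowering DOWNLOAD_DELAY or raising CONCURRENT_REQUESTS_PER_DOMAIN speeds the crawl up, which is exactly the trade-off against IP blocking described above.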
Top functions reviewed by kandi
- Get a list of books
- Crawl a spider
- Add a task to the progress bar
- Extract dates from maybe_dates
- Parse date
- Try to get Amazon Kindle BookDetails
- Get an Amazon BookDetail for a given URL
- Parse a BeautifulSoup response
- Parse the author
- Get Amazon Kindle BookDetail
- Parse command line arguments
- Crawl your books
- Crawl all authors
- Convert publish_date to date
- One-hot encode genres
- Create a Chrome browser
- Run a crawl
- Replace missing list columns
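The function list above names several post-processing steps (converting publish_date to a date, filling missing list columns, one-hot encoding genres). Below is a minimal standard-library sketch of what such steps could look like, assuming each scraped book is a dict with a `publish_date` string and a `genres` list. The function names, the accepted date formats, and the `genre_` column prefix are all hypothetical; the project itself may implement these differently (for example with pandas).

```python
from datetime import date, datetime

# Date formats we *assume* show up in scraped records; real data may contain others.
DATE_FORMATS = ("%Y-%m-%d", "%B %d, %Y", "%B %Y", "%Y")


def parse_publish_date(raw):
    """Return a datetime.date for the first format that matches, else None."""
    if not raw:
        return None
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next, less specific format
    return None


def fill_missing_lists(records, list_columns):
    """Replace missing or None list-valued fields with empty lists (in place)."""
    for record in records:
        for column in list_columns:
            if record.get(column) is None:
                record[column] = []
    return records


def one_hot_genres(records, column="genres"):
    """Add one 0/1 field per genre seen anywhere; return the sorted genre names."""
    all_genres = sorted({g for r in records for g in (r.get(column) or [])})
    for record in records:
        present = set(record.get(column) or [])
        for genre in all_genres:
            record[f"genre_{genre}"] = int(genre in present)
    return all_genres
```

Filling missing lists before one-hot encoding keeps every record's genre columns aligned, so books without genre data simply get zeros.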
Community Discussions
Trending Discussions on GoodreadsScraper
QUESTION
I am trying to get a Python Selenium script working that should do the following:
...
1) Take a text file, BookTitle.txt, that contains a list of book titles.
2) Using Python/Selenium, search the site GoodReads.com for each title.
3) Take the URL of the result and write it to a new .CSV file, with column 1 = book title and column 2 = site URL.
I hope we can get this working; please help me step by step to get it to run.
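A minimal sketch of the three steps in the question, assuming Selenium 4 with a working Chrome/ChromeDriver setup. The search URL pattern `https://www.goodreads.com/search?q=...` and the `a.bookTitle` selector for result links are assumptions about Goodreads' markup, which can change; the output file name `BookURLs.csv` and all helper names are hypothetical.

```python
import csv
import time
from urllib.parse import urlencode

SEARCH_URL = "https://www.goodreads.com/search"
# Assumption: result title links carry the CSS class "bookTitle".
RESULT_SELECTOR = "a.bookTitle"


def read_titles(path):
    """Step 1: read one book title per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def build_search_url(title):
    """Build the Goodreads search URL for a title (spaces become '+')."""
    return f"{SEARCH_URL}?{urlencode({'q': title})}"


def first_result_url(driver, title):
    """Step 2: load the search page and return the first result's href, or ''."""
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    driver.get(build_search_url(title))
    try:
        link = driver.find_element(By.CSS_SELECTOR, RESULT_SELECTOR)
    except NoSuchElementException:
        return ""  # no result found for this title
    return link.get_attribute("href") or ""


def write_csv(rows, path):
    """Step 3: column 1 = book title, column 2 = site URL."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["book_title", "url"])
        writer.writerows(rows)


def main(titles_path="BookTitle.txt", out_path="BookURLs.csv", delay=2.0):
    """Run all three steps with a headless Chrome browser."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    rows = []
    try:
        for title in read_titles(titles_path):
            rows.append((title, first_result_url(driver, title)))
            time.sleep(delay)  # pause between searches to be polite to Goodreads
    finally:
        driver.quit()  # always close the browser, even if a search fails
    write_csv(rows, out_path)
```

To run it, put BookTitle.txt next to the script and call `main()`. Keeping the file handling separate from the browser code makes each step easy to test on its own.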
ANSWER
Answered 2019-Sep-08 at 21:57
There are a couple of errors I can see for now:
1) You have to uncomment the Chrome options and comment out the Firefox ones, since you're passing chromedriver later in the code.
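The fix above amounts to building the options object for the same browser as the driver. Here is a hedged sketch assuming Selenium 4, where a custom chromedriver path goes through `selenium.webdriver.chrome.service.Service`; the helper name `create_chrome_browser` and the argument list are hypothetical.

```python
# Hypothetical defaults: headless Chrome with a fixed window size.
HEADLESS_CHROME_ARGS = ("--headless=new", "--window-size=1280,1024")


def create_chrome_browser(driver_path=None, args=HEADLESS_CHROME_ARGS):
    """Return a Chrome WebDriver whose options object matches the browser."""
    # Imported lazily so this module can be imported without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()  # Chrome options, not FirefoxOptions
    for arg in args:
        options.add_argument(arg)

    # An explicit chromedriver path is optional; without one, Selenium locates it.
    service = Service(executable_path=driver_path) if driver_path else Service()
    return webdriver.Chrome(service=service, options=options)
```

Creating `webdriver.FirefoxOptions()` and then handing it to `webdriver.Chrome(...)` is the kind of mismatch the answer points out: options and driver must come from the same browser family.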
QUESTION
Trying to do some web scraping with python and getting an error.
I am not sure what this trackback error means, I am running it in Python3, can anyone help?
Traceback (most recent call last): File "/home/l/gDrive/AudioBookReviews/WebScraping/GoodreadsScraper.py", line 3, in reload(sys) NameError: name 'reload' is not defined
...
ANSWER
Answered 2019-Sep-03 at 00:45
reload is no longer a built-in function in Python 3 (it moved to importlib.reload), so calling it directly raises NameError.
You should remove these lines.
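The question is cut off, so the exact lines are not shown; the sketch below assumes they were the common Python 2 idiom `reload(sys)` followed by `sys.setdefaultencoding("utf8")`, of which only `reload(sys)` appears in the traceback. It shows what replaces them in Python 3; `save_text` and `load_text` are hypothetical helpers.

```python
import importlib
import json

# Assumed Python 2 idiom from the (truncated) question:
#     reload(sys)
#     sys.setdefaultencoding("utf8")
# Python 3 has neither a builtin reload() nor sys.setdefaultencoding(), and
# str is already Unicode, so those lines can simply be deleted.

# If a module really must be re-executed, the Python 3 spelling is importlib.reload.
json = importlib.reload(json)


def save_text(path, text):
    """Write text as UTF-8 explicitly instead of relying on a global default."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_text(path):
    """Read UTF-8 text back."""
    with open(path, encoding="utf-8") as f:
        return f.read()
```

Passing `encoding="utf-8"` at each file boundary achieves what `setdefaultencoding` was usually (mis)used for in Python 2 scripts.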
QUESTION
I am trying to run the following script to get some book data from goodreads.com starting with just a list of titles. I have had this code working recently but am now getting the following error:
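...
ANSWER
Answered 2019-Sep-01 at 01:47
As the error states: [Errno 2] No such file or directory.
Your path is most likely wrong: a relative path is resolved against the current working directory, not the folder that contains the script. (A permissions problem would raise [Errno 13] Permission denied instead.)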
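One way to make this error easier to diagnose is to resolve the input path explicitly and fail early with the full path in the message. A minimal sketch, assuming the script reads an input file by relative name; `resolve_input` and the file names are hypothetical.

```python
import errno
from pathlib import Path


def resolve_input(filename, base_dir=None):
    """Resolve filename against base_dir (default: the current working directory)
    and raise FileNotFoundError with the full path if it is not an existing file."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    # If filename is absolute, pathlib ignores base and keeps filename as-is.
    path = (base / filename).resolve()
    if not path.is_file():
        # Same errno (ENOENT, i.e. 2) that open() reports for a missing file.
        raise FileNotFoundError(errno.ENOENT, "No such file or directory", str(path))
    return path
```

Calling `resolve_input("BookTitle.txt", Path(__file__).resolve().parent)` makes the lookup independent of the directory the script is launched from, and the error message then shows exactly which path was tried.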
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported