Python-Tutorials | Python Bootcamp tutorials , with HTML versions | Learning library
kandi X-RAY | Python-Tutorials Summary
Authors: Chris Burns, Shannon Patel and Amber (Carnegie Observatories). Just a safe place to put our IPython notebook tutorials for the summer bootcamp held at the Carnegie Observatories. The notebooks reference data files that are also located in this repository. There are also some sample scripts that can be useful for seeing how to do some of the exercises.
Community Discussions
Trending Discussions on Python-Tutorials
QUESTION
I am writing a project from the Automate The Boring Stuff book. The task is the following:
Image Site Downloader
Write a program that goes to a photo-sharing site like Flickr or Imgur, searches for a category of photos, and then downloads all the resulting images. You could write a program that works with any photo site that has a search feature.
Here is my code:
...
ANSWER
Answered 2020-Jul-26 at 11:34
First off, scraping 4 million results from a website like Flickr is likely to be unethical. Web scrapers should do their best to respect the website they are scraping by minimizing the load they put on its servers. 4 million requests in a short amount of time is likely to get your IP banned. Proxies could get around this but, again, that would be highly unethical. You also run the risk of copyright issues, since a lot of the images on Flickr are subject to copyright.
If you were to go about doing this, you would likely want Scrapy, possibly in a Scrapy-Selenium combination. Scrapy is good at running concurrent requests, meaning you can request a large number of images at the same time. You can learn more about Scrapy here: https://docs.scrapy.org/en/latest/
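Since Scrapy's concurrency is exactly what puts load on a server, it may help to see how it can be capped. Below is a minimal sketch of a settings.py for a hypothetical Scrapy project. The setting names are Scrapy's own, but every value here is an illustrative assumption and not a recommendation for any particular site.

```python
# settings.py for a hypothetical Scrapy project.
# All values below are illustrative assumptions; tune them per site.

# Honour the site's robots.txt rules before requesting any page.
ROBOTSTXT_OBEY = True

# Cap how many requests are in flight overall and per domain.
CONCURRENT_REQUESTS = 8
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Wait between consecutive requests to the same site (seconds).
DOWNLOAD_DELAY = 1.0

# Let Scrapy's AutoThrottle extension adapt the delay to server latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Stop the crawl once a small number of items has been scraped,
# which keeps a learning project far away from millions of requests.
CLOSESPIDER_ITEMCOUNT = 100
```

Keeping the per-domain concurrency low and the item count small trades speed for politeness, which seems like the right trade for a practice project.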
The workflow would look something like this:
- Scrapy makes a request to the website for the HTML, then parses it to find all tags with class='overlay no-outline'
- Scrapy makes a request to each URL concurrently. This means that the URLs won't be followed one by one but in parallel.
- As the images are returned, they get added to your database/storage space
- Scrapy (maybe Selenium) scrolls the infinitely scrolling page and repeats, skipping images that were already checked (keep an index of the last scanned item).
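The first three steps can be sketched without any third-party package. The code below is not Scrapy: it swaps in the standard library's html.parser and a thread pool to show the same idea on a small scale. The names OverlayLinkParser, extract_links and crawl_page are hypothetical, the sketch assumes the matching tags carry an href attribute, and it tracks a set of seen URLs where the workflow above suggests an index. Scrolling the page is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin


class OverlayLinkParser(HTMLParser):
    """Collect href values from tags whose class is 'overlay no-outline'."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")  # assumption: the tag has an href
        if href and {"overlay", "no-outline"} <= set(classes):
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return absolute URLs of all matching tags, in document order."""
    parser = OverlayLinkParser(base_url)
    parser.feed(html)
    return parser.links


def crawl_page(html, base_url, fetch, seen, max_workers=4):
    """Fetch every not-yet-seen link on one page concurrently.

    `fetch` is any callable that takes a URL and returns the response
    body; passing it in keeps this sketch free of real network calls.
    `seen` is a set that is updated in place across pages.
    """
    new_links = []
    for link in extract_links(html, base_url):
        if link not in seen:
            seen.add(link)
            new_links.append(link)
    # The pool issues the requests side by side instead of one by one.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        bodies = list(pool.map(fetch, new_links))
    return dict(zip(new_links, bodies))
```

Calling crawl_page again with the same `seen` set after loading more of the page would only fetch the links that have not been requested before.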
This is what Scrapy would entail, but I strongly recommend not attempting to scrape 4 million elements. You would probably find that the performance issues you run into are not worth your time, especially since this is supposed to be a learning experience and you will likely never have to scrape that many elements.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported