python-tutorials | Blog posts on Python | Architecture library
kandi X-RAY | python-tutorials Summary
threads: Python threads synchronization: Locks, RLocks, Semaphores, Conditions, Events and Queues.
Top functions reviewed by kandi - BETA
- Main function for fetching URLs.
- Initialize threading.
- Run the worker.
Community Discussions
Trending Discussions on python-tutorials
QUESTION
I am writing a project from the Automate The Boring Stuff book. The task is the following:
Image Site Downloader
Write a program that goes to a photo-sharing site like Flickr or Imgur, searches for a category of photos, and then downloads all the resulting images. You could write a program that works with any photo site that has a search feature.
Here is my code:
...
ANSWER
Answered 2020-Jul-26 at 11:34
First off: scraping 4 million results from a website like Flickr is likely to be unethical. Web scrapers should do their best to respect the website they are scraping by minimizing the load on its servers. Sending 4 million requests in a short amount of time is likely to get your IP banned. You could get around this with proxies, but again, that would be highly unethical. You also run the risk of copyright issues, since many of the images on Flickr are subject to copyright.
If you were to go about doing this, you would have to use Scrapy, possibly in a Scrapy-Selenium combination. Scrapy is great for running concurrent requests, meaning you can request a large number of images at the same time. You can learn more about Scrapy here: https://docs.scrapy.org/en/latest/
The workflow would look something like this:
- Scrapy requests the HTML from the website and parses it to find all tags with class='overlay no-outline'.
- Scrapy makes a request to each URL concurrently. This means the URLs are not followed one by one but side by side.
- As the images are returned, they are added to your database or storage space.
- Scrapy (maybe Selenium) scrolls the infinitely scrolling page and repeats the process without revisiting images it has already checked (keep an index of the last scanned item).
This is what using Scrapy would entail, but I strongly recommend not attempting to scrape 4 million elements. The performance issues you run into would probably not be worth your time, especially since this is supposed to be a learning experience and you will likely never need to scrape that many elements.
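The workflow described in the answer can be sketched roughly as follows. This is not Scrapy code: to keep the example self-contained it swaps in the standard library (html.parser for parsing, concurrent.futures for concurrent requests). The names OverlayLinkParser, extract_urls, fetch_bytes and download_new are hypothetical, as is the assumption that the matching tags carry the image link in an href attribute. A set of already-seen URLs stands in for the "index of last scanned item", and scrolling the page is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.request import urlopen


class OverlayLinkParser(HTMLParser):
    """Collect href values from tags whose class includes 'overlay no-outline'."""

    def __init__(self, wanted_class="overlay no-outline"):
        super().__init__()
        self.wanted = set(wanted_class.split())
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = set((attr_map.get("class") or "").split())
        href = attr_map.get("href")  # assumption: the link lives in href
        if href and self.wanted <= classes:
            self.urls.append(href)


def extract_urls(html):
    """Step 1: parse the page and return the matching URLs in document order."""
    parser = OverlayLinkParser()
    parser.feed(html)
    return parser.urls


def fetch_bytes(url):
    """Download one URL and return its raw bytes (performs a real request)."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def download_new(urls, seen, fetch=fetch_bytes, max_workers=4):
    """Steps 2-4: fetch unseen URLs side by side and remember what was fetched.

    Returns a dict mapping each newly fetched URL to its content, which the
    caller can then write to a database or to disk.
    """
    fresh = [u for u in dict.fromkeys(urls) if u not in seen]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(zip(fresh, pool.map(fetch, fresh)))
    seen.update(fresh)
    return results
```

Keeping max_workers small is deliberate: it caps how many requests hit the server at once, in line with the advice above about minimizing server load. Passing a different fetch function makes it possible to try the logic without touching the network.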
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install python-tutorials
You can use python-tutorials like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.