Building web scraping automation with Python

by Hans

Automate repetitive web data processing tasks with these Python libraries.

Choose from the open source libraries, cloud APIs, and public libraries listed below according to your technology preferences, such as primary language. Each component is rated on community support, security vulnerabilities, and overall quality, to help you make an informed choice for implementing and maintaining your application. Review carefully any component flagged with a no-license alert or a proprietary license, and check the component page for its exact license before using it. The component details page also covers features, installation steps, top code snippets, and top community discussions. Where a package is readily available, a link to its package manager is listed for download; otherwise, build from the respective repository. You can also reuse source code from the repositories, subject to the respective license types.

Working with HTTP to request a web page

requests by psf

A simple, yet elegant HTTP library.

Python Updated: 3 mo ago License: Permissive
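
As a minimal sketch of requesting a page with requests: the helper name `fetch_html`, the User-Agent string, and the 10-second timeout are illustrative assumptions, not part of the library or of this list.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body decoded as text."""
    # A descriptive User-Agent is a courtesy to site operators; this value is a placeholder.
    headers = {"User-Agent": "example-scraper/0.1"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of returning an error page as if it were data.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html("https://example.com/")` should return the page markup as a string, ready to hand to one of the parsers listed further down.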

grequests by spyoungtech

Requests + Gevent = <3

Python Updated: 6 mo ago License: Permissive

httplib2 by httplib2

Small, fast HTTP client library for Python. Features persistent connections, caching, and Google App Engine support. Originally written by Joe Gregorio, now supported by the community.

Python Updated: 1 mo ago License: Proprietary

Complete web scraping framework

scrapy by scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Python Updated: 3 mo ago License: Proprietary

Parsing HTML, XML

BeautifulSoup4 by il-vladislav

BeautifulSoup 4 for Python 3.3

Python Updated: 7 mo ago License: No License
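
A short sketch of HTML parsing with Beautiful Soup 4, assuming the package is importable as `bs4` (as the upstream beautifulsoup4 distribution is). The helper name `extract_links` is hypothetical.

```python
from bs4 import BeautifulSoup


def extract_links(html: str) -> list:
    """Return (text, href) pairs for every anchor that has an href."""
    # "html.parser" is backed by the standard library, so no extra parser is required.
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]
```

Passing `href=True` to `find_all` skips anchors without an `href` attribute, which avoids a `KeyError` on the subscript lookup.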

lxml by lxml

The lxml XML toolkit for Python

Python Updated: 5 d ago License: Permissive
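
For XML, a query with lxml can be sketched as below. The `<item>`/`<title>` document shape and the helper name `item_titles` are assumptions made for the example.

```python
from lxml import etree


def item_titles(xml_bytes: bytes) -> list:
    """Return the text of every <title> that is a direct child of an <item>."""
    root = etree.fromstring(xml_bytes)
    # An XPath ending in text() yields string results rather than elements.
    return [str(text) for text in root.xpath("//item/title/text()")]
```

The XPath only matches titles nested directly inside an `<item>`, so a top-level feed title in the same document would be left out.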
