Declarative web scraping
Simple web scraping for R
you-get by soimort
⏬ Dumb downloader that scrapes the web
Python
47551
Updated: 2 y ago
License: Proprietary (Proprietary)
twint by twintproject
An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
Python
15023
Updated: 2 y ago
License: Permissive (MIT)
requests-html by psf
Pythonic HTML Parsing for Humans™
Python
13156
Updated: 2 y ago
License: Permissive (MIT)
newspaper by codelucas
News, full-text, and article metadata extraction in Python 3. Advanced docs:
Python
12865
Updated: 2 y ago
License: Permissive (MIT)
Goutte by FriendsOfPHP
Goutte, a simple PHP Web Scraper
PHP
9229
Updated: 2 y ago
License: Permissive (MIT)
portia by scrapinghub
Visual scraping for Scrapy
Python
8890
Updated: 2 y ago
License: Permissive (BSD-3-Clause)
instaloader by instaloader
Download pictures (or videos) along with their captions and other metadata from Instagram.
Python
6040
Updated: 2 y ago
License: Permissive (MIT)
instagram-scraper by arc298
Scrapes an Instagram user's photos and videos
Python
5727
Updated: 3 y ago
License: Permissive (Unlicense)
x-ray by matthewmueller
The next web scraper. See through the <html> noise.
JavaScript
5710
Updated: 2 y ago
License: Permissive (MIT)
f
autoscraper by alirezamika
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Python
5239
Updated: 2 y ago
License: Permissive (MIT)
python-scraping by REMitchell
Code samples from the book Web Scraping with Python http://shop.oreilly.com/product/0636920034391.do
Jupyter Notebook
3993
Updated: 2 y ago
License: No License (No License)
scrape-it by IonicaBizau
🔮 A Node.js scraper for humans.
JavaScript
3917
Updated: 2 y ago
License: Permissive (MIT)
python-goose by grangier
HTML content and article extractor; a web scraping library in Python
HTML
3874
Updated: 2 y ago
License: Permissive (Apache-2.0)
OnlyFans by DIGITALCRIMINAL
Scrape all the media from an OnlyFans account - Updated regularly
Python
3419
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
cloudflare-scrape by Anorov
A Python module to bypass Cloudflare's anti-bot page.
Python
3074
Updated: 2 y ago
License: Permissive (MIT)
fake-useragent by fake-useragent
Up-to-date simple useragent faker with real world database
HTML
3047
Updated: 2 y ago
License: Permissive (Apache-2.0)
scrapy-splash by scrapy-plugins
Scrapy+Splash for JavaScript integration
Python
2900
Updated: 2 y ago
License: Permissive (BSD-3-Clause)
Emby.Plugins.JavScraper by JavScraper
A Japanese-movie scraper plugin for Emby/Jellyfin that can fetch film information from certain websites.
C#
2622
Updated: 2 y ago
License: No License (No License)
crawlergo by Qianlitp
A powerful browser crawler for web vulnerability scanners
Go
2474
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
thal by emadehsan
Getting started with Puppeteer and Chrome Headless for Data Mining
JavaScript
2362
Updated: 2 y ago
License: Permissive (MIT)
instagram-scraper by realsirjoe
Scrapes media, likes, followers, tags, and all metadata. Inspired by instagram-php-scraper,bot
Python
2204
Updated: 3 y ago
License: Permissive (MIT)
JSFinder by Threezh1
JSFinder is a tool for quickly extracting URLs and subdomains from JS files on a website.
Python
2091
Updated: 2 y ago
License: No License (No License)
metascraper by microlinkhq
Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.
HTML
2049
Updated: 2 y ago
License: Permissive (MIT)
google-play-scraper by facundoolano
Node.js scraper to get data from Google Play
JavaScript
1979
Updated: 2 y ago
License: Permissive (MIT)
InstaLooter by althonos
Another API-less Instagram pictures and videos downloader.
Python
1871
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
facebook-scraper by kevinzg
Scrape Facebook public pages without an API key
Python
1763
Updated: 2 y ago
License: Permissive (MIT)
blackbird by p1ngul1n0
An OSINT tool to search for accounts by username in social networks.
Python
1697
Updated: 2 y ago
License: No License (No License)
JobFunnel by PaulMcInnis
Scrape job websites into a single spreadsheet with no duplicates.
Python
1655
Updated: 2 y ago
License: Permissive (MIT)
dirbot by scrapy
Scrapy project to scrape public web directories (educational) [DEPRECATED]
Python
1627
Updated: 2 y ago
License: No License (No License)
news-please by fhamborg
news-please - an integrated web crawler and information extractor for news that just works
Python
1626
Updated: 2 y ago
License: Permissive (Apache-2.0)
upton by propublica
A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
HTML
1613
Updated: 2 y ago
License: Permissive (MIT)
event-collect by fossasia
Scraper and converter from event website listings to the Open Event format
Python
1510
Updated: 2 y ago
License: No License (No License)
loklak_scraper_js by fossasia
Scrapers for loklak in JavaScript
JavaScript
1476
Updated: 2 y ago
License: Weak Copyleft (LGPL-2.1)
scraper by causal-agent
HTML parsing and querying with CSS selectors
Rust
1407
Updated: 2 y ago
License: Permissive (ISC)
r
Metamorphosis by killme2008
A highly available, high-performance distributed messaging system.
Java
1320
Updated: 2 y ago
License: Permissive (Apache-2.0)
wombat by felipecsl
Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.
Ruby
1281
Updated: 2 y ago
License: Permissive (MIT)
OpenWPM by openwpm
A web privacy measurement framework
Python
1254
Updated: 2 y ago
License: Proprietary (Proprietary)
web-scraper-chrome-extension by martinsbalodis
Web data extraction tool implemented as chrome extension
JavaScript
1212
Updated: 2 y ago
License: Weak Copyleft (LGPL-3.0)
search-script-scrape by stanfordjournalism
101 real world web scraping exercises in Python 3 for data journalists
Python
1206
Updated: 2 y ago
License: No License (No License)
quotesbot by scrapy
This is a sample Scrapy project for educational purposes
Python
1191
Updated: 2 y ago
License: Permissive (MIT)
linkedin_scraper by joeyism
A library that scrapes LinkedIn for user data
Python
1158
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
toutatis by megadose
Toutatis is a tool that extracts information from Instagram accounts, such as e-mail addresses, phone numbers, and more
Python
1157
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
trafilatura by adbar
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Python
1105
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
artoo by medialab
artoo.js - the client-side scraping companion.
JavaScript
1086
Updated: 2 y ago
License: Permissive (MIT)
pjscrape by nrabinowitz
A web-scraping framework written in JavaScript, using PhantomJS and jQuery
JavaScript
1006
Updated: 2 y ago
License: Permissive (MIT)
metainspector by metainspector
Ruby gem for web scraping purposes. It scrapes a given URL, and returns you its title, meta description, meta keywords, links, images...
Ruby
990
Updated: 2 y ago
License: Permissive (MIT)
osi.ig by th3unkn0n
Information gathering for Instagram.
Python
986
Updated: 2 y ago
License: No License (No License)
mlscraper by lorey
🤖 Scrape data from HTML websites automatically by just providing examples
Python
975
Updated: 2 y ago
License: No License (No License)