you-get by soimort
⏬ Dumb downloader that scrapes the web
Python 47551 Updated: 1 y ago License: Proprietary (Proprietary)

twint by twintproject
An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
Python 15023 Updated: 1 y ago License: Permissive (MIT)

requests-html by psf
Pythonic HTML Parsing for Humans™
Python 13156 Updated: 1 y ago License: Permissive (MIT)

newspaper by codelucas
News, full-text, and article metadata extraction in Python 3. Advanced docs:
Python 12865 Updated: 1 y ago License: Permissive (MIT)

Goutte by FriendsOfPHP
Goutte, a simple PHP Web Scraper
PHP 9229 Updated: 1 y ago License: Permissive (MIT)

portia by scrapinghub
Visual scraping for Scrapy
Python 8890 Updated: 1 y ago License: Permissive (BSD-3-Clause)

instaloader by instaloader
Download pictures (or videos) along with their captions and other metadata from Instagram.
Python 6040 Updated: 1 y ago License: Permissive (MIT)

instagram-scraper by arc298
Scrapes an Instagram user's photos and videos
Python 5727 Updated: 3 y ago License: Permissive (Unlicense)

x-ray by matthewmueller
The next web scraper. See through the <html> noise.
JavaScript 5710 Updated: 2 y ago License: Permissive (MIT)

Declarative web scraping

autoscraper by alirezamika
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Python 5239 Updated: 1 y ago License: Permissive (MIT)

python-scraping by REMitchell
Code samples from the book Web Scraping with Python http://shop.oreilly.com/product/0636920034391.do
Jupyter Notebook 3993 Updated: 1 y ago License: No License (No License)

scrape-it by IonicaBizau
🔮 A Node.js scraper for humans.
JavaScript 3917 Updated: 2 y ago License: Permissive (MIT)

python-goose by grangier
HTML content / article extractor, web scraping library in Python
HTML 3874 Updated: 1 y ago License: Permissive (Apache-2.0)

OnlyFans by DIGITALCRIMINAL
Scrape all the media from an OnlyFans account - Updated regularly
Python 3419 Updated: 1 y ago License: Strong Copyleft (GPL-3.0)

cloudflare-scrape by Anorov
A Python module to bypass Cloudflare's anti-bot page.
Python 3074 Updated: 1 y ago License: Permissive (MIT)

fake-useragent by fake-useragent
Up-to-date simple user-agent faker with a real-world database
HTML 3047 Updated: 1 y ago License: Permissive (Apache-2.0)

scrapy-splash by scrapy-plugins
Scrapy+Splash for JavaScript integration
Python 2900 Updated: 2 y ago License: Permissive (BSD-3-Clause)

Emby.Plugins.JavScraper by JavScraper
A Japanese movie scraper plugin for Emby/Jellyfin that can fetch film information from certain websites.
C# 2622 Updated: 2 y ago License: No License (No License)

crawlergo by Qianlitp
A powerful browser crawler for web vulnerability scanners
Go 2474 Updated: 1 y ago License: Strong Copyleft (GPL-3.0)

thal by emadehsan
Getting started with Puppeteer and Chrome Headless for Data Mining
JavaScript 2362 Updated: 1 y ago License: Permissive (MIT)

instagram-scraper by realsirjoe
Scrapes media, likes, followers, tags, and all metadata. Inspired by instagram-php-scraper, bot
Python 2204 Updated: 3 y ago License: Permissive (MIT)

JSFinder by Threezh1
JSFinder is a tool for quickly extracting URLs and subdomains from JS files on a website.
Python 2091 Updated: 1 y ago License: No License (No License)

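To make the idea of pulling URLs and subdomains out of JavaScript source concrete, here is a minimal sketch using only the Python standard library. It is not JSFinder's implementation; the regular expression, the function names `extract_urls` and `extract_subdomains`, and the sample script are all assumptions chosen for illustration.

```python
import re
from urllib.parse import urlparse

# Assumed pattern: an absolute http(s) URL that ends at whitespace, a quote,
# an angle bracket, a parenthesis, or a backslash.
URL_PATTERN = re.compile(r"""https?://[^\s'"<>()\\]+""")

# Hypothetical JavaScript source used only for this example.
SAMPLE_JS = '''
var api = "https://api.example.com/v1/users";
fetch('https://cdn.example.com/app.js');
var other = "http://tracker.test/pixel.gif";
var again = "https://api.example.com/v1/users";
'''


def extract_urls(js_source):
    """Return the absolute URLs found in the text, first-seen order, no duplicates."""
    return list(dict.fromkeys(URL_PATTERN.findall(js_source)))


def extract_subdomains(urls, root_domain):
    """Return the sorted host names that equal root_domain or sit below it."""
    hosts = set()
    for url in urls:
        host = urlparse(url).hostname
        if host and (host == root_domain or host.endswith("." + root_domain)):
            hosts.add(host)
    return sorted(hosts)


if __name__ == "__main__":
    found = extract_urls(SAMPLE_JS)
    print(found)
    print(extract_subdomains(found, "example.com"))
```

A regex pass like this only sees absolute URLs written literally in the script; relative paths and URLs assembled at runtime would need extra handling.
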
metascraper by microlinkhq
Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.
HTML 2049 Updated: 1 y ago License: Permissive (MIT)

google-play-scraper by facundoolano
Node.js scraper to get data from Google Play
JavaScript 1979 Updated: 1 y ago License: Permissive (MIT)

InstaLooter by althonos
Another API-less Instagram pictures and videos downloader.
Python 1871 Updated: 1 y ago License: Strong Copyleft (GPL-3.0)

facebook-scraper by kevinzg
Scrape Facebook public pages without an API key
Python 1763 Updated: 1 y ago License: Permissive (MIT)

blackbird by p1ngul1n0
An OSINT tool to search for accounts by username in social networks.
Python 1697 Updated: 1 y ago License: No License (No License)

JobFunnel by PaulMcInnis
Scrape job websites into a single spreadsheet with no duplicates.
Python 1655 Updated: 1 y ago License: Permissive (MIT)

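The "single spreadsheet with no duplicates" step can be sketched with the standard `csv` module. This is an illustration under stated assumptions, not JobFunnel's actual logic: the field names, the duplicate key (normalized title, company, and location), and the sample listings are hypothetical.

```python
import csv
import io

# Hypothetical field layout for a scraped job listing.
FIELDS = ["title", "company", "location", "source"]

# Hypothetical listings gathered from two job sites.
SAMPLE_LISTINGS = [
    {"title": "Data Engineer", "company": "Acme", "location": "Berlin", "source": "site-a"},
    {"title": "data engineer ", "company": "ACME", "location": "berlin", "source": "site-b"},
    {"title": "Web Developer", "company": "Globex", "location": "Remote", "source": "site-a"},
]


def dedupe_jobs(listings):
    """Keep the first listing for each normalized (title, company, location) key."""
    seen = set()
    unique = []
    for job in listings:
        key = tuple(job[field].strip().lower() for field in ("title", "company", "location"))
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique


def to_csv(listings):
    """Serialize the listings to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(listings)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(dedupe_jobs(SAMPLE_LISTINGS)))
```

An exact match on normalized fields is the simplest possible key; postings that differ slightly in wording would need a fuzzier comparison.
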
dirbot by scrapy
Scrapy project to scrape public web directories (educational) [DEPRECATED]
Python 1627 Updated: 2 y ago License: No License (No License)

news-please by fhamborg
news-please - an integrated web crawler and information extractor for news that just works
Python 1626 Updated: 1 y ago License: Permissive (Apache-2.0)

upton by propublica
A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
HTML 1613 Updated: 2 y ago License: Permissive (MIT)

event-collect by fossasia
Scraper and converter from event website listings to the Open Event format
Python 1510 Updated: 2 y ago License: No License (No License)

loklak_scraper_js by fossasia
Scrapers for loklak in JavaScript
JavaScript 1476 Updated: 2 y ago License: Weak Copyleft (LGPL-2.1)

scraper by causal-agent
HTML parsing and querying with CSS selectors
Rust 1407 Updated: 1 y ago License: Permissive (ISC)

Simple web scraping for R

Metamorphosis by killme2008
A highly available, high-performance distributed messaging system.
Java 1320 Updated: 2 y ago License: Permissive (Apache-2.0)

wombat by felipecsl
Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.
Ruby 1281 Updated: 2 y ago License: Permissive (MIT)

OpenWPM by openwpm
A web privacy measurement framework
Python 1254 Updated: 2 y ago License: Proprietary (Proprietary)

web-scraper-chrome-extension by martinsbalodis
Web data extraction tool implemented as a Chrome extension
JavaScript 1212 Updated: 2 y ago License: Weak Copyleft (LGPL-3.0)

search-script-scrape by stanfordjournalism
101 real-world web scraping exercises in Python 3 for data journalists
Python 1206 Updated: 2 y ago License: No License (No License)

quotesbot by scrapy
This is a sample Scrapy project for educational purposes
Python 1191 Updated: 2 y ago License: Permissive (MIT)

linkedin_scraper by joeyism
A library that scrapes LinkedIn for user data
Python 1158 Updated: 1 y ago License: Strong Copyleft (GPL-3.0)

toutatis by megadose
Toutatis is a tool that allows you to extract information from Instagram accounts, such as e-mails, phone numbers, and more
Python 1157 Updated: 1 y ago License: Strong Copyleft (GPL-3.0)

trafilatura by adbar
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Python 1105 Updated: 1 y ago License: Strong Copyleft (GPL-3.0)

artoo by medialab
artoo.js - the client-side scraping companion.
JavaScript 1086 Updated: 2 y ago License: Permissive (MIT)

pjscrape by nrabinowitz
A web-scraping framework written in JavaScript, using PhantomJS and jQuery
JavaScript 1006 Updated: 2 y ago License: Permissive (MIT)

metainspector by metainspector
Ruby gem for web scraping purposes. It scrapes a given URL and returns its title, meta description, meta keywords, links, images...
Ruby 990 Updated: 2 y ago License: Permissive (MIT)

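The kind of page summary described above (title, meta description, meta keywords, links, images) can be approximated in Python with the standard-library `html.parser`. This is a rough sketch, not metainspector's API: the `PageSummary` class, the `summarize` function, and the sample HTML are hypothetical, and the page is parsed from a string rather than fetched over the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical page content used only for this example.
SAMPLE_HTML = """<html><head><title>Example page</title>
<meta name="description" content="A demo page">
<meta name="keywords" content="demo, sample"></head>
<body><a href="/about">About</a><img src="logo.png"></body></html>"""


class PageSummary(HTMLParser):
    """Collects the title, description/keywords meta tags, links, and images."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.meta = {}
        self.links = []
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") in ("description", "keywords"):
            self.meta[attrs["name"]] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html, base_url):
    """Parse an HTML string and return its summary as a plain dict."""
    parser = PageSummary(base_url)
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": parser.meta.get("keywords", ""),
        "links": parser.links,
        "images": parser.images,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_HTML, "https://example.com/"))
```

Passing the page URL as `base_url` lets relative `href` and `src` values come back as absolute URLs.
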
osi.ig by th3unkn0n
Information gathering for Instagram.
Python 986 Updated: 1 y ago License: No License (No License)

mlscraper by lorey
🤖 Scrape data from HTML websites automatically by just providing examples
Python 975 Updated: 1 y ago License: No License (No License)