scrapy by scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Python | 47503 stars | Updated 1 year ago | License: Permissive (BSD-3-Clause)

cheerio by cheeriojs
The fast, flexible, and elegant library for parsing and manipulating HTML and XML.
TypeScript | 26488 stars | Updated 1 year ago | License: Permissive (MIT)

winston by winstonjs
A logger for just about everything.
JavaScript | 20628 stars | Updated 1 year ago | License: Permissive (MIT)

colly by gocolly
Elegant Scraper and Crawler Framework for Golang
Go | 19706 stars | Updated 1 year ago | License: Permissive (Apache-2.0)

Python crawler proxy IP pool (proxy pool)

python-spider by Jack-Cherish
Hands-on Python 3 web crawlers: Taobao, JD.com, NetEase Cloud Music, Bilibili, 12306, Douyin, Biquge, comic and novel downloads, music and movie downloads, and more.
Python | 16227 stars | Updated 1 year ago | License: None

pyspider by binux
A powerful spider (web crawler) system in Python.
Python | 15891 stars | Updated 1 year ago | License: Permissive (Apache-2.0)

examples-of-web-crawlers by shengqiangzhang
Some interesting Python crawler examples that are friendly to beginners, mainly crawling Taobao, Tmall, WeChat, WeChat Read, Douban, QQ, and other sites.
Python | 12136 stars | Updated 1 year ago | License: Permissive (MIT)

webmagic by code4craft
A scalable web crawler framework for Java.
Java | 10861 stars | Updated 1 year ago | License: Permissive (Apache-2.0)

FileDownloader by lingochamp
Multitask, multithread (multi-connection), breakpoint resume, high concurrency, simple to use, single-process or multi-process.
Java | 10805 stars | Updated 2 years ago | License: Permissive (Apache-2.0)

crawlab by crawlab-team
Distributed web crawler admin platform for spider management, regardless of language or framework.
Go | 9884 stars | Updated 1 year ago | License: Permissive (BSD-3-Clause)

Photon by s0md3v
Incredibly fast crawler designed for OSINT.
Python | 9703 stars | Updated 1 year ago | License: Strong Copyleft (GPL-3.0)

avbook by guyueyingmu
AV movie management system; crawler for avmoo, javbus, and javlibrary; online AV film library; AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database.
PHP | 8923 stars | Updated 1 year ago | License: None

maigret by soxoj
🕵️‍♂️ Collect a dossier on a person by username from thousands of sites
Python | 8607 stars | Updated 1 year ago | License: Permissive (MIT)

Python by injetlee
Python scripts: simulated Zhihu login, crawlers, Excel manipulation, WeChat Official Accounts, remote power-on.
Python | 8377 stars | Updated 1 year ago | License: None

spider-flow by ssssssss-team
A next-generation crawler platform that defines crawl workflows graphically, so crawlers can be built without writing code.
Java | 8064 stars | Updated 1 year ago | License: Permissive (MIT)

pholcus by andeya
Pholcus is distributed, high-concurrency crawler software written in pure Go.
Go | 7391 stars | Updated 1 year ago | License: Permissive (Apache-2.0)

weiboSpider by dataabc
Sina Weibo crawler: scrape Sina Weibo data with Python.
Python | 6919 stars | Updated 1 year ago | License: None

pholcus by henrylee2cn
Pholcus is distributed, high-concurrency crawler software written in pure Go.
Go | 6819 stars | Updated 3 years ago | License: Permissive (Apache-2.0)

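InfoSpider by kangvcar
INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together, designed to help users retrieve their own data safely and quickly; the code is open source and the process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blogs, OSCHINA blogs, and Jianshu.
Python | 6681 stars | Updated 1 year ago | License: Strong Copyleft (GPL-3.0)
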
node-crawler by bda-research
Web Crawler/Spider for NodeJS + server-side jQuery ;-)
JavaScript | 6422 stars | Updated 2 years ago | License: Permissive (MIT)

PythonSpiderNotes by lining0806
The essentials of introductory web crawling in Python.
Python | 6183 stars | Updated 1 year ago | License: None

fuck-login by xchaoinfo
Simulated login for some well-known websites, making it easier to crawl sites that require login.
Python | 5791 stars | Updated 1 year ago | License: None

A WeChat Official Account crawler interface based on Sogou WeChat Search.

headless-chrome-crawler by yujiosaka
Distributed crawler powered by Headless Chrome
JavaScript | 5368 stars | Updated 2 years ago | License: Permissive (MIT)

scrapy-redis by rmax
Redis-based components for Scrapy.
Python | 5279 stars | Updated 2 years ago | License: Permissive (MIT)

haipproxy by SpiderClub
Highly available distributed IP proxy pool, powered by Scrapy and Redis.
Python | 5238 stars | Updated 2 years ago | License: Permissive (MIT)

A Tool for Domain Flyovers

weibospider by SpiderClub
A distributed crawler for Weibo, built with Celery and Requests.
Python | 4769 stars | Updated 2 years ago | License: Permissive (MIT)

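TopList by tophubs
Today's Hot List (今日热榜): an aggregator site that collects trending headlines from major popular websites. Written in Go, it fetches information quickly and asynchronously using multiple goroutines. Preview: https://mo.fish
Go | 4502 stars | Updated 2 years ago | License: Permissive (Apache-2.0)
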
crawler4j by yasserg
Open Source Web Crawler for Java
Java | 4391 stars | Updated 2 years ago | License: Permissive (Apache-2.0)

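ECommerceCrawlers by DropsDevopsOrg
Hands-on 🐍 crawlers for many websites and e-commerce data 🕷. Includes 🕸: Taobao products, WeChat Official Accounts, Dianping, Qichacha, job-listing sites, Xianyu, Alibaba tasks, Cnblogs, Weibo, Baidu Tieba, Douban Movies, Ibaotu, Quanjing, Douban Music, a provincial drug administration, Sohu News, machine-learning text collection, FOFA asset collection, Autohome, the National Bureau of Statistics, Baidu keyword index counts, spider pan-directories, Toutiao, Douban movie reviews, Ctrip, Xiaomi App Store, Anjuke, and Tujia homestays ❤️❤️❤️. WeChat crawler showcase project:
Python | 3941 stars | Updated 1 year ago | License: Permissive (MIT)
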
hakrawler by hakluke
Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application
Go | 3768 stars | Updated 1 year ago | License: Strong Copyleft (GPL-3.0)

DotnetSpider by dotnetcore
DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling & scraping framework.
C# | 3664 stars | Updated 1 year ago | License: Permissive (MIT)

phpspider by owner888
The program used in the article "I used a crawler to 'steal' one million Zhihu users in one day, just to prove that PHP is the best language in the world."
PHP | 3497 stars | Updated 1 year ago | License: None

SinaSpider by LiuXingMing
Sina Weibo crawler (Scrapy, Redis).
Python | 3209 stars | Updated 2 years ago | License: None

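novel-plus by 201206030
novel-plus is a multi-platform (PC, WAP), full-featured novel CMS. Features include novel recommendations, search, rankings, reading, a bookshelf, comments, a novel crawler, a member center, an author zone, paid subscriptions, and news publishing.
Java | 3084 stars | Updated 1 year ago | License: Permissive (Apache-2.0)
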
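Crawler_Illegal_Cases_In_China by HiddenStrawberry
A collection of illegal cases in China involving web crawlers. This project compiles news, materials, and laws and regulations related to lawsuits and violations involving crawler developers in mainland China. It aims to help crawler practitioners working in mainland China understand the relevant laws and avoid crossing data-compliance red lines. [AD] Chinese knowledge graph portal
HTML | 3056 stars | Updated 2 years ago | License: None
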
Gerapy by Gerapy
Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js
Python | 2993 stars | Updated 1 year ago | License: Permissive (MIT)

WebCollector by CrawlScript
WebCollector is an open-source web crawler framework based on Java. It provides simple interfaces for crawling the web; you can set up a multi-threaded web crawler in less than 5 minutes.
Java | 2975 stars | Updated 1 year ago | License: Strong Copyleft (GPL-3.0)

scrapy-examples by geekan
Multifarious Scrapy examples. Spiders for alexa / amazon / douban / douyu / github / linkedin etc.
Python | 2959 stars | Updated 2 years ago | License: None

🔞 JAVClub - never lose track of your "big sisters" again

htmlpurifier by ezyang
Standards-compliant HTML filter written in PHP
PHP | 2785 stars | Updated 1 year ago | License: Weak Copyleft (LGPL-2.1)

dirmap by H4ckForJob
An advanced web directory and file scanning tool intended to be more powerful than DirBuster, Dirsearch, cansina, and Yu Jian (御剑).
Python | 2716 stars | Updated 1 year ago | License: Strong Copyleft (GPL-3.0)

SpiderKeeper by DormyMo
Admin UI for Scrapy; an open-source Scrapinghub.
Python | 2640 stars | Updated 2 years ago | License: None

QueryList by jae-jae
The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.
PHP | 2548 stars | Updated 2 years ago | License: None

heritrix3 by internetarchive
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
Java | 2485 stars | Updated 1 year ago | License: Proprietary

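fiction_house by 201206030
Fiction House (小说精品屋) is a multi-platform (web, Android app, WeChat mini program), full-featured, responsive system for serialized novels and comics, with sections for premium novels, light novels, and comics. Features include novel/comic categories, search, rankings, completed works, ratings, online reading, bookshelf, and reading history; novel downloads; novel bullet comments (danmaku); automatic novel/comic collection, updating, and error correction; automatic sharing of novel content to Weibo; automated email promotion; and automatic link submission to the Baidu search engine.
Java | 2477 stars | Updated 3 years ago | License: Permissive (Apache-2.0)
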
Python3-Spider by wkunzhi
Hands-on Python crawlers: simulated login to major websites, including but not limited to slider CAPTCHAs, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star ❤️
Python | 2461 stars | Updated 1 year ago | License: None

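lianjia-beike-spider by jumper2014
A housing-price crawler for Lianjia and Beike that collects housing-price data (residential communities, second-hand homes, rentals, new homes) for 21 major Chinese cities, including Beijing, Shanghai, Guangzhou, and Shenzhen. Stable, reliable, and fast! Supports storage as CSV, MySQL, MongoDB, Excel, and JSON; supports Python 2 and 3; displays data in charts; richly commented. Star to show support. For learning and reference only; do not use for commercial purposes; use at your own risk.
Python | 2446 stars | Updated 1 year ago | License: None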