Crawler Libraries - Page 1

scrapy by scrapy

Python 47503 Version: Current
Updated: 2 y ago
License: Permissive (BSD-3-Clause)

Scrapy, a fast high-level web crawling & scraping framework for Python.

cheerio by cheeriojs

TypeScript 26488 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

winston by winstonjs

JavaScript 20628 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

A logger for just about everything.

colly by gocolly

Go 19706 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

Elegant Scraper and Crawler Framework for Golang

proxy_pool by jhao104

Python 18050 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

Proxy IP pool for Python crawlers (proxy pool)

python-spider by Jack-Cherish

Python 16227 Version: Current
Updated: 2 y ago
License: No License (No License)

:rainbow: Python3 web crawlers in practice: Taobao, JD.com, NetEase Cloud Music, Bilibili, 12306, Douyin, Biquge, comic and novel downloads, music and movie downloads, and more

pyspider by binux

Python 15891 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

A Powerful Spider (Web Crawler) System in Python.

examples-of-web-crawlers by shengqiangzhang

Python 12136 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

Some interesting examples of Python crawlers that are friendly to beginners, mainly crawling sites such as Taobao, Tmall, WeChat, WeChat Read, Douban, and QQ.

webmagic by code4craft

Java 10861 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

A scalable web crawler framework for Java.

FileDownloader by lingochamp

Java 10805 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

Multitask, MultiThread (MultiConnection), Breakpoint-resume, High-concurrency, Simple to use, Single/NotSingle-process

crawlab by crawlab-team

Go 9884 Version: Current
Updated: 2 y ago
License: Permissive (BSD-3-Clause)

Distributed web crawler admin platform for managing spiders regardless of language or framework.

Photon by s0md3v

Python 9703 Version: Current
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

Incredibly fast crawler designed for OSINT.

avbook by guyueyingmu

PHP 8923 Version: Current
Updated: 2 y ago
License: No License (No License)

AV movie management system; crawlers for avmoo, javbus, and javlibrary; online AV video library; AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database

maigret by soxoj

Python 8607 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

🕵️‍♂️ Collect a dossier on a person by username from thousands of sites

Python by injetlee

Python 8377 Version: Current
Updated: 2 y ago
License: No License (No License)

Python scripts: simulated Zhihu login, crawlers, Excel manipulation, WeChat official accounts, remote power-on.

spider-flow by ssssssss-team

Java 8064 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

A next-generation crawler platform that defines crawler workflows graphically, so you can build a crawler without writing code.

pholcus by andeya

Go 7391 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

Pholcus is a distributed high-concurrency crawler software written in pure golang

weiboSpider by dataabc

Python 6919 Version: Current
Updated: 2 y ago
License: No License (No License)

Sina Weibo crawler: crawl Sina Weibo data with Python.

pholcus by henrylee2cn

Go 6819 Version: Current
Updated: 4 y ago
License: Permissive (Apache-2.0)

Pholcus is a distributed high-concurrency crawler software written in pure golang

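InfoSpider by kangvcar

Python 6681 Version: Current
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together, aiming to help users retrieve their own data safely and quickly. The tool's code is open source and its process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blogs, OSChina blogs, and Jianshu.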

node-crawler by bda-research

JavaScript 6422 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

PythonSpiderNotes by lining0806

Python 6183 Version: Current
Updated: 2 y ago
License: No License (No License)

The essentials of getting started with web crawling in Python.

fuck-login by xchaoinfo

Python 5791 Version: Current
Updated: 2 y ago
License: No License (No License)

Simulated login for some well-known websites, to make it easier to crawl sites that require login.

WechatSogou by chyroc

Python 5517 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

A WeChat official account crawler interface based on Sogou WeChat search.

headless-chrome-crawler by yujiosaka

JavaScript 5368 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

Distributed crawler powered by Headless Chrome

scrapy-redis by rmax

Python 5279 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

Redis-based components for Scrapy.

haipproxy by SpiderClub

Python 5238 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

:sparkling_heart: Highly available distributed IP proxy pool, powered by Scrapy and Redis

aquatone by michenriksen

Go 5159 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

A Tool for Domain Flyovers

weibospider by SpiderClub

Python 4769 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

:zap: A distributed crawler for Weibo, built with Celery and Requests.

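TopList by tophubs

Go 4502 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

Today's Hot List (今日热榜), an aggregator site that collects trending headlines from major popular websites. Written in Go, it uses multiple goroutines to fetch information quickly and asynchronously. Preview: https://mo.fish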

crawler4j by yasserg

Java 4391 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

Open Source Web Crawler for Java

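ECommerceCrawlers by DropsDevopsOrg

Python 3941 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

Hands-on 🐍 crawlers for many websites and e-commerce data 🕷. Includes 🕸: Taobao products, WeChat official accounts, Dianping, Qichacha, recruitment sites, Xianyu, Alibaba tasks, Cnblogs, Weibo, Baidu Tieba, Douban movies, Ibaotu, Quanjing, Douban music, a provincial drug administration, Sohu News, machine-learning text collection, fofa asset collection, Autohome, National Bureau of Statistics, Baidu keyword index counts, spider pan-directory, Toutiao, Douban movie reviews, Ctrip, Xiaomi app store, Anjuke, Tujia homestays ❤️❤️❤️. WeChat crawler showcase project: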

hakrawler by hakluke

Go 3768 Version: Current
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application

DotnetSpider by dotnetcore

C# 3664 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling & scraping framework.

phpspider by owner888

PHP 3497 Version: Current
Updated: 2 y ago
License: No License (No License)

The program used in the article "I used a crawler to 'steal' one million Zhihu users in a day, just to prove that PHP is the best language in the world".

SinaSpider by LiuXingMing

Python 3209 Version: Current
Updated: 2 y ago
License: No License (No License)

Sina Weibo crawler (Scrapy, Redis)

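novel-plus by 201206030

Java 3084 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

novel-plus is a full-featured novel CMS for multi-platform reading (PC, WAP). It includes novel recommendations, novel search, novel rankings, novel reading, a bookshelf, novel comments, a novel crawler, a member center, an author area, top-up and subscription, news publishing, and other features.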

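Crawler_Illegal_Cases_In_China by HiddenStrawberry

HTML 3056 Version: Current
Updated: 2 y ago
License: No License (No License)

A collection of legal cases in China involving web crawlers. This project compiles news, materials, and laws and regulations related to lawsuits and violations involving crawler developers in mainland China. It aims to help crawler practitioners working in mainland China understand the relevant laws and avoid crossing data-compliance red lines. [AD] Chinese knowledge graph portal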

Gerapy by Gerapy

Python 2993 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

WebCollector by CrawlScript

Java 2975 Version: Current
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.

scrapy-examples by geekan

Python 2959 Version: Current
Updated: 2 y ago
License: No License (No License)

Multifarious Scrapy examples. Spiders for alexa / amazon / douban / douyu / github / linkedin etc.

core by JAVClub

JavaScript 2798 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

🔞 JAVClub - so your "big sisters" never get lost again

htmlpurifier by ezyang

PHP 2785 Version: Current
Updated: 2 y ago
License: Weak Copyleft (LGPL-2.1)

Standards compliant HTML filter written in PHP

dirmap by H4ckForJob

Python 2716 Version: Current
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

An advanced web directory & file scanning tool that will be more powerful than DirBuster, Dirsearch, cansina, and Yu Jian.

SpiderKeeper by DormyMo

Python 2640 Version: Current
Updated: 2 y ago
License: No License (No License)

Admin UI for Scrapy / open source Scrapinghub

QueryList by jae-jae

PHP 2548 Version: Current
Updated: 2 y ago
License: No License (No License)

:spider: The elegant, progressive PHP crawler framework!

heritrix3 by internetarchive

Java 2485 Version: Current
Updated: 2 y ago
License: Proprietary (Proprietary)

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

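fiction_house by 201206030

Java 2477 Version: Current
Updated: 4 y ago
License: Permissive (Apache-2.0)

小说精品屋 (Fiction House) is a full-featured, screen-adaptive serialized novel and comic system for multiple platforms (web, Android app, WeChat mini program), with a premium novel section, a light novel section, and a comic section. Features include novel/comic categories, search, rankings, completed works, ratings, online reading, bookshelf, and reading history, plus novel downloads, novel bullet comments, automatic novel/comic collection, updating, and error correction, automatic sharing of novel content to Weibo, automatic email promotion, and automatic link submission to the Baidu search engine.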

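Python3-Spider by wkunzhi

Python 2461 Version: Current
Updated: 2 y ago
License: No License (No License)

Python crawlers in practice - simulated login for major websites, including but not limited to: slider CAPTCHA verification, Pinduoduo, Meituan, Baidu, bilibili, Dianping, Taobao. If you like it, please star ❤️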

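lianjia-beike-spider by jumper2014

Python 2446 Version: Current
Updated: 2 y ago
License: No License (No License)

House price crawler for Lianjia and Beike, collecting house price data (residential communities, second-hand homes, rentals, new homes) for 21 major Chinese cities including Beijing, Shanghai, Guangzhou, and Shenzhen. Stable, reliable, and fast! Supports storage as CSV, MySQL, MongoDB, Excel, and JSON; supports Python 2 and 3; displays data in charts; richly commented. Stars appreciated. For learning and reference only; do not use for commercial purposes, at your own risk.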


Open Weaver – Develop Applications Faster with Open Source
