Python crawler.
A super-lightweight Baidu image crawler.
A watermark remover for Douyin, Kuaishou, Huoshan, and Pipixia videos.
Pxer by FoXZilla
A tool for pixiv.net. A Pixiv crawler that anyone can use.
JavaScript
952
Updated: 2 y ago
License: Permissive (MIT)
ServiceWrapper_WebCrawler_GUI_NoCode_Spider by NaiboWang
A web crawler/spider with a GUI that can be used without writing any code (Service Wrapper). Service-oriented crawler software whose tasks can be designed and run visually, with no code.
JavaScript
950
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
crawler-user-agents by monperrus
Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. Pull requests welcome :star:
Python
949
Updated: 2 y ago
License: Permissive (MIT)
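lxSpider by lixi5338619
A collection of crawler examples, including but not limited to: Taobao, JD, Tmall, Douban, Douyin, Kuaishou, Weibo, WeChat, Alibaba, Toutiao, pdd, Youku, iQIYI, Ctrip, 12306, 58, Sohu, various indexes, VIP and Wanfang, Zlibraty, Oalib, novels, tender sites, procurement sites, Xiaohongshu, Dianping, Twitter, Maimai, Zhihu.
Python
949
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)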
JSpider by EnjoyScraping
JSpider adds the JS decryption method for at least one website every week. Stars are welcome. WeChat contact: 13298307816
JavaScript
934
Updated: 4 y ago
License: No License (No License)
Spider by buckyroberts
Python website crawler.
Python
926
Updated: 2 y ago
License: No License (No License)
NewPipeExtractor by TeamNewPipe
NewPipe's core library for extracting data from streaming sites
Java
913
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
Camtd by jae-jae
Chrome multi-threaded download manager extension, based on Aria2 and AriaNg.
JavaScript
907
Updated: 2 y ago
License: No License (No License)
FunpySpiderSearchEngine by mtianyan
Word2vec per-user personalized search + Scrapy 2.3.0 (data crawling) + ElasticSearch 7.9.1 (data storage and an external RESTful API) + Django 3.1.1 search
Python
906
Updated: 2 y ago
License: Permissive (MIT)
zhihu-crawler by wycm
zhihu-crawler is a high-performance, Java-based distributed crawler project that supports a free HTTP proxy pool and horizontal scaling.
Java
896
Updated: 4 y ago
License: Proprietary (Proprietary)
sukhoi by untwisted
Minimalist and powerful Web Crawler.
Python
879
Updated: 4 y ago
License: Permissive (Apache-2.0)
linkedin2username by initstring
OSINT Tool: Generate username lists for companies on LinkedIn
Python
879
Updated: 2 y ago
License: Permissive (MIT)
bilibili-api by Nemo2011
Calls to commonly used bilibili APIs. Supports video, bangumi, user, channel, audio, and other features. Original repository: https://github.com/MoyuScript/bilibili-api
Python
878
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
magnet-dht by chenjiandongx
✌️ Python3 BitTorrent DHT crawler
Python
869
Updated: 2 y ago
License: Permissive (MIT)
instagram-profilecrawl by timgrossmann
📝 Quickly crawl the information (e.g. followers, tags, etc.) of an Instagram profile.
Python
849
Updated: 4 y ago
License: Permissive (MIT)
OnionSearch by megadose
OnionSearch is a script that scrapes URLs on different .onion search engines.
Python
829
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
thread-pool by mtrebi
Thread pool implementation using C++11 threads
C++
820
Updated: 2 y ago
License: Permissive (MIT)
nginx-badbot-blocker by mariusv
Block bad, possibly even malicious web crawlers (automated bots) using Nginx
Shell
806
Updated: 4 y ago
License: No License (No License)
scrapy-selenium by clemfromspace
Scrapy middleware to handle JavaScript pages using Selenium
Python
792
Updated: 2 y ago
License: Permissive (WTFPL)
ThunderLixianExporter by binux
Export Thunder Lixian URLs to aria2/wget
JavaScript
775
Updated: 4 y ago
License: No License (No License)
go-dork by dwisiswant0
The fastest dork scanner written in Go.
Go
774
Updated: 2 y ago
License: Permissive (MIT)
creeper by wspl
:paw_prints: Creeper - The Next Generation Crawler Framework (Go)
Go
774
Updated: 2 y ago
License: Permissive (Apache-2.0)
holiday-cn by NateScarlet
📅🇨🇳 China statutory holiday data, scraped automatically every day from State Council announcements
Python
772
Updated: 2 y ago
License: Permissive (MIT)
fetchbot by PuerkitoBio
A simple and flexible web crawler that follows the robots.txt policies and crawl delays.
Go
760
Updated: 4 y ago
License: Permissive (BSD-3-Clause)
house-renting by kezhenxu94
Possibly the best practice of Scrapy 🕷 and renting a house 🏡
Python
756
Updated: 4 y ago
License: Strong Copyleft (GPL-3.0)
spidr by postmodern
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Ruby
753
Updated: 2 y ago
License: Permissive (MIT)
icrawler by hellock
A multi-threaded crawler framework with many built-in image crawlers provided.
Python
747
Updated: 2 y ago
License: Permissive (MIT)
BaiduSpider by BaiduSpider
BaiduSpider, a crawler for Baidu search results. It currently supports Baidu web search, image search, Zhidao search, video search, news search, Wenku search, Jingyan search, and Baike search.
Python
746
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
zhihu-spider by MorganZhang100
A web spider for zhihu.com
Python
739
Updated: 4 y ago
License: Permissive (MIT)
ComputerStudent by sfvsfv
Systematic study materials for computer science students (Python, C, C++, computer organization, computer networks, compiler principles, circuits, Google extensions, crawlers)
HTML
700
Updated: 2 y ago
License: Permissive (MIT)
scrapyrt by scrapinghub
HTTP API for Scrapy spiders
Python
692
Updated: 3 y ago
License: Permissive (BSD-3-Clause)
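SecCrawler by Le0nsec
A crawler and push-notification program that helps security researchers get daily security news. It currently covers Xianzhi Community, Anquanke, Seebug Paper, Tiaotiaotang, the Qi An Xin attack-defense community, the Lengjiao community, and the lab blogs of NSFOCUS, Tencent Xuanwu, Topsec, 360, and others. Continuously updated.
Go
691
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)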
bilili by SigureMo
:beers: bilibili video (including bangumi) and danmaku downloader
Python
688
Updated: 3 y ago
License: Strong Copyleft (GPL-3.0)
systemshock by Interrupt
Shockolate - A minimalist and cross-platform System Shock source port.
C
677
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
PyPtt by PyPtt
A PTT library that logs in over a direct connection. Supports PTT and PTT2.
Python
674
Updated: 2 y ago
License: Weak Copyleft (LGPL-3.0)
xeHentai by fffonion
Doujinshi downloader
Python
673
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
TOP250movie_douban by iphysresearch
Short reviews of the Douban Top 250 movies: Scrapy crawler + data cleaning/analysis + building a Chinese text sentiment analysis model
Jupyter Notebook
673
Updated: 2 y ago
License: Permissive (BSD-2-Clause)
EmailHarvester by maldevel
Email address harvester
Python
670
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
PornHub-downloader-python by mariosemes
Download stuff from PH the easy way.
Python
661
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
xxl-crawler by xuxueli
A distributed web crawler framework (XXL-CRAWLER).
Java
658
Updated: 2 y ago
License: Permissive (Apache-2.0)
tumblr-utils by bbolli
Utilities for dealing with Tumblr blogs, Tumblr backup
Python
645
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
isbot by omrilotan
🤖/👨🦰 Detect bots/crawlers/spiders using the user agent string
JavaScript
645
Updated: 2 y ago
License: Permissive (Unlicense)
tailwindui-crawler by kiliman
tailwindui-crawler downloads the component HTML files locally
JavaScript
642
Updated: 2 y ago
License: Permissive (MIT)
tools by ghost123gg
A Python Crawler Framework
Python
636
Updated: 4 y ago
License: No License (No License)
scrapy-rotating-proxies by TeamHG-Memex
Use multiple proxies with Scrapy
Python
634
Updated: 2 y ago
License: Permissive (MIT)
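Python-Spider by lb2281075105
Douban Top 250 movies; crawling JSON data and images of women from Douyu; Taobao; Youyuan; crawling partial basic profile information from the Hongniang matchmaking site with CrawlSpider, plus distributed crawling of Hongniang with Redis storage; small crawler demos; Selenium; crawling Duodian; building an API with Django; crawling Youyuan profile information; simulated login to Zhihu, GitHub, and Tuchong; crawling the entire Duodian store; crawling the article history of WeChat official accounts; crawling articles shared in WeChat groups or by WeChat friends; using itchat to monitor articles shared by specified WeChat official accounts.
Python
632
Updated: 4 y ago
License: Permissive (Apache-2.0)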
scrapy-fake-useragent by alecxe
Random User-Agent middleware based on fake-useragent
Python
631
Updated: 2 y ago
License: Permissive (MIT)