Python crawler.
An ultra-lightweight Baidu image crawler.
A watermark remover for Douyin, Kuaishou, Huoshan, and Pipixia videos.
Pxer by FoXZilla
A tool for pixiv.net. A Pixiv crawler that anyone can use.
JavaScript 952 Updated: 1 y ago License: Permissive (MIT)
ServiceWrapper_WebCrawler_GUI_NoCode_Spider by NaiboWang
A web crawler/spider with a GUI that can be used without writing any code (Service Wrapper). A service-oriented crawler that can be designed and run visually, with no code.
JavaScript 950 Updated: 2 y ago License: Strong Copyleft (GPL-3.0)
crawler-user-agents by monperrus
Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. Pull requests welcome ⭐
Python 949 Updated: 2 y ago License: Permissive (MIT)
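lxSpider by lixi5338619
A collection of crawler examples, including but not limited to: Taobao, JD.com, Tmall, Douban, Douyin, Kuaishou, Weibo, WeChat, Alibaba, Toutiao, pdd, Youku, iQIYI, Ctrip, 12306, 58, Sohu, various indexes, Weipu and Wanfang, Zlibraty, Oalib, novels, tender sites, procurement sites, Xiaohongshu, Dianping, Twitter, Maimai, and Zhihu.
Python 949 Updated: 1 y ago License: Strong Copyleft (GPL-3.0)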
JSpider by EnjoyScraping
JSpider adds the JS decryption method for at least one more website every week. Stars welcome. WeChat contact: 13298307816
JavaScript 934 Updated: 3 y ago License: No License (No License)
Spider by buckyroberts
Python website crawler.
Python 926 Updated: 2 y ago License: No License (No License)
NewPipeExtractor by TeamNewPipe
NewPipe's core library for extracting data from streaming sites
Java 913 Updated: 1 y ago License: Strong Copyleft (GPL-3.0)
Camtd by jae-jae
Chrome multi-threaded download manager extension, based on Aria2 and AriaNg.
JavaScript 907 Updated: 2 y ago License: No License (No License)
FunpySpiderSearchEngine by mtianyan
Word2vec personalized search (results tailored to each user) + Scrapy 2.3.0 (data crawling) + ElasticSearch 7.9.1 (data storage and an external RESTful API) + Django 3.1.1 search
Python 906 Updated: 2 y ago License: Permissive (MIT)
zhihu-crawler by wycm
zhihu-crawler is a high-performance, horizontally scalable, distributed crawler project written in Java, with support for free HTTP proxy pools.
Java 896 Updated: 3 y ago License: Proprietary (Proprietary)
sukhoi by untwisted
Minimalist and powerful Web Crawler.
Python 879 Updated: 3 y ago License: Permissive (Apache-2.0)
linkedin2username by initstring
OSINT Tool: Generate username lists for companies on LinkedIn
Python 879 Updated: 2 y ago License: Permissive (MIT)
bilibili-api by Nemo2011
Calls to commonly used bilibili APIs. Supports videos, bangumi, users, channels, audio, and more. Original repository: https://github.com/MoyuScript/bilibili-api
Python 878 Updated: 1 y ago License: Strong Copyleft (GPL-3.0)
magnet-dht by chenjiandongx
✌️ Python3 BitTorrent DHT crawler
Python 869 Updated: 2 y ago License: Permissive (MIT)
instagram-profilecrawl by timgrossmann
📝 Quickly crawl the information (e.g. followers, tags) of an Instagram profile.
Python 849 Updated: 3 y ago License: Permissive (MIT)
OnionSearch by megadose
OnionSearch is a script that scrapes URLs on different .onion search engines.
Python 829 Updated: 1 y ago License: Strong Copyleft (GPL-3.0)
thread-pool by mtrebi
Thread pool implementation using C++11 threads
C++ 820 Updated: 1 y ago License: Permissive (MIT)
nginx-badbot-blocker by mariusv
Block bad, possibly even malicious web crawlers (automated bots) using Nginx
Shell 806 Updated: 3 y ago License: No License (No License)
scrapy-selenium by clemfromspace
Scrapy middleware to handle JavaScript pages using Selenium
Python 792 Updated: 2 y ago License: Permissive (WTFPL)
ThunderLixianExporter by binux
Export Thunder Lixian URLs to aria2/wget
JavaScript 775 Updated: 3 y ago License: No License (No License)
go-dork by dwisiswant0
The fastest dork scanner written in Go.
Go 774 Updated: 2 y ago License: Permissive (MIT)
creeper by wspl
🐾 Creeper - The Next Generation Crawler Framework (Go)
Go 774 Updated: 2 y ago License: Permissive (Apache-2.0)
holiday-cn by NateScarlet
📅🇨🇳 Chinese statutory public holiday data, scraped automatically every day from State Council announcements
Python 772 Updated: 1 y ago License: Permissive (MIT)
fetchbot by PuerkitoBio
A simple and flexible web crawler that follows the robots.txt policies and crawl delays.
Go 760 Updated: 3 y ago License: Permissive (BSD-3-Clause)
house-renting by kezhenxu94
Possibly the best practice of Scrapy 🕷 and renting a house 🏡
Python 756 Updated: 3 y ago License: Strong Copyleft (GPL-3.0)
spidr by postmodern
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Ruby 753 Updated: 2 y ago License: Permissive (MIT)
icrawler by hellock
A multi-threaded crawler framework with many built-in image crawlers.
Python 747 Updated: 2 y ago License: Permissive (MIT)
BaiduSpider by BaiduSpider
BaiduSpider is a crawler for Baidu search results. It currently supports Baidu web search, image search, Zhidao search, video search, news search, Wenku search, Jingyan search, and Baike search.
Python 746 Updated: 1 y ago License: Strong Copyleft (GPL-3.0)
zhihu-spider by MorganZhang100
A web spider for zhihu.com
Python 739 Updated: 3 y ago License: Permissive (MIT)
ComputerStudent by sfvsfv
Systematic study materials for computer science students (Python, C, C++, computer organization, computer networks, compiler principles, circuits, Google Chrome extensions, crawlers)
HTML 700 Updated: 2 y ago License: Permissive (MIT)
scrapyrt by scrapinghub
HTTP API for Scrapy spiders
Python 692 Updated: 3 y ago License: Permissive (BSD-3-Clause)
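SecCrawler by Le0nsec
A crawler and push-notification program that helps security researchers get a daily security digest. Sources currently include the Xianzhi community, Anquanke, Seebug Paper, Tiaotiaotang, the Qi An Xin attack-and-defense community, the Lengjiao community, and the lab blogs of NSFOCUS, Tencent Xuanwu, Topsec, and 360, with more being added.
Go 691 Updated: 2 y ago License: Strong Copyleft (GPL-3.0)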
bilili by SigureMo
🍻 bilibili video (including bangumi) and danmaku downloader
Python 688 Updated: 3 y ago License: Strong Copyleft (GPL-3.0)
systemshock by Interrupt
Shockolate - A minimalist and cross-platform System Shock source port.
C 677 Updated: 2 y ago License: Strong Copyleft (GPL-3.0)
PyPtt by PyPtt
A PTT library that logs in over a direct connection. Supports PTT and PTT2.
Python 674 Updated: 1 y ago License: Weak Copyleft (LGPL-3.0)
xeHentai by fffonion
Doujinshi downloader
Python 673 Updated: 2 y ago License: Strong Copyleft (GPL-3.0)
TOP250movie_douban by iphysresearch
Short reviews of Douban's Top 250 movies: Scrapy crawler + data cleaning/analysis + building a Chinese text sentiment analysis model
Jupyter Notebook 673 Updated: 1 y ago License: Permissive (BSD-2-Clause)
EmailHarvester by maldevel
Email address harvester
Python 670 Updated: 2 y ago License: Strong Copyleft (GPL-3.0)
PornHub-downloader-python by mariosemes
Download stuff from PH the easy way.
Python 661 Updated: 2 y ago License: Strong Copyleft (GPL-3.0)
xxl-crawler by xuxueli
A distributed web crawler framework (XXL-CRAWLER).
Java 658 Updated: 2 y ago License: Permissive (Apache-2.0)
tumblr-utils by bbolli
Utilities for dealing with Tumblr blogs, Tumblr backup
Python 645 Updated: 2 y ago License: Strong Copyleft (GPL-3.0)
isbot by omrilotan
🤖/👨‍🦰 Detect bots/crawlers/spiders using the user agent string
JavaScript 645 Updated: 2 y ago License: Permissive (Unlicense)
tailwindui-crawler by kiliman
tailwindui-crawler downloads the component HTML files locally
JavaScript 642 Updated: 2 y ago License: Permissive (MIT)
tools by ghost123gg
A Python Crawler Framework
Python 636 Updated: 3 y ago License: No License (No License)
scrapy-rotating-proxies by TeamHG-Memex
Use multiple proxies with Scrapy
Python 634 Updated: 2 y ago License: Permissive (MIT)
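Python-Spider by lb2281075105
Douban Top 250 movies; crawling Douyu JSON data and pictures of women; Taobao; Youyuan; a CrawlSpider that crawls some basic profile information from the Hongniang matchmaking site, plus distributed crawling of Hongniang with storage in Redis; small crawler demos; Selenium; crawling Duodian; building APIs with Django; crawling Youyuan site information; simulated login to Zhihu, GitHub, and Tuchong; crawling the full Duodian store site; crawling the article history of WeChat official accounts; crawling articles shared in WeChat groups or by WeChat friends; using itchat to monitor articles shared by specified WeChat official accounts.
Python 632 Updated: 3 y ago License: Permissive (Apache-2.0)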
scrapy-fake-useragent by alecxe
Random User-Agent middleware based on fake-useragent
Python 631 Updated: 2 y ago License: Permissive (MIT)