Crawler Libraries - Page 3

Pxer by FoXZilla

A tool for pixiv.net: a Pixiv crawler that anyone can use.

JavaScript 952 Updated: 2 y ago
License: Permissive (MIT)

ServiceWrapper_WebCrawler_GUI_NoCode_Spider by NaiboWang

A web crawler/spider with a GUI that can be used without writing any code (Service Wrapper). Service-oriented crawler software that can be designed and run visually.

JavaScript 950 Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

crawler-user-agents by monperrus

Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. Pull requests welcome :star:

Python 949 Updated: 2 y ago
License: Permissive (MIT)

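lxSpider by lixi5338619

A collection of crawler examples, including but not limited to: Taobao, JD.com, Tmall, Douban, Douyin, Kuaishou, Weibo, WeChat, Alibaba, Toutiao, pdd, Youku, iQIYI, Ctrip, 12306, 58, Sohu, various indexes, Weipu and Wanfang, Zlibraty, Oalib, novels, tender sites, procurement sites, Xiaohongshu, Dianping, Twitter, Maimai, and Zhihu.

Python 949 Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)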

JSpider by EnjoyScraping

JSpider adds the JS decryption method for at least one website every week. Stars are welcome. WeChat for discussion: 13298307816

JavaScript 934 Updated: 4 y ago
License: No License (No License)

Spider by buckyroberts

Python website crawler.

Python 926 Updated: 2 y ago
License: No License (No License)

NewPipeExtractor by TeamNewPipe

NewPipe's core library for extracting data from streaming sites

Java 913 Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

Camtd by jae-jae

Chrome multi-threaded download manager extension, based on Aria2 and AriaNg.

JavaScript 907 Updated: 2 y ago
License: No License (No License)

FunpySpiderSearchEngine by mtianyan

Word2vec personalized search (tailored to each user) + Scrapy 2.3.0 (data crawling) + ElasticSearch 7.9.1 (data storage and an external RESTful API) + Django 3.1.1 search

Python 906 Updated: 2 y ago
License: Permissive (MIT)

zhihu-crawler by wycm

zhihu-crawler is a high-performance, distributed Java crawler project that supports a free HTTP proxy pool and horizontal scaling.

Java 896 Updated: 4 y ago
License: Proprietary (Proprietary)

sukhoi by untwisted

Minimalist and powerful Web Crawler.

Python 879 Updated: 4 y ago
License: Permissive (Apache-2.0)

linkedin2username by initstring

OSINT Tool: Generate username lists for companies on LinkedIn

Python 879 Updated: 2 y ago
License: Permissive (MIT)

bilibili-api by Nemo2011

Calls to commonly used Bilibili APIs. Supports videos, bangumi, users, channels, audio, and more. Original repository: https://github.com/MoyuScript/bilibili-api

Python 878 Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

magnet-dht by chenjiandongx

✌️ Python3 BitTorrent DHT crawler

Python 869 Updated: 2 y ago
License: Permissive (MIT)

instagram-profilecrawl by timgrossmann

📝 Quickly crawl the information (e.g. followers, tags, etc.) of an Instagram profile.

Python 849 Updated: 4 y ago
License: Permissive (MIT)

OnionSearch by megadose

OnionSearch is a script that scrapes URLs on different .onion search engines.

Python 829 Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

thread-pool by mtrebi

Thread pool implementation using C++11 threads

C++ 820 Updated: 2 y ago
License: Permissive (MIT)

nginx-badbot-blocker by mariusv

Block bad, possibly even malicious web crawlers (automated bots) using Nginx

Shell 806 Updated: 4 y ago
License: No License (No License)

scrapy-selenium by clemfromspace

Scrapy middleware to handle JavaScript pages using Selenium

Python 792 Updated: 2 y ago
License: Permissive (WTFPL)

ThunderLixianExporter by binux

Export Thunder Lixian URLs to aria2/wget

JavaScript 775 Updated: 4 y ago
License: No License (No License)

spider_python by xingag

Python crawlers

Python 774 Updated: 2 y ago
License: Permissive (Apache-2.0)

go-dork by dwisiswant0

The fastest dork scanner written in Go.

Go 774 Updated: 2 y ago
License: Permissive (MIT)

creeper by wspl

:paw_prints: Creeper - The Next Generation Crawler Framework (Go)

Go 774 Updated: 2 y ago
License: Permissive (Apache-2.0)

holiday-cn by NateScarlet

📅🇨🇳 Chinese statutory public holiday data, scraped automatically every day from State Council announcements

Python 772 Updated: 2 y ago
License: Permissive (MIT)

BaiduImageSpider by kong36088

A super-lightweight Baidu image crawler

Python 765 Updated: 2 y ago
License: Permissive (MIT)

fetchbot by PuerkitoBio

A simple and flexible web crawler that follows the robots.txt policies and crawl delays.

Go 760 Updated: 4 y ago
License: Permissive (BSD-3-Clause)

house-renting by kezhenxu94

Possibly the best practice of Scrapy 🕷 and renting a house 🏡

Python 756 Updated: 4 y ago
License: Strong Copyleft (GPL-3.0)

spidr by postmodern

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Ruby 753 Updated: 2 y ago
License: Permissive (MIT)

icrawler by hellock

A multi-threaded crawler framework with many built-in image crawlers.

Python 747 Updated: 2 y ago
License: Permissive (MIT)

BaiduSpider by BaiduSpider

BaiduSpider, a crawler for Baidu search results. It currently supports Baidu web search, image search, Zhidao search, video search, news search, Wenku search, Jingyan search, and Baike search.

Python 746 Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

zhihu-spider by MorganZhang100

A web spider for zhihu.com

Python 739 Updated: 4 y ago
License: Permissive (MIT)

ComputerStudent by sfvsfv

Systematic study materials for computer science students (Python, C, C++, computer organization, computer networks, compiler principles, circuits, Google browser extensions, crawlers)

HTML 700 Updated: 2 y ago
License: Permissive (MIT)

scrapyrt by scrapinghub

HTTP API for Scrapy spiders

Python 692 Updated: 3 y ago
License: Permissive (BSD-3-Clause)

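SecCrawler by Le0nsec

A crawler and push-notification program that helps security researchers get daily security reports. It currently crawls the Xianzhi community, Anquanke, Seebug Paper, Tiaotiaotang, the Qi An Xin attack-and-defense community, the Lengjiao community, and lab blogs from NSFOCUS, Tencent Xuanwu, Topsec, 360, and others. Continuously updated.

Go 691 Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)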

bilili by SigureMo

:beers: Bilibili video (including bangumi) and danmaku downloader

Python 688 Updated: 3 y ago
License: Strong Copyleft (GPL-3.0)

systemshock by Interrupt

Shockolate - A minimalist and cross-platform System Shock source port.

C 677 Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

PyPtt by PyPtt

A PTT library that logs in over a direct connection; supports PTT and PTT2

Python 674 Updated: 2 y ago
License: Weak Copyleft (LGPL-3.0)

xeHentai by fffonion

Doujinshi downloader

Python 673 Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

TOP250movie_douban by iphysresearch

Short reviews of Douban's Top 250 movies: Scrapy crawler + data cleaning/analysis + building a Chinese text sentiment analysis model

Jupyter Notebook 673 Updated: 2 y ago
License: Permissive (BSD-2-Clause)

EmailHarvester by maldevel

Email address harvester

Python 670 Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

PornHub-downloader-python by mariosemes

Download stuff from PH the easy way.

Python 661 Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

xxl-crawler by xuxueli

A distributed web crawler framework (XXL-CRAWLER).

Java 658 Updated: 2 y ago
License: Permissive (Apache-2.0)

tumblr-utils by bbolli

Utilities for dealing with Tumblr blogs, Tumblr backup

Python 645 Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

stealer by moyada

A program that removes watermarks from Douyin, Kuaishou, Huoshan, and Pipixia videos

Python 645 Updated: 2 y ago
License: Permissive (MIT)

isbot by omrilotan

🤖/👨‍🦰 Detect bots/crawlers/spiders using the user agent string

JavaScript 645 Updated: 2 y ago
License: Permissive (Unlicense)

tailwindui-crawler by kiliman

tailwindui-crawler downloads the component HTML files locally

JavaScript 642 Updated: 2 y ago
License: Permissive (MIT)

tools by ghost123gg

A Python Crawler Framework

Python 636 Updated: 4 y ago
License: No License (No License)

scrapy-rotating-proxies by TeamHG-Memex

Use multiple proxies with Scrapy

Python 634 Updated: 2 y ago
License: Permissive (MIT)

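Python-Spider by lb2281075105

Douban Top 250 movies; crawling Douyu JSON data and pictures of beautiful women; Taobao; Youyuan; using CrawlSpider to crawl some basic profile information from the Hongniang matchmaking site, plus distributed crawling of Hongniang with storage in Redis; small crawler demos; Selenium; crawling Duodian; developing APIs with Django; crawling Youyuan site information; simulated login to Zhihu, GitHub, and Tuchong; crawling the entire Duodian store site; crawling the article history of WeChat official accounts; crawling articles shared in WeChat groups or by WeChat friends; using itchat to monitor articles shared by specified WeChat official accounts

Python 632 Updated: 4 y ago
License: Permissive (Apache-2.0)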

scrapy-fake-useragent by alecxe

Random User-Agent middleware based on fake-useragent

Python 631 Updated: 2 y ago
License: Permissive (MIT)

Open Weaver – Develop Applications Faster with Open Source

