NLP_Datasets by Koziev
My NLP datasets for Russian language
C# 283 Updated: 2 y ago License: Permissive (CC0-1.0)

VGG-Speaker-Recognition by WeidiXie
Utterance-level Aggregation For Speaker Recognition In The Wild
Python 282 Updated: 4 y ago License: No License (No License)

SWEM by dinghanshen
The TensorFlow code for the ACL 2018 paper "Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms"
Python 282 Updated: 4 y ago License: No License (No License)

LOTClass by yumeng5
[EMNLP 2020] Text Classification Using Label Names Only: A Language Model Self-Training Approach
Python 281 Updated: 2 y ago License: Permissive (Apache-2.0)

gcse by daviddengcn
Project for Go Search, a search engine for finding popular and relevant packages.
Go 279 Updated: 4 y ago License: Proprietary (Proprietary)

Tencent_Ads_Algo_2018 by DiligentPanda
This repository maintains the code for the Tencent advertisement algorithm competition 2018. Our code ranked 3rd in the final round.
Python 278 Updated: 4 y ago License: Permissive (MIT)

multifit by n-waves
The code to reproduce results from the paper "MultiFiT: Efficient Multi-lingual Language Model Fine-tuning" https://arxiv.org/abs/1909.04761
Jupyter Notebook 277 Updated: 3 y ago License: Permissive (MIT)

gAnswer by pkumod
A KBQA system based on DBpedia.
Java 274 Updated: 4 y ago License: Permissive (BSD-3-Clause)

BERT-whitening-pytorch by autoliuweijie
PyTorch version of BERT-whitening
Python 272 Updated: 2 y ago License: Permissive (MIT)

New dataset

simple-effective-text-matching by alibaba-edu
Source code of the ACL2019 paper "Simple and Effective Text Matching with Richer Alignment Features".
Python 270 Updated: 4 y ago License: Permissive (Apache-2.0)

text-classification-surveys by xiaoqian19940510
A collection of text classification resources, covering deep learning text classification models such as SpanBERT, ALBERT, RoBerta, Xlnet, MT-DNN, BERT, TextGCN, MGAN, TextCapsule, SGNN, SGM, LEAM, ULMFiT, DGCNN, ELMo, RAM, DeepMoji, IAN, DPCNN, TopicRNN, LSTMN, Multi-Task, HAN, CharCNN, Tree-LSTM, DAN, TextRCNN, Paragraph-Vec, TextCNN, DCNN, RNTN, MV-RNN, and RAE, and shallow learning models such as LightGBM, SVM, XGboost, Random Forest, C4.5, CART, KNN, NB, and HMM. It introduces text classification datasets such as MR, SST, MPQA, IMDB, Yelp, 20NG, AG, R8, DBpedia, Ohsumed, SQuAD, SNLI, MNLI, MSRP, MRDA, RCV1, and AAPD; evaluation metrics such as accuracy, Precision, Recall, F1, EM, MRR, HL, Micro-F1, Macro-F1, and P@K; and technical challenges, including multi-label text classification.
Python 267 Updated: 4 y ago License: No License (No License)

Schema-based-Knowledge-Extraction by yuanxiaosc
Code for the http://lic2019.ccf.org.cn/kg information extraction task. Uses an end-to-end joint model for BERT-based entity extraction and relation extraction.
Python 266 Updated: 2 y ago License: No License (No License)

NER-Chinese by EOA-AILab
Comparison of Chinese Named Entity Recognition Models between NeuroNER and BertNER
Python 265 Updated: 4 y ago License: No License (No License)

THUTag by thunlp
A Package of Keyphrase Extraction and Social Tag Suggestion
Java 265 Updated: 4 y ago License: Permissive (MIT)

TENER by fastnlp
Code for "TENER: Adapting Transformer Encoder for Named Entity Recognition"
Python 264 Updated: 3 y ago License: No License (No License)

acl-anthology by acl-org
Data and software for building the ACL Anthology.
Python 264 Updated: 2 y ago License: Permissive (Apache-2.0)

soft-prompt-tuning by kipgparker
Jupyter Notebook 264 Updated: 2 y ago License: Permissive (MIT)

bert-kbqa-NLPCC2017 by jkszw2014
A trial of KBQA based on BERT for NLPCC2016/2017 Task 5 (a BERT-based Chinese knowledge base question answering practice; the code runs end to end)
Python 263 Updated: 2 y ago License: No License (No License)

stopwords by igorbrigadir
Default English stopword lists from many different sources
Python 262 Updated: 2 y ago License: No License (No License)

tensorflow-ml-nlp-tf2 by NLP-kr
Practice materials for "Natural Language Processing Starting with TensorFlow 2 and Machine Learning (from Logistic Regression to BERT and GPT3)"
Jupyter Notebook 262 Updated: 2 y ago License: Permissive (Apache-2.0)

rasa_nlu_gq by GaoQ1
Turn natural language into structured data (supports Chinese; defines N custom models; supports different scenarios and tasks)
Python 261 Updated: 4 y ago License: Permissive (Apache-2.0)

E2E-TBSA by lixin4ever
A Unified Model for Opinion Target Extraction and Target Sentiment Prediction (AAAI 2019)
Python 260 Updated: 2 y ago License: No License (No License)

deep-coref by clarkkev
Python 256 Updated: 4 y ago License: No License (No License)

fancy-nlp by boat-group
NLP for human. A fast and easy-to-use natural language processing (NLP) toolkit, satisfying your imagination about NLP.
Python 253 Updated: 3 y ago License: Strong Copyleft (GPL-3.0)

aravec by bakrianoo
AraVec is a pre-trained distributed word representation (word embedding) open source project that aims to provide the Arabic NLP research community with free-to-use and powerful word embedding models.
Jupyter Notebook 253 Updated: 3 y ago License: No License (No License)

dead by asottile
dead simple python dead code detection
Python 252 Updated: 2 y ago License: Permissive (MIT)

Global-Encoding by lancopku
Global Encoding for Abstractive Summarization (ACL 2018)
Python 250 Updated: 4 y ago License: Permissive (MIT)

pytorch-transformers-classification by ThilinaRajapakse
Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for employing Transformer models in text classification tasks. Contains code to easily train BERT, XLNet, RoBERTa, and XLM models for text classification.
Jupyter Notebook 250 Updated: 3 y ago License: Permissive (Apache-2.0)

ChineseBLUE by alibaba-research
Chinese Biomedical Language Understanding Evaluation benchmark (ChineseBLUE)
Python 249 Updated: 3 y ago License: Permissive (Apache-2.0)

japanese-gpt2 by rinnakk
Code for producing Japanese GPT-2 provided by rinna Co., Ltd.
Python 249 Updated: 4 y ago License: Permissive (MIT)

NER with BERT

ai_law by brightmart
All kinds of baseline models for long text classification (text categorization)
Python 247 Updated: 4 y ago License: No License (No License)

NLPGNN by kyzhouhzau
1. Use BERT, ALBERT and GPT2 as TensorFlow 2.0 layers. 2. Implement GCN, GAN, GIN and GraphSAGE based on message passing.
Python 246 Updated: 4 y ago License: Permissive (MIT)

Jack the Reader

TD-LSTM by jimmyyfeng
Attention-based Aspect-term Sentiment Analysis implemented in TensorFlow.
Python 245 Updated: 4 y ago License: No License (No License)

ACE by Alibaba-NLP
[ACL-IJCNLP 2021] Automated Concatenation of Embeddings for Structured Prediction
Python 240 Updated: 2 y ago License: Proprietary (Proprietary)

dice_loss_for_NLP by ShannonAI
The repo contains the code of the ACL 2020 paper "Dice Loss for Data-imbalanced NLP Tasks"
Python 240 Updated: 2 y ago License: Permissive (Apache-2.0)

backprop by backprop-ai
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Python 240 Updated: 2 y ago License: Proprietary (Proprietary)

tensorflow_qrnn by icoxfog417
QRNN implementation for TensorFlow
Python 239 Updated: 3 y ago License: Permissive (MIT)

TUPE by guolinke
Transformer with Untied Positional Encoding (TUPE). Code for the paper "Rethinking Positional Encoding in Language Pre-training". Improves existing models like BERT.
Python 238 Updated: 2 y ago License: Permissive (MIT)

elasticsearch-vector-scoring by MLnick
Score documents with pure dot product / cosine similarity with ES
Java 237 Updated: 4 y ago License: Permissive (Apache-2.0)

the-story-of-heads by lena-voita
This is a repository with the code for the ACL 2019 paper "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned" and the ACL 2021 paper "Analyzing Source and Target Contributions to NMT Predictions".
Python 236 Updated: 2 y ago License: No License (No License)

long-summarization by armancohan
Resources for the NAACL 2018 paper "A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents"
Python 235 Updated: 3 y ago License: Permissive (Apache-2.0)

bert-as-language-model by xu-song
BERT as a language model, forked from https://github.com/google-research/bert
Python 234 Updated: 2 y ago License: Permissive (Apache-2.0)

monpa by monpa-team
MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition.
Python 234 Updated: 2 y ago License: Proprietary (Proprietary)

NER_DEMO by CLOVEXCWZ
Chinese named entity recognition (NER). Implements BILSTM+CRF, IDCNN+CRF, and BERT+BILSTM+CRF in Keras for entity recognition. Unsurprisingly, BERT+BILSTM+CRF gives the best results.
Python 233 Updated: 2 y ago License: No License (No License)

CogLTX by Sleepychord
The source code of the NeurIPS 2020 paper "CogLTX: Applying BERT to Long Texts"
Python 231 Updated: 2 y ago License: Permissive (MIT)

bnlp by sagorbrur
BNLP is a natural language processing toolkit for the Bengali language.
Jupyter Notebook 231 Updated: 2 y ago License: Permissive (MIT)

DialogRPT by golsun
EMNLP 2020: "Dialogue Response Ranking Training with Large-Scale Human Feedback Data"
Python 229 Updated: 4 y ago License: Permissive (MIT)