10 Best Python Language Detection Libraries 2023
by Kanika Maheshwari Updated: Feb 10, 2023
Here are 10 popular Python language detection libraries. Their use cases include text classification, natural language processing, spell checking, text summarization, and machine translation.
A language detection library identifies the natural language in which a piece of text, such as a web page, tweet, or document, is written.
Let us look at the libraries in detail below.
funNLP
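- A curated collection of NLP resources and tools rather than a single detection library.
- Lists language detection among its resources, with a strong focus on Chinese and English text.
- Also gathers word lists, corpora, datasets, and pretrained models for related NLP tasks.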
funNLP by fighting41love
Chinese and English sensitive words, language detection, Chinese and foreign mobile phone/telephone attribution/operator query, name inference gender, mobile phone number extraction, ID card extraction, mailbox extraction, Chinese and Japanese name database, Chinese abbreviation database, word split dictionary, vocabulary emotional value, Stop words, anti-verb list, violent word list, traditional and simplified conversion, English simulated Chinese pronunciation, Wang Feng lyrics generator, professional name thesaurus, synonyms, antonyms, negative thesaurus, car brand thesaurus, auto parts words Database, continuous English cutting, various Chinese word vectors, company name encyclopedia, ancient poetry thesaurus, IT thesaurus, financial and economics thesaurus, idiom thesaurus, place names, historical celebrity thesaurus, poetry thesaurus, medical thesaurus, diet Thesaurus, legal thesaurus, car thesaurus, animal thesaurus, Chinese chat corpus, Chinese rumor data, Baidu Chinese question and answer dataset, sentence similarity matching algorithm collection, bert resources, text generation & abstract related tools, cocoNLP information extraction tools , Domestic phone number regular matching, Tsinghua University XLORE: Chinese-English cross-language encyclopedia knowledge map, Tsinghua University artificial intelligence technology series reports, natural language generation, NLU is too difficult series, automatic couplet data and robots, user name blacklist list, crime Legal terminology and classification model, WeChat official account corpus, cs224n deep learning natural language processing course, Chinese handwritten Chinese character recognition, Chinese natural language processing corpus/dataset, variable naming artifact, word segmentation corpus + code, task-based dialogue English data set, ASR Speech data set + Chinese speech recognition system based on deep learning, laughter detector, Microsoft multilingual number/unit/such as date and time recognition 
package, Zhonghua Xinhua dictionary database and api (including commonly used Xiehouyu, idioms, words and Chinese characters), documents Automatic generation of graphs, SpaCy Chinese model, new version of Common Voice speech recognition dataset, neural network relation extraction, bert-based named entity recognition, keyword (Keyphrase) extraction package pke, question answering system based on knowledge graphs in the medical field, based on dependency syntax and Event triplet extraction for semantic role annotation, dependency syntax analysis of 40,000 sentences of high-quality annotation data, cnocr: Python3 package for Chinese OCR, Chinese character relationship knowledge map project, Chinese nlp competition project and code summary, Chinese character data 、speech-aligner: A tool for generating phoneme-level time alignment annotations from "human voice speech" and its "language text", AmpliGraph: knowledge map representation learning (Python) library: knowledge map concept link prediction, Scattertext text visualization (python), Language/knowledge representation tools: BERT & ERNIE, a summary of the differences between Chinese and English natural language processing NLP, Synonyms Chinese synonyms toolkit, HarvestText field adaptive text mining tools (new word discovery-sentiment analysis-entity linking, etc.), word2word:( Python on) Easy-to-use multilingual word-word pair set: 62 languages/3,564 multilingual pairs, speech recognition corpus generation tool: create automatic speech recognition (ASR) corpus from online videos with audio/subtitles, build medical entities Recognition model (including dictionaries and corpus annotations), single-document unsupervised keyword extraction, gpt-2 language model used in Kashgari, open source financial investment data extraction tool, text automatic summarization library TextTeaser: only supports English, People's Daily corpus Processing tool set, some basic models about natural language, question and answer attempt based 
on 14W song knowledge base - functions include lyrics solitaire and known lyrics to find songs and question and answer of song singer lyrics triangle relationship, similar sentence judgment model based on Siamese bilstm model It also provides training data sets and test data sets, automatic generation of comments based on Hacker News article titles implemented by the Transformer codec model, template codes for sequence labeling and text classification with BERT, LitBank: NLP data sets - supporting natural language processing and 100 labeled English novel corpora for computational humanities tasks, Baidu open source benchmark information extraction system, fake news dataset, Facebook: LAMA language model analysis, providing unified access to Transformer-XL/BERT/ELMo/GPT pre-trained language models Interface, CommonsenseQA: Common sense-oriented English QA challenges, Chinese knowledge map materials, data and tools, technical documents PDF or PPT shared by the big cows in major companies, natural language generation SQL statements (English), Chinese NLP data enhancement (EDA ) tool, English NLP data enhancement tool, intelligent question answering system based on medical knowledge graph, Jingdong product knowledge graph, question answering project based on mongodb storage military domain knowledge graph, Chinese relationship extraction based on remote supervision, speech sentiment analysis, Chinese ULMFiT-emotion Analysis-text classification-corpus and models, a photo-taking program, a large-scale name database from all over the world, a Chinese chat robot trained by using the interesting Chinese corpus qingyun, a Chinese chat robot seqGAN, provincial, municipal and town administrative division data with pinyin annotations , Education industry news corpus includes automatic summarization function, open dialogue robot-knowledge map-semantic understanding-natural language processing tools and data, Chinese knowledge map: based on Baidu Encyclopedia Chinese page-extract 
triple information-build Chinese knowledge map, masr : Chinese Speech Recognition-provide pre-training model-high recognition rate, Python audio data augmentation library, Chinese full word coverage BERT and two reading comprehension data, ConvLab: open source multi-domain end-to-end dialogue system platform, Chinese natural language processing data Set, dialogue system based on the latest version of rasa, pipeline entity based on TensorFlow and BERT And relation extraction, a small securities knowledge map/knowledge base, review of the TOP scheme of all NLP competitions, OpenCLaP: multi-domain open source Chinese pre-training language model warehouse, UER: Chinese pre-training based on different corpus + encoder + target tasks Model warehouse, collection of Chinese natural language processing vectors, chatbots based on the financial-judicial domain (with the nature of chatting), g2pC: context-based Chinese pronunciation automatic tagging module, Zincbase knowledge graph construction toolkit, poetry quality evaluation/fine-grained emotion Poetry corpus, rapid conversion of "Chinese numerals" and "Arabic numerals", Baidu know question and answer corpus, question answering system based on knowledge graph, jieba_fast accelerated version of jieba, regular expression tutorial, Chinese reading comprehension dataset, BERT and other latest language models Extractive summary extraction, Python's comprehensive guide to text summarization using deep learning, knowledge map deep learning related data collation, Wikipedia large-scale parallel text corpus, StanfordNLP 0.2.0: pure Python version of natural language processing package, NeuralNLP-NeuralClassifier: Tencent Open source deep learning text classification tool, end-to-end closed-domain dialogue system, Chinese named entity recognition: NeuroNER vs. 
BertNER, news event clue extraction, Baidu triple extraction competition in 2019: "Science Space Team" source code, based on dependency syntax Open domain text knowledge triplet extraction and knowledge base construction, Chinese GPT2 training code, ML-NLP - knowledge points and code implementations often tested in Machine Learning (Machine Learning) NLP interviews, nlp4han: Chinese natural language processing tool set (Sentence segmentation/word segmentation/part-of-speech tagging/chunking/syntactic analysis/semantic analysis/NER/N-gram/HMM/pronoun resolution/sentiment analysis/spelling check, XLM: Facebook's cross-language pre-training language model, using BERT-based fine-tuning and feature extraction method to extract attributes of Baidu Encyclopedia characters in knowledge graphs, open tasks related to Chinese natural language processing-dataset-current best results, CoupletAI-automatic couplet system based on CNN+Bi-LSTM+Attention, abstraction Knowledge graph, MiningZhiDaoQACorpus - 5.8 million Baidu Zhidao Q&A data mining project, brat rapid annotation tool: sequence annotation tool, large-scale Chinese knowledge graph data: 140 million entities, application and effect of data enhancement in machine translation and other nlp tasks, allennlp Reading comprehension: supports multiple data and models, PDF table data extraction tools, Graphbrain: AI open source software library and scientific research tools, the purpose is to promote automatic meaning extraction and text understanding, as well as knowledge exploration and promotion Automatic resume screening system, automatic summary of resumes based on named entity recognition, Chinese language comprehension benchmarks, including representative data sets & benchmark models & corpus & leaderboards, tree hole OCR text recognition, from scanned images containing tables Recognition of tables and text, voice migration, Python spoken natural language processing toolset (English), similarity: similarity calculation 
toolkit, written in java, massive Chinese pre-trained ALBERT model, Transformers 2.0, audio based on large-scale audio dataset Audioset Enhancement, Poplar: a web-based natural language annotation tool, picture and text removal, which can be used for comic translation, a library of digital names in 186 languages, Amazon's knowledge-based human-human open domain dialogue data set, Chinese text error correction module code, Conversion of traditional and simplified characters, multiple text readability evaluation indicators implemented by Python, nomenclature recognition data sets similar to names of people/places/organizations, Southeast University "Knowledge Graph" graduate course (data), .English spelling check library, wwsearch is a self-developed full-text search engine in the background of WeChat Enterprise, CHAMELEON: meta-architecture of deep learning news recommendation system, 8 papers combing the progress and reflection of BERT related models, DocSearch: free document search engine, LIDA: lightweight interactive dialogue annotation tool, aili - the fastest in-memory index in the East The fastest concurrent index in the Eastern Hemisphere, knowledge map car audio work project, natural language generation resource collection, Chinese, Japanese and Korean thesaurus mecab's Python interface library, Chinese text summarization/keyword extraction, Chinese character feature extractor (featurizer), extracting the features of Chinese characters (pronunciation features, font features) for deep learning features, Chinese generation task benchmark evaluation, Chinese abbreviation data set, Chinese task benchmark evaluation- representative data set-benchmark (pre-trained) model-corpus-baseline-toolkit-leaderboard, PySS3: SS3 text classifier machine visualization tool for explainable AI, Chinese NLP dataset list, COPE - metrical poetry editing program, doccano: web-based open source Collaborative multilingual text annotation tool, PreNLP: natural language preprocessing 
library, simple resume parser to extract key information from resume, GPT2 model for Chinese chat: GPT2-chitchat, multiple rounds of response selection based on retrieval chatbot List of related resources (Leaderboards, Datasets, Papers), (Colab) abstract text summary implementation collection (tutorials, word pinyin data, efficient fuzzy search tools, NLP data augmentation resource collection, Microsoft dialogue robot framework, GitHub Typo Corpus: large-scale GitHub multilingual Chinese spelling error/grammatical error data set, TextCluster: short text clustering preprocessing module Short text cluster, Chinese text normalization for speech recognition, BLINK: the most advanced entity link library, BertPunc: the most advanced punctuation repair model based on BERT, Tokenizer: fast, customizable text entry library, Chinese language understanding benchmark, including representative data sets, benchmark (pre-trained) models, corpus, leaderboard, spaCy medical text mining and information extraction, NLP task example projects Code set, python spell checking library, chatbot-list - the industry's application and architecture of intelligent customer service, chatbots, algorithm sharing and introduction, voice quality evaluation indicators (MOSNet, BSSEval, STOI, PESQ, SRMR), training with 138GB corpus The French RoBERTa pre-training language model, BERT-NER-Pytorch: three different modes of BERT Chinese NER experiment, Wudao Dictionary - the command line version of Youdao Dictionary, supports English-Chinese mutual search and online query, 2019 NLP highlights review, Chinese medical dialogue data Chinese medical dialogue data set, the best Chinese character number (Chinese number)-Arabic number conversion tool, multi-word meaning/sense item acquisition of Chinese words based on encyclopedia knowledge base and semantic disambiguation of specific sentence words, awesome-nlp-sentiment -analysis - Sentiment analysis, emotional cause identification, evaluation object and 
evaluation word extraction, LineFlow: NLP data efficient loader for all deep learning frameworks, Chinese medical NLP open resource organization, MedQuAD: (English) medical question answering data set, natural Parsing and conversion of language number strings into integers and floating point numbers, Transfer Learning in Natural Language Processing (NLP), Chinese/English pronunciation dictionary for speech recognition, Tokenizers: the most advanced tokenizer focusing on performance and versatility, CLUENER fine-grained named entities Recognition of Fine Grained Named Entity Recognition, BERT-based Chinese named entity recognition, Chinese rumor database, NLP dataset/big list of benchmark tasks, some papers and codes related to nlp, including topic model, word vector (Word Embedding), named entity recognition (NER), text classification (Text Classificatin), text generation (Text Generation), text similarity (Text Similarity) calculation, etc., involving various algorithms related to nlp, based on keras and tensorflow, Python text mining/NLP practical examples, Blackstone: spaCy pipeline and NLP model for unstructured legal texts to achieve text "face change" through synonym replacement, Chinese pre-training ELECTREA model: pretrain Chinese Model based on confrontational learning, albert-chinese -ner - Chinese NER with pre-trained language model ALBERT, topic-specific text generation/text augmentation based on GPT2, open source pre-trained language model collection, multilingual sentence vector package, encoding, labeling and implementation: a controllable and efficient Text generation method, large list of English swear words, attnvis: GPT2, BERT and other transformer language model attention interactive visualization, CoVoST: multilingual speech-to-text translation corpus released by Facebook, including 11 languages (French, German, Dutch, Russian, Speech, text transcription and English translation of Spanish, Italian, Turkish, Persian, Swedish, Mongolian and 
Chinese), Jiagu natural language processing tool - Based on models such as BiLSTM, it provides knowledge graph relationship extraction for Chinese word segmentation Labeling Named entity recognition Sentiment analysis New word discovery Key words Text summary Text clustering and other functions, use unet to realize automatic detection of document tables, table reconstruction, NLP event extraction Document resource list, large list of natural language processing research resources in the financial field, CLUEDatasetSearch - Chinese and English NLP datasets: Search all Chinese NLP datasets, with commonly used English NLP datasets, medical_NER - Chinese medical knowledge map named entity recognition, (Harvard) free book on causal reasoning, knowledge map related learning materials/datasets/ A large list of tool resources, Forte: a flexible and powerful natural language processing pipeline tool set, Python string similarity algorithm library, PyLaia: a deep learning toolkit for handwritten document analysis, TextFooler: an adversarial text generation module for text classification/reasoning, Haystack: Flexible, Powerful and Extensible Question Answering (QA) Framework, Chinese Key Phrase Extraction Tool
Python
47983
Version: Current
License: No License
lingua-py
- Written in pure Python in the v1.x releases listed here, so it needs no compiled extension.
- Combines rule-based and statistical methods, using character n-gram models to score candidate languages.
- Aims for high accuracy on both short and long text, as its project description states.
lingua-py by pemistahl
The most accurate natural language detection library for Python, suitable for long and short text alike
Python
438
Version: v1.3.2
License: Permissive (Apache-2.0)
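A minimal usage sketch for lingua-py, assuming the `lingua-language-detector` package is installed. The import is guarded so the snippet still runs without it, and the helper name `detect_with_lingua` and the four candidate languages are illustrative choices rather than anything prescribed by the library.

```python
try:
    from lingua import Language, LanguageDetectorBuilder
except ImportError:  # treat lingua as an optional dependency in this sketch
    LanguageDetectorBuilder = None


def detect_with_lingua(text):
    """Return the detected language name (for example 'ENGLISH'), or None.

    Hypothetical helper: returns None when lingua is not installed or when
    the detector cannot decide on a language.
    """
    if LanguageDetectorBuilder is None:
        return None
    # Restricting the candidate set keeps model loading small.
    detector = LanguageDetectorBuilder.from_languages(
        Language.ENGLISH, Language.FRENCH, Language.GERMAN, Language.SPANISH
    ).build()
    language = detector.detect_language_of(text)
    return language.name if language is not None else None


print(detect_with_lingua("languages are awesome"))
```

Building the detector once and reusing it for many inputs is usually preferable to rebuilding it on every call, as this short sketch does for simplicity.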
TextBlob
- Built-in sentiment analysis tool allows you to analyze a text's sentiment quickly.
- Has spell-checking capabilities, which can correct typos in a text.
- Built-in tokenizers can break text into individual words or tokens.
TextBlob by sloria
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
Python
8472
Version: 0.7.0
License: Permissive (MIT)
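The spell-checking feature mentioned above can be tried with a short sketch like the following, assuming the `textblob` package is installed. The import is guarded so the snippet runs either way, and `correct_spelling` is a hypothetical helper name used only for illustration.

```python
try:
    from textblob import TextBlob
except ImportError:  # treat TextBlob as an optional dependency in this sketch
    TextBlob = None


def correct_spelling(text):
    """Return a spell-corrected copy of text, or None if TextBlob is missing."""
    if TextBlob is None:
        return None
    # TextBlob.correct() returns a new TextBlob; convert it back to str.
    return str(TextBlob(text).correct())


print(correct_spelling("I havv goood speling!"))
```

The correction is dictionary-based and English-only, so it is best treated as a rough first pass rather than a reliable fix.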
polyglot
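- Detects the language of text across a large number of natural languages.
- Relies on a statistical detector (CLD2, through the pycld2 bindings) for language detection.
- Also covers tokenization, named entity recognition, part-of-speech tagging, and sentiment analysis for many languages.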
polyglot by aboSamoor
Multilingual text (NLP) processing toolkit
Python
2119
Version: Current
License: Others (Non-SPDX)
langID
- Designed to be robust to spelling errors and noise.
- Allows users to customize the language detection model.
- Designed to be both fast and accurate.
langID by aldld
Language identification machine learning program written in Python, based on a naive Bayes classifier.
Python
2
Version: Current
License: No License
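To illustrate the naive Bayes approach named in the description above, here is a toy sketch using only the standard library. It is not langID's actual code: the character-trigram features, the add-one smoothing, the class name `NaiveBayesLanguageID`, and the tiny training set are all assumptions made for this example.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLanguageID:
    """Toy multinomial naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}       # language -> Counter of n-gram counts
        self.totals = {}       # language -> total number of n-grams seen
        self.docs = Counter()  # language -> number of training samples
        self.vocab = set()     # all n-grams seen in training

    def train(self, samples):
        for lang, text in samples:
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(lang, Counter()).update(grams)
            self.docs[lang] += 1
            self.vocab.update(grams)
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        n_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_lang, best_score = None, -math.inf
        for lang, counter in self.counts.items():
            # Log prior plus add-one smoothed log likelihood of each n-gram.
            score = math.log(self.docs[lang] / n_docs)
            for gram in grams:
                score += math.log((counter[gram] + 1) / (self.totals[lang] + vocab_size))
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang


# Tiny made-up training set; a real model needs far more text per language.
TRAINING = [
    ("en", "the quick brown fox jumps over the lazy dog and runs through the forest"),
    ("en", "this is a simple sentence written in the english language"),
    ("es", "el rápido zorro marrón salta sobre el perro perezoso y corre por el bosque"),
    ("es", "esta es una frase sencilla escrita en el idioma español"),
    ("de", "der schnelle braune fuchs springt über den faulen hund und läuft durch den wald"),
    ("de", "dies ist ein einfacher satz in der deutschen sprache geschrieben"),
]

model = NaiveBayesLanguageID()
model.train(TRAINING)
for sentence in (
    "the dog runs through the forest",
    "el perro corre por el bosque",
    "der hund läuft durch den wald",
):
    print(sentence, "->", model.predict(sentence))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps an unseen trigram from zeroing out a language entirely.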
nltk
- Provides a wide variety of text processing and analysis capabilities.
- Has a vast library of predefined corpora and lexical resources.
- Offers a suite of graphical tools for exploring and visualizing language data.
langcodes
- Works with language codes rather than detecting languages itself, which makes it useful for normalizing the output of a detector.
- Parses, standardizes, and compares language tags following the BCP 47 standard.
- Offers a small, readable API for looking up and matching language codes.
langcodes by rspeer
A Python library for working with and comparing language codes.
Python
278
Version: v3.3.0
License: Permissive (MIT)
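As a small sketch of working with language codes, the snippet below normalizes a non-standard tag, assuming the `langcodes` package is installed. The import is guarded so it runs without the package, and `normalize_tag` is a hypothetical helper name.

```python
try:
    import langcodes
except ImportError:  # treat langcodes as an optional dependency in this sketch
    langcodes = None


def normalize_tag(tag):
    """Return a standardized BCP 47 tag, or None if langcodes is missing."""
    if langcodes is None:
        return None
    return langcodes.standardize_tag(tag)


print(normalize_tag("eng_US"))
```

Normalizing tags this way helps when different detectors report the same language under different codes.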
langdetect
- A Python port of Google's language-detection library, which was originally written in Java.
- Ships its own language profiles, so detection runs locally without calling an external service.
- Open source and completely free to use.
langdetect by Mimino666
Port of Google's language-detection library to Python.
Python
1395
Version: 1.0.8
License: Others (Non-SPDX)
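A minimal usage sketch for langdetect, assuming the `langdetect` package is installed. The import is guarded so the snippet runs without it, and `detect_with_langdetect` is a hypothetical helper name. The seed is fixed because the library's algorithm is non-deterministic by default and can give different answers on short or ambiguous text.

```python
try:
    from langdetect import DetectorFactory, detect
except ImportError:  # treat langdetect as an optional dependency in this sketch
    detect = None


def detect_with_langdetect(text):
    """Return an ISO 639-1 code such as 'en', or None if langdetect is missing."""
    if detect is None:
        return None
    DetectorFactory.seed = 0  # fix the seed for reproducible results
    return detect(text)


print(detect_with_langdetect("War doesn't show who's right, just who's left."))
```

For ranked candidates with probabilities rather than a single code, the library also provides `detect_langs`.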
langdetector
- Supports various languages, making it a great choice for international projects.
- Includes both supervised and unsupervised models.
- Extensive documentation library.
langdetector by myroslavrozum
Python
0
Version: Current
License: No License
python-cld2
- Wraps Google's Compact Language Detector 2 (CLD2), a C++ library, for language detection.
- Reports a confidence score alongside each detected language.
- Expects UTF-8 input, so text in other encodings such as ISO-8859-1 or Windows-1252 should be decoded first.
python-cld2 by scrapinghub
Python bindings for CLD2.