spaCy | 💫 Industrial-strength Natural Language Processing | Natural Language Processing library

by explosion | Python | Version: 4.0.0.dev0 | License: MIT

kandi X-RAY | spaCy Summary

spaCy is a Python library typically used in Artificial Intelligence, Natural Language Processing, Deep Learning, PyTorch and BERT applications. spaCy has no vulnerabilities, has a build file available, has a permissive license and has medium support. However, spaCy has 8 bugs. You can install it using 'pip install spaCy' or download it from GitHub or PyPI.
spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pretrained pipelines and currently supports tokenization and training for 60+ languages. It features state-of-the-art speed and neural network models for tagging, parsing, named entity recognition, text classification and more, multi-task learning with pretrained transformers like BERT, as well as a production-ready training system and easy model packaging, deployment and workflow management. spaCy is commercial open-source software, released under the MIT license.
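For orientation, a minimal sketch of a typical spaCy workflow (assuming the small English pipeline en_core_web_sm has been downloaded) looks like this:

import spacy

# Load a trained pipeline (must be downloaded first with
# `python -m spacy download en_core_web_sm`)
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Part-of-speech tags and dependency labels per token
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Named entities with their labels
for ent in doc.ents:
    print(ent.text, ent.label_)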

                      kandi-support Support

spaCy has a medium active ecosystem.
It has 25506 star(s) with 4047 fork(s). There are 550 watchers for this library.
There were 7 major release(s) in the last 6 months.
There are 75 open issues and 5295 have been closed. On average issues are closed in 40 days. There are 32 open pull requests and 0 closed requests.
It has a neutral sentiment in the developer community.
The latest version of spaCy is 4.0.0.dev0.

                                  kandi-Quality Quality

spaCy has 8 bugs (4 blocker, 0 critical, 3 major, 1 minor) and 1001 code smells.

                                              kandi-Security Security

spaCy has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
spaCy code analysis shows 0 unresolved vulnerabilities.
There are 93 security hotspots that need review.

                                                          kandi-License License

spaCy is licensed under the MIT License. This license is Permissive.
Permissive licenses have the least restrictions, and you can use them in most projects.

                                                                      kandi-Reuse Reuse

spaCy releases are available to install and integrate.
Deployable package is available in PyPI.
Build file is available. You can build the component from source.
Installation instructions, examples and code snippets are available.
                                                                                  Top functions reviewed by kandi - BETA
                                                                                  kandi has reviewed spaCy and discovered the below as its top functions. This is intended to give you an instant insight into spaCy implemented functionality, and help decide if they suit your requirements.
                                                                                  • Defines a factory
                                                                                    • Returns a fully qualified name for the given language
                                                                                    • Sets the factory meta
                                                                                    • Register a factory function
                                                                                  • Compute the PRF score for the given examples
                                                                                    • Calculate the tp score
                                                                                  • Embed a character embedding
                                                                                    • Construct a model of static vectors
                                                                                  • Lemmatize a token
                                                                                    • Get a table by name
                                                                                  • Command line interface for debugging
                                                                                  • Parse dependencies
                                                                                  • Process if node
                                                                                  • Create a model of static vectors
                                                                                  • Lemmatize a word
                                                                                  • Parse command line interface
                                                                                  • Lemmatize rule
                                                                                  • Setup package
                                                                                  • Forward layer computation
                                                                                  • Lemmatize a specific word
                                                                                  • Update the model with the given examples
                                                                                  • Builds a token embedding model
                                                                                  • Command line interface for pretraining
                                                                                  • Extract the words from the wiktionary
                                                                                  • Rehearse the language
                                                                                  • Process a for loop
                                                                                  • Lemmatize a rule
Get all kandi verified functions for this library.

                                                                                  spaCy Key Features

                                                                                  Support for 60+ languages
                                                                                  Trained pipelines for different languages and tasks
                                                                                  Multi-task learning with pretrained transformers like BERT
                                                                                  Support for pretrained word vectors and embeddings
                                                                                  State-of-the-art speed
                                                                                  Production-ready training system
                                                                                  Linguistically-motivated tokenization
                                                                                  Components for named entity recognition, part-of-speech-tagging, dependency parsing, sentence segmentation, text classification, lemmatization, morphological analysis, entity linking and more
                                                                                  Easily extensible with custom components and attributes
                                                                                  Support for custom models in PyTorch, TensorFlow and other frameworks
                                                                                  Built in visualizers for syntax and NER
                                                                                  Easy model packaging, deployment and workflow management
                                                                                  Robust, rigorously evaluated accuracy
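To illustrate the "custom components" item above, here is a hedged sketch using the spaCy 3 component API; the component name "sentence_counter" is purely illustrative:

import spacy
from spacy.language import Language

@Language.component("sentence_counter")
def sentence_counter(doc):
    # A trivial custom component: report the sentence count, then pass the Doc on
    print(f"Doc has {len(list(doc.sents))} sentence(s).")
    return doc

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("sentence_counter", last=True)
doc = nlp("spaCy is fast. It is also extensible.")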

                                                                                  spaCy Examples and Code Snippets

                                                                                  Download the dataset and install spaCy and pandas-Import pre-annotated data
Python | Lines of Code: 69 | License: Permissive (Apache-2.0)
                                                                                  
import spacy
import pandas as pd
import json
from itertools import groupby

# Download spaCy models:
models = {
    'en_core_web_sm': spacy.load("en_core_web_sm"),
    'en_core_web_lg': spacy.load("en_core_web_lg")
}

# This function converts spaCy docs to the list of named entity spans in Label Studio compatible JSON format:
def doc_to_spans(doc):
    tokens = [(tok.text, tok.idx, tok.ent_type_) for tok in doc]
    results = []
    entities = set()
    for entity, group in groupby(tokens, key=lambda t: t[-1]):
        if not entity:
            continue
        group = list(group)
        _, start, _ = group[0]
        word, last, _ = group[-1]
        text = ' '.join(item[0] for item in group)
        end = last + len(word)
        results.append({
            'from_name': 'label',
            'to_name': 'text',
            'type': 'labels',
            'value': {
                'start': start,
                'end': end,
                'text': text,
                'labels': [entity]
            }
        })
        entities.add(entity)
    return results, entities

# Now load the dataset and include only lines containing "Easter ":
df = pd.read_csv('lines_clean.csv')
df = df[df['line_text'].str.contains("Easter ", na=False)]
print(df.head())
texts = df['line_text']

# Prepare Label Studio tasks in import JSON format with the model predictions:
entities = set()
tasks = []
for text in texts:
    predictions = []
    for model_name, nlp in models.items():
        doc = nlp(text)
        spans, ents = doc_to_spans(doc)
        entities |= ents
        predictions.append({'model_version': model_name, 'result': spans})
    tasks.append({
        'data': {'text': text},
        'predictions': predictions
    })

# Save Label Studio tasks.json
print(f'Save {len(tasks)} tasks to "tasks.json"')
with open('tasks.json', mode='w') as f:
    json.dump(tasks, f, indent=2)

# Save class labels as a txt file
print('Named entities are saved to "named_entities.txt"')
with open('named_entities.txt', mode='w') as f:
    f.write('\n'.join(sorted(entities)))
                                                                                  Compare the spaCy model with the gold standard dataset
Python | Lines of Code: 19 | License: Permissive (Apache-2.0)
                                                                                  
import json
from collections import defaultdict

tasks = json.load(open('annotations.json'))
model_hits = defaultdict(int)
for task in tasks:
    annotation_result = task['annotations'][0]['result']
    for r in annotation_result:
        r.pop('id')
    for prediction in task['predictions']:
        model_hits[prediction['model_version']] += int(prediction['result'] == annotation_result)
num_task = len(tasks)
for model_name, num_hits in model_hits.items():
    acc = num_hits / num_task
    print(f'Accuracy for {model_name}: {acc:.2f}%')
Accuracy for en_core_web_sm: 0.03%
Accuracy for en_core_web_lg: 0.41%
                                                                                  Download the dataset and install spaCy and pandas-Install spaCy and pandas
Python | Lines of Code: 3 | License: Permissive (Apache-2.0)
                                                                                  
                                                                                                                      python -m pip install -U pip
                                                                                  pip install -U spacy
                                                                                  pip install pandas
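The snippets above load en_core_web_sm and en_core_web_lg; those trained pipelines are installed separately from the library itself, for example:

python -m spacy download en_core_web_sm
python -m spacy download en_core_web_lg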
Tweak spacy spans
Python | Lines of Code: 30 | License: Strong Copyleft (CC BY-SA 4.0)
                                                                                  import spacy
                                                                                  from spacy.matcher import Matcher
                                                                                  
                                                                                  nlp = spacy.load("en_core_web_sm")
                                                                                  matcher = Matcher(nlp.vocab)
                                                                                  # Add match ID "HelloWorld" with no callback and one pattern
                                                                                  pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
                                                                                  matcher.add("HelloWorld", [pattern])
                                                                                  
                                                                                  doc = nlp("Hello, world! Hello world!")
                                                                                  matches = matcher(doc)
                                                                                  for match_id, start, end in matches:
                                                                                      string_id = nlp.vocab.strings[match_id]  # Get string representation
                                                                                      span = doc[start:end]  # The matched span
                                                                                      print(match_id, string_id, start, end, span.text)
                                                                                  
                                                                                  import spacy
                                                                                  from spacy.matcher import Matcher
                                                                                  
                                                                                  nlp = spacy.load('en_core_web_lg')
                                                                                  text='The car comprises 4 brakes 4.1, 4.2, 4.3 and 4.4 in fig. 5, all include an ESP system. This is shown in Fig. 6. Fig. 5 shows how the motors 56 and 57 are blocked. Besides the doors (44, 45) are painted blue.'
                                                                                  
                                                                                  # Add EntityRuler to pipeline
                                                                                  ruler = nlp.add_pipe("entity_ruler", before="ner", config={"validate": True})
                                                                                  patterns = [{"label": "2_DIGIT", "pattern": [{"IS_DIGIT": True}, {"IS_PUNCT": True}, {"IS_DIGIT": True}]}]
ruler.add_patterns(patterns)

# Process the text so the entity ruler can assign the custom entities
doc = nlp(text)

# Print 2-Digit Ents
print([(ent.label_, text[ent.start_char:ent.end_char]) for ent in doc.ents if ent.label_ == "2_DIGIT"])
                                                                                  
Get a word's function in a sentence
Python | Lines of Code: 15 | License: Strong Copyleft (CC BY-SA 4.0)
                                                                                  import spacy
                                                                                  
                                                                                  nlp = spacy.load('en_core_web_sm')
                                                                                  
                                                                                  # Process the document
                                                                                  doc = nlp('God loves apples.')
                                                                                  
                                                                                  for tok in doc:
                                                                                      print(tok, tok.dep_, sep='\t')
                                                                                  
                                                                                  God nsubj
                                                                                  loves   ROOT
                                                                                  apples  dobj
                                                                                  .   punct
                                                                                  
                                                                                  How to solve the spacy latin language import error
Python | Lines of Code: 2 | License: Strong Copyleft (CC BY-SA 4.0)
                                                                                  nlp = spacy_stanza.load_pipeline("xx", lang="la")
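A slightly fuller, hedged sketch of the same fix, assuming the stanza and spacy-stanza packages are installed (the Latin model is fetched once with stanza.download):

import stanza
import spacy_stanza

stanza.download("la")  # download the Latin Stanza model once
nlp = spacy_stanza.load_pipeline("xx", lang="la")

doc = nlp("Gallia est omnis divisa in partes tres.")
for token in doc:
    print(token.text, token.lemma_, token.pos_)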
                                                                                  
                                                                                  AWS Lambda function not able to find other packages in same directory
Python | Lines of Code: 4 | License: Strong Copyleft (CC BY-SA 4.0)
                                                                                  COPY core ${LAMBDA_TASK_ROOT}
                                                                                  
                                                                                  COPY core ${LAMBDA_TASK_ROOT}/core
                                                                                  
                                                                                  import spacy
                                                                                  from spacy.scorer import Scorer
                                                                                  from spacy.tokens import Doc
                                                                                  from spacy.training.example import Example
                                                                                  
                                                                                  examples = [
                                                                                      ('Who is Talha Tayyab?',
                                                                                       {(7, 19, 'PERSON')}),
                                                                                      ('I like London and Berlin.',
                                                                                       {(7, 13, 'LOC'), (18, 24, 'LOC')}),
                                                                                       ('Agra is famous for Tajmahal, The CEO of Facebook will visit India shortly to meet Murari Mahaseth and to visit Tajmahal.',
                                                                                       {(0, 4, 'LOC'), (40, 48, 'ORG'), (60, 65, 'GPE'), (82, 97, 'PERSON'), (111, 119, 'GPE')})
                                                                                  ]
                                                                                  
def my_evaluate(ner_model, examples):
    scorer = Scorer()
    example = []
    for input_, annotations in examples:
        pred = ner_model(input_)
        print(pred, annotations)
        # Build Example objects with entity annotations in spaCy's expected
        # {"entities": [(start, end, label), ...]} format
        temp = Example.from_dict(pred, {"entities": list(annotations)})
        example.append(temp)
    scores = scorer.score(example)
    return scores
                                                                                  
                                                                                  ner_model = spacy.load('en_core_web_sm') # for spaCy's pretrained use 'en_core_web_sm'
                                                                                  results = my_evaluate(ner_model, examples)
                                                                                  print(results)
                                                                                  
                                                                                  How to label multi-word entities?
Python | Lines of Code: 19 | License: Strong Copyleft (CC BY-SA 4.0)
import pandas as pd

FINANCE = ["Frontwave Credit Union",
                                                                                  "St. Mary's Bank",
                                                                                  "Center for Financial Services Innovation"]
                                                                                  
                                                                                  SPORT = [
                                                                                      "Christiano Ronaldo",
                                                                                      "Lewis Hamilton",
                                                                                  ]
                                                                                  
                                                                                  FINANCE = '|'.join(FINANCE)
                                                                                  sent = pd.DataFrame({'sent': ["Dear members of Frontwave Credit Union, any credit demanded by Lewis Hamilton is invalid, said Ronaldo"]})
                                                                                  home = sent['sent'].str.extractall(f'({FINANCE})')
                                                                                  
                                                                                  def labeler(row, group):
                                                                                      l = len(row.split())
                                                                                      return [f'I-{group}' if i !=0 else f'B-{group}' for i in range(l)]
                                                                                  
                                                                                  home[0].apply(labeler, group='FINANCE').explode()
                                                                                  
                                                                                  Value Error when trying to train a spacy model
Python | Lines of Code: 3 | License: Strong Copyleft (CC BY-SA 4.0)
                                                                                              for batch in batches:
                                                                                                  nlp.update(batch, sgd=optimizer, losses=losses)
                                                                                  
                                                                                  Community Discussions

                                                                                  Trending Discussions on spaCy

Error while loading vector from Glove in Spacy
Save and load nlp results in spacy
How to match repeating patterns in spacy?
Spacy adds words automatically to vocab?
Spacy tokenization add extra white space for dates with hyphen separator when I manually build the Doc
After installing scrubadub_spacy package, spacy.load("en_core_web_sm") not working OSError: [E053] Could not read config.cfg
Show NER Spacy Data in dataframe
How to get a description for each Spacy NER entity?
Do I need to do any text cleaning for Spacy NER?
How to use japanese engine in Spacy

                                                                                  QUESTION

                                                                                  Error while loading vector from Glove in Spacy
                                                                                  Asked 2022-Mar-17 at 16:39

I am facing an attribute error when loading a GloVe model.

Code used to load the model:

                                                                                  nlp = spacy.load('en_core_web_sm')
                                                                                  tokenizer = spacy.load('en_core_web_sm', disable=['tagger','parser', 'ner', 'textcat'])
                                                                                  nlp.vocab.vectors.from_glove('../models/GloVe')
                                                                                  

Getting the following attribute error when trying to load the GloVe model:

                                                                                  AttributeError: 'spacy.vectors.Vectors' object has no attribute 'from_glove'
                                                                                  

I have tried searching on Stack Overflow and elsewhere but can't seem to find the solution. Thanks!

                                                                                  From pip list:

                                                                                  • spacy version: 3.1.4
                                                                                  • spacy-legacy 3.0.8
                                                                                  • en-core-web-sm 3.1.0

                                                                                  ANSWER

                                                                                  Answered 2022-Mar-17 at 14:08

spaCy version 3.1.4 does not have the from_glove feature.

I was able to use nlp.vocab.vectors.from_glove() in spaCy version 2.2.4.

If you want, you can change your spaCy version by running:

!pip install spacy==2.2.4 in a Jupyter cell.
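If downgrading is not an option, spaCy 3 exposes vector conversion through the init vectors CLI instead of from_glove. A hedged sketch, where the GloVe file name and output directory are illustrative only:

# Convert a plain-text vectors file (word2vec/GloVe text format) into a loadable pipeline
python -m spacy init vectors en ../models/GloVe/glove.6B.300d.txt ./en_glove_vectors

# The result can then be loaded like any other pipeline:
# nlp = spacy.load("./en_glove_vectors")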

                                                                                  Source https://stackoverflow.com/questions/71512064

                                                                                  QUESTION

                                                                                  Save and load nlp results in spacy
                                                                                  Asked 2022-Mar-10 at 20:36

                                                                                  I want to use SpaCy to analyze many small texts and I want to store the nlp results for further use to save processing time. I found code at Storing and Loading spaCy Documents Containing Word Vectors but I get an error and I cannot find how to fix it. I am fairly new to python.

                                                                                  In the following code, I store the nlp results to a file and try to read it again. I can write the first file but I do not find the second file (vocab). I also get two errors: that Doc and Vocab are not defined.

                                                                                  Any idea to fix this or another method to achieve the same result is more than welcomed.

                                                                                  Thanks!

                                                                                  import spacy
                                                                                  nlp = spacy.load('en_core_web_md')
                                                                                  doc = nlp("He eats a green apple")
                                                                                  for token in doc:
                                                                                      print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
                                                                                              token.shape_, token.is_alpha, token.is_stop)
                                                                                  
                                                                                  NLP_FName = "E:\\SaveTest.nlp"
                                                                                  doc.to_disk(NLP_FName)
                                                                                  Vocab_FName = "E:\\SaveTest.voc"
                                                                                  doc.vocab.to_disk(Vocab_FName)
                                                                                  
                                                                                  #To read the data again:
                                                                                  idoc = Doc(Vocab()).from_disk(NLP_FName)
                                                                                  idoc.vocab.from_disk(Vocab_FName)
                                                                                  
                                                                                  for token in idoc:
                                                                                      print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
                                                                                              token.shape_, token.is_alpha, token.is_stop)
                                                                                  

                                                                                  ANSWER

                                                                                  Answered 2022-Mar-10 at 18:06

I tried your code and found a few minor issues, which I fixed in the code below.

Note that SaveTest.nlp is a binary file with your doc info and
SaveTest.voc is a folder with all the spaCy model vocab information (vectors, strings, among others).

                                                                                  Changes I made:

                                                                                  1. Import Doc class from spacy.tokens
                                                                                  2. Import Vocab class from spacy.vocab
                                                                                  3. Download en_core_web_md model using the following command:
                                                                                  python -m spacy download en_core_web_md
                                                                                  

Please note that spaCy has multiple models for each language, and usually you have to download them first (typically the sm, md and lg models). Read more about it here.

                                                                                  Code:

                                                                                  import spacy
                                                                                  from spacy.tokens import Doc
                                                                                  from spacy.vocab import Vocab
                                                                                  
                                                                                  nlp = spacy.load('en_core_web_md')
                                                                                  doc = nlp("He eats a green apple")
                                                                                  for token in doc:
                                                                                      print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
                                                                                            token.shape_, token.is_alpha, token.is_stop)
                                                                                  
                                                                                  NLP_FName = "E:\\SaveTest.nlp"
                                                                                  doc.to_disk(NLP_FName)
                                                                                  Vocab_FName = "E:\\SaveTest.voc"
                                                                                  doc.vocab.to_disk(Vocab_FName)
                                                                                  
                                                                                  #To read the data again:
                                                                                  idoc = Doc(Vocab()).from_disk(NLP_FName)
                                                                                  idoc.vocab.from_disk(Vocab_FName)
                                                                                  
                                                                                  for token in idoc:
                                                                                      print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
                                                                                            token.shape_, token.is_alpha, token.is_stop)
                                                                                  

                                                                                  Let me know if this is helpful to you, and if not, please add your error message to your original question so I can help.

                                                                                  Source https://stackoverflow.com/questions/71427521

                                                                                  QUESTION

                                                                                  How to match repeating patterns in spacy?
                                                                                  Asked 2022-Mar-09 at 04:14

                                                                                  I have a similar question as the one asked in this post: How to define a repeating pattern consisting of multiple tokens in spacy? The difference in my case compared to the linked post is that my pattern is defined by POS and dependency tags. As a consequence I don't think I could easily use regex to solve my problem (as is suggested in the accepted answer of the linked post).

                                                                                  For example, let's assume we analyze the following sentence:

                                                                                  "She told me that her dog was big, black and strong."

                                                                                  The following code would allow me to match the list of adjectives at the end of the sentence:

                                                                                  import spacy # I am using spacy 2
                                                                                  from spacy.matcher import Matcher
                                                                                  nlp = spacy.load('en_core_web_sm')
                                                                                  
                                                                                  # Create doc object from text
                                                                                  doc = nlp(u"She told me that her dog was big, black and strong.")
                                                                                  
                                                                                  # Set up pattern matching
                                                                                  matcher = Matcher(nlp.vocab)
                                                                                  pattern = [{"POS": "ADJ"}, {"IS_PUNCT": True}, {"POS": "ADJ"}, {"POS": "CCONJ"}, {"POS": "ADJ"}]
                                                                                  matcher.add("AdjList", [pattern])
                                                                                  
                                                                                  
                                                                                  matches = matcher(doc)
                                                                                  

                                                                                  Running this code would match "big, black and strong". However, this pattern would not find the list of adjectives in the following sentences "She told me that her dog was big and black" or "She told me that her dog was big, black, strong and playful".

                                                                                  How would I have to define a (single) pattern for spacy's matcher in order to find such a list with any number of adjectives? Put differently, I am looking for the correct syntax for a pattern where the part {"POS": "ADJ"}, {"IS_PUNCT": True} can be repeated arbitrarily often before the list concludes with the pattern {"POS": "ADJ"}, {"POS": "CCONJ"}, {"POS": "ADJ"}.

                                                                                  Thanks for any hints.

                                                                                  ANSWER

                                                                                  Answered 2022-Mar-09 at 04:14

The solution / issue isn't fundamentally different from the question linked to: there's no facility for repeating multi-token patterns in a match like that. You can use a for loop to build multiple patterns to capture what you want.

                                                                                  patterns = []
                                                                                  for ii in range(1, 5):
                                                                                      pattern = [{"POS": "ADJ"}, {"IS_PUNCT":True}] * ii
                                                                                      pattern += [{"POS": "ADJ"}, {"POS": "CCONJ"}, {"POS": "ADJ"}]
                                                                                      patterns.append(pattern)
                                                                                  

                                                                                  Alternately you could do something with the dependency matcher. In your example sentence it's not that clean, but for a sentence like "It was a big, brown, playful dog", the adjectives all have dependency arcs directly connecting them to the noun.
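For the cleaner case mentioned above ("It was a big, brown, playful dog"), a hedged sketch with the spaCy 3 DependencyMatcher might look like this, matching every adjective attached to a noun via an amod arc:

import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")
matcher = DependencyMatcher(nlp.vocab)

pattern = [
    {"RIGHT_ID": "noun", "RIGHT_ATTRS": {"POS": "NOUN"}},
    {"LEFT_ID": "noun", "REL_OP": ">", "RIGHT_ID": "adjective",
     "RIGHT_ATTRS": {"DEP": "amod"}},
]
matcher.add("ADJ_OF_NOUN", [pattern])

doc = nlp("It was a big, brown, playful dog.")
for match_id, token_ids in matcher(doc):
    # token_ids follow the pattern order: [noun, adjective]
    print(doc[token_ids[1]].text, "->", doc[token_ids[0]].text)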

                                                                                  As a separate note, you are not handling sentences with the serial comma.

                                                                                  Source https://stackoverflow.com/questions/71398736

                                                                                  QUESTION

                                                                                  Spacy adds words automatically to vocab?
                                                                                  Asked 2022-Feb-28 at 04:26

I loaded a regular spaCy language model and tried the following code:

                                                                                  import spacy
                                                                                  
                                                                                  nlp = spacy.load("en_core_web_md")
                                                                                  
                                                                                  text = "xxasdfdsfsdzz is the first U.S. public company"
                                                                                  
                                                                                  if 'xxasdfdsfsdzz' in nlp.vocab:
                                                                                      print("in")
                                                                                  else:
                                                                                      print("not")
                                                                                      
                                                                                  if 'Apple' in nlp.vocab:
                                                                                      print("in")
                                                                                  else:
                                                                                      print("not")
                                                                                  
                                                                                  
                                                                                  # Process the text
                                                                                  doc = nlp(text)
                                                                                  
                                                                                  if 'xxasdfdsfsdzz' in nlp.vocab:
                                                                                      print("in")
                                                                                  else:
                                                                                      print("not")
                                                                                      
                                                                                  if 'Apple' in nlp.vocab:
                                                                                      print("in")
                                                                                  else:
                                                                                      print("not")
                                                                                  

It seems like spaCy loaded words after they were analyzed with nlp(text). Can someone explain the output? How can I avoid it? Why is "Apple" not in the vocab, and why does "xxasdfdsfsdzz" exist?

                                                                                  Output:

                                                                                  not
                                                                                  not
                                                                                  in
                                                                                  not
                                                                                  

                                                                                  ANSWER

                                                                                  Answered 2022-Feb-28 at 04:26

                                                                                  The spaCy Vocab is mainly an internal implementation detail to interface with a memory-efficient method of storing strings. It is definitely not a list of "real words" or any other thing that you are likely to find useful.

                                                                                  The main thing a Vocab stores by default is strings that are used internally, such as POS and dependency labels. In pipelines with vectors, words in the vectors are also included. You can read more about the implementation details here.

                                                                                  All words an nlp object has seen need storage for their strings, and so will be present in the Vocab. That's what you're seeing with your nonsense string in the example above.
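A hedged sketch illustrating the point (using en_core_web_md as in the question): string membership in nlp.vocab only reflects what the pipeline has stored so far, while vector coverage is queried separately:

import spacy

nlp = spacy.load("en_core_web_md")

print("Apple" in nlp.vocab)           # often False before the string has been seen
doc = nlp("Apple makes phones.")
print("Apple" in nlp.vocab)           # True: tokenizing the text stored the lexeme
print(nlp.vocab.has_vector("Apple"))  # checks the vectors table, not the string store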

                                                                                  Source https://stackoverflow.com/questions/71280615

                                                                                  QUESTION

                                                                                  Spacy tokenization add extra white space for dates with hyphen separator when I manually build the Doc
                                                                                  Asked 2022-Feb-15 at 12:39

                                                                                  I've been trying to solve a problem with the spacy Tokenizer for a while, without any success. Also, I'm not sure if it's a problem with the tokenizer or some other part of the pipeline.

                                                                                  Any help is welcome!

                                                                                  Description

I have an application that, for reasons beside the point, creates a spaCy Doc from the spaCy vocab and the list of tokens from a string (see code below). Note that while this is not the simplest and most common way to do this, according to the spaCy docs it can be done.

                                                                                  However, when I create a Doc for a text that contains compound words or dates with hyphen as a separator, the behavior I am getting is not what I expected.

                                                                                  import spacy
from spacy.tokens import Doc
                                                                                  
                                                                                  # My current way
doc = Doc(nlp.vocab, words=tokens)  # tokens is a well-defined list of tokens for a certain string
                                                                                  
                                                                                  # Standard way
                                                                                  doc = nlp("My text...")
                                                                                  

For example, with the following text, if I create the Doc using the standard procedure, the spaCy Tokenizer recognizes the "-" as separate tokens, but the Doc text is the same as the input text; in addition, the spaCy NER model correctly recognizes the DATE entity.

import spacy

nlp = spacy.load("en_core_web_sm")  # assumed pipeline; any pretrained English model with NER works
                                                                                  
                                                                                  doc = nlp("What time will sunset be on 2022-12-24?")
                                                                                  print(doc.text)
                                                                                  
                                                                                  tokens = [str(token) for token in doc]
                                                                                  print(tokens)
                                                                                  
                                                                                  # Show entities
                                                                                  print(doc.ents[0].label_)
                                                                                  print(doc.ents[0].text)
                                                                                  

                                                                                  Output:

                                                                                  What time will sunset be on 2022-12-24?
                                                                                  ['What', 'time', 'will', 'sunset', 'be', 'on', '2022', '-', '12', '-', '24', '?']
                                                                                  
                                                                                  DATE
                                                                                  2022-12-24
                                                                                  

On the other hand, if I create the Doc from the model's vocab and the previously computed tokens, the result is different. Note that, for the sake of simplicity, I am using the tokens from doc, so I'm sure there are no differences between the tokens. Also note that I am manually running each pipeline component in the correct order on the doc, so at the end of this process I should theoretically get the same results.

However, as you can see in the output below, while the Doc's tokens are the same, the Doc's text is different: there are blank spaces between the digits and the date separators.

                                                                                  doc2 = Doc(nlp.vocab, words=tokens)
                                                                                  
                                                                                  # Run each model in pipeline
                                                                                  for model_name in nlp.pipe_names:
                                                                                      pipe = nlp.get_pipe(model_name)
                                                                                      doc2 = pipe(doc2)
                                                                                  
                                                                                  # Print text and tokens
                                                                                  print(doc2.text)
                                                                                  tokens = [str(token) for token in doc2]
                                                                                  print(tokens)
                                                                                  
# Show entities
print(doc2.ents[0].label_)
print(doc2.ents[0].text)
                                                                                  

                                                                                  Output:

                                                                                  what time will sunset be on 2022 - 12 - 24 ? 
                                                                                  ['what', 'time', 'will', 'sunset', 'be', 'on', '2022', '-', '12', '-', '24', '?']
                                                                                  
                                                                                  DATE
                                                                                  2022 - 12 - 24
                                                                                  

I know it must be something silly that I'm missing, but I can't see it.

                                                                                  Could someone please explain to me what I'm doing wrong and point me in the right direction?

                                                                                  Thanks a lot in advance!

                                                                                  EDIT

Following Talha Tayyab's suggestion, I have to create an array of booleans with the same length as my list of tokens, indicating for each token whether it is followed by a space. Then I pass this array to the Doc constructor as follows: doc = Doc(nlp.vocab, words=words, spaces=spaces).

To compute this list of boolean values based on my original text string and list of tokens, I implemented the following vanilla function:

from typing import List

def get_spaces(text: str, tokens: List[str]) -> List[bool]:

    # Spaces
    spaces = []
    # Copy the text (lowercased) so it is easy to operate on
    t = text.lower()

    # Iterate over tokens
    for token in tokens:

        if t.startswith(token.lower()):

            t = t[len(token):]  # Remove token

            # If after removing the token the next character is a space
            if len(t) > 0 and t[0] == " ":
                spaces.append(True)
                t = t[1:]  # Remove space
            else:
                spaces.append(False)

    return spaces
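For completeness, here is a hypothetical usage of the helper above on the date example from earlier (it assumes nlp and Doc are available as in the snippets above):

text = "What time will sunset be on 2022-12-24?"
tokens = ['What', 'time', 'will', 'sunset', 'be', 'on', '2022', '-', '12', '-', '24', '?']
spaces = get_spaces(text, tokens)          # [True]*6 + [False]*6 for this sentence
doc = Doc(nlp.vocab, words=tokens, spaces=spaces)
print(doc.text)                            # reproduces the original text, no extra spaces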
                                                                                  
                                                                                  

                                                                                  With these two improvements in my code, the result obtained is as expected. However, now I have the following question:

                                                                                  Is there a more spacy-like way to compute whitespace, instead of using my vanilla implementation?

                                                                                  ANSWER

                                                                                  Answered 2022-Feb-14 at 21:06

                                                                                  Please try this:

from spacy.tokens import Doc
                                                                                  doc2 = Doc(nlp.vocab, words=tokens,spaces=[1,1,1,1,1,1,0,0,0,0,0,0])
                                                                                  # Run each model in pipeline
                                                                                  for model_name in nlp.pipe_names:
                                                                                      pipe = nlp.get_pipe(model_name)
                                                                                      doc2 = pipe(doc2)
                                                                                  
                                                                                  # Print text and tokens
                                                                                  print(doc2.text)
                                                                                  tokens = [str(token) for token in doc2]
                                                                                  print(tokens)
                                                                                  
# Show entities
print(doc2.ents[0].label_)
print(doc2.ents[0].text)
                                                                                  
                                                                                  # You can also replace 0 with False and 1 with True
                                                                                  

                                                                                  This is the complete syntax:

                                                                                  doc = Doc(nlp.vocab, words=words, spaces=spaces)
                                                                                  

spaces is a list of boolean values indicating whether each word has a subsequent space. It must have the same length as words, if specified. It defaults to a sequence of True.

So you can choose which tokens are followed by a space and which are not.
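As an aside, when the token list comes from an existing Doc (as in the question), the spaces list can be derived directly from the tokens instead of being recomputed by hand; a minimal sketch:

# Assumes `doc` is the Doc produced by the standard nlp("...") call above
words = [token.text for token in doc]
spaces = [bool(token.whitespace_) for token in doc]  # whitespace_ is "" or " "
doc2 = Doc(nlp.vocab, words=words, spaces=spaces)
print(doc2.text)  # matches doc.text exactly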

                                                                                  Reference: https://spacy.io/api/doc

                                                                                  Source https://stackoverflow.com/questions/71113891

                                                                                  QUESTION

                                                                                  After installing scrubadub_spacy package, spacy.load("en_core_web_sm") not working OSError: [E053] Could not read config.cfg
                                                                                  Asked 2022-Feb-06 at 04:46

I am getting the below error when I try to run the following line of code to load en_core_web_sm in an Azure Machine Learning instance.

I debugged the issue and found that installing scrubadub_spacy seems to be what causes the error.

                                                                                  spacy.load("en_core_web_sm")
                                                                                  
                                                                                  OSError                                   Traceback (most recent call last)
                                                                                   in 
                                                                                       1 # Load English tokenizer, tagger, parser and NER
                                                                                  ----> 2 nlp = spacy.load("en_core_web_sm")
                                                                                  
                                                                                  /anaconda/envs/azureml_py36/lib/python3.6/site-packages/spacy/__init__.py in load(name, vocab, disable, exclude, config)
                                                                                      50     """
                                                                                      51     return util.load_model(
                                                                                  ---> 52         name, vocab=vocab, disable=disable, exclude=exclude, config=config
                                                                                      53     )
                                                                                      54 
                                                                                  
                                                                                  /anaconda/envs/azureml_py36/lib/python3.6/site-packages/spacy/util.py in load_model(name, vocab, disable, exclude, config)
                                                                                     418             return get_lang_class(name.replace("blank:", ""))()
                                                                                     419         if is_package(name):  # installed as package
                                                                                  --> 420             return load_model_from_package(name, **kwargs)  # type: ignore[arg-type]
                                                                                     421         if Path(name).exists():  # path to model data directory
                                                                                     422             return load_model_from_path(Path(name), **kwargs)  # type: ignore[arg-type]
                                                                                  
                                                                                  /anaconda/envs/azureml_py36/lib/python3.6/site-packages/spacy/util.py in load_model_from_package(name, vocab, disable, exclude, config)
                                                                                     451     """
                                                                                     452     cls = importlib.import_module(name)
                                                                                  --> 453     return cls.load(vocab=vocab, disable=disable, exclude=exclude, config=config)  # type: ignore[attr-defined]
                                                                                     454 
                                                                                     455 
                                                                                  
                                                                                  /anaconda/envs/azureml_py36/lib/python3.6/site-packages/en_core_web_sm/__init__.py in load(**overrides)
                                                                                      10 
                                                                                      11 def load(**overrides):
                                                                                  ---> 12     return load_model_from_init_py(__file__, **overrides)
                                                                                  
                                                                                  /anaconda/envs/azureml_py36/lib/python3.6/site-packages/spacy/util.py in load_model_from_init_py(init_file, vocab, disable, exclude, config)
                                                                                     619         disable=disable,
                                                                                     620         exclude=exclude,
                                                                                  --> 621         config=config,
                                                                                     622     )
                                                                                     623 
                                                                                  
                                                                                  /anaconda/envs/azureml_py36/lib/python3.6/site-packages/spacy/util.py in load_model_from_path(model_path, meta, vocab, disable, exclude, config)
                                                                                     485     config_path = model_path / "config.cfg"
                                                                                     486     overrides = dict_to_dot(config)
                                                                                  --> 487     config = load_config(config_path, overrides=overrides)
                                                                                     488     nlp = load_model_from_config(config, vocab=vocab, disable=disable, exclude=exclude)
                                                                                     489     return nlp.from_disk(model_path, exclude=exclude, overrides=overrides)
                                                                                  
                                                                                  /anaconda/envs/azureml_py36/lib/python3.6/site-packages/spacy/util.py in load_config(path, overrides, interpolate)
                                                                                     644     else:
                                                                                     645         if not config_path or not config_path.exists() or not config_path.is_file():
                                                                                  --> 646             raise IOError(Errors.E053.format(path=config_path, name="config.cfg"))
                                                                                     647         return config.from_disk(
                                                                                     648             config_path, overrides=overrides, interpolate=interpolate
                                                                                  
                                                                                  OSError: [E053] Could not read config.cfg from /anaconda/envs/azureml_py36/lib/python3.6/site-packages/en_core_web_sm/en_core_web_sm-2.3.1/config.cfg
                                                                                  

I installed the packages using the three commands below from the spaCy docs:

                                                                                  pip install -U pip setuptools wheel
                                                                                  pip install -U spacy
                                                                                  python -m spacy download en_core_web_sm
                                                                                  

How should I fix this issue? Thanks in advance.

                                                                                  ANSWER

                                                                                  Answered 2022-Feb-06 at 04:46

                                                                                  Taking the path from your error message:

                                                                                  en_core_web_sm-2.3.1/config.cfg
                                                                                  

                                                                                  You have a model for v2.3, but it's looking for a config.cfg, which is only a thing in v3 of spaCy. It looks like you upgraded spaCy without realizing it.

There are two ways to fix this. One is to reinstall the model with spacy download, which will get a version that matches your current spaCy version. If you are just starting something, that is probably the best idea. Based on the release date of scrubadub, it seems to be intended for use with spaCy v3.

                                                                                  However, note that v2 and v3 are pretty different - if you have a project with v2 of spaCy you might want to downgrade instead.
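For reference, the two options would look roughly like this (the commands are illustrative; adjust them to your environment):

# Option 1: keep the new spaCy and fetch a model that matches it
python -m spacy download en_core_web_sm

# Option 2: go back to the spaCy line that the existing v2.3 model was built for
pip install "spacy~=2.3.0"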

                                                                                  Source https://stackoverflow.com/questions/70976353

                                                                                  QUESTION

                                                                                  Show NER Spacy Data in dataframe
                                                                                  Asked 2022-Jan-25 at 21:27

I am doing some web scraping to extract text from an HTML page and using NER (spaCy) to identify information such as Assets Under Management, addresses, and founding dates of companies. Once the information is extracted, I would like to place it in a dataframe.

                                                                                  I am working with the following script:

                                                                                  from bs4 import BeautifulSoup
                                                                                  import numpy as np
                                                                                  from time import sleep
                                                                                  from random import randint
                                                                                  from selenium import webdriver
                                                                                  import pandas as pd
                                                                                  import spacy
                                                                                  from spacy import displacy
                                                                                  import en_core_web_sm
                                                                                  import requests
                                                                                  import re
                                                                                  
                                                                                  NER = spacy.load("en_core_web_sm")
                                                                                  
                                                                                  url = "https://www.baincapital.com/"
                                                                                  
                                                                                  
                                                                                  driver = webdriver.Chrome("C:/Program Files/chromedriver.exe")
                                                                                  driver.get(url)  
                                                                                  sleep(randint(5,15))
                                                                                  soup = BeautifulSoup(driver.page_source, 'html.parser')
                                                                                  body=soup.body.text
                                                                                  body
                                                                                  body= body.replace('\n', ' ')
                                                                                  body= body.replace('\t', ' ')
                                                                                  body= body.replace('\r', ' ')
                                                                                  body= body.replace('\xa0', ' ')
                                                                                  text3= NER(body)
                                                                                  displacy.render(text3,style="ent",jupyter=True)
                                                                                  

The output is a displaCy rendering of the recognized entities (image not reproduced here).

                                                                                  And I would like to place it in the following rudimentary table:

Entity      Identified
Money       $155 Billion
Date        1984
Org         Bain Capital
Org         Bain Capital Investor Portal Please
Cardinal    four
Cardinal    24
GPE         US

                                                                                  Essentially, take highlighted info and place it in a dataframe with identifying features.

                                                                                  ANSWER

                                                                                  Answered 2022-Jan-25 at 21:27

After you obtain the body as plain text, you can parse it into a document, get a list of all entities with their labels and texts, and then instantiate a pandas dataframe with those data:

                                                                                  #... your code here ...
                                                                                  body=soup.body.text
                                                                                  
                                                                                  # now, this is the modification:
                                                                                  body = ' '.join(body.split())
                                                                                  doc = NER(body)
                                                                                  entities = [(e.label_,e.text) for e in doc.ents]
                                                                                  df = pd.DataFrame(entities, columns=['Entity','Identified'])
                                                                                  

Note that the body = ' '.join(body.split()) line normalizes all whitespace in a simpler and shorter way than the chain of replace calls you used.
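As a quick sanity check (the exact rows are hypothetical, since the scraped page contents vary), you can inspect the resulting dataframe:

print(df.head())                    # first few (Entity, Identified) rows
print(df["Entity"].value_counts())  # how often each entity label appears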

                                                                                  Source https://stackoverflow.com/questions/70855135

                                                                                  QUESTION

                                                                                  How to get a description for each Spacy NER entity?
                                                                                  Asked 2022-Jan-24 at 16:01

I am using a spaCy NER model to extract from a text some named entities relevant to my problem, such as DATE, TIME, and GPE, among others.

                                                                                  For example, I need to recognize the Time Zone in the following sentence:

                                                                                  "Australian Central Time"
                                                                                  

With the spaCy model en_core_web_lg, I got the following result:

                                                                                  doc = nlp("Australian Central Time")
                                                                                  print([(ent.label_, ent.text) for ent in doc.ents])
                                                                                      
                                                                                  >> [('NORP', 'Australian')]
                                                                                  

My problem is: I don't have a clear idea of what exactly the NORP entity means and, more generally, what each spaCy NER entity means (leaving aside the intuitively obvious ones, of course).

I found the following snippet to get the complete list of entity labels, but after that I'm stuck:

                                                                                  import spacy
                                                                                  nlp = spacy.load("en_core_web_lg")
                                                                                  nlp.get_pipe("ner").labels
                                                                                  

I'm pretty new to using spaCy for NLP and didn't find what I'm looking for in the official documentation, so any help will be appreciated!

                                                                                  BTW, I'm using Spacy version 3.2.1.

                                                                                  ANSWER

                                                                                  Answered 2022-Jan-24 at 16:01

                                                                                  Most labels have definitions you can access using spacy.explain(label).

                                                                                  For NORP: "Nationalities or religious or political groups"
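Combining that with the label list from the question gives a quick legend; a minimal sketch (it assumes en_core_web_lg is installed):

import spacy

nlp = spacy.load("en_core_web_lg")
for label in nlp.get_pipe("ner").labels:
    print(label, "-", spacy.explain(label))  # explain() returns a short description (or None)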

                                                                                  For more details you would need to look into the annotation guidelines for the resources listed in the model documentation under https://spacy.io/models/.

                                                                                  Source https://stackoverflow.com/questions/70835924

                                                                                  QUESTION

                                                                                  Do I need to do any text cleaning for Spacy NER?
                                                                                  Asked 2021-Dec-28 at 11:42

                                                                                  I am new to NER and Spacy. Trying to figure out what, if any, text cleaning needs to be done. Seems like some examples I've found trim the leading and trailing whitespace and then muck with the start/stop indexes. I saw one example where the guy did a bunch of cleaning and his accuracy was really bad because all the indexes were messed up.

                                                                                  Just to clarify, the dataset was annotated with DataTurks, so you get json like this:

                                                                                          "Content": 
                                                                                          "label": [
                                                                                              "Skills"
                                                                                          ],
                                                                                          "points": [
                                                                                              {
                                                                                                  "start": 1295,
                                                                                                  "end": 1621,
                                                                                                  "text": "\n• Programming language...
                                                                                  

                                                                                  So by "mucking with the indexes", I mean, if you strip off the leading \n, you need to update the start index, so it's still aligned properly.

So that's really the question: if I start removing characters from the beginning, end or middle, I need to apply the change to the content attribute and adjust the start/end indexes to match, no? I'm guessing an obvious "yes" :), so I was wondering how much cleaning needs to be done.

                                                                                  So you would remove the \ns, bullets, leading / trailing whitespace, but leave standard punctuation like commas, periods, etc?

                                                                                  What about stuff like lowercasing, stop words, lemmatizing, etc?

One concern I'm seeing with a few samples I've looked at is that the start/stop indexes do get thrown off by the cleaning, because you need to update EVERY annotation as you remove characters to keep them in sync.

                                                                                  I.e.

                                                                                  A 0 -> 100
                                                                                  B 101 -> 150
                                                                                  

                                                                                  if I remove a char at position 50, then I need to adjust B to 100 -> 149.

                                                                                  ANSWER

                                                                                  Answered 2021-Dec-28 at 05:19

                                                                                  First, spaCy does no transformation of the input - it takes it literally as-is and preserves the format. So you don't lose any information when you provide text to spaCy.

                                                                                  That said, input to spaCy with the pretrained pipelines will work best if it is in natural sentences with no weird punctuation, like a newspaper article, because that's what spaCy's training data looks like.

                                                                                  To that end, you should remove meaningless white space (like newlines, leading and trailing spaces) or formatting characters (maybe a line of ----?), but that's about all the cleanup you have to do. The spaCy training data won't have bullets, so they might get some weird results, but I would leave them in to start. (Also, bullets are obviously printable characters - maybe you mean non-ASCII?)

                                                                                  I have no idea what you mean by "muck with the indexes", but for some older NLP methods it was common to do more extensive preprocessing, like removing stop words and lowercasing everything. Doing that will make things worse with spaCy because it uses the information you are removing for clues, just like a human reader would.

Note that you can train your own models, in which case they'll learn about the kind of text you show them. In that case you can get rid of preprocessing entirely, though for actually meaningless things like newlines and leading/trailing spaces you might as well remove them anyway.

                                                                                  To address your new info briefly...

                                                                                  Yes, character indexes for NER labels must be updated if you do preprocessing. If they aren't updated they aren't usable.
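To make the index bookkeeping concrete, here is a minimal, hypothetical sketch of stripping leading whitespace and shifting the annotated character spans by the same amount:

def strip_leading_and_shift(text, spans):
    # spans: list of (start, end, label) character offsets into `text`
    stripped = text.lstrip()                 # drop leading whitespace/newlines
    offset = len(text) - len(stripped)       # number of characters removed
    return stripped, [(s - offset, e - offset, lab) for s, e, lab in spans]

text = "\n  Skills: Python"
spans = [(3, 17, "Skills")]
print(strip_leading_and_shift(text, spans))
# -> ('Skills: Python', [(0, 14, 'Skills')])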

                                                                                  It looks like you're trying to extract "skills" from a resume. That has many bullet point lists. The spaCy training data is newspaper articles, which don't contain any lists like that, so it's hard to say what the right thing to do is. I don't think the bullets matter much, but you can try removing or not removing them.

                                                                                  What about stuff like lowercasing, stop words, lemmatizing, etc?

                                                                                  I already addressed this, but do not do this. This was historically common practice for NLP models, but for modern neural models, including spaCy, it is actively unhelpful.

                                                                                  Source https://stackoverflow.com/questions/70502457

                                                                                  QUESTION

                                                                                  How to use japanese engine in Spacy
                                                                                  Asked 2021-Dec-20 at 04:54

I am building an NLP app using Python. I heard that spaCy is well suited to NLP and installed it. How should I use the Japanese engine from spaCy?

pip install -U spacy
                                                                                  

                                                                                  or

python -m pip install -U spacy
                                                                                  

What else do I need to install?

                                                                                  ANSWER

                                                                                  Answered 2021-Dec-15 at 21:39

                                                                                  You should download and install the language package.

spacy download ja_core_news_lg
                                                                                  

                                                                                  or

                                                                                  python -m spacy download ja_core_news_lg
                                                                                  

                                                                                  If you face an issue, please try this.

                                                                                  python -m spacy download ja_core_news_sm
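Once a Japanese package is downloaded, loading and using it looks like this (a minimal sketch; the example sentence is arbitrary):

import spacy

nlp = spacy.load("ja_core_news_sm")  # assumes the package was downloaded as above
doc = nlp("これは日本語のテストです。")
print([token.text for token in doc])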
                                                                                  

                                                                                  Source https://stackoverflow.com/questions/70370612

                                                                                  Community Discussions, Code Snippets contain sources that include Stack Exchange Network

                                                                                  Vulnerabilities

                                                                                  No vulnerabilities reported

                                                                                  Install spaCy

                                                                                  For detailed installation instructions, see the documentation.
                                                                                  Operating system: macOS / OS X · Linux · Windows (Cygwin, MinGW, Visual Studio)
                                                                                  Python version: Python 3.6+ (only 64 bit)
                                                                                  Package managers: pip · conda (via conda-forge)
                                                                                  Trained pipelines for spaCy can be installed as Python packages. This means that they're a component of your application, just like any other module. Models can be installed using spaCy's download command, or manually by pointing pip to a path or URL.

                                                                                  Support

New to spaCy? Here's everything you need to know!
How to use spaCy and its features.
🚀 New in v3.0: new features, backwards incompatibilities and migration guide.
End-to-end workflows you can clone, modify and run.
The detailed reference for spaCy's API.
Download trained pipelines for spaCy.
Plugins, extensions, demos and books from the spaCy ecosystem.
Learn spaCy in this free and interactive online course.
Our YouTube channel with video tutorials, talks and more.
Changes and version history.
How to contribute to the spaCy project and code base.
Get a custom spaCy pipeline, tailor-made for your NLP problem by spaCy's core developers. Streamlined, production-ready, predictable and maintainable. Start by completing our 5-minute questionnaire to tell us what you need and we'll be in touch! Learn more →.
                                                                                  Install
                                                                                • PyPI

                                                                                  pip install spacy

                                                                                • CLONE
                                                                                • HTTPS

                                                                                  https://github.com/explosion/spaCy.git

                                                                                • CLI

                                                                                  gh repo clone explosion/spaCy

                                                                                • sshUrl

                                                                                  git@github.com:explosion/spaCy.git


                                                                                  Consider Popular Natural Language Processing Libraries

                                                                                  transformers

                                                                                  by huggingface

                                                                                  funNLP

                                                                                  by fighting41love

                                                                                  bert

                                                                                  by google-research

                                                                                  jieba

                                                                                  by fxsjy

                                                                                  Python

                                                                                  by geekcomputers

                                                                                  Try Top Libraries by explosion

thinc

by explosion (Python)

sense2vec

by explosion (Python)

spacy-models

by explosion (Python)

spacy-transformers

by explosion (Python)

projects

by explosion (Python)

                                                                                  Compare Natural Language Processing Libraries with Highest Support

                                                                                  transformers

                                                                                  by huggingface

                                                                                  bert

                                                                                  by google-research

                                                                                  allennlp

                                                                                  by allenai

                                                                                  flair

                                                                                  by flairNLP

                                                                                  spaCy

                                                                                  by explosion
