
Annotate words based on Previous Label using SpaCy

by vigneshchennai74 Updated: Jan 1, 2023

In this solution we are going to find the gender associated with a name from its honorific (such as "Mr." or "Mrs.") using Python and the spaCy library. spaCy is a comprehensive and popular Python library for natural language processing. One of its best-known tools is the Matcher class, which lets you find words and phrases using rules that describe their token attributes. In this solution kit, I am sharing the code snippets and the library I use to determine whether a name is masculine or feminine in Python. The code can be run directly in your IDE.
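Here is a minimal sketch of how Matcher rules describe token attributes. It matches an honorific followed by a capitalized word. It assumes spaCy v3 and uses a blank English pipeline (`spacy.blank("en")`), so no trained model is needed. The `IS_TITLE` check is a stand-in chosen for illustration and is not part of the kit's own code.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline only tokenizes, so no trained model download is needed.
# Its tokenizer keeps "Mr." and "Mrs." as single tokens.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Each dict describes one token: an honorific, then a title-case word (assumed to be the name).
matcher.add("MALE_NAME", [[{"LOWER": {"IN": ["mr", "mr."]}}, {"IS_TITLE": True}]])
matcher.add("FEMALE_NAME", [[{"LOWER": {"IN": ["mrs", "mrs."]}}, {"IS_TITLE": True}]])

doc = nlp("Mr. Johnson goes to Los Angeles, and Mrs. Smith went to San Francisco.")
results = []
for match_id, start, end in matcher(doc):
    # match_id is a hash; look up its string label in the shared StringStore
    results.append((doc[start:end].text, nlp.vocab.strings[match_id]))
print(results)
```

Because the rule checks only capitalization, it works without the statistical model. In exchange, it accepts any capitalized word that follows an honorific.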


Preview of the output that you will get on running this code from your IDE

Code

In this solution we use spaCy's rule-based matching: the EntityRuler pipeline component in the first two examples and the Matcher class in the third.

Lines of Code: 58 | License: CC BY-SA 4.0

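import spacy

nlp = spacy.load("en_core_web_sm")

# Stage 1: the name patterns below rely on a previous label, so an upstream ruler
# first tags the honorifics themselves as "Male" or "Female" (these honorific
# patterns are an assumed helper; adjust the lists to your own text).
honorific_ruler = nlp.add_pipe("entity_ruler", name="honorific_ruler", config={"overwrite_ents": True})
honorific_ruler.add_patterns([{"label": "Male", "pattern": [{"LOWER": {"IN": ["mr", "mr."]}}]},
                              {"label": "Female", "pattern": [{"LOWER": {"IN": ["mrs", "mrs."]}}]}])

# Stage 2: label the honorific plus the word after it, based on the previous label
name_ruler = nlp.add_pipe("entity_ruler", name="name_ruler", config={"overwrite_ents": True})
patterns = [{"label": "MALE_NAME", "pattern": [{"ENT_TYPE": "Male"}, {"TEXT": {"REGEX": r"\w+"}}]},
            {"label": "FEMALE_NAME", "pattern": [{"ENT_TYPE": "Female"}, {"TEXT": {"REGEX": r"\w+"}}]}]
name_ruler.add_patterns(patterns)

doc = nlp("Mr. Johnson goes to Los Angeles, and Mrs. Smith went to San Francisco.")
print([(ent.text, ent.label_) for ent in doc.ents if ent.label_ in ['MALE_NAME', 'FEMALE_NAME']])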

[('Mr. Johnson', 'MALE_NAME'), ('Mrs. Smith', 'FEMALE_NAME')]

import spacy

nlp = spacy.load("en_core_web_sm")
# spaCy v3: add the ruler by its factory name; it runs after "ner", so ENT_TYPE is already set.
# A name is only matched if the NER model tags it as PERSON.
ruler = nlp.add_pipe("entity_ruler", config={"overwrite_ents": True})
patterns = [{"label": "MALE_NAME", "pattern": [{"LOWER": {"IN": ["mr", "mr."]}}, {"ENT_TYPE": "PERSON"}]},
            {"label": "FEMALE_NAME", "pattern": [{"LOWER": {"IN": ["mrs", "mrs."]}}, {"ENT_TYPE": "PERSON"}]}]
ruler.add_patterns(patterns)

doc = nlp("Mr. Johnson goes to Los Angeles, and Mrs. Smith went to San Francisco.")
print([(ent.text, ent.label_) for ent in doc.ents if ent.label_ in ['MALE_NAME', 'FEMALE_NAME']])

[('Mr. Johnson', 'MALE_NAME'), ('Mrs. Smith', 'FEMALE_NAME')]
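The pattern above only matches when the statistical NER model tags the surname as PERSON, and that tagging can vary between model versions. Below is a possible model-free variant. It assumes that any title-case word after an honorific is a name, swaps the `ENT_TYPE` check for `IS_TITLE`, and runs the `entity_ruler` on a blank English pipeline. It is a sketch under that assumption, not the kit's method.

```python
import spacy

# Blank English pipeline: tokenizer only, no statistical NER component.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    # Honorific followed by a capitalized word; IS_TITLE stands in for the
    # ENT_TYPE == "PERSON" check, so no trained model is required.
    {"label": "MALE_NAME", "pattern": [{"LOWER": {"IN": ["mr", "mr."]}}, {"IS_TITLE": True}]},
    {"label": "FEMALE_NAME", "pattern": [{"LOWER": {"IN": ["mrs", "mrs."]}}, {"IS_TITLE": True}]},
])

doc = nlp("Mr. Johnson goes to Los Angeles, and Mrs. Smith went to San Francisco.")
names = [(ent.text, ent.label_) for ent in doc.ents]
print(names)
```

The trade-off is precision. Any capitalized word after "Mr." or "Mrs." is accepted, whether or not it is really a name.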

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

# Create patterns: an honorific followed by a token the NER model tagged as PERSON
male_name_pattern = [{"LOWER": {"IN": ["mr", "mr."]}}, {"ENT_TYPE": "PERSON"}]
female_name_pattern = [{"LOWER": {"IN": ["mrs", "mrs."]}}, {"ENT_TYPE": "PERSON"}]

# Add patterns (spaCy v3 signature: a key and a list of patterns)
matcher.add("MALE_NAME", [male_name_pattern])
matcher.add("FEMALE_NAME", [female_name_pattern])

doc = nlp("Mr. Johnson goes to Los Angeles, and Mrs. Smith went to San Francisco.")
matches = matcher(doc)
for match_id, start, end in matches:
    # Get string representation of pattern name
    string_id = nlp.vocab.strings[match_id]
    # The matched span
    span = doc[start:end]
    print(span.text, string_id)

Mr. Johnson MALE_NAME
Mrs. Smith FEMALE_NAME
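
  1. Copy the code using the "Copy" button above, and paste it into a Python file in your IDE.
  2. Replace the sentence passed to nlp() with your own text, writing each name with its honorific (for example "Mr." or "Mrs.").
  3. Run the file to see whether each name is labeled masculine (MALE_NAME) or feminine (FEMALE_NAME).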
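To run the rules on your own text, you can wrap the steps above in a reusable function. The sketch below is hypothetical. The function name `find_gendered_names` and the extra honorifics ("Ms.", "Miss") are assumptions added for illustration. It uses the spaCy v3 Matcher options `greedy="LONGEST"` and `as_spans=True`, so a multi-word name such as "Mr. John Smith" comes back as a single labeled span. It also uses a blank pipeline with an `IS_TITLE` check instead of the NER model.

```python
import spacy
from spacy.matcher import Matcher

# Hypothetical honorific lists (an assumption; extend them for your own text)
HONORIFICS = {
    "MALE_NAME": ["mr", "mr."],
    "FEMALE_NAME": ["mrs", "mrs.", "ms", "ms.", "miss"],
}

# Blank English pipeline: its tokenizer keeps "Mr." and "Mrs." as single tokens
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
for label, titles in HONORIFICS.items():
    # An honorific followed by one or more title-case tokens, e.g. "Mr. John Smith";
    # greedy="LONGEST" keeps only the longest of overlapping candidate matches
    pattern = [{"LOWER": {"IN": titles}}, {"IS_TITLE": True, "OP": "+"}]
    matcher.add(label, [pattern], greedy="LONGEST")


def find_gendered_names(text):
    """Return (name, label) pairs for every honorific + name found in the text."""
    doc = nlp(text)
    # as_spans=True returns Span objects whose label_ is the pattern key
    spans = matcher(doc, as_spans=True)
    # Sort by position so results follow the order of the text
    return [(span.text, span.label_) for span in sorted(spans, key=lambda s: s.start)]


print(find_gendered_names("Mrs. Lee met Mr. John Smith and Miss Davis."))
```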


I hope you found this useful. I have listed the dependent library and version information in the following sections.


I found this code snippet by searching for "Spacy rules to annotate words based on previous label" in kandi. You can try any such use case!

Dependency Library

spaCy by explosion

Python | Stars: 25129 | Version: 3.4.4

License: Permissive (MIT)

💫 Industrial-strength Natural Language Processing (NLP) in Python


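If you do not have spaCy, which is required to run this code, you can install it by copying the pip install command from the spaCy page in kandi. All three examples also load the en_core_web_sm pipeline, which is installed separately with python -m spacy download en_core_web_sm.

You can search kandi for any other dependent library in the same way as spaCy.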

Environment Test

I tested this solution in the following versions. Be mindful of changes when working with other versions.


  1. The solution was created with Python 3.7.15.
  2. The solution was tested with spaCy 3.4.3.
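To compare your environment with the tested versions above, you can run a small check like the one below. It is a sketch. `spacy.util.is_package` reports whether a pipeline package such as en_core_web_sm is installed as a Python package.

```python
import sys

import spacy

# Print the interpreter and spaCy versions to compare with the tested ones
print("Python:", sys.version.split()[0])
print("spaCy:", spacy.__version__)

# spacy.util.is_package() reports whether a pipeline is installed as a Python package
model_installed = spacy.util.is_package("en_core_web_sm")
print("en_core_web_sm installed:", model_installed)
if not model_installed:
    print("Install it with: python -m spacy download en_core_web_sm")
```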


Using this solution, we are able to determine whether a name in our text is masculine or feminine from its honorific, using Python and the spaCy library. It also offers an easy, hassle-free way to create a hands-on working version of the code for finding names by their honorifics in Python.

Support

  1. For any support on kandi solution kits, please use the chat
  2. For further learning resources, visit the Open Weaver Community learning page.

Open Weaver – Develop Applications Faster with Open Source

© 2023 Open Weaver Inc.