Annotate words based on the previous label using spaCy


by vigneshchennai74 | Updated: Apr 10, 2023



Determining a person's gender from their name and its associated honorific is a simple heuristic that works in many cases, although it is not fully reliable. In many cultures, names and their associated honorifics can hint at a person's gender. For example, in English-speaking countries, honorifics like "Mr." or "Sir" are typically associated with men, while "Mrs." or "Madam" are associated with women.
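Before bringing in spaCy, the idea can be illustrated with plain Python. This is only a minimal sketch: the lookup table HONORIFIC_GENDER and the function guess_gender are hypothetical names made up for illustration, and the whitespace split is a crude stand-in for a real tokenizer.

```python
# Hypothetical lookup table: honorific (lowercased, trailing period removed) -> label.
HONORIFIC_GENDER = {
    "mr": "male",
    "sir": "male",
    "mrs": "female",
    "madam": "female",
}

def guess_gender(text):
    """Return (word, label) pairs for words that follow a known honorific."""
    tokens = text.split()
    results = []
    # Look at each word together with the word right before it.
    for prev, word in zip(tokens, tokens[1:]):
        label = HONORIFIC_GENDER.get(prev.rstrip(".").lower())
        if label is not None:
            # Drop punctuation attached to the name by the whitespace split.
            results.append((word.strip(".,;:!?"), label))
    return results

print(guess_gender("Mr. Johnson met Mrs. Smith and Dr. Lee."))
```

An honorific such as "Dr." is not in the table, so the name after it gets no label. That is the main limit of this approach: it only works when the honorific itself carries a hint.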

 

spaCy performs named entity recognition (NER) on a text document, and the Matcher class is used to create and match token patterns in the text. The code defines two patterns: one for a male name (the honorific "Mr." or "mr" followed by a PERSON entity) and one for a female name (the honorific "Mrs." or "mrs" followed by a PERSON entity). The matcher.add method adds patterns to a Matcher object. In spaCy v3, the version this solution was tested on, its main arguments are:

  • key (str or int): A unique identifier for the match rule, for example "MALE_NAME". spaCy stores it in nlp.vocab.strings and returns its hash as the match_id of each match.
  • patterns (list of lists of dictionaries): One or more patterns to match. Each pattern is a list of dictionaries, and each dictionary describes one token and the attributes it must have.
  • on_match (callable or None): An optional keyword-only callback that is executed when the pattern is matched. Leave it out if you don't need a callback.
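The arguments above can be sketched as follows. This is a minimal example under stated assumptions, not the solution's own code: it uses spacy.blank("en"), which has a tokenizer but no NER, so the capitalization check IS_TITLE stands in for the PERSON entity, and the names found and collect are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline (tokenizer only, no model download).
# It has no NER, so IS_TITLE replaces the ENT_TYPE "PERSON" check here.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

found = []  # filled in by the callback below

def collect(matcher, doc, i, matches):
    """on_match callback: spaCy calls it for each match, i is the match index."""
    match_id, start, end = matches[i]
    found.append((doc[start:end].text, doc.vocab.strings[match_id]))

male = [{"LOWER": {"IN": ["mr", "mr."]}}, {"IS_TITLE": True}]
female = [{"LOWER": {"IN": ["mrs", "mrs."]}}, {"IS_TITLE": True}]

# spaCy v3 call shape: key, list of patterns, keyword-only callback.
matcher.add("MALE_NAME", [male], on_match=collect)
matcher.add("FEMALE_NAME", [female], on_match=collect)

doc = nlp("Mr. Johnson goes to Los Angeles, and Mrs. Smith went to San Francisco.")
matcher(doc)  # running the matcher triggers the callbacks
print(found)
```

Because IS_TITLE only checks capitalization, this variant would also match an honorific followed by any capitalized word. The PERSON check used in the solution is stricter but needs a trained pipeline.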

 

Here is an example of how to find a person's gender from the honorific attached to their name.


Code

This solution uses the EntityRuler component and the Matcher class of the spaCy library.

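import spacy

nlp = spacy.load("en_core_web_sm")

# Assumption: nothing in the original snippet assigns the "Male"/"Female" labels,
# so a first ruler labels the honorifics themselves.
honorific_ruler = nlp.add_pipe("entity_ruler", name="honorific_ruler", config={"overwrite_ents": True})
honorific_ruler.add_patterns([
    {"label": "Male", "pattern": [{"LOWER": {"IN": ["mr", "mr."]}}]},
    {"label": "Female", "pattern": [{"LOWER": {"IN": ["mrs", "mrs."]}}]},
])

# A second ruler annotates the honorific plus the next word, based on that previous label.
name_ruler = nlp.add_pipe("entity_ruler", name="name_ruler", config={"overwrite_ents": True})
patterns = [{"label": "MALE_NAME", "pattern": [{"ENT_TYPE": "Male"}, {"TEXT": {"REGEX": r"\w+"}}]},
            {"label": "FEMALE_NAME", "pattern": [{"ENT_TYPE": "Female"}, {"TEXT": {"REGEX": r"\w+"}}]}]
name_ruler.add_patterns(patterns)

doc = nlp("Mr. Johnson goes to Los Angeles, and Mrs. Smith went to San Francisco.")
print([(ent.text, ent.label_) for ent in doc.ents if ent.label_ in ['MALE_NAME', 'FEMALE_NAME']])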

[('Mr. Johnson', 'MALE_NAME'), ('Mrs. Smith', 'FEMALE_NAME')]

import spacy

nlp = spacy.load("en_core_web_sm")
# spaCy v3: add the component by its registered name; add_pipe returns the EntityRuler
ruler = nlp.add_pipe("entity_ruler", config={"overwrite_ents": True})
# An honorific followed by a token that the NER component tagged as PERSON
patterns = [{"label": "MALE_NAME", "pattern": [{"LOWER": {"IN": ["mr", "mr."]}}, {"ENT_TYPE": "PERSON"}]},
            {"label": "FEMALE_NAME", "pattern": [{"LOWER": {"IN": ["mrs", "mrs."]}}, {"ENT_TYPE": "PERSON"}]}]
ruler.add_patterns(patterns)

doc = nlp("Mr. Johnson goes to Los Angeles, and Mrs. Smith went to San Francisco.")
print([(ent.text, ent.label_) for ent in doc.ents if ent.label_ in ['MALE_NAME', 'FEMALE_NAME']])

[('Mr. Johnson', 'MALE_NAME'), ('Mrs. Smith', 'FEMALE_NAME')]

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

# Create patterns
male_name_pattern = [{"LOWER": {"IN": ["mr", "mr."]}}, {"ENT_TYPE": "PERSON"}]
female_name_pattern = [{"LOWER": {"IN": ["mrs", "mrs."]}}, {"ENT_TYPE": "PERSON"}]

# Add patterns (spaCy v3: the second argument is a list of patterns)
matcher.add("MALE_NAME", [male_name_pattern])
matcher.add("FEMALE_NAME", [female_name_pattern])

doc = nlp("Mr. Johnson goes to Los Angeles, and Mrs. Smith went to San Francisco.")
matches = matcher(doc)
for match_id, start, end in matches:
    # Get string representation of pattern name
    string_id = nlp.vocab.strings[match_id] 
    # The matched span
    span = doc[start:end]  
    print(span.text, string_id)

Mr. Johnson MALE_NAME
Mrs. Smith FEMALE_NAME

Instructions

  1. Download and install VS Code on your desktop.
  2. Open VS Code and create a new file in the editor.
  3. Copy the code snippet that you want to run, using the "Copy" button or by selecting the text and using the copy command (Ctrl+C on Windows/Linux or Cmd+C on Mac).
  4. Paste the code into your file in VS Code, and save the file with a meaningful name.
  5. Open a terminal window or command prompt on your computer.
  6. Install spaCy with this command: pip install spacy==3.4.3
  7. Once spaCy is installed, you can download the en_core_web_sm model using the following command: python -m spacy download en_core_web_sm Alternatively, the model is distributed as a pip-installable package on spaCy's model releases page, so you can install it with pip from there.
  8. To run the code, open the file in VS Code and click the "Run" button in the top menu, or use the keyboard shortcut Ctrl+Alt+N (on Windows and Linux) or Cmd+Alt+N (on Mac); this shortcut is provided by the Code Runner extension. The output of your code will appear in the VS Code output console.
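After the installation steps, a quick check like the one below can confirm that the environment is ready. It is a small sketch that relies on spacy.util.is_package; the variable names MODEL_NAME and model_installed are only illustrative.

```python
import spacy
from spacy.util import is_package

MODEL_NAME = "en_core_web_sm"

# Show which spaCy version is active in this environment.
print("spaCy version:", spacy.__version__)

# is_package reports whether a pip-installed package with this name exists.
model_installed = is_package(MODEL_NAME)
if model_installed:
    print(MODEL_NAME, "is installed")
else:
    print(MODEL_NAME, "is missing; run: python -m spacy download", MODEL_NAME)
```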



I hope you found this useful. Links to the dependent libraries and version information are in the following sections.


I found this code snippet by searching for "Spacy rules to annotate words based on previous label" in kandi. You can try any such use case!

Environment Tested

I tested this solution in the following versions. Be mindful of changes when working with other versions.


  1. The solution was created with Python 3.7.15.
  2. The solution was tested with spaCy 3.4.3.
  3. The solution was tested with VS Code 1.76.0.


Using this solution, we can find whether a name in our text is masculine or feminine, using Python with the help of the spaCy library. This process also facilitates an easy-to-use, hassle-free way to create a hands-on working version of code that helps us find names using honorifics in Python.

Dependency Library

spaCy by explosion

Python | Stars: 26383 | Version: v3.2.6
License: Permissive (MIT)

💫 Industrial-strength Natural Language Processing (NLP) in Python


If you do not have spaCy, which is required to run this code, you can install it by clicking on the above link and copying the pip install command from the spaCy page in kandi.

You can search for any dependent library on kandi, like spaCy.

Support

  1. For any support on kandi solution kits, please use the chat.
  2. For further learning resources, visit the Open Weaver Community learning page.
