spaCy built-in Lemmatiser in a spaCy pipeline

by vigneshchennai74 Updated: Feb 20, 2023

Solution Kit

The spaCy library and its pre-trained language models, such as 'en_core_web_sm', are used in a wide range of applications. spaCy supports many Natural Language Processing tasks, including part-of-speech tagging, named entity recognition, and lemmatisation, and it can be extended with trainable components for text classification tasks such as sentiment analysis.

One of the primary benefits of spaCy is its speed and efficiency when processing and analyzing large amounts of text. This is especially useful for businesses and organizations that need to process large volumes of text, such as news articles, customer reviews, or social media posts. spaCy can automate many of the steps involved in processing and analyzing this text, making the work faster and more consistent than manual processing.

This code uses the spaCy library to load a pre-trained English pipeline ('en_core_web_sm') into the variable nlp. Once loaded, the pipeline can analyze text, for example by identifying parts of speech, named entities, syntactic dependencies, and lemmas.
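To confirm that the lemmatiser really is part of the loaded pipeline, you can inspect nlp.pipe_names and fetch the component with nlp.get_pipe. The helper below is a minimal sketch; the function name describe_lemmatiser is hypothetical, and it assumes nlp is a spaCy v3 pipeline such as the one returned by spacy.load('en_core_web_sm'). In the v3 English pipelines the built-in lemmatiser typically runs in "rule" mode after the tagger and attribute_ruler, because its rules use part-of-speech information.

```python
from typing import Any


def describe_lemmatiser(nlp: Any) -> str:
    """Report where the built-in lemmatizer sits in a loaded pipeline.

    Hypothetical helper: `nlp` is assumed to be a spaCy v3 Language object,
    e.g. spacy.load("en_core_web_sm").
    """
    names = list(nlp.pipe_names)  # components in the order they run on each Doc
    if "lemmatizer" not in names:
        return "no lemmatizer; components: " + ", ".join(names)
    # The built-in Lemmatizer component stores its mode ("rule", "lookup", ...)
    lemmatizer = nlp.get_pipe("lemmatizer")
    position = names.index("lemmatizer")
    preceding = ", ".join(names[:position]) or "none"
    return f"lemmatizer (mode={lemmatizer.mode}) runs after: {preceding}"
```

With the pipeline loaded as nlp, print(describe_lemmatiser(nlp)) shows the lemmatiser's mode and the components that run before it.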

For each token, the code prints the following attributes:

token.text: the raw text of the token
token.pos_: the predicted coarse-grained part-of-speech tag (Universal POS tag)
token.tag_: the predicted fine-grained part-of-speech tag (Penn Treebank tag for the English models)
token.dep_: the predicted syntactic dependency label
token.lemma_: the lemma (base form) of the token, assigned by the pipeline's lemmatizer component
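If you only need the lemmas rather than the full attribute table, you can read token.lemma_ directly. The function below is a hypothetical helper, sketched under the assumption that doc is a spaCy Doc produced by nlp(...); it relies only on the Token attributes lemma_ and is_punct, and drops punctuation by default.

```python
from typing import Any, Iterable, List


def lemmas(doc: Iterable[Any], keep_punct: bool = False) -> List[str]:
    """Return the lemma of each token, optionally skipping punctuation.

    Hypothetical helper; `doc` is assumed to be a spaCy Doc (or any iterable
    of Token objects exposing .lemma_ and .is_punct).
    """
    return [tok.lemma_ for tok in doc if keep_punct or not tok.is_punct]
```

For the sample sentence, lemmas(doc) would correspond to the lemma column of the output shown further down, minus the full stops.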

spaCy and other NLP tools can help businesses and organizations better understand and analyze text data, supporting better decision-making, deeper customer insights, and more efficient workflows.

Preview of the output you get when running this code in your IDE

Code

In this solution we use the spaCy library.

How to use spaCy's built-in lemmatiser in a spaCy pipeline?

Lines of Code: 25
License: Strong Copyleft (CC BY-SA 4.0)

Dependent Libraries: spaCy

import spacy

# Load the small English pipeline; it includes the built-in lemmatizer component
nlp = spacy.load('en_core_web_sm')
doc = nlp("Apples and oranges are similar. Boots and hippos aren't.")
print('\n')
print("Token Attributes: \n", "token.text, token.pos_, token.tag_, token.dep_, token.lemma_")
for token in doc:
    # Print the text, coarse and fine-grained POS tags, dependency label and lemma
    print("{:<12}{:<12}{:<12}{:<12}{:<12}".format(token.text, token.pos_, token.tag_, token.dep_, token.lemma_))


Token Attributes: 
 token.text, token.pos_, token.tag_, token.dep_, token.lemma_
Apples      NOUN        NNS         nsubj       apple       
and         CCONJ       CC          cc          and         
oranges     NOUN        NNS         conj        orange      
are         AUX         VBP         ROOT        be          
similar     ADJ         JJ          acomp       similar     
.           PUNCT       .           punct       .           
Boots       NOUN        NNS         nsubj       boot        
and         CCONJ       CC          cc          and         
hippos      NOUN        NN          conj        hippos      
are         AUX         VBP         ROOT        be          
n't         PART        RB          neg         not         
.           PUNCT       .           punct       .
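In this output, the plural nouns "Apples", "oranges" and "Boots" are reduced to their singular forms and both occurrences of "are" map to "be". "hippos" is left unchanged: it was tagged NN (singular noun) in this run, and spaCy's rule-based English lemmatiser uses the part-of-speech information when choosing a lemma, so a tagging error can carry over into the lemma. To spot such cases, a small hypothetical helper can list only the tokens whose lemma differs from their surface text; it assumes doc is the spaCy Doc from the code above.

```python
from typing import Any, Iterable, List, Tuple


def changed_lemmas(doc: Iterable[Any]) -> List[Tuple[str, str]]:
    """List (text, lemma) pairs for tokens whose lemma differs from their text.

    Hypothetical helper; `doc` is assumed to be a spaCy Doc. Tokens the
    lemmatiser left unchanged (such as "and", "similar" or "hippos" in the
    sample output) are omitted.
    """
    return [(tok.text, tok.lemma_) for tok in doc if tok.lemma_ != tok.text]
```

Assuming the model produces the output shown above, this would list Apples, oranges, both "are" tokens, Boots and n't, which makes it easy to see that "hippos" was not changed.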

Copy the code above and paste it into a Python file in your IDE.
Replace the sample sentence with the text you want to lemmatise.
Run the program to print each token's attributes, including its lemma.
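Instead of editing the sample sentence each time, you can wrap the pipeline call in a function that takes your own text. The sketch below is hypothetical (the name lemmatise_texts is not part of spaCy); it assumes nlp is a loaded pipeline and uses nlp.pipe, spaCy's method for processing many texts as a stream in batches.

```python
from typing import Any, Iterable, List


def lemmatise_texts(nlp: Any, texts: Iterable[str], batch_size: int = 50) -> List[str]:
    """Return one space-separated string of lemmas per input text.

    Hypothetical helper; `nlp` is assumed to be a loaded spaCy pipeline,
    e.g. spacy.load("en_core_web_sm").
    """
    results = []
    # nlp.pipe processes the texts in batches and yields one Doc per text
    for doc in nlp.pipe(texts, batch_size=batch_size):
        results.append(" ".join(tok.lemma_ for tok in doc))
    return results
```

If you only need lemmas, loading the model with spacy.load("en_core_web_sm", disable=["parser", "ner"]) skips two components that the lemmatiser in the v3 English pipelines does not depend on, which can speed up processing.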

I hope you found this useful. The dependent library and version information are listed in the following sections.

I found this code snippet by searching for "Spacy built in Lemmatiser in a spacy pipeline" on kandi. You can try any such use case!

Dependent Library

spaCy by explosion

Python

26383

Version: v3.2.6

License: Permissive (MIT)

💫 Industrial-strength Natural Language Processing (NLP) in Python


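If you do not have spaCy installed, you can install it by following the link above and copying the pip install command from the spaCy page on kandi. The code also needs the en_core_web_sm model, which is installed separately, for example with: python -m spacy download en_core_web_sm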
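Before running the example, you can check that both spaCy and the en_core_web_sm model are importable. The script below is a sketch that uses only the standard library (importlib.util.find_spec); the function name missing_packages is hypothetical. It prints install commands for anything missing rather than installing it for you.

```python
import importlib.util
import sys
from typing import Iterable, List


def missing_packages(names: Iterable[str] = ("spacy", "en_core_web_sm")) -> List[str]:
    """Return the names from `names` that cannot be found on the import path."""
    # find_spec returns None for a top-level module that is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if not missing:
        print("spaCy and en_core_web_sm are installed.")
    if "spacy" in missing:
        print(f"Install spaCy with: {sys.executable} -m pip install spacy")
    if "en_core_web_sm" in missing:
        print(f"Install the model with: {sys.executable} -m spacy download en_core_web_sm")
```

Using sys.executable in the suggested commands makes sure the packages are installed for the same Python interpreter that runs the check.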

You can search for any dependent library, such as spaCy, on kandi.

Environment Test

I tested this solution with the following versions. Be mindful of changes when working with other versions.

The solution was created in Python 3.7.15.
The solution was tested on spaCy 3.4.3.
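Because the solution was tested on specific versions, a quick comparison can flag when your environment differs. This is a minimal sketch; the helper names are hypothetical, treating a shared major.minor version as "close enough" is an assumption rather than a spaCy compatibility rule, and on Python 3.7, where importlib.metadata is not available, it needs the importlib_metadata backport.

```python
import platform
from typing import Optional

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7 needs the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version

TESTED_PYTHON = "3.7.15"
TESTED_SPACY = "3.4.3"


def same_minor(installed: str, tested: str) -> bool:
    """True if two version strings share their major and minor components (assumed heuristic)."""
    return installed.split(".")[:2] == tested.split(".")[:2]


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    checks = [("Python", platform.python_version(), TESTED_PYTHON),
              ("spaCy", installed_version("spacy"), TESTED_SPACY)]
    for name, found, tested in checks:
        if found is None:
            print(f"{name}: not installed (tested with {tested})")
        elif same_minor(found, tested):
            print(f"{name} {found}: same major.minor as tested {tested}")
        else:
            print(f"{name} {found}: differs from tested {tested}, check for changes")
```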

With this solution, we can lemmatise text in Python using spaCy in a few simple steps. It also provides an easy-to-use, hands-on working example that can serve as a starting point for building a lemmatiser in Python.

Support

For any support on kandi solution kits, please use the chat.
For further learning resources, visit the Open Weaver Community learning page.


Open Weaver – Develop Applications Faster with Open Source
