Stop words: NLP

by akshara

Stop words are the very common words of everyday language. In English they include "a", "an", "the", "is", "are", etc. In natural language processing (NLP), stop words are removed from text because they are the most frequently used words and, on their own, do not provide any contextual information.
This removal is part of the pre-processing of the data.

Is removal of stop words needed?

Yes, in most cases. Removing stop words strips out the low-information words in the text, which brings more focus to the important information. The dataset becomes smaller, so the time taken for training is lower. It can also improve the performance and accuracy of the whole system.

Multi-lingual stop words removal

Stop words can be removed as part of text pre-processing for better text classification. Removing them can also help in faster and more relevant retrieval of data. Stop words can be removed in different languages by using a library specific to that particular language.
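The removal step described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular library: the `STOP_WORDS` set and the `remove_stop_words` helper are hypothetical names, and the word list is a tiny hand-picked sample rather than a real stop-word list.

```python
import re

# Tiny illustrative stop-word list (an assumption for this sketch;
# the lists shipped by real libraries are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [tok for tok in tokens if tok not in stop_words]


print(remove_stop_words("The cat is sitting on a mat in the garden"))
# ['cat', 'sitting', 'mat', 'garden']
```

Ten input words shrink to four, which is the reduction in dataset size mentioned above. A real pipeline would usually take its word list from a stop-word library instead of hard-coding it.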

Language Based Libraries for Stop Words

Stopwords-iso is the most comprehensive collection of stop words for multiple languages. The rest of the libraries are each specific to a single language, such as English, Bangla, Ukrainian, Chinese, Turkish, or Japanese.
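To illustrate the idea of picking a stop-word list per language, here is a small self-contained sketch. It does not call Stopwords-iso or any other library named here. The `STOP_WORDS_BY_LANG` table and the `remove_stop_words_for` helper are hypothetical, and the word lists are short hand-written samples keyed by language code.

```python
import re

# Hypothetical mini-lists keyed by language code. These stand in for the
# much larger lists a multi-language stop-word collection would provide.
STOP_WORDS_BY_LANG = {
    "en": {"a", "an", "the", "is", "are", "and"},
    "tr": {"bu", "bir", "ve", "için", "de", "da"},
}


def remove_stop_words_for(text, lang):
    """Drop the stop words of the given language from whitespace-separated text."""
    if lang not in STOP_WORDS_BY_LANG:
        raise ValueError(f"no stop-word list for language {lang!r}")
    stop_words = STOP_WORDS_BY_LANG[lang]
    # \w+ matches Unicode letters in Python 3, so accented words stay intact.
    tokens = re.findall(r"\w+", text.lower())
    return [tok for tok in tokens if tok not in stop_words]


print(remove_stop_words_for("The book and the pen are new", "en"))
# ['book', 'pen', 'new']
print(remove_stop_words_for("bu bir kitap ve kalem", "tr"))
# ['kitap', 'kalem']
```

This simple tokenizer relies on spaces between words, so languages written without them, such as Chinese and Japanese, would need a word segmenter before the stop-word filter is applied.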

The library "stopwords" is used for the English language.

  • © 2022 Open Weaver Inc.