
Stop words: NLP


by akshara, Updated: Jun 17, 2022


๐•Š๐•ฅ๐• ๐•ก ๐•Ž๐• ๐•ฃ๐••๐•ค refer to those sets of words that are used in everyday language. Stop words in English comprises of "a", "an", "the", "is", "are", etc. Stop words are used in the field of Natural language processing to eliminate those words that are most commonly used and they alone, do not provide any contextual information.

This removal step is part of the pre-processing of the data.

Is removal of stop words needed?

Yes. Removing stop words strips low-information words from the text, which brings more focus to the important information. The dataset gets smaller, so training takes less time, and this can improve the performance and accuracy of the whole system.

Multi-lingual stop words removal

Stop words can be removed as part of text pre-processing to improve text classification. Removal can also make data retrieval faster and more relevant. Stop words can be removed in different languages by using a stop word library specific to each language.
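As a minimal sketch of the removal step described above, the following Python snippet filters a tiny, hand-picked English stop word set out of a sentence. The names STOP_WORDS and remove_stop_words are hypothetical and exist only for this illustration; a real project would load a full list from a dedicated stop word library.

```python
import re

# Illustrative only: a tiny hand-picked set, not a complete stop word list.
STOP_WORDS = {"a", "an", "the", "is", "are"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in stop_words]


print(remove_stop_words("The cat is an animal"))  # ['cat', 'animal']
```

Only the content-bearing tokens survive, which is why the pre-processed dataset ends up smaller than the raw text.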

Language Based Libraries for Stop Words

stopwords-iso is the most comprehensive collection of stop words for multiple languages. Each of the other libraries covers a single language, such as English, Bangla, Ukrainian, Chinese, Turkish, or Japanese.
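To illustrate per-language removal, here is a sketch that keeps one stop word set per language code and applies the set matching the text's language. The mapping STOP_WORDS_BY_LANG, its three-word samples, and the function name are assumptions made for this example; a multi-language collection would supply the real, much longer lists.

```python
# Illustrative samples keyed by language code; real lists are far longer.
STOP_WORDS_BY_LANG = {
    "en": {"the", "is", "a"},
    "de": {"der", "ist", "ein"},
    "tr": {"ve", "bir", "bu"},
}


def remove_stop_words_for_lang(tokens, lang):
    """Drop the tokens that are stop words in the given language."""
    if lang not in STOP_WORDS_BY_LANG:
        raise ValueError(f"no stop word list for language {lang!r}")
    stop_words = STOP_WORDS_BY_LANG[lang]
    return [token for token in tokens if token.lower() not in stop_words]


print(remove_stop_words_for_lang(["Der", "Hund", "ist", "ein", "Tier"], "de"))
print(remove_stop_words_for_lang(["Bu", "bir", "kitap"], "tr"))
```

Raising an error for an unknown language code, rather than silently returning the tokens unchanged, makes a missing list visible early in the pipeline.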

stopwords-iso by stopwords-iso

JavaScript, 267 stars, Version: Current
License: Permissive (MIT)

All languages stopwords collection

stopwords by igorbrigadir

Python, 251 stars, Version: v1.1
License: No License

Default English stopword lists from many different sources

persian-stopwords by kharazi

Python, 142 stars, Version: Current
License: Strong Copyleft (GPL-3.0)

Persian (Farsi) Stop Words List

vietnamese-stopwords by stopwords

JavaScript, 103 stars, Version: Current
License: No License

Vietnamese stopwords

Ukrainian-Stopwords by skupriienko

Python, 14 stars, Version: Current
License: Strong Copyleft (CC-BY-SA-4.0)

A list of ~2000 Ukrainian stopwords (with numbers)

Turkish-Stopwords by tkorkunckaya

PHP, 14 stars, Version: Current
License: Strong Copyleft (GPL-3.0)

A full and updated Turkish stop words list, which should be filtered out prior to, or after, processing of natural language data, full text search or data indexing.

Chinese-StopWords by baipengyan

Python, 7 stars, Version: Current
License: No License

Commonly used Chinese stop words (including the word lists from Baidu, Harbin Institute of Technology, Sichuan University, and others)

Bangla-Stopwords by asifabdullah-git

Python, 0 stars, Version: Current
License: Permissive (MIT)

japanese-stopwords by stopwords

JavaScript, 3 stars, Version: Current
License: Permissive (MIT)

Japanese stopwords, available for npm, bower, plaintext.

Note: the library "stopwords" is used for the English language.
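English stop word lists are often distributed as plain text with one word per line. Assuming that layout (it is an assumption here, not a documented format of any particular library), the sketch below parses two such lists and merges them into a single set. The sample strings and the helper parse_stop_word_list are made up for illustration.

```python
def parse_stop_word_list(text):
    """Return the set of non-empty, lowercased lines in a word-per-line list."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


# Made-up sample contents standing in for two downloaded list files.
list_a = "a\nan\nthe\n"
list_b = "the\nis\n\nare\n"

# The set union removes words that appear in both sources.
merged = parse_stop_word_list(list_a) | parse_stop_word_list(list_b)
print(sorted(merged))  # ['a', 'an', 'are', 'is', 'the']
```

Merging lists from several sources gives broader coverage, at the cost of possibly removing words that matter for a specific task.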
