spark-nlp | State of the Art Natural Language Processing library

 by JohnSnowLabs | Scala | Version: 5.4.0rc2 | License: Apache-2.0

kandi X-RAY | spark-nlp Summary

spark-nlp is a Scala library typically used in Artificial Intelligence, Natural Language Processing, PyTorch, TensorFlow, BERT, Neural Network, Transformer, and Spark applications. spark-nlp has no bugs, it has no vulnerabilities, it has a Permissive License, and it has medium support. You can download it from GitHub.

Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. Spark NLP comes with 4000+ pretrained pipelines and models in more than 200 languages. It also offers tasks such as Tokenization, Word Segmentation, Part-of-Speech Tagging, Word and Sentence Embeddings, Named Entity Recognition, Dependency Parsing, Spell Checking, Text Classification, Sentiment Analysis, Token Classification, Machine Translation (180+ languages), Summarization & Question Answering, Text Generation, and many more NLP tasks. Spark NLP is the only open-source NLP library in production that offers state-of-the-art transformers such as BERT, ALBERT, ELECTRA, XLNet, DistilBERT, RoBERTa, XLM-RoBERTa, Longformer, ELMo, Universal Sentence Encoder, Google T5, MarianMT, and GPT-2 not only to Python and R, but also to the JVM ecosystem (Java, Scala, and Kotlin) at scale by extending Apache Spark natively.
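
As a quick, hedged sketch of what such an annotation pipeline looks like in PySpark (the DataFrame and column names are illustrative; DocumentAssembler and Tokenizer are standard spark-nlp annotators):

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer
from pyspark.ml import Pipeline

# Start a Spark session with the spark-nlp jar attached.
spark = sparknlp.start()

# Convert raw text into spark-nlp's internal document annotation.
document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")

# Split each document into tokens.
tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")

pipeline = Pipeline(stages=[document_assembler, tokenizer])

df = spark.createDataFrame([("Spark NLP annotates text at scale.",)], ["text"])
model = pipeline.fit(df)
model.transform(df).select("token.result").show(truncate=False)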

            Support

              spark-nlp has a medium-activity ecosystem.
              It has 3,279 stars and 661 forks, and there are 94 watchers for this library.
              There were 2 major releases in the last 6 months.
              There are 31 open issues, and 758 have been closed. On average, issues are closed in 99 days. There are 6 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of spark-nlp is 5.4.0rc2.

            Quality

              spark-nlp has no bugs reported.

            Security

              spark-nlp has no reported vulnerabilities, and neither do its dependent libraries.

            License

              spark-nlp is licensed under the Apache-2.0 License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              spark-nlp releases are available to install and integrate.
              Installation instructions, examples and code snippets are available.


            spark-nlp Key Features

            No Key Features are available at this moment for spark-nlp.

            spark-nlp Examples and Code Snippets

            AWS Lambda function not able to find other packages in same directory
            Python · 4 lines of code · License: Strong Copyleft (CC BY-SA 4.0)
            # Copies the contents of core/ into the task root, leaving no core/ package to import:
            COPY core ${LAMBDA_TASK_ROOT}

            # Copies core/ as its own directory, so that `import core` resolves at runtime:
            COPY core ${LAMBDA_TASK_ROOT}/core
            
            How to set a tag at the experiment level in MLFlow
            Python · 7 lines of code · License: Strong Copyleft (CC BY-SA 4.0)
            from mlflow.tracking import MlflowClient
            
            # Create an experiment with a name that is unique and case sensitive.
            client = MlflowClient()
            experiment_id = client.create_experiment("Social NLP Experiments")
            client.set_experiment_tag(experiment_id, "nlp.framework", "Spark NLP")
            Regex in Spark NLP Normalizer is not working correctly
            Python · 2 lines of code · License: Strong Copyleft (CC BY-SA 4.0)
            "(?U)[^\w -]|_|-(?!\w)|(?
            NLP analysis for some pyspark dataframe columns by numpy vectorization
            Python · 25 lines of code · License: Strong Copyleft (CC BY-SA 4.0)
            from sparknlp.pretrained import PretrainedPipeline
            
            df = spark.sql('select year, month, u_id, p_id, comment from MY_DF where rating_score = 1 and isnull(comment) = false')
            
            df1 = df.withColumnRenamed('comment', 'text')
            
            pipeline_dl = Pretr

            Community Discussions

            QUESTION

            Trying to use a JohnSnowLabs pretrained pipeline on a Spark dataframe, but unable to read a Delta file in the same session
            Asked 2022-Apr-03 at 08:47

            I am using the below code to read the Spark dataframe from HDFS:

            ...

            ANSWER

            Answered 2022-Apr-03 at 08:47

            The configure_spark_with_delta_pip function is just a shortcut to set up the correct parameters of the SparkSession. If you look into its source code, you'll see that all it does is configure the spark.jars.packages property. Because you're setting that property separately for Spark NLP, you're overwriting Delta's value. To handle such situations, configure_spark_with_delta_pip has an additional parameter, extra_packages, for specifying additional packages to be configured. So in your case the code should look like the following:
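
            A hedged sketch of that fix (the app name and the Spark NLP Maven coordinate are illustrative; extra_packages is the delta-spark parameter the answer describes):

            from pyspark.sql import SparkSession
            from delta import configure_spark_with_delta_pip

            builder = (
                SparkSession.builder.appName("delta-plus-sparknlp")
                .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
                .config("spark.sql.catalog.spark_catalog",
                        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
            )

            # extra_packages appends to spark.jars.packages instead of overwriting it,
            # so both the Delta and Spark NLP jars land on the classpath.
            spark = configure_spark_with_delta_pip(
                builder, extra_packages=["com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0"]
            ).getOrCreate()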

            Source https://stackoverflow.com/questions/71723324

            QUESTION

            AWS Lambda function not able to find other packages in same directory
            Asked 2022-Mar-14 at 08:57

            I am deploying a lambda function as a container image. Here's my project structure:

            • core
            • plugins
            • lambda_handler.py

            All three are at the same level: /var/task.

            Inside lambda_handler.py I am importing the core package, but when I test it locally it says:

            ...

            ANSWER

            Answered 2022-Mar-14 at 08:57
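
            Based on the snippet shown earlier on this page, the fix is to copy the package into a matching subdirectory, i.e. COPY core ${LAMBDA_TASK_ROOT}/core rather than COPY core ${LAMBDA_TASK_ROOT}, so that core survives in the image as an importable package.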

            QUESTION

            TensorFlowException: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /mnt/yarn/usercache
            Asked 2021-Nov-18 at 09:23

            I am trying to run the "onto_electra_base_uncased" model on some data stored in a Hive table. I ran count() on the DataFrame before saving the data into the Hive table and got this exception.

            Spark Shell launch configurations:

            ...

            ANSWER

            Answered 2021-Nov-18 at 09:23

            The solution to this issue is to use Kryo serialization. The default spark-shell or spark-submit invocation uses Java serialization, but the Annotate class in spark-nlp is implemented to use Kryo serialization, so the same should be used when running any spark-nlp jobs.
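
            A hedged sketch of enabling Kryo when building the session yourself (the app name is illustrative; the buffer size mirrors Spark NLP's documented recommendation):

            from pyspark.sql import SparkSession

            spark = (
                SparkSession.builder.appName("sparknlp-kryo")
                # Use Kryo instead of the default Java serialization, as the answer advises.
                .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                # A generous buffer, since serialized annotations can be large.
                .config("spark.kryoserializer.buffer.max", "2000M")
                .getOrCreate()
            )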

            Source https://stackoverflow.com/questions/68998112

            QUESTION

            SparkNLP's NerCrfApproach with custom labels
            Asked 2021-Oct-14 at 06:26

            I am trying to train a SparkNLP NerCrfApproach model with a dataset in CoNLL format that has custom labels for product entities (like I-Prod, B-Prod etc.). However, when using the trained model to make predictions, I get only "O" as the assigned label for all tokens. When using the same model trained on the CoNLL data from the SparkNLP workshop example, the classification works fine. (cf. https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/jupyter/training/english/crf-ner)

            So, the question is: Does NerCrfApproach rely on the standard tag set for NER labels used by the CoNLL data? Or can I use it for any custom labels and, if yes, do I need to specify these somehow? My assumption was that the labels are inferred from the training data.

            Cheers, Martin

            Update: The issue might not be related to the labels after all. I tried to replace my custom labels with CoNLL standard labels and I am still not getting the expected classification results.

            ...

            ANSWER

            Answered 2021-Oct-14 at 06:26

            As it turns out, this issue was not caused by the labels, but rather by the size of the dataset. I was using a rather small dataset for development purposes. Not only was this dataset quite small, but also heavily imbalanced, with a lot more "O" labels than the other labels. Fixing this by using a dataset of 10x the original size (in terms of sentences), I am able to get meaningful results, even for my custom labels.

            Source https://stackoverflow.com/questions/69551405

            QUESTION

            Glue job failed with `JohnSnowLabs spark-nlp dependency not found` error randomly
            Asked 2021-May-10 at 09:20

            I'm using AWS Glue to run some PySpark Python code. Sometimes it succeeds, but sometimes it fails with a dependency error: Resource Setup Error: Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: JohnSnowLabs#spark-nlp;2.5.4: not found]. Here are the error logs:

            ...

            ANSWER

            Answered 2021-May-07 at 20:28

            The spark-packages repository moved on May 1, 2021. In my Scala project I had to add a different resolver, like so (it should be similar in Java):
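
            The resolver snippet itself is elided above; a hedged PySpark equivalent, assuming the packages now live at repos.spark-packages.org (the coordinate is the one from the question):

            from pyspark.sql import SparkSession

            spark = (
                SparkSession.builder.appName("glue-sparknlp")
                # Point dependency resolution at the relocated spark-packages repository.
                .config("spark.jars.repositories", "https://repos.spark-packages.org/")
                .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.5.4")
                .getOrCreate()
            )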

            Source https://stackoverflow.com/questions/67414623

            QUESTION

            java.lang.ClassNotFoundException: com.johnsnowlabs.nlp.DocumentAssembler spark in Pycharm with conda env
            Asked 2021-Mar-17 at 15:29

            I saved a pretrained model from spark-nlp; now I'm trying to run a Python script in PyCharm with an Anaconda env:

            ...

            ANSWER

            Answered 2021-Mar-17 at 15:29

            Some context first: the spark-nlp library depends on a jar file that needs to be present in the Spark classpath. There are three ways to provide this jar, depending on how you start the context in PySpark:

            a) When you start your Python app through the interpreter, you call sparknlp.start() and the jar is downloaded automatically.

            b) You pass the jar to the pyspark command using the --jars switch. In this case you take the jar from the releases page and download it manually.

            c) You start pyspark and pass --packages; here you need to pass a Maven coordinate, for example:
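
            The coordinate in the original answer is elided; a hedged sketch of option (a), with an illustrative Scala 2.12 coordinate for option (c) in the trailing comment:

            import sparknlp

            # Option (a): builds a SparkSession and pulls in the matching spark-nlp jar.
            spark = sparknlp.start()
            print(sparknlp.version(), spark.version)

            # Option (c) from the shell would look like this (coordinate illustrative):
            #   pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.4.2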

            Source https://stackoverflow.com/questions/66064423

            QUESTION

            spark-nlp 'JavaPackage' object is not callable
            Asked 2020-Dec-24 at 10:18

            I am using JupyterLab to run spark-nlp text analysis. At the moment I am just running the sample code:

            ...

            ANSWER

            Answered 2020-Dec-24 at 10:18

            Remove Spark 3.0.1 and keep just PySpark 2.4.x, as Spark NLP did not yet support Spark 3.x. Also use Java 8 instead of Java 11, because Java 11 is not supported by Spark 2.4.
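
            A hedged sanity check for that environment (the exact 2.4.x pin is illustrative):

            #   pip uninstall pyspark
            #   pip install pyspark==2.4.7 spark-nlp
            import pyspark

            # "'JavaPackage' object is not callable" usually means the spark-nlp jar never
            # loaded, which happens when the Spark or Java version is unsupported.
            assert pyspark.__version__.startswith("2.4"), pyspark.__version__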

            Source https://stackoverflow.com/questions/65430871

            QUESTION

            How to install offline Spark NLP packages
            Asked 2020-Aug-26 at 14:41

            How can I install Spark NLP packages offline, without an internet connection? I've downloaded the package (recognize_entities_dl) and uploaded it to the cluster.

            I've installed Spark NLP using pip install spark-nlp==2.5.5. I'm using PySpark, and from the cluster I'm unable to download the packages.

            Already tried;

            ...

            ANSWER

            Answered 2020-Aug-26 at 14:41

            Looking at your error:
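
            As a general offline approach (a hedged sketch; the paths are illustrative and must be reachable from the executors), a pretrained pipeline that has been downloaded and unpacked can be loaded with Spark's own PipelineModel.load instead of being fetched by name:

            from pyspark.sql import SparkSession
            from pyspark.ml import PipelineModel

            spark = SparkSession.builder.getOrCreate()  # spark-nlp jar must already be on the classpath
            df = spark.createDataFrame([("Google has announced a new office.",)], ["text"])

            # Load the unpacked pipeline directory rather than downloading it.
            pipeline_model = PipelineModel.load("hdfs:///models/recognize_entities_dl_en")
            pipeline_model.transform(df).show()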

            Source https://stackoverflow.com/questions/63446312

            QUESTION

            object johnsnowlabs is not a member of package com
            Asked 2020-Jul-28 at 12:40

            I am very new to Zeppelin/Spark and couldn't find an accurate description of the steps needed to configure new dependencies such as the NLP libraries. I found a similar issue here.

            I was trying to use the JohnSnowLabs NLP library in a Zeppelin notebook (Spark version 2.2.1). The setup included:

            1. In Zeppelin's interpreter configuration for Spark, include the following artifact: com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.4
            2. Then, in conf/zeppelin-env.sh, set SPARK_SUBMIT_OPTIONS: export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:2.2.2", and restart Zeppelin.

            But the below program gives the error :

            ...

            ANSWER

            Answered 2020-Jul-28 at 12:40

            You don't need to edit conf/zeppelin-env.sh (in any case you're using it incorrectly, since you're specifying a completely different version there); you can make all the changes via the Zeppelin UI. Go to the Spark interpreter configuration and put com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.4 into the spark.jars.packages configuration property (or add it if it doesn't exist), and also into the Dependencies at the end of the configuration (for some reason, it isn't automatically pulled into the driver classpath).

            Source https://stackoverflow.com/questions/63132135

            QUESTION

            Scala Spark: Multiple sources found for json
            Asked 2020-Jul-08 at 21:40

            I'm getting an exception when executing spark2-submit on my Hadoop cluster while reading a directory of .json files in HDFS, and I have no idea how to resolve it.

            I have found some questions about this on several boards, but none of them are popular or have an answer.

            I tried explicitly importing org.apache.spark.sql.execution.datasources.json.JsonFileFormat, but that seems redundant alongside importing SparkSession, so it isn't getting recognised.

            I can, however, confirm that both of these classes are available.

            ...

            ANSWER

            Answered 2020-Jul-05 at 18:31

            It seems you have both Spark 2.x and 3.x jars in the classpath. According to the sbt file, Spark 2.x should be used; however, JsonFileFormat was added in Spark 3.x with this issue.

            Source https://stackoverflow.com/questions/62743053

            Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install spark-nlp

            This is a quick example of how to use a Spark NLP pretrained pipeline in Python and PySpark; see the sketch after the list of available fat jars below.
            FAT-JAR for CPU on Apache Spark 3.0.x and 3.1.x
            FAT-JAR for GPU on Apache Spark 3.0.x and 3.1.x
            FAT-JAR for CPU on Apache Spark 3.2.x
            FAT-JAR for GPU on Apache Spark 3.2.x
            FAT-JAR for CPU on Apache Spark 2.4.x
            FAT-JAR for GPU on Apache Spark 2.4.x
            FAT-JAR for CPU on Apache Spark 2.3.x
            FAT-JAR for GPU on Apache Spark 2.3.x
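
            A hedged quick-start sketch (explain_document_dl is a documented pretrained pipeline; the sample sentence is illustrative):

            import sparknlp
            from sparknlp.pretrained import PretrainedPipeline

            # Start a session with the spark-nlp jar attached.
            spark = sparknlp.start()

            # Download and load a pretrained pipeline by name.
            pipeline = PretrainedPipeline("explain_document_dl", lang="en")

            annotations = pipeline.annotate("The Mona Lisa is a 16th century oil painting.")
            print(annotations["entities"])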

            Support

            • Slack: live discussion with the Spark NLP community and the team
            • GitHub: bug reports, feature requests, and contributions
            • Discussions: engage with other community members, share ideas, and show off how you use Spark NLP!
            • Medium: Spark NLP articles
            • YouTube: Spark NLP video tutorials
            Install
          • PyPI

            pip install spark-nlp

          • CLONE
          • HTTPS

            https://github.com/JohnSnowLabs/spark-nlp.git

          • CLI

            gh repo clone JohnSnowLabs/spark-nlp

          • sshUrl

            git@github.com:JohnSnowLabs/spark-nlp.git


            Consider Popular Natural Language Processing Libraries

            • transformers by huggingface
            • funNLP by fighting41love
            • bert by google-research
            • jieba by fxsjy
            • Python by geekcomputers

            Try Top Libraries by JohnSnowLabs

            • spark-nlp-workshop by JohnSnowLabs (Jupyter Notebook)
            • nlu by JohnSnowLabs (Python)
            • nlptest by JohnSnowLabs (Python)
            • spark-ocr-workshop by JohnSnowLabs (Jupyter Notebook)
            • johnsnowlabs by JohnSnowLabs (Python)