coreNlp | Extensions for and tools to work with CoreNlp | Natural Language Processing library
kandi X-RAY | coreNlp Summary
CoreNLP doesn't have stopword identification built in, so I wrote an extension to its analytics pipeline (its pipeline components are called Annotators) to check whether a token's word and lemma values are stopwords. By default, the StopwordAnnotator uses the built-in Lucene stopword list, but you have the option to pass in a custom list of stopwords for it to use instead. You can also specify whether the StopwordAnnotator should check the lemma of the token against the stopword list. For examples of how to use the StopwordAnnotator, take a look at StopwordAnnotatorTest.java.
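As a rough sketch of how the annotator is wired into a pipeline via properties (the registration key, package path, and property names below are taken from this repo's conventions and may differ in your checkout — confirm against StopwordAnnotatorTest.java):

```properties
# assumption: the annotator class lives at intoxicant.analytics.coreNlp.StopwordAnnotator
customAnnotatorClass.stopword = intoxicant.analytics.coreNlp.StopwordAnnotator
annotators = tokenize, ssplit, stopword

# optional: a comma-separated custom stopword list instead of the Lucene default
stopword-list = the,a,an,of
# optional: also check each token's lemma against the list
check-lemma = true
```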
Support
Quality
Security
License
Reuse
Top functions reviewed by kandi - BETA
- Creates a StanfordNLP analyzer object
- Generate the properties for NLP
- Creates an NLP analyzer object with the configured properties
- Require annotations
- Add stopwords annotation
- Gets the stop word list from a string
- Return the set of stopwordAnnotation
coreNlp Key Features
coreNlp Examples and Code Snippets
Community Discussions
Trending Discussions on coreNlp
QUESTION
I am processing some documents and I am getting many WORKDAY messages, as seen below. There's a similar issue posted here for WEEKDAY. Does anyone know how to deal with this message? I am running CoreNLP in a Java server on Windows and accessing it using a Jupyter Notebook and Python code.
...ANSWER
Answered 2021-Nov-20 at 19:28
This is an error in the current SUTime rules file (and it's actually been there for quite a few versions). If you want to fix it immediately, you can do the following; or we'll fix it in the next release. These are Unix commands, but the same thing will work elsewhere except for how you refer to and create folders.
Find this line in sutime/english.sutime.txt:

{ (/workday|work day|business hours/) => WORKDAY }

and delete it. Save the file.
Then move the file to the right location for replacing it in the jar file, and then replace it in the jar file. In the root directory of the CoreNLP distribution, do the following (assuming you don't already have an edu file or folder in that directory):
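The specific commands were truncated here. A sketch of what they would look like, assuming the rules file sits at edu/stanford/nlp/models/sutime/english.sutime.txt inside the models jar and the jar is named stanford-corenlp-4.x.y-models.jar (both names are assumptions — check the real entry path with jar tf):

```shell
# stage the edited file at the same relative path it has inside the jar
mkdir -p edu/stanford/nlp/models/sutime
cp sutime/english.sutime.txt edu/stanford/nlp/models/sutime/

# jar uf updates (replaces) the matching entry inside the jar
jar uf stanford-corenlp-4.x.y-models.jar edu/stanford/nlp/models/sutime/english.sutime.txt
```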
QUESTION
I am trying to run a RUTA script with an analysis pipeline.
I add my script to the pipeline like so: createEngineDescription(RutaEngine.class, RutaEngine.PARAM_MAIN_SCRIPT, "mypath/myScript.ruta")
My ruta script file contains this:
...ANSWER
Answered 2021-Aug-15 at 10:09
I solved the problem. This error was being thrown simply because the script could not be found, and I had to change this line from RutaEngine.PARAM_MAIN_SCRIPT, "myscript.ruta" to RutaEngine.PARAM_MAIN_SCRIPT, "myscript".
However, I did a few other things before this that may have contributed to the solution so I am listing them here:
- I added the ruta nature to my eclipse project
- I moved the myscript from resources to a script package
QUESTION
I have a Python program where I am using os.sys to train the Stanford NER from the command line. This returns an output/training status which I save in the variable "status", and it is usually 0. However, I just ran it and got an output of 256, as well as not creating a file for the trained model. This error is only occurring for larger sets of training data. I searched through the documentation on the Stanford NLP website and there doesn't seem to be info on the meanings of the outputs or why increasing training data might affect the training. Thanks in advance for any help and problem code is below.
...ANSWER
Answered 2021-Jul-28 at 05:35
Status is an exit code, and non-zero exit codes mean your program failed. This is not a Stanford NLP convention; it's how all programs work on Unix/Linux.
There should be an error somewhere, maybe you ran out of memory? You'll have to track that down to find out what's wrong.
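To illustrate the 256 (a minimal sketch, not the actual NER training command): on Unix, os.system() returns the raw wait status, in which the child's exit code sits in the high byte, so a status of 256 corresponds to exit code 1.

```python
import os
import sys

# Run a child process that exits with code 1 (standing in for a failed
# training run); os.system() returns the raw wait status on Unix.
status = os.system(f'"{sys.executable}" -c "import sys; sys.exit(1)"')

# Decode the wait status into the conventional exit code (Python 3.9+);
# on older versions, use status >> 8 on Unix.
exit_code = os.waitstatus_to_exitcode(status)
print(status, exit_code)  # on Unix: 256 1
```

In practice, subprocess.run(cmd, capture_output=True) gives direct access to returncode and stderr, which makes the underlying Java error (for example, an OutOfMemoryError from a large training set) visible instead of silently lost.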
QUESTION
I am trying to write a standalone Java application in IntelliJ using edu.stanford.nlp.trees.GrammaticalStructure. Therefore, I have imported the module:
...ANSWER
Answered 2021-Mar-11 at 20:04
If you want to use it, that means you want to execute the code in it. How is the runtime supposed to execute code that it does not have? How is the compiler supposed to know how the code is defined (e.g. what the classes look like)? This is simply impossible. If you want to use the code, you have to provide it to the compiler as well as the runtime.
If you just don't want to include all of that code in your application, you either need access to the sources so you can pick just the class you need, or you need some kind of JAR minimizer, as @CrazyCoder suggested.
QUESTION
I'm dealing with German law documents and would like to generate parse trees for sentences. I was able to find and use the Stanford CoreNLP Parser. However, it does not recognize sentence boundaries as well as other tools (e.g. spaCy) when parsing the sentences of a document. For example, it breaks sentences at every single '.' character, including the dot at the end of abbreviations such as "incl.". Since it is crucial to cover the whole sentence when creating syntax trees, this does not really work for me.
I would appreciate any suggestions for tackling this problem, especially pointers to other software that might be better suited to my problem. If I have overlooked a way to tweak the Stanford parser, I would be very grateful for any hints on how to make it detect sentence boundaries better.
...ANSWER
Answered 2021-Feb-19 at 07:27
A quick glance into the docs did the trick: you can run your pipeline, which might include the sentence splitter, with the property
ssplit.isOneSentence = true
to basically disable it. This means you can split the sentences beforehand, e.g. using spaCy, and then feed single sentences into the pipeline.
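For instance, a pipeline properties fragment along these lines (the annotator list is illustrative):

```properties
annotators = tokenize, ssplit, pos, parse
# treat each input text as exactly one sentence; do your own splitting upstream
ssplit.isOneSentence = true
```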
QUESTION
I am looking for a way to extract and merge annotation results from CoreNLP. To specify,
...ANSWER
Answered 2021-Jan-07 at 22:46
The coref chains have a sentenceIndex and a beginIndex which should correspond to the position in the sentence. You can use these to correlate the two.
Edit: quick and dirty change to your example code:
QUESTION
To be honest, Java is a mystery to me. I just started learning Python a few days ago, but now I need to use Stanford CoreNLP, which needs Java (GOD!!!!!)
When I import Stanford CoreNlp in CMD, it always shows "Error: can't find or load main class ... Reason: java.lang.ClassNotFoundException: ..."
But in fact, I have already made some changes in the environment (though they may not be correct).
It may be an error in the setting of the environment path, but I really don't know how to solve it...
...ANSWER
Answered 2021-Jan-06 at 09:53
You are facing a classpath issue.
From your screenshot, the current working directory is C:\Users(Name), which does not contain the code of SCNLP.
From Command Line Usage page, the minimal command to run Stanford CoreNLP from the command line is:
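The command itself was cut off here; per the Command Line Usage documentation it is approximately the following (memory flag and input filename are illustrative):

```shell
# run from inside the unpacked CoreNLP distribution directory,
# so that "*" on the classpath picks up all the distribution jars
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -file input.txt
```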
QUESTION
I am trying to perform a Semgrex query on https://corenlp.run/ on the sentence below to extract the transition event. Since the dependency relation "obl:from" has a colon in it, I get an error. But if I use nsubj instead, I get the desired result. Can someone tell me how to work around this?
My text: The automobile shall change states from OFF to ON when the driver is in control.
...ANSWER
Answered 2020-Dec-19 at 09:46
Found the answer: just wrap the expression within / / and it works!
For example:
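That is, a relation name containing a colon is written as a regular expression between slashes; a minimal pattern of that shape (the nodes here are deliberately unconstrained placeholders) would be:

```
{} >/obl:from/ {}
```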
QUESTION
I am running the StanfordCoreNLP server through my docker container. Now I want to access it through my python script.
Github repo I'm trying to run: https://github.com/swisscom/ai-research-keyphrase-extraction
I ran the command which gave me the following output:
...ANSWER
Answered 2020-Oct-07 at 08:08
As seen in the log, your service is listening on port 9000 inside the container. However, from outside you need further information to be able to access it. You need two pieces of information:
- The IP address of the container
- The external port to which Docker maps this internal port 9000 (by default, Docker does not expose a container's open ports to the outside).
To get the IP address you need to use docker inspect, for example via
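For example (the container and image names below are illustrative):

```shell
# print the container's internal IP address
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' corenlp

# alternatively, publish the port when starting the container, so that
# localhost:9000 on the host reaches the service inside it
docker run -d -p 9000:9000 my-corenlp-image
```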
QUESTION
I am relatively new to NLP and at the moment I'm trying to extract different phrase structures in German texts. For that I'm using the Stanford CoreNLP implementation in stanza with the tregex feature for pattern matching in trees.
So far I didn't have any problems, and I was able to match simple patterns like "NPs" or "S > CS". Now I'm trying to match S nodes that are immediately dominated either by ROOT or by a CS node that is immediately dominated by ROOT. For that I'm using the pattern "S > (CS > TOP) | > TOP". But it seems that it doesn't work properly. I'm using the following code:
...ANSWER
Answered 2020-Sep-22 at 22:01
A few comments:
1.) Assuming you are using a recent version of CoreNLP (4.0.0+), you need to use the mwt annotator with German. So your annotators list should be tokenize,ssplit,mwt,pos,parse
2.) Here is your sentence in PTB for clarity:
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install coreNlp
You can use coreNlp like any standard Java library. Please include the jar files in your classpath. You can also use any IDE, and you can run and debug the coreNlp component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org. For Gradle installation, please refer to gradle.org.
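If you manage the underlying Stanford CoreNLP dependency itself with Maven, its upstream coordinates look like this (the version shown is illustrative; this extensions library's own jars still need to be added to the classpath separately):

```xml
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>4.5.4</version>
</dependency>
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>4.5.4</version>
  <classifier>models</classifier>
</dependency>
```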