jcrfsuite | Java interface for CRFsuite http | Natural Language Processing library
kandi X-RAY | jcrfsuite Summary
This is a Java interface for crfsuite, a fast implementation of Conditional Random Fields, built with SWIG and the class-injection technique (the same technique used in snappy-java). Jcrfsuite provides an API for loading a trained model into memory and performing sequential tagging in memory. Model training is done via the command-line interface. The library is designed for building Java applications that need fast sequential tagging of text, such as Part-Of-Speech (POS) tagging, phrase chunking, and Named-Entity Recognition (NER). Jcrfsuite can be dropped into any Java web application and works with the JVM's class loader without problems.
Top functions reviewed by kandi - BETA
- Loads CrfSuiteNativeNative and creates a CrfSuiteNativeNative implementation
- Computes the MD5 hash of the given input stream
- Extract a native library file to the target folder
- Injects CrfSuiteNativeLoader class loader
- Finds the native library
- Returns the version of the crfsuite
- Returns the byte code from classpath
- Loads native library
- Checks if native library is already loaded
- Main train method
- Train CRF Suite with an item sequence
- Load data in CRF Suite format
- Prints the path of the library
- Translates an operating name to a folder name
- Tags a model
- Tag an item sequence
- Gets a list of possible labels for this tagger
- Load system properties
jcrfsuite Examples and Code Snippets
import com.github.jcrfsuite.CrfTrainer;
...
String trainFile = "data/tweet-pos/train-oct27.txt";
String modelFile = "twitter-pos.model";
CrfTrainer.train(trainFile, modelFile);
import com.github.jcrfsuite.CrfTagger;
import com.github.jcrfsuite.util.
Community Discussions
Trending Discussions on jcrfsuite
QUESTION
From what I understand of the POS tagging example in the jcrfsuite examples, the training file is tab-separated and the first token is the label. But I do not understand the BigCluster| entries. Can somebody explain how to specify tokens in the training file?
Example below:
O BigCluster|00 BigCluster|0000 BigCluster|000000 BigCluster|00000000 BigCluster|0000000000 BigCluster|000000000000 BigCluster|00000000000000 BigCluster|0000000000000000 NextBigCluster|0100 NextBigCluster|01000101 NextBigCluster|010001011111 POSTagDict|D POSTagDict|N POSTagDict|^ POSTagDict|$ POSTagDict|G NextPOSTag|V 1gramSuff|i 1gramPref|i prevword| prevcurr||i nextword|predict nextword|predict currnext|i|predict Word|I Lower|i Xxdshape|X charclass|1, first-shortcap prevnext||predict t=0
Test file format:
! BigCluster|01 BigCluster|0110 BigCluster|011011 BigCluster|01101100 BigCluster|0110110011 BigCluster|011011001100 BigCluster|01101100110000 BigCluster|0110110011000000 NextBigCluster|1000 NextBigCluster|10001000 NextBigCluster|100010000000 POSTagDict|V NextPOSTag|, metaph_POSDict|N 1gramSuff|n 2gramSuff|nn 3gramSuff|mnn 4gramSuff|mmnn 5gramSuff|mmmnn 6gramSuff|ammmnn 7gramSuff|aammmnn 8gramSuff|aaammmnn 9gramSuff|daaammmnn 1gramPref|d 2gramPref|da 3gramPref|daa 4gramPref|daaa 5gramPref|daaam 6gramPref|daaamm 7gramPref|daaammm 8gramPref|daaammmn 9gramPref|daaammmnn prevword| prevcurr||daaammmnn nextword|. nextword|. currnext|daaammmnn|. Word|Daaammmnn Lower|daaammmnn Xxdshape|Xxxxxxxxx charclass|1,2,2,2,2,2,2,2,2, first-initcap prevnext||. t=0
ANSWER
Answered 2017-Jun-05 at 12:48
What follows the label is a list of feature names and feature values. It is a sparse representation rather than a tabular one.
BigCluster is just one of the features and it's relevant to the specific example only. You should create your own features if you are training from scratch.
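As a rough illustration of creating your own features, here is a minimal sketch that writes one line per token in the same shape as the example above: the label first, then tab-separated name|value features. The class name, the chosen features, and the labels in main are hypothetical and are not part of jcrfsuite. The escaping step assumes CRFsuite's convention of reserving ':' inside an attribute for an optional weight.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of jcrfsuite: builds CRFsuite-style lines.
public class FeatureLineSketch {

    // Assumption: CRFsuite reads "name:weight", so a literal ':' or '\'
    // inside a feature has to be escaped. Backslash is handled first.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace(":", "\\:");
    }

    // One line for the token at position i: label, then sparse features.
    static String toLine(String label, String[] words, int i) {
        List<String> fields = new ArrayList<>();
        fields.add(label);
        String word = words[i];
        fields.add(escape("Word|" + word));
        fields.add(escape("Lower|" + word.toLowerCase(Locale.ROOT)));
        // Neighbouring words; the value is empty at a sentence boundary.
        fields.add(escape("prevword|" + (i > 0 ? words[i - 1] : "")));
        fields.add(escape("nextword|" + (i + 1 < words.length ? words[i + 1] : "")));
        return String.join("\t", fields);
    }

    public static void main(String[] args) {
        // Illustrative sentence and labels only.
        String[] words = {"I", "predict", "rain"};
        String[] labels = {"O", "V", "N"};
        for (int i = 0; i < words.length; i++) {
            System.out.println(toLine(labels[i], words, i));
        }
        // An empty line separates one sequence (sentence) from the next.
        System.out.println();
    }
}
```

Only the features that apply to a token need to be listed on its line, which is why the two example lines above carry different sets of features.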
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install jcrfsuite
You can use jcrfsuite like any standard Java library. Include the jar files in your classpath. You can run and debug the jcrfsuite component in any IDE, as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, refer to maven.apache.org. For Gradle installation, refer to gradle.org.