CAMeLBERT | Task Type in Arabic Pre | Natural Language Processing library
kandi X-RAY | CAMeLBERT Summary
This repo contains code for the experiments presented in our paper: The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models.
Top functions reviewed by kandi - BETA
- Train model
- Evaluate the model
- Calculates the acc and F1D
- Compute the metrics for the given task
- Loads the training examples from the given directory
- Create a list of InputExample objects
- Process Tweet
- Get dev examples
- Create input examples from text
- Load train examples from the given directory
- Get dev examples from MADAR
- Get test examples from arSAS
- Gets test examples from the tsv file
- Gets test examples from MADAR
- Loads the test examples from the data directory
- Parses the MADAR dataset
- Returns a list of dev examples
- Evaluate a model
- Write the most common prediction to a file
- Return a dict of user predictions
- Get dev examples from tsv file
- Gets training examples from the training data directory
- Reads the labels from file
- Get train examples from the given directory
- Get dev examples from a directory
- Get test examples from MADAR
Community Discussions
Trending Discussions on CAMeLBERT
QUESTION
ANSWER
Answered 2022-Mar-16 at 11:34
The script you are using loads the labels from $DATA_DIR/train.txt.
See https://github.com/CAMeL-Lab/CAMeLBERT/blob/master/token-classification/run_token_classification.py#L105 for what the model expects.
It then tries to load the label list as the first file from the corpus (even before loading the training data), see https://github.com/CAMeL-Lab/CAMeLBERT/blob/master/token-classification/run_token_classification.py#L183, and puts it into label_map.
But that fails for some reason. My assumption would be that it doesn't find anything and label_map is an empty dict, so the first attempt to get the labels from it fails with KeyError. Probably either your input data is not there or not in the path as expected (check if you have the right files and the right value for $DATA_DIR). From my experience relative paths in Google Drive can be tricky. Try something simple, like os.listdir($DATA_DIR), to check that it is actually the directory you expect it to be.
If that is not the problem, then probably something about the labels is actually wrong. Does ANERCorp use this exact way of writing labels (B-LOC etc.)? If it is different (e.g. B-Location or something), it would fail too.
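The two checks suggested above (is the data where the script looks, and do the labels match what the model expects) can be sketched in a few lines of Python. This is only a rough diagnostic, not code from the CAMeLBERT repo: the helper names collect_labels and find_unexpected_labels are hypothetical, and it assumes train.txt is CoNLL-style, with one token per line, the label in the last whitespace-separated column, and blank lines between sentences.

```python
import os
from collections import Counter


def collect_labels(path):
    """Count the labels found in a CoNLL-style file.

    Assumed format (not confirmed by the repo): one token per line with the
    label in the last whitespace-separated column; blank lines are skipped.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:  # ignore blank or single-column lines
                counts[parts[-1]] += 1
    return counts


def find_unexpected_labels(data_dir, expected_labels, filename="train.txt"):
    """Return {label: count} for labels that are not in expected_labels.

    Raises FileNotFoundError with a directory listing if the file is
    missing, which is the first thing to rule out.
    """
    path = os.path.join(data_dir, filename)
    if not os.path.isfile(path):
        if os.path.isdir(data_dir):
            listing = os.listdir(data_dir)
        else:
            listing = "nothing (directory missing)"
        raise FileNotFoundError(f"{path} not found; directory contains: {listing}")
    counts = collect_labels(path)
    return {label: n for label, n in counts.items() if label not in expected_labels}


# Example call (the label set here is illustrative, not taken from the repo):
# find_unexpected_labels(os.environ["DATA_DIR"], {"O", "B-LOC", "I-LOC"})
```

An empty result means every label in the file is in the expected set. A non-empty result such as {"B-Location": 12} would point to the label-naming mismatch described above, and a FileNotFoundError points to the path problem.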
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install CAMeLBERT
You can use CAMeLBERT like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.