newsqa | Tools for using Maluuba's NewsQA Dataset | Dataset library
kandi X-RAY | newsqa Summary
We originally compiled the dataset only to CSV, but now we also build a JSON file.
Top functions reviewed by kandi - BETA
- Pack the dataset as JSON
- Calculate the consensus answer
- Return a list of answers
- Return a dict of the training data
- Refine the answers in the given token list
- Check if two spans overlap
- Find the largest overlap among the given ranges
- Tokenize the NewsQA dataset
- Unpack a dataset
- Pack a tqdm task
- Return the length of the answer
- Get the answers from the dataset
- Load the combined questions from a file
- Get a logger
- Create a list of spans from the tagged text
- Rebase a list of spans
- Compute the average answer length for each question
- Gather questions and answers
- Load data from the NewsQA dataset
- Write data to a CSV file
- Get the number of words in a question
- Return a list of valid spans from a string
- Return a list of spans from a string
- Simplify training data
- Extract version number from path
- Convert a spanrack to a string
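The list above only names the helpers. The sketch below illustrates what span-overlap and consensus logic of this kind can look like, assuming answers are half-open (start, end) character offsets and that consensus means a strict majority of annotators picked the same span. The function names and the voting rule are hypothetical and are not taken from the repository.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Assumed representation: a half-open (start, end) pair of character offsets.
Span = Tuple[int, int]


def spans_overlap(a: Span, b: Span) -> bool:
    """Return True if half-open spans a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def overlap_length(a: Span, b: Span) -> int:
    """Number of positions shared by a and b (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def best_overlap(target: Span, candidates: List[Span]) -> Optional[Span]:
    """Return the candidate overlapping target the most, or None if none overlap."""
    best, best_len = None, 0
    for cand in candidates:
        n = overlap_length(target, cand)
        if n > best_len:
            best, best_len = cand, n
    return best


def consensus_answer(answers: List[Optional[Span]]) -> Optional[Span]:
    """Strict-majority vote over annotator answers (None means "no answer").

    Returns None when there is no strict majority, or when the majority
    itself voted "no answer".
    """
    if not answers:
        return None
    span, votes = Counter(answers).most_common(1)[0]
    return span if votes * 2 > len(answers) else None
```

For example, `consensus_answer([(10, 20), (10, 20), (12, 20)])` returns `(10, 20)`, while a 1-1 split returns `None`. A stricter or looser threshold is a one-line change in the final comparison.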
Community Discussions
Trending Discussions on newsqa
QUESTION
I'm trying to work with the code in this GitHub repository to process datasets built from news articles. I'm following their Docker installation steps, and the first two execute without any errors.
However, with the third one,
docker run --rm -it -v ${PWD}:/usr/src/newsqa --name newsqa maluuba/newsqa python maluuba/newsqa/data_generator.py
I get the following error:
ANSWER
Answered 2018-Nov-18 at 09:06
The issue is that if you run it like this:
docker run --rm -it -v ${PWD}:/usr/src/newsqa --name newsqa maluuba/newsqa python maluuba/newsqa/data_generator.py
bash never enters the picture, so the correct version of the Python environment is never chosen (in fact, only Python will be running, with no shell at all).
An easy fix is to invoke it like this:
docker run --rm -it -v ${PWD}:/usr/src/newsqa --name newsqa maluuba/newsqa /bin/bash --login -c "python maluuba/newsqa/data_generator.py"
which executes it via bash; the --login option also makes bash source the necessary environment.
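If you launch the container from a Python script, the same fix applies. The sketch below only assembles the argument list for the command in the answer. The helper name and its parameters are hypothetical, and `image` should match whatever tag you built the image with.

```python
import os
import shlex
from typing import List, Optional


def docker_login_shell_cmd(
    script: str,
    image: str = "maluuba/newsqa",
    workdir: Optional[str] = None,
    tty: bool = True,
) -> List[str]:
    """Build a `docker run` argument list that runs `script` in a bash login shell.

    Running `python` directly as the container command skips bash, so the
    environment sourced at login is never set up. Wrapping the call in
    `/bin/bash --login -c "..."` makes bash source it first.
    """
    workdir = workdir or os.getcwd()
    cmd = ["docker", "run", "--rm"]
    if tty:
        cmd.append("-it")  # interactive terminal, as in the answer's command
    cmd += [
        "-v", f"{workdir}:/usr/src/newsqa",  # same bind mount as ${PWD}:/usr/src/newsqa
        "--name", "newsqa",
        image,
        "/bin/bash", "--login", "-c", "python " + shlex.quote(script),
    ]
    return cmd
```

For example, `subprocess.run(docker_login_shell_cmd("maluuba/newsqa/data_generator.py"), check=True)` would run the generator. Pass `tty=False` when the caller is not attached to a terminal, because `docker run -it` needs one.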
Community Discussions and Code Snippets contain content from the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install newsqa
Clone this repo.
Download the tar.gz file for the questions and answers from here to the maluuba/newsqa folder. No need to extract anything.
Download the CNN stories from here to the maluuba/newsqa folder (for legal and technical reasons, we can't distribute this to you).
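Because nothing needs to be extracted, a quick pre-flight check only has to confirm that the downloads are readable archives in the right folder. The sketch below uses the standard `tarfile` module. The function name is hypothetical, and it deliberately does not assume the archives' exact file names.

```python
import tarfile
from pathlib import Path
from typing import List


def find_archives(folder: str = "maluuba/newsqa") -> List[Path]:
    """Return the .tar.gz/.tgz files directly in `folder` that are readable tar archives.

    Nothing is extracted: tarfile.is_tarfile only opens each file to check its format.
    Raises FileNotFoundError if `folder` does not exist.
    """
    found = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.name.endswith((".tar.gz", ".tgz")):
            if tarfile.is_tarfile(str(path)):
                found.append(path)
    return found
```

An empty result, or one missing either download, means a file is absent, misnamed, or truncated.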
Use Python 2.7 to package the dataset. (Python 2.7 was originally used to handle the stories, and they ended up encoded strangely; once the dataset is packaged by these scripts, you should be able to load the files with whatever tools you'd like.) You can create a Conda environment like so:
Install the requirements in your environment:
(Optional - Tokenization) To tokenize the data, you must install a JDK (Java Development Kit) so that you can compile and run Java code.
(Optional - Tokenization) To tokenize the data, you must get some JAR files. We use some libraries from Stanford. You just need to put the English option of version 3.6.0 in the maluuba/newsqa folder.
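Before running the optional tokenization step, you can check both prerequisites with the sketch below, which uses only the standard library. Two things are assumptions, not repository behavior: `javac` on PATH is taken as evidence of a JDK, and any `*.jar` file in the folder is taken as evidence of the Stanford libraries. The function name is hypothetical.

```python
import glob
import os
import shutil
from typing import List


def missing_tokenization_prereqs(folder: str = "maluuba/newsqa") -> List[str]:
    """List what still appears to be missing for the optional tokenization step."""
    missing = []
    # A JDK ships the `javac` compiler; a bare JRE only ships `java`.
    if shutil.which("javac") is None:
        missing.append("JDK (javac not found on PATH)")
    # Assumption: the Stanford libraries are present if any JAR sits in `folder`.
    if not glob.glob(os.path.join(folder, "*.jar")):
        missing.append(f"Stanford JAR files in {folder}")
    return missing
```

An empty list means both checks passed. It does not confirm that the JAR is the 3.6.0 English release.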