fakenewschallenge | UCL Machine Reading - FNC-1 Submission | Machine Learning library
kandi X-RAY | fakenewschallenge Summary
UCL Machine Reading - FNC-1 Submission
Top functions reviewed by kandi - BETA
- Train the model.
- Create a pipeline.
- Read a table from a file.
- Save predictions to file.
- Initialize instance attributes.
- Load the checkpoint.
fakenewschallenge Key Features
fakenewschallenge Examples and Code Snippets
Community Discussions
Trending Discussions on fakenewschallenge
QUESTION
The solution that worked in my case is posted below; hope it helps someone. How would I concatenate the output of TF-IDF vectorizers created with sklearn into a tensor that can then be fed into a Keras dense neural network? I'm working on the FakeNewsChallenge dataset. Any guidance would be helpful.
The FakeNewsChallenge dataset is as follows:
Training set - [Headline, Body text, label]
- The training set is split across two CSVs (train_bodies, train_stances) linked by Body ID (a join sketch follows the test-set description below).
- train_bodies - [Body ID (num), articleBody (text)]
- train_stances - [Headline (text), Body ID (num), Stance (text)]
Test set - [Headline, Body text]
- The test set is split across two CSVs (test_bodies, test_stances_unlabeled).
- test_bodies - [Body ID, articleBody]
- test_stances_unlabeled - [Headline, Body ID]
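Since the stances and bodies live in separate files joined by Body ID, a pandas merge reassembles the full examples. A minimal sketch, assuming the file and column names described above:

```python
import pandas as pd

# File/column names as described above (hypothetical paths).
bodies = pd.read_csv("train_bodies.csv")    # columns: Body ID, articleBody
stances = pd.read_csv("train_stances.csv")  # columns: Headline, Body ID, Stance

# Each stance row references a body through its Body ID, so a left merge
# rebuilds full [Headline, articleBody, Stance] training examples.
train = stances.merge(bodies, on="Body ID", how="left")
print(train[["Headline", "articleBody", "Stance"]].head())
```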
The class distribution makes it extremely hard:
- rows - 49972
- unrelated - 0.73131
- discuss - 0.17828
- agree - 0.076012
- disagree - 0.0168094
Stance - [unrelated, discuss, agree, disagree]
What I would like to do is concatenate two separate TF-IDF vectors, along with other features, and feed the result into some layer, for instance a dense layer. How would you go about that?
ANSWER
Answered 2020-Aug-17 at 21:30
There was a comment prior to mine that answered the question, but I no longer see it. I had apparently forgotten about this method, even though I was using it in other areas of my program.
You can use numpy.hstack(tup) or numpy.vstack(tup), where
tup - sequence of ndarrays
For hstack, the arrays must have the same shape along all but the second axis (1-D arrays can be any length); for vstack, the same holds along all but the first axis.
Either returns a single stacked ndarray.
Here is some code just in case.
Note: the cosine-similarity calculation is not included here; do that however you want. I'm trying to keep this fast but also as clear as possible. Hope this helps someone.
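The original snippet was not preserved on this page, so what follows is a minimal sketch of the approach the answer describes: vectorize headline and body separately, hstack the two matrices, and feed the result to a Keras dense network. The toy strings, layer sizes, and max_features value are placeholders.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from tensorflow import keras

# Toy stand-ins; in practice these are the FNC-1 headline/body columns.
headlines = ["police find mass graves", "robert plant ripped up a check", "report denies the claim"]
bodies = ["a body was found near the site", "the check was for a reunion show", "officials say the report is false"]
labels = np.array([0, 1, 2])  # stances encoded as integers 0..3

# Separate TF-IDF vectorizers for headline and body text.
head_vec = TfidfVectorizer(max_features=500)
body_vec = TfidfVectorizer(max_features=500)
X_head = head_vec.fit_transform(headlines).toarray()
X_body = body_vec.fit_transform(bodies).toarray()

# hstack joins along the feature (second) axis; row counts must match.
# (For large sparse matrices, scipy.sparse.hstack avoids densifying.)
X = np.hstack([X_head, X_body])

model = keras.Sequential([
    keras.layers.Input(shape=(X.shape[1],)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(4, activation="softmax"),  # 4 stance classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, labels, epochs=1, verbose=0)
```

Extra scalar features such as a headline-body cosine similarity can be appended the same way, as one more column passed to np.hstack.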
QUESTION
The picture above is what I'm trying to replicate; I just don't know if I'm going about it the right way. I'm working with the FakeNewsChallenge dataset, which is extremely unbalanced, and I'm trying to replicate and improve on a method used in a paper.
Agree - 7.36%
Disagree - 1.68%
Discuss - 17.82%
Unrelated - 73.13%
I'm splitting the data in this way:
(split the dataset 67/33)
- train 67%, test 33%
(split training further 80/20 for validation)
- training 80%, validation 20%
(then split training and validation using a 3-fold cross-validation set)
As an aside, capturing that 1.68% of disagree (and the agree class) has been extremely difficult.
This is where I'm having an issue, as it's not making total sense to me: is the validation set created in the 80/20 split also being stratified inside the cross-validation folds?
Here is where I am at currently:
Split the data into a 67% training set and a 33% test set
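The code that followed was not preserved on this page; here is a sketch of the splits as described, using sklearn's train_test_split (X and y are stand-ins for the features and stance labels, and note that as written neither split is stratified):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the real features and stance labels.
X = np.random.rand(100, 10)
y = np.random.randint(0, 4, size=100)

# 67/33 train/test, then 80/20 train/validation. Without a stratify
# argument both splits are purely random.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.20, random_state=42)
```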
ANSWER
Answered 2020-Aug-16 at 06:14
You need to add one more parameter to the train_test_split() function:
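The snippet itself was not preserved; presumably it showed sklearn's stratify argument, which keeps the class proportions identical on both sides of each split. Continuing from the sketch above:

```python
from sklearn.model_selection import StratifiedKFold, train_test_split

# stratify=y preserves the class proportions in both halves of the split,
# which is what lets the 1.68% "disagree" class survive.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.20, stratify=y_train, random_state=42)

# StratifiedKFold does the same inside each cross-validation fold.
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(X_tr, y_tr):
    pass  # fit on X_tr[train_idx], validate on X_tr[val_idx]
```

This also answers the aside: an 80/20 validation split only preserves class proportions if you pass stratify explicitly, while StratifiedKFold handles stratification inside the folds.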
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install fakenewschallenge
Support