kandi X-RAY | python-info Summary
Community Discussions
Trending Discussions on python-info
QUESTION
I want to calculate the information gain on the 20_newsgroup data set.
I am using the code here (I have also put a copy of the code at the bottom of the question).
As you can see, the input to the algorithm is X, y. My confusion is that X is a matrix with documents in rows and features in columns (for 20_newsgroup it is 11314 x 1000 if I consider only 1000 features).
But according to the concept of information gain, it should be calculated for each feature. (So I was expecting the code to loop through each feature, with the input to the function being a matrix whose rows are features and whose columns are classes.)
But here the rows of X are documents, not features, and I cannot see which part of the code handles this. (I mean taking each document and then going through each feature of that document: looping through the rows while also looping through the columns, since the features are stored in columns.)
I have read this and this and many similar questions, but they are not clear about the shape of the input matrix.
This is the code for reading 20_newsgroup:
...
ANSWER
Answered 2018-Nov-28 at 16:59
Well, after going through the code in detail, I learned more about X.T.nonzero().
It is correct that information gain has to be computed per feature, so the code needs to loop through the features.
It is also correct that the matrix scikit-learn gives us here is document-by-feature (documents in rows, features in columns).
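To make the per-feature idea concrete, here is a minimal sketch of information gain computed column by column on a document-by-feature matrix. It is not the code referred to in the question: the function names `entropy` and `information_gain_per_feature` are hypothetical, it uses a dense NumPy array rather than the sparse matrix scikit-learn returns, and it assumes each feature is treated as simply present or absent in a document.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain_per_feature(X, y):
    """Information gain of each column of X (documents x features).

    Assumption: a feature counts as present in a document when its
    value is greater than zero; the actual counts are ignored.
    """
    present = np.asarray(X) > 0
    y = np.asarray(y)
    base = entropy(y)
    gains = np.zeros(present.shape[1])
    for j in range(present.shape[1]):      # loop over features (columns)
        mask = present[:, j]               # documents containing feature j
        conditional = 0.0
        for part in (y[mask], y[~mask]):   # feature present / feature absent
            if part.size:
                conditional += part.size / y.size * entropy(part)
        gains[j] = base - conditional
    return gains

# Toy data: 4 documents, 2 features, 2 classes.
X_toy = np.array([[1, 1],
                  [1, 0],
                  [0, 1],
                  [0, 0]])
y_toy = np.array([0, 0, 1, 1])
toy_gains = information_gain_per_feature(X_toy, y_toy)
```

In this toy data the first feature separates the two classes perfectly and the second carries no class information, so the first gain is the full class entropy and the second is zero. The point is only that looping over columns of a document-by-feature matrix is enough; no transposed input is required.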
However, the code calls X.T.nonzero(), which returns the indices of all non-zero entries of the transposed (feature-by-document) matrix as a pair of arrays: feature indices first, document indices second. The next line then loops over the length of that array with range(0, len(X.T.nonzero()[0])).
Overall, X.T.nonzero()[0] gives us the feature index of every non-zero entry :)
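A small illustration of what X.T.nonzero() returns may help. This sketch uses a dense NumPy array as a stand-in for the document-term matrix; the matrix values and the variable names `feature_idx`, `doc_idx` and `pairs` are made up for the example. SciPy sparse matrices, which scikit-learn vectorizers return, expose a nonzero() method with the same (row indices, column indices) result, though the ordering of the entries may differ.

```python
import numpy as np

# Toy document-term matrix: 3 documents (rows) x 4 features (columns).
X = np.array([
    [1, 0, 2, 0],
    [0, 0, 3, 0],
    [4, 0, 0, 5],
])

# After the transpose, rows are features and columns are documents,
# so the first returned array holds feature indices and the second
# holds document indices, one entry per non-zero cell.
feature_idx, doc_idx = X.T.nonzero()

# Each position i describes one non-zero cell: feature feature_idx[i]
# occurs in document doc_idx[i].
pairs = list(zip(feature_idx.tolist(), doc_idx.tolist()))
print(pairs)  # [(0, 0), (0, 2), (2, 0), (2, 1), (3, 2)]
```

Looping over range(0, len(X.T.nonzero()[0])) therefore visits every (feature, document) pair with a non-zero value. Feature 1 never appears in the output because no document contains it.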
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install python-info
You can use python-info like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.