liblinear | linear classification for large, high-dimensional data | Machine Learning library
kandi X-RAY | liblinear Summary
For some large data sets, performance with and without nonlinear mappings is similar. Without using kernels, one can efficiently train a much larger set via linear classification/regression. Such data usually have a large number of features; document classification is an example. Warning: while liblinear is generally very fast, its default solver may be slow in certain situations (e.g., data not scaled, or C is large). See Appendix B of our SVM guide for how to handle such cases. Warning: if you are a beginner and your data sets are not large, you should consider LIBSVM first.
Community Discussions
Trending Discussions on liblinear
QUESTION
I would like to use the AUC for the Precision and Recall curve as a metric to train my model. Do I need to make a specific scorer for this when using cross validation?
Consider the below reproducible example. Note the imbalanced target variable.
...ANSWER
Answered 2021-May-25 at 02:10
"Average precision" is what you probably want, measuring a non-interpolated area under the PR curve. See the last few paragraphs of this example and this section of the User Guide.
For the scorer, use "average_precision"; the metric function is average_precision_score.
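For instance, a minimal sketch (with a synthetic imbalanced target as a stand-in for the original data):

```python
# Sketch (assumed synthetic data): score an imbalanced classifier
# by average precision during cross-validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=5, scoring="average_precision")
print(scores.mean())
```

No custom scorer object is needed here: passing the string "average_precision" makes cross-validation use average_precision_score internally.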
QUESTION
I built a Logistic Regression Model to predict Loan Acceptors. The dataset is 94% non acceptors and 6% acceptors. I've run several logistic regression models one with the original dataset, one after upsampling to 50/50 and removing some predictor variables, and one without the upsampling, but after removing some predictor variables.
Model 1: better than 90% accuracy, precision, and recall on 25 feature columns. After running the model, I output the predictions to a different CSV (same people as the original CSV, though) and it returns 10,000 acceptors. My guess was that this could be caused by overfitting. I then tried the same 94% non-acceptor / 6% acceptor data, but with fewer variables (19 feature columns). This time the accuracy is 81%, but the precision is only 21%, while recall is 76% (for training and test). This time it returns only 8 total acceptors (out of 18,000).
Finally, I tried upsampling to a balanced set. The accuracy is only 68% (which I can work with), and the precision and recall are 66% for both training and test. I ran the model and then output the predictions to the CSV file (again same people, different CSV file; not sure if that's messing it up), and this time it returned 0 acceptors.
Does anyone have any advice on what is causing this and how I can fix this?
I'm not sure which regression code would be most beneficial. I'm happy to post the upsampling code if that would be more helpful.
...ANSWER
Answered 2021-May-14 at 22:54
You don't use a validation set (the test set in the code above). To fix it, let residuals = np.abs(y_test - y_hat_test) instead of using y_train.
Also, it is useful to apply cross-validation to ensure that the model is consistently good.
QUESTION
The goal of this program is to predict the number of stars from a number of features in a github repo.
This works fine, good accuracy, now what I want, is to find the feature importance of these features. I see the coef_ being used a lot for this as it seems a simple solution.
The issue I am having is that coef_ returns a list of lists of shape 3x10 (10 being the number of features, as seen below).
My question is, why are 3 different lists returned?
...ANSWER
Answered 2021-Apr-12 at 15:43
From the docs, coef_ has shape (n_classes, n_features), which suggests that you have somehow fitted the model to have three classes. Check the unique values in Y_train, or model.classes_.
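A sketch of what that check looks like (synthetic stand-in data with three classes and ten features, matching the 3x10 result described above):

```python
# Sketch (synthetic stand-in data): with three classes, coef_ has
# one row of feature weights per class, hence a 3x10 array here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
print(np.unique(y))        # three distinct labels
print(model.classes_)      # same three labels, as seen by the model
print(model.coef_.shape)   # (3, 10): (n_classes, n_features)
```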
QUESTION
I am trying to scale my data within the cross-validation folds of an mlens SuperLearner pipeline. When I use StandardScaler in the pipeline (as demonstrated below), I receive the following warning:
/miniconda3/envs/r_env/lib/python3.7/site-packages/mlens/parallel/_base_functions.py:226: MetricWarning: [pipeline-1.mlpclassifier.0.2] Could not score pipeline-1.mlpclassifier. Details: ValueError("Classification metrics can't handle a mix of binary and continuous-multioutput targets") (name, inst_name, exc), MetricWarning)
Of note, when I omit the StandardScaler() the warning disappears, but the data is not scaled.
...ANSWER
Answered 2021-Apr-06 at 21:50You are currently passing your preprocessing steps as two separate arguments when calling the add method. You can instead combine them as follows:
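The original snippet isn't preserved here; as a sketch of the same idea in plain scikit-learn (a stand-in for the mlens add call, whose exact signature is assumed), chain the scaler and the estimator in one pipeline so the scaler is fitted inside each cross-validation fold:

```python
# Sketch in plain scikit-learn (stand-in for the mlens `add` call):
# combine the preprocessing step and the classifier in one pipeline
# so scaling is fitted inside each cross-validation fold.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)
pipe = make_pipeline(StandardScaler(),
                     MLPClassifier(max_iter=500, random_state=0))
scores = cross_val_score(pipe, X, y, cv=3)
print(scores.mean())
```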
QUESTION
I am trying to run GridSearchCV with the LogisticRegression estimator and record the model accuracy, precision, recall, f1 metrics.
However, I get the following error on the precision metric:
...ANSWER
Answered 2021-Mar-10 at 21:09
From reading further into this issue, my understanding is that the error occurs because not all the labels in my y_test appear in my y_pred. That is not the case for my data.
I used the comment from G.Anderson to remove the warning (but it doesn't answer my question):
- Created a new custom_scorer object
- Created a custom_scoring dictionary
- Updated the GridSearchCV scoring and refit parameters
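A sketch of those three steps (the names custom_scorer and custom_scoring follow the description above; the data and grid are assumed): the zero_division argument of precision_score silences the warning when a fold predicts no positives.

```python
# Sketch (assumed data and grid): custom precision scorer with
# zero_division=0, wired into GridSearchCV scoring and refit.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)
custom_scorer = make_scorer(precision_score, zero_division=0)
custom_scoring = {"accuracy": "accuracy", "precision": custom_scorer}
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.1, 1.0]},
                    scoring=custom_scoring, refit="precision", cv=3)
grid.fit(X, y)
print(grid.best_params_)
```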
QUESTION
I tried to train a LinearSVC model and evaluate it with cross_val_score on a linearly separable dataset that I created, but I'm getting an error.
Here is a reproducible example:
...ANSWER
Answered 2021-Mar-06 at 01:53
This is not an error but a warning, and it already contains some advice: increase the number of iterations, which by default is 1000 (docs).
Moreover, LinearSVC is a classifier, so using scoring="neg_mean_squared_error" (i.e. a regression metric) in cross_val_score makes no sense; see the documentation for a rough list of relevant metrics per kind of problem.
So, with the following changes:
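A minimal sketch of both changes (with make_blobs standing in for the original separable dataset): raise max_iter and switch to a classification metric.

```python
# Sketch (assumed data): raise max_iter and use a classification
# metric instead of a regression one.
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

X, y = make_blobs(n_samples=200, centers=2, random_state=0)  # separable
model = LinearSVC(max_iter=10000)
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```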
QUESTION
I am trying to solve a multiclass classification problem using Logistic regression. My dataset has 3 distinct classes, and each data point belongs to only one class. Here is the sample training_data;
Here the first column is a vector of ones I have added as a bias term, and the target column has been binarized using label binarization, as described in scikit-learn.
Then I got the target as follows;
...ANSWER
Answered 2021-Mar-04 at 00:26
- It does not matter; as you might see here, choosing either multi_class='auto' or multi_class='ovr' will lead to the same results whenever solver='liblinear'.
- In case solver='liblinear', a default bias term equal to 1 is used and appended to X via the intercept_scaling attribute (which is in turn useful only if fit_intercept=True), as you can see here. You'll have the fitted bias (dimension (n_classes,)) returned by intercept_ after fitting (zero-valued if fit_intercept=False). Fitted coefficients are returned by coef_ (dimension (n_classes, n_features) and not (n_classes, n_features + 1); splitting done here).
Here is an example, considering the Iris dataset (having 3 classes and 4 features):
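The original snippet isn't shown; a sketch of such an example, fitting Iris with solver='liblinear' and inspecting the shapes described above:

```python
# Sketch (the original snippet isn't preserved): fit Iris with
# solver='liblinear' and inspect the fitted bias and coefficients.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)          # 3 classes, 4 features
model = LogisticRegression(solver="liblinear",
                           fit_intercept=True,
                           intercept_scaling=1).fit(X, y)
print(model.coef_.shape)       # (3, 4): (n_classes, n_features)
print(model.intercept_.shape)  # (3,): one fitted bias per class
```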
QUESTION
I'm currently getting a mixture of the following errors:
- ValueError: Unknown label type: 'unknown'
- ValueError: Expected 2D array, got 1D array instead: array=[0. 0. 0. ... 1. 1. 1.]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
- TypeError: 'tuple' object is not callable
When I search for others who have had the same issue, the answer usually leads me from one of the above errors to another. Below is a screenshot of my code. Lines 7-9 are the solutions I found for my errors that just lead to different errors. Comment out line 8 or 9 or both and it gives you the wrong shape error. Comment out all three and you get the label type unknown error.
For line 7 I have tried bool, int, and float.
...ANSWER
Answered 2021-Feb-24 at 15:28
Line 9: In your code, please note that shape is a tuple and a property of the DataFrame object, i.e., you cannot call it but only access it; see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html. Maybe you wanted to use reshape there?
Line 7: astype(float) changes the type of the columns to float (see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html); if you want to replace Yes and No with True and False respectively, you could set it as such on lines 1 and 2. After that, you can use df = df.astype(bool) to set the type to bool.
Example:
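The example's code isn't preserved; a sketch of both points (the column name "approved" is hypothetical):

```python
# Sketch (hypothetical column name): map Yes/No to booleans, then
# access .shape as a property, not as a call.
import pandas as pd

df = pd.DataFrame({"approved": ["Yes", "No", "Yes"]})
df["approved"] = df["approved"].map({"Yes": True, "No": False})
df = df.astype(bool)           # set the column type to bool
print(df.dtypes["approved"])   # bool
print(df.shape)                # (3, 1) -- property access, no parentheses
```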
QUESTION
I want to train linear SVM with multilabel classification with the following code:
...ANSWER
Answered 2021-Feb-17 at 11:32It seems like a "TicTacToe" dataset (from the filename and the format).
Assuming that the first nine columns of the dataset describe the 9 cells at a specific moment of the game, and that the other nine represent the cells corresponding to good moves, you can train a classifier cell by cell, in order to predict whether a cell is a good move or not.
So you actually need to train 9 binary classifiers, not one. I sketched a very simple approach in the following code based on this idea. Start with simple cross-validation, after splitting the dataset into train/test (80/20):
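The sketched code isn't preserved; a minimal version of the idea with synthetic stand-in data (the real TicTacToe file would supply X and Y):

```python
# Sketch (synthetic stand-in data): one binary LinearSVC per output
# cell, trained independently on the same 9-cell board features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(500, 9))   # 9 board cells (blank/X/O)
Y = rng.integers(0, 2, size=(500, 9))   # 9 good-move labels
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2,
                                                    random_state=0)

classifiers = []
for cell in range(9):                   # one binary classifier per cell
    clf = LinearSVC(max_iter=10000).fit(X_train, Y_train[:, cell])
    classifiers.append(clf)

preds = np.column_stack([clf.predict(X_test) for clf in classifiers])
print(preds.shape)  # one predicted good-move flag per cell
```

scikit-learn's MultiOutputClassifier wraps this same per-output loop if you prefer a single estimator object.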
QUESTION
I got a convergence warning using a linear support vector machine in scikit-learn with the breast cancer data.
Below is the code:
...ANSWER
Answered 2021-Jan-10 at 02:13

Community Discussions, Code Snippets contain sources that include Stack Exchange Network
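The answer's code isn't preserved here; a common fix for this warning, sketched as an assumption, is to standardize the features and raise max_iter:

```python
# Sketch (the original answer's code isn't shown): scale the breast
# cancer features and raise max_iter to avoid the ConvergenceWarning.
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
model.fit(X, y)
print(model.score(X, y))
```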
Install liblinear
On Unix systems, type `make' to build the `train', `predict', and `svm-scale' programs. Run them without arguments to show the usage. On other systems, consult `Makefile' to build them (e.g., see 'Building Windows binaries' in this file) or use the pre-built binaries (Windows binaries are in the directory `windows').

This software uses some level-1 BLAS subroutines; the needed functions are included in this package. If a BLAS library is available on your machine, you may use it by modifying the Makefile: unmark the corresponding line.

The tool `svm-scale', borrowed from LIBSVM, is for scaling the input data file.
Windows binaries are available in the directory `windows'. To re-build them via Visual C++, use the following steps:

1. Open a DOS command box and change to the liblinear directory. If environment variables of VC++ have not been set, type

   "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat"

   You may have to modify the above command according to which version of VC++ you have or where it is installed.

2. Type

   nmake -f Makefile.win clean all

3. (Optional) To build the shared library liblinear.dll, type

   nmake -f Makefile.win lib

4. (Optional) To build 32-bit Windows binaries, you must (1) setup "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars32.bat" instead of vcvars64.bat and (2) change CFLAGS in Makefile.win: /D _WIN64 to /D _WIN32.