liblinear | A Library for Large Linear Classification | Machine Learning library

by cjlin1 | C++ | Version: Current | License: BSD-3-Clause

kandi X-RAY | liblinear Summary

liblinear is a C++ library typically used in Artificial Intelligence and Machine Learning applications. It has no reported bugs or vulnerabilities, carries a permissive license, and has medium support. You can download it from GitHub.

For some large datasets, linear and nonlinear mappings give similar performance. Without using kernels, one can efficiently train a much larger set via linear classification/regression. Such data usually have a large number of features; document classification is an example. Warning: while liblinear is generally very fast, its default solver may be slow in certain situations (e.g., when the data are not scaled or C is large). See Appendix B of our SVM guide for how to handle such cases. Warning: if you are a beginner and your datasets are not large, you should consider LIBSVM first.

Support

              liblinear has a medium active ecosystem.
It has 889 stars, 329 forks, and 73 watchers.
              It had no major release in the last 6 months.
There are 32 open issues and 13 closed issues. On average, issues are closed in 101 days. There are 14 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of liblinear is current.

Quality

              liblinear has no bugs reported.

Security

liblinear has no reported vulnerabilities, and neither do its dependent libraries.

License

              liblinear is licensed under the BSD-3-Clause License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

              liblinear releases are not available. You will need to build from source code and install.
              Installation instructions, examples and code snippets are available.



            Community Discussions

            QUESTION

            Sklearn -> Using Precision Recall AUC as a scoring metric in cross validation
            Asked 2021-May-25 at 02:10

            I would like to use the AUC for the Precision and Recall curve as a metric to train my model. Do I need to make a specific scorer for this when using cross validation?

            Consider the below reproducible example. Note the imbalanced target variable.

            ...

            ANSWER

            Answered 2021-May-25 at 02:10

            "Average precision" is what you probably want, measuring a non-interpolated area under the PR curve. See the last few paragraphs of this example and this section of the User Guide.

            For the scorer, use "average_precision"; the metric function is average_precision_score.
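A minimal sketch of that scorer in cross-validation; the synthetic imbalanced dataset here stands in for the example data elided above:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic imbalanced data standing in for the elided example
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# "average_precision" scores the non-interpolated area under the PR curve
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         scoring="average_precision", cv=5)
print(scores.mean())
```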

            Source https://stackoverflow.com/questions/67678705

            QUESTION

Logistic Regression Model in Python Has good Accuracy and Precision, but predictions are way off
            Asked 2021-May-14 at 22:54

I built a logistic regression model to predict loan acceptors. The dataset is 94% non-acceptors and 6% acceptors. I've run several logistic regression models: one with the original dataset, one after upsampling to 50/50 and removing some predictor variables, and one without the upsampling but after removing some predictor variables.

Model 1: better than 90% accuracy, precision, and recall on 25 feature columns. After running the model, I output the predictions to a different CSV (same people as the original CSV, though) and it returns 10,000 acceptors. My guess was this could be caused by overfitting? I wasn't sure, so I then tried the same 94% non-acceptors and 6% acceptors, but with fewer variables (19 feature columns). This time the accuracy is 81%, but the precision is only 21%, while recall is 76% (for training and test). This time it only returns 8 total acceptors (out of 18,000).

Finally, I tried upsampling to a balanced set. The accuracy is only 68% (which I can work with), and the precision and recall are 66% for both training and test. I ran the model, then output the predictions to the CSV file (again same people, different CSV file; not sure if that's messing it up), and this time it returned 0 acceptors.

            Does anyone have any advice on what is causing this and how I can fix this?

            I'm not sure which regression code would be most beneficial. I'm happy to post the upsampling code if that would be more helpful.

            ...

            ANSWER

            Answered 2021-May-14 at 22:54

            You don't use a validation set (test set in the code above). To fix it, let residuals = np.abs(y_test - y_hat_test) instead of using y_train.

            Also, it is useful to apply cross-validation to ensure that the model is consistently good.
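A sketch of that fix on hypothetical stand-in data (the loan dataset itself is not shown above):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for the loan dataset
rng = np.random.default_rng(0)
X = rng.random((400, 5))
y = (X[:, 0] > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Evaluate residuals on the held-out test set, not on y_train
y_hat_test = model.predict(X_test)
residuals = np.abs(y_test - y_hat_test)
```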

            Source https://stackoverflow.com/questions/67541243

            QUESTION

Sklearn Logistic Regression coef_ giving list of lists
            Asked 2021-Apr-12 at 15:43

            The goal of this program is to predict the number of stars from a number of features in a github repo.

            This works fine, good accuracy, now what I want, is to find the feature importance of these features. I see the coef_ being used a lot for this as it seems a simple solution.

The issue I am having is that coef_ returns a list of lists, 3x10 (10 being the number of features, as seen below).

            My question is, why are 3 different lists returned?

            ...

            ANSWER

            Answered 2021-Apr-12 at 15:43

From the docs, coef_ has shape (n_classes, n_features), which suggests that you have somehow fitted the model with three classes. Check the unique values in Y_train, or model.classes_.
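For instance, with the Iris dataset (three classes), coef_ comes back with one row per class:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # 3 classes, 4 features
model = LogisticRegression(max_iter=1000).fit(X, y)

print(model.classes_)     # the three fitted class labels
print(model.coef_.shape)  # one coefficient row per class
```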

            Source https://stackoverflow.com/questions/67061077

            QUESTION

            Using StandardScaler as Preprocessor in Mlens Pipeline generates Classification Warning
            Asked 2021-Apr-06 at 21:50

I am trying to scale my data within the cross-validation folds of an mlens SuperLearner pipeline. When I use StandardScaler in the pipeline (as demonstrated below), I receive the following warning:

            /miniconda3/envs/r_env/lib/python3.7/site-packages/mlens/parallel/_base_functions.py:226: MetricWarning: [pipeline-1.mlpclassifier.0.2] Could not score pipeline-1.mlpclassifier. Details: ValueError("Classification metrics can't handle a mix of binary and continuous-multioutput targets") (name, inst_name, exc), MetricWarning)

            Of note, when I omit the StandardScaler() the warning disappears, but the data is not scaled.

            ...

            ANSWER

            Answered 2021-Apr-06 at 21:50

            You are currently passing your preprocessing steps as two separate arguments when calling the add method. You can instead combine them as follows:

            Source https://stackoverflow.com/questions/66959756

            QUESTION

            Precision calculation warning when using GridSearchCV for Logistic Regression
            Asked 2021-Mar-10 at 21:09

            I am trying to run GridSearchCV with the LogisticRegression estimator and record the model accuracy, precision, recall, f1 metrics.

            However, I get the following error on the precision metric:

            ...

            ANSWER

            Answered 2021-Mar-10 at 21:09

From reading further into this issue, my understanding is that the error occurs when not all the labels in y_test appear in y_pred, but that is not the case for my data.

I used the comment from G. Anderson to remove the warning (though it doesn't answer my question):

            • Created new custom_scorer object

• Created custom_scoring dictionary

            • Updated GridSearchCV scoring and refit parameters
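The zero_division parameter of precision_score is the usual way to silence that warning. A sketch of the three steps above; the dataset and parameter grid here are made up for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

# Custom scorer that silences the undefined-precision warning
custom_scorer = make_scorer(precision_score, zero_division=0)
custom_scoring = {"accuracy": "accuracy", "precision": custom_scorer}

# Hypothetical parameter grid, for illustration only
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.1, 1.0]},
                    scoring=custom_scoring, refit="precision")
grid.fit(X, y)
print(grid.best_params_)
```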

            Source https://stackoverflow.com/questions/66538197

            QUESTION

            ConvergenceWarning when running cross validation with SVM model
            Asked 2021-Mar-06 at 02:05

            I tried to train a LinearSVC model and evaluate it with cross_val_score on a linearly separable dataset that I created, but I'm getting an error.

            Here is a reproducible example:

            ...

            ANSWER

            Answered 2021-Mar-06 at 01:53

            This is not an error, but a warning, and it already contains some advice:

            increase the number of iterations

            which by default is 1000 (docs).

            Moreover, LinearSVC is a classifier, so using scoring="neg_mean_squared_error" (i.e. a regression metric) in cross_val_score makes no sense; see the documentation for a rough list of relevant metrics per kind of problem.

            So, with the following changes:
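A sketch of those changes on a synthetic stand-in dataset: max_iter is raised from its default of 1000, and a classification metric replaces neg_mean_squared_error:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Synthetic stand-in for the linearly separable dataset in the question
X, y = make_classification(n_samples=200, random_state=0)

clf = LinearSVC(max_iter=10000)  # raised from the default 1000
scores = cross_val_score(clf, X, y, scoring="accuracy", cv=5)  # classification metric
print(scores.mean())
```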

            Source https://stackoverflow.com/questions/66501258

            QUESTION

Optimization solver used for One-vs-Rest in scikit-learn
            Asked 2021-Mar-04 at 00:26

I am trying to solve a multiclass classification problem using logistic regression. My dataset has 3 distinct classes, and each data point belongs to only one class. Here is the sample training data:

The first column is a vector of ones I have added as a bias term, and the target column has been binarized using label binarization, as described in scikit-learn.

Then I got the target as follows:

            ...

            ANSWER

            Answered 2021-Mar-04 at 00:26
1. It does not matter; as you might see here, choosing either multi_class='auto' or multi_class='ovr' will lead to the same results whenever solver='liblinear'.
2. In the case solver='liblinear', a default bias term equal to 1 is used and appended to X via the intercept_scaling attribute (which is in turn useful only if fit_intercept=True), as you can see here. You'll have the fitted bias (dimension (n_classes,)) returned by intercept_ after fitting (zero-valued if fit_intercept=False). Fitted coefficients are returned by coef_ (dimension (n_classes, n_features), not (n_classes, n_features + 1); the splitting is done here).

Here is an example, considering the Iris dataset (which has 3 classes and 4 features):
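A minimal version of such an example (the original code was elided above; the liblinear solver handles the three classes in one-vs-rest fashion):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # 3 classes, 4 features
model = LogisticRegression(solver="liblinear", fit_intercept=True).fit(X, y)

print(model.coef_.shape)       # coefficients: no bias column included
print(model.intercept_.shape)  # one fitted bias per class
```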

            Source https://stackoverflow.com/questions/66465103

            QUESTION

            Python Logistic Regression Y Value Issues
            Asked 2021-Feb-24 at 15:28

            I'm currently getting a mixture of the following errors:

            • ValueError: Unknown label type: 'unknown'
            • ValueError: Expected 2D array, got 1D array instead: array=[0. 0. 0. ... 1. 1. 1.]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
            • TypeError: 'tuple' object is not callable

            When I search for others who have had the same issue, the answer usually leads me from one of the above errors to another. Below is a screenshot of my code. Lines 7-9 are the solutions I found for my errors that just lead to different errors. Comment out line 8 or 9 or both and it gives you the wrong shape error. Comment out all three and you get the label type unknown error.

            For line 7 I have tried bool, int, and float.

            ...

            ANSWER

            Answered 2021-Feb-24 at 15:28

            Line 9: In your code, please note that shape is a tuple and a property of the DataFrame object, i.e., you cannot call it but only access it; see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html

            Maybe you wanted to use reshape there?

Line 7: astype(float) changes the type of the columns to float (see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html); if you want to replace Yes and No with True and False respectively, you could set them as such on lines 1 and 2. After that, you can use df = df.astype(bool) to set the type to bool.

            Example:
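A sketch of that replacement, on a hypothetical frame standing in for the one in the question:

```python
import pandas as pd

# Hypothetical frame standing in for the one in the question
df = pd.DataFrame({"accepted": ["Yes", "No", "Yes"]})

# Replace Yes/No with True/False up front, then fix the dtype
df["accepted"] = df["accepted"].map({"Yes": True, "No": False})
df = df.astype(bool)
print(df["accepted"].dtype)
```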

            Source https://stackoverflow.com/questions/66333392

            QUESTION

            Sklearn Linear SVM cannot train in multilabel classification
            Asked 2021-Feb-17 at 11:32

            I want to train linear SVM with multilabel classification with the following code:

            ...

            ANSWER

            Answered 2021-Feb-17 at 11:32

            It seems like a "TicTacToe" dataset (from the filename and the format).

Assuming that the first nine columns of the dataset describe the 9 cells at a specific moment of the game, and that the other nine represent the cells corresponding to the good moves, you can train a classifier cell by cell, in order to predict whether a cell is a good move or not.

            So, you actually need to train 9 binary classifiers, not one. I sketched a very simple approach in the following code based on this idea. Start with simple cross-validation, after splitting the dataset in train/test (80/20):
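A sketch of that per-cell approach on synthetic stand-in data (the real file is not shown above, so both the features and the labels here are placeholders):

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in: 9 board-state features, 9 per-cell "good move" labels
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(300, 9)).astype(float)
Y = (X > 0).astype(int)  # placeholder labels, one column per cell

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0)

# Train one binary classifier per cell, with cross-validation on the train split
for cell in range(9):
    clf = LinearSVC(max_iter=10000)
    scores = cross_val_score(clf, X_train, Y_train[:, cell], cv=5)
```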

            Source https://stackoverflow.com/questions/66219835

            QUESTION

            How to deal with convergence warning when using LinearSVC in sklearn?
            Asked 2021-Jan-10 at 02:13

I got a convergence warning using a linear support vector machine in scikit-learn with the breast cancer data.

            Below is the code:

            ...

            ANSWER

            Answered 2021-Jan-10 at 02:13

SVM methods are distance-based, and your columns are on different scales, so it makes sense to scale the data before fitting the model. See more in posts such as this or this.

So if we do it again with scaling:
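For instance, with the breast cancer data, scaling can be folded into a pipeline so that it is applied inside each cross-validation fold:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)

# Scaling happens inside each fold, so no statistics leak from test folds
clf = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```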

            Source https://stackoverflow.com/questions/65649337

Community Discussions and Code Snippets contain sources from the Stack Exchange Network.


            Install liblinear

See the section ``Installation'' for installing LIBLINEAR. After installation, there are programs `train' and `predict' for training and testing, respectively. For the data format, please check the README file of LIBSVM. Note that the feature index must start from 1 (not 0).

A sample classification dataset included in this package is `heart_scale'. Type `train heart_scale', and the program will read the training data and output the model file `heart_scale.model'. If you have a test set called heart_scale.t, then type `predict heart_scale.t heart_scale.model output' to see the prediction accuracy. The `output' file contains the predicted class labels. For more information about `train' and `predict', see the sections `train' Usage and `predict' Usage.

To obtain good performance, sometimes one needs to scale the data. Please check the program `svm-scale' of LIBSVM. For large and sparse data, use `-l 0' to keep the sparsity.
On Unix systems, type `make' to build the `train', `predict', and `svm-scale' programs. Run them without arguments to show the usages. On other systems, consult `Makefile' to build them (e.g., see 'Building Windows binaries' in this file) or use the pre-built binaries (Windows binaries are in the directory `windows'). This software uses some level-1 BLAS subroutines; the needed functions are included in this package. If a BLAS library is available on your machine, you may use it by modifying the Makefile accordingly. The tool `svm-scale', borrowed from LIBSVM, is for scaling the input data file.

To re-build the Windows binaries via Visual C++:

1. Open a DOS command box and change to the liblinear directory. If the environment variables of VC++ have not been set, type

   "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat"

   You may have to modify the above command according to which version of VC++ you have or where it is installed.

2. Type

   nmake -f Makefile.win clean all

3. (Optional) To build the shared library liblinear.dll, type

   nmake -f Makefile.win lib

4. (Optional) To build 32-bit Windows binaries, you must (1) use "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars32.bat" instead of vcvars64.bat, and (2) change /D _WIN64 to /D _WIN32 in CFLAGS in Makefile.win.

            Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on Stack Overflow.
            CLONE
          • HTTPS

            https://github.com/cjlin1/liblinear.git

          • CLI

            gh repo clone cjlin1/liblinear

          • sshUrl

            git@github.com:cjlin1/liblinear.git
