treeinterpreter

by andosa | Python | Version: 0.2.3 | License: BSD-3-Clause

kandi X-RAY | treeinterpreter Summary

treeinterpreter is a Python library. It has no reported vulnerabilities, a build file is available, it has a permissive license, and it has low support. However, treeinterpreter has 1 bug. You can install it using 'pip install treeinterpreter' or download it from GitHub or PyPI.


            kandi-Support Support

              treeinterpreter has a low-activity ecosystem.
              It has 663 star(s) with 134 fork(s). There are 25 watchers for this library.
              It had no major release in the last 12 months.
              There are 18 open issues and 4 have been closed. On average, issues are closed in 62 days. There are 6 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of treeinterpreter is 0.2.3.

            kandi-Quality Quality

              treeinterpreter has 1 bug (0 blocker, 0 critical, 1 major, 0 minor) and 20 code smells.

            kandi-Security Security

              treeinterpreter has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              treeinterpreter code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              treeinterpreter is licensed under the BSD-3-Clause License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            kandi-Reuse Reuse

              treeinterpreter releases are not available. You will need to build from source code and install.
              A deployable package is available on PyPI.
              Build file is available. You can build the component from source.
              treeinterpreter saves you 104 person hours of effort in developing the same functionality from scratch.
              It has 265 lines of code, 14 functions and 6 files.
              It has high code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed treeinterpreter and discovered the following to be its top functions. This is intended to give you an instant insight into the functionality treeinterpreter implements, and to help you decide if it suits your requirements.
            • Predict the decision tree.
            • Predict a forest.
            • Predict the class for a given model.
            • Get the list of paths for a given node_id.
            • Calculate the running mean of an iterable.
            • Calculate the aggregated contribution of a distribution.

            treeinterpreter Key Features

            No Key Features are available at this moment for treeinterpreter.

            treeinterpreter Examples and Code Snippets

            No Code Snippets are available at this moment for treeinterpreter.

            Community Discussions

            QUESTION

            A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
            Asked 2020-Dec-04 at 18:52

            I read this blog

            https://www.dataquest.io/blog/settingwithcopywarning/

            And I think I'm very close to finding the issue, so can anyone help me solve this according to that blog?

            ...

            ANSWER

            Answered 2020-Dec-04 at 18:52

            It's just a warning. I suggest you use the latest version of pandas.

            You should assign like this:
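
            As a hedged illustration (the original snippet is not shown above), a minimal sketch of the .loc assignment pattern the answer recommends; the DataFrame and column names here are hypothetical:

            import pandas as pd

            df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})  # hypothetical data
            # Assigning into a slice such as df[df["a"] > 1]["b"] = 0 triggers the
            # warning; assign through .loc on the original frame instead:
            df.loc[df["a"] > 1, "b"] = 0  # no SettingWithCopyWarning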

            Source https://stackoverflow.com/questions/65148662

            QUESTION

            How can I get treeinterpreter's Tree Contributions, if we are using a Pipeline?
            Asked 2020-Dec-03 at 07:31

            I am using sklearn's Pipeline to one-hot encode and to model, almost exactly as in this post.

            After using a Pipeline, I am not able to get tree contributions anymore. Getting this error:

            AttributeError: 'Pipeline' object has no attribute 'n_outputs_'

            I tried to play around with the parameters of the treeinterpreter, but I am stuck.

            Therefore my question: is there any way to get the contributions out of a tree when using sklearn's Pipeline?

            EDIT 2 - Real data as requested by Venkatachalam:

            ...

            ANSWER

            Answered 2020-Nov-30 at 17:01

            To access the Pipeline's fitted model, just retrieve the ._final_estimator attribute from your pipeline.
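
            As a hedged sketch of that approach (it assumes pipe is an already-fitted Pipeline whose last step is a tree-based model, X is the raw input, and a scikit-learn version recent enough to support pipeline slicing):

            from treeinterpreter import treeinterpreter as ti

            model = pipe._final_estimator     # the fitted tree-based final step
            X_enc = pipe[:-1].transform(X)    # apply every step before the model
            prediction, bias, contributions = ti.predict(model, X_enc)
            # Note: contributions refer to the transformed (e.g. one-hot) features.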

            Source https://stackoverflow.com/questions/65040249

            QUESTION

            Understanding the output of the TreeInterpreter with RandomForestClassifier
            Asked 2018-Aug-26 at 10:51

            I have applied a random forest classifier to get the features that contributed to a specific row in a dataset. However, I get 2 values for each feature instead of one, and I am not quite sure why. Here is my code.

            ...

            ANSWER

            Answered 2018-Feb-19 at 00:02

            You are getting arrays of length 2 for bias and feature contributions for the very simple reason that you have a 2-class classification problem.

            As explained clearly in this blog post by the package creators, in the 3-class case of the iris dataset you get arrays of length 3 (i.e. one array element for each class):
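
            To make this concrete, a minimal hedged sketch with a synthetic 2-class problem (the dataset and parameters are illustrative, not from the question):

            from sklearn.datasets import make_classification
            from sklearn.ensemble import RandomForestClassifier
            from treeinterpreter import treeinterpreter as ti

            X, y = make_classification(n_samples=200, n_features=4, n_classes=2,
                                       random_state=0)
            rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
            prediction, bias, contributions = ti.predict(rf, X[:1])
            print(bias.shape)           # (1, 2): one bias term per class
            print(contributions.shape)  # (1, 4, 2): per sample, feature, class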

            Source https://stackoverflow.com/questions/48826588

            QUESTION

            Lime vs TreeInterpreter for interpreting decision tree
            Asked 2018-Apr-28 at 09:21

            Lime source: https://github.com/marcotcr/lime

            treeinterpreter source: https://github.com/andosa/treeinterpreter

            I am trying to understand how the DecisionTree made its predictions using Lime and treeinterpreter. Both claim in their descriptions that they are able to interpret decision trees, yet they interpret the same DecisionTree in different ways; that is, the feature contribution order differs. How is that possible, if both are looking at the same thing and trying to describe the same event, but assign importance in a different order?

            Which should we trust, especially when the top feature matters for the prediction?

            The code for the tree

            ...

            ANSWER

            Answered 2018-Feb-25 at 16:02
            Why is it possible for the two approaches to have different results?

            Lime: A short explanation of how it works, taken from their github page:

            Intuitively, an explanation is a local linear approximation of the model's behaviour. While the model may be very complex globally, it is easier to approximate it around the vicinity of a particular instance. While treating the model as a black box, we perturb the instance we want to explain and learn a sparse linear model around it, as an explanation. The figure below illustrates the intuition for this procedure. The model's decision function is represented by the blue/pink background, and is clearly nonlinear. The bright red cross is the instance being explained (let's call it X). We sample instances around X, and weight them according to their proximity to X (weight here is indicated by size). We then learn a linear model (dashed line) that approximates the model well in the vicinity of X, but not necessarily globally.

            There is much more detailed information in various links on the github page.
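
            As a hedged illustration of that procedure (reusing the fitted rf and the iris data from the question's code; the call pattern below assumes the lime package's tabular explainer):

            from lime.lime_tabular import LimeTabularExplainer

            explainer = LimeTabularExplainer(iris.data,
                                             feature_names=iris.feature_names,
                                             class_names=iris.target_names)
            exp = explainer.explain_instance(iris.data[100], rf.predict_proba,
                                             num_features=4)
            print(exp.as_list())  # locally-weighted linear coefficient per feature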

            treeinterpreter: An explanation of how this one works is available on http://blog.datadive.net/interpreting-random-forests/ (this is for regression; an example for classification, which works very similarly, can be found here).

            In short: suppose we have a node that compares feature F to some value and splits instances based on that. Suppose that 50% of all instances reaching that node belong to class C. Suppose we have a new instance, and it ends up getting assigned to the left child of this node, where now 80% of all instances belong to class C. Then, the contribution of feature F for this decision is computed as 0.8 - 0.5 = 0.3 (plus additional terms if there are more nodes along the path to leaf that also use feature F).
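
            As a toy illustration of that rule (plain arithmetic, not the library's actual code):

            # Contribution of feature F at one node on the decision path:
            p_at_node = 0.50    # fraction of class-C instances at the parent node
            p_at_child = 0.80   # fraction at the child the instance is routed to
            print(p_at_child - p_at_node)  # 0.3; summed over every node using F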

            Comparison: The important thing to note is that Lime is a model-independent method (not specific to Decision Trees / RFs), which is based on local linear approximation. Treeinterpreter, on the other hand, specifically operates in a similar manner to the Decision Tree itself, and really looks at which features are actually used in comparisons by the algorithm. So they're really fundamentally doing quite different things. Lime says "a feature is important if we wiggle it a bit and this results in a different prediction". Treeinterpreter says "a feature is important if it was compared to a threshold in one of our nodes and this caused us to take a split that drastically changed our prediction".

            Which one to trust?

            This is difficult to answer definitively. They're probably both useful in their own way. Intuitively, you may be inclined to lean towards treeinterpreter at first glance, because it was specifically created for Decision Trees. However, consider the following example:

            • Root Node: 50% of instances class 0, 50% class 1. IF F <= 50, go left, otherwise go right.
            • Left Child: 48% of instances class 0, 52% class 1. Subtree below this.
            • Right Child: 99% of instances class 0, 1% of instances class 1. Subtree below this.

            This kind of setup is possible if the majority of instances go left and only some go right. Now suppose we have an instance with F = 49 that got assigned to the left and ultimately assigned class 1. Treeinterpreter won't care that F was really close to ending up on the other side of the equation in the root node, and will only assign a small contribution of 0.48 - 0.50 = -0.02. Lime will notice that changing F just a little bit would completely change the odds.

            Which one is right? That's not really clear. You could say that F was really important because if it had been only a little bit different the prediction would be different (then lime wins). You could also argue that F did not contribute to our final prediction because we hardly got any closer to a decision after inspecting its value, and still had to investigate many other features afterwards. Then treeinterpreter wins.

            To get a better idea here, it may help to also actually plot the learned Decision Tree itself. Then you can manually follow along its decision path and decide which features you think were important and/or see if you can understand why both Lime and treeinterpreter say what they say.

            Source https://stackoverflow.com/questions/48909418

            QUESTION

            How to know the features and their contributions of a specific sample in RandomForest
            Asked 2018-Feb-28 at 13:10

            How can I find the features and their contributions that impacted the prediction of a specific sample, say row 5?

            Update

            Thanks to @FatihAkici, I can now apply the TreeInterpreter.

            ...

            ANSWER

            Answered 2018-Feb-19 at 09:43

            Yes, you can know the features and their contributions (weight is not the right term) that impacted the prediction of a specific observation. This actually constitutes the decision_path for how the model made its decision for that particular observation. What you are looking for is TreeInterpreter.

            The second question is: why are there always two values for each variable and instance (such as [0.12 -0.12] for the first feature and first instance), which seem to be positive and negative, rather than one value per feature?

            So my answer is: Each of these lists (such as [0.12 -0.12]) just represents the contribution of a feature to the final probability of an instance being in class 1 and class 2. Remember, features never dictate what class an instance must be in, but instead, they increase or decrease the final class probabilities of an instance. So 0.12 means that Feature 1 added 0.12 to the probability of instance 0 being in class 1, and reduced its probability of being in class 2 by 0.12. These values are always symmetric, meaning that whatever makes an instance more likely to be in class 1 makes it less likely to be in class 2.

            Similarly, Feature 2 reduced the probability of instance 0 being in class 1 by 0.05, and increased its probability of being in class 2 by 0.05.

            So each feature contributed (added or reduced) to instance 0 being in class 1 by: 0.12, -0.05, 0.22, 0.14, 0.07, 0.01. Now add all of these to the bias of being in class 1 (0.49854) and you get 1, which is the final probability of that instance being in class 1, as shown by the model output.

            Similarly, add up all of the second values of each list plus the bias of being in class 2 (0.50146), and you get 0, which is the final probability of that instance being in class 2, as shown by the model output above.

            Finally, repeat the exact same exercise for instance 1, i.e. add -0.03, -0.11, 0.06, 0, -0.06, -0.04 to the bias of 0.49854 and you get 0.32, which is P{instance 1 = class 1}. Then add 0.03, 0.11, -0.06, 0, 0.06, 0.04 to the bias of 0.50146 and you get 0.68, which is P{instance 1 = class 2}. These numbers constitute a full trajectory of contributions, from the initial bias to the final classification probability of an instance.
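
            A hedged sketch of that bookkeeping (it assumes a fitted classifier rf and a single-row 2-D array instance, as in the question):

            import numpy as np
            from treeinterpreter import treeinterpreter as ti

            prediction, bias, contributions = ti.predict(rf, instance)
            # Per class, the bias plus the summed per-feature contributions
            # reproduce the predicted probabilities exactly.
            print(np.allclose(prediction, bias + contributions.sum(axis=1)))  # True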

            I answered a very similar question at a conceptual level on datascience.stackexchange.com; feel free to check it out.

            Source https://stackoverflow.com/questions/48808985

            QUESTION

            Print maximum value name as string
            Asked 2018-Feb-22 at 17:51
            import sklearn
            import sklearn.datasets
            import sklearn.ensemble
            import numpy as np
            from treeinterpreter import treeinterpreter as ti

            iris = sklearn.datasets.load_iris()

            rf = sklearn.ensemble.RandomForestClassifier(n_estimators=500, random_state=50)
            rf.fit(iris.data, iris.target)

            instances = iris.data[100].reshape(1, -1)

            prediction, biases, contributions = ti.predict(rf, instances)

            for i in range(len(instances)):
                for c, feature in sorted(zip(contributions[i],
                                             iris.feature_names),
                                         key=lambda x: ~abs(x[0].any())):
                    print(feature, c)
            ...

            ANSWER

            Answered 2018-Feb-22 at 17:51

            You should be using c.max() instead of c.all() if you want to get the max element of the array. This section of code should give you what you want:
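
            A hedged reconstruction of that fix, reusing the variables from the question's snippet above (sorting descending by each feature's largest contribution value):

            for i in range(len(instances)):
                for c, feature in sorted(zip(contributions[i], iris.feature_names),
                                         key=lambda x: -x[0].max()):
                    print(feature, c)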

            Source https://stackoverflow.com/questions/48912427

            QUESTION

            Sorting a zipped list in a for loop
            Asked 2018-Feb-19 at 13:25

            Here is my code if you want to run it

            ...

            ANSWER

            Answered 2018-Feb-19 at 13:25

            First of all, you need to cast your data into a workable format: contributions.shape is (1, 6, 2). Using contributions[0] makes it easy to iterate with zip:
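
            A hedged sketch of that iteration (feature_names and contributions, shaped (1, 6, 2), are assumed to come from the question's code):

            for feature, contrib in sorted(zip(feature_names, contributions[0]),
                                           key=lambda x: -abs(x[1]).max()):
                print(feature, contrib)  # contrib holds one value per class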

            Source https://stackoverflow.com/questions/48865972

            Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install treeinterpreter

            You can install using 'pip install treeinterpreter' or download it from GitHub or PyPI.
            You can use treeinterpreter like any standard Python library. You will need a development environment consisting of a Python distribution (including header files), a compiler, pip, and git. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.
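
            After installing, a minimal usage sketch (the dataset and model choices below are illustrative, not prescribed by the library):

            from sklearn.datasets import load_diabetes
            from sklearn.ensemble import RandomForestRegressor
            from treeinterpreter import treeinterpreter as ti

            data = load_diabetes()
            rf = RandomForestRegressor(n_estimators=100, random_state=0)
            rf.fit(data.data, data.target)
            prediction, bias, contributions = ti.predict(rf, data.data[:1])
            # For regression: prediction == bias + contributions summed over features.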

            Support

            For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
            Install
          • PyPI

            pip install treeinterpreter

          • CLONE
          • HTTPS

            https://github.com/andosa/treeinterpreter.git

          • CLI

            gh repo clone andosa/treeinterpreter

          • sshUrl

            git@github.com:andosa/treeinterpreter.git
