featuretools | open source python library for automated feature engineering | Machine Learning library

by alteryx | Python | Version: 1.31.0 | License: BSD-3-Clause

kandi X-RAY | featuretools Summary

featuretools is a Python library typically used in artificial intelligence, machine learning, and deep learning applications, as well as in institutional, learning, and education settings. kandi's analysis reports no bugs and no vulnerabilities; the library has a permissive license and medium support. No build file is available. You can install it with 'pip install featuretools' or download it from GitHub or PyPI.

"One of the holy grails of machine learning is to automate more and more of the feature engineering process." ― Pedro Domingos, A Few Useful Things to Know about Machine Learning. Featuretools is a Python library for automated feature engineering. See the documentation for more information.

            Support

              featuretools has a medium active ecosystem.
              It has 6658 star(s) with 841 fork(s). There are 156 watchers for this library.
              There were 3 major release(s) in the last 12 months.
              There are 170 open issues and 780 have been closed. On average issues are closed in 178 days. There are 5 open pull requests and 0 closed requests.
              It has a neutral sentiment in the developer community.
              The latest version of featuretools is 1.31.0

            Quality

              featuretools has 0 bugs and 0 code smells.

            Security

              featuretools has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              featuretools code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              featuretools is licensed under the BSD-3-Clause License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              featuretools releases are available to install and integrate.
              Deployable package is available in PyPI.
              featuretools has no build file. You will need to create the build yourself to build the component from source.
              Installation instructions, examples and code snippets are available.
              It has 24109 lines of code, 1854 functions and 182 files.
              It has high code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed featuretools and discovered the below as its top functions. This is intended to give you an instant insight into featuretools implemented functionality, and help decide if they suit your requirements.
            • Calculate DeepFeatureSynthesization
            • Calculate the chunk size
            • Calculate the feature matrix
            • Calculates a single chunk of data from an entity set
            • Calculate the features for each feature
            • Check whether a feature can be aggregated
            • Update the feature columns
            • Agg aggregation function
            • Check that all notebooks have the same output
            • Load feature plots
            • Calculate the holiday function
            • Deserialize a primitive
            • Standardize notebooks
            • Returns the number of unique words
            • Load primitive primitives
            • Calculate DirectFeatures based on child features
            • Calculate the groupby features
            • Handle a relationship path
            • Lists available transformations
            • Calculate transformation features
            • Read an entity set
            • Write a data description to an entity set
            • Return a function to calculate the mean value of the time series
            • Return a pandas DataFrame of free domains
            • Handle a relationship
            • Return the column schema for this feature

            featuretools Examples and Code Snippets

            Featuretools TSFresh Primitives, Combining Primitives
            Python | Lines of Code: 26 | License: Permissive (MIT)
            import featuretools as ft
            from featuretools.tsfresh import AggAutocorrelation, Mean
            
            entityset = ft.demo.load_mock_customer(return_entityset=True)
            agg_primitives = [Mean, AggAutocorrelation(f_agg='mean', maxlag=5)]
            feature_matrix, features = ft.dfs(e  
            Featuretools TSFresh Primitives, Calculating Features
            Python | Lines of Code: 11 | License: Permissive (MIT)
            from tsfresh.feature_extraction.feature_calculators import agg_autocorrelation
            
            data = list(range(10))
            param = [{'f_agg': 'mean', 'maxlag': 5}]
            agg_autocorrelation(data, param=param)
            
            [('f_agg_"mean"__maxlag_5', 0.1717171717171717)]
            
            from featuretool  

            Community Discussions

            QUESTION

            Featuretools - unable to add relationship in Entityset
            Asked 2022-Feb-04 at 16:43

            I'm writing a notebook using this data from Kaggle. Here's a screenshot of the two tables just to show we have ID columns in both.

            Here's my code when trying to set up the Entity Set and add a relationship.

            ...

            ANSWER

            Answered 2022-Feb-04 at 13:41

            If you are adding a relationship to an EntitySet by passing in a Relationship object, you need to make sure to use the relationship keyword in your call like this:

            Source https://stackoverflow.com/questions/70986231

            QUESTION

            Proper way to pass cutoff_time to dfs in featuretools 1.0.0
            Asked 2021-Nov-02 at 18:00

            Recently I've updated featuretools to v1.0.0 and faced the following issue. I have instances that vary within time and I want to build time dependent features for them. Besides, I want to save some historical characteristics of those instances. So my cutoff time dataset consists of such columns as: time, instance_id and feature1, feature2, ..., target

            When I tried to run dfs, I got the error "'NoneType' object has no attribute 'logical_types'".

            I have found out that it is caused by the inner function get_ww_types_from_features

            It tries to get the column types of cutoff time df, assuming it has woodwork type

            ...

            ANSWER

            Answered 2021-Oct-29 at 11:35
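            import pandas as pd
            import featuretools as ft

            # One row per (instance, cutoff time); extra columns such as 'label' are passed through.
            cutoff_times = pd.DataFrame()
            cutoff_times['customer_id'] = [1, 2, 3, 1]
            cutoff_times['time'] = pd.to_datetime(['2014-1-1 04:00',
                                         '2014-1-1 05:00',
                                         '2014-1-1 06:00',
                                         '2014-1-1 08:00'])
            cutoff_times['label'] = [True, True, False, True]
            cutoff_times

            # es is assumed to be an existing EntitySet that contains a 'customers' dataframe.
            fm, features = ft.dfs(entityset=es,
                                  target_dataframe_name='customers',
                                  cutoff_time=cutoff_times,
                                  cutoff_time_in_index=True)
            fm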
            

            Source https://stackoverflow.com/questions/69768008

            QUESTION

            [featuretools]'EntitySet' object has no attribute 'entity_from_dataframe'
            Asked 2021-Oct-21 at 17:27

            I tried to learn featuretools following documentation from featuretools.com.

            An error came up: AttributeError: 'EntitySet' object has no attribute 'entity_from_dataframe'

            Could you help me? Thank you.

            Code:

            ...

            ANSWER

            Answered 2021-Oct-21 at 17:08

            The documentation you are using is for an older version of Featuretools. You can find the updated Getting Started documentation that works with Featuretools version 1.0 here: https://featuretools.alteryx.com/en/stable/getting_started/getting_started_index.html

            Source https://stackoverflow.com/questions/69665765

            QUESTION

            Unable to add relationship in featuretools entity set
            Asked 2021-Oct-21 at 01:43

            New to feature tools, getting this error while creating entity

            ...

            ANSWER

            Answered 2021-Oct-20 at 15:07

            Yes, Featuretools generally expects a one to many relationship between tables in an EntitySet, which is why the child column cannot be the index of its table.

            There's not a way to override this in relationship creation, but you can take steps to use a different index column in the child dataframe, allowing order_id to be the child column of the relationship.

            You could create a new index column in prejoin_foodorder by setting make_index=True and the index to be some column name that's not in the DataFrame when adding the table to the EntitySet. This will create a new integer column in the DataFrame that ranges from 0 to the length of the dataframe. That column will then be used as the DataFrame's index, leaving order_id to be used as the child column of a relationship.

            Source https://stackoverflow.com/questions/69612666

            QUESTION

            TypeError: import_optional_dependency() got an unexpected keyword argument 'errors'
            Asked 2021-Oct-08 at 03:00

            I am trying to work with Featuretools to develop an automated feature engineering workflow for the customer churn dataset. The end outcome is a function that takes in a dataset and label times for customers and builds a feature matrix that can be used to train a machine learning model.

            As part of this exercise I am trying to execute the below code for plotting a histogram and got "TypeError: import_optional_dependency() got an unexpected keyword argument 'errors' ". Please help resolve this TypeError.

            ...

            ANSWER

            Answered 2021-Sep-14 at 20:32

            Try upgrading pandas, for example with 'pip install --upgrade pandas'.

            Source https://stackoverflow.com/questions/69148495

            QUESTION

            Features Created by FeatureTools Build Inconsistent Models
            Asked 2021-Aug-02 at 16:31

            I have an imbalanced dataset which has 200 million data from class 0 and 8000 data from class 1. I followed two different approaches to build a model.

            1. Randomly sample a new dataset which has a ratio of 1:4. Meaning 32000 from class 0 and 8000 from class 1. Then use featuretools to generate features(70 features generated in my case) and split dataset into train and test set with test_size = 0.2 and stratify minority class. Build a model with Random Forest algorithm and predict the test set.

            Code:

            ...

            ANSWER

            Answered 2021-Aug-02 at 16:31

            I agree the model possibly overfitted and failed to generalize given the new personal id. I suggest passing the labels in with the cutoff times to get a more structured training and testing set. I'll go through a quick example using this data.

            Source https://stackoverflow.com/questions/68549831

            QUESTION

            Is it possible to calculate a feature matrix only for test data?
            Asked 2021-Jul-14 at 14:52

            I have more than 100,000 rows of training data with timestamps and would like to calculate a feature matrix for new test data, of which there are only 10 rows. Some of the features in the test data will end up aggregating some of the training data. I need the implementation to be fast since this is one step in a real-time inference pipeline.

            I can think of two ways this can be implemented:

            1. Concatenating the train and test entity sets and running DFS and then only using the last 10 rows and throwing away the rest. This is very time consuming. Is there a way to calculate a subset of an entity set while using data from the entire entity set?

            2. Using the steps outlined in Calculating Feature Matrix for New Data section on the Featuretools Deployment page. However, as demonstrated below, this doesn't seem to work.

            Create all/train/test entity sets:

            ...

            ANSWER

            Answered 2021-Jul-14 at 14:52

            You can control which instances you want to generate features for with the cutoff_time dataframe (or the instance_ids argument in DFS if the cutoff time is a single datetime). Featuretools will only generate features for instances whose IDs are in the cutoff time dataframe and will ignore all others:

            Source https://stackoverflow.com/questions/68345070

            QUESTION

            Regarding featuretools, the rank results are wrong
            Asked 2021-Jul-12 at 13:22

            Using Featuretools, I want to convert the value of a certain feature to rank.

            This will be the exact question. If anyone can help me, please answer.

            First, the following code uses the rank function of pandas and displays the result. I believe this result is correct.

            ...

            ANSWER

            Answered 2021-Jul-12 at 13:22

            NEW ANSWER: Based on your updated code, the problem is arising because you are setting njobs=-1. When you do this, behind the scenes, Featuretools is distributing the calculation of the feature matrix to multiple workers. In doing so, Featuretools is breaking up the dataframe for calculating the transform feature values among the workers and sending pieces to each worker.

            This creates a problem with the Rank primitive you have defined as this primitive requires all of the data to be present to get a correct answer. For situations like this you need to set uses_full_entity=True when defining the primitive to force featuretools to include all of the data when the primitive function is called to compute the feature values.
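To see why splitting the data matters, here is a plain-Python sketch that does not use Featuretools at all. The two-way split and the values are hypothetical; the point is only that a rank computed inside each chunk differs from a rank computed over the full column.

```python
def rank_desc(values):
    """Rank values from largest (rank 1) to smallest; assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

values = [50, 10, 40, 30, 20, 60]

# Rank computed once over all rows.
full = rank_desc(values)

# Rank computed separately on two chunks, as separate workers would do.
chunks = [values[:3], values[3:]]
chunked = [r for chunk in chunks for r in rank_desc(chunk)]

print(full)     # [2, 6, 3, 4, 5, 1]
print(chunked)  # [1, 3, 2, 2, 3, 1]
```

Each chunk restarts its ranking at 1, which is the kind of wrong result described in the question.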

            If you update the Rank primitive definition as follows, you will get the correct answer:

            Source https://stackoverflow.com/questions/68296670

            QUESTION

            How to get trans_primitives of highest entity in featuretools?
            Asked 2021-Feb-10 at 16:47

            In the classic mock customer dataset example in featuretools, if I have to derive trans_primitives like month, day, year etc. of transaction_time attribute of transactions entity. How do I do that?

            ...

            ANSWER

            Answered 2021-Feb-10 at 16:47

            Thanks for the question! You can get those features from the transaction time by setting the target entity to transactions in DFS. You also want to specify which transform primitives to apply. Let me know if this helps.

            Source https://stackoverflow.com/questions/65970461

            QUESTION

            How to implement custom naming for multioutput primitives in FeatureTools
            Asked 2021-Jan-09 at 02:08

            As of v0.12.0, FeatureTools allows you to assign custom names to multi-output primitives: https://github.com/alteryx/featuretools/pull/794. By default, when you define custom multi-output primitives, the column names for the generated features have [0], [1], [2], etc. appended. So let us say that I have the following code to output a multi-output primitive:

            ...

            ANSWER

            Answered 2021-Jan-09 at 02:08

            Thanks for the question! This feature hasn't been documented well.

            The main issue with your code was that string_count_generate_name should return a list of strings, one for each column.

            It looks like you were adapting the StringCount example from the docs -- I think for this primitive it would be less error-prone to always use "sine" and "cosine" for the custom names, and remove the optional string argument from sine_and_cosine_datestamp. I also updated the feature name text to match your desired text.
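The key point, returning one name per output column, can be sketched as a standalone function. The function name, the name format, and the `order_date` column are hypothetical and are not taken from the original answer, whose code is not reproduced on this page.

```python
def sine_and_cosine_names(base_feature_names):
    """Return a list with one display name per output column (two here)."""
    base = ", ".join(base_feature_names)
    # A single string would not be enough: a two-output primitive needs two names.
    return [f"SINE({base})", f"COSINE({base})"]

names = sine_and_cosine_names(["order_date"])
print(names)  # ['SINE(order_date)', 'COSINE(order_date)']
```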

            After these changes:

            Source https://stackoverflow.com/questions/65637245

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install featuretools

            Install with pip ('pip install featuretools') or from the conda-forge channel with conda.

            Support

            The Featuretools community is happy to provide support to users of Featuretools. Project support can be found in four places, depending on the type of question.
            Install
            • PyPI: pip install featuretools
            • Clone (HTTPS): https://github.com/alteryx/featuretools.git
            • Clone (GitHub CLI): gh repo clone alteryx/featuretools
            • Clone (SSH): git@github.com:alteryx/featuretools.git
