Popular New Releases in Transformer
transformers
v4.18.0: Checkpoint sharding, vision models
vit-pytorch
0.33.2
gpt-neo
v1.1.1
sentence-transformers
v2.0.0 - Integration into Huggingface Model Hub
tokenizers
Python v0.12.1
Popular Libraries in Transformer
by huggingface python
61400 Apache-2.0
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
by google-research python
28940 Apache-2.0
TensorFlow code and pre-trained models for BERT
by hanxiao python
9373 MIT
Mapping a variable-length sentence to a fixed-length vector using BERT model
by lucidrains python
9247 MIT
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
by jadore801120 python
6215 MIT
A PyTorch implementation of the Transformer model in "Attention is All You Need".
by EleutherAI python
6100 MIT
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
by UKPLab python
5944 Apache-2.0
Multilingual Sentence & Image Embeddings with BERT
by tensorflow python
5896 Apache-2.0
TensorFlow Neural Machine Translation Tutorial
by ymcui python
5862 Apache-2.0
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series of models)
Trending New libraries in Transformer
by lucidrains python
9247 MIT
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
by EleutherAI python
6100 MIT
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
by openai python
4589 NOASSERTION
Code for the paper "Jukebox: A Generative Model for Music"
by lucidrains python
4058 MIT
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
by PaddlePaddle python
3119 Apache-2.0
Easy-to-use and Fast NLP library with awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications.
by facebookresearch python
2776 Apache-2.0
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
by MaartenGr python
2187 MIT
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
by facebookresearch python
2092 Apache-2.0
Official DeiT repository
by EleutherAI python
2012 Apache-2.0
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
Top Authors in Transformer
1 · 43 Libraries · 21211
2 · 17 Libraries · 10083
3 · 13 Libraries · 8331
4 · 11 Libraries · 1079
5 · 9 Libraries · 525
6 · 9 Libraries · 424
7 · 9 Libraries · 966
8 · 9 Libraries · 2867
9 · 8 Libraries · 33713
10 · 8 Libraries · 42
Trending Kits in Transformer
No Trending Kits are available at this moment for Transformer
Trending Discussions on Transformer
Cannot read properties of undefined (reading 'transformFile') at Bundler.transformFile
What is this GHC feature called? `forall` in type definitions
Why is Reader implemented based on ReaderT?
How to get all properties of type alias into an array?
Netlify says, "error Gatsby requires Node.js 14.15.0 or higher (you have v12.18.0)", yet I have the newest Node version?
Determine whether the Columns of a Dataset are invariant under any given Scikit-Learn Transformer
ValueError after attempting to use OneHotEncoder and then normalize values with make_column_transformer
What are differences between AutoModelForSequenceClassification vs AutoModel
How can I check a confusion_matrix after fine-tuning with custom datasets?
How to get SHAP values for Huggingface Transformer Model Prediction [Zero-Shot Classification]?
QUESTION
Cannot read properties of undefined (reading 'transformFile') at Bundler.transformFile
Asked 2022-Mar-29 at 12:36
I have updated Node today and I'm getting this error:
error: TypeError: Cannot read properties of undefined (reading 'transformFile')
    at Bundler.transformFile (/Users/.../node_modules/metro/src/Bundler.js:48:30)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async Object.transform (/Users/.../node_modules/metro/src/lib/transformHelpers.js:101:12)
    at async processModule (/Users/.../node_modules/metro/src/DeltaBundler/traverseDependencies.js:137:18)
    at async traverseDependenciesForSingleFile (/Users/.../node_modules/metro/src/DeltaBundler/traverseDependencies.js:131:3)
    at async Promise.all (index 0)
    at async initialTraverseDependencies (/Users/.../node_modules/metro/src/DeltaBundler/traverseDependencies.js:114:3)
    at async DeltaCalculator._getChangedDependencies (/Users/.../node_modules/metro/src/DeltaBundler/DeltaCalculator.js:164:25)
    at async DeltaCalculator.getDelta (/Users/.../node_modules/metro/src/DeltaBundler/DeltaCalculator.js:94:16)
Other than that I haven't done anything unusual, so I'm not sure what to share. If I'm missing any info please comment and I'll add it.
While building, the terminal also throws this error:
error: TypeError: Cannot read properties of undefined (reading 'transformFile')
    at Bundler.transformFile (/Users/.../node_modules/metro/src/Bundler.js:48:30)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async Object.transform (/Users/.../node_modules/metro/src/lib/transformHelpers.js:101:12)
    at async processModule (/Users/.../node_modules/metro/src/DeltaBundler/traverseDependencies.js:137:18)
    at async traverseDependenciesForSingleFile (/Users/.../node_modules/metro/src/DeltaBundler/traverseDependencies.js:131:3)
    at async Promise.all (index 0)
    at async initialTraverseDependencies (/Users/.../node_modules/metro/src/DeltaBundler/traverseDependencies.js:114:3)
    at async DeltaCalculator._getChangedDependencies (/Users/.../node_modules/metro/src/DeltaBundler/DeltaCalculator.js:164:25)
    at async DeltaCalculator.getDelta (/Users/.../node_modules/metro/src/DeltaBundler/DeltaCalculator.js:94:16)
Failed to construct transformer: Error: error:0308010C:digital envelope routines::unsupported
    at new Hash (node:internal/crypto/hash:67:19)
    at Object.createHash (node:crypto:130:10)
    at stableHash (/Users/.../node_modules/metro-cache/src/stableHash.js:19:8)
    at Object.getCacheKey (/Users/.../node_modules/metro-transform-worker/src/index.js:593:7)
    at getTransformCacheKey (/Users/.../node_modules/metro/src/DeltaBundler/getTransformCacheKey.js:24:19)
    at new Transformer (/Users/.../node_modules/metro/src/DeltaBundler/Transformer.js:48:9)
    at /Users/.../node_modules/metro/src/Bundler.js:22:29
    at processTicksAndRejections (node:internal/process/task_queues:96:5) {
  opensslErrorStack: [ 'error:03000086:digital envelope routines::initialization error' ],
  library: 'digital envelope routines',
  reason: 'unsupported',
  code: 'ERR_OSSL_EVP_UNSUPPORTED'
}
My node, npx and react-native versions are:
- node: 17.0.0
- npx: 8.1.0
- react-native-cli: 2.0.1
ANSWER
Answered 2021-Oct-27 at 17:19
Ran into the same issue with Node.js 17.0.0. To solve it, I downgraded to version 14.18.1, deleted node_modules and reinstalled.
QUESTION
What is this GHC feature called? `forall` in type definitions
Asked 2022-Feb-01 at 19:28
I learned that you can redefine ContT from transformers such that the r type parameter is made implicit (and may be specified explicitly using TypeApplications), viz.:
-- | Same as `ContT` but with the `r` made implicit
type ContT ::
  forall (r :: Type).
  (Type -> Type) ->
  Type ->
  Type
data ContT m a where
  ContT ::
    forall r m a.
    {runContT :: (a -> m r) -> m r} ->
    ContT @r m a

type ContVoid :: (Type -> Type) -> Type -> Type
type ContVoid = ContT @()
I hadn't realized this was possible in GHC. What is the larger feature called that allows defining a family of types with implicit type parameters, specified using forall in the type definition (referring, in the example above, to the outer forall rather than the inner forall, which simply unifies the r)?
ANSWER
Answered 2022-Feb-01 at 19:28
Nobody uses this (invisible dependent quantification) for this purpose (where the dependency is not used), but it is the same as giving a Type -> .. parameter, implicitly.
type EITHER :: forall (a :: Type) (b :: Type). Type
data EITHER where
  LEFT :: a -> EITHER @a @b
  RIGHT :: b -> EITHER @a @b

eITHER :: (a -> res) -> (b -> res) -> (EITHER @a @b -> res)
eITHER left right = \case
  LEFT a -> left a
  RIGHT b -> right b
You can also use "visible dependent quantification", where forall-> is the visible counterpart to forall., so that forall (a :: Type) -> .. is properly like Type -> .. where a does not appear in ..:
type EITHER :: forall (a :: Type) -> forall (b :: Type) -> Type
data EITHER a b where
  LEFT :: a -> EITHER a b
  RIGHT :: b -> EITHER a b

eITHER :: (a -> res) -> (b -> res) -> (EITHER a b -> res)
eITHER left right = \case
  LEFT a -> left a
  RIGHT b -> right b
QUESTION
Why is Reader implemented based on ReaderT?
Asked 2022-Jan-11 at 17:11
I found that Reader is implemented based on ReaderT using Identity. Why not make Reader first and then make ReaderT? Is there a specific reason to implement it that way?
ANSWER
Answered 2022-Jan-11 at 17:11
They are the same data type to share as much code as possible between Reader and ReaderT. As it stands, only runReader, mapReader, and withReader have any special cases. And withReader doesn't have any unique code, it's just a type specialization, so only two functions actually do anything special for Reader as opposed to ReaderT.
You might look at the module exports and think that isn't buying much, but it actually is. There are a lot of instances defined for ReaderT that Reader automatically has as well, because it's the same type. So it's actually a fair bit less code to have only one underlying type for the two.
Given that, your question boils down to asking why Reader is implemented on top of ReaderT, and not the other way around. And for that, well, it's just the only way that works.
Let's try to go the other direction and see what goes wrong.
newtype Reader r a = Reader (r -> a)
type ReaderT r m a = Reader r (m a)
Yep, ok. Inline the alias and strip out the newtype wrapping, and ReaderT r m a is equivalent to r -> m a, as it should be. Now let's move forward to the Functor instance:
instance Functor (Reader r) where
    fmap f (Reader g) = Reader (f . g)
Yep, it's the only possible instance for Functor for that definition of Reader. And since ReaderT is the same underlying type, it also provides an instance of Functor for ReaderT. Except something has gone horribly wrong. If you fix the second argument and result types to be what you'd expect, fmap specializes to the type (m a -> m b) -> ReaderT r m a -> ReaderT r m b. That's not right at all. fmap's first argument should have the type (a -> b). That m on both sides is definitely not supposed to be there.
But it's just what happens when you try to implement ReaderT in terms of Reader, instead of the other way around. In order to share code for Functor (and a lot more) between the two types, the last type variable in each has to be the same thing in the underlying type. And that's just not possible when basing ReaderT on Reader. It has to introduce an extra type variable, and the only way to do it while getting the right result from doing all the substitutions is by making the a in Reader r a refer to something different than the a in ReaderT r m a. And that turns out to be incompatible with sharing higher-kinded instances like Functor between the two types.
Amusingly, you sort of picked the best possible case with Reader in that it's possible to get the types to line up right at all. Things fail a lot faster if you try to base StateT on State, for instance. There's no way to even write a type alias that will add the m parameter and expand to the right thing for that pair. Reader requires you to explore further before things break down.
QUESTION
How to get all properties of type alias into an array?
Asked 2022-Jan-08 at 08:25
Given this type alias:
export type RequestObject = {
  user_id: number,
  address: string,
  user_type: number,
  points: number,
};
I want an array of all its properties, e.g:
['user_id','address','user_type','points']
Is there any option to get this? I have googled, but I can only get it for an interface, using the following package.
ANSWER
Answered 2022-Jan-08 at 08:22
Typescript types only exist at compile time. They do not exist in the compiled javascript. Thus you cannot populate an array (a runtime entity) with compile-time data (such as the RequestObject type alias), unless you do something complicated like the library you found.
- code something yourself that works like the library you found.
- find a different library that works with type aliases such as RequestObject.
- create an interface equivalent to your type alias and pass that to the library you found, e.g.:
import { keys } from 'ts-transformer-keys';

export type RequestObject = {
  user_id: number,
  address: string,
  user_type: number,
  points: number,
}

interface IRequestObject extends RequestObject {}

const keysOfProps = keys<IRequestObject>();

console.log(keysOfProps); // ['user_id', 'address', 'user_type', 'points']
QUESTION
Netlify says, "error Gatsby requires Node.js 14.15.0 or higher (you have v12.18.0)", yet I have the newest Node version?
Asked 2022-Jan-08 at 07:21
After migrating from Remark to MDX, my builds on Netlify are failing.
I get this error when trying to build:
10:13:28 AM: $ npm run build
10:13:29 AM: > blog-gatsby@0.1.0 build /opt/build/repo
10:13:29 AM: > gatsby build
10:13:30 AM: error Gatsby requires Node.js 14.15.0 or higher (you have v12.18.0).
10:13:30 AM: Upgrade Node to the latest stable release: https://gatsby.dev/upgrading-node-js
Yet when I run node -v in my terminal, it says v17.2.0.
I assume it's not a coincidence that this happened after migrating. Can the problem be because of my node-modules folder? Or is there something in my gatsby-config.js or package.json files I need to change?
My package.json file:
{
  "name": "blog-gatsby",
  "private": true,
  "description": "A starter for a blog powered by Gatsby and Markdown",
  "version": "0.1.0",
  "author": "Magnus Kolstad <kolstadmagnus@gmail.com>",
  "bugs": {
    "url": "https://kolstadmagnus.no"
  },
  "dependencies": {
    "@mdx-js/mdx": "^1.6.22",
    "@mdx-js/react": "^1.6.22",
    "gatsby": "^4.3.0",
    "gatsby-plugin-feed": "^4.3.0",
    "gatsby-plugin-gatsby-cloud": "^4.3.0",
    "gatsby-plugin-google-analytics": "^4.3.0",
    "gatsby-plugin-image": "^2.3.0",
    "gatsby-plugin-manifest": "^4.3.0",
    "gatsby-plugin-mdx": "^3.4.0",
    "gatsby-plugin-offline": "^5.3.0",
    "gatsby-plugin-react-helmet": "^5.3.0",
    "gatsby-plugin-sharp": "^4.3.0",
    "gatsby-remark-copy-linked-files": "^5.3.0",
    "gatsby-remark-images": "^6.3.0",
    "gatsby-remark-prismjs": "^6.3.0",
    "gatsby-remark-responsive-iframe": "^5.3.0",
    "gatsby-remark-smartypants": "^5.3.0",
    "gatsby-source-filesystem": "^4.3.0",
    "gatsby-transformer-sharp": "^4.3.0",
    "prismjs": "^1.25.0",
    "react": "^17.0.1",
    "react-dom": "^17.0.1",
    "react-helmet": "^6.1.0",
    "typeface-merriweather": "0.0.72",
    "typeface-montserrat": "0.0.75"
  },
  "devDependencies": {
    "prettier": "^2.4.1"
  },
  "homepage": "https://kolstadmagnus.no",
  "keywords": [
    "blog"
  ],
  "license": "0BSD",
  "main": "n/a",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/gatsbyjs/gatsby-starter-blog.git"
  },
  "scripts": {
    "build": "gatsby build",
    "develop": "gatsby develop",
    "format": "prettier --write \"**/*.{js,jsx,ts,tsx,json,md}\"",
    "start": "gatsby develop",
    "serve": "gatsby serve",
    "clean": "gatsby clean",
    "test": "echo \"Write tests! -> https://gatsby.dev/unit-testing\" && exit 1"
  }
}
What am I doing wrong here?
Update #1
7:11:59 PM: failed Building production JavaScript and CSS bundles - 20.650s
7:11:59 PM: error Generating JavaScript bundles failed
7:11:59 PM: Module build failed (from ./node_modules/url-loader/dist/cjs.js):
7:11:59 PM: Error: error:0308010C:digital envelope routines::unsupported
7:11:59 PM: at new Hash (node:internal/crypto/hash:67:19)
7:11:59 PM: at Object.createHash (node:crypto:130:10)
7:11:59 PM: at getHashDigest (/opt/build/repo/node_modules/file-loader/node_modules/loader-utils/lib/getHashDigest.js:46:34)
7:11:59 PM: at /opt/build/repo/node_modules/file-loader/node_modules/loader-utils/lib/interpolateName.js:113:11
7:11:59 PM: at String.replace (<anonymous>)
7:11:59 PM: at interpolateName (/opt/build/repo/node_modules/file-loader/node_modules/loader-utils/lib/interpolateName.js:110:8)
7:11:59 PM: at Object.loader (/opt/build/repo/node_modules/file-loader/dist/index.js:29:48)
7:11:59 PM: at Object.loader (/opt/build/repo/node_modules/url-loader/dist/index.js:127:19)
7:11:59 PM:
7:11:59 PM: ----------------------------------------------------------------
7:11:59 PM:   "build.command" failed
7:11:59 PM: ----------------------------------------------------------------
7:11:59 PM:
7:11:59 PM:   Error message
7:11:59 PM:   Command failed with exit code 1: npm run build
7:11:59 PM:
7:11:59 PM:   Error location
7:11:59 PM:   In Build command from Netlify app:
7:11:59 PM:   npm run build
7:11:59 PM:
7:11:59 PM:   Resolved config
7:11:59 PM:   build:
7:11:59 PM:     command: npm run build
7:11:59 PM:     commandOrigin: ui
7:11:59 PM:     publish: /opt/build/repo/public
7:11:59 PM:     publishOrigin: ui
7:11:59 PM:   plugins:
7:11:59 PM:     - inputs: {}
7:11:59 PM:       origin: ui
7:11:59 PM:       package: '@netlify/plugin-gatsby'
7:11:59 PM:   redirects:
7:12:00 PM:   - from: /api/*
      status: 200
      to: /.netlify/functions/gatsby
    - force: true
      from: https://magnuskolstad.com
      status: 301
      to: https://kolstadmagnus.no
  redirectsOrigin: config
Caching artifacts
ANSWER
Answered 2022-Jan-08 at 07:21The problem is that you have Node 17.2.0. locally but in Netlify's environment, you are running a lower version (by default it's not set as 17.2.0). So the local environment is OK, Netlify environment is KO because of this mismatch of Node versions.
When Netlify deploys your site it installs and builds again your site so you should ensure that both environments work under the same conditions. Otherwise, both node_modules
will differ so your application will have different behavior or eventually won't even build because of dependency errors.
You can easily play with the Node version in multiple ways but I'd recommend using the .nvmrc
file. Just run the following command in the root of your project:
node -v > .nvmrc
This should create a .nvmrc file containing the Node version (node -v) in it. When Netlify finds this file during the build process, it uses it as the base Node version, so it installs all the dependencies accordingly.
The file is also useful to tell other contributors which Node version you are using.
QUESTION
Determine whether the Columns of a Dataset are invariant under any given Scikit-Learn Transformer
Asked 2021-Dec-19 at 08:42
Given an sklearn transformer t, is there a way to determine whether t changes the columns/column order of any given input dataset X, without applying it to the data?
For example, with t = sklearn.preprocessing.StandardScaler there is a 1-to-1 mapping between the columns of X and t.transform(X), namely X[:, i] -> t.transform(X)[:, i], whereas this is obviously not the case for sklearn.decomposition.PCA.
A corollary of that would be: can we know how the columns of the input will change by applying t, e.g. which columns an already fitted sklearn.feature_selection.SelectKBest chooses?
I am not looking for solutions to specific transformers, but a solution applicable to all or at least a wide selection of transformers.
Feel free to implement your own Pipeline class or wrapper if necessary.
ANSWER
Answered 2021-Nov-23 at 15:01
I found a partial answer. Both StandardScaler and SelectKBest have .get_feature_names_out methods. I did not find the time to investigate further.
from numpy.random import RandomState
import numpy as np
import pandas as pd

from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest

from sklearn.linear_model import LassoCV


rng = RandomState()

# Make some data
slopes = np.array([-1., 1., .1])
X = pd.DataFrame(
    data = np.linspace(-1,1,500)[:, np.newaxis] + rng.random((500, 3)),
    columns=["foo", "bar", "baz"]
)
y = pd.Series(data=np.linspace(-1,1, 500) + rng.rand((500)))

# Test Transformers
scaler = StandardScaler().fit(X)
selector = SelectKBest(k=2).fit(X, y)

print(scaler.get_feature_names_out())
print(selector.get_feature_names_out())
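Building on that partial answer, here is a minimal sketch (not from the thread) of a generic check based on get_feature_names_out. It assumes scikit-learn 1.0 or later, an already fitted transformer, and known input feature names; the helper name is_column_invariant and the example column names are choices made here for illustration.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def is_column_invariant(transformer, input_features):
    """Heuristic: a fitted transformer keeps the columns (and their order)
    if the output feature names it reports equal the input feature names.
    Transformers without feature-name support are treated as not invariant."""
    try:
        out = list(transformer.get_feature_names_out(input_features))
    except (AttributeError, ValueError):
        return False
    return out == list(input_features)

X = pd.DataFrame(np.random.rand(100, 3), columns=["foo", "bar", "baz"])
print(is_column_invariant(StandardScaler().fit(X), X.columns))   # expected: True
print(is_column_invariant(PCA(n_components=2).fit(X), X.columns))  # expected: False

This only inspects feature names, not values, which matches the question's notion of a 1-to-1 column mapping: StandardScaler changes the values but keeps the columns, while PCA and SelectKBest report different output names.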
QUESTION
ValueError after attempting to use OneHotEncoder and then normalize values with make_column_transformer
Asked 2021-Dec-09 at 20:59
So I was trying to convert my data's timestamps from Unix timestamps to a more readable date format. I created a simple Java program to do so and write to a .csv file, and that went smoothly. I tried using it for my model by one-hot encoding it into numbers and then turning everything into normalized data. However, after my attempt to one-hot encode (which I am not sure if it even worked), my normalization process using make_column_transformer failed.
# model 4
# next model
import tensorflow as tf
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from tensorflow.keras import layers
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.model_selection import train_test_split

np.set_printoptions(precision=3, suppress=True)
btc_data = pd.read_csv(
    "/content/drive/MyDrive/Science Fair/output2.csv",
    names=["Time", "Open"])

X_btc = btc_data[["Time"]]
y_btc = btc_data["Open"]

enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(X_btc)

X_btc = enc.transform(X_btc)

print(X_btc)

X_train, X_test, y_train, y_test = train_test_split(X_btc, y_btc, test_size=0.2, random_state=62)

ct = make_column_transformer(
    (MinMaxScaler(), ["Time"])
)

ct.fit(X_train)
X_train_normal = ct.transform(X_train)
X_test_normal = ct.transform(X_test)

callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)

btc_model_4 = tf.keras.Sequential([
    layers.Dense(100, activation="relu"),
    layers.Dense(100, activation="relu"),
    layers.Dense(100, activation="relu"),
    layers.Dense(100, activation="relu"),
    layers.Dense(100, activation="relu"),
    layers.Dense(100, activation="relu"),
    layers.Dense(1, activation="linear")
])

btc_model_4.compile(loss = tf.losses.MeanSquaredError(),
                    optimizer = tf.optimizers.Adam())

history = btc_model_4.fit(X_train_normal, y_train, batch_size=8192, epochs=100, callbacks=[callback])

btc_model_4.evaluate(X_test_normal, y_test, batch_size=8192)

y_pred = btc_model_4.predict(X_test_normal)

btc_model_4.save("btc_model_4")
btc_model_4.save("btc_model_4.h5")

# plot model
def plot_evaluations(train_data=X_train_normal,
                     train_labels=y_train,
                     test_data=X_test_normal,
                     test_labels=y_test,
                     predictions=y_pred):
    print(test_data.shape)
    print(predictions.shape)

    plt.figure(figsize=(100, 15))
    plt.scatter(train_data, train_labels, c='b', label="Training")
    plt.scatter(test_data, test_labels, c='g', label="Testing")
    plt.scatter(test_data, predictions, c='r', label="Results")
    plt.legend()

plot_evaluations()

# plot loss curve
pd.DataFrame(history.history).plot()
plt.ylabel("loss")
plt.xlabel("epochs")
My normal data format is like so:
2015-12-05 12:52:00,377.48
2015-12-05 12:53:00,377.5
2015-12-05 12:54:00,377.5
2015-12-05 12:56:00,377.5
2015-12-05 12:57:00,377.5
2015-12-05 12:58:00,377.5
2015-12-05 12:59:00,377.5
2015-12-05 13:00:00,377.5
2015-12-05 13:01:00,377.79
2015-12-05 13:02:00,377.5
2015-12-05 13:03:00,377.79
2015-12-05 13:05:00,377.74
2015-12-05 13:06:00,377.79
2015-12-05 13:07:00,377.64
2015-12-05 13:08:00,377.79
2015-12-05 13:10:00,377.77
2015-12-05 13:11:00,377.7
2015-12-05 13:12:00,377.77
2015-12-05 13:13:00,377.77
2015-12-05 13:14:00,377.79
2015-12-05 13:15:00,377.72
2015-12-05 13:16:00,377.5
2015-12-05 13:17:00,377.49
2015-12-05 13:18:00,377.5
2015-12-05 13:19:00,377.5
2015-12-05 13:20:00,377.8
2015-12-05 13:21:00,377.84
2015-12-05 13:22:00,378.29
2015-12-05 13:23:00,378.3
2015-12-05 13:24:00,378.3
2015-12-05 13:25:00,378.33
2015-12-05 13:26:00,378.33
2015-12-05 13:28:00,378.31
2015-12-05 13:29:00,378.68
The first is the date and the second value after the comma is the price of BTC at that time. Now after "one-hot encoding", I added a print statement to print the value of those X values, and that gave the following value:
  (0, 0)    1.0
  (1, 1)    1.0
  (2, 2)    1.0
  (3, 3)    1.0
  (4, 4)    1.0
  (5, 5)    1.0
  (6, 6)    1.0
  (7, 7)    1.0
  (8, 8)    1.0
  (9, 9)    1.0
  (10, 10)    1.0
  (11, 11)    1.0
  (12, 12)    1.0
  (13, 13)    1.0
  (14, 14)    1.0
  (15, 15)    1.0
  (16, 16)    1.0
  (17, 17)    1.0
  (18, 18)    1.0
  (19, 19)    1.0
  (20, 20)    1.0
  (21, 21)    1.0
  (22, 22)    1.0
  (23, 23)    1.0
  (24, 24)    1.0
  :    :
  (2526096, 2526096)    1.0
  (2526097, 2526097)    1.0
  (2526098, 2526098)    1.0
  (2526099, 2526099)    1.0
  (2526100, 2526100)    1.0
  (2526101, 2526101)    1.0
  (2526102, 2526102)    1.0
  (2526103, 2526103)    1.0
  (2526104, 2526104)    1.0
  (2526105, 2526105)    1.0
  (2526106, 2526106)    1.0
  (2526107, 2526107)    1.0
  (2526108, 2526108)    1.0
  (2526109, 2526109)    1.0
  (2526110, 2526110)    1.0
  (2526111, 2526111)    1.0
  (2526112, 2526112)    1.0
  (2526113, 2526113)    1.0
  (2526114, 2526114)    1.0
  (2526115, 2526115)    1.0
  (2526116, 2526116)    1.0
  (2526117, 2526117)    1.0
  (2526118, 2526118)    1.0
  (2526119, 2526119)    1.0
  (2526120, 2526120)    1.0
Following fitting for normalization, I receive the following error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/sklearn/utils/__init__.py in _get_column_indices(X, key)
    408         try:
--> 409             all_columns = X.columns
    410         except AttributeError:

5 frames
AttributeError: columns not found

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/sklearn/utils/__init__.py in _get_column_indices(X, key)
    410         except AttributeError:
    411             raise ValueError(
--> 412                 "Specifying the columns using strings is only "
    413                 "supported for pandas DataFrames"
    414             )

ValueError: Specifying the columns using strings is only supported for pandas DataFrames
Am I one-hot encoding correctly? What is the appropriate way to do this? Should I directly implement the one-hot encoder in my normalization process?
ANSWER
Answered 2021-Dec-09 at 20:59
Using OneHotEncoder is not the way to go here; it's better to extract the features from the Time column as separate features like year, month, day, hour, minutes, etc., and give these columns as input to your model.
btc_data['Year'] = btc_data['Date'].astype('datetime64[ns]').dt.year
btc_data['Month'] = btc_data['Date'].astype('datetime64[ns]').dt.month
btc_data['Day'] = btc_data['Date'].astype('datetime64[ns]').dt.day
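For completeness, a minimal sketch (not from the original answer) of how such extracted columns could then be scaled and fed to the model. The 'Hour' and 'Minute' columns, the use of the question's "Time" column instead of the answer's "Date", and the shortened CSV path are assumptions made here for illustration.

import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

btc_data = pd.read_csv("output2.csv", names=["Time", "Open"])  # path shortened for the sketch

# Expand the timestamp into plain numeric features instead of one-hot encoding it
ts = pd.to_datetime(btc_data["Time"])
btc_data["Year"] = ts.dt.year
btc_data["Month"] = ts.dt.month
btc_data["Day"] = ts.dt.day
btc_data["Hour"] = ts.dt.hour
btc_data["Minute"] = ts.dt.minute

time_features = ["Year", "Month", "Day", "Hour", "Minute"]
X = btc_data[time_features]
y = btc_data["Open"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=62)

# The column transformer now receives a DataFrame, so string column names work
ct = make_column_transformer((MinMaxScaler(), time_features))
X_train_normal = ct.fit_transform(X_train)
X_test_normal = ct.transform(X_test)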
The issue here comes from the OneHotEncoder, which returns a scipy sparse matrix and gets rid of the "Time" column, so to correct this you must re-transform the output to a pandas DataFrame and add the "Time" column back:
enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(X_btc)
X_btc = enc.transform(X_btc)
X_btc = pd.DataFrame(X_btc.todense())
X_btc["Time"] = btc_data["Time"]
One way to work around the memory issue is:
- Generate two indexes with the same random_state, one for the pandas data frame and one for the scipy sparse matrix (a quick alignment check is sketched after the snippet below):
X_train, X_test, y_train, y_test = train_test_split(X_btc, y_btc, test_size=0.2, random_state=62)
X_train_pd, X_test_pd, y_train_pd, y_test_pd = train_test_split(btc_data, y_btc, test_size=0.2, random_state=62)
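A quick sanity check (a sketch, not part of the original answer) that the two splits are row-aligned: train_test_split shuffles deterministically for a fixed random_state, sample count, and test_size, so both calls above produce the same permutation.

import numpy as np

# Both calls used random_state=62 on inputs with the same number of rows,
# so the resulting train/test index permutations are identical.
assert X_train.shape[0] == X_train_pd.shape[0]
assert np.array_equal(y_train.index.values, y_train_pd.index.values)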
- Use the pandas data frame for the MinMaxScaler().
ct = make_column_transformer((MinMaxScaler(), ["Time"]))
ct.fit(X_train_pd)
result_train = ct.transform(X_train_pd)
result_test = ct.transform(X_test_pd)
- Use generators to load the data in the train and test phases (this gets rid of the memory issue) and include the scaled time in the generators:
def nn_batch_generator(X_data, y_data, scaled, batch_size):
    # Yields (features, labels) batches: the sparse one-hot block is densified
    # per batch and concatenated with the pre-scaled time column.
    samples_per_epoch = X_data.shape[0]
    number_of_batches = samples_per_epoch / batch_size
    counter = 0
    index = np.arange(np.shape(y_data)[0])
    while True:
        index_batch = index[batch_size * counter:batch_size * (counter + 1)]
        scaled_array = scaled[index_batch]
        X_batch = X_data[index_batch, :].todense()
        y_batch = y_data.iloc[index_batch]
        counter += 1
        yield np.array(np.hstack((np.array(X_batch), scaled_array))), np.array(y_batch)
        if counter > number_of_batches:
            counter = 0


def nn_batch_generator_test(X_data, scaled, batch_size):
    # Same as above, but without labels, for evaluation/prediction.
    samples_per_epoch = X_data.shape[0]
    number_of_batches = samples_per_epoch / batch_size
    counter = 0
    index = np.arange(np.shape(X_data)[0])
    while True:
        index_batch = index[batch_size * counter:batch_size * (counter + 1)]
        scaled_array = scaled[index_batch]
        X_batch = X_data[index_batch, :].todense()
        counter += 1
        yield np.hstack((X_batch, scaled_array))
        if counter > number_of_batches:
            counter = 0
Finally, fit the model:
# The original answer leaves the step counts open ("#Todetermine"); with one batch of
# size 2 per step, a full pass over a split takes ceil(n_samples / 2) steps.
steps_per_epoch = int(np.ceil(X_train.shape[0] / 2))
eval_steps = int(np.ceil(X_test.shape[0] / 2))

# batch_size is fixed inside the generators, so it is not passed to fit/evaluate here.
history = btc_model_4.fit(nn_batch_generator(X_train, y_train, scaled=result_train, batch_size=2),
                          steps_per_epoch=steps_per_epoch, epochs=10,
                          callbacks=[callback])

btc_model_4.evaluate(nn_batch_generator(X_test, y_test, scaled=result_test, batch_size=2), steps=eval_steps)
y_pred = btc_model_4.predict(nn_batch_generator_test(X_test, scaled=result_test, batch_size=2), steps=eval_steps)
QUESTION
What are differences between AutoModelForSequenceClassification vs AutoModel
Asked 2021-Dec-05 at 09:07
We can create a model with the AutoModel (TFAutoModel) class:
from transformers import AutoModel
model = AutoModel.from_pretrained('distilbert-base-uncased')
On the other hand, a model can be created with AutoModelForSequenceClassification (TFAutoModelForSequenceClassification):
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased')
As far as I know, both models load the same distilbert-base-uncased checkpoint. Judging by the names, the second class (AutoModelForSequenceClassification) is meant for sequence classification.
But what are the real differences between the two classes, and how should they be used correctly?
(I searched the Hugging Face documentation, but it is not clear.)
ANSWER
Answered 2021-Dec-05 at 09:07
The difference between AutoModel and AutoModelForSequenceClassification is that AutoModelForSequenceClassification adds a classification head on top of the base model's outputs, and that head can be trained together with the base model. AutoModel returns only the backbone's hidden states.
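As a minimal illustration (a sketch assuming PyTorch and the distilbert-base-uncased checkpoint; not part of the original answer), the two classes differ in what their forward pass returns:

import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
inputs = tokenizer("Transformers are great!", return_tensors="pt")

base = AutoModel.from_pretrained("distilbert-base-uncased")
clf = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

with torch.no_grad():
    base_out = base(**inputs)   # backbone only: hidden states
    clf_out = clf(**inputs)     # backbone + newly initialized classification head

print(base_out.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
print(clf_out.logits.shape)              # (batch, num_labels)

Because the classification head is newly initialized, AutoModelForSequenceClassification is the class to fine-tune on labeled data, while AutoModel is the usual choice when you only need embeddings or features.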
QUESTION
How can I check a confusion_matrix after fine-tuning with custom datasets?
Asked 2021-Nov-24 at 13:26
This question is the same as "How can I check a confusion_matrix after fine-tuning with custom datasets?" on Data Science Stack Exchange.
Background: I would like to check a confusion_matrix, including precision, recall, and F1-score, like the one below after fine-tuning with custom datasets.
The fine-tuning process and the task are sequence classification with IMDb reviews, following the "Fine-tuning with custom datasets" tutorial on Hugging Face.
After finishing the fine-tuning with Trainer, how can I check a confusion_matrix in this case?
[Example image from the original site: a classification report with per-class precision, recall, and F1-score.]
1predictions = np.argmax(trainer.test(test_x), axis=1)
2
3# Confusion matrix and classification report.
4print(classification_report(test_y, predictions))
5
6 precision recall f1-score support
7
8 0 0.75 0.79 0.77 1000
9 1 0.81 0.87 0.84 1000
10 2 0.63 0.61 0.62 1000
11 3 0.55 0.47 0.50 1000
12 4 0.66 0.66 0.66 1000
13 5 0.62 0.64 0.63 1000
14 6 0.74 0.83 0.78 1000
15 7 0.80 0.74 0.77 1000
16 8 0.85 0.81 0.83 1000
17 9 0.79 0.80 0.80 1000
18
19avg / total 0.72 0.72 0.72 10000
20
from transformers import DistilBertForSequenceClassification, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
)

model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")

trainer = Trainer(
    model=model,                  # the instantiated 🤗 Transformers model to be trained
    args=training_args,           # training arguments, defined above
    train_dataset=train_dataset,  # training dataset
    eval_dataset=val_dataset      # evaluation dataset
)

trainer.train()
Dataset preparation for sequence classification with IMDb reviews; I'm fine-tuning with Trainer:
from pathlib import Path

def read_imdb_split(split_dir):
    split_dir = Path(split_dir)
    texts = []
    labels = []
    for label_dir in ["pos", "neg"]:
        for text_file in (split_dir/label_dir).iterdir():
            texts.append(text_file.read_text())
            labels.append(0 if label_dir == "neg" else 1)

    return texts, labels

train_texts, train_labels = read_imdb_split('aclImdb/train')
test_texts, test_labels = read_imdb_split('aclImdb/test')

from sklearn.model_selection import train_test_split
train_texts, val_texts, train_labels, val_labels = train_test_split(train_texts, train_labels, test_size=.2)

from transformers import DistilBertTokenizerFast
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')

train_encodings = tokenizer(train_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)
test_encodings = tokenizer(test_texts, truncation=True, padding=True)

import torch

class IMDbDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

train_dataset = IMDbDataset(train_encodings, train_labels)
val_dataset = IMDbDataset(val_encodings, val_labels)
test_dataset = IMDbDataset(test_encodings, test_labels)
ANSWER
Answered 2021-Nov-24 at 13:26What you could do in this situation is to iterate on the validation set(or on the test set for that matter) and manually create a list of y_true
and y_pred
.
import torch
import torch.nn.functional as F
from sklearn import metrics

y_preds = []
y_trues = []
for index, val_text in enumerate(val_texts):
    tokenized_val_text = tokenizer([val_text],
                                   truncation=True,
                                   padding=True,
                                   return_tensors='pt')  # note: return_tensors (plural)
    logits = model(**tokenized_val_text).logits
    prediction = F.softmax(logits, dim=1)
    y_pred = torch.argmax(prediction).numpy()
    y_true = val_labels[index]
    y_preds.append(y_pred)
    y_trues.append(y_true)
Finally,
confusion_matrix = metrics.confusion_matrix(y_trues, y_preds, labels=[0, 1])  # 0 = "neg", 1 = "pos"
print(confusion_matrix)
Observations:
- The output of the model are the logits, not the normalized probabilities.
- As such, we apply softmax on dimension one to transform them into actual probabilities (e.g. 0.2 for class 0, 0.8 for class 1).
- We apply the .argmax() operation to get the index of the predicted class.
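As an alternative sketch (using Trainer.predict instead of the manual loop above; this variant is not part of the original answer), the predictions for the whole prepared dataset can be obtained in one call:

import numpy as np
from sklearn import metrics

# trainer and val_dataset are the objects defined in the question's code.
output = trainer.predict(val_dataset)            # runs inference; returns logits and label_ids
y_preds = np.argmax(output.predictions, axis=1)
y_trues = output.label_ids

print(metrics.confusion_matrix(y_trues, y_preds))
print(metrics.classification_report(y_trues, y_preds, target_names=["neg", "pos"]))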
QUESTION
How to get SHAP values for Huggingface Transformer Model Prediction [Zero-Shot Classification]?
Asked 2021-Oct-25 at 13:25
Given a zero-shot classification task via Hugging Face as follows:
1from transformers import pipeline
2classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
3
4example_text = "This is an example text about snowflakes in the summer"
5labels = ["weather", "sports", "computer industry"]
6
7output = classifier(example_text, labels, multi_label=True)
8output
9{'sequence': 'This is an example text about snowflakes in the summer',
10'labels': ['weather', 'sports'],
11'scores': [0.9780895709991455, 0.021910419687628746]}
12
I am trying to extract the SHAP values to generate a text-based explanation for the prediction result, as shown here: SHAP for Transformers.
I already tried the following, based on the above URL:
from transformers import AutoModelForSequenceClassification, AutoTokenizer, ZeroShotClassificationPipeline

model = AutoModelForSequenceClassification.from_pretrained('facebook/bart-large-mnli')
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')

pipe = ZeroShotClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)

def score_and_visualize(text):
    prediction = pipe([text])
    print(prediction[0])

    explainer = shap.Explainer(pipe)
    shap_values = explainer([text])

    shap.plots.text(shap_values)

score_and_visualize(example_text)
Any suggestions? Thanks for your help in advance!
As an alternative to the above pipeline, the following also works:
from transformers import AutoModelForSequenceClassification, AutoTokenizer, ZeroShotClassificationPipeline

model = AutoModelForSequenceClassification.from_pretrained('facebook/bart-large-mnli')
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')

classifier = ZeroShotClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)

example_text = "This is an example text about snowflakes in the summer"
labels = ["weather", "sports"]

output = classifier(example_text, labels)
output
{'sequence': 'This is an example text about snowflakes in the summer',
'labels': ['weather', 'sports'],
'scores': [0.9780895709991455, 0.021910419687628746]}
ANSWER
Answered 2021-Oct-22 at 21:51
The ZeroShotClassificationPipeline is currently not supported by shap, but you can use a workaround. The workaround is required because:
- The shap Explainer forwards only one parameter to the model (a pipeline in this case), but the ZeroShotClassificationPipeline requires two parameters, namely text and labels.
- The shap Explainer will access the config of your model and use its label2id and id2label properties. They do not match the labels returned from the ZeroShotClassificationPipeline and will result in an error.
Below is a suggestion for one possible workaround. I recommend opening an issue at shap and requesting official support for huggingface's ZeroShotClassificationPipeline.
import shap
from transformers import AutoModelForSequenceClassification, AutoTokenizer, ZeroShotClassificationPipeline
from typing import Union, List

weights = "valhalla/distilbart-mnli-12-3"

model = AutoModelForSequenceClassification.from_pretrained(weights)
tokenizer = AutoTokenizer.from_pretrained(weights)

# Create your own pipeline that only requires the text parameter
# for the __call__ method and provides a method to set the labels
class MyZeroShotClassificationPipeline(ZeroShotClassificationPipeline):
    # Overwrite the __call__ method
    def __call__(self, *args):
        o = super().__call__(args[0], self.workaround_labels)[0]

        return [[{"label": x[0], "score": x[1]} for x in zip(o["labels"], o["scores"])]]

    def set_labels_workaround(self, labels: Union[str, List[str]]):
        self.workaround_labels = labels

example_text = "This is an example text about snowflakes in the summer"
labels = ["weather", "sports"]

# In the following, we address issue 2.
model.config.label2id.update({v: k for k, v in enumerate(labels)})
model.config.id2label.update({k: v for k, v in enumerate(labels)})

pipe = MyZeroShotClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)
pipe.set_labels_workaround(labels)

def score_and_visualize(text):
    prediction = pipe([text])
    print(prediction[0])

    explainer = shap.Explainer(pipe)
    shap_values = explainer([text])

    shap.plots.text(shap_values)


score_and_visualize(example_text)
Community Discussions contain sources that include Stack Exchange Network
Tutorials and Learning Resources in Transformer
Tutorials and Learning Resources are not available at this moment for Transformer