QUESTION
Cannot read properties of undefined (reading 'transformFile') at Bundler.transformFile
Asked 2022-Mar-29 at 12:36
I have updated node today and I'm getting this error:
1error: TypeError: Cannot read properties of undefined (reading 'transformFile')
2 at Bundler.transformFile (/Users/.../node_modules/metro/src/Bundler.js:48:30)
3 at runMicrotasks (<anonymous>)
4 at processTicksAndRejections (node:internal/process/task_queues:96:5)
5 at async Object.transform (/Users/.../node_modules/metro/src/lib/transformHelpers.js:101:12)
6 at async processModule (/Users/.../node_modules/metro/src/DeltaBundler/traverseDependencies.js:137:18)
7 at async traverseDependenciesForSingleFile (/Users/.../node_modules/metro/src/DeltaBundler/traverseDependencies.js:131:3)
8 at async Promise.all (index 0)
9 at async initialTraverseDependencies (/Users/.../node_modules/metro/src/DeltaBundler/traverseDependencies.js:114:3)
10 at async DeltaCalculator._getChangedDependencies (/Users/.../node_modules/metro/src/DeltaBundler/DeltaCalculator.js:164:25)
11 at async DeltaCalculator.getDelta (/Users/.../node_modules/metro/src/DeltaBundler/DeltaCalculator.js:94:16)
12
Other than that I haven't done anything unusual, so I'm not sure what to share. If I'm missing any info please comment and I'll add it.
While building, the terminal also throws this error:
1error: TypeError: Cannot read properties of undefined (reading 'transformFile')
2 at Bundler.transformFile (/Users/.../node_modules/metro/src/Bundler.js:48:30)
3 at runMicrotasks (<anonymous>)
4 at processTicksAndRejections (node:internal/process/task_queues:96:5)
5 at async Object.transform (/Users/.../node_modules/metro/src/lib/transformHelpers.js:101:12)
6 at async processModule (/Users/.../node_modules/metro/src/DeltaBundler/traverseDependencies.js:137:18)
7 at async traverseDependenciesForSingleFile (/Users/.../node_modules/metro/src/DeltaBundler/traverseDependencies.js:131:3)
8 at async Promise.all (index 0)
9 at async initialTraverseDependencies (/Users/.../node_modules/metro/src/DeltaBundler/traverseDependencies.js:114:3)
10 at async DeltaCalculator._getChangedDependencies (/Users/.../node_modules/metro/src/DeltaBundler/DeltaCalculator.js:164:25)
11 at async DeltaCalculator.getDelta (/Users/.../node_modules/metro/src/DeltaBundler/DeltaCalculator.js:94:16)
12Failed to construct transformer: Error: error:0308010C:digital envelope routines::unsupported
13 at new Hash (node:internal/crypto/hash:67:19)
14 at Object.createHash (node:crypto:130:10)
15 at stableHash (/Users/.../node_modules/metro-cache/src/stableHash.js:19:8)
16 at Object.getCacheKey (/Users/.../node_modules/metro-transform-worker/src/index.js:593:7)
17 at getTransformCacheKey (/Users/.../node_modules/metro/src/DeltaBundler/getTransformCacheKey.js:24:19)
18 at new Transformer (/Users/.../node_modules/metro/src/DeltaBundler/Transformer.js:48:9)
19 at /Users/.../node_modules/metro/src/Bundler.js:22:29
20 at processTicksAndRejections (node:internal/process/task_queues:96:5) {
21 opensslErrorStack: [ 'error:03000086:digital envelope routines::initialization error' ],
22 library: 'digital envelope routines',
23 reason: 'unsupported',
24 code: 'ERR_OSSL_EVP_UNSUPPORTED'
25}
26
My node, npx and react-native versions are:
ANSWER
Answered 2021-Oct-27 at 17:19
Ran into the same issue with Node.js 17.0.0. To solve it, I downgraded to version 14.18.1, deleted node_modules and reinstalled.
QUESTION
What is this GHC feature called? `forall` in type definitions
Asked 2022-Feb-01 at 19:28
I learned that you can redefine ContT from transformers such that the r type parameter is made implicit (and may be specified explicitly using TypeApplications), viz.:
1-- | Same as `ContT` but with the `r` made implicit
2type ContT ::
3 forall (r :: Type).
4 (Type -> Type) ->
5 Type ->
6 Type
7data ContT m a where
8 ContT ::
9 forall r m a.
10 {runContT :: (a -> m r) -> m r} ->
11 ContT @r m a
12
13type ContVoid :: (Type -> Type) -> Type -> Type
14type ContVoid = ContT @()
15
I hadn't realized this was possible in GHC. What is the larger feature called that refers to this way of defining a family of types with implicit type parameters, specified using forall in the type definition (referring, in the example above, to the outer forall, rather than the inner forall, which simply unifies the r)?
ANSWER
Answered 2022-Feb-01 at 19:28
Nobody uses this (invisible dependent quantification) for this purpose (where the dependency is not used), but it is the same as giving a Type -> .. parameter, implicitly.
type EITHER :: forall (a :: Type) (b :: Type). Type
data EITHER where
  LEFT :: a -> EITHER @a @b
  RIGHT :: b -> EITHER @a @b

eITHER :: (a -> res) -> (b -> res) -> (EITHER @a @b -> res)
eITHER left right = \case
  LEFT a -> left a
  RIGHT b -> right b
You can also use "visible dependent quantification" where forall-> is the visible counterpart to forall., so forall (a :: Type) -> .. is properly like Type -> .. where a does not appear in ..:
type EITHER :: forall (a :: Type) -> forall (b :: Type) -> Type
data EITHER a b where
  LEFT :: a -> EITHER a b
  RIGHT :: b -> EITHER a b

eITHER :: (a -> res) -> (b -> res) -> (EITHER a b -> res)
eITHER left right = \case
  LEFT a -> left a
  RIGHT b -> right b
QUESTION
Why is Reader implemented based on ReaderT?
Asked 2022-Jan-11 at 17:11
I found that Reader is implemented based on ReaderT using Identity. Why not make Reader first and then make ReaderT? Is there a specific reason to implement it that way?
ANSWER
Answered 2022-Jan-11 at 17:11
They are the same data type to share as much code as possible between Reader and ReaderT. As it stands, only runReader, mapReader, and withReader have any special cases. And withReader doesn't have any unique code, it's just a type specialization, so only two functions actually do anything special for Reader as opposed to ReaderT.
You might look at the module exports and think that isn't buying much, but it actually is. There are a lot of instances defined for ReaderT that Reader automatically has as well, because it's the same type. So it's actually a fair bit less code to have only one underlying type for the two.
Given that, your question boils down to asking why Reader is implemented on top of ReaderT, and not the other way around. And for that, well, it's just the only way that works.
Let's try to go the other direction and see what goes wrong.
1newtype Reader r a = Reader (r -> a)
2type ReaderT r m a = Reader r (m a)
3
Yep, ok. Inline the alias and strip out the newtype wrapping, and ReaderT r m a is equivalent to r -> m a, as it should be. Now let's move forward to the Functor instance:
1newtype Reader r a = Reader (r -> a)
2type ReaderT r m a = Reader r (m a)
3instance Functor (Reader r) where
4 fmap f (Reader g) = Reader (f . g)
5
Yep, it's the only possible instance for Functor for that definition of Reader. And since ReaderT is the same underlying type, it also provides an instance of Functor for ReaderT. Except something has gone horribly wrong. If you fix the second argument and result types to be what you'd expect, fmap specializes to the type (m a -> m b) -> ReaderT r m a -> ReaderT r m b. That's not right at all. fmap's first argument should have the type (a -> b). That m on both sides is definitely not supposed to be there.
But it's just what happens when you try to implement ReaderT in terms of Reader, instead of the other way around. In order to share code for Functor (and a lot more) between the two types, the last type variable in each has to be the same thing in the underlying type. And that's just not possible when basing ReaderT on Reader. It has to introduce an extra type variable, and the only way to do it while getting the right result from doing all the substitutions is by making the a in Reader r a refer to something different than the a in ReaderT r m a. And that turns out to be incompatible with sharing higher-kinded instances like Functor between the two types.
Amusingly, you sort of picked the best possible case with Reader in that it's possible to get the types to line up right at all. Things fail a lot faster if you try to base StateT on State, for instance. There's no way to even write a type alias that will add the m parameter and expand to the right thing for that pair. Reader requires you to explore further before things break down.
QUESTION
How to get all properties of type alias into an array?
Asked 2022-Jan-08 at 08:25
Given this type alias:
1export type RequestObject = {
2 user_id: number,
3 address: string,
4 user_type: number,
5 points: number,
6};
7
I want an array of all its properties, e.g:
['user_id','address','user_type','points']
Is there any option to get this? I have googled, but I can get it only for an interface, using the following package (ts-transformer-keys).
ANSWER
Answered 2022-Jan-08 at 08:22
Typescript types only exist at compile time. They do not exist in the compiled javascript. Thus you cannot populate an array (a runtime entity) with compile-time data (such as the RequestObject type alias), unless you do something complicated like the library you found. That library works on interfaces rather than type aliases, so you can declare an interface that extends RequestObject and pass it to keys():
import { keys } from 'ts-transformer-keys';

export type RequestObject = {
  user_id: number,
  address: string,
  user_type: number,
  points: number,
}

interface IRequestObject extends RequestObject {}

const keysOfProps = keys<IRequestObject>();

console.log(keysOfProps); // ['user_id', 'address', 'user_type', 'points']
QUESTION
Netlify says, "error Gatsby requires Node.js 14.15.0 or higher (you have v12.18.0)"—yet I have the newest Node version?
Asked 2022-Jan-08 at 07:21
After migrating from Remark to MDX, my builds on Netlify are failing.
I get this error when trying to build:
110:13:28 AM: $ npm run build
210:13:29 AM: > blog-gatsby@0.1.0 build /opt/build/repo
310:13:29 AM: > gatsby build
410:13:30 AM: error Gatsby requires Node.js 14.15.0 or higher (you have v12.18.0).
510:13:30 AM: Upgrade Node to the latest stable release: https://gatsby.dev/upgrading-node-js
6
Yet when I run node -v in my terminal, it says v17.2.0.
I assume it's not a coincidence that this happened after migrating. Can the problem be because of my node-modules folder? Or is there something in my gatsby-config.js or package.json files I need to change?
My package.json file:
6{
7 "name": "blog-gatsby",
8 "private": true,
9 "description": "A starter for a blog powered by Gatsby and Markdown",
10 "version": "0.1.0",
11 "author": "Magnus Kolstad <kolstadmagnus@gmail.com>",
12 "bugs": {
13 "url": "https://kolstadmagnus.no"
14 },
15 "dependencies": {
16 "@mdx-js/mdx": "^1.6.22",
17 "@mdx-js/react": "^1.6.22",
18 "gatsby": "^4.3.0",
19 "gatsby-plugin-feed": "^4.3.0",
20 "gatsby-plugin-gatsby-cloud": "^4.3.0",
21 "gatsby-plugin-google-analytics": "^4.3.0",
22 "gatsby-plugin-image": "^2.3.0",
23 "gatsby-plugin-manifest": "^4.3.0",
24 "gatsby-plugin-mdx": "^3.4.0",
25 "gatsby-plugin-offline": "^5.3.0",
26 "gatsby-plugin-react-helmet": "^5.3.0",
27 "gatsby-plugin-sharp": "^4.3.0",
28 "gatsby-remark-copy-linked-files": "^5.3.0",
29 "gatsby-remark-images": "^6.3.0",
30 "gatsby-remark-prismjs": "^6.3.0",
31 "gatsby-remark-responsive-iframe": "^5.3.0",
32 "gatsby-remark-smartypants": "^5.3.0",
33 "gatsby-source-filesystem": "^4.3.0",
34 "gatsby-transformer-sharp": "^4.3.0",
35 "prismjs": "^1.25.0",
36 "react": "^17.0.1",
37 "react-dom": "^17.0.1",
38 "react-helmet": "^6.1.0",
39 "typeface-merriweather": "0.0.72",
40 "typeface-montserrat": "0.0.75"
41 },
42 "devDependencies": {
43 "prettier": "^2.4.1"
44 },
45 "homepage": "https://kolstadmagnus.no",
46 "keywords": [
47 "blog"
48 ],
49 "license": "0BSD",
50 "main": "n/a",
51 "repository": {
52 "type": "git",
53 "url": "git+https://github.com/gatsbyjs/gatsby-starter-blog.git"
54 },
55 "scripts": {
56 "build": "gatsby build",
57 "develop": "gatsby develop",
58 "format": "prettier --write \"**/*.{js,jsx,ts,tsx,json,md}\"",
59 "start": "gatsby develop",
60 "serve": "gatsby serve",
61 "clean": "gatsby clean",
62 "test": "echo \"Write tests! -> https://gatsby.dev/unit-testing\" && exit 1"
63 }
64}
65
What am I doing wrong here? Here is the full Netlify deploy log:
7:11:59 PM: failed Building production JavaScript and CSS bundles - 20.650s
7:11:59 PM: error Generating JavaScript bundles failed
7:11:59 PM: Module build failed (from ./node_modules/url-loader/dist/cjs.js):
7:11:59 PM: Error: error:0308010C:digital envelope routines::unsupported
7:11:59 PM: at new Hash (node:internal/crypto/hash:67:19)
7:11:59 PM: at Object.createHash (node:crypto:130:10)
7:11:59 PM: at getHashDigest (/opt/build/repo/node_modules/file-loader/node_modules/loader-utils/lib/getHashDigest.js:46:34)
7:11:59 PM: at /opt/build/repo/node_modules/file-loader/node_modules/loader-utils/lib/interpolateName.js:113:11
7:11:59 PM: at String.replace (<anonymous>)
7:11:59 PM: at interpolateName (/opt/build/repo/node_modules/file-loader/node_modules/loader-utils/lib/interpolateName.js:110:8)
7:11:59 PM: at Object.loader (/opt/build/repo/node_modules/file-loader/dist/index.js:29:48)
7:11:59 PM: at Object.loader (/opt/build/repo/node_modules/url-loader/dist/index.js:127:19)
7:11:59 PM:
7:11:59 PM: ────────────────────────────────────────────────────────────────
7:11:59 PM: "build.command" failed
7:11:59 PM: ────────────────────────────────────────────────────────────────
7:11:59 PM:
7:11:59 PM: Error message
7:11:59 PM: Command failed with exit code 1: npm run build
7:11:59 PM:
7:11:59 PM: Error location
7:11:59 PM: In Build command from Netlify app:
7:11:59 PM: npm run build
7:11:59 PM:
7:11:59 PM: Resolved config
7:11:59 PM: build:
7:11:59 PM: command: npm run build
7:11:59 PM: commandOrigin: ui
7:11:59 PM: publish: /opt/build/repo/public
7:11:59 PM: publishOrigin: ui
7:11:59 PM: plugins:
7:11:59 PM: - inputs: {}
7:11:59 PM: origin: ui
7:11:59 PM: package: '@netlify/plugin-gatsby'
7:11:59 PM: redirects:
7:12:00 PM: - from: /api/*
 status: 200
 to: /.netlify/functions/gatsby
 - force: true
 from: https://magnuskolstad.com
 status: 301
 to: https://kolstadmagnus.no
 redirectsOrigin: config
Caching artifacts
ANSWER
Answered 2022-Jan-08 at 07:21
The problem is that you have Node 17.2.0 locally, but in Netlify's environment you are running a lower version (by default it's not set to 17.2.0). So the local environment is OK, but the Netlify environment is broken because of this mismatch of Node versions.
When Netlify deploys your site, it installs and builds your site again, so you should ensure that both environments work under the same conditions. Otherwise, the two node_modules will differ, so your application will have different behavior or eventually won't even build because of dependency errors.
You can easily play with the Node version in multiple ways, but I'd recommend using the .nvmrc file. Just run the following command in the root of your project:
node -v > .nvmrc
This should create a .nvmrc file containing the Node version (node -v). When Netlify finds this file during the build process, it uses it as the base Node version and installs all the dependencies accordingly.
The file is also useful to tell other contributors which Node version you are using.
QUESTION
Determine whether the Columns of a Dataset are invariant under any given Scikit-Learn Transformer
Asked 2021-Dec-19 at 08:42
Given an sklearn transformer t, is there a way to determine whether t changes the columns/column order of any given input dataset X, without applying it to the data?
For example with t = sklearn.preprocessing.StandardScaler there is a 1-to-1 mapping between the columns of X and t.transform(X), namely X[:, i] -> t.transform(X)[:, i], whereas this is obviously not the case for sklearn.decomposition.PCA.
A corollary of that would be: can we know how the columns of the input will change by applying t, e.g. which columns an already fitted sklearn.feature_selection.SelectKBest chooses?
I am not looking for solutions to specific transformers, but a solution applicable to all or at least a wide selection of transformers.
Feel free to implement your own Pipeline class or wrapper if necessary.
ANSWER
Answered 2021-Nov-23 at 15:01
I found a partial answer. Both StandardScaler and SelectKBest have .get_feature_names_out methods. I did not find the time to investigate further.
1from numpy.random import RandomState
2import numpy as np
3import pandas as pd
4
5from sklearn.preprocessing import StandardScaler
6from sklearn.feature_selection import SelectKBest
7
8from sklearn.linear_model import LassoCV
9
10
11rng = RandomState()
12
13# Make some data
14slopes = np.array([-1., 1., .1])
15X = pd.DataFrame(
16 data = np.linspace(-1,1,500)[:, np.newaxis] + rng.random((500, 3)),
17 columns=["foo", "bar", "baz"]
18)
19y = pd.Series(data=np.linspace(-1,1, 500) + rng.rand((500)))
20
21# Test Transformers
22scaler = StandardScaler().fit(X)
23selector = SelectKBest(k=2).fit(X, y)
24
25print(scaler.get_feature_names_out())
26print(selector.get_feature_names_out())
27
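Building on that partial answer, here is a minimal sketch (my own addition, not from the original answer) of turning those feature names into an invariance check without transforming any data. It continues the snippet above and assumes scikit-learn >= 1.0, where fitted transformers expose feature_names_in_ and get_feature_names_out():

def preserves_columns(transformer) -> bool:
    # Heuristic: a fitted transformer keeps the columns and their order
    # if its output feature names equal its input feature names.
    names_in = list(transformer.feature_names_in_)
    names_out = list(transformer.get_feature_names_out())
    return names_out == names_in

print(preserves_columns(scaler))    # True:  StandardScaler maps column i to column i
print(preserves_columns(selector))  # False: SelectKBest keeps only a subset of columns

Transformers that rename or mix columns (for example a fitted PCA, whose output names look like pca0, pca1, ...) would also return False here, which matches the distinction asked about in the question.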
QUESTION
ValueError after attempting to use OneHotEncoder and then normalize values with make_column_transformer
Asked 2021-Dec-09 at 20:59
So I was trying to convert my data's timestamps from Unix timestamps to a more readable date format. I created a simple Java program to do so and write to a .csv file, and that went smoothly. I tried using it for my model by one-hot encoding it into numbers and then turning everything into normalized data. However, after my attempt to one-hot encode (which I am not sure if it even worked), my normalization process using make_column_transformer failed.
1# model 4
2# next model
3import tensorflow as tf
4import matplotlib.pyplot as plt
5import pandas as pd
6import numpy as np
7from tensorflow.keras import layers
8from sklearn.compose import make_column_transformer
9from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
10from sklearn.model_selection import train_test_split
11
12np.set_printoptions(precision=3, suppress=True)
13btc_data = pd.read_csv(
14 "/content/drive/MyDrive/Science Fair/output2.csv",
15 names=["Time", "Open"])
16
17X_btc = btc_data[["Time"]]
18y_btc = btc_data["Open"]
19
20enc = OneHotEncoder(handle_unknown="ignore")
21enc.fit(X_btc)
22
23X_btc = enc.transform(X_btc)
24
25print(X_btc)
26
27X_train, X_test, y_train, y_test = train_test_split(X_btc, y_btc, test_size=0.2, random_state=62)
28
29ct = make_column_transformer(
30 (MinMaxScaler(), ["Time"])
31)
32
33ct.fit(X_train)
34X_train_normal = ct.transform(X_train)
35X_test_normal = ct.transform(X_test)
36
37callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)
38
39btc_model_4 = tf.keras.Sequential([
40 layers.Dense(100, activation="relu"),
41 layers.Dense(100, activation="relu"),
42 layers.Dense(100, activation="relu"),
43 layers.Dense(100, activation="relu"),
44 layers.Dense(100, activation="relu"),
45 layers.Dense(100, activation="relu"),
46 layers.Dense(1, activation="linear")
47])
48
49btc_model_4.compile(loss = tf.losses.MeanSquaredError(),
50 optimizer = tf.optimizers.Adam())
51
52history = btc_model_4.fit(X_train_normal, y_train, batch_size=8192, epochs=100, callbacks=[callback])
53
54btc_model_4.evaluate(X_test_normal, y_test, batch_size=8192)
55
56y_pred = btc_model_4.predict(X_test_normal)
57
58btc_model_4.save("btc_model_4")
59btc_model_4.save("btc_model_4.h5")
60
61# plot model
62def plot_evaluations(train_data=X_train_normal,
63 train_labels=y_train,
64 test_data=X_test_normal,
65 test_labels=y_test,
66 predictions=y_pred):
67 print(test_data.shape)
68 print(predictions.shape)
69
70 plt.figure(figsize=(100, 15))
71 plt.scatter(train_data, train_labels, c='b', label="Training")
72 plt.scatter(test_data, test_labels, c='g', label="Testing")
73 plt.scatter(test_data, predictions, c='r', label="Results")
74 plt.legend()
75
76plot_evaluations()
77
78# plot loss curve
79pd.DataFrame(history.history).plot()
80plt.ylabel("loss")
81plt.xlabel("epochs")
82
My normal data format is like so:
2015-12-05 12:52:00,377.48
2015-12-05 12:53:00,377.5
2015-12-05 12:54:00,377.5
2015-12-05 12:56:00,377.5
2015-12-05 12:57:00,377.5
2015-12-05 12:58:00,377.5
2015-12-05 12:59:00,377.5
2015-12-05 13:00:00,377.5
2015-12-05 13:01:00,377.79
2015-12-05 13:02:00,377.5
2015-12-05 13:03:00,377.79
2015-12-05 13:05:00,377.74
2015-12-05 13:06:00,377.79
2015-12-05 13:07:00,377.64
2015-12-05 13:08:00,377.79
2015-12-05 13:10:00,377.77
2015-12-05 13:11:00,377.7
2015-12-05 13:12:00,377.77
2015-12-05 13:13:00,377.77
2015-12-05 13:14:00,377.79
2015-12-05 13:15:00,377.72
2015-12-05 13:16:00,377.5
2015-12-05 13:17:00,377.49
2015-12-05 13:18:00,377.5
2015-12-05 13:19:00,377.5
2015-12-05 13:20:00,377.8
2015-12-05 13:21:00,377.84
2015-12-05 13:22:00,378.29
2015-12-05 13:23:00,378.3
2015-12-05 13:24:00,378.3
2015-12-05 13:25:00,378.33
2015-12-05 13:26:00,378.33
2015-12-05 13:28:00,378.31
2015-12-05 13:29:00,378.68
The first is the date and the second value after the comma is the price of BTC at that time. Now after "one-hot encoding", I added a print statement to print the value of those X values, and that gave the following value:
116 (0, 0) 1.0
117 (1, 1) 1.0
118 (2, 2) 1.0
119 (3, 3) 1.0
120 (4, 4) 1.0
121 (5, 5) 1.0
122 (6, 6) 1.0
123 (7, 7) 1.0
124 (8, 8) 1.0
125 (9, 9) 1.0
126 (10, 10) 1.0
127 (11, 11) 1.0
128 (12, 12) 1.0
129 (13, 13) 1.0
130 (14, 14) 1.0
131 (15, 15) 1.0
132 (16, 16) 1.0
133 (17, 17) 1.0
134 (18, 18) 1.0
135 (19, 19) 1.0
136 (20, 20) 1.0
137 (21, 21) 1.0
138 (22, 22) 1.0
139 (23, 23) 1.0
140 (24, 24) 1.0
141 : :
142 (2526096, 2526096) 1.0
143 (2526097, 2526097) 1.0
144 (2526098, 2526098) 1.0
145 (2526099, 2526099) 1.0
146 (2526100, 2526100) 1.0
147 (2526101, 2526101) 1.0
148 (2526102, 2526102) 1.0
149 (2526103, 2526103) 1.0
150 (2526104, 2526104) 1.0
151 (2526105, 2526105) 1.0
152 (2526106, 2526106) 1.0
153 (2526107, 2526107) 1.0
154 (2526108, 2526108) 1.0
155 (2526109, 2526109) 1.0
156 (2526110, 2526110) 1.0
157 (2526111, 2526111) 1.0
158 (2526112, 2526112) 1.0
159 (2526113, 2526113) 1.0
160 (2526114, 2526114) 1.0
161 (2526115, 2526115) 1.0
162 (2526116, 2526116) 1.0
163 (2526117, 2526117) 1.0
164 (2526118, 2526118) 1.0
165 (2526119, 2526119) 1.0
166 (2526120, 2526120) 1.0
167
Following fitting for normalization, I receive the following error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/sklearn/utils/__init__.py in _get_column_indices(X, key)
    408             try:
--> 409                 all_columns = X.columns
    410             except AttributeError:

5 frames
AttributeError: columns not found

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/sklearn/utils/__init__.py in _get_column_indices(X, key)
    410             except AttributeError:
    411                 raise ValueError(
--> 412                     "Specifying the columns using strings is only "
    413                     "supported for pandas DataFrames"
    414                 )

ValueError: Specifying the columns using strings is only supported for pandas DataFrames
Am I one-hot encoding correctly? What is the appropriate way to do this? Should I directly implement the one-hot encoder in my normalization process?
ANSWER
Answered 2021-Dec-09 at 20:59
Using OneHotEncoder is not the way to go here. It's better to extract features from the Time column as separate features, like year, month, day, hour, minutes, etc., and give these columns as input to your model:
# Extract calendar features from the timestamp column (named "Time" in this dataset)
btc_data['Year'] = btc_data['Time'].astype('datetime64[ns]').dt.year
btc_data['Month'] = btc_data['Time'].astype('datetime64[ns]').dt.month
btc_data['Day'] = btc_data['Time'].astype('datetime64[ns]').dt.day
The issue here comes from the OneHotEncoder, which returns a SciPy sparse matrix and drops the "Time" column, so to correct this you must transform the output back into a pandas DataFrame and add the "Time" column:
enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(X_btc)
X_btc = enc.transform(X_btc)
X_btc = pd.DataFrame(X_btc.todense())
X_btc["Time"] = btc_data["Time"]
Converting the full one-hot matrix to a dense DataFrame can exhaust memory (there are roughly 2.5 million unique timestamps here, so the dense matrix would be about 2.5 million columns wide). One way to work around the memory issue is:
197X_train, X_test, y_train, y_test = train_test_split(X_btc, y_btc, test_size=0.2, random_state=62)
198X_train_pd, X_test_pd, y_train_pd, y_test_pd = train_test_split(btc_data, y_btc, test_size=0.2, random_state=62)
199
ct = make_column_transformer((MinMaxScaler(), ["Time"]))
ct.fit(X_train_pd)
result_train = ct.transform(X_train_pd)
result_test = ct.transform(X_test_pd)
def nn_batch_generator(X_data, y_data, scaled, batch_size):
    samples_per_epoch = X_data.shape[0]
    number_of_batches = samples_per_epoch / batch_size
    counter = 0
    index = np.arange(np.shape(y_data)[0])
    while True:
        index_batch = index[batch_size * counter:batch_size * (counter + 1)]
        scaled_array = scaled[index_batch]
        X_batch = X_data[index_batch, :].todense()
        y_batch = y_data.iloc[index_batch]
        counter += 1
        yield np.array(np.hstack((np.array(X_batch), scaled_array))), np.array(y_batch)
        if (counter > number_of_batches):
            counter = 0


def nn_batch_generator_test(X_data, scaled, batch_size):
    samples_per_epoch = X_data.shape[0]
    number_of_batches = samples_per_epoch / batch_size
    counter = 0
    index = np.arange(np.shape(X_data)[0])
    while True:
        index_batch = index[batch_size * counter:batch_size * (counter + 1)]
        scaled_array = scaled[index_batch]
        X_batch = X_data[index_batch, :].todense()
        counter += 1
        yield np.hstack((X_batch, scaled_array))
        if (counter > number_of_batches):
            counter = 0
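To sanity-check the generators before training (my own addition, not part of the original answer; it assumes the X_train, y_train and result_train objects built above), you can pull a single batch and inspect its shape:

gen = nn_batch_generator(X_train, y_train, scaled=result_train, batch_size=2)
X_batch, y_batch = next(gen)
# X_batch stacks the densified one-hot slice with the scaled "Time" column,
# so its width is the number of one-hot categories plus one; y_batch has batch_size entries.
print(X_batch.shape, y_batch.shape)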
Finally, fit the model:
history = btc_model_4.fit(nn_batch_generator(X_train, y_train, scaled=result_train, batch_size=2), steps_per_epoch=#Todetermine,
                          batch_size=2, epochs=10,
                          callbacks=[callback])

btc_model_4.evaluate(nn_batch_generator(X_test, y_test, scaled=result_test, batch_size=2), batch_size=2, steps=#Todetermine)
y_pred = btc_model_4.predict(nn_batch_generator_test(X_test, scaled=result_test, batch_size=2), steps=#Todetermine)
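The #Todetermine placeholders are the number of generator batches per pass over the data. A minimal way to fill them in (my own sketch, not from the original answer; it assumes the X_train/X_test splits defined above) is to derive them from the row counts and the batch size:

import math

batch_size = 2
steps_per_epoch = math.ceil(X_train.shape[0] / batch_size)  # training batches per epoch
eval_steps = math.ceil(X_test.shape[0] / batch_size)        # batches needed to cover the test split

history = btc_model_4.fit(nn_batch_generator(X_train, y_train, scaled=result_train, batch_size=batch_size),
                          steps_per_epoch=steps_per_epoch, epochs=10, callbacks=[callback])
btc_model_4.evaluate(nn_batch_generator(X_test, y_test, scaled=result_test, batch_size=batch_size), steps=eval_steps)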
QUESTION
What are differences between AutoModelForSequenceClassification vs AutoModel
Asked 2021-Dec-05 at 09:07We can create a model with the AutoModel (TFAutoModel) class:
from transformers import AutoModel
model = AutoModel.from_pretrained('distilbert-base-uncased')

On the other hand, a model can be created with AutoModelForSequenceClassification (TFAutoModelForSequenceClassification):
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased')

As far as I know, both classes build a model from the distilbert-base-uncased checkpoint. Judging by the names, the second class (AutoModelForSequenceClassification) is meant for sequence classification.
But what are the actual differences between the two classes, and how should each one be used?
(I searched the Hugging Face documentation, but it is not clear.)
ANSWER
Answered 2021-Dec-05 at 09:07The difference between AutoModel and AutoModelForSequenceClassification is that AutoModelForSequenceClassification adds a classification head on top of the base model's outputs; that head can then be trained together with the base model.
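As a rough illustration (my own sketch, not part of the original answer), the two classes return different things for the same checkpoint:

from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
inputs = tokenizer("This movie was great!", return_tensors="pt")

# Bare encoder: no task head, returns hidden states.
base = AutoModel.from_pretrained("distilbert-base-uncased")
print(base(**inputs).last_hidden_state.shape)   # (batch, seq_len, hidden_size)

# Same encoder plus a freshly initialised classification head: returns logits.
clf = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
print(clf(**inputs).logits.shape)               # (batch, num_labels)

The classification head starts with newly initialised weights, which is why the library warns that such a model should be fine-tuned before being used for predictions.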
QUESTION
How can I check a confusion_matrix after fine-tuning with custom datasets?
Asked 2021-Nov-24 at 13:26This question is the same with How can I check a confusion_matrix after fine-tuning with custom datasets?, on Data Science Stack Exchange.
I would like to check a confusion_matrix, including precision, recall, and f1-score like below after fine-tuning with custom datasets.
Fine tuning process and the task are Sequence Classification with IMDb Reviews on the Fine-tuning with custom datasets tutorial on Hugging face.
After finishing the fine-tune with Trainer, how can I check a confusion_matrix in this case?
An image of confusion_matrix, including precision, recall, and f1-score original site: just for example output image
predictions = np.argmax(trainer.test(test_x), axis=1)

# Confusion matrix and classification report.
print(classification_report(test_y, predictions))

             precision    recall  f1-score   support

          0       0.75      0.79      0.77      1000
          1       0.81      0.87      0.84      1000
          2       0.63      0.61      0.62      1000
          3       0.55      0.47      0.50      1000
          4       0.66      0.66      0.66      1000
          5       0.62      0.64      0.63      1000
          6       0.74      0.83      0.78      1000
          7       0.80      0.74      0.77      1000
          8       0.85      0.81      0.83      1000
          9       0.79      0.80      0.80      1000

avg / total       0.72      0.72      0.72     10000
from transformers import DistilBertForSequenceClassification, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
)

model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")

trainer = Trainer(
    model=model,                  # the instantiated 🤗 Transformers model to be trained
    args=training_args,           # training arguments, defined above
    train_dataset=train_dataset,  # training dataset
    eval_dataset=val_dataset      # evaluation dataset
)

trainer.train()
The dataset preparation follows Sequence Classification with IMDb Reviews, and I'm fine-tuning with Trainer.
from pathlib import Path

def read_imdb_split(split_dir):
    split_dir = Path(split_dir)
    texts = []
    labels = []
    for label_dir in ["pos", "neg"]:
        for text_file in (split_dir/label_dir).iterdir():
            texts.append(text_file.read_text())
            labels.append(0 if label_dir == "neg" else 1)

    return texts, labels

train_texts, train_labels = read_imdb_split('aclImdb/train')
test_texts, test_labels = read_imdb_split('aclImdb/test')

from sklearn.model_selection import train_test_split
train_texts, val_texts, train_labels, val_labels = train_test_split(train_texts, train_labels, test_size=.2)

from transformers import DistilBertTokenizerFast
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')

train_encodings = tokenizer(train_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)
test_encodings = tokenizer(test_texts, truncation=True, padding=True)

import torch

class IMDbDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

train_dataset = IMDbDataset(train_encodings, train_labels)
val_dataset = IMDbDataset(val_encodings, val_labels)
test_dataset = IMDbDataset(test_encodings, test_labels)
ANSWER
Answered 2021-Nov-24 at 13:26What you could do in this situation is to iterate over the validation set (or the test set, for that matter) and manually build a list of y_true and a list of y_pred.
import torch
import torch.nn.functional as F
from sklearn import metrics

y_preds = []
y_trues = []
for index, val_text in enumerate(val_texts):
    tokenized_val_text = tokenizer([val_text],
                                   truncation=True,
                                   padding=True,
                                   return_tensors='pt')
    logits = model(**tokenized_val_text).logits
    prediction = F.softmax(logits, dim=1)
    y_pred = torch.argmax(prediction).numpy()
    y_true = val_labels[index]
    y_preds.append(y_pred)
    y_trues.append(y_true)
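From those two lists, the confusion matrix and report are one call away; a minimal sketch (my own addition, using the sklearn metrics module imported above):

print(metrics.confusion_matrix(y_trues, y_preds))
print(metrics.classification_report(y_trues, y_preds))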
Finally,
1predictions = np.argmax(trainer.test(test_x), axis=1)
2
3# Confusion matrix and classification report.
4print(classification_report(test_y, predictions))
5
6 precision recall f1-score support
7
8 0 0.75 0.79 0.77 1000
9 1 0.81 0.87 0.84 1000
10 2 0.63 0.61 0.62 1000
11 3 0.55 0.47 0.50 1000
12 4 0.66 0.66 0.66 1000
13 5 0.62 0.64 0.63 1000
14 6 0.74 0.83 0.78 1000
15 7 0.80 0.74 0.77 1000
16 8 0.85 0.81 0.83 1000
17 9 0.79 0.80 0.80 1000
18
19avg / total 0.72 0.72 0.72 10000
20from transformers import DistilBertForSequenceClassification, Trainer, TrainingArguments
21
22training_args = TrainingArguments(
23 output_dir='./results', # output directory
24 num_train_epochs=3, # total number of training epochs
25 per_device_train_batch_size=16, # batch size per device during training
26 per_device_eval_batch_size=64, # batch size for evaluation
27 warmup_steps=500, # number of warmup steps for learning rate scheduler
28 weight_decay=0.01, # strength of weight decay
29 logging_dir='./logs', # directory for storing logs
30 logging_steps=10,
31)
32
33model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
34
35trainer = Trainer(
36 model=model, # the instantiated 🤗 Transformers model to be trained
37 args=training_args, # training arguments, defined above
38 train_dataset=train_dataset, # training dataset
39 eval_dataset=val_dataset # evaluation dataset
40)
41
42trainer.train()
43from pathlib import Path
44
45def read_imdb_split(split_dir):
46 split_dir = Path(split_dir)
47 texts = []
48 labels = []
49 for label_dir in ["pos", "neg"]:
50 for text_file in (split_dir/label_dir).iterdir():
51 texts.append(text_file.read_text())
52 labels.append(0 if label_dir is "neg" else 1)
53
54 return texts, labels
55
56train_texts, train_labels = read_imdb_split('aclImdb/train')
57test_texts, test_labels = read_imdb_split('aclImdb/test')
58
59from sklearn.model_selection import train_test_split
60train_texts, val_texts, train_labels, val_labels = train_test_split(train_texts, train_labels, test_size=.2)
61
62from transformers import DistilBertTokenizerFast
63tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
64
65train_encodings = tokenizer(train_texts, truncation=True, padding=True)
66val_encodings = tokenizer(val_texts, truncation=True, padding=True)
67test_