neural-networks-and-deep-learning | repository contains code samples | Machine Learning library
kandi X-RAY | neural-networks-and-deep-learning Summary
This repository contains code samples for my (forthcoming) book on "Neural Networks and Deep Learning". As the code is written to accompany the book, I don't intend to add new features. However, bug reports are welcome, and you should feel free to fork and modify the code.
Top functions reviewed by kandi - BETA
- Run stochastic gradient descent (SGD)
- Update the network for a single mini-batch
- Evaluate the network on test data
- Compute the feedforward pass
- Make plots of the training run
- Plot the test cost
- Plot the test accuracy
- Plot overlays on the training data
- Wrapper for load_data
- Return a vectorized result
- Load the MNIST dataset
- Plot a rotated image
- Plot an MNIST digit from a numpy array
- Load the MNIST dataset
- Train a network
- Wrapper for SGD
- Extract images from the training set
neural-networks-and-deep-learning Key Features
neural-networks-and-deep-learning Examples and Code Snippets
Community Discussions
Trending Discussions on neural-networks-and-deep-learning
QUESTION
I printed the vectorized form of the first training image of the MNIST dataset from PyTorch and from https://github.com/mnielsen/neural-networks-and-deep-learning/tree/master/data . The difference seems too big to be just a floating-point precision error.
Full Diff of first mnist train image: https://www.diffchecker.com/6y6YTFiN
Code to reproduce:
...ANSWER
Answered 2020-Oct-24 at 09:55

MNIST images consist of pixel values that are integers in the range 0 to 255 (inclusive). To produce the tensors you are looking at, those integer values have been normalised to lie between 0.0 and 1.0, by dividing them all by some constant factor. It appears that your two sources chose different normalising factors: 255 in one case and 256 in the other.
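The two scalings are easy to compare directly. A minimal sketch (the 255-vs-256 factors follow the answer above; the sample pixel values are made up):

import numpy as np

# Raw MNIST pixels are integers in [0, 255].
pixels = np.array([0, 51, 128, 255], dtype=np.uint8)

a = pixels / 255.0  # one source: the maximum pixel maps to exactly 1.0
b = pixels / 256.0  # the other: all values stay strictly below 1.0

print(a - b)  # differences up to ~0.004, far larger than float rounding error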
QUESTION
What are all the differences between numpy.random.rand and numpy.random.randn?

From the docs, I know that the only difference between them is the probability distribution each number is drawn from, but the overall structure (dimension) and data type used (float) are the same. Believing this, I had a hard time debugging a neural network.
Specifically, I am trying to re-implement the neural network provided in the Neural Networks and Deep Learning book by Michael Nielsen. The original code can be found here. My implementation was the same as the original one, except that I defined and initialized weights and biases with numpy.random.rand in the __init__ function, rather than numpy.random.randn as in the original.
However, my code that uses random.rand to initialize the weights and biases doesn't work: the network won't learn, and the weights and biases do not change.

What difference between the two random functions causes this weirdness?
...ANSWER
Answered 2018-Feb-05 at 02:35

First, as you can see from the documentation, numpy.random.randn generates samples from the normal distribution, while numpy.random.rand samples from the uniform distribution (in the range [0, 1)).
Second, why didn't the uniform distribution work? The main reason is the activation function, especially in your case where you use the sigmoid. The sigmoid's plot is an S-shaped curve that flattens out for inputs far from 0.

So if your input is far from 0, the slope of the function decreases quite fast, and as a result you get a tiny gradient and a tiny weight update. And if you have many layers, those gradients get multiplied many times in the backward pass, so even "proper" gradients become small after the multiplications and stop having any influence. So if you have a lot of weights which bring your input to those regions, your network is hardly trainable. That's why it is usual practice to initialize network variables around zero, which ensures that you get reasonable gradients (close to 1) to train your net.

However, the uniform distribution is not completely undesirable; you just need to make the range smaller and closer to zero. One good practice is Xavier initialization. In this approach you can initialize your weights with:
1) A normal distribution with mean 0 and standard deviation sqrt(2. / (in + out)), where in is the number of inputs to the neurons and out is the number of outputs.

2) A uniform distribution in the range [-sqrt(6. / (in + out)), +sqrt(6. / (in + out))].
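A minimal numpy sketch of both Xavier variants (the fan_in/fan_out names are illustrative, not from Nielsen's code):

import numpy as np

def xavier_normal(fan_in, fan_out):
    # Mean 0, standard deviation sqrt(2 / (fan_in + fan_out)).
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return np.random.randn(fan_out, fan_in) * std

def xavier_uniform(fan_in, fan_out):
    # Uniform in [-limit, +limit] with limit = sqrt(6 / (fan_in + fan_out)).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_out, fan_in))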
QUESTION
I tried porting the NN code presented here to Julia, hoping for a speed increase in training the network. On my desktop, this proved to be the case.
However, on my MacBook, Python + numpy beats Julia by miles.
Training with the same parameters, Python is more than twice as fast as Julia (4.4 s vs 10.6 s for one epoch). Considering that Julia is faster than Python (by ~2 s) on my desktop, it seems like there is some resource that Python/numpy is utilizing on the Mac that Julia isn't. Even parallelizing the code only gets me down to ~6.6 s (although this might be because I am not that experienced in writing parallel code). I thought the problem might be that Julia's BLAS was slower than the vecLib library used natively on the Mac, but experimenting with different builds didn't seem to get me much closer. I tried building both with USE_SYSTEM_BLAS = 1 and building with MKL, of which MKL gave the faster result (the times posted above).
I'll post my version info for the laptop as well as my Julia implementation below for reference. I don't have access to the desktop at this time, but I was running the same version of Julia on Windows, using openBLAS, comparing with a clean installation of Python 2.7 also using openBLAS.
Is there something I'm missing here?
EDIT: I know that my Julia code leaves a lot to be desired in terms of optimization, and I really appreciate any tips to make it faster. However, this is not a case of Julia being slower on my laptop, but rather of Python being much faster: on my desktop, Python runs one epoch in ~13 seconds, while on the laptop it takes only ~4.4 s. What I'm most interested in is where this difference comes from. I realize the question may be somewhat poorly formulated.
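One concrete first check on the Python side is which BLAS backend numpy is actually linked against; np.show_config() is part of numpy itself:

import numpy as np

# Prints the BLAS/LAPACK libraries numpy was built against
# (e.g. Accelerate/vecLib, OpenBLAS, or MKL), which is the first
# thing to compare between the desktop and the MacBook.
np.show_config()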
Versions on laptop:
...ANSWER
Answered 2018-Apr-09 at 02:02

I started by running your code:
QUESTION
I was studying the AdaDelta optimization algorithm so I tried to implement it in Python, but there is something wrong with my code, since I get the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'sqrt'
I could not find anything about what is causing that error. According to the message, it's because of this line of code:
rms_grad = np.sqrt(self.e_grad + epsilon)
This line is similar to this equation:
RMS[g]_t = sqrt(E[g^2]_t + ε)
I got the core equations of the algorithm in this article: http://ruder.io/optimizing-gradient-descent/index.html#adadelta
Just one more detail: I'm initializing the E[g^2]_t matrix like this:
self.e_grad = (1 - mu)*np.square(nabla)
Where nabla is the gradient. Similar to this equation:
E[g^2]_t = γ E[g^2]_{t-1} + (1-γ) g_t^2
(the first term is equal to zero in the first iteration, just like the line of code above)
So I want to know if I'm initializing the E matrix the wrong way, or if I'm doing the square root inappropriately. I tried the pow() function, but it doesn't work. If anyone could help me with this I would be very grateful; I've been trying for weeks.
Additional details requested by andersource:
Here is the entire source code on github: https://github.com/pedrovbeltran/neural-networks-and-deep-learning/blob/experimental/modified-networks/network2_with_adadelta.py .
...ANSWER
Answered 2019-Jan-09 at 13:13

I think the problem is that self.e_grad_w is an ndarray of shape (2,) which in turn contains two additional ndarrays with 2-D shapes, instead of directly containing data. This seems to be initialized in e_grad_initializer, in which nabla_w has the same structure. I didn't track where this comes from all the way back, but I believe once you fix this issue the problem will be resolved.
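The error is easy to reproduce: np.sqrt on an object-dtype array falls back to calling a sqrt() method on each element, and ndarrays have no such method. A minimal sketch (the shapes here are made up for illustration):

import numpy as np

# An object array of shape (2,) holding two 2-D arrays, mimicking
# the suspected structure of self.e_grad_w.
bad = np.empty(2, dtype=object)
bad[0] = np.ones((3, 4))
bad[1] = np.ones((4, 2))

try:
    np.sqrt(bad + 1e-8)
except AttributeError as e:
    print(e)  # 'numpy.ndarray' object has no attribute 'sqrt'

# Applying the square root to each real float array works fine:
ok = [np.sqrt(a + 1e-8) for a in bad]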
QUESTION
I have a list of size 50000, say a. Each element is a tuple, say b = a[0]. Each tuple consists of 2 lists, say c = b[0] and d = b[1]. The first list, c, is of length 784 and the second, d, is of length 10.
From this list, I need to extract the following:
Group the first 10 elements of list a. From these 10 tuples, extract their first elements (c) and put them in a matrix of size 784x10. Also extract the second elements of the tuples and put them in another matrix of size 10x10. Repeat this for every batch of 10 elements in list a.
Is this possible to do in a single line using a list comprehension, or do I have to write multiple for loops? Which method is the most efficient?
Note: It is okay if I get the result as a list or a numpy.ndarray matrix.

Additional information: I'm following this tutorial on neural networks, which aims to design a neural network to recognize handwritten digits. The MNIST database is used to train the network. The training data is in the format described above. I need to create a matrix of input_images and expected_output for every mini_batch.

Here is the code I've tried. I'm getting a list of size 50000; it's not getting split into mini_batches.
...ANSWER
Answered 2018-Dec-30 at 10:39

I wrote my answer before you added the source, so it is purely based upon the first part where you describe the problem in words. It is therefore not very fail-safe with respect to changes in input size. If you read further in the book, Michael Nielsen actually provides an implementation of his own.

My main answer is not a single-line answer, as that would obfuscate what it does, and I would advise you to write complex processes like this out so you have a better understanding of what actually happens. In my code I make a firstMatrix, which contains the c-elements, and a secondMatrix, which contains the d-elements. I do this for every batch of 10, but I didn't know what you want to do with the matrices afterwards, so I just make them for every batch. If you want to group them or something, please say so and I will try to implement it.
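A short sketch of the batching described above (the to_batches name is illustrative; note that in Nielsen's data each c is actually a 784x1 array and each d a 10x1 array, which ravel() flattens either way):

import numpy as np

def to_batches(a, batch_size=10):
    # For each batch of 10 (c, d) tuples, stack the c's as columns of a
    # 784x10 matrix and the d's as columns of a 10x10 matrix.
    batches = []
    for i in range(0, len(a), batch_size):
        batch = a[i:i + batch_size]
        X = np.column_stack([np.asarray(c).ravel() for c, _ in batch])  # 784 x 10
        Y = np.column_stack([np.asarray(d).ravel() for _, d in batch])  # 10 x 10
        batches.append((X, Y))
    return batches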
QUESTION
I am trying to learn Keras. I saw machine learning code for recognizing handwritten digits here (also given here). It seems to have the feedforward, SGD and backpropagation methods written from scratch. I just want to know if it is possible to write this program using Keras? A starting step in that direction would be appreciated.
...ANSWER
Answered 2018-Sep-03 at 02:33

You can use this to understand how the MNIST dataset works for an MLP first: Keras MNIST tutorial. As you proceed, you can look into how a CNN works on the MNIST dataset.

I will describe a bit of the process of the Keras code that you attached to your comment.
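For a concrete starting point, here is a minimal sketch of Nielsen's 784-30-10 sigmoid network expressed in Keras (this assumes tf.keras from TensorFlow 2.x; the hyperparameters mirror the book's network.py defaults, but the code itself is not from the answer above):

import tensorflow as tf

# Load and flatten MNIST, scaling pixels to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# 784-30-10 network with sigmoid activations, quadratic (MSE) cost,
# and plain SGD with eta=3.0 and mini-batch size 10, as in the book.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(30, activation="sigmoid", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=3.0),
              loss="mse", metrics=["accuracy"])
model.fit(x_train, tf.keras.utils.to_categorical(y_train, 10),
          epochs=5, batch_size=10)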
QUESTION
I am using the practice code for the MNIST data for deep learning in Python 3.4.
The original code is
...ANSWER
Answered 2017-Dec-05 at 11:21

First of all, I was able to reproduce the problem using the mnist.pkl.gz data file extracted from the https://github.com/mnielsen/neural-networks-and-deep-learning/archive/master.zip archive I downloaded. The following exception is raised from the pickle.load(f) call:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)

However, the error went away when I added the encoding='bytes' argument to the pickle.load() call, as I suggested in a comment under your question.
Another change was to replace the import _pickle as cPickle with just import pickle; however, I don't think that's significant (see What difference between pickle and _pickle in python 3?). Another difference that might be significant is the fact that I'm using Python 3.6.3 on Windows.
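A minimal sketch of the fixed loader under Python 3 (the filename and tuple unpacking follow the book's mnist_loader.py; encoding='bytes' is the key change):

import gzip
import pickle

# The file was pickled under Python 2, so Python 3 needs encoding='bytes'
# to read the embedded byte strings without a UnicodeDecodeError.
with gzip.open("mnist.pkl.gz", "rb") as f:
    training_data, validation_data, test_data = pickle.load(f, encoding="bytes")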
QUESTION
I'm following along with mnielsen's online book. I'm trying to add the momentum weight update, as defined here, to his code here. The overall idea is that for a momentum weight update, you don't directly change the weight vector with the negative gradient. You have a parameter velocity, which you set to zero to begin with, and then you set the hyperparameter mu, typically to 0.9.
ANSWER
Answered 2017-Nov-28 at 04:22

As discussed in the comments, something is not a numpy array here. The error given above
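For reference, a minimal sketch of the momentum update being described (the momentum_update name and list-of-arrays layout are illustrative assumptions, not Nielsen's exact code):

import numpy as np

def momentum_update(weights, grads, velocities, mu=0.9, eta=3.0):
    # Momentum step: v <- mu * v - eta * grad, then w <- w + v.
    for i in range(len(weights)):
        velocities[i] = mu * velocities[i] - eta * grads[i]
        weights[i] = weights[i] + velocities[i]
    return weights, velocities

# Velocities start at zero, one array per weight matrix:
weights = [np.random.randn(30, 784), np.random.randn(10, 30)]
velocities = [np.zeros_like(w) for w in weights]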
QUESTION
I have the MNIST training list in the following form:
...ANSWER
Answered 2017-Nov-23 at 22:38

I do this in a bit of a different way. Recall that you must scale your training and testing sets by the same function, which is built from all your training data. Also, you only want to manipulate your features. I would start by converting to a train and test dataframe and a list of features.
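A hedged sketch of that principle with scikit-learn (the answer itself uses pandas dataframes; StandardScaler here just illustrates fitting on the training features only):

import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.random.rand(1000, 784)  # placeholder MNIST-like features
X_test = np.random.rand(200, 784)

scaler = StandardScaler().fit(X_train)  # statistics come from training data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # same transform, no refitting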
QUESTION
I followed a tutorial for creating a simple neural network using the sigmoid function, "Using neural nets to recognize handwritten digits". The tutorial is very simple, with theory and code examples.

The problem is that it does not give any examples of digit recognition using network.py.

For example, I have the following number, and I want to recognize it as 0 from the image below. What should I do next for number recognition? Does recognizing the number require the use of other technologies such as Theano or TensorFlow? Good day!
...ANSWER
Answered 2017-Nov-13 at 19:10

Building upon the example below, you can add a function for prediction in the NeuralNetwork class:
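A minimal sketch of such a helper, assuming a trained Network from the book's network.py (feedforward is the class's existing method; the predict name is illustrative):

import numpy as np

def predict(net, x):
    # net.feedforward(x) returns the 10 output activations for a 784x1
    # input; the predicted digit is the index of the largest one.
    return int(np.argmax(net.feedforward(x)))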
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install neural-networks-and-deep-learning
You can use neural-networks-and-deep-learning like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.