christopher5106.github.io
kandi X-RAY | christopher5106.github.io Summary
Community Discussions
Trending Discussions on christopher5106.github.io
QUESTION
I was trying to understand how matrix multiplication works over two dimensions in DL frameworks, and I stumbled upon an article here. The author used Keras to explain it, and it works for him. But when I try to reproduce the same code in PyTorch, it fails with the error shown in the output of the following code.
PyTorch code:
...ANSWER
Answered 2021-Jan-10 at 07:10
Matrix multiplication (a.k.a. matrix dot product) is a well-defined algebraic operation taking two 2D matrices.
Deep-learning frameworks (e.g., TensorFlow, Keras, PyTorch) are tuned to operate on batches of matrices, hence they usually implement batched matrix multiplication, that is, applying the matrix dot product to each pair of 2D matrices in a batch.
The examples you linked to show how matmul processes a batch of matrices:
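As a minimal, framework-free sketch of what batched matrix multiplication means, the snippet below applies an ordinary 2D matrix product to each pair of matrices in two batches. The names `matmul_2d` and `batched_matmul` are illustrative helpers, not functions from Keras or PyTorch, and nested lists stand in for tensors.

```python
def matmul_2d(a, b):
    """Plain matrix product of two 2D matrices given as nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def batched_matmul(batch_a, batch_b):
    """Apply matmul_2d to each pair of matrices in two equally sized batches."""
    if len(batch_a) != len(batch_b):
        raise ValueError("batch sizes do not match")
    return [matmul_2d(a, b) for a, b in zip(batch_a, batch_b)]


# A batch of two 2x3 matrices times a batch of two 3x2 matrices.
batch_a = [[[1, 2, 3], [4, 5, 6]],
           [[1, 0, 0], [0, 1, 0]]]
batch_b = [[[1, 0], [0, 1], [1, 1]],
           [[7, 8], [9, 10], [11, 12]]]

result = batched_matmul(batch_a, batch_b)  # a batch of two 2x2 matrices
```

The batch dimension is only iterated over; the dot product itself is always taken over the last dimension of the left matrix and the first dimension of the right one.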
QUESTION
I am working on implementing YOLO v2 and v3 for object detection on a custom dataset. While YOLO v2 and v3 use something like 5 or so anchor boxes, I generally have maybe 50-100 detections per image. My sense is that if there are only 5 anchor boxes, then there are at most 5 detections per image, right? So I was trying to understand if I needed to adjust the number of anchor boxes to my dataset.
My question is: does the number of anchor boxes need to be larger than the maximum count of bounding boxes in any training image? That way, I would never run into detections where there is no corresponding anchor box. Is that the right way of thinking about adapting YOLO?
If my intuition is correct, would I then need to run k-means to cluster the bounding boxes in the ground-truth images and set the anchor box coordinates? Then I would use the usual regression method as specified in this blog post.
Thanks for any help that anyone can provide.
...ANSWER
Answered 2018-Jun-05 at 21:58
My sense is that if there are only 5 anchor boxes, then there are at most 5 detections per image right?
There are five anchor boxes for each prediction cell, not for the whole image. Let's consider YOLO v2, where the input image is of size 416x416x3 and the output is 13x13xN. Each of the 13x13 output cells corresponds to a 32x32-pixel region in the input image (as shown in the image from the blog post), and for each of the 13x13 cells there are 5 anchors defined. So you can technically have 13x13x5 bounding boxes for an image of size 416x416. (You can train with larger images as well, since YOLO v2 is a fully convolutional network, and then you get more cell regions.)
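The arithmetic above can be checked with a short sketch. The function name `yolo_v2_grid` and its parameters are illustrative; the stride of 32 and the 5 anchors per cell are the values quoted in the answer.

```python
def yolo_v2_grid(image_size=416, stride=32, anchors_per_cell=5):
    """Return (cells per side, total candidate boxes) for a square input image."""
    if image_size % stride != 0:
        raise ValueError("image_size should be a multiple of the stride")
    cells_per_side = image_size // stride
    total_boxes = cells_per_side * cells_per_side * anchors_per_cell
    return cells_per_side, total_boxes


cells, boxes = yolo_v2_grid()              # 13 cells per side, 845 boxes
cells_big, boxes_big = yolo_v2_grid(608)   # 19 cells per side, 1805 boxes
```

So even with only 5 anchors, a 416x416 input already allows far more than 50-100 predicted boxes.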
Let's say you have 50 bounding boxes in your image. Each bounding box should be assigned to a cell based on how close the center of the bounding box is to the cell center. Now, for this cell, pick the one of the 5 anchor boxes that gives the best IOU. For each cell, construct a label that contains the confidence scores, box positions and dimensions for all 5 anchor boxes (all anchors except the selected one are marked zero), along with the class scores.
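A rough sketch of that assignment step follows. It makes two assumptions that are not spelled out in the answer: the nearest cell is taken to be the cell containing the box center, and the IOU between a box and an anchor is computed from width and height only, as if both shared the same center. The anchor sizes in `ANCHORS` are hypothetical pixel values chosen for illustration, not the published YOLO anchors.

```python
def iou_wh(box_wh, anchor_wh):
    """IOU of two boxes that share the same center, compared by width/height only."""
    inter = min(box_wh[0], anchor_wh[0]) * min(box_wh[1], anchor_wh[1])
    union = box_wh[0] * box_wh[1] + anchor_wh[0] * anchor_wh[1] - inter
    return inter / union


def assign_box(box, anchors, stride=32):
    """Map a ground-truth box (cx, cy, w, h in pixels) to (row, col, anchor index)."""
    cx, cy, w, h = box
    col, row = int(cx // stride), int(cy // stride)  # cell containing the box center
    best = max(range(len(anchors)), key=lambda i: iou_wh((w, h), anchors[i]))
    return row, col, best


# Hypothetical anchor sizes (width, height) in pixels; not the published values.
ANCHORS = [(30, 30), (60, 120), (120, 60), (200, 200), (350, 300)]

# A wide box centered at (100, 210) lands in row 6, column 3 and matches anchor 2.
row, col, anchor_idx = assign_box((100, 210, 110, 70), ANCHORS)
```

The label for that cell would then carry the box data in the slot of the selected anchor and zeros in the other four slots.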
As for the k-means clustering mentioned in the link, it describes how the authors arrived at the five anchor boxes. It's better to stick with the 5 anchor boxes unless you have a specific reason to include more or to use different shapes.
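For reference, anchor clustering of this kind can be sketched as k-means over (width, height) pairs. The sketch assumes the distance 1 - IOU described in the YOLO v2 paper (picking the centroid with the highest IOU is equivalent), and it uses a deliberately simplistic initialization from the first k boxes. The function names and the toy box sizes are illustrative.

```python
def iou_wh(a, b):
    """IOU of two (width, height) boxes assumed to share the same center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes_wh, k, iterations=50):
    """Cluster (width, height) pairs with distance 1 - IOU; return k sorted centroids."""
    centroids = list(boxes_wh[:k])  # simplistic init: the first k boxes
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for wh in boxes_wh:
            # Highest IOU is the same as smallest 1 - IOU distance.
            nearest = max(range(k), key=lambda i: iou_wh(wh, centroids[i]))
            clusters[nearest].append(wh)
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append((sum(w for w, _ in members) / len(members),
                                      sum(h for _, h in members) / len(members)))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids)


# Toy ground-truth box sizes: three small boxes and three large ones.
boxes_wh = [(10, 10), (100, 100), (12, 11), (98, 105), (11, 9), (102, 95)]
anchors = kmeans_anchors(boxes_wh, k=2)
```

On a real dataset the box sizes would come from the ground-truth annotations, and a randomized initialization would usually be preferable.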
QUESTION
I am trying to learn deep learning, and I have stumbled on an exercise here.
It is the first warm-up exercise, and I am stuck. For constant sequences of small length (2, 3) it solves the task with no problem. However, when I try the whole sequence of 50, it stops at 50% accuracy, which is basically a random guess.
According to here, the flat region is too big and it can't find a gradient to solve it. So I tried the approach of gradually increasing the length and saving the model each time (2, 5, 10, 15, 20, 30, 40, 50). It seems it does not generalise well: if I feed it a longer sequence than the one it was trained on, it fails.
According to here, it should be an easy problem, but I can't figure it out. A somewhat different LSTM architecture is used there, however.
And one solution here to exactly the same problem says it works with the Adagrad optimizer and a learning rate of 0.5.
I am unsure whether I am feeding it correctly in the first place, one bit at a time. I hope I got it right.
And for variable length, I tried and failed miserably.
Code:
...ANSWER
Answered 2018-Feb-07 at 20:57
Well, this might be a really valuable exercise about LSTM and vanishing gradients. So let's dive into it. I'd start by changing the task a little bit. Let's change our dataset to:
QUESTION
Following this question and this tutorial, I've created a simple net just like the one in the tutorial, but with 100x100 images, a first convolution kernel of 11x11 and pad=0.
I understand that the formula is (W−F+2P)/S+1, and in my case the dimension became [51x51x3] (3 is the number of RGB channels), but the number 96 pops up in my net diagram, and as this tutorial says, it is the third dimension of the output. On the other hand, my net after the first conv became [51x51x96]. I couldn't figure out how the number 96 is calculated and why.
Isn't the convolution layer supposed to pass through the three color channels, so that the output should be three feature maps? How come its dimension grows like this? Isn't it true that we have one kernel for each channel? How does this one kernel create 96 (or, in the first tutorial, 256 or 384) feature maps?
...ANSWER
Answered 2017-Jun-11 at 07:57
You are mixing up input channels and output channels.
Your input image has three channels: R, G and B. Each filter in your conv layer acts on all three channels across its spatial kernel size (e.g., 3-by-3). Each filter outputs a single number per spatial location. So, if you have one filter in your layer, then your output would have only one output channel(!)
Normally, you would like to compute more than a single filter at each layer; this is what the num_output parameter in convolution_param is used for: it allows you to define how many filters will be trained in a specific convolutional layer.
Thus a Conv layer
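To make the channel bookkeeping concrete, here is a small sketch of the output-shape formula from the question together with the number of weights per layer. The helper names are illustrative, and the example values (a 227x227x3 input, 96 filters of 11x11, stride 4) are an assumed AlexNet-style first layer, not the asker's exact network.

```python
def conv_output_shape(width, kernel, num_output, stride=1, pad=0):
    """Spatial size from (W - F + 2P) / S + 1 (floored); depth equals the filter count."""
    side = (width - kernel + 2 * pad) // stride + 1
    return side, side, num_output


def conv_weight_count(in_channels, kernel, num_output):
    """Each filter spans all input channels: kernel x kernel x in_channels weights."""
    return kernel * kernel * in_channels * num_output


# Assumed AlexNet-style first layer: 227x227x3 input, 96 filters of 11x11, stride 4.
shape = conv_output_shape(227, 11, num_output=96, stride=4)   # (55, 55, 96)
weights = conv_weight_count(3, 11, 96)                        # 34848, biases excluded
```

The third output dimension is never derived from the input depth: it is simply the number of filters, while the input depth only determines how many weights each filter has.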
QUESTION
I am trying to evaluate the training function of the Watson Visual Recognition API. Does anyone have experience with customizing classifiers for Visual Recognition? I have some experience myself with training the classifier and found some information in this blog: http://christopher5106.github.io/computer/vision/2016/12/23/ibm-watson-bluemix-visual-api-to-create-custom-classifier.html
What I would really like to know is how many pictures of an object I need in order to classify it with an accuracy of 75%. How long does it take to get such a result?
Thank you in advance for your help.
...ANSWER
Answered 2017-May-30 at 06:29
The number of pictures you need depends on how unique the object is, how many distinct image features a picture with it has, etc.
To give you a few examples from my own experience:
Logo detection: one image of the logo can be used to create several samples by adding noise, changing contrast, making small distortions and rotations, etc. If the logo is detailed and has good contrast, you should easily get 75%.
Cat detection using Haar wavelets: 100 images with data augmentation can yield around 75%.
Human ear detection: about 300 images could get me to around 80%. This detector is being used in an iPhone app for virtually trying on eyeglasses.
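The augmentation idea from the logo example (one source image turned into several training samples) can be sketched in plain Python. This is only an illustration under simplifying assumptions: the image is a tiny grayscale grid of 0-255 integers, only noise and contrast changes are applied (no distortions or rotations), and the function names and parameter values are made up for the example.

```python
import random


def augment(image, rng, noise=10, contrast_range=(0.8, 1.2)):
    """Return a noisy, contrast-shifted copy of a grayscale image (rows of 0-255 ints)."""
    gain = rng.uniform(*contrast_range)  # one contrast factor per generated sample
    out = []
    for row in image:
        new_row = []
        for pixel in row:
            value = (pixel - 128) * gain + 128 + rng.uniform(-noise, noise)
            new_row.append(max(0, min(255, round(value))))  # clamp to valid range
        out.append(new_row)
    return out


def make_samples(image, count, seed=0):
    """Generate `count` augmented variants of one image with a seeded generator."""
    rng = random.Random(seed)
    return [augment(image, rng) for _ in range(count)]


logo = [[0, 255, 0], [255, 255, 255], [0, 255, 0]]  # toy 3x3 stand-in for a logo
samples = make_samples(logo, count=20)
```

In practice an image library would handle real images and geometric transforms; the point here is only that a single picture can seed many slightly different samples.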
You can also try this out yourself using Kaggle's Dogs Vs. Cats data. Just try various classifiers on them with different amounts of data, and you will get a very good idea.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported