kandi X-RAY | stride Summary
stride Key Features
stride Examples and Code Snippets
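from typing import List

def _compute_mesh_strides(mesh_dims: List[MeshDimension]) -> List[int]:
    # MeshDimension is defined elsewhere in the source; only its `.size` is used here.
    strides = [1]  # the last mesh dimension has stride 1
    for idx, dim in enumerate(reversed(mesh_dims[1:])):
        strides.append(strides[idx] * dim.size)
    strides.reverse()
    return strides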
Trending Discussions on stride
I have an array (of any rank), and I would like to have an index operator that:
Allows for missing indices, such that the following is equivalent...
ANSWER: Answered 2022-Mar-11 at 11:03
In an earlier version of this answer I didn't provide a full implementation, since something was not adding up for me.
If `index` should calculate the index into a flattened multidimensional array, then your example implementation is invalid. The problem is hidden because you are comparing two results of `index`: one with all indices provided and a shortened version where zero padding is assumed.
Sadly, I followed this pattern in the first versions of the tests in Catch2.
Here is a proper test for `index` of a flattened multidimensional array, where the last index matches the flattened index:
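The answer's C++/Catch2 code is not included in this excerpt. As a hedged Python sketch of the same idea, `flat_index` below computes a row-major offset and, as an assumption for illustration, pads missing *trailing* indices with zero. The check at the end is the kind of test the answer describes: walking every index tuple in row-major order, the computed offset must equal the running position in the flat array.

```python
from itertools import product
from typing import Sequence


def flat_index(shape: Sequence[int], *idx: int) -> int:
    """Row-major (C-order) offset into a flattened array of the given shape.

    Assumption: missing trailing indices are treated as 0, so
    flat_index(shape, i) == flat_index(shape, i, 0, ..., 0).
    """
    if len(idx) > len(shape):
        raise IndexError("too many indices")
    padded = list(idx) + [0] * (len(shape) - len(idx))
    offset = 0
    for i, n in zip(padded, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension of size {n}")
        offset = offset * n + i  # Horner-style accumulation of row-major strides
    return offset


# Proper test: the offset must match the flat position for every index tuple,
# not just agree with a zero-padded call of the same function.
shape = (2, 3, 4)
for expected, idx in enumerate(product(*(range(n) for n in shape))):
    assert flat_index(shape, *idx) == expected
```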
I have implemented a Convolutional Neural Network in C and have been studying what parts of it have the longest latency.
Based on my research, the massive amount of matrix multiplication required by CNNs makes running them on CPUs and even GPUs very inefficient. However, when I actually profiled my code (on an unoptimized build), I found that something other than the multiplication itself was the bottleneck of the implementation.
The `convolution2D` function takes the most time to run, followed by the batch normalization and depthwise convolution functions.
The convolution function in question looks like this:...
ANSWER: Answered 2022-Mar-10 at 13:57
Looking at the Cachegrind results, memory doesn't look like your bottleneck. The network has to be stored in memory anyway; if it were so large that your program incurred a lot of L1 cache misses, it would be worth trying to minimize them, but an L1 data miss rate of 1.7% is not a problem.
So you're trying to make this run fast anyway. Looking at your code, what happens in the innermost loop is very simple (load -> multiply -> add -> store), and it has no side effect other than the final store. This kind of code is easily parallelizable, for example by multithreading or vectorizing. I think you'll know how to run it in multiple threads, since you can write code of some complexity, and you asked in the comments how to vectorize the code manually.
I will explain that part, but one thing to bear in mind is that once you choose to manually vectorize the code, it will often be tied to certain CPU architectures. Let's not consider non-AMD64 compatible CPUs like ARM. Still, you have the option of MMX, SSE, AVX, and AVX512 to choose as an extension for vectorized computation, and each extension has multiple versions. If you want maximum portability, SSE2 is a reasonable choice. SSE2 appeared with Pentium 4, and it supports 128-bit vectors. For this post I'll use AVX2, which supports 128-bit and 256-bit vectors. It runs fine on your CPU, and has reasonable portability these days, supported from Haswell (2013) and Excavator (2015).
The pattern you're using in the inner loop is called FMA (fused multiply-add). Haswell and Excavator also support the FMA3 extension (separate from AVX2 but shipped alongside it), which has an instruction for this. Have a look at this function and the compiled output.
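The answer's AVX2 listing is not part of this excerpt. As a rough Python analogue (an assumption, not the answer's code), the sketch below contrasts the scalar load-multiply-add loop with a version that hands the whole multiply-add to NumPy, which runs it in compiled, vectorizable loops. It assumes a single-channel "valid" cross-correlation, which is what CNN convolution layers usually compute; the asker's `convolution2D` may differ. `sliding_window_view` builds all input windows as a strided view without copying data; the two function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_naive(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation with scalar loops."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1), dtype=np.float64)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += x[i + u, j + v] * k[u, v]  # load -> multiply -> add
            out[i, j] = acc  # the only store per output element
    return out


def conv2d_vectorized(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Same result; the multiply-add runs over whole arrays instead of scalars."""
    windows = sliding_window_view(x, k.shape)  # (H-kh+1, W-kw+1, kh, kw) view, no copy
    return np.einsum("ijuv,uv->ij", windows, k)


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
k = rng.standard_normal((3, 3))
print(np.allclose(conv2d_naive(x, k), conv2d_vectorized(x, k)))  # True
```

Because each output element is independent and the inner loop has no side effects, the reduction can be reordered freely, which is exactly what makes SIMD (or this array formulation) applicable.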
I am training a U-Net segmentation model for binary segmentation. The dataset is loaded in a TensorFlow data pipeline. The images have shape (512, 512, 3) and the masks have shape (512, 512, 1). The model expects inputs of shape (512, 512, 3), but I am getting the following error: Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 512, 512, 3), found shape=(512, 512, 3)
Here are the images in the metadata DataFrame.
Randomly sampling the indices to select the training and validation set...
ANSWER: Answered 2022-Mar-08 at 13:38
… `model.fit` and not `train_images`. Also, you do not need to use `repeat()`, which creates an infinite dataset if you do not specify how many times to repeat it. Regarding your labels error, try rewriting your model like this:
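The error says the model received a single (512, 512, 3) image where it expected a batch of shape (None, 512, 512, 3). In a `tf.data` pipeline that leading axis normally comes from `dataset.batch(batch_size)`. To show the shape change without requiring TensorFlow, here is a plain-NumPy sketch; `batch_pairs` is a hypothetical helper, not part of TensorFlow and not the answer's code.

```python
from typing import Iterable, Iterator, Tuple

import numpy as np


def batch_pairs(
    pairs: Iterable[Tuple[np.ndarray, np.ndarray]], batch_size: int
) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Stack (image, mask) pairs into arrays with a leading batch axis.

    Illustrates the shape change made by tf.data.Dataset.batch(batch_size)
    with drop_remainder=False: the final, smaller batch is kept.
    """
    images, masks = [], []
    for image, mask in pairs:
        images.append(image)
        masks.append(mask)
        if len(images) == batch_size:
            yield np.stack(images), np.stack(masks)
            images, masks = [], []
    if images:  # final, partial batch
        yield np.stack(images), np.stack(masks)


samples = [
    (np.zeros((512, 512, 3), np.uint8), np.zeros((512, 512, 1), np.uint8))
    for _ in range(5)
]
for x, y in batch_pairs(samples, batch_size=2):
    # (2, 512, 512, 3) (2, 512, 512, 1) twice, then (1, 512, 512, 3) (1, 512, 512, 1)
    print(x.shape, y.shape)
```

Every array the model sees now has the batch axis first, which is what the `None` in `expected shape=(None, 512, 512, 3)` stands for.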
I am currently trying to figure out a way to return a multidimensional array (of doubles) from a shared library in C to Python and make it an np.array. My current approach looks like this:
shared library ("utils.c")...
ANSWER: Answered 2022-Feb-21 at 10:17
First of all, a scoped C array allocated on the stack (like the one in `somefunction`) must never be returned by a function: its stack space will be reused by other functions, for example CPython's own. The returned array must be allocated on the heap instead.
Moreover, writing a function that works with NumPy arrays through ctypes is pretty cumbersome. As you found out, you need to pass the full shape as a parameter. But you also need to pass the strides of each dimension of each input array, since the arrays may not be contiguous in memory (for example, `np.transpose` changes this). That being said, we can assume that the input array is contiguous for the sake of performance and sanity; this can be enforced with `np.ascontiguousarray`. The pointers of the views … `b` can be extracted using `numpy.ctypeslib.as_ctypes`, but fortunately ctypes can do that automatically. Furthermore, the returned array is currently a C pointer, not a NumPy array, so you need to create a NumPy array with the right shape and strides from it using `numpy.ctypeslib.as_array`. Because the resulting shape is not known to the caller, you need to retrieve it from the callee through several integer pointers (one per dimension). In the end, this results in pretty big, ugly, highly bug-prone code (which will often crash silently if anything goes wrong, not to mention possible memory leaks if you do not pay attention to them). You can use Cython to do most of this work for you.
Assuming you do not want to use Cython, or cannot, here is example code using ctypes:
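The answer's full ctypes listing is not reproduced in this excerpt. As a hedged sketch of the two NumPy pieces it relies on, the code below wraps a raw `double*` as an array with `numpy.ctypeslib.as_array` (the shape must be supplied, since a pointer carries none), and uses `numpy.ctypeslib.ndpointer`, the type you would put in a function's `argtypes`, to accept only C-contiguous inputs. As an assumption so that it runs without a compiled library, the "returned" buffer is allocated from Python instead of by `malloc()` in C.

```python
import ctypes

import numpy as np

rows, cols = 3, 4

# Stand-in for a heap buffer a C function would malloc() and return.
# Assumption: allocated from Python so the sketch runs without a shared library.
# `buf` must stay alive as long as `arr` is used, because `arr` does not own it.
buf = (ctypes.c_double * (rows * cols))(*range(rows * cols))
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_double))

# A bare pointer has no shape information, so the caller must provide it.
arr = np.ctypeslib.as_array(ptr, shape=(rows, cols))
print(arr[1, 2], arr.strides)  # 6.0 (32, 8)

# Type usable in `argtypes`: ctypes then passes the data pointer automatically,
# but only for 2-D, C-contiguous float64 arrays.
Matrix = np.ctypeslib.ndpointer(dtype=np.float64, ndim=2, flags="C_CONTIGUOUS")

transposed = arr.T                        # view with strides (8, 32): not C-contiguous
fixed = np.ascontiguousarray(transposed)  # copy with strides (24, 8)
Matrix.from_param(fixed)                  # accepted
try:
    Matrix.from_param(transposed)         # rejected: wrong memory layout
except TypeError as exc:
    print("rejected:", exc)
```

Rejecting non-contiguous arrays at the call boundary is what lets the C side ignore strides entirely, at the cost of one copy when the caller passes a transposed view.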
I am getting an error when trying to save a model with data augmentation layers in TensorFlow version 2.7.0.
Here is the code of data augmentation:...
ANSWER: Answered 2022-Feb-04 at 17:25
This seems to be a bug in TensorFlow 2.7 when using `model.save` combined with the parameter `save_format="tf"`, which is the default. The layers … `RandomContrast` are causing the problem, since they are not serializable. Interestingly, the `Rescaling` layer can be saved without any problems. A workaround is to simply save your model in the older Keras H5 format (`save_format="h5"`).
This code throws a compiler error: "Argument type 'StrideTo' expected to be an instance of a class or class-constrained type"...
ANSWER: Answered 2022-Jan-21 at 20:20
That is a bug: SR-13847, "Wrong generic used in extensions":
For some reason, when calling an initializer in an extension, the compiler tries to match unrelated generics.
In your case, `Array` is interpreted as `Array`. As a workaround, you can specify the type of the index array explicitly:
I trained a Keras model with the following architecture:...
ANSWER: Answered 2022-Jan-11 at 21:51
The number of parameters is at most an indication of how fast a model trains or runs inference; speed depends on many other factors.
Here are some examples of factors that might influence the throughput of your model:
- The activation function: ReLU activations are faster than, e.g., ELU or GELU, which involve exponential terms. Not only is computing an exponential slower than a linear function, but the gradient is also more complex to compute, whereas for ReLU it is a constant: the slope of the activation (e.g. 1).
- The bit precision used for your data. Some hardware accelerators compute faster in float16 than in float32, and reading fewer bits also decreases latency.
- Some layers have no parameters but still perform fixed calculations. Even though nothing is added to the network's weights, computation is still performed.
- The architecture of your training hardware. Certain filter sizes and batch sizes can be computed more efficiently than others.
- Sometimes the computing hardware is not the bottleneck at all; the input pipeline that loads and preprocesses your data is.
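To make the first point concrete, here is a small NumPy sketch (not from the answer) of ReLU and ELU with their derivatives. The ReLU gradient is piecewise constant, while ELU needs an extra `exp()` per element in both the forward and backward pass. It uses the standard ELU definition with `alpha=1.0`; GELU is not shown.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def relu_grad(x):
    # Piecewise constant: slope 1 for x > 0, else 0. No transcendental math.
    return (x > 0).astype(x.dtype)


def elu(x, alpha=1.0):
    # Negative branch needs exp(). np.where evaluates both branches, so very
    # large positive inputs may emit an overflow warning from exp().
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))


def elu_grad(x, alpha=1.0):
    # Derivative of the negative branch is alpha * exp(x): another exp() per element.
    return np.where(x > 0, 1.0, alpha * np.exp(x))


x = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu_grad(x))  # [0. 0. 1. 1.]
print(elu_grad(x))   # [0.1353... 0.6065... 1. 1.]
```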
It's hard to tell without testing, but in your particular example I would guess that the following might slow down your inference:
- The large receptive field of the 7x7 convolution
- `leaky_relu` is slightly slower than `relu`
- Probably your data input pipeline is the bottleneck, not the inference speed. If inference is much faster than data preparation, both models may appear to run at the same speed, while in reality the hardware sits idle waiting for data.
To understand what's going on, you could either change some parameters and evaluate the speed, or analyze your input pipeline by tracing your hardware with TensorBoard. Here is a small guide: https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras
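Before reaching for the TensorBoard profiler, a crude first check is to time "waiting for the next batch" separately from "running the model". The sketch below is a hypothetical, framework-agnostic helper (not from the answer). One caveat: if the framework executes asynchronously on a GPU, `step` must block until the work has finished, otherwise compute time is under-counted.

```python
import time
from typing import Callable, Iterable, Tuple


def time_pipeline(batches: Iterable, step: Callable, num_steps: int) -> Tuple[float, float]:
    """Hypothetical helper: split wall time into data loading vs. computation."""
    it = iter(batches)
    load_s = compute_s = 0.0
    for _ in range(num_steps):
        t0 = time.perf_counter()
        batch = next(it)   # time spent producing / preprocessing the batch
        t1 = time.perf_counter()
        step(batch)        # time spent in the model itself
        t2 = time.perf_counter()
        load_s += t1 - t0
        compute_s += t2 - t1
    return load_s, compute_s


def slow_loader():
    # Simulated expensive preprocessing (an assumption for the demo).
    while True:
        time.sleep(0.02)
        yield list(range(100))


load_s, compute_s = time_pipeline(slow_loader(), step=sum, num_steps=5)
print(f"loading: {load_s:.3f}s, compute: {compute_s:.3f}s")
```

If loading dominates, as it does in this contrived demo, making the model faster will not change the observed throughput.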
I am currently working on a multi-label fashion item dataset that is highly imbalanced. I tried using class_weights to tackle it, but the accuracy is still stuck at 0.7556 in every epoch. Is there any way I can avoid this problem? Did I implement the class weights incorrectly? I tried using data augmentation too.
I have about 224 unique classes in the training set, and some of them have only one example, which is very frustrating.
I also tried to solve the problem with the help of this notebook, but I am unable to get the same accuracy score. It looks like the notebook does not consider the possibility of imbalance in the dataset....
ANSWER: Answered 2022-Jan-09 at 21:17
First of all, metrics such as precision and recall focus on the positive class only, which avoids the problems that multi-class-focused metrics run into under class imbalance. Consequently, if we consider only these indicators, we may not obtain enough information about the performance of the negative class. Haibo He et al. suggest the metrics below to rate both classes:
- Geometric Mean.
- Macro-Averaged Accuracy.
- Newer Combinations of Threshold Metrics: Mean-Class-Weighted Accuracy, Optimized Precision, Adjusted Geometric Mean, Index of Balanced Accuracy.
- Use the PR-curve and the F1-score.
- Try with geometric transformations, photometric transformations, random occlusions (to avoid overfitting), SMOTE, Tomek links (for undersampling majorities), etc.
- Random undersampling may delete relevant features of your dataset. Again, analyze your dataset using KNN and other similar techniques.
- Check this book: H. He and Y. Ma, Imbalanced Learning: Foundations, Algorithms, and Applications, Hoboken, New Jersey: Wiley-IEEE Press, 2013.
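As a hedged illustration of the first two metrics (not the answer's code), the sketch below computes them from per-class recall. For the binary case the geometric mean is sqrt(TPR × TNR); it is generalized here, as an assumption, to the K-th root of the product of K per-class recalls, and macro-averaged accuracy is taken as the mean per-class recall. For a multi-label problem you would apply them per label. The example shows how plain accuracy can look good while the minority class is never predicted.

```python
import numpy as np


def per_class_recall(y_true, y_pred) -> np.ndarray:
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(y_true)
    return np.array([np.mean(y_pred[y_true == c] == c) for c in classes])


def geometric_mean_score(y_true, y_pred) -> float:
    # Binary case: sqrt(TPR * TNR); here the K-th root of the product of K recalls.
    r = per_class_recall(y_true, y_pred)
    return float(np.prod(r) ** (1.0 / len(r)))


def macro_averaged_accuracy(y_true, y_pred) -> float:
    # Mean of the per-class recalls (also known as balanced accuracy).
    return float(np.mean(per_class_recall(y_true, y_pred)))


# A classifier that always predicts the majority class:
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(np.mean(np.array(y_true) == np.array(y_pred)))  # 0.9 (plain accuracy)
print(geometric_mean_score(y_true, y_pred))           # 0.0
print(macro_averaged_accuracy(y_true, y_pred))        # 0.5
```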
I have created a working CNN model in Keras/TensorFlow and have successfully used the CIFAR-10 and MNIST datasets to test it. The working code is shown below:...
ANSWER: Answered 2021-Dec-16 at 10:18
If the hyperspectral dataset is given to you as one large image with many channels, I suppose that the classification of each pixel should depend on the pixels around it (otherwise I would not format the data as an image, i.e. as a grid). Given this assumption, breaking the input picture into 1x1 parts is not a good idea, as you lose the grid structure.
I further suppose that the order of the channels is arbitrary, which implies that convolution over the channels is probably not meaningful (which you however did not plan to do anyways).
Instead of reformatting the data the way you did, you may want to create a model that takes an image as input and also outputs an "image" containing the classifications for each pixel. I.e. if you have 10 classes and take a (145, 145, 200) image as input, your model would output a (145, 145, 10) image. In that architecture you would not have any fully-connected layers. Your output layer would also be a convolutional layer.
That, however, means that you will not be able to keep your current architecture, because the tasks for MNIST/CIFAR10 and your hyperspectral dataset are not the same. For MNIST/CIFAR10 you want to classify an image in its entirety, while for the other dataset you want to assign a class to each pixel (most likely also using the pixels around each pixel).
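To illustrate the shape contract of such a fully convolutional output layer, here is a NumPy sketch (not from the answer) of a 1x1 convolution: the same small linear map applied at every pixel, turning a (145, 145, 200) input into (145, 145, 10) per-pixel class scores. Random weights stand in for a trained layer; in Keras the corresponding layer would be something like `Conv2D(10, 1, activation="softmax")`, typically preceded by 3x3 convolutions with `padding="same"` so that neighboring pixels are taken into account.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, C_in, num_classes = 145, 145, 200, 10
image = rng.standard_normal((H, W, C_in)).astype(np.float32)

# A 1x1 kernel of shape (1, 1, C_in, num_classes) is just a (C_in, num_classes)
# matrix applied at every pixel. Random weights stand in for trained ones.
kernel = (rng.standard_normal((C_in, num_classes)) * 0.01).astype(np.float32)
bias = np.zeros(num_classes, dtype=np.float32)

logits = np.einsum("hwc,ck->hwk", image, kernel) + bias  # (145, 145, 10)

# Softmax over the class axis: a class distribution for every pixel.
logits -= logits.max(axis=-1, keepdims=True)
probs = np.exp(logits)
probs /= probs.sum(axis=-1, keepdims=True)
labels = probs.argmax(axis=-1)  # (145, 145): predicted class per pixel
print(probs.shape, labels.shape)
```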
Some further ideas:
- If you want to turn the pixel classification task on the hyperspectral dataset into a classification task for an entire image, maybe you can reformulate it as "classifying a hyperspectral image as the class of its center (or top-left, or bottom-right, or (21st, 104th), or whatever) pixel". To obtain the data from your single hyperspectral image, for each pixel I would shift the image such that the target pixel is at the desired location (e.g. the center). All pixels that "fall off" the border could be inserted at the other side of the image.
- If you want to stick with a pixel classification task but need more data, maybe split up the single hyperspectral image you have into many smaller images (e.g. 10x10x200). You may even want to use images of many different sizes. If your model only has convolution and pooling layers and you make sure to maintain the sizes of the image, that should work out.
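Both ideas can be sketched with NumPy (the function names are hypothetical, not from the answer). `np.roll` shifts with wrap-around, so pixels that fall off one border reappear on the opposite side, as suggested in the first idea. `tile_image` cuts non-overlapping patches, dropping any rows or columns that do not fill a whole tile; a (145, 145, 200) image with `tile=10` yields 196 patches of shape (10, 10, 200).

```python
import numpy as np


def center_on_pixel(image: np.ndarray, row: int, col: int) -> np.ndarray:
    """Shift an (H, W, C) image so pixel (row, col) lands at (H // 2, W // 2).

    np.roll wraps pixels that fall off one border around to the opposite side.
    """
    h, w = image.shape[:2]
    return np.roll(image, shift=(h // 2 - row, w // 2 - col), axis=(0, 1))


def tile_image(image: np.ndarray, tile: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping (tile, tile, C) patches.

    Rows/columns that do not fill a whole tile are dropped.
    """
    h, w, c = image.shape
    nh, nw = h // tile, w // tile
    cropped = image[: nh * tile, : nw * tile]
    return (cropped.reshape(nh, tile, nw, tile, c)
                   .transpose(0, 2, 1, 3, 4)          # group by (tile row, tile col)
                   .reshape(nh * nw, tile, tile, c))


img = np.arange(145 * 145 * 2).reshape(145, 145, 2)  # small channel count for the demo
shifted = center_on_pixel(img, 20, 103)
patches = tile_image(img, 10)
print(shifted.shape, patches.shape)  # (145, 145, 2) (196, 10, 10, 2)
```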
>>> import numpy as np
>>> arr = np.arange(16).reshape((2, 2, 4))
>>> arr.strides
(32, 16, 4)
ANSWER: Answered 2021-Dec-15 at 20:52
In my experience, working through the C code is too much work. Simply finding the relevant function(s) is the hardest part.
Strides work the same way regardless of the number of dimensions or a transpose.
Start with something simpler, like a (2,3) array of 8-byte elements, whose strides are (24,8) and whose transposed strides will be (8,24). Imagine stepping through the flat [0,1,2...].
Here is a sample array with 1-byte elements, so the sequential stride will be just 1:
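The answer's own sample is not included in this excerpt. The NumPy sketch below illustrates the same reasoning: with 1-byte elements, a transpose only swaps the strides, and any element can be read from the shared buffer at byte offset `sum(index_k * strides_k)`. It also shows why the question's (2, 2, 4) array reports (32, 16, 4): those strides correspond to 4-byte (int32) elements. `read_via_strides` is a hypothetical helper.

```python
import numpy as np

# 1-byte elements: one step along the last axis advances exactly 1 byte.
a = np.arange(6, dtype=np.int8).reshape(2, 3)
print(a.strides, a.T.strides)  # (3, 1) (1, 3): transpose swaps strides, no data moves

memory = a.ravel()  # `a` is C-contiguous, so this is its buffer in memory order


def read_via_strides(view: np.ndarray, index, memory: np.ndarray) -> int:
    """Hypothetical helper: byte offset = sum(i_k * stride_k), then read the buffer."""
    offset = sum(i * s for i, s in zip(index, view.strides))
    return int(memory[offset // memory.itemsize])


print(read_via_strides(a.T, (2, 1), memory), a[1, 2])  # 5 5

b = np.zeros((2, 3), dtype=np.int64)
print(b.strides, b.T.strides)  # (24, 8) (8, 24)

c = np.arange(16, dtype=np.int32).reshape(2, 2, 4)
print(c.strides)  # (32, 16, 4): matches the question, i.e. 4-byte elements
```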