stride | Stride Game Engine | Game Engine library
kandi X-RAY | stride Summary
stride Examples and Code Snippets
from __future__ import annotations
from typing import List

# MeshDimension is defined in the surrounding module; only `size` is used here.
def _compute_mesh_strides(mesh_dims: List[MeshDimension]) -> List[int]:
    # Row-major strides: the last dimension has stride 1; each earlier
    # dimension's stride is the product of the sizes after it.
    strides = [1]
    for idx, dim in enumerate(reversed(mesh_dims[1:])):
        strides.append(strides[idx] * dim.size)
    strides.reverse()
    return strides
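For example, here is a hedged sketch with a minimal stand-in for MeshDimension (only its size attribute matters): a mesh of sizes 2 x 3 x 4 yields row-major strides [12, 4, 1].

from dataclasses import dataclass

@dataclass
class MeshDimension:  # minimal stand-in for the real mesh type
    size: int

dims = [MeshDimension(2), MeshDimension(3), MeshDimension(4)]
print(_compute_mesh_strides(dims))  # [12, 4, 1]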
Community Discussions
Trending Discussions on stride
QUESTION
I have an array (of any rank), and I would like to have an index operator that:
Allows for missing indices, such that the following is equivalent
...
ANSWER
Answered 2022-Mar-11 at 11:03
In an earlier version of this answer I didn't provide a full implementation, since something was not adding up for me.
If this index operator is supposed to compute the index into a flattened multidimensional array, then your example implementation is invalid. The problem is hidden because you are comparing two results of index: one with all indexes provided, and a shortened version where zero padding is assumed.
Sadly, I followed this pattern in the first versions of the test in Catch2.
Here is a proper test for the index of a flattened multidimensional array, where the last index matches the flattened index:
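The test code itself is not shown on this page. As a language-neutral illustration of the convention the answer describes, here is a hedged Python sketch (flat_index is a hypothetical helper, not the author's C++ code): missing leading indices default to zero, so the last index alone equals the flattened row-major index.

def flat_index(shape, *idx):
    # Pad missing leading indices with zeros.
    idx = (0,) * (len(shape) - len(idx)) + idx
    # Row-major strides: the last dimension has stride 1.
    strides = [1]
    for size in reversed(shape[1:]):
        strides.append(strides[-1] * size)
    strides.reverse()
    return sum(i * s for i, s in zip(idx, strides))

assert flat_index((2, 3, 4), 1, 2, 3) == 1 * 12 + 2 * 4 + 3  # == 23
assert flat_index((2, 3, 4), 3) == 3  # last index == flattened index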
QUESTION
I have implemented a Convolutional Neural Network in C and have been studying which parts of it have the longest latency.
Based on my research, the massive amount of matrix multiplication required by CNNs makes running them on CPUs and even GPUs very inefficient. However, when I actually profiled my code (on an unoptimized build) I found out that something other than the multiplication itself was the bottleneck of the implementation.
After turning on optimization (-O3 -march=native -ffast-math, gcc cross compiler), the Gprof result was the following:
Clearly, the convolution2D function takes the largest amount of time to run, followed by the batch normalization and depthwise convolution functions.
The convolution function in question looks like this:
...
ANSWER
Answered 2022-Mar-10 at 13:57
Looking at the result of Cachegrind, it doesn't look like memory is your bottleneck. The NN has to be stored in memory anyway, and if it were so large that your program had a lot of L1 cache misses, it would be worth trying to minimize them; but a 1.7% L1 (data) miss rate is not a problem.
So you're trying to make this run fast anyway. Looking at your code, what's happening in the innermost loop is very simple (load -> multiply -> add -> store), and it has no side effect other than the final store. This kind of code is easily parallelizable, for example by multithreading or vectorizing. I think you'll know how to make this run in multiple threads, seeing that you can write code of some complexity, and you asked in the comments how to manually vectorize the code.
I will explain that part, but one thing to bear in mind is that once you choose to manually vectorize the code, it will often be tied to certain CPU architectures. Let's not consider non-AMD64-compatible CPUs like ARM. You still have MMX, SSE, AVX, and AVX512 to choose from as an extension for vectorized computation, and each extension has multiple versions. If you want maximum portability, SSE2 is a reasonable choice: it appeared with the Pentium 4 and supports 128-bit vectors. For this post I'll use AVX2, which supports 128-bit and 256-bit vectors. It runs fine on your CPU and has reasonable portability these days, being supported since Haswell (2013) and Excavator (2015).
The pattern you're using in the inner loop is called FMA (fused multiply and add). AVX2 has an instruction for this. Have a look at this function and the compiled output.
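The AVX2 intrinsics from the answer are not reproduced here. As a language-neutral sketch of what vectorizing the load -> multiply -> add -> store pattern buys, here is a hedged NumPy comparison (array names are hypothetical; NumPy's dot dispatches to SIMD kernels such as AVX2 FMA internally):

import numpy as np

x = np.random.rand(1024).astype(np.float32)
w = np.random.rand(1024).astype(np.float32)

# Scalar version of the inner-loop pattern (load -> multiply -> add -> store):
acc = np.float32(0.0)
for i in range(x.size):
    acc += x[i] * w[i]

# Vectorized version: one call processes many float32 lanes per instruction.
acc_vec = np.dot(x, w)

assert np.allclose(acc, acc_vec, rtol=1e-4)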
QUESTION
I am training a Unet segmentation model for a binary class. The dataset is loaded in a tensorflow data pipeline. The images are in (512, 512, 3) shape, the masks are in (512, 512, 1) shape. The model expects the input in (512, 512, 3) shape, but I am getting the following error: Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 512, 512, 3), found shape=(512, 512, 3)
Here are the images in the metadata dataframe.
Randomly sampling the indices to select the training and validation set:
...
ANSWER
Answered 2022-Mar-08 at 13:38
Use train_batches in model.fit and not train_images. Also, you do not need to use repeat(), which causes an infinite dataset if you do not specify how many times you want to repeat your dataset. Regarding your labels error, try rewriting your model like this:
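The rewritten model from the answer is not shown here. For the first point, a minimal sketch of the corrected fit call (val_batches is a hypothetical name for a batched validation set):

# Pass the batched dataset to fit(), not the raw image tensors; the
# dataset yields (None, 512, 512, 3) batches, which matches the model.
model.fit(
    train_batches,               # not train_images
    validation_data=val_batches,
    epochs=10,
)
# If you do keep repeat(), also pass steps_per_epoch=..., otherwise the
# infinite dataset never ends an epoch.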
QUESTION
I am currently trying to figure out a way to return a multidimensional array (of doubles) from a shared library in C to Python and make it an np.array. My current approach looks like this:
shared library ("utils.c")
...
ANSWER
Answered 2022-Feb-21 at 10:17
First of all, a scoped C array allocated on the stack (like in somefunction) must never be returned by a function. The stack space will be reused by other functions, such as those of CPython. The returned array must be allocated on the heap instead.
Moreover, writing a function that works with Numpy arrays using ctypes is pretty cumbersome. As you found out, you need to pass the full shape as parameters. But you also need to pass the strides for each dimension of each input array, since the arrays may not be contiguous in memory (np.transpose changes this, for example). That being said, we can assume the input array is contiguous for the sake of performance and sanity; this can be enforced with np.ascontiguousarray. The pointers of the views a and b can be extracted using numpy.ctypeslib.as_ctypes, but hopefully ctypes can do that automatically. Furthermore, the returned array is currently a C pointer and not a Numpy array, so you need to create a Numpy array from it with the right shape and strides using numpy.ctypeslib.as_array. Because the resulting shape is not known to the caller, you need to retrieve it from the callee using several integer pointers (one per dimension). In the end, this results in pretty big, ugly, highly bug-prone code, which will often silently crash if anything goes wrong, not to mention possible memory leaks if you do not pay attention. You can use Cython to do most of this work for you.
Assuming you do not want to use Cython or cannot, here is an example with ctypes:
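The answer's full example is not reproduced on this page. The following is a minimal sketch of the pattern it describes; the C function name, signature, shapes, and library path are assumptions:

import ctypes
import numpy as np

# Assumed: utils.so exports
#   double *somefunction(const double *a, int n, int m, int *out_n, int *out_m);
# where the returned buffer is allocated on the heap (malloc), not the stack.
lib = ctypes.CDLL("./utils.so")
lib.somefunction.restype = ctypes.POINTER(ctypes.c_double)
lib.somefunction.argtypes = [
    np.ctypeslib.ndpointer(dtype=np.float64, flags="C_CONTIGUOUS"),
    ctypes.c_int, ctypes.c_int,
    ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int),
]

def call_somefunction(a):
    a = np.ascontiguousarray(a, dtype=np.float64)  # enforce contiguity
    out_n, out_m = ctypes.c_int(), ctypes.c_int()
    ptr = lib.somefunction(a, a.shape[0], a.shape[1],
                           ctypes.byref(out_n), ctypes.byref(out_m))
    # Wrap the heap pointer in a NumPy array with the retrieved shape.
    # Note: freeing the underlying buffer is still the caller's job.
    return np.ctypeslib.as_array(ptr, shape=(out_n.value, out_m.value))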
QUESTION
I am getting an error when trying to save a model with data augmentation layers with Tensorflow version 2.7.0.
Here is the data augmentation code:
...
ANSWER
Answered 2022-Feb-04 at 17:25
This seems to be a bug in Tensorflow 2.7 when using model.save combined with the parameter save_format="tf", which is set by default. The layers RandomFlip, RandomRotation, RandomZoom, and RandomContrast are causing the problems, since they are not serializable. Interestingly, the Rescaling layer can be saved without any problems. A workaround would be to simply save your model in the older Keras H5 format with model.save("test", save_format='h5'):
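A minimal sketch of the workaround; the model here is hypothetical, built only to show the two save calls:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(64, 64, 3)),
    tf.keras.layers.RandomFlip("horizontal"),   # one of the problem layers
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

# model.save("test", save_format="tf")    # fails in TF 2.7 with these layers
model.save("test.h5", save_format="h5")   # the H5 workaround saves fine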
QUESTION
This code throws a compiler error: "Argument type 'StrideTo' expected to be an instance of a class or class-constrained type"
...
ANSWER
Answered 2022-Jan-21 at 20:20
That is a bug: SR-13847 Wrong generic used in extensions:
For some reason, when calling an initializer in an extension, the compiler tries to match unrelated generics.
In your case, Array is interpreted as Array. As a workaround, you can specify the type of the index array explicitly:
QUESTION
I trained a Keras model with the following architecture:
...
ANSWER
Answered 2022-Jan-11 at 21:51
The number of parameters is at most an indication of how fast a model trains or runs inference; it depends on many other factors.
Here are some examples of things that might influence the throughput of your model:
- The activation function: ReLU activations are faster than e.g. ELU or GELU, which have exponential terms. Not only is computing an exponential slower than a linear function, but the gradient is also much more complex to compute, since in the case of ReLU it is a constant, the slope of the activation (e.g. 1).
- The bit precision used for your data: some HW accelerators can compute faster in float16 than in float32, and reading fewer bits decreases latency.
- Some layers might not have parameters but still perform fixed calculations. Even though no parameter is added to the network's weights, a computation is still performed.
- The architecture of your training HW: certain filter sizes and batch sizes can be computed more efficiently than others.
- Sometimes the speed of the computing HW is not the bottleneck; the input pipeline for loading and preprocessing your data is.
It's hard to tell without testing, but in your particular example I would guess that the following might slow down your inference:
- a large receptive field with a 7x7 conv
- leaky_relu is slightly slower than relu
- Probably your data input pipeline is the bottleneck, not the inference speed. If the inference is much faster than the data preparation, it might appear that both models have the same speed; in reality the HW is idle and waits for data.
To understand what's going on, you could either change some parameters and evaluate the speed, or analyze your input pipeline by tracing your hardware using TensorBoard. Here is a small guide: https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras
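A minimal sketch of the tracing approach from the linked guide, assuming a hypothetical compiled model and dataset:

import tensorflow as tf

tb = tf.keras.callbacks.TensorBoard(
    log_dir="logs/profile",
    profile_batch=(10, 20),  # trace batches 10-20, past warm-up
)
model.fit(dataset, epochs=1, callbacks=[tb])
# Run `tensorboard --logdir logs/profile` and open the Profile tab to see
# whether the device sits idle waiting on the input pipeline.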
Best, Sascha
QUESTION
I am currently working on a multi-label fashion item dataset which is highly imbalanced. I tried using class_weights to tackle it, but the accuracy is still stuck at 0.7556 every epoch. Is there any way I can avoid this problem? Did I implement the class weights in a wrong way? I tried using data augmentation too.
I have 224 unique classes in the train set, and some of them have only one example, which is very frustrating.
I tried to solve the problem with the help of this notebook as well, but I was unable to get the same accuracy score. It looks like this notebook does not consider the possibility of imbalance in the dataset.
...
ANSWER
Answered 2022-Jan-09 at 21:17
First of all, metrics such as Precision and Recall focus on the positive class only, avoiding the problems encountered by multi-class focus metrics in the case of class imbalance. Thus, we may not obtain enough information about the performance of the negative class if we keep considering all indicators. Haibo He et al. suggest the metrics below to rate both items:
- Geometric Mean.
- F-Measure.
- Macro-Averaged Accuracy.
- Newer Combinations of Threshold Metrics: Mean-Class-Weighted Accuracy, Optimized Precision, Adjusted Geometric Mean, Index of Balanced Accuracy.
My suggestions:
- Use the PR-curve and the F1-score (see the sketch after this list).
- Try with geometric transformations, photometric transformations, random occlusions (to avoid overfitting), SMOTE, Tomek links (for undersampling majorities), etc.
- Random undersampling may delete relevant features of your dataset. Again, analyze your dataset using KNN and other similar techniques.
- Check this book: H. He and Y. Ma, Imbalanced Learning: Foundations, Algorithms, and Applications, Hoboken, New Jersey: Wiley-IEEE Press, 2013.
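As a concrete starting point for the first suggestion, here is a hedged scikit-learn sketch; y_true and y_scores stand in for binary labels and predicted probabilities of one class of the multi-label problem:

import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                            # stand-in labels
y_scores = np.clip(y_true * 0.6 + rng.random(1000) * 0.5, 0, 1)   # stand-in scores

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
# precision/recall have one more entry than thresholds; align them.
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best = int(np.argmax(f1))
print(f"best threshold={thresholds[best]:.3f}, F1={f1[best]:.3f}")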
QUESTION
I have created a working CNN model in Keras/Tensorflow, and have successfully used the CIFAR-10 and MNIST datasets to test this model. The functioning code is seen below:
...
ANSWER
Answered 2021-Dec-16 at 10:18
If the hyperspectral dataset is given to you as a large image with many channels, I suppose that the classification of each pixel should depend on the pixels around it (otherwise I would not format the data as an image, i.e. without grid structure). Given this assumption, breaking up the input picture into 1x1 parts is not a good idea, as you are losing the grid structure.
I further suppose that the order of the channels is arbitrary, which implies that convolution over the channels is probably not meaningful (which you did not plan to do anyway).
Instead of reformatting the data the way you did, you may want to create a model that takes an image as input and also outputs an "image" containing the classifications for each pixel. I.e. if you have 10 classes and take a (145, 145, 200) image as input, your model would output a (145, 145, 10) image. In that architecture you would not have any fully-connected layers; your output layer would also be a convolutional layer.
That however means that you will not be able to keep your current architecture. The tasks for MNIST/CIFAR10 and your hyperspectral dataset are not the same: for MNIST/CIFAR10 you want to classify an image in its entirety, while for the other dataset you want to assign a class to each pixel (while most likely also using the pixels around each pixel).
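A minimal sketch of that fully convolutional idea, assuming 10 classes and the (145, 145, 200) input; the hidden layer sizes are illustrative only:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(145, 145, 200)),
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    # The output layer is also convolutional: one channel per class per pixel.
    tf.keras.layers.Conv2D(10, 1, padding="same", activation="softmax"),
])
model.summary()  # output shape: (None, 145, 145, 10)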
Some further ideas:
- If you want to turn the pixel classification task on the hyperspectral dataset into a classification task for an entire image, maybe you can reformulate that task as "classifying a hyperspectral image as the class of its center (or top-left, or bottom-right, or (21st, 104th), or whatever) pixel". To obtain the data from your single hyperspectral image, for each pixel, I would shift the image such that the target pixel is at the desired location (e.g. the center). All pixels that "fall off" the border could be inserted at the other side of the image.
- If you want to stick with a pixel classification task but need more data, maybe split up the single hyperspectral image you have into many smaller images (e.g. 10x10x200). You may even want to use images of many different sizes. If your model only has convolution and pooling layers and you make sure to maintain the size of the image, that should work out.
QUESTION
>>> import numpy as np
>>> arr = np.arange(16).reshape((2, 2, 4))
>>> arr.strides
(32, 16, 4)
...
ANSWER
Answered 2021-Dec-15 at 20:52
In my experience, working through the C code is too much work; simply finding the relevant function(s) is the hardest part. strides work the same regardless of dimensions or 'transpose'.
Start with something simpler, like a (2,3) array, whose transposed strides will be (8,24). Imagine stepping through the flat [0,1,2...].
Here is a sample array with a 1-byte itemsize, so the sequential stride is just 1:
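For instance, a hedged sketch of that sample (1-byte dtype, so the strides count elements directly):

import numpy as np

a = np.arange(6, dtype=np.uint8).reshape(2, 3)
print(a.strides)    # (3, 1): 3 bytes to the next row, 1 byte to the next column
print(a.T.strides)  # (1, 3): transposing swaps the strides; no data is copied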
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.