keras-blog | Blog with Keras news, tutorials, and demos
kandi X-RAY | keras-blog Summary
Generated with Pelican, hosted on Github pages. To contribute an article: send a Pull Request to the content branch. We welcome anything that would be of interest to Keras users, from a simple tutorial about a specific use case, to an advanced application demo, and everything in between.
Community Discussions
Trending Discussions on keras-blog
QUESTION
I am roughly following this script: fashion-MNIST-sagemaker.
I see that in the notebook ...
ANSWER
Answered 2019-Dec-23 at 11:44
Distributed training is model- and framework-specific. Not all models are easy to distribute, and ML frameworks differ in how easy they make it. It is rarely automatic, even less so with TensorFlow and Keras.
Neural nets are conceptually easy to distribute under the data-parallel paradigm, whereby the gradient computation for a given mini-batch is split among workers, which can be multiple devices in the same host (multi-device) or multiple hosts, each with multiple devices (multi-device multi-host). The D2L.ai course provides an in-depth view of how neural nets are distributed here and here.
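As a rough, hypothetical illustration of that data-parallel idea (not from the original answer; the model, shapes, and the two "workers" below are placeholders), each worker computes gradients on its shard of the mini-batch and the averaged gradient drives a single shared update:

```python
import tensorflow as tf

# Toy model, loss and optimizer; shapes and data are made up for illustration.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(0.01)

def shard_gradients(x_shard, y_shard):
    """Compute gradients on one worker's shard of the mini-batch."""
    with tf.GradientTape() as tape:
        loss = loss_fn(y_shard, model(x_shard))
    return tape.gradient(loss, model.trainable_variables)

# Pretend these two shards live on two different devices or hosts.
x = tf.random.normal((8, 4))
y = tf.random.normal((8, 1))
grads_a = shard_gradients(x[:4], y[:4])
grads_b = shard_gradients(x[4:], y[4:])

# Average the per-worker gradients, then apply one shared parameter update.
avg_grads = [(ga + gb) / 2.0 for ga, gb in zip(grads_a, grads_b)]
optimizer.apply_gradients(zip(avg_grads, model.trainable_variables))
```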
Keras used to be trivial to distribute in a multi-device, single-host fashion with multi_gpu_model, which will sadly be deprecated in 4 months. In your case, you seem to be referring to a multi-host setup (more than one machine), and that requires writing ad-hoc synchronization code such as the one seen in this official tutorial.
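For reference, the single-host multi-GPU case is nowadays usually handled with tf.distribute.MirroredStrategy, the replacement for the deprecated multi_gpu_model. A minimal sketch, with placeholder data standing in for Fashion-MNIST:

```python
import tensorflow as tf

# Single-host, multi-device data parallelism with MirroredStrategy
# (the usual replacement for the deprecated multi_gpu_model).
strategy = tf.distribute.MirroredStrategy()  # uses all visible GPUs
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Model and optimizer must be created inside the strategy scope.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# Placeholder data; Keras splits each global batch across the replicas.
x = tf.random.normal((256, 28, 28))
y = tf.random.uniform((256,), maxval=10, dtype=tf.int32)
model.fit(x, y, batch_size=64, epochs=1)
```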
Now let's look at how this relates to SageMaker.
SageMaker comes with 3 options for algorithm development. Using distributed training may require a varying amount of custom work depending on the option you choose:
- The built-in algorithms are a library of 18 pre-written algorithms. Many of them are written to be distributed in single-host multi-GPU or multi-host multi-GPU fashion. With this first option there is nothing to do beyond setting train_instance_count > 1 to distribute over multiple instances.
- The Framework containers (the option you are using) are containers developed for popular frameworks (TensorFlow, PyTorch, Sklearn, MXNet) that provide a pre-written Docker environment in which you can write arbitrary code. With this option, some containers support one-click creation of ephemeral training clusters for distributed training; however, setting train_instance_count greater than one is not enough to distribute the training of your model. It will just run your script on multiple machines. In order to distribute your training, you must write appropriate distribution and synchronization code in your mnist_keras_tf.py script. For some frameworks such code modification is very simple: for TensorFlow and Keras, SageMaker comes with Horovod pre-installed. Horovod is a peer-to-peer, ring-style communication mechanism that requires very little code modification and is highly scalable (initial announcement from Uber, SageMaker doc, SageMaker example, SageMaker blog post). My recommendation would be to try using Horovod to distribute your code; a minimal sketch follows this list. Similarly, in Apache MXNet you can easily create parameter stores to host model parameters in a distributed fashion and sync with them from multiple nodes. MXNet's scalability and ease of distribution are among the reasons Amazon loves it.
- The Bring-Your-Own Container option requires you to write both the Docker container and the algorithm code. In this situation you can of course distribute your training over multiple machines, but you also have to write the machine-to-machine communication code yourself.
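As a hedged illustration of the Horovod route (a minimal sketch rather than the exact code from the SageMaker examples; the model and data below are placeholders), the typical Keras-side changes are: initialize Horovod, pin one GPU per process, scale the learning rate, wrap the optimizer, and broadcast the initial weights from rank 0:

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

# 1. Initialize Horovod and pin each process to one GPU.
hvd.init()
gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], "GPU")

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# 2. Scale the learning rate by the number of workers and wrap the optimizer.
opt = tf.keras.optimizers.SGD(0.01 * hvd.size())
opt = hvd.DistributedOptimizer(opt)
model.compile(optimizer=opt,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# 3. Broadcast initial weights from rank 0 so all workers start identical.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

# Placeholder data standing in for the real training set.
x = tf.random.normal((256, 28, 28))
y = tf.random.uniform((256,), maxval=10, dtype=tf.int32)
model.fit(x, y, batch_size=64, epochs=1, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```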
For your specific situation, my recommendation would be to scale horizontally first within a single node with multiple GPUs, over bigger and bigger machine types, because latency and complexity increase drastically when you switch from a single-host to a multi-host context. If truly necessary, use a multi-node setup; things may be easier if that's done with Horovod. In any case, this is still much easier to do with SageMaker, since it manages the creation of ephemeral, billed-per-second clusters with built-in logging, metadata and artifact persistence, and also handles fast training data loading from S3, sharded over training nodes.
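For context, launching such a job from the SageMaker Python SDK looks roughly like the sketch below. Parameter names vary between SDK versions (this follows the v1-era API in use around the time of this answer), and the role, instance type, and S3 path are placeholders:

```python
from sagemaker.tensorflow import TensorFlow

# Sketch of a SageMaker training job; role, instance type and S3 path are
# placeholders. The 'mpi' block asks SageMaker to launch the script under
# Horovod/MPI on every instance.
estimator = TensorFlow(
    entry_point="mnist_keras_tf.py",       # your training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    framework_version="1.15",
    py_version="py3",
    train_instance_count=2,                # >1 only helps if the script is distributed
    train_instance_type="ml.p3.8xlarge",   # single node with multiple GPUs
    distributions={"mpi": {"enabled": True, "processes_per_host": 4}},
)

estimator.fit("s3://my-bucket/fashion-mnist/")  # placeholder S3 location
```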
Note on the relevance of distributed training: keep in mind that when you distribute over N devices a model that was running fine on one device, you usually grow the batch size by N so that the per-device batch size stays constant and each device keeps busy. This disturbs your model's convergence, because bigger batches mean a less noisy SGD. A common heuristic is to grow the learning rate by N as well (more info in this great paper from Priya Goyal et al.), but this in turn induces instability during the first couple of epochs, so it is sometimes paired with a learning-rate warmup.

Scaling SGD to work well with very large batches is still an active research problem, with new ideas coming up frequently. Reaching good model performance with very large batches sometimes requires ad-hoc research and a fair amount of parameter tuning, occasionally to the point where the extra money spent on figuring out how to distribute well outweighs the benefits of the faster training you eventually manage to run.

A situation where distributed training does make sense is when an individual record represents too much compute to form a big enough physical batch on one device, as seen with big input sizes (e.g. vision over HD pictures) or big parameter counts (e.g. BERT). That said, for models requiring a very big logical batch you don't necessarily have to distribute things physically: you can run N batches sequentially through your single GPU and wait N per-device batches before doing the gradient averaging and parameter update, simulating a GPU that is N times bigger (a clever hack sometimes called gradient accumulation; see the sketch below).
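To make the gradient-accumulation trick concrete, here is a minimal sketch in plain TensorFlow (not from the original answer; the model, data, and accumulation factor N are placeholders):

```python
import tensorflow as tf

# Simulate a batch N times larger than fits on the device by accumulating
# gradients over N micro-batches before a single parameter update.
N = 4  # accumulation steps (placeholder)
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(0.01)

# Placeholder data: N micro-batches of 8 examples each.
x = tf.random.normal((32, 10))
y = tf.random.normal((32, 1))
model(x[:1])  # build the model so trainable_variables exist

accum = [tf.zeros_like(v) for v in model.trainable_variables]
for i in range(N):
    xb, yb = x[i * 8:(i + 1) * 8], y[i * 8:(i + 1) * 8]
    with tf.GradientTape() as tape:
        loss = loss_fn(yb, model(xb))
    grads = tape.gradient(loss, model.trainable_variables)
    accum = [a + g for a, g in zip(accum, grads)]

# One update with the averaged gradient, as if the batch size were 8 * N.
avg = [a / N for a in accum]
optimizer.apply_gradients(zip(avg, model.trainable_variables))
```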
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.