Build a Realtime Voice-to-Image Generator using Generative AI

by kandikits | Updated: Aug 17, 2023

Generative artificial intelligence (AI) refers to algorithms that create new content, including audio, code, images, text, and videos.

In this kit, we build a real-time Voice-to-Image Generator using Generative AI. It works in two steps:

  • Voice-to-Text Conversion - Speech is captured in real time through the microphone and converted to text using Whisper, OpenAI's open-source speech recognition model.

  • Text-to-Image Generation - The converted text is passed as a prompt to an image generation model such as DALL-E 2, which generates the image.
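
The two steps above can be sketched in Python as follows. This is a minimal illustration, not the notebook's actual code: the function names (transcribe_with_whisper, generate_with_dalle, voice_to_image), the "base" model size, and the 512x512 image size are assumptions, and the image call assumes the pre-1.0 openai package (v0.27.x) with an API key already configured.

```python
from typing import Callable, Tuple


def transcribe_with_whisper(audio_path: str, model_size: str = "base") -> str:
    """Step 1: convert a recorded audio file to text with Whisper."""
    import whisper  # imported lazily; needs the openai-whisper package and ffmpeg

    model = whisper.load_model(model_size)  # "base" is an assumed model size
    result = model.transcribe(audio_path)
    return result["text"].strip()


def generate_with_dalle(prompt: str, size: str = "512x512") -> str:
    """Step 2: request one image from the OpenAI Images API and return its URL."""
    import openai  # imported lazily; assumes openai<1.0 and a configured API key

    response = openai.Image.create(prompt=prompt, n=1, size=size)
    return response["data"][0]["url"]


def voice_to_image(
    audio_path: str,
    transcribe: Callable[[str], str] = transcribe_with_whisper,
    generate: Callable[[str], str] = generate_with_dalle,
) -> Tuple[str, str]:
    """Run both steps and return (transcribed text, image URL)."""
    text = transcribe(audio_path)
    if not text:
        raise ValueError("No speech was recognized in the audio.")
    return text, generate(text)
```

Passing the two steps in as callables keeps the flow easy to try out with stand-in functions before any model download or API call is involved.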

Deployment Information

This repository helps you build your own AI-based voice-to-image generator with the OpenAI API and Gradio.


Getting the secret key to use the OpenAI API

  1. Sign up on the OpenAI platform if you have not done so already.
  2. Get your OpenAI API key.
  3. Insert that key into the code in the "voice-to-image-generation-dalle.ipynb" notebook.
  4. Keep the API key private and do not share it with anyone.
  5. If you face any access errors with this kit, check your account's usage limits and API key limits, and create another API key if possible. If that does not help, try creating another account, generate a new API key, and use it.
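
One common way to keep the key out of the notebook itself is to read it from an environment variable. The sketch below assumes the key is stored in OPENAI_API_KEY (the variable the openai package reads by default); the helper names load_api_key and mask_key are hypothetical and not part of the kit.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key from the environment, or raise if it is unset."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set the OPENAI_API_KEY environment variable first.")
    return key


def mask_key(key: str) -> str:
    """Hide all but the last four characters so the key is safe to print or log."""
    if len(key) <= 4:
        return "*" * len(key)
    return "*" * (len(key) - 4) + key[-4:]
```

With openai v0.27.x, the notebook could then set openai.api_key = load_api_key() instead of a hard-coded string, and print mask_key(...) when confirming which key is in use.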


For Windows OS,

  1. Download the zip file, extract it, and run the kit. Make sure to extract the zip file before running it.
  2. After the kit installs successfully, press 'Y' to run the kit and execute the cells in the notebook.
  3. To run the kit manually, press 'N' and follow the steps below. The same steps apply whenever you want to run the solution manually after installation.
  4. Navigate to the 'voice-to-image-generation' folder located in C:\kandikits.
  5. Open a command prompt inside the extracted directory 'voice-to-image-generation'.
  6. Run the command "voice-to-image-generation-env\Scripts\activate.bat" to activate the virtual environment.
  7. Run the command "cd voice-to-image-generation".
  8. Run the command 'jupyter notebook' to start a Jupyter Notebook instance.
  9. Locate and open the 'voice-to-image-generation-dalle.ipynb' notebook from the Jupyter Notebook browser window.
  10. Execute the cells in the notebook.


For Linux distros and macOS,

  1. Follow the instructions to download and install Python 3.9 and pip for your Linux distro or macOS.
  2. Install ffmpeg and its libraries. Check the installation instructions for Linux and macOS.
  3. Download the repository.
  4. Extract the zip file and navigate to the directory 'voice-to-image-generation'.
  5. Open a terminal in the extracted directory 'voice-to-image-generation'.
  6. Create and activate a virtual environment using this command: 'virtualenv venv && source ./venv/bin/activate'
  7. Install the dependencies using the command 'pip3.9 install -r requirements.txt'
  8. Once the dependencies are installed, run the command 'jupyter notebook' to start Jupyter Notebook (use --allow-root if you are running as root).
  9. Locate and open the 'voice-to-image-generation-dalle.ipynb' notebook from the Jupyter Notebook browser window.
  10. Execute the cells in the notebook.
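
Before launching the notebook, it can help to confirm that the prerequisites from the steps above are in place. The following is a minimal sketch, not part of the kit: the function name missing_prerequisites is hypothetical, and it assumes that any Python version from 3.9 upward is acceptable (the steps name 3.9 specifically) and that ffmpeg and jupyter should be discoverable on PATH.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple


def missing_prerequisites(
    version: Tuple[int, int] = sys.version_info[:2],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems = []
    if version < (3, 9):
        problems.append(f"Python 3.9+ required, found {version[0]}.{version[1]}")
    for tool in ("ffmpeg", "jupyter"):
        # shutil.which returns None when the executable is not on PATH
        if which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Run it inside the activated virtual environment so that the jupyter executable installed there is the one being checked.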


Download the solution and follow the deployment information above to begin setup. This 1-click kit has all the required dependencies and resources to build your Voice-to-Image Generator app.

Libraries used in this solution


Development Environment

VSCode and Jupyter Notebook are used for development and debugging. Jupyter Notebook is a web-based interactive environment often used for experiments, whereas VSCode provides a typical IDE experience for developers. This kit uses Jupyter Notebook for development.

jupyter by jupyter
Python | Stars: 14404 | Version: Current | License: Permissive (BSD-3-Clause)
Jupyter metapackage for installation, docs and chat

vscode by microsoft
TypeScript | Stars: 147328 | Version: 1.79.2 | License: Permissive (MIT)
Visual Studio Code

Machine Learning

The machine learning libraries and frameworks listed here provide the speech recognition and image generation capabilities used in this solution.

openai-python by openai
Python | Stars: 9579 | Version: v0.27.8 | License: Permissive (MIT)
The OpenAI Python library provides convenient access to the OpenAI API from applications written in the Python language.

pytorch by pytorch
Python | Stars: 67874 | Version: v2.0.1 | License: Others (Non-SPDX)
Tensors and Dynamic neural networks in Python with strong GPU acceleration

whisper by openai
Python | Stars: 39256 | Version: v20230314 | License: Permissive (MIT)
Robust Speech Recognition via Large-Scale Weak Supervision

Kit Solution Source

voice-to-image-generation by kandi1clickkits
Jupyter Notebook | Stars: 0 | Version: v1.0.0 | License: Permissive (Apache-2.0)
Generate Images from voice in realtime using OpenAI

UI App Integration

gradio by gradio-app
Python | Stars: 18771 | Version: v3.35.2 | License: Permissive (Apache-2.0)
Create UIs for your machine learning model in Python in 3 minutes
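
As an illustration of how Gradio could connect a microphone input to a text and image output, here is a minimal sketch. It assumes Gradio 3.x (the version listed is v3.35.2), where gr.Audio accepts source="microphone"; newer releases changed this parameter. The names make_handler and build_app, and the pipeline callable that maps an audio file path to a (text, image URL) pair, are hypothetical. Passing the image URL directly to gr.Image is also an assumption about how the output is displayed.

```python
from typing import Callable, Optional, Tuple


def make_handler(pipeline: Callable[[str], Tuple[str, str]]):
    """Wrap an (audio path -> (text, image URL)) pipeline as a Gradio callback."""

    def handler(audio_path: Optional[str]) -> Tuple[str, Optional[str]]:
        if not audio_path:  # nothing was recorded
            return "No audio recorded.", None
        text, image_url = pipeline(audio_path)
        return text, image_url

    return handler


def build_app(pipeline: Callable[[str], Tuple[str, str]]):
    """Build the interface; call .launch() on the returned object to start it."""
    import gradio as gr  # imported lazily; assumes gradio 3.x

    return gr.Interface(
        fn=make_handler(pipeline),
        inputs=gr.Audio(source="microphone", type="filepath"),
        outputs=[
            gr.Textbox(label="Transcribed text"),
            gr.Image(label="Generated image"),
        ],
        title="Voice-to-Image Generator",
    )
```

Keeping the callback separate from the interface makes it possible to check the handler's behavior without starting a browser session.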

Support

For any support, you can reach us at OpenWeaver Community Support.