Build a Realtime Voice-to-Image Generator using Generative AI

by kandikits Updated: Aug 17, 2023

Generative artificial intelligence (AI) refers to algorithms that create new content, including audio, code, images, text, and video.

In this kit, we build a real-time voice-to-image generator using generative AI. It works in two steps:

Voice-to-text conversion - Speech is captured in real time through the microphone and converted to text using state-of-the-art open-source AI models from the OpenAI Whisper library.

Text-to-image generation - The converted text is passed as input to a state-of-the-art image generation model such as DALL-E 2, which generates the image.
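
The two steps above can be sketched in Python roughly as follows. This is a minimal sketch and not the notebook's actual code. It assumes the openai-whisper package (plus ffmpeg) and the pre-1.0 openai Python library with an API key already configured. The function names transcribe_audio, generate_image_url, and voice_to_image, the "base" model size, and the 512x512 image size are illustrative choices, not taken from the kit.

```python
def transcribe_audio(audio_path, model_name="base"):
    """Step 1: convert recorded speech to text with Whisper (runs locally)."""
    import whisper  # lazy import; assumes `pip install openai-whisper` and ffmpeg

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"].strip()


def generate_image_url(prompt, size="512x512"):
    """Step 2: request one image from the OpenAI Images API and return its URL."""
    import openai  # lazy import; assumes openai<1.0 and a configured API key

    response = openai.Image.create(prompt=prompt, n=1, size=size)
    return response["data"][0]["url"]


def voice_to_image(audio_path, transcribe=transcribe_audio, generate=generate_image_url):
    """Chain both steps; the step functions are parameters so they can be swapped or stubbed."""
    prompt = transcribe(audio_path)
    if not prompt:
        raise ValueError("No speech was recognized in the audio")
    return prompt, generate(prompt)
```

Calling voice_to_image with the path of a recorded audio file would return the recognized text together with the URL of the generated image. Passing the two steps in as parameters keeps each one replaceable, for example to try a different Whisper model size.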

Deployment Information

This repository helps you build your own AI-based voice-to-image generator with the OpenAI API and Gradio.
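
As a rough illustration of how Gradio can put a microphone-driven UI in front of such a generator, here is a minimal sketch. It is written against the Gradio 3.x API (the version line listed for this kit) and is not the kit's own code. The function name build_demo, the labels, and the shape of the pipeline argument are assumptions made for the example.

```python
def build_demo(pipeline):
    """Wrap a voice-to-image function in a simple Gradio web UI.

    `pipeline` takes the path of a recorded audio file and returns a tuple
    (recognized_text, image), where image is something gr.Image can display.
    """
    import gradio as gr  # lazy import; written against the gradio 3.x API

    return gr.Interface(
        fn=pipeline,
        # gradio 3.x spelling; gradio 4 replaced `source=` with `sources=[...]`
        inputs=gr.Audio(source="microphone", type="filepath"),
        outputs=[gr.Textbox(label="Recognized text"), gr.Image(label="Generated image")],
        title="Voice-to-Image Generator",
    )
```

Calling .launch() on the returned interface starts a local web server where you can record speech and view the result.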

Getting the secret key to use the OpenAI API

Sign up on the OpenAI platform if you have not done so already.
Get your OpenAI API key.
Replace the key in the code in the "voice-to-image-generation-dalle.ipynb" notebook with your own key.
Keep the API key private and do not share it with anyone.
If you face any access errors with this kit, check the usage limits of your OpenAI account and the limits on your API keys, and create another API key if possible. If that does not work, create another account, generate a new API key, and use that instead.
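
One way to keep the key private, as advised above, is to read it from an environment variable rather than leaving it in the notebook. The following is a minimal sketch under that assumption. OPENAI_API_KEY is the variable name the openai library conventionally reads, and the helper names load_api_key and mask_key are hypothetical and not part of the kit.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, failing early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your OpenAI API key")
    return key


def mask_key(key, visible=4):
    """Return a log-safe version of the key that shows only its last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

In the notebook, the key could then be set with openai.api_key = load_api_key() (pre-1.0 openai library) instead of a hard-coded string, and mask_key can be used if the key ever needs to appear in output.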

For Windows OS,

Download the zip file, extract it, and then run it. Make sure you extract the zip file before running it.
After the kit installs successfully, press 'Y' to run the kit and execute the cells in the notebook.
To run the kit manually, press 'N' and follow the steps below. The same steps apply whenever you want to run the solution manually after installation:
Navigate to the 'voice-to-image-generation' folder located in C:\kandikits.
Open a command prompt inside the extracted directory 'voice-to-image-generation'.
Run the command "voice-to-image-generation-env\Scripts\activate.bat" to activate the virtual environment.
Run the command "cd voice-to-image-generation".
Run the command 'jupyter notebook' to start a Jupyter Notebook instance.
Locate and open the 'voice-to-image-generation-dalle.ipynb' notebook in the Jupyter Notebook browser window.
Execute the cells in the notebook.

For Linux distros and macOS,

Follow the instructions to download and install Python 3.9 and pip for your Linux distro or macOS.
Install ffmpeg and its libraries. Check the installation instructions for Linux and macOS.
Click here to download the repository.
Extract the zip file and navigate to the directory 'voice-to-image-generation'.
Open a terminal in the extracted directory 'voice-to-image-generation'.
Create and activate a virtual environment using this command: 'virtualenv venv && source ./venv/bin/activate'
Install the dependencies using the command 'pip3.9 install -r requirements.txt'
Once the dependencies are installed, run the command 'jupyter notebook' to start Jupyter Notebook (use --allow-root if you are running as root).
Locate and open the 'voice-to-image-generation-dalle.ipynb' notebook in the Jupyter Notebook browser window.
Execute the cells in the notebook.
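
Before launching the notebook, it can help to confirm that the prerequisites from the steps above are in place. The following is a minimal sketch using only the Python standard library. The function name check_prerequisites is hypothetical, and treating Python 3.9 as a minimum version is an assumption that simply mirrors the version named in these instructions.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 9), tools=("ffmpeg", "jupyter")):
    """Report whether the interpreter version and required command-line tools are available."""
    report = {"python": sys.version_info[:2] >= tuple(min_python)}
    for tool in tools:
        # shutil.which returns the executable's path, or None if it is not on PATH
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it inside the activated virtual environment so that the check sees the same PATH as the 'jupyter notebook' command.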

Click the button below to download the solution and follow the deployment information to begin set-up. This 1-click kit has all the required dependencies and resources to build your Voice to Image Generator App.

1-Click Kit installer

Libraries used in this solution

Development Environment

VSCode and Jupyter Notebook are used for development and debugging. Jupyter Notebook is a web-based interactive environment often used for experiments, whereas VSCode provides a typical IDE experience for developers.

Jupyter Notebook is used for our development.

jupyter by jupyter
Python 14404 Version: Current License: Permissive (BSD-3-Clause)
Jupyter metapackage for installation, docs and chat

vscode by microsoft
TypeScript 147328 Version: 1.79.2 License: Permissive (MIT)
Visual Studio Code

Machine Learning

The machine learning libraries and frameworks listed here help provide state-of-the-art machine learning solutions.

openai-python by openai
Python 9579 Version: v0.27.8 License: Permissive (MIT)
The OpenAI Python library provides convenient access to the OpenAI API from applications written in the Python language.

pytorch by pytorch
Python 67874 Version: v2.0.1 License: Others (Non-SPDX)
Tensors and Dynamic neural networks in Python with strong GPU acceleration

whisper by openai
Python 39256 Version: v20230314 License: Permissive (MIT)
Robust Speech Recognition via Large-Scale Weak Supervision

Kit Solution Source

voice-to-image-generation by kandi1clickkits
Jupyter Notebook 0 Version: v1.0.0 License: Permissive (Apache-2.0)
Generate Images from voice in realtime using OpenAI

UI App Integration

gradio by gradio-app
Python 18771 Version: v3.35.2 License: Permissive (Apache-2.0)
Create UIs for your machine learning model in Python in 3 minutes

Support

For any support, you can reach us at OpenWeaver Community Support
