Build a Generative AI based Automated Image Captioning and Visual QnA Engine

by kandikits Updated: Jul 25, 2023

Vision and language, two of the most fundamental methods for humans to perceive the world, are also two key cornerstones of AI. A longstanding goal of AI has been to build intelligent agents that can understand the world through vision and language inputs, and communicate with humans through natural language.

To achieve this goal, vision-language pre-training has emerged as an effective approach: deep neural network models are pre-trained on large-scale image-text datasets to improve performance on downstream vision-language tasks such as image-text retrieval, image captioning, and visual question answering.

Image captioning and visual question answering rely on Large Multimodal Models (LMMs). Multimodal learning allows computers to represent real-world objects and concepts using multiple data streams, such as images and text. This kit uses one such model: Salesforce's BLIP (Bootstrapping Language-Image Pre-training).
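As a rough illustration of how BLIP can be driven from Python, the sketch below uses the Hugging Face Transformers classes `BlipProcessor`, `BlipForConditionalGeneration`, and `BlipForQuestionAnswering`. The checkpoint names and the helper functions `load_blip`, `caption_image`, and `answer_question` are assumptions made for this example; the kit's notebook may load or call the model differently.

```python
# Assumed public BLIP checkpoints on the Hugging Face Hub (not confirmed by the kit).
CAPTION_CHECKPOINT = "Salesforce/blip-image-captioning-base"
VQA_CHECKPOINT = "Salesforce/blip-vqa-base"


def load_blip(task):
    """Load a (processor, model) pair for task 'caption' or 'vqa'.

    Weights are downloaded on first use. transformers is imported lazily so
    the helpers below stay importable without it.
    """
    from transformers import (
        BlipForConditionalGeneration,
        BlipForQuestionAnswering,
        BlipProcessor,
    )

    if task == "caption":
        processor = BlipProcessor.from_pretrained(CAPTION_CHECKPOINT)
        model = BlipForConditionalGeneration.from_pretrained(CAPTION_CHECKPOINT)
    elif task == "vqa":
        processor = BlipProcessor.from_pretrained(VQA_CHECKPOINT)
        model = BlipForQuestionAnswering.from_pretrained(VQA_CHECKPOINT)
    else:
        raise ValueError(f"unknown task: {task!r}")
    return processor, model


def caption_image(image, processor, model, max_new_tokens=30):
    """Generate a caption for one PIL image."""
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()


def answer_question(image, question, processor, model, max_new_tokens=10):
    """Answer a free-text question about one PIL image."""
    inputs = processor(images=image, text=question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()


# Example usage ("photo.jpg" is a placeholder path):
#   from PIL import Image
#   image = Image.open("photo.jpg").convert("RGB")
#   processor, model = load_blip("caption")
#   print(caption_image(image, processor, model))
#   processor, model = load_blip("vqa")
#   print(answer_question(image, "What is in the picture?", processor, model))
```

Passing the processor and model in as arguments keeps the expensive loading step separate from inference, so each model is loaded once and reused across images.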

Deployment Information

For Windows OS:

Download the 1-Click Kit installer, extract the zip file, and run the extracted installer. Be sure to extract the zip file before running it.
After the kit installs successfully, press 'Y' to run the kit, then execute the cells in the notebook.
To run the kit manually instead, press 'N' and locate the zip file 'image_captioning.zip'.
Extract the zip file and navigate to the directory 'image_captioning'.
Open a command prompt in the extracted directory 'image_captioning' and run the command 'jupyter notebook'.
Locate and open the 'Captioning_visualQnA.ipynb' notebook in the Jupyter Notebook browser window.
Execute the cells in the notebook.

For other operating systems:

Click here to download the repository.
Extract the downloaded zip file and locate the zip file 'image_captioning.zip'.
Extract that zip file and navigate to the directory 'image_captioning'.
Open a terminal in the extracted directory 'image_captioning' and run the command 'jupyter notebook'.
Locate and open the 'Captioning_visualQnA.ipynb' notebook in the Jupyter Notebook browser window.
Execute the cells in the notebook.
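The manual steps above (extract 'image_captioning.zip', find the notebook, start Jupyter) can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names `extract_kit`, `find_notebook`, and `launch_notebook` are made up for this example, and it assumes the `jupyter` command is already on your PATH.

```python
import subprocess
import zipfile
from pathlib import Path


def extract_kit(zip_path, dest_dir):
    """Extract the kit archive into dest_dir and return that directory."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest


def find_notebook(root, name="Captioning_visualQnA.ipynb"):
    """Return the first notebook called `name` under root, or None."""
    matches = sorted(Path(root).rglob(name))
    return matches[0] if matches else None


def launch_notebook(notebook):
    """Start Jupyter from the notebook's directory (blocks until stopped)."""
    notebook = Path(notebook)
    subprocess.run(
        ["jupyter", "notebook", notebook.name],
        cwd=notebook.parent,
        check=True,
    )


# Example usage (paths are placeholders for wherever you saved the download):
#   kit_dir = extract_kit("image_captioning.zip", "kit")
#   notebook = find_notebook(kit_dir)
#   if notebook is not None:
#       launch_notebook(notebook)
```

Searching for the notebook by name rather than hard-coding its path means the script still works if the archive nests the 'image_captioning' directory one level deeper than expected.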

Click the button below to download the solution and follow the deployment information to begin set-up. This 1-click kit has all the required dependencies and resources to build your Image Captioning and Visual QnA Engine.

1-Click Kit installer

Libraries used in this solution

Development Environment

VSCode and Jupyter Notebook are used for development and debugging. Jupyter Notebook is a web-based interactive environment often used for experiments, whereas VSCode offers developers a typical IDE experience.

This kit uses Jupyter Notebook for development.

vscode by microsoft

TypeScript 147328 Version: 1.79.2 License: Permissive (MIT)

Visual Studio Code

jupyter by jupyter

Python 14404 Version: Current License: Permissive (BSD-3-Clause)

Jupyter metapackage for installation, docs and chat

Machine Learning

The machine learning libraries and frameworks below provide the state-of-the-art models and building blocks used in this solution.

transformers by huggingface

Python 104111 Version: v4.30.2 License: Permissive (Apache-2.0)

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

pytorch by pytorch

Python 67874 Version: v2.0.1 License: Others (Non-SPDX)

Tensors and Dynamic neural networks in Python with strong GPU acceleration
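If you set up the environment yourself rather than through the installer, it can help to confirm that the installed machine learning libraries match the versions listed here. The sketch below is one possible check using the standard library's `importlib.metadata`; the `EXPECTED` table and the helper names are assumptions for this example, and note that PyTorch is distributed on pip under the name `torch`.

```python
from importlib import metadata

# Versions listed for this kit; PyTorch's pip distribution name is "torch".
EXPECTED = {"transformers": "4.30.2", "torch": "2.0.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected, lookup=installed_version):
    """Map each package to 'ok', 'missing', or 'differs (found X)'."""
    report = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found is None:
            report[package] = "missing"
        elif found.split("+")[0] == wanted:
            # Ignore local build suffixes such as "+cu118" on PyTorch wheels.
            report[package] = "ok"
        else:
            report[package] = f"differs (found {found})"
    return report


if __name__ == "__main__":
    for package, status in compare_versions(EXPECTED).items():
        print(f"{package}: {status}")
```

A differing version is not necessarily a problem, but it is a useful first thing to check if a notebook cell fails on import.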

Kit Solution Source

App User Interface

gradio by gradio-app

Python 18771 Version: v3.35.2 License: Permissive (Apache-2.0)

Create UIs for your machine learning model in Python in 3 minutes
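To give a sense of how a Gradio front end for this kind of engine might be wired up, here is a minimal sketch built on `gr.Interface`. The layout, labels, and the `make_handler` / `build_demo` helpers are assumptions for illustration, not the kit's actual UI; `caption_fn` and `answer_fn` stand in for whatever functions call the captioning and QnA models.

```python
def make_handler(caption_fn, answer_fn):
    """Build the UI callback: a blank question yields a caption, otherwise an answer.

    caption_fn(image) and answer_fn(image, question) are hypothetical callables
    that wrap the captioning and visual QnA models.
    """
    def handle(image, question):
        if image is None:
            return "Please upload an image."
        question = (question or "").strip()
        if not question:
            return caption_fn(image)
        return answer_fn(image, question)

    return handle


def build_demo(handler):
    """Create a Gradio interface with an image input and an optional question."""
    import gradio as gr  # imported lazily so make_handler works without Gradio

    return gr.Interface(
        fn=handler,
        inputs=[
            gr.Image(type="pil", label="Image"),
            gr.Textbox(label="Question (leave blank for a caption)"),
        ],
        outputs=gr.Textbox(label="Caption or answer"),
        title="Image Captioning and Visual QnA",
    )


# Example usage, with my_caption and my_answer as placeholders for model calls:
#   demo = build_demo(make_handler(my_caption, my_answer))
#   demo.launch()
```

Keeping the routing logic in a plain function separate from the Gradio objects makes it easy to test without starting a web server.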

Support

For support, you can reach us at OpenWeaver Community Support.

Open Weaver – Develop Applications Faster with Open Source
