Build a Generative AI-based Automated Image Captioning and Visual QnA Engine


by kandikits | Updated: Jul 25, 2023


1-Click Kit




Vision and language, two of the most fundamental methods for humans to perceive the world, are also two key cornerstones of AI. A longstanding goal of AI has been to build intelligent agents that can understand the world through vision and language inputs, and communicate with humans through natural language.


To achieve this goal, vision-language pre-training has emerged as an effective approach: deep neural network models are pre-trained on large-scale image-text datasets to improve performance on downstream vision-language tasks such as image-text retrieval, image captioning, and visual question answering.


Image captioning and visual question answering (Visual QnA) rely on Large Multimodal Models (LMMs). Multimodal learning seeks to let computers represent real-world objects and concepts using multiple data streams, such as images and text. This kit uses one such model: Salesforce's BLIP (Bootstrapping Language-Image Pre-training).
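As a rough illustration of how BLIP can be driven for both tasks, here is a minimal sketch using the Hugging Face transformers library listed further down. It is not taken from the kit's notebook. The checkpoint names 'Salesforce/blip-image-captioning-base' and 'Salesforce/blip-vqa-base', the helper names, and the token limits are assumptions chosen for the example.

```python
def load_captioner(name="Salesforce/blip-image-captioning-base"):
    # Assumed checkpoint name. transformers is imported lazily so the helper
    # functions below can be read and exercised without the library installed.
    from transformers import BlipForConditionalGeneration, BlipProcessor
    return (BlipProcessor.from_pretrained(name),
            BlipForConditionalGeneration.from_pretrained(name))


def load_vqa(name="Salesforce/blip-vqa-base"):
    # Assumed checkpoint name for the question-answering variant of BLIP.
    from transformers import BlipForQuestionAnswering, BlipProcessor
    return (BlipProcessor.from_pretrained(name),
            BlipForQuestionAnswering.from_pretrained(name))


def caption_image(processor, model, image, max_new_tokens=30):
    """Generate a caption for one image (for example a PIL image)."""
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()


def answer_question(processor, model, image, question, max_new_tokens=10):
    """Answer a free-text question about one image."""
    inputs = processor(images=image, text=question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()


# Hypothetical usage (downloads the model weights on first run):
#   from PIL import Image
#   image = Image.open("photo.jpg").convert("RGB")
#   processor, model = load_captioner()
#   print(caption_image(processor, model, image))
#   processor, model = load_vqa()
#   print(answer_question(processor, model, image, "What is in the picture?"))
```

Passing the processor and model in as arguments keeps the two tasks symmetrical: the only difference is whether a question is supplied alongside the image.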

Deployment Information

For Windows OS,

  1. Download the zip file, extract it, and run the kit. Make sure you extract the zip file before running it.
  2. After the kit installs successfully, press 'Y' to run the kit and execute the cells in the notebook.
  3. To run the kit manually instead, press 'N' and locate the zip file 'image_captioning.zip'.
  4. Extract the zip file and navigate to the directory 'image_captioning'.
  5. Open a command prompt in the extracted directory 'image_captioning' and run the command 'jupyter notebook'.
  6. Locate and open the 'Captioning_visualQnA.ipynb' notebook from the Jupyter Notebook browser window.
  7. Execute the cells in the notebook.


For other Operating Systems,

  1. Download the repository.
  2. Extract the downloaded zip file and locate the zip file 'image_captioning.zip'.
  3. Extract 'image_captioning.zip' and navigate to the directory 'image_captioning'.
  4. Open a terminal in the extracted directory 'image_captioning' and run the command 'jupyter notebook'.
  5. Locate and open the 'Captioning_visualQnA.ipynb' notebook from the Jupyter Notebook browser window.
  6. Execute the cells in the notebook.
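The manual steps above can also be scripted. The following is a minimal sketch, not part of the kit: the helper names are hypothetical, and it assumes that 'image_captioning.zip' unpacks into a folder named 'image_captioning' and that the 'jupyter' command is on your PATH.

```python
import subprocess
import zipfile
from pathlib import Path

NOTEBOOK = "Captioning_visualQnA.ipynb"


def extract_kit(zip_path, dest_dir):
    """Extract the kit archive and return the directory holding the notebook."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Assumption: the archive contains a top-level 'image_captioning' folder.
    nested = dest / "image_captioning"
    return nested if nested.is_dir() else dest


def notebook_command(notebook=NOTEBOOK):
    """Build the command that opens the notebook in Jupyter."""
    return ["jupyter", "notebook", notebook]


def launch_notebook(kit_dir):
    # Same effect as running 'jupyter notebook' in the extracted directory.
    # Blocks until the Jupyter server is stopped.
    return subprocess.run(notebook_command(), cwd=kit_dir, check=True)


# Hypothetical usage:
#   kit_dir = extract_kit("image_captioning.zip", ".")
#   launch_notebook(kit_dir)
```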



Click the button below to download the solution and follow the deployment information to begin set-up. This 1-click kit has all the required dependencies and resources to build your Image Captioning and Visual QnA Engine.


Libraries used in this solution


Development Environment


VSCode and Jupyter Notebook are used for development and debugging. Jupyter Notebook is a web-based interactive environment often used for experiments, whereas VSCode provides a typical IDE experience for developers.


Jupyter Notebook is used for our development.

vscode by microsoft
TypeScript | Stars: 147328 | Version: 1.79.2 | License: Permissive (MIT)
Visual Studio Code

jupyter by jupyter
Python | Stars: 14404 | Version: Current | License: Permissive (BSD-3-Clause)
Jupyter metapackage for installation, docs and chat

Machine Learning


The machine learning libraries and frameworks listed here help build state-of-the-art machine learning solutions.

transformers by huggingface
Python | Stars: 104111 | Version: v4.30.2 | License: Permissive (Apache-2.0)
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

pytorch by pytorch
Python | Stars: 67874 | Version: v2.0.1 | License: Others (Non-SPDX)
Tensors and Dynamic neural networks in Python with strong GPU acceleration

Kit Solution Source


App User Interface

gradio by gradio-app
Python | Stars: 18771 | Version: v3.35.2 | License: Permissive (Apache-2.0)
Create UIs for your machine learning model in Python in 3 minutes
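To show how a Gradio front end might sit on top of a captioning and Visual QnA model, here is a minimal sketch. It is an illustration and not the kit's actual interface: 'respond', 'build_demo', 'caption_fn', and 'answer_fn' are hypothetical names, and the two callables stand in for whatever model code produces a caption or an answer.

```python
def respond(image, question, caption_fn, answer_fn):
    """Return a caption when no question is given, otherwise an answer."""
    if image is None:
        return "Please upload an image."
    question = (question or "").strip()
    if not question:
        return caption_fn(image)
    return answer_fn(image, question)


def build_demo(caption_fn, answer_fn):
    # gradio is imported lazily so respond() can be used without it installed.
    import gradio as gr

    def handler(image, question):
        return respond(image, question, caption_fn, answer_fn)

    return gr.Interface(
        fn=handler,
        inputs=[
            gr.Image(type="pil", label="Image"),
            gr.Textbox(label="Question (leave empty for a caption)"),
        ],
        outputs=gr.Textbox(label="Caption or answer"),
        title="Image Captioning and Visual QnA",
    )


# Hypothetical usage, with caption_fn(image) and answer_fn(image, question)
# supplied by your own model code:
#   build_demo(caption_fn, answer_fn).launch()
```

Routing both tasks through a single handler keeps the interface to one image input and one optional text box; two separate tabs would be an equally reasonable design.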

Support


For any support, you can reach us at OpenWeaver Community Support.
