Here's a kit of 8 open-source multimodal AI projects.
by shweta10 Updated: Mar 25, 2024
Apple Releases New Multimodal Models: MM1 Family
LAVIS by salesforce
LAVIS - A One-stop Library for Language-Vision Intelligence
Python · 5474 · Version: v1.0.2 · License: Permissive (BSD-3-Clause)
pytorch-widedeep by jrzaurin
A flexible package for multimodal deep learning that combines tabular data with text and images using Wide and Deep models in PyTorch
Python · 1049 · Version: v1.2.2 · License: Permissive (Apache-2.0)
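The Wide and Deep architecture that pytorch-widedeep builds on sums two branches: a linear ("wide") model over sparse cross-features and a small neural ("deep") network over dense features, with a sigmoid on top. Below is a minimal pure-Python sketch of that forward pass for intuition only; every function and parameter name here is hypothetical and is not the pytorch-widedeep API.

```python
import math

def wide_forward(sparse_features, wide_weights, bias):
    # Linear "wide" branch: weighted sum over the active sparse feature ids.
    # (Hypothetical names, not pytorch-widedeep's API.)
    return bias + sum(wide_weights.get(f, 0.0) for f in sparse_features)

def deep_forward(dense_features, w1, w2):
    # Tiny one-hidden-layer MLP "deep" branch with ReLU activation.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, dense_features)))
              for row in w1]
    return sum(w * h for w, h in zip(w2, hidden))

def wide_and_deep(sparse_features, dense_features, wide_weights, bias, w1, w2):
    # Wide & Deep: the two branch outputs are summed into one logit,
    # then squashed with a sigmoid for a probability-like score.
    logit = (wide_forward(sparse_features, wide_weights, bias)
             + deep_forward(dense_features, w1, w2))
    return 1.0 / (1.0 + math.exp(-logit))
```

In the real library the same idea applies, but each branch is a trainable PyTorch module (and text/image towers can be added alongside the tabular ones).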
jina by jina-ai
Build multimodal AI services via cloud-native technologies
Python · 18565 · Version: v3.17.0 · License: Permissive (Apache-2.0)
unilm by microsoft
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Python · 12771 · Version: s2s-ft.v0.3 · License: Permissive (MIT)
mmf by facebookresearch
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Python · 5241 · Version: v0.3.1 · License: Others (Non-SPDX)
LLaVA by haotian-liu
Large Language-and-Vision Assistant built toward GPT-4-level multimodal capabilities
Python · 3169 · Version: Current · License: Permissive (Apache-2.0)