Janus Pro
Testing Janus-Pro's multimodality over Flickr 30K
This notebook showcases the capability of Deepseek's Multimodal Janus with Flickr30K. We take first 1000 samples of Flickr dataset and generate descriptions for them and then query it using both existing original descriptions and generated descriptions by Janus Pro.
Same sentence-transformers embedding function is used to embed both original and generated descriptions. The analysis focuses on How well does Janus performs in understanding and extracting meaning out of them.
The goal is to highlight the level of image undestanding Deepseek's Janus Pro focuses on.
The Flickr30k dataset is a popular benchmark for sentence-based picture portrayal. The dataset is comprised of 31,783 images that capture people engaged in everyday activities and events. Each image has a descriptive caption. Flickr30k is used for understanding the visual media (image) that correspond to a linguistic expression (description of the image). This dataset is commonly used as a standard benchmark for sentence-based image descriptions.
What Makes Janus-Pro Unique?
Breaking the One-Encoder Bottleneck
Unlike prior multimodal models that rely on a single visual encoder to handle both image understanding and image generation, Janus-Pro decouples these tasks into two specialized pathways:
- Visual Understanding Encoder → Extracts meaning from images
- Visual Generation Encoder → Synthesizes images from text descriptions
This architecture allows task-specific optimizations, preventing conflicts between interpretation and creativity.
Installations
We'll install datasets library to import Flickr30k dataset.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 480.6/480.6 kB 6.0 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 32.3/32.3 MB 21.4 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.4/38.4 MB 9.9 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 116.3/116.3 kB 4.8 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 179.3/179.3 kB 8.3 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 143.5/143.5 kB 5.6 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.8/194.8 kB 7.7 MB/s eta 0:00:00 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. gcsfs 2024.10.0 requires fsspec==2024.10.0, but you have fsspec 2024.9.0 which is incompatible. torch 2.5.1+cu124 requires nvidia-cublas-cu12==12.4.5.8; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cublas-cu12 12.5.3.2 which is incompatible. torch 2.5.1+cu124 requires nvidia-cuda-cupti-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cuda-cupti-cu12 12.5.82 which is incompatible. torch 2.5.1+cu124 requires nvidia-cuda-nvrtc-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cuda-nvrtc-cu12 12.5.82 which is incompatible. torch 2.5.1+cu124 requires nvidia-cuda-runtime-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cuda-runtime-cu12 12.5.82 which is incompatible. torch 2.5.1+cu124 requires nvidia-cudnn-cu12==9.1.0.70; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cudnn-cu12 9.3.0.75 which is incompatible. torch 2.5.1+cu124 requires nvidia-cufft-cu12==11.2.1.3; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cufft-cu12 11.2.3.61 which is incompatible. torch 2.5.1+cu124 requires nvidia-curand-cu12==10.3.5.147; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-curand-cu12 10.3.6.82 which is incompatible. torch 2.5.1+cu124 requires nvidia-cusolver-cu12==11.6.1.9; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cusolver-cu12 11.6.3.83 which is incompatible. torch 2.5.1+cu124 requires nvidia-cusparse-cu12==12.3.1.170; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cusparse-cu12 12.5.1.3 which is incompatible. torch 2.5.1+cu124 requires nvidia-nvjitlink-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-nvjitlink-cu12 12.5.82 which is incompatible.
Requirement already satisfied: pillow in /usr/local/lib/python3.11/dist-packages (11.1.0)
Dataset
For this analysis, we'll use Flickr30k Dataset
Description: The Flickr30k dataset is a popular benchmark for sentence-based picture portrayal. The dataset is comprised of 31,783 images that capture people engaged in everyday activities and events. Each image has a descriptive caption.
Sample count: more than 30K samples
/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: The secret `HF_TOKEN` does not exist in your Colab secrets. To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session. You will be able to reuse this secret in all of your notebooks. Please note that authentication is recommended but still optional to access public models or datasets. warnings.warn(
README.md: 0%| | 0.00/641 [00:00<?, ?B/s]
flickr30k.py: 0%| | 0.00/2.51k [00:00<?, ?B/s]
0000.parquet: 0%| | 0.00/506M [00:00<?, ?B/s]
0001.parquet: 0%| | 0.00/502M [00:00<?, ?B/s]
0002.parquet: 0%| | 0.00/506M [00:00<?, ?B/s]
0003.parquet: 0%| | 0.00/512M [00:00<?, ?B/s]
0004.parquet: 0%| | 0.00/504M [00:00<?, ?B/s]
0005.parquet: 0%| | 0.00/495M [00:00<?, ?B/s]
0006.parquet: 0%| | 0.00/495M [00:00<?, ?B/s]
0007.parquet: 0%| | 0.00/497M [00:00<?, ?B/s]
0008.parquet: 0%| | 0.00/289M [00:00<?, ?B/s]
Generating test split: 0%| | 0/31014 [00:00<?, ? examples/s]
For this analysis we'll use first 1000 samples of this dataset, you can run it on whole dataset, it would take little bit more time but could be really good comprehensive analysis.
Save Images
Save Images from seperated sample dataset into local directory to generate description of them using Deepseek's Janus Pro.
Images have been saved successfully.
Configure Deepseek's Multimodal Janus Pro
Clone Janus repo and build it using commands in following cell
Cloning into 'Janus'... remote: Enumerating objects: 121, done. remote: Counting objects: 100% (74/74), done. remote: Compressing objects: 100% (38/38), done. remote: Total 121 (delta 51), reused 36 (delta 36), pack-reused 47 (from 2) Receiving objects: 100% (121/121), 7.19 MiB | 13.73 MiB/s, done. Resolving deltas: 100% (57/57), done.
/content/Janus Obtaining file:///content/Janus Installing build dependencies ... done Checking if build backend supports build_editable ... done Getting requirements to build editable ... done Preparing editable metadata (pyproject.toml) ... done Requirement already satisfied: torch>=2.0.1 in /usr/local/lib/python3.11/dist-packages (from janus==1.0.0) (2.5.1+cu124) Requirement already satisfied: transformers>=4.38.2 in /usr/local/lib/python3.11/dist-packages (from janus==1.0.0) (4.48.2) Requirement already satisfied: timm>=0.9.16 in /usr/local/lib/python3.11/dist-packages (from janus==1.0.0) (1.0.14) Requirement already satisfied: accelerate in /usr/local/lib/python3.11/dist-packages (from janus==1.0.0) (1.3.0) Requirement already satisfied: sentencepiece in /usr/local/lib/python3.11/dist-packages (from janus==1.0.0) (0.2.0) Collecting attrdict (from janus==1.0.0) Downloading attrdict-2.0.1-py2.py3-none-any.whl.metadata (6.7 kB) Requirement already satisfied: einops in /usr/local/lib/python3.11/dist-packages (from janus==1.0.0) (0.8.0) Requirement already satisfied: torchvision in /usr/local/lib/python3.11/dist-packages (from timm>=0.9.16->janus==1.0.0) (0.20.1+cu124) Requirement already satisfied: pyyaml in /usr/local/lib/python3.11/dist-packages (from timm>=0.9.16->janus==1.0.0) (6.0.2) Requirement already satisfied: huggingface_hub in /usr/local/lib/python3.11/dist-packages (from timm>=0.9.16->janus==1.0.0) (0.28.1) Requirement already satisfied: safetensors in /usr/local/lib/python3.11/dist-packages (from timm>=0.9.16->janus==1.0.0) (0.5.2) Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.1->janus==1.0.0) (3.17.0) Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.1->janus==1.0.0) (4.12.2) Requirement already satisfied: networkx in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.1->janus==1.0.0) (3.4.2) Requirement already satisfied: jinja2 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.1->janus==1.0.0) (3.1.5) Requirement already satisfied: fsspec in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.1->janus==1.0.0) (2024.9.0) Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=2.0.1->janus==1.0.0) Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB) Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=2.0.1->janus==1.0.0) Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB) Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=2.0.1->janus==1.0.0) Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB) Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch>=2.0.1->janus==1.0.0) Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB) Collecting nvidia-cublas-cu12==12.4.5.8 (from torch>=2.0.1->janus==1.0.0) Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB) Collecting nvidia-cufft-cu12==11.2.1.3 (from torch>=2.0.1->janus==1.0.0) Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB) Collecting nvidia-curand-cu12==10.3.5.147 (from torch>=2.0.1->janus==1.0.0) Downloading nvidia_curand_cu12-10.3.5.147-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB) Collecting nvidia-cusolver-cu12==11.6.1.9 (from torch>=2.0.1->janus==1.0.0) Downloading nvidia_cusolver_cu12-11.6.1.9-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB) Collecting nvidia-cusparse-cu12==12.3.1.170 (from torch>=2.0.1->janus==1.0.0) Downloading nvidia_cusparse_cu12-12.3.1.170-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB) Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.1->janus==1.0.0) (2.21.5) Requirement already satisfied: nvidia-nvtx-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.1->janus==1.0.0) (12.4.127) Collecting nvidia-nvjitlink-cu12==12.4.127 (from torch>=2.0.1->janus==1.0.0) Downloading nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB) Requirement already satisfied: triton==3.1.0 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.1->janus==1.0.0) (3.1.0) Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.1->janus==1.0.0) (1.13.1) Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from sympy==1.13.1->torch>=2.0.1->janus==1.0.0) (1.3.0) Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.11/dist-packages (from transformers>=4.38.2->janus==1.0.0) (1.26.4) Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.11/dist-packages (from transformers>=4.38.2->janus==1.0.0) (24.2) Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.11/dist-packages (from transformers>=4.38.2->janus==1.0.0) (2024.11.6) Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (from transformers>=4.38.2->janus==1.0.0) (2.32.3) Requirement already satisfied: tokenizers<0.22,>=0.21 in /usr/local/lib/python3.11/dist-packages (from transformers>=4.38.2->janus==1.0.0) (0.21.0) Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.11/dist-packages (from transformers>=4.38.2->janus==1.0.0) (4.67.1) Requirement already satisfied: psutil in /usr/local/lib/python3.11/dist-packages (from accelerate->janus==1.0.0) (5.9.5) Requirement already satisfied: six in /usr/local/lib/python3.11/dist-packages (from attrdict->janus==1.0.0) (1.17.0) Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.11/dist-packages (from jinja2->torch>=2.0.1->janus==1.0.0) (3.0.2) Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests->transformers>=4.38.2->janus==1.0.0) (3.4.1) Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests->transformers>=4.38.2->janus==1.0.0) (3.10) Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests->transformers>=4.38.2->janus==1.0.0) (2.3.0) Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests->transformers>=4.38.2->janus==1.0.0) (2025.1.31) Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /usr/local/lib/python3.11/dist-packages (from torchvision->timm>=0.9.16->janus==1.0.0) (11.1.0) Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl (363.4 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 363.4/363.4 MB 2.3 MB/s eta 0:00:00 Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (13.8 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.8/13.8 MB 66.6 MB/s eta 0:00:00 Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (24.6 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.6/24.6 MB 34.6 MB/s eta 0:00:00 Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (883 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 883.7/883.7 kB 41.5 MB/s eta 0:00:00 Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.8/664.8 MB 2.8 MB/s eta 0:00:00 Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl (211.5 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 211.5/211.5 MB 3.7 MB/s eta 0:00:00 Downloading nvidia_curand_cu12-10.3.5.147-py3-none-manylinux2014_x86_64.whl (56.3 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.3/56.3 MB 13.1 MB/s eta 0:00:00 Downloading nvidia_cusolver_cu12-11.6.1.9-py3-none-manylinux2014_x86_64.whl (127.9 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 127.9/127.9 MB 7.7 MB/s eta 0:00:00 Downloading nvidia_cusparse_cu12-12.3.1.170-py3-none-manylinux2014_x86_64.whl (207.5 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 207.5/207.5 MB 6.3 MB/s eta 0:00:00 Downloading nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (21.1 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.1/21.1 MB 88.0 MB/s eta 0:00:00 Downloading attrdict-2.0.1-py2.py3-none-any.whl (9.9 kB) Building wheels for collected packages: janus Building editable for janus (pyproject.toml) ... done Created wheel for janus: filename=janus-1.0.0-0.editable-py3-none-any.whl size=15915 sha256=847476651fbd7575fb83c41e2d3018b3c2edac10574cc61fa75f5bd265c834dc Stored in directory: /tmp/pip-ephem-wheel-cache-w6je9205/wheels/04/ee/d6/76a460ef4080a263aa86cc3fdbb1c5bb29f559fbd8155d1c83 Successfully built janus Installing collected packages: nvidia-nvjitlink-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, attrdict, nvidia-cusparse-cu12, nvidia-cudnn-cu12, nvidia-cusolver-cu12, janus Attempting uninstall: nvidia-nvjitlink-cu12 Found existing installation: nvidia-nvjitlink-cu12 12.5.82 Uninstalling nvidia-nvjitlink-cu12-12.5.82: Successfully uninstalled nvidia-nvjitlink-cu12-12.5.82 Attempting uninstall: nvidia-curand-cu12 Found existing installation: nvidia-curand-cu12 10.3.6.82 Uninstalling nvidia-curand-cu12-10.3.6.82: Successfully uninstalled nvidia-curand-cu12-10.3.6.82 Attempting uninstall: nvidia-cufft-cu12 Found existing installation: nvidia-cufft-cu12 11.2.3.61 Uninstalling nvidia-cufft-cu12-11.2.3.61: Successfully uninstalled nvidia-cufft-cu12-11.2.3.61 Attempting uninstall: nvidia-cuda-runtime-cu12 Found existing installation: nvidia-cuda-runtime-cu12 12.5.82 Uninstalling nvidia-cuda-runtime-cu12-12.5.82: Successfully uninstalled nvidia-cuda-runtime-cu12-12.5.82 Attempting uninstall: nvidia-cuda-nvrtc-cu12 Found existing installation: nvidia-cuda-nvrtc-cu12 12.5.82 Uninstalling nvidia-cuda-nvrtc-cu12-12.5.82: Successfully uninstalled nvidia-cuda-nvrtc-cu12-12.5.82 Attempting uninstall: nvidia-cuda-cupti-cu12 Found existing installation: nvidia-cuda-cupti-cu12 12.5.82 Uninstalling nvidia-cuda-cupti-cu12-12.5.82: Successfully uninstalled nvidia-cuda-cupti-cu12-12.5.82 Attempting uninstall: nvidia-cublas-cu12 Found existing installation: nvidia-cublas-cu12 12.5.3.2 Uninstalling nvidia-cublas-cu12-12.5.3.2: Successfully uninstalled nvidia-cublas-cu12-12.5.3.2 Attempting uninstall: nvidia-cusparse-cu12 Found existing installation: nvidia-cusparse-cu12 12.5.1.3 Uninstalling nvidia-cusparse-cu12-12.5.1.3: Successfully uninstalled nvidia-cusparse-cu12-12.5.1.3 Attempting uninstall: nvidia-cudnn-cu12 Found existing installation: nvidia-cudnn-cu12 9.3.0.75 Uninstalling nvidia-cudnn-cu12-9.3.0.75: Successfully uninstalled nvidia-cudnn-cu12-9.3.0.75 Attempting uninstall: nvidia-cusolver-cu12 Found existing installation: nvidia-cusolver-cu12 11.6.3.83 Uninstalling nvidia-cusolver-cu12-11.6.3.83: Successfully uninstalled nvidia-cusolver-cu12-11.6.3.83 Successfully installed attrdict-2.0.1 janus-1.0.0 nvidia-cublas-cu12-12.4.5.8 nvidia-cuda-cupti-cu12-12.4.127 nvidia-cuda-nvrtc-cu12-12.4.127 nvidia-cuda-runtime-cu12-12.4.127 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.2.1.3 nvidia-curand-cu12-10.3.5.147 nvidia-cusolver-cu12-11.6.1.9 nvidia-cusparse-cu12-12.3.1.170 nvidia-nvjitlink-cu12-12.4.127 Collecting flash-attn Downloading flash_attn-2.7.4.post1.tar.gz (6.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.0/6.0 MB 65.4 MB/s eta 0:00:00 Preparing metadata (setup.py) ... done Requirement already satisfied: torch in /usr/local/lib/python3.11/dist-packages (from flash-attn) (2.5.1+cu124) Requirement already satisfied: einops in /usr/local/lib/python3.11/dist-packages (from flash-attn) (0.8.0) Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (3.17.0) Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (4.12.2) Requirement already satisfied: networkx in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (3.4.2) Requirement already satisfied: jinja2 in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (3.1.5) Requirement already satisfied: fsspec in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (2024.9.0) Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (12.4.127) Requirement already satisfied: nvidia-cuda-runtime-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (12.4.127) Requirement already satisfied: nvidia-cuda-cupti-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (12.4.127) Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (9.1.0.70) Requirement already satisfied: nvidia-cublas-cu12==12.4.5.8 in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (12.4.5.8) Requirement already satisfied: nvidia-cufft-cu12==11.2.1.3 in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (11.2.1.3) Requirement already satisfied: nvidia-curand-cu12==10.3.5.147 in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (10.3.5.147) Requirement already satisfied: nvidia-cusolver-cu12==11.6.1.9 in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (11.6.1.9) Requirement already satisfied: nvidia-cusparse-cu12==12.3.1.170 in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (12.3.1.170) Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (2.21.5) Requirement already satisfied: nvidia-nvtx-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (12.4.127) Requirement already satisfied: nvidia-nvjitlink-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (12.4.127) Requirement already satisfied: triton==3.1.0 in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (3.1.0) Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.11/dist-packages (from torch->flash-attn) (1.13.1) Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from sympy==1.13.1->torch->flash-attn) (1.3.0) Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.11/dist-packages (from jinja2->torch->flash-attn) (3.0.2) Building wheels for collected packages: flash-attn Building wheel for flash-attn (setup.py) ... done Created wheel for flash-attn: filename=flash_attn-2.7.4.post1-cp311-cp311-linux_x86_64.whl size=187815463 sha256=d944fc7d2f962bce83fc4708c2fc0c21eaf8255962a0b350ae919362a51b7ef2 Stored in directory: /root/.cache/pip/wheels/3d/88/d8/284b89f56af7d5bf366b10d6b8e251ac8a7c7bf3f04203fb4f Successfully built flash-attn Installing collected packages: flash-attn Successfully installed flash-attn-2.7.4.post1
Import Janus-Pro-1B model.
We have used Janus-Pro-1B for running it on Colab's free T4 GPU, You can go for bigger Janus-Pro-7B as per your compute.
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
Python version is above 3.10, patching the collections module.
/usr/local/lib/python3.11/dist-packages/transformers/models/auto/image_processing_auto.py:590: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use `slow_image_processor_class`, or `fast_image_processor_class` instead warnings.warn(
preprocessor_config.json: 0%| | 0.00/346 [00:00<?, ?B/s]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
tokenizer_config.json: 0%| | 0.00/285 [00:00<?, ?B/s]
tokenizer.json: 0%| | 0.00/4.72M [00:00<?, ?B/s]
special_tokens_map.json: 0%| | 0.00/344 [00:00<?, ?B/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
processor_config.json: 0%| | 0.00/210 [00:00<?, ?B/s]
Some kwargs in processor config are unused and will not have any effect: mask_prompt, ignore_id, image_tag, add_special_token, num_image_tokens, sft_format.
config.json: 0%| | 0.00/1.46k [00:00<?, ?B/s]
pytorch_model.bin: 0%| | 0.00/4.18G [00:00<?, ?B/s]
model.safetensors: 0%| | 0.00/4.18G [00:00<?, ?B/s]
Generate descriptions of saved images and save them in csv file, as per Colab's compute it took around 55 mins to generate description for 1000 images so its recommended to save these generated description into a CSV file and then run test on them.
Processing images: 100%|██████████| 1000/1000 [55:20<00:00, 3.32s/image]
Merge original and generated captions of images into same dataframe.
Saved this merged data as an backup
filename description \
0 1000092795.jpg The image shows two people standing in a garde...
1 10002456.jpg The image shows a group of workers on a tall m...
2 1000268201.jpg The image shows a small wooden structure, poss...
3 1000344755.jpg The image shows a person wearing a blue shirt ...
4 1000366164.jpg The image shows two people in a kitchen. One p...
.. ... ...
995 1321651400.jpg The image shows a person sitting on a tree bra...
996 1321723162.jpg The image shows a woman wearing a white shirt ...
997 1321949151.jpg The image shows a group of people in an art ga...
998 1322323208.jpg The image shows two children playing on a beac...
999 132298659.jpg The image shows a person wearing a white overa...
caption
0 ['Two young guys with shaggy hair look at thei...
1 ['Several men in hard hats are operating a gia...
2 ['A child in a pink dress is climbing up a set...
3 ['Someone in a blue shirt and hat is standing ...
4 ['Two men, one in a gray shirt, one in a black...
.. ...
995 ['A blond woman in a white dress sits in a flo...
996 ['The two girls hold hands with one another wh...
997 ['Two tired children rest on couches at an art...
998 ['Two children, one of which is holding a stic...
999 ['A man wearing white overalls and a baseball ...
[1000 rows x 3 columns]
Analysis
Here, we'll create two tables, one containing generated descriptions another containing dataset descriptions and then running same query on both of them to see which one gives us better results.
Query
Query on Janus-Pro Generated Description
Query on Original Dataset Description
Query on Janus-Pro Generated Description
Query on Original Dataset Description
Analysis Conclusion
From above two queries it can be clearly seen than descriptions generated by Janus-Pro clearly outperforms the original descriptions which dataset has, which shows the understanding of Janus-Pro model.