![Paper review | VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding - Datahunt](https://assets-global.website-files.com/6434d6a6071644318551fd72/6461ca9b1bb49b5330f8002e_6450db691a4f217e2071d906.webp)

Paper review | VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding - Datahunt

![[PDF] X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval | Semantic Scholar](https://d3i71xaburhd42.cloudfront.net/1ec886e2235763b08fa606a5d5ea3f4540f715ec/4-Figure2-1.png)

[PDF] X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval | Semantic Scholar

![[2203.02053] Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning](https://ar5iv.labs.arxiv.org/html/2203.02053/assets/x1.png)

[2203.02053] Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning

![The network architecture of Contrastive Language-Image Pre-Training (CLIP). | Download Scientific Diagram](https://www.researchgate.net/publication/373003531/figure/fig3/AS:11431281180520925@1691626830411/The-network-architecture-of-Contrastive-Language-Image-Pre-Training-CLIP.jpg)

The network architecture of Contrastive Language-Image Pre-Training (CLIP). | Download Scientific Diagram

![Understand CLIP (Contrastive Language-Image Pre-Training) — Visual Models from NLP | by mithil shah | Medium](https://miro.medium.com/v2/resize:fit:1400/0*A91IqC50AGEHlS98.png)

Understand CLIP (Contrastive Language-Image Pre-Training) — Visual Models from NLP | by mithil shah | Medium

GitHub - openai/CLIP: CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

![Understand CLIP (Contrastive Language-Image Pre-Training) — Visual Models from NLP | by mithil shah | Medium](https://miro.medium.com/v2/resize:fit:438/0*f6C78re5i1EVfv_J.png)