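CLIP (Contrastive Language-Image Pre-training) links vision and language through contrastive learning: it is trained so that an image and its matching text description map to nearby embeddings in a shared space. Comparing an image against candidate text descriptions in that space enables tasks like zero-shot image classification, and CLIP embeddings can also support tasks such as object detection.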
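As a rough illustration of how image-text matching can drive classification, the sketch below scores one image embedding against several text-prompt embeddings using cosine similarity and a softmax. It is not this model's API: the three-dimensional vectors are made-up stand-ins for real encoder outputs, the function names are hypothetical, and the temperature of 0.07 is an assumed value chosen only for the example.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, text_embeddings, labels, temperature=0.07):
    """Pick the label whose text embedding is closest to the image embedding.

    `temperature` is an assumed value; it only sharpens the probabilities.
    """
    sims = [cosine_similarity(image_embedding, t) for t in text_embeddings]
    probs = softmax([s / temperature for s in sims])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Toy embeddings (hypothetical); a real model would produce these with its
# text and image encoders.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
]
image_embedding = [0.8, 0.2, 0.1]

label, probs = zero_shot_classify(image_embedding, text_embeddings, labels)
print(label)
```

No labeled training images are involved here: the candidate classes are defined purely by the text prompts, which is why this style of classification is called zero-shot.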
November 15, 2023
AI models generate responses and outputs based on complex algorithms and machine learning techniques, and those responses or outputs may be inaccurate or indecent. By testing this model, you assume the risk of any harm caused by any response or output of the model. Please do not upload any confidential information or personal data. Your use is logged for security.