
VLM

CLIP is a multimodal machine learning model proposed by OpenAI. Through contrastive learning on large-scale image-text pairs, the model can process both images and text, mapping them into a shared vector space. This example demonstrates using CLIP for image management and text-based image search on the RDK platform.
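The sketch below illustrates the shared embedding space idea: an image and several candidate captions are encoded by the same CLIP model, and the similarity scores indicate which caption matches the image. It uses the Hugging Face `transformers` library and an example image path as illustrative assumptions; it is not the hobot_clip API, which runs the model on the RDK hardware.

```python
# Minimal sketch of CLIP image-text matching (assumes the Hugging Face
# transformers implementation; "dog.jpg" is a hypothetical example image).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("dog.jpg")
texts = ["a photo of a dog", "a photo of a cat"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds scaled cosine similarities between the image
# and each text; softmax turns them into matching probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)  # the matching caption gets the highest probability
```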

Code repository: https://github.com/D-Robotics/hobot_clip.git

Application scenarios: use CLIP feature extraction to manage an image library and perform text-to-image and image-to-image search, as sketched below.
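As a hedged illustration of the text-to-image search scenario, the sketch below embeds a small gallery of images once, then ranks them against a text query by cosine similarity. The model checkpoint, gallery paths, and query string are assumptions for demonstration only and do not reflect hobot_clip's interfaces.

```python
# Sketch of text-to-image retrieval over precomputed CLIP embeddings
# (gallery paths and query are hypothetical).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

gallery_paths = ["img_0.jpg", "img_1.jpg", "img_2.jpg"]
images = [Image.open(p) for p in gallery_paths]

with torch.no_grad():
    # Encode and L2-normalize the gallery images (done once, then stored).
    img_inputs = processor(images=images, return_tensors="pt")
    img_feats = model.get_image_features(**img_inputs)
    img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)

    # Encode and L2-normalize the text query.
    txt_inputs = processor(text=["a red sports car"], return_tensors="pt", padding=True)
    txt_feats = model.get_text_features(**txt_inputs)
    txt_feats = txt_feats / txt_feats.norm(dim=-1, keepdim=True)

# Cosine similarity between the query and every gallery image.
scores = (txt_feats @ img_feats.T).squeeze(0)
for idx in scores.argsort(descending=True):
    print(gallery_paths[idx], float(scores[idx]))
```

Image-to-image search follows the same pattern, except the query is encoded with `get_image_features` instead of the text encoder.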

More information

Please refer to the D-Robotics official documentation https://developer.d-robotics.cc/rdk_doc/Robot_development/boxs/function/hobot_clip