Computer vision | Model Monster

A field of AI focused on interpreting images and video and generating information from them (e.g., object detection, segmentation, captioning). Many modern systems use multimodal models that combine vision and language capabilities.