by Peter Grad, Tech Xplore
We aim to build a foundation model for segmentation by introducing three interconnected components: a promptable segmentation task, a segmentation model (SAM) that powers data annotation and enables zero-shot transfer to a range of tasks via prompt engineering, and a data engine for collecting SA-1B, our dataset of over 1 billion masks. Credit: arXiv (2023). DOI: 10.48550/arXiv.2304.02643
Meta took a big leap forward this week with the unveiling of a model that can detect and isolate objects in an image even if it has never seen them before. The technology is introduced and described in a paper on the arXiv preprint server.
The AI tool represents a major advance in one of technology’s tougher challenges: allowing computers to detect and comprehend the elements of a previously unseen image and isolate them for user interaction.
It recalls a concept the former chair of the National Security Commission on Artificial Intelligence Robert O. Work once described: “What AI and machine learning allows you to do is find the needle in the haystack.”
In this instance, Meta's Segment Anything Model (SAM) hunts for related pixels in an image and identifies the distinct components that make up the picture.
“SAM has learned a general notion of what objects are, and it can generate masks for any object in any image or any video, even including objects and image types that it had not encountered during training,” Meta AI announced in a blog post Wednesday.
The recognition task is called segmentation. We do it daily without a moment's thought, recognizing the items on our office desks: a smartphone, cables, a computer screen, a lamp, a melting candy bar, a cup of coffee.
But without prior programming, a computer must strain to distinguish all components down to the last pixel in a two-dimensional image, and it’s more complicated when there are overlapping items, shadows or an irregular or partitioned shape.
Prior approaches to segmentation fell into two camps. Interactive methods required a human to guide the model in defining a mask. Automatic methods could detect objects on their own but, according to Meta AI, required "thousands or even tens of thousands of examples" of objects, along with the "computer resources and technical expertise to train the segmentation model."
SAM combines the two approaches in a single model. Trained on more than 1 billion masks, it can recognize new types of objects without further examples.
“This ability to generalize means that, by and large, practitioners will no longer need to collect their own segmentation data and fine-tune a model for their use case,” the Meta blog stated.
One reviewer called SAM “Photoshop’s ‘Magic Wand’ tool on steroids.”
SAM can be prompted with user clicks or text. Meta researchers envision future uses for SAM in the AR/VR realm: when a user's gaze settles on an object, it could be delineated, defined and "lifted" into a 3D image and incorporated into a movie, game or presentation.
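Meta has also released the model's code and weights. As a rough illustration of the click-prompt workflow, a minimal sketch using the open-source segment-anything Python package might look like the following; the image path and click coordinates are placeholders chosen for the example:

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained SAM checkpoint (the ViT-H variant shown here);
# the weights file must be downloaded separately from Meta.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SamPredictor expects an RGB array; OpenCV loads BGR, so convert.
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single simulated "click" at pixel (x=500, y=375);
# label 1 marks it as foreground, 0 would mark background.
point_coords = np.array([[500, 375]])
point_labels = np.array([1])

# Ask for several candidate masks, since one click can be ambiguous
# (a click on a shirt could mean the shirt or the whole person).
masks, scores, logits = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]  # boolean array, same H x W as image
```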
A free working model is available online. Users can select from an image gallery or upload their own photos. They can then tap anywhere on the screen or draw a rectangle around an item of interest and watch SAM define, for instance, the outline of a nose, face or entire body. Another option directs SAM to identify every object in an image.
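The demo's rectangle tool and its "segment everything" option map onto the same package. Another sketch, again with placeholder checkpoint, image and coordinates:

```python
import numpy as np
import cv2
from segment_anything import (
    sam_model_registry,
    SamPredictor,
    SamAutomaticMaskGenerator,
)

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)

# Box prompt: the demo's click-and-drag rectangle corresponds to
# passing corner coordinates (x0, y0, x1, y1) to the predictor.
predictor = SamPredictor(sam)
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    box=np.array([100, 150, 400, 500]),  # arbitrary example rectangle
    multimask_output=False,
)

# "Everything" mode: SAM samples a grid of point prompts over the
# image and keeps the de-duplicated, high-quality masks.
mask_generator = SamAutomaticMaskGenerator(sam)
all_masks = mask_generator.generate(image)

# Each entry is a dict holding the mask plus metadata such as
# 'segmentation', 'area', 'bbox' and 'predicted_iou'.
print(len(all_masks), "regions found")
```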
Although SAM has not yet been deployed on Facebook, similar technology already powers familiar features such as photo tagging, the moderation and flagging of disallowed content, and the generation of recommended posts on both Facebook and Instagram.