At Amazon, we are constantly working to improve our logistics operations through advanced AI and computer vision. Today we are pleased to announce the public release of Kaputt, a large-scale data set for visual defect detection in retail logistics. This data set, which will be presented at the International Conference on Computer Vision (ICCV) 2025, represents a big step forward in our efforts to automate defect detection.

The Kaputt data set contains 238,421 high-resolution images of 48,376 unique items, including 29,316 defective instances, making it 40 times as large as current state-of-the-art benchmark data sets. It captures real-world complexity by covering defects and damage across a huge range of products, from minor folds to major damage and everything in between.

Overview of defect severities and defect types. Our data set categorizes defective samples into two severity classes: minor (top two rows) and major (bottom two rows). In addition, each defective sample is assigned one or more defect types (columns) that characterize the defect(s) an item exhibits in a more fine-grained way. The figure shows two representative samples per defect type/severity combination.

The challenge of automated defect detection

Developing robust visual defect detection systems for retail logistics presents challenges that existing research does not fully address. Existing benchmarks focus mostly on manufacturing and have reached saturation, with methods achieving almost perfect performance of more than 99.9% AUROC (area under the receiver operating characteristic curve, which measures the trade-off between true-positive and false-positive rates). Unlike manufacturing settings, which often involve highly standardized goods and a limited number of different objects, retail logistics handles millions of unique products, most of which have been seen only a handful of times. Without adequate data, it is extremely difficult for a system to learn what counts as "normal" and what counts as "defective" across such different objects.
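To make the AUROC metric concrete, here is a minimal sketch that computes it directly from its pairwise interpretation: the probability that a randomly chosen defective sample receives a higher anomaly score than a randomly chosen normal one, with ties counted as half. The function name `auroc` and the toy labels and scores are illustrative assumptions, not part of the Kaputt tooling.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (defective, normal) pairs ranked correctly.

    labels: 1 for defective, 0 for normal.
    scores: anomaly scores, where higher means "more likely defective".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one normal sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # defective sample correctly ranked above normal
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): two normal and two defective samples.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))
```

A detector that assigns the same score to every sample lands at 0.5 under this definition, which is the "random guessing" baseline, while a perfect ranking yields 1.0. The quadratic loop is fine for a toy example; large evaluations would normally use a rank-based implementation from a metrics library.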

A new data set for applications in the real world

The structure of our data set reflects these real-world challenges and opportunities. For each query image, we provide up to three reference images showing the item in "normal" condition (meaning the item is more than 99% likely, but not 100% certain, to be defect-free), mirroring how human inspectors compare items to determine defects. We have also included detailed annotations of seven different defect types and their severity, acknowledging the subjective nature of defect assessment.
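The structure described above can be sketched as a small record type. This is a hypothetical illustration only: the class name `KaputtSample`, its field names, the file paths, and the placeholder defect-type label are assumptions and do not reflect the released data set's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ALLOWED_SEVERITIES = {"minor", "major"}  # the two severity classes in the article


@dataclass
class KaputtSample:
    """Hypothetical record for one query image and its annotations."""
    query_image: str                      # path to the image being inspected
    reference_images: List[str]           # one to three "normal" views of the item
    is_defective: bool
    severity: Optional[str] = None        # set only for defective samples
    defect_types: List[str] = field(default_factory=list)  # one or more per defect

    def __post_init__(self):
        if not 1 <= len(self.reference_images) <= 3:
            raise ValueError("expected one to three reference images")
        if self.is_defective:
            if self.severity not in ALLOWED_SEVERITIES:
                raise ValueError("defective samples need severity 'minor' or 'major'")
            if not self.defect_types:
                raise ValueError("defective samples need at least one defect type")
        elif self.severity is not None or self.defect_types:
            raise ValueError("normal samples carry no defect annotations")


# Placeholder values; the real defect-type names are not listed in this article.
sample = KaputtSample(
    query_image="query/0001.jpg",
    reference_images=["ref/0001_a.jpg", "ref/0001_b.jpg"],
    is_defective=True,
    severity="minor",
    defect_types=["example_defect_type"],
)
print(len(sample.reference_images), sample.severity)
```

Validating the record at construction time keeps inconsistent annotations (for example, a "normal" sample that still lists a defect type) from silently entering a training pipeline.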

Each query image is associated with one to three reference images, which can exhibit considerable variation. (1) All three reference images are defect-free and show the same face of the package. (2) The reference images exhibit defects (the packages have been removed from their wrapping) and pose variability (one image shows the back of the package).

Understanding model performance

Our extensive evaluation of several leading methods reveals both the complexity of the task and the current technological limitations. We tested four different approaches: zero-shot methods that use general-purpose vision models, few-shot approaches that utilize reference images, supervised learning, and hybrid methods that combine multiple techniques.
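As a rough illustration of how a reference-based (few-shot) approach can work in principle, the sketch below scores a query by its distance to the closest reference image in some embedding space. This is not the method evaluated in the paper: the function names are hypothetical, and the embeddings are assumed to come from an arbitrary pretrained image encoder that is not shown here.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def reference_anomaly_score(query_embedding, reference_embeddings):
    """Anomaly score = distance from the query to its nearest reference.

    Taking the minimum means the query only has to resemble one reference,
    which tolerates references that show a different face of the item.
    """
    if not reference_embeddings:
        raise ValueError("need at least one reference embedding")
    return min(cosine_distance(query_embedding, ref) for ref in reference_embeddings)


# Toy 2-D embeddings (made up): the query matches the first reference exactly.
query = [1.0, 0.0]
references = [[1.0, 0.0], [0.0, 1.0]]
print(reference_anomaly_score(query, references))
```

Scores produced this way can be fed into an AUROC computation to compare against other approaches. A nearest-reference score like this would likely miss subtle defects, since a global embedding may change very little when only a small region of the item is damaged.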

The results are striking: while supervised models with access to the full data set achieve 94.27% AUROC on defect detection, their performance drops to 74.4% in more realistic scenarios with a limited number of defective samples available for training. State-of-the-art zero-shot methods perform even worse, not exceeding 56.96% AUROC, which is not much better than random guessing.

Through qualitative analysis, we identified several key challenges for these methods: models struggle with subtle defects, rare defect types, and reference-dependent defects such as missing parts, and they are often misled by deformable objects or objects with distracting designs. Vision-language models can detect obvious defects but fail to capture subtle deformations or minor defects such as stickers and dirt.

Overall, these results stand in sharp contrast to the almost perfect performance that state-of-the-art detection methods achieve in manufacturing settings, which highlights the unique challenges of retail logistics addressed by our data set.

Impact beyond retail operations

The impact of improving visual defect detection extends far beyond operational efficiency. Early detection of defective items helps reduce waste, labor, and resource consumption by preventing defective products from moving further through the supply chain, which ultimately supports sustainability goals. It also helps ensure that customers receive their orders in perfect condition, reducing returns and replacements, which in turn reduces carbon emissions from transport.

Potential applications extend beyond retail. The challenges embodied in this data set (handling diverse objects, working with limited data per instance, and coping with significant pose variation) are shared by quality control in vehicle damage assessment, infrastructure inspection, and even medical imaging. By sharing this data set, we hope to accelerate progress across these domains.

The Kaputt data set is now available for download. We encourage computer vision researchers to use this resource to develop new approaches to this challenging problem. We look forward to engaging with the research community and seeing the innovative solutions that come out of this work.