At Amazon, we are constantly working to improve our logistics operations through advanced AI and computer vision. Today we are pleased to announce the public release of Kaputt, a large-scale data set for visual defect detection in retail logistics. This data set, which will be presented at the International Conference on Computer Vision (ICCV) 2025, represents a significant step forward in our efforts to automate defect detection.
The Kaputt data set contains 238,421 high-resolution images of 48,376 unique items, including 29,316 defective instances, making it 40 times as large as current state-of-the-art benchmark data sets. It captures real-world complexity, covering defects and damage across a huge range of products, from minor folds to major damage and everything in between.
The challenge of automated defect detection
Developing robust visual defect detection systems for retail logistics poses challenges that existing research has not fully addressed. Existing benchmarks focus mostly on manufacturing and have reached saturation, with methods achieving near-perfect performance of more than 99.9% AUROC (area under the receiver operating characteristic curve, which measures the trade-off between true-positive and false-positive rates). Unlike manufacturing settings, which often involve highly standardized goods and a limited number of distinct objects, retail logistics handles millions of unique products, most of which have been seen only a handful of times. Without adequate data, it is extremely difficult for a system to learn what constitutes "normal" versus "defective" across such diverse objects.
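Because AUROC is the yardstick for every result quoted below, it helps to see what the number means. The sketch below uses the standard rank interpretation of AUROC: the probability that a randomly chosen defective item receives a higher anomaly score than a randomly chosen non-defective item, with ties counted as half. The function name `auroc` and the toy scores are illustrative assumptions, not part of the Kaputt release.

```python
from itertools import product


def auroc(scores, labels):
    """AUROC as the probability that a random defective item (label 1)
    scores higher than a random non-defective item (label 0).
    Ties count as half a win. Illustrative O(n*m) version."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one non-defective sample")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0  # defective item correctly ranked above normal item
        elif p == n:
            wins += 0.5  # tie: no information either way
    return wins / (len(pos) * len(neg))


# A perfect ranking gives 1.0; a constant (uninformative) scorer gives 0.5,
# which is the "random guessing" level referred to in the results.
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
print(auroc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))  # 0.5
```

This framing explains why 99.9% AUROC on manufacturing benchmarks signals saturation, and why a score near 50% means a method barely separates defective from normal items.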
A new data set for real-world applications
The structure of our data set reflects these real-world challenges and opportunities. For each query image, we provide up to three reference images showing the item in a "nominal" state (meaning there is a more than 99% likelihood that it is not defective, but not 100%), mirroring how human inspectors compare items to identify defects. We also include detailed annotations of seven different defect types and their severity, acknowledging the subjective nature of defect assessment.
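The query-plus-references structure can be pictured as a simple record. The sketch below is a hypothetical in-memory representation only: the class names `KaputtSample` and `DefectAnnotation`, the field names, and the string-valued severity are assumptions for illustration, not the schema shipped with the data set. The only constraints it encodes come from the text above: at most three reference images per query, and zero or more defect annotations, each with a type and a severity.

```python
from dataclasses import dataclass, field
from typing import List

# The post states "up to three" reference images per query image.
MAX_REFERENCES = 3


@dataclass
class DefectAnnotation:
    # Hypothetical fields: the real data set defines seven defect types and a
    # severity scale; the strings used here are placeholders, not official labels.
    defect_type: str
    severity: str


@dataclass
class KaputtSample:
    # Hypothetical record pairing one query image with its nominal references.
    query_image: str
    reference_images: List[str]
    defects: List[DefectAnnotation] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.reference_images) > MAX_REFERENCES:
            raise ValueError(
                f"expected at most {MAX_REFERENCES} reference images, "
                f"got {len(self.reference_images)}"
            )

    @property
    def is_defective(self) -> bool:
        # A sample counts as defective if it carries any defect annotation.
        return len(self.defects) > 0


sample = KaputtSample(
    query_image="query_0001.jpg",
    reference_images=["ref_a.jpg", "ref_b.jpg"],
    defects=[DefectAnnotation(defect_type="placeholder_type", severity="minor")],
)
print(sample.is_defective)  # True
```

Keeping references attached to each query makes it easy to feed reference-based (few-shot) methods and purely query-based methods from the same records.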
Understanding model performance
Our extensive evaluation of several leading methods reveals both the complexity of the task and the limitations of current technology. We tested four categories of approaches: zero-shot methods that use general-purpose vision models, few-shot approaches that use reference images, supervised learning, and hybrid methods that combine multiple techniques.
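To make the idea of a reference-based (few-shot) approach concrete, here is a minimal nearest-reference baseline. It is a generic technique, not one of the specific methods evaluated on Kaputt: it assumes that each image has already been turned into an embedding vector by some pretrained vision model (not shown), and it scores a query by how far it is, in cosine similarity, from its closest nominal reference. The function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def reference_anomaly_score(
    query: Sequence[float], references: Sequence[Sequence[float]]
) -> float:
    """Anomaly score = 1 - similarity to the most similar nominal reference.

    Higher means the query looks less like any reference, i.e. more likely
    defective. Assumes embeddings come from a pretrained model (not shown).
    """
    if not references:
        raise ValueError("need at least one reference embedding")
    best = max(cosine_similarity(query, ref) for ref in references)
    return 1.0 - best


# Toy 2-D embeddings: a query identical to one reference scores 0.0,
# while a query orthogonal to every reference scores 1.0.
print(reference_anomaly_score([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # 0.0
print(reference_anomaly_score([0.0, 1.0], [[1.0, 0.0]]))  # 1.0
```

Scoring against the closest reference, rather than an average, tolerates references that show the item from a different angle. A baseline like this also illustrates the weaknesses noted below: a missing component or a small sticker may barely move a global embedding, so the score stays low.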
The results are striking: while supervised models with access to the full data set achieve 94.27% AUROC on defect detection, their performance drops to 74.4% in the more realistic scenario where only a limited number of defective samples is available for training. State-of-the-art zero-shot methods perform even worse, not exceeding 56.96% AUROC, which is not much better than random guessing.
Through qualitative analysis, we identified several key challenges for these methods. Models struggle with subtle deviations, rare defect types, and reference-dependent defects such as missing components, and they often misclassify deformable objects or objects with low-contrast designs. Vision-language models can detect obvious defects but fail to capture subtle defects or minor deviations such as stickers and dirt.
Overall, these results contrast sharply with the near-perfect performance that state-of-the-art detection methods achieve in manufacturing settings, highlighting the unique challenges of retail logistics captured by our data set.
Impact beyond retail operations
The impact of improved visual defect detection extends far beyond operational efficiency. Early detection of defective items helps reduce waste, labor, and resource consumption by preventing defective products from moving further through the supply chain, which ultimately supports sustainability goals. It also helps ensure that customers receive their orders in perfect condition, reducing returns and replacements, which in turn reduces carbon emissions from transportation.
Potential applications extend beyond retail. The challenges this data set captures (handling diverse objects, working with limited data per instance, and coping with significant variation in pose) align with quality control in vehicle damage assessment, infrastructure inspection, and even medical imaging. By sharing this data set, we hope to accelerate progress across these domains.
The Kaputt data set is now available for download. We encourage computer vision researchers to use this resource to develop new approaches to this challenging problem. We look forward to engaging with the research community and seeing the innovative solutions that emerge from this work.