Towards Identification of Packaged Products via Computer Vision - Convolutional Neural Networks for Object Detection and Image Classification in Retail Environments
ISBN
978-1-4503-7207-7
Type
conference paper
Date Issued
2019-10
Author(s)
Abstract
Identification of packaged products in retail environments still relies on barcodes, which requires active user input and is limited to one product at a time. Computer vision (CV) has already enabled many applications, but has so far been under-discussed in the retail domain, although it allows for faster, hands-free, more natural human-object interaction (e.g. via mixed reality headsets). To assess the potential of current convolutional neural network (CNN) architectures to reliably identify packaged products within a retail environment, we created and open-sourced a dataset of 300 images of vending machines with 15k labeled instances of 90 products. We assessed the accuracies achieved through transfer learning for image-based product classification (IC) and multi-product object detection (OD) on multiple CNN architectures, as well as the number of image instances required per product to achieve meaningful predictions. Results show that as few as six images are enough for 90% IC accuracy, but around 30 images are needed for 95% IC accuracy. For simultaneous OD, 42 instances per product are necessary, and far more than 100 instances are needed to produce robust results. Thus, this study demonstrates that even in realistic, fast-paced retail environments, image-based product identification provides an alternative to barcodes, especially for use cases that do not require 100% accuracy.
Language
English
Keywords
Computer vision
Product identification
CNN
HSG Classification
contribution to scientific community
HSG Profile Area
SoM - Business Innovation
Publisher
ACM
Publisher place
Bilbao, Spain
Pages
8
Event Title
9th International Conference on the Internet of Things
Event Location
Bilbao, Spain
Event Date
22.10.-25.10.2019
Official URL
Subject(s)
Division(s)
Eprints ID
259700