In this week’s post, we break down some of the computer vision deployment costs associated with developing scalable, production-ready solutions in retail and how AI system design can impact total costs.

This is the latest in our series of blog posts that explore the challenges in deploying and operationalizing computer vision for product recognition in retail. Our first two posts provide high-level overviews of the technical roadblocks and the operational considerations for retail adoption of computer vision. For the past two weeks, we’ve been diving deep, exploring why computer vision-aided product recognition poses such a challenge in retail and what to consider in terms of product enrollment and scaling as retailers move from “can it recognize my product catalog?” to “what would it take to get started?” Today, we continue that thread as we think about the costs of scaling computer vision across your storefronts.

When it comes to calculating costs, retailers need to consider two critical vectors: the cost of building a holistic solution based on their requirements and the cost of deploying and maintaining that solution at scale. In this post, we’ll take a look at the production costs of building such a system, and next week, we’ll begin to explore the operational costs.

Computer vision in context: Scoping the solution

As computer vision technology matures, it enables the development of retail solutions that were previously impossible. Retailers and retail solution providers can begin to conceptualize how computer vision might be used in myriad ways to combat shrink, address operational challenges, and improve the customer experience. But of course, computer vision by itself does none of these things. As a technology, computer vision isn’t ‘the’ solution—it’s part of the solution.

While retailers and retail solution providers can capitalize on the new capabilities that a computer vision engine provides them, they still need to identify the specific problem to solve and conceptualize the rest of the solution. Depending on the use case, the solution would include a number of different components, such as the camera(s) that provide the video, the bandwidth and computing power required to transmit, store, and process the video, customized hardware, integrations into existing systems, software to analyze and present the results, etc. While no single solution may need all these components, it’s a pretty good bet that most solutions will need more than one.

Let’s break this down into specific components.

Computer vision deployment costs: Hardware

In order for retailers to deploy computer vision at scale, hardware costs need to be carefully considered, calculated, and multiplied by the number of units a retailer is considering putting into production. Alongside accuracy and speed, hardware costs are often the limiting factor to scaling a computer vision system beyond one or two concept or pilot stores.

One of the reasons this is such a challenge is that a typical computer vision system doesn’t have “a” hardware component, but rather several that can quickly drive up the deployment costs. For example, a typical computer vision edge device has a Bill of Materials (BOM) that includes:

  • one or more cameras (with clearly defined specifications)
  • a GPU for AI model evaluation
  • a CPU for running the application
  • a network connection (wired or wireless)
  • enough RAM to support the anticipated workload
  • local storage for programs and data (e.g., videos and pictures)

Different computer vision solutions can have wildly different minimum requirements based on the needs of the computer vision model, the deployment approach (e.g., edge versus cloud), the use case, and so on. Reducing the cost of the BOM is critical for cost-effective, widespread deployment in retail. By working closely with the computer vision provider, you may find areas where you can reduce cost. But make sure any cost reductions do not lower the accuracy or responsiveness of the computer vision solution to unacceptable levels.

For cameras, the use case and environment determine the resolution required and the capabilities needed. An on-shelf monitoring use case could benefit from a depth-sensing camera, for example, while a traffic analysis use case might want infrared cameras that can detect objects or people by their thermal profile. The resolution needed may also vary based on the use case, environmental conditions, and flexibility of the AI model. As a result of all these factors, estimating the cost of cameras can be quite difficult, as they may range from relatively cheap (under $50, for example) to quite expensive (more than $3,000 each).

Another potential area for cost containment is the GPU. If a particular AI model evaluation can be optimized to run on a CPU, the GPU is no longer needed and its cost can be removed from the BOM. Deploying on an SoC (system-on-a-chip) can also help drive down the BOM cost.

As the market continues to innovate, retailers will likely find that companies like Ambarella are pushing compute into the cameras themselves using SoCs, which also has the potential to reduce hardware costs.
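To make the BOM arithmetic concrete, here is a minimal sketch of how a per-device BOM might be totaled and how removing the GPU changes it. Every component price below is a hypothetical placeholder chosen for illustration (only the camera figure echoes the low end of the range mentioned above), and the `bom_total` helper is our own invention, not part of any vendor tool.

```python
# Hypothetical per-device bill of materials, in US dollars.
# All prices are illustrative placeholders, not vendor quotes.
bom = {
    "camera": 50.0,
    "gpu": 400.0,
    "cpu": 150.0,
    "network": 20.0,
    "ram": 40.0,
    "storage": 30.0,
}


def bom_total(components, exclude=()):
    """Sum component prices, skipping any component named in `exclude`."""
    return sum(price for name, price in components.items() if name not in exclude)


with_gpu = bom_total(bom)                    # full BOM
cpu_only = bom_total(bom, exclude=("gpu",))  # model optimized to run on the CPU

print(f"BOM with GPU:    ${with_gpu:,.2f}")
print(f"BOM without GPU: ${cpu_only:,.2f}")
print(f"Savings per device: ${with_gpu - cpu_only:,.2f}")
```

With real quotes in place of the placeholders, the same comparison shows how much of the per-device cost a CPU-only model would actually remove.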

Once retailers and retail solution providers have reduced the BOM cost as much as possible, the retailer has to think about how that cost scales. For a self-checkout use case, the BOM cost needs to be multiplied by the number of self-checkout stations per store and then multiplied again by the number of stores the retailer plans to deploy the solution to.

Such costs get multiplied further if the use case involves deploying a computer vision solution directly on a cart. In that case, the BOM has to be multiplied by the number of carts per store, times the number of stores. That quickly turns into a large number. Such a mobile use case also adds batteries and charging infrastructure to the BOM, and those bring ongoing operational costs as well.
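The multiplication described above can be sketched in a few lines. The BOM figures, unit counts, store count, and battery cost here are all hypothetical assumptions for illustration, and `rollout_hardware_cost` is a helper name we made up; the point is only how quickly the per-device cost compounds.

```python
def rollout_hardware_cost(bom_per_unit, units_per_store, stores):
    """Hardware cost of a rollout: per-unit BOM x units per store x stores."""
    return bom_per_unit * units_per_store * stores


# Hypothetical inputs, not real retailer data.
BOM_PER_DEVICE = 690.0   # assumed per-device BOM in dollars
BATTERY_PER_CART = 80.0  # assumed battery add-on for a cart-mounted device
STORES = 200             # assumed number of stores in the rollout

# Fixed self-checkout stations: assume 6 stations per store.
self_checkout = rollout_hardware_cost(BOM_PER_DEVICE, 6, STORES)

# Cart-mounted devices: assume 150 carts per store, each needing a battery.
carts = rollout_hardware_cost(BOM_PER_DEVICE + BATTERY_PER_CART, 150, STORES)

print(f"Self-checkout rollout: ${self_checkout:,.0f}")
print(f"Cart rollout:          ${carts:,.0f}")
```

Under these assumptions, the cart rollout costs well over an order of magnitude more than the self-checkout rollout, and that is before charging infrastructure or any ongoing operational costs are counted.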

All the outlined costs can also shift depending on whether the system requires on-premises servers or cloud-based ones, and those decisions in turn impact decisions around connectivity and bandwidth.

Ultimately, a lower BOM is critically important for a computer vision solution to make financial sense for retailers. As retailers consider computer vision options, they should take into account the full bill of materials: what hardware their preferred computer vision option requires, and how those cumulative costs affect the overall cost of the solution and the anticipated ROI of the innovation.

Computer vision deployment costs: Software development, system integration, and licensing

Regardless of the hardware involved, computer vision-based solutions will still have significant requirements around integration and software licensing and/or development. For example, regardless of which problem is being solved, the solution won’t exist in a vacuum and will need to integrate into a retailer’s existing technology ecosystem.

Depending on the use case, the solution might need a tight integration with existing inventory management systems or point-of-sale (POS) systems, need to work side-by-side with RFID and electronic shelf labels, or need to feed analytic and management systems. Any of these integrations may require significant resources to build the integration, adjust or update current workflows, or upgrade infrastructure. For instance, systems that rely on cloud-based infrastructure may have severe latency issues without significant upgrades to internet connectivity.

Such considerations open up a different set of issues: what are the minimum requirements for each retail location? Many retail environments, especially smaller or older stores, may not have the necessary infrastructure to support computer vision systems with sophisticated requirements. That infrastructure includes adequate internet bandwidth, power supply, and physical space for hardware installations. Retailers need to think about how such requirements can or should be met across their locations, what it would cost to do so, and whether doing so is even possible.

One half of the cost equation

As we’ve outlined here, building a scalable, production-ready computer vision solution for retail involves a number of factors that can dramatically impact the cost of rolling out the eventual solution. These production costs are often the primary focus for retail solution providers as they begin to build the system; however, equally important are the costs associated with the ongoing, operational deployment of such a solution. Next week, we’ll explore those costs in depth.