With the popularity of artificial intelligence technology on the rise, many of the larger software companies have created services that give users and companies access to A.I. tools for their own projects and software solutions. These services range from optical character recognition and natural language processing to image recognition.
This post introduces one such service: Azure Custom Vision, an image recognition tool by Microsoft that is still in its early stages but showing a lot of promise. First, we will quickly explain what it is. We will then walk through how we at Inceptum used the tool on a project, cover a few important decisions made when creating a project, describe some bumps along the way, and explain how we overcame them.
What is Custom Vision?
It is a neural network service on Azure designed to identify what is in a picture, with the added twist that you teach it yourself. It starts as a completely blank slate: you upload images and tag them. With enough correctly tagged uploads, you can then submit a new image and ask the service to predict what it contains.
The three steps of machine learning
What we wanted to do was use this service to identify which bottles and brands were in a fridge. For testing purposes, we went with beer. Why beer? A fridge makes a nice frame, all the bottles are roughly the same shape, and hell hath no fury like me when I'm given a warm lager. It is also easy to find fridges full of beer in supermarkets, corner shops, and bars to use for reference and test images. We went to the supermarket, took a load of photos, bought some beers, and had a photo shoot in the office.
But how do you teach an A.I.? Where do you start? At first, we went very specific: this is beer x 330 ml, this is also beer x 0.5 L, and this is beer x 2 L. Very quickly we found that we had taught it a bunch of nonsense. It just got confused. At that point, you could probably show it a picture of a sausage dog and it would say it was some dark beer. The results from showing it an actual fridge with beer were not even close to accurate.
After trying several approaches, the one that worked best was to split the workload into three parts, giving us three separate projects on our Azure Custom Vision portal:
- To recognize what a shelf is
- To recognize what a bottle is
- To classify what the bottle is
Steps 1 and 2 require detecting an object within a picture, while step 3 only requires classification. So, it was vital that we selected the correct project type when creating each one.
Classification projects only return a prediction for the entire image, whereas object detection returns bounding boxes within the image, each with its own prediction.
All the returned data is in JSON format, so it is quite simple to deserialize into your models. It is all quite self-explanatory; the only "tricky" part is understanding that the region information is the bounding box, and that it is returned as a fraction of the image's dimensions rather than in pixels.
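As a sketch, here is what consuming such a response might look like in Python. The sample dictionary mirrors the shape of an object detection prediction response (a `predictions` array with `tagName`, `probability`, and a fractional `boundingBox`); the tag names, probabilities, and image dimensions are made up for illustration.

```python
# Hypothetical prediction response, shaped like the JSON an object
# detection project returns. Note boundingBox values are fractions
# (0..1) of the image dimensions, not pixels.
response = {
    "predictions": [
        {
            "tagName": "bottle",
            "probability": 0.94,
            "boundingBox": {"left": 0.10, "top": 0.25, "width": 0.15, "height": 0.50},
        }
    ]
}

def to_pixel_box(bounding_box, image_width, image_height):
    """Convert a fractional bounding box to (left, top, right, bottom) in pixels."""
    left = int(bounding_box["left"] * image_width)
    top = int(bounding_box["top"] * image_height)
    right = int((bounding_box["left"] + bounding_box["width"]) * image_width)
    bottom = int((bounding_box["top"] + bounding_box["height"]) * image_height)
    return left, top, right, bottom

for p in response["predictions"]:
    # For a 1920x1080 photo, this box can be fed straight to an image-cropping call.
    print(p["tagName"], p["probability"], to_pixel_box(p["boundingBox"], 1920, 1080))
```

Once the box is in pixel coordinates, cropping the region out of the original photo is a one-liner in most imaging libraries.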
Our PoC solution was to send an image to the first Custom Vision project to find the shelves, crop out each shelf and send it to the second project to find and crop each bottle, and send each bottle to the third project to classify which beer it is. That is a lot of requests, but this approach gave us very accurate results in a short amount of time, albeit as a specific solution for a specific task. We saved all the data into a NoSQL database for future reference.
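A minimal sketch of that three-stage cascade, with the calls to the three Custom Vision projects stubbed out as plain functions returning hard-coded boxes (in the real PoC, each stage cropped the region and POSTed the image bytes to that project's prediction endpoint). The function names and the box-composition helper are our own; the fractional box format matches what the detection projects return.

```python
def compose(outer, inner):
    """Map a fractional box detected inside a crop back to
    coordinates in the original, uncropped image."""
    return {
        "left": outer["left"] + inner["left"] * outer["width"],
        "top": outer["top"] + inner["top"] * outer["height"],
        "width": inner["width"] * outer["width"],
        "height": inner["height"] * outer["height"],
    }

# Stand-ins for the three Custom Vision projects; hard-coded
# results here purely for illustration.
def detect_shelves(image):
    return [{"left": 0.0, "top": 0.0, "width": 1.0, "height": 0.5}]

def detect_bottles(shelf_crop):
    return [{"left": 0.2, "top": 0.1, "width": 0.1, "height": 0.8}]

def classify_bottle(bottle_crop):
    return {"tagName": "dark-beer", "probability": 0.91}

def analyse_fridge(image):
    """Run the cascade and collect one record per bottle found."""
    results = []
    for shelf in detect_shelves(image):
        for bottle in detect_bottles(shelf):
            prediction = classify_bottle(bottle)
            results.append({
                "box": compose(shelf, bottle),  # bottle box in full-image coordinates
                "label": prediction["tagName"],
                "probability": prediction["probability"],
            })
    return results  # in the PoC, records like these went into the NoSQL store

print(analyse_fridge("fridge.jpg"))
```

The `compose` helper matters because the bottle detector sees only a shelf crop, so its boxes are relative to that crop and need mapping back to the original image before being stored.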
Custom Vision also automatically adds every image sent in for prediction to the project's training images in an "untagged" state, so you can easily tag them and train a new, more accurate iteration of the project. And you will need a lot of reference images.
It’s your turn now
This was just a very brief overview; I did not want to get technical in this post. Rather, I wanted it to be a quick introduction to the A.I. world through Cognitive Services, to hopefully encourage people to explore the possibilities it provides. The API documentation is quite well done and has examples that should get you started quickly.
And no, no Terminators on the horizon, but I might make a robot to fetch me a beer on those hot summer days.