Using the Google Vision API in R
After my post last month on OpenCV and face detection, I started looking into other algorithms used for pattern detection in images. As it turns out, Google has done a phenomenal job with their Vision API. The amount of information it can spit back when you simply send it a picture is absolutely incredible.
Also, it’s free for light use! I believe the free tier covers 1,000 images per month. Amazing!
In this post I’m going to walk you through the absolute basics of accessing the power of the Google Vision API using the RoogleVision package in R.
As always, we’ll start off loading some libraries. I added comments in the code noting where you can install them.
# Normal Libraries
library(tidyverse)

# devtools::install_github("flovv/RoogleVision")
library(RoogleVision)
library(jsonlite) # to import credentials

# For image processing
# source("http://bioconductor.org/biocLite.R")
# biocLite("EBImage")
library(EBImage)

# For Latitude Longitude Map
library(leaflet)
In order to use the API, you have to authenticate. There is plenty of documentation out there about how to set up an account, create a project, download credentials, etc. Head over to Google Cloud Console if you don’t have an account already.
# Credentials file I downloaded from the cloud console
creds = fromJSON('credentials.json')

# Google Authentication - Use Your Credentials
# options("googleAuthR.client_id" = "xxx.apps.googleusercontent.com")
# options("googleAuthR.client_secret" = "")
options("googleAuthR.client_id" = creds$installed$client_id)
options("googleAuthR.client_secret" = creds$installed$client_secret)
options("googleAuthR.scopes.selected" = c("https://www.googleapis.com/auth/cloud-platform"))
googleAuthR::gar_auth()
Now You’re Ready to Go
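The function getGoogleVisionResponse takes three arguments:
1. imagePath
2. feature
3. numResults
Numbers 1 and 3 are self-explanatory; “feature” has 5 options:
- LABEL_DETECTION
- LANDMARK_DETECTION
- FACE_DETECTION
- LOGO_DETECTION
- TEXT_DETECTION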
These are self-explanatory but it’s nice to see each one in action.
As a side note: the API has other features that aren’t included (yet) in the RoogleVision package, such as “Safe Search”, which identifies inappropriate content, and “Properties”, which identifies dominant colors and aspect ratios. A few others can be found at the Cloud Vision website.
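If you want to see every feature for one image in a single pass, a small helper like the one below is one way to do it. This is only a sketch: run_all_features and fake_fetch are names I made up for illustration, and the stand-in function exists purely so the example runs offline without credentials. With RoogleVision loaded you would pass getGoogleVisionResponse as the fetch argument instead.

```r
# The five feature names used throughout this post.
features <- c("LABEL_DETECTION", "LANDMARK_DETECTION", "FACE_DETECTION",
              "LOGO_DETECTION", "TEXT_DETECTION")

# Call fetch(image_path, feature = ...) once per feature and collect the
# results in a named list. `fetch` is passed in so the helper can be tried
# offline (hypothetical helper, not part of RoogleVision).
run_all_features <- function(image_path, fetch, feats = features) {
  results <- lapply(feats, function(f) fetch(image_path, feature = f))
  names(results) <- feats
  results
}

# Offline stand-in (hypothetical) that just echoes what it was asked for.
fake_fetch <- function(image_path, feature) {
  data.frame(image = image_path, feature = feature, stringsAsFactors = FALSE)
}

demo <- run_all_features("dog_mountain.jpg", fake_fetch)
names(demo)
```

Keeping the results in a named list means each response can be pulled out by feature name, for example demo$FACE_DETECTION.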
Label Detection
This is used to help determine content within the photo. It can basically add a level of metadata around the image.
Here is a photo of our dog when we hiked up to Audubon Peak in Colorado:
dog_mountain_label = getGoogleVisionResponse('dog_mountain.jpg', feature = 'LABEL_DETECTION')
head(dog_mountain_label)
##            mid           description     score
## 1     /m/09d_r              mountain 0.9188690
## 2 /g/11jxkqbpp mountainous landforms 0.9009549
## 3    /m/023bbt            wilderness 0.8733696
## 4     /m/0kpmf             dog breed 0.8398435
## 5    /m/0d4djn            dog hiking 0.8352048
All 5 responses were incredibly accurate! The “score” that is returned is how confident the Google Vision algorithms are, so there’s a 91.9% chance a mountain is prominent in this photo. I like “dog hiking” the best – considering that’s what we were doing at the time. Kind of a little bit too accurate…
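Since the labels come back as a data frame with a score column, turning them into image tags is mostly a matter of picking a cutoff. Here is a minimal sketch using the rows printed above, typed in by hand so it runs without calling the API. The 0.9 cutoff is an arbitrary choice of mine, not something the API recommends.

```r
# Label rows copied from the output above.
labels <- data.frame(
  mid = c("/m/09d_r", "/g/11jxkqbpp", "/m/023bbt", "/m/0kpmf", "/m/0d4djn"),
  description = c("mountain", "mountainous landforms", "wilderness",
                  "dog breed", "dog hiking"),
  score = c(0.9188690, 0.9009549, 0.8733696, 0.8398435, 0.8352048),
  stringsAsFactors = FALSE
)

# Keep only labels at or above a chosen confidence cutoff (0.9 is arbitrary).
min_score <- 0.9
confident <- labels[labels$score >= min_score, c("description", "score")]

# Collapse the surviving labels into a single tag string.
tags <- paste(confident$description, collapse = ", ")
tags
```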
Landmark Detection
This is a feature designed to specifically pick out a recognizable landmark! It provides the position in the image along with the geolocation of the landmark (in longitude and latitude).
My wife and I took this selfie at the Linderhof Castle in Bavaria, Germany.
The response from the Google Vision API was spot on. It returned “Linderhof Palace” as the description. It also provided a score (I reduced the resolution of the image which hurt the score), a boundingPoly field and locations.
- Bounding Poly – gives x,y coordinates for a polygon around the landmark in the image
- Locations – provides longitude,latitude coordinates
us_landmark = getGoogleVisionResponse('us_castle_2.jpg', feature = 'LANDMARK_DETECTION')
head(us_landmark)
##         mid      description     score
## 1 /m/066h19 Linderhof Palace 0.4665011
##                               vertices          locations
## 1 25, 382, 382, 25, 178, 178, 659, 659 47.57127, 10.96072
I plotted the polygon over the image using the coordinates returned. It does a great job (certainly not perfect) of getting the castle identified. It’s a bit tough to say what the actual “landmark” would be in this case, because the fountains, stairs and grounds are certainly important and are a key part of the castle.
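For anyone wanting to draw the box themselves, here is a rough sketch of one way to do it with base graphics. It is not necessarily how the plot above was made. I am assuming the printed vertices are four x values followed by four y values, and draw_box is a helper name I invented for illustration.

```r
# Vertices as printed above, assuming four x values followed by four y values.
verts <- data.frame(
  x = c(25, 382, 382, 25),
  y = c(178, 178, 659, 659)
)

# Width and height of the box in pixels.
box_width  <- diff(range(verts$x))
box_height <- diff(range(verts$y))

# Draw the outline on whatever plot is currently open (hypothetical helper).
# Image rows count down from the top, so pass img_height to flip y when the
# plot's y axis points up; leave it NULL when the axis already points down.
draw_box <- function(v, img_height = NULL) {
  y <- if (is.null(img_height)) v$y else img_height - v$y
  polygon(v$x, y, border = "red", lwd = 2)
}
```

After displaying the image in a plotting window, calling draw_box(verts) adds the outline on top. Whether the flip is needed depends on how the image was drawn.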
Turning to the locations – I plotted this using the leaflet library. If you haven’t used leaflet, start doing so immediately. I’m a huge fan of it due to speed and simplicity. There are a lot of customization options available as well that you can check out.
The location = spot on! While it isn’t a shock to me that Google could provide the location of “Linderhof Castle” – it is amazing to me that I don’t have to write a web crawler search function to find it myself! That’s just one of many little luxuries they have built into this API.
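# Indices restored on the assumption that latitude is stored before
# longitude, as in the printed output above
latt = us_landmark$locations[[1]][[1]][[1]]
lon = us_landmark$locations[[1]][[1]][[2]]
m = leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  setView(lng = lon, lat = latt, zoom = 5) %>%
  addMarkers(lng = lon, lat = latt)
m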
Face Detection
My last blog post showed the OpenCV package utilizing the Haar cascade algorithm in action. I didn’t dig into Google’s algorithms to figure out what is under the hood, but it provides similar results. However, rather than layering in each subsequent “find the eyes” and “find the mouth” and …etc… it returns more than you ever needed to know.
- Bounding Poly = highest level polygon
- FD Bounding Poly = polygon surrounding each face
- Landmarks = (funny name) includes each feature of the face (left eye, right eye, etc.)
- Roll Angle, Pan Angle, Tilt Angle = all of the different angles you’d need per face
- Confidence (detection and landmarking) = how certain the algorithm is that it’s accurate
- Joy, sorrow, anger, surprise, under exposed, blurred, headwear likelihoods = how likely it is that each face contains that emotion or characteristic
The likelihoods are another amazing piece of information returned! I have run about 20 images through this API and every single one has been accurate – very impressive!
I wanted to showcase the face detection and headwear first. Here’s a picture of my wife and me at “The Bean” in Chicago (side note: it’s awesome! I thought it was going to be really silly, but you can really have a lot of fun with all of the angles and reflections):
us_hats = getGoogleVisionResponse('us_hats.jpg', feature = 'FACE_DETECTION')
head(us_hats)
##                                 vertices
## 1 295, 410, 410, 295, 164, 164, 297, 297
## 2 353, 455, 455, 353, 261, 261, 381, 381
##                                 vertices
## 1 327, 402, 402, 327, 206, 206, 280, 280
## 2 368, 439, 439, 368, 298, 298, 370, 370
##
## landmarks... landmarks
##   rollAngle panAngle tiltAngle detectionConfidence landmarkingConfidence
## 1  7.103324 23.46835 -2.816312           0.9877176             0.7072066
## 2  2.510939 -1.17956 -7.393063           0.9997375             0.7268016
##   joyLikelihood sorrowLikelihood angerLikelihood surpriseLikelihood
## 1   VERY_LIKELY    VERY_UNLIKELY   VERY_UNLIKELY      VERY_UNLIKELY
## 2   VERY_LIKELY    VERY_UNLIKELY   VERY_UNLIKELY      VERY_UNLIKELY
##   underExposedLikelihood blurredLikelihood headwearLikelihood
## 1          VERY_UNLIKELY     VERY_UNLIKELY        VERY_LIKELY
## 2          VERY_UNLIKELY     VERY_UNLIKELY        VERY_LIKELY
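The likelihood columns come back as plain strings, which makes them awkward to filter on. One option is to turn them into an ordered factor so that comparisons like “at least LIKELY” work. The sketch below uses values typed in from the output above. The level order follows the Vision API’s Likelihood enum as I understand it, and as_likelihood is a helper name I made up.

```r
# Likelihood levels from least to most likely (order taken from the Vision
# API's Likelihood enum as I understand it; treat it as an assumption).
likelihood_levels <- c("UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
                       "POSSIBLE", "LIKELY", "VERY_LIKELY")

# Convert likelihood strings to an ordered factor (hypothetical helper).
as_likelihood <- function(x) {
  factor(x, levels = likelihood_levels, ordered = TRUE)
}

# Values copied from the two faces in the output above.
faces <- data.frame(
  joyLikelihood      = c("VERY_LIKELY", "VERY_LIKELY"),
  sorrowLikelihood   = c("VERY_UNLIKELY", "VERY_UNLIKELY"),
  headwearLikelihood = c("VERY_LIKELY", "VERY_LIKELY"),
  stringsAsFactors = FALSE
)

# Which faces are at least LIKELY to be wearing a hat?
wearing_hat <- as_likelihood(faces$headwearLikelihood) >= "LIKELY"
wearing_hat
```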
Here’s a shot that should be familiar (copied directly from my last blog) – and I wanted to highlight the different features that can be detected. Look at how many points are perfectly placed:
my_face = getGoogleVisionResponse('my_face.jpg', feature = 'FACE_DETECTION')
head(my_face)
##                               vertices
## 1 456, 877, 877, 456, NA, NA, 473, 473
##                               vertices
## 1 515, 813, 813, 515, 98, 98, 395, 395
## landmarks
## landmarks ...
##    rollAngle  panAngle tiltAngle detectionConfidence landmarkingConfidence
## 1 -0.6375801 -2.120439  5.706552            0.996818             0.8222974
##   joyLikelihood sorrowLikelihood angerLikelihood surpriseLikelihood
## 1   VERY_LIKELY    VERY_UNLIKELY   VERY_UNLIKELY      VERY_UNLIKELY
##   underExposedLikelihood blurredLikelihood headwearLikelihood
## 1          VERY_UNLIKELY     VERY_UNLIKELY      VERY_UNLIKELY
## [[1]]
##                     type position.x position.y    position.z
## 1               LEFT_EYE   598.7636   192.1949  -0.001859295
## 2              RIGHT_EYE   723.1612   192.4955  -4.805475700
## 3   LEFT_OF_LEFT_EYEBROW   556.1954   165.2836  15.825399000
## 4  RIGHT_OF_LEFT_EYEBROW   628.8224   159.9029 -23.345352000
## 5  LEFT_OF_RIGHT_EYEBROW   693.0257   160.6680 -25.614508000
## 6 RIGHT_OF_RIGHT_EYEBROW   767.7514   164.2806   7.637372000
## 7  MIDPOINT_BETWEEN_EYES   661.2344   185.0575 -29.068363000
## 8               NOSE_TIP   661.9072   260.9006 -74.153710000
## ...
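Once you have the landmark positions, simple geometry gets you a long way. As a small illustration, the sketch below measures the distance between the eyes and the tilt of the line joining them, using three rows typed in from the table above. I shortened the column names to x and y for readability, and landmark_xy is a helper name of my own.

```r
# Three landmark rows copied from the table above (position.x and
# position.y renamed to x and y for brevity).
landmarks <- data.frame(
  type = c("LEFT_EYE", "RIGHT_EYE", "NOSE_TIP"),
  x = c(598.7636, 723.1612, 661.9072),
  y = c(192.1949, 192.4955, 260.9006),
  stringsAsFactors = FALSE
)

# Look up one landmark's (x, y) position by its type (hypothetical helper).
landmark_xy <- function(df, type) {
  row <- df[df$type == type, ]
  c(x = row$x, y = row$y)
}

left  <- landmark_xy(landmarks, "LEFT_EYE")
right <- landmark_xy(landmarks, "RIGHT_EYE")

# Straight-line distance between the eyes, in pixels.
eye_distance <- sqrt(sum((right - left)^2))

# Angle of the line joining the eyes, in degrees (0 means perfectly level).
eye_angle <- unname(atan2(right["y"] - left["y"], right["x"] - left["x"])) * 180 / pi
```

A near-zero eye_angle here lines up with the small rollAngle reported for this face.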
Logo Detection
To continue along the Chicago trip, we drove by Wrigley Field and I took a really bad photo of the sign from a moving car as it was under construction. It’s nice because it has a lot of different lines and writing, and the Toyota logo isn’t incredibly prominent or necessarily true to brand colors.
This call returns:
- Description = Brand name of the logo detected
- Score = Confidence of prediction accuracy
- Bounding Poly = (Again) coordinates of the logo
wrigley_logo = getGoogleVisionResponse('wrigley_text.jpg', feature = 'LOGO_DETECTION')
head(wrigley_logo)
##           mid description     score                               vertices
## 1 /g/1tk6469q      Toyota 0.3126611 435, 551, 551, 435, 449, 449, 476, 476
Text Detection
I’ll continue using the Wrigley Field picture. There is text all over the place and it’s fun to see what is captured and what isn’t. It appears as if the curved text at the top “field” isn’t easily interpreted as text. However, the rest is caught and the words are captured.
The response sent back is a bit more difficult to interpret than the rest of the API calls – it breaks things apart by word but also returns everything as one line. Here’s what comes back:
- Locale = language, returned as source
- Description = the text (the first line is everything, and then the rest are individual words)
- Bounding Poly = I’m sure you can guess by now
wrigley_text = getGoogleVisionResponse('wrigley_text.jpg', feature = 'TEXT_DETECTION')
head(wrigley_text)
##   locale
## 1     en
##   description
## 1 RIGLEY F\nICHICAGO CUBS\nORDER ONLINE AT GIORDANOS.COM\nTOYOTA\nMIDWEST\nFENCE\n773-722-6616\nCAUTION\nCAUTION\n ORDER
##                                 vertices
## 1   55, 657, 657, 55, 210, 210, 852, 852
## 2 343, 482, 484, 345, 217, 211, 260, 266
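Because the first row packs everything into one string, a little post-processing makes it easier to work with. Here is a minimal sketch that splits that string back into lines and pulls out the phone number. I typed the text in from the output above (leaving off the trailing fragment) and I am assuming the separators are newline characters. The phone pattern is just a simple illustration, not a general-purpose matcher.

```r
# First-row description copied from the output above, assuming "\n"
# separates the detected lines.
full_text <- paste0(
  "RIGLEY F\nICHICAGO CUBS\nORDER ONLINE AT GIORDANOS.COM\n",
  "TOYOTA\nMIDWEST\nFENCE\n773-722-6616\nCAUTION\nCAUTION\n"
)

# Split into individual lines, trim whitespace and drop empty entries.
text_lines <- strsplit(full_text, "\n", fixed = TRUE)[[1]]
text_lines <- trimws(text_lines)
text_lines <- text_lines[nzchar(text_lines)]

# Pull out anything that looks like a 3-3-4 digit phone number.
phones <- grep("^[0-9]{3}-[0-9]{3}-[0-9]{4}$", text_lines, value = TRUE)
phones
```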
That’s about it for the basics of using the Google Vision API with the RoogleVision library. I highly recommend tinkering around with it a bit, especially because it won’t cost you a dime.
While I do enjoy the math under the hood and the thinking required to understand algorithms, I do think these sorts of APIs will become the way of the future for data science. Outside of specific use cases or special industries, it seems hard to imagine wanting to try and create algorithms that would be better than ones created for mass consumption. As long as they’re fast, free and accurate, I’m all about making my life easier! From the hiring perspective, I much prefer someone who can get the job done over someone who can slightly improve performance (as always, there are many cases where this doesn’t apply).
Please comment if you are utilizing any of the Google APIs for business purposes. I would love to hear about it!
As always, you can find this on my GitHub.