Emerging Technology From the arXiv

December 1, 2014

How Google "Translates" Pictures Into Words Using Vector Space Mathematics

Google engineers have trained a machine learning algorithm to write picture captions using the same techniques the company developed for language translation.

Translating one language into another has always been a difficult task. But in recent years, Google has transformed this process by developing machine translation algorithms that are changing the nature of cross-cultural communication through Google Translate.

Now that company is using the same machine learning technique to translate pictures into words. The result is a system that automatically generates picture captions that accurately describe the content of images. That’s something that will be useful for search engines, for automated publishing and for helping the visually impaired navigate the web and, indeed, the wider world.

The conventional approach to language translation is an iterative process that starts by translating words individually and then reordering the words and phrases to improve the translation. But in recent years, Google has worked out how to use its massive search database to translate text in an entirely different way.

The approach is essentially to count how often words appear next to, or close to, other words and then define them in an abstract vector space in relation to each other. This allows every word to be represented by a vector in this space and sentences to be represented by combinations of vectors.

Google goes on to make an important assumption. This is that specific words have the same relationship to each other regardless of the language. For example, the vector relationship “king - man + woman = queen” should hold true in all languages.
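To make the two ideas above concrete, here is a minimal Python sketch. The first function counts how often words appear close to each other, which is the raw material for word vectors. The second performs the “king - man + woman” arithmetic. The four-dimensional TOY_VECTORS are invented by hand for illustration and are not learned from any data; real systems learn hundreds of dimensions from very large text collections.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """For each word, count the words that appear within `window` positions of it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(vectors, a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hand-made toy vectors (an assumption for illustration only).
# Dimensions: royalty, male, female, fruit.
TOY_VECTORS = {
    "king": [1.0, 1.0, 0.0, 0.0],
    "queen": [1.0, 0.0, 1.0, 0.0],
    "man": [0.0, 1.0, 0.0, 0.0],
    "woman": [0.0, 0.0, 1.0, 0.0],
    "apple": [0.0, 0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    counts = cooccurrence_counts(["the king rules", "the queen rules"], window=1)
    print(dict(counts["rules"]))
    print(analogy(TOY_VECTORS, "king", "man", "woman"))
```

With these toy vectors the arithmetic lands exactly on the vector for “queen”. With learned vectors the result is only approximately equal, which is why the nearest neighbour is used rather than an exact match.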

That makes language translation a problem of vector space mathematics. Google Translate approaches it by turning a sentence into a vector and then using that vector to generate the equivalent sentence in another language.
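A heavily simplified sketch of that idea, working on single words rather than whole sentences: if two languages share the same geometry, a word can be “translated” by finding the nearest vector in the other language's space. The English and Spanish vectors below are invented for illustration, and the function name translate_word is hypothetical. This is not how Google Translate is implemented; it only shows why a shared vector space turns translation into a nearest-neighbour calculation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translate_word(word, source_vectors, target_vectors):
    """Return the target-language word whose vector is closest to the source word's vector."""
    query = source_vectors[word]
    return max(target_vectors, key=lambda w: cosine(target_vectors[w], query))


# Invented toy vectors: the two spaces are deliberately given almost the same layout,
# which is the assumption described in the text.
ENGLISH = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.90, 0.10, 0.80],
    "man": [0.10, 0.90, 0.10],
    "woman": [0.10, 0.10, 0.90],
}
SPANISH = {
    "rey": [0.88, 0.82, 0.12],
    "reina": [0.91, 0.12, 0.79],
    "hombre": [0.12, 0.88, 0.10],
    "mujer": [0.09, 0.12, 0.91],
}

if __name__ == "__main__":
    for word in ENGLISH:
        print(word, "->", translate_word(word, ENGLISH, SPANISH))
```

In practice the two spaces are learned separately and do not line up by default, so some learned mapping between them is needed before a nearest-neighbour search makes sense.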

Now Oriol Vinyals and pals at Google are using a similar approach to translate images into words. Their technique is to use a neural network to study a dataset of 100,000 images and their captions and so learn how to classify the content of images.

But instead of producing a set of words that describe the image, their algorithm produces a vector that represents the relationship between the words. This vector can then be plugged into Google’s existing translation algorithm to produce a caption in English, or indeed in any other language. In effect, Google’s machine learning approach has learnt to “translate” images into words.

To test the efficacy of this approach, they used human evaluators recruited from Amazon’s Mechanical Turk to rate captions generated automatically in this way along with those generated by other automated approaches and by humans.
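The image-to-vector-to-caption step that was evaluated can be pictured with a toy decoder. In the sketch below an image is assumed to have already been reduced to a small vector, and a caption is generated one word at a time by picking the allowed next word that best matches that vector. The lookup tables WORD_VECTORS and NEXT_WORDS are hand-written stand-ins for what the real system learns with neural networks, and the name generate_caption is hypothetical.

```python
START, END = "<start>", "<end>"

# Hand-written stand-ins for learned parameters (assumptions for illustration).
# Image and word vectors share four dimensions: dog, cat, running, sleeping.
WORD_VECTORS = {
    "a": [0.0, 0.0, 0.0, 0.0],
    "dog": [1.0, 0.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.0, 0.0],
    "runs": [0.0, 0.0, 1.0, 0.0],
    "sleeps": [0.0, 0.0, 0.0, 1.0],
    END: [0.0, 0.0, 0.0, 0.0],
}

# Which words may follow which: a crude replacement for a learned language model.
NEXT_WORDS = {
    START: ["a"],
    "a": ["dog", "cat"],
    "dog": ["runs", "sleeps"],
    "cat": ["runs", "sleeps"],
    "runs": [END],
    "sleeps": [END],
}


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def generate_caption(image_vector, max_words=10):
    """Greedily pick, at each step, the allowed next word that best matches the image vector."""
    words = []
    previous = START
    for _ in range(max_words):
        candidates = NEXT_WORDS[previous]
        best = max(candidates, key=lambda w: dot(image_vector, WORD_VECTORS[w]))
        if best == END:
            break
        words.append(best)
        previous = best
    return " ".join(words)


if __name__ == "__main__":
    print(generate_caption([0.9, 0.1, 0.2, 0.8]))
    print(generate_caption([0.1, 0.8, 0.7, 0.1]))
```

The point of the toy is the division of labour: one component turns the picture into a vector, and a separate language component turns that vector into a sentence.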

The results show that the new system, which Google calls Neural Image Caption, fares well. Using a well-known dataset of images called PASCAL, Neural Image Caption clearly outperformed other automated approaches. “NIC yielded a BLEU score of 59, to be compared to the current state-of-the-art of 25, while human performance reaches 69,” say Vinyals and co.

That’s not bad and the approach looks set to get better as the size of the training datasets increases. “It is clear from these experiments that, as the size of the available datasets for image description increases, so will the performance of approaches like NIC,” say the Google team.
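For readers unfamiliar with BLEU, the score measures how many word sequences in a generated sentence also appear in a human-written reference. The sketch below is a simplified single-reference version: clipped n-gram precision, a geometric mean over n-gram sizes, and a penalty for captions shorter than the reference. It returns a value between 0 and 1, which is commonly multiplied by 100 when reported. The choice of max_n=2 is an assumption made to keep the example short, and the sketch is not the evaluation code used in the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous runs of n tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with counts clipped
    so that repeating a matching word cannot inflate the score."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU: geometric mean of n-gram precisions
    multiplied by a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(log_mean)


if __name__ == "__main__":
    generated = "a dog sleeps on the grass".split()
    human = "a dog sleeps on the lawn".split()
    print(round(100 * bleu(generated, human), 1))
```

Because the metric only counts overlapping word sequences, a caption can score well below a human one while still being an accurate description, which is one reason the researchers also asked human evaluators to rate the captions.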
Clearly, this is yet another task for which the days of human supremacy over machines are numbered.

Ref: arxiv.org/abs/1411.4555: Show and Tell: A Neural Image Caption Generator

MIT Technology Review
