Several months after machine learning scientist Yaroslav Bulatov joined Google seven years ago, he moved to the optical character recognition (OCR) team. This was before the neural net age, when algorithms based on feature extraction dominated OCR. Bulatov was working on adapting one of these older systems, which could read English characters, to read books in other languages when a major new project came up: Google’s Street View team wanted OCR to identify street signs, addresses, and other text in their huge database of images.
“Originally they just had tons of people looking at Street View images and transcribing everything they saw,” Bulatov said.
Bulatov said his team first attempted to apply their existing technology, but with little success. Looking for alternatives, the team hit on neural networks, which were rising in profile as Google Brain started up within the company. Drawing on Google’s DistBelief project, Bulatov designed a pipeline for identifying address numbers with neural networks. Implemented as a production service, the technology went on to identify hundreds of millions of numbers in the Street View dataset.
After working at Google Brain for several years, Bulatov left the company and worked on TensorFlow projects at OpenAI, including a memory utilization project (reducing neural net memory consumption from O(n) to O(sqrt(n)) for n layers, similar to this study).
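The O(sqrt(n)) result comes from gradient checkpointing: during the forward pass, keep only every sqrt(n)-th activation, then rebuild the rest from the nearest checkpoint when the backward pass needs them. Below is a toy, framework-free sketch of the idea; the layer functions and helper names are illustrative, not OpenAI’s actual implementation:

```python
import math

def checkpointed_activations(layers, x, stride):
    """Forward pass that keeps only every `stride`-th activation (the checkpoints)."""
    saved = {0: x}
    for i, f in enumerate(layers):
        x = f(x)
        if (i + 1) % stride == 0:
            saved[i + 1] = x
    return saved

def activation_at(layers, saved, stride, i):
    """Rebuild the activation entering layer i by replaying forward from the
    nearest earlier checkpoint (at most `stride` layers of recomputation)."""
    start = (i // stride) * stride
    x = saved[start]
    for f in layers[start:i]:
        x = f(x)
    return x

# Toy "network": 16 layers that each add 1 to their input.
n = 16
layers = [lambda v: v + 1 for _ in range(n)]
stride = int(math.sqrt(n))  # keep ~sqrt(n) checkpoints instead of n activations

saved = checkpointed_activations(layers, 0, stride)
print(len(saved))                               # 5 checkpoints instead of 17 activations
print(activation_at(layers, saved, stride, 7))  # 7, rebuilt from the checkpoint at layer 4
```

Memory drops from n stored activations to roughly sqrt(n) checkpoints, at the cost of at most sqrt(n) layers of recomputation whenever the backward pass needs an activation that wasn’t kept.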
Today, Bulatov is working on developing an open source solution that simplifies distributed TensorFlow on AWS. Bulatov said he’s excited by the potential that distributed computing unlocks.
“If you look at the biggest breakthroughs of AI, there was always some sort of increase in computational power,” Bulatov said. “GPUs can only be perhaps 10% bigger, which means you either have to wait for new tech to increase wafer size, which is quite long-cycle, or you just have to use more chips. We’ve reached how small we can make the transistors, and we’ve almost reached how big we can make them, so the next step is to increase the number of those chips that we use. That will be the next stage in this development of computational power.”
Every major tech company is working on this, Bulatov said: Facebook, for example, recently trained ResNet in one hour using 256 NVIDIA GPUs. Bulatov cited an even more recent study in which researchers trained AlexNet in 20 minutes, though in a sign of the pace of progress, the paper was revised to report 11-minute training in the time between Clusterone’s interview and this blog post’s publication.
While all the big players recognize that massive distributed training is the next step, Bulatov said, moving in that direction requires huge resources, including reliable machines and high-performance networking, and also poses implementation challenges.
“Everyone can use a GPU — you just plug it in — but not everyone can do parallel processing,” he said. “You need three months just to get it working.”
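The parallel processing Bulatov is describing is typically synchronous data parallelism: each worker computes gradients on its own shard of a batch, the gradients are averaged across workers, and every replica applies the same update. A minimal sketch of that averaging step in plain Python, with hypothetical names (a real cluster performs the averaging as an all-reduce over the network):

```python
def all_reduce_mean(per_worker_grads):
    """Average each gradient component across workers; in a real cluster this
    is an all-reduce over the network rather than a local loop."""
    num_workers = len(per_worker_grads)
    return [sum(gs) / num_workers for gs in zip(*per_worker_grads)]

def sgd_step(weights, grads, lr):
    """Every replica applies the same averaged update, keeping weights in sync."""
    return [w - lr * g for w, g in zip(weights, grads)]

# Four workers, each with gradients computed on its own shard of the batch.
worker_grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
avg = all_reduce_mean(worker_grads)
print(avg)  # [4.0, 5.0]
weights = sgd_step([0.0, 0.0], avg, lr=0.1)
```

The three-month implementation effort Bulatov mentions comes from everything around this core: sharding data, keeping replicas in sync across unreliable machines, and making the gradient exchange fast enough that the extra GPUs actually help.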
To that end, Bulatov hopes his open source project will make distributed TensorFlow accessible to more researchers.
“When I left Google, I realized it’s actually really hard to do all the things I did at Google, even if you have the money and even if you have TensorFlow,” he said. “Everyone has their own hacked solutions [for distributed TF] — OpenAI had their own hacked solution.”
Bulatov is a month into the project, which he said has a six-month timeline. He also hopes the project will make it easier to reproduce experiments. For example, he said that to his knowledge no one has yet reproduced TensorFlow’s 8x speedup on ImageNet, despite Google publishing the source code.
“This is really a missing piece,” he said. “You can’t just look at the source code and say ‘reproduce.’”
Bulatov said he believes democratizing AI is an important goal — and one that has a high chance of happening.
“Current AI research is very iterative,” he said. “You can look at the progression of ImageNet from the original accuracy to the current accuracy: it’s basically this progressive optimization process. People try thousands of experiments and people improve on it, publish it, and next person improves on it.”
Speeding up this iterative loop would mean better models and faster progress for the field as a whole, Bulatov said. Even if it costs somewhat more, 30x faster training would make a big difference.
And there’s much room for progress to be made: despite the hype, Bulatov noted, AI techniques have succeeded in breaking open a relatively narrow set of problems, such as image or audio labeling and translation. While deep learning has revolutionized these applications, Bulatov said, “That’s really the only task we’ve solved, but that’s not what AI is. There’s definitely a big gap.”
While there’s a chance we’ll achieve AI with general intelligence in 5–10 years, Bulatov isn’t optimistic. There were bursts of interest in AI in the 1960s and ’80s that didn’t live up to their promise at the time; he recalled seeing a magazine cover from the 1980s that predicted AI would be ‘solved’ within five years.
“I think it’s quite possible that we’re in the same kind of state,” he said.
According to Bulatov, there’s a wide range of opinions within OpenAI and other AI organizations about when general intelligence will be achieved (many are optimistic “because as AI researchers we are paid to be,” he joked).
“Those who have been around longer and seen optimism before are more pessimistic,” he said.