...

Mohammad Rastegari, a rising star in the artificial intelligence field, is a researcher at the Allen Institute for Artificial Intelligence (AI2) and Chief Technology Officer at XNOR.ai, a startup spun out of his work at AI2. He was previously a visiting scholar at UC Berkeley and a recipient of the Facebook Graduate Fellowship while completing his Ph.D.

Clusterone spoke with Rastegari about his current work and thoughts on the future of AI (responses were condensed and edited for clarity).


Tell us about XNOR.ai.

Because I had worked on efficient methods for machine learning during my doctoral studies, I always had this question on my mind: “How can we make things more efficient?” So I had this idea to binarize all the parameters in a neural network.

Usually the problem with deep networks is that there is this huge demand for massive GPU cluster servers. They are so expensive in terms of required power, memory, and computation. So I was thinking, how can we make this process way more efficient?

I did my Ph.D. on binary features and how to binarize images and models, so I thought I could binarize the entire neural network model, meaning that instead of having floating point operations we could have logical operations.
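To make the arithmetic concrete, here is a rough sketch (my own illustration, not XNOR.ai's actual kernels) of why this matters: when weights and activations are constrained to +1/-1 and bit-packed, a floating-point dot product collapses into an XNOR followed by a popcount.

```python
import numpy as np

def float_dot(x, w):
    # Standard dot product: one multiply-accumulate per element.
    return float(np.dot(x, w))

def binary_dot(x_bits, w_bits, n):
    # Binary dot product for +/-1 vectors packed into integer bits:
    # matching positions (+1*+1 or -1*-1) are counted via XNOR + popcount.
    # dot = (#matches) - (#mismatches) = 2*popcount(XNOR) - n
    xnor = ~(x_bits ^ w_bits) & ((1 << n) - 1)
    return 2 * bin(xnor).count("1") - n

def pack(v):
    # Pack a +/-1 vector into the bits of an int (+1 -> bit 1, -1 -> bit 0).
    bits = 0
    for i, val in enumerate(v):
        if val > 0:
            bits |= 1 << i
    return bits

x = np.array([1, -1, -1, 1, 1, -1, 1, 1])
w = np.array([1, 1, -1, -1, 1, -1, -1, 1])

print(float_dot(x, w))                       # 2.0 with floating-point math
print(binary_dot(pack(x), pack(w), len(x)))  # 2 with XNOR + popcount only
```

The logical version touches no floating-point hardware at all, which is the source of the power and memory savings on small CPUs.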

This problem has actually existed for almost two decades; IBM has always had this idea of binarizing a neural network. But the problem was, no one knew how to train these models with binary values. Training is an issue with these models because the values are discrete, so there is no well-defined gradient.

So I came up with an idea of how we can mix a continuous function with a discrete function during backpropagation, creating a learning mechanism that could train a neural network with all of its values binary.
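A minimal sketch of the kind of trick he is alluding to (a straight-through-style estimator; my illustration under assumed details, not the exact scheme from his paper): the forward pass uses the sign of each real-valued latent weight, while the backward pass treats the sign function as the identity (clipped outside [-1, 1]), so gradients still reach the latent weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: targets produced by a hidden +/-1 weight vector.
w_true = np.array([1.0, -1.0, 1.0, 1.0])
x = rng.normal(size=(256, 4))
y = x @ w_true

# Real-valued latent weights; the forward pass only ever sees their signs.
w_real = rng.normal(scale=0.1, size=4)
lr = 0.01

for step in range(200):
    w_bin = np.sign(w_real)              # forward pass: discrete +/-1 weights
    pred = x @ w_bin
    grad_pred = 2 * (pred - y) / len(y)  # d(MSE)/d(pred)
    grad_w_bin = x.T @ grad_pred         # gradient w.r.t. the binary weights
    # Straight-through step: pass the gradient to w_real as if sign() were
    # the identity, zeroing it where |w_real| > 1 (the clipped variant).
    grad_w_real = grad_w_bin * (np.abs(w_real) <= 1.0)
    w_real -= lr * grad_w_real

print(np.sign(w_real))  # should recover the signs of w_true on this toy problem
```

The discrete weights are what get deployed; the continuous latent weights exist only so that training has something smooth to update.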

I showed in a paper almost two years ago that we can achieve close to state-of-the-art results with this type of method.

This brings a whole new opportunity to industry, because companies can move all this massive computation from GPU clusters to small CPUs. As a proof of concept, we showed that we could do one of the most expensive tasks in AI, object detection (categorizing and localizing objects in images), on the CPU of a small phone.

And we’re not stopping there; we went even cheaper, to small CPUs and $5 computers like the Raspberry Pi Zero, and we showed we can run deep learning on those devices.

After that, we received a lot of traction from industry and thought there was huge commercial value here. So we spun out and started XNOR.ai. We want to hand off computation from the cloud to the device.

We are agnostic to the AI model, as long as it has some sort of neural network in it. This brings a lot of opportunity to industry: people have more privacy because they don’t need to send their data to the cloud to be processed. For example, right now the high-end computer vision programs in industry are all based on sending your images to the cloud.

We can run all of your computation on your device without leaking any of your data to third parties. We are also moving into speech and other machine learning applications.


What trends in AI or other projects excite you?

The most exciting thing about AI these days is that it is starting to work. If you think back two years, everything was on paper; nothing was actually working. Nobody outside academia could actually use anything we were doing.

There were two bottlenecks in those days: no one knew how to work with big sets of data, and no one knew how to train a gigantic model with billions of parameters. When big datasets met GPU processing that could handle billions of parameters, the big bang happened.

Things started working. The accuracy of image recognition jumped, and now we are seeing models surpass human ability at recognizing objects.

I hope we can find more applications of AI in areas where we can’t yet do anything, like general understanding of behavior or reasoning about the data we have. Seeing an image, can you tell me what’s going to happen if I push this object a little bit? Is it going to fall or not?

If you see a picture of a person, tell me whether he’s going to open the door of his car. Is he going to work or not? These are still open problems, things that we as humans can do easily but AI cannot. The same goes for understanding abstract knowledge and answering complicated questions: if I draw an emoji on a small piece of paper, you can easily tell me this is a happy person.

This is an abstraction in your mind as a human, but we don’t know how to computerize this kind of understanding.


Which approaches are promising for moving towards that general understanding or reasoning?

At the Allen Institute, we are trying to solve problems of general understanding of the natural environment a person can perceive, like what happens when you move an object.

If I have an object in the air, what will happen if I let go of it? It will fall down. If I push this object, what’s going to happen? We created a model based on machine learning that can look at images and predict the future of the objects in them.

One of the approaches that I think will make reasoning possible in the future is building a massive simulated environment. At the Allen Institute, we’re creating this environment by building multiple different scenes: a room in a house, a kitchen.

We are trying to make them as realistic as possible. You can put in a virtual agent, a human, that can interact with that environment. You can open a cabinet, grab an object, open an object, and the environment has some of the physical properties of our world.

We have the agent play in that environment and assign it some sort of task: find the TV, open the microwave. Based on the agent’s success, we give a reward, and as the agent tries and fails, it starts to learn how to understand objects and differentiate between them.
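As a deliberately tiny illustration of this trial-and-error loop (a toy example of mine, not AI2's actual simulator or agent): an agent on a one-dimensional corridor is rewarded only when it reaches a target cell, and tabular Q-learning lets it improve from that reward signal alone.

```python
import random

random.seed(0)

N_CELLS, TARGET = 8, 7      # a tiny corridor; the "TV" sits in the last cell
ACTIONS = (-1, +1)          # step left or step right
q = [[0.0, 0.0] for _ in range(N_CELLS)]   # Q-value per (cell, action)
alpha, gamma, eps = 0.5, 0.9, 0.2

for episode in range(500):
    state = 0
    for _ in range(50):
        # Epsilon-greedy: explore sometimes, otherwise follow current knowledge.
        if random.random() < eps or q[state][0] == q[state][1]:
            a = random.randrange(2)
        else:
            a = 0 if q[state][0] > q[state][1] else 1
        nxt = min(max(state + ACTIONS[a], 0), N_CELLS - 1)
        reward = 1.0 if nxt == TARGET else 0.0       # reward only on success
        # Q-learning update: blend the reward with the best value of the next cell.
        q[state][a] += alpha * (reward + gamma * max(q[nxt]) - q[state][a])
        state = nxt
        if reward:
            break   # task solved, start a new episode

# After training, the greedy policy heads straight for the target.
print(["right" if q[s][1] > q[s][0] else "left" for s in range(N_CELLS - 1)])
```

The real systems replace the corridor with photorealistic rooms and the table of values with a deep network, but the learning signal is the same: reward on success, nothing else.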

That’s the approach, but you can see how difficult it is to scale. You need to create a big set of virtual environments, which is very difficult. Right now we have a few designers at the Allen Institute whose full-time job is to design these realistic scenes for us.

But I think the key is to have a huge virtual-environment dataset, similar to what happened with images. Once that huge dataset of images met massive computational power, the big bang happened.

Once we create an agent that can interact with that environment, we can transfer its brain to a robot in our real-world environment.


Do you see yourself still working on these same problems 5–10 years from now?

I cannot say anything more than two years out. We cannot say anything about five years from now. In two years, we may have some of those big virtual-environment datasets, but other than that it really cannot be predicted.

Maybe a new hardware technology will revolutionize everything, who knows. This is not something we can predict five years out.


Should future efforts focus on developing general AI or just solving a bunch of problems using a narrower approach?

For industrial purposes, many companies like to solve their own problems; they don’t care about general AI. And that has only just started.

Deep learning showed that these problems can be solved with AI, and now many companies want to adopt it. But my personal perspective is that we should go after general AI, because the main problem is there. General AI is the way we are.