The Future of ML: Unsupervised Learning, Reinforcement Learning, or Something Else?

Yoshua and Samy Bengio, Yann LeCun, Rich Sutton and Sergey Levine talk about the future of machine learning and how unsupervised learning methods will likely get us to human-level intelligence in machines.


By Craig S. Smith

Supervised learning has been the focus of most artificial intelligence research over much of the past decade, but the future of machine learning likely lies in unsupervised learning methods. I spoke to Yoshua and Samy Bengio, Yann LeCun, Rich Sutton and Sergey Levine about the future of machine learning and what will likely get us to human-level intelligence in machines.



Unsupervised learning methods were the first ones that allowed us to train deep networks. Then around 2010-2011 we realized that we didn't need these unsupervised learning techniques: we could directly train supervised models that are very deep. Then industrial applications started coming very quickly with computer vision, speech recognition, machine translation and things like that. But it is not going to be enough for human-level AI. Humans don't need that much supervision.


It's not just supervised and unsupervised. There are multiple things in the middle: self-supervised learning, reinforcement learning. There are many ways to get supervision cheaply from the data you already have. So, it has become a much more complex space. What links all of it is how you represent the data, so representation learning is actually becoming more central.

Yann LeCun, who shared a Turing Award with Yoshua Bengio and Geoffrey Hinton last year, talked about his bet on self-supervised learning.


There is a limit to what you can apply deep learning to today due to the fact that you need a lot of labeled data. It's only economically feasible when you can collect that data and you can actually label it properly, and that's only true for a relatively small number of applications.
Supervised learning works great for categorizing objects and images or for translating from one language to another, if you have lots of parallel texts. It works great for speech recognition, if you have collected enough data.
But there is some learning process that animals have access to, to acquire all the knowledge they have about the world, that machines don’t have. My money is on self-supervised learning, for machines to learn by observation, or learn without requiring so many labeled samples, perhaps accumulate enough background knowledge by observation that some sort of common sense will emerge.
Imagine that you give the machine a piece of input, a video clip for example. You mask a piece of the video clip and you ask the machine to predict what is next from what it is seeing.
For the machine to train itself to do this, it has to develop some representation of the data. It has to understand that there are objects that are animate and others that are inanimate. The inanimate objects have predictable trajectories, the other ones don't. And so, you train a system in this self-supervised manner with tons and tons of data. There's no limit to how many YouTube videos you can make the machine watch. It will distill some representation of the world out of this. And when you have a particular task, like learning to drive a car or recognizing particular objects, you use that representation as input to a classifier, and you train that classifier.
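The predict-the-hidden-part idea can be made concrete in a few lines. Below is a deliberately tiny sketch, with all specifics invented for illustration: the "video" is a sequence of 2-D frames generated by a fixed rotation (an object with a predictable trajectory), and the "mask" is simply the next frame, which the model must predict from the current one. The training signal comes entirely from the data itself; no human labels anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "video": each frame is the previous one rotated slightly,
# i.e. an object moving along a predictable trajectory.
theta = 0.1
A_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
frames = [np.array([1.0, 0.0])]
for _ in range(200):
    frames.append(A_true @ frames[-1])
frames = np.array(frames)

# Predictor: a 2x2 matrix W, trained to map frame t -> frame t+1.
# The "masked" targets Y are just the future frames, taken from the data.
W = rng.normal(scale=0.1, size=(2, 2))
lr = 0.05
X, Y = frames[:-1], frames[1:]
for _ in range(500):
    pred = X @ W.T
    grad = 2 * (pred - Y).T @ X / len(X)   # gradient of mean squared error
    W -= lr * grad

# With no labels at all, the predictor recovers the world's dynamics.
print(np.allclose(W, A_true, atol=1e-2))   # prints True
```

Real self-supervised systems replace the linear map with a deep network and raw video with massive unlabeled corpora, but the loop is the same: hide part of the data, predict it, and keep the learned representation for downstream tasks.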
Could we build machines at some point that will be as intelligent as humans? The answer is, of course, there's no question. It's a matter of time.

Rich Sutton talked about the form of unsupervised learning that he pioneered: reinforcement learning.


I was looking for something that was like reinforcement learning because reinforcement is an obvious idea if you study psychology. There are two basic kinds: Pavlovian conditioning and instrumental, or operant, conditioning.
Pavlovian conditioning is like, ring the bell and then you give the dog a steak. After a while, just from ringing the bell, he salivates showing that he anticipates the steak’s arrival. So, it's a kind of prediction learning.
And then there's control learning, which is called instrumental conditioning or operant conditioning, where you're changing your behavior to cause something to happen. In Pavlovian conditioning, your salivation doesn't influence what happens. Whereas the canonical operant conditioning example is the rat that presses a bar and then gets a food pellet. The act of pressing the bar is instrumental in getting the reward.
So that's really the idea of reinforcement learning. It's modeled after this obvious thing that animals and people do all the time. In supervised learning, the feedback instructs you as to what you should have done. In reinforcement, the feedback is a reward and it just evaluates what you did. So, evaluation versus instruction is the fundamental difference.
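Sutton's evaluation-versus-instruction distinction shows up clearly in the simplest reinforcement-learning setting, the multi-armed bandit. In the sketch below (all numbers are invented for illustration), the agent is never told which arm it *should* have pulled, only how good the arm it *did* pull turned out to be, and it must explore to find the best one.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three slot-machine arms with unknown mean payoffs; arm 2 is best,
# but the agent only ever sees evaluative feedback (rewards), never
# the instructive feedback "you should have pulled arm 2".
true_means = np.array([0.2, 0.5, 0.9])

Q = np.zeros(3)       # running estimate of each arm's value
counts = np.zeros(3)
epsilon = 0.1         # occasionally explore instead of exploiting

for _ in range(2000):
    if rng.random() < epsilon:
        arm = int(rng.integers(3))          # explore: try a random arm
    else:
        arm = int(np.argmax(Q))             # exploit: pull the best-looking arm
    reward = rng.normal(true_means[arm], 0.1)   # evaluation of what we did
    counts[arm] += 1
    Q[arm] += (reward - Q[arm]) / counts[arm]   # incremental average

print(int(np.argmax(Q)))   # prints 2: the agent discovered the best arm
```

A supervised learner given the same problem would simply be handed the label "arm 2" on every trial; the reinforcement learner has to discover it, which is why exploration is part of the algorithm.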

And finally, Sergey Levine, one of the world's most prominent researchers at the intersection of machine learning and robotics, talked about taking unsupervised learning into the real world with the robots he works with at the Berkeley Artificial Intelligence Lab.


Our hope in the long run is that our work can be a stepping stone towards a future where you have many networked robots that are out there in the world, and when they're not busy doing something more productive, they'll just play with their environment and learn.
They'll essentially say, ‘okay, if I'm not currently tasked with a job, if my human owner doesn't want me to do anything in particular, I'll just use my free time to practice. I'll play around with objects in my environment, understand more about how the world works and use it to sort of build up my body of knowledge so that when I'm later on placed in some new setting, hopefully I've learned enough from the many past situations I've been in to do something reasonable in this new setting.’ And that will be the transfer. And the transfer, as in all learning systems, comes from sufficient breadth of experience.
So, if you have enough breadth, you have seen enough variety, then you're ready for anything. So that's the dream. The reality right now is that this is an early step in that direction. Right now, the robot learns about one particular environment. It spends a few hours playing with a door, moving it this way and that, and it can open that one door. One of the things we want to do next is actually scale this up. In the lab downstairs we have six different robots, so perhaps we have all of them playing with different kinds of doors and maybe then we'll see that when we give it a new door, it will actually generalize to that new door because it's seen enough variety.
Variety is key to generalization and to transfer, but I also think that in robotics, in the long run, that shouldn’t be a problem, because robots exist in the real world, the same real world that we exist in. That real world forces diversity on you. You can't escape it.
Our working assumption is that if we build sufficiently general algorithms, then all we really have to do, once that's done, is to put them on robots that are out there in the real world doing real things and the variety of experience will come to the robots because they're in the real world just like we are. So, the robot essentially imagines something that might happen and then tries to figure out how to make that happen. Of course, imagining things that could happen requires some understanding of what are realistic situations in the world and what are not realistic situations.
I can set a goal for myself. I can say I'd like to make this cup levitate. That’s going to be a very difficult goal for me to reach because in this universe that's just not a realistic situation. But if I set myself the goal that I want this cup to be five centimeters to the left, that's something I can learn how to do and I can practice and that will teach me something about the physics of this cup.
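Levine's "set yourself a reachable goal and practice" loop can be sketched as goal-conditioned reinforcement learning. The toy below is my own construction, not his lab's system: a cup sits on a 1-D track of ten positions, the agent proposes random reachable goals for itself during practice, and tabular Q-learning turns that free play into a policy that can later satisfy a specific request like "move the cup five positions to the left."

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10                       # positions 0..9 on the track
Q = np.zeros((N, N, 2))      # Q[state, goal, action]; 0 = left, 1 = right

# Practice phase: the agent invents its own goals and plays.
for _ in range(4000):
    s = int(rng.integers(N))
    g = int(rng.integers(N))          # self-proposed, physically reachable goal
    for _ in range(20):
        if rng.random() < 0.3:
            a = int(rng.integers(2))              # explore
        else:
            a = int(np.argmax(Q[s, g]))           # exploit
        s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
        r = 1.0 if s2 == g else 0.0               # reward only at the goal
        Q[s, g, a] += 0.5 * (r + 0.9 * np.max(Q[s2, g]) - Q[s, g, a])
        s = s2
        if s == g:
            break

# Later, a new task arrives: cup at 7, please move it to 2.
s, g = 7, 2
for _ in range(10):
    if s == g:
        break
    s = max(0, s - 1) if int(np.argmax(Q[s, g])) == 0 else min(N - 1, s + 1)
print(s == g)   # prints True: the practiced policy reaches the requested goal
```

Levitating the cup never appears as a goal here because the toy dynamics cannot produce it, which mirrors his point: useful self-proposed goals are the ones the world actually allows.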

This post is adapted from the second half of Episode 30 of the podcast Eye on AI; the first half covers climate change, China and AI.
