We rely almost entirely on hand-curated, human-labeled data sets.
This means that if a person hasn't spent the time to label something specific in an image, even the most advanced computer vision systems won't be able to detect it at run time, because they haven't seen it in the training set.
Now, for the production systems we use at Facebook, we train on tens of millions of hand-curated, hand-labeled images. That sounds like a lot, but it's not nearly enough to solve the sorts of problems we want to solve. So we sat down and asked: how do we not just 10x, but 100x or more, the size of the training set we can use to build our computer vision systems?
So we've built some breakthrough technology that uses publicly available hashtagged images at an unprecedented scale.
We've trained on 3.5 billion training images using a public set of hashtags as labels, without any human curation of that data set.
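The core idea here, treating uploaders' hashtags as weak labels instead of human annotations, can be sketched roughly as follows. This is a minimal illustration under stated assumptions; the hashtag vocabulary and helper names are hypothetical, not the actual production pipeline.

```python
# Minimal sketch of hashtag-as-weak-label supervision.
# The vocabulary and function names are hypothetical illustrations.

# A fixed vocabulary of hashtags we treat as class labels.
HASHTAG_VOCAB = ["#dog", "#cat", "#beach", "#food"]
INDEX = {tag: i for i, tag in enumerate(HASHTAG_VOCAB)}

def hashtags_to_target(hashtags):
    """Convert a photo's hashtags into a multi-hot label vector.

    No human curation: whatever hashtags the uploader wrote become
    the (noisy) training signal for a multi-label classifier.
    """
    target = [0.0] * len(HASHTAG_VOCAB)
    for tag in hashtags:
        idx = INDEX.get(tag.lower())
        if idx is not None:  # ignore hashtags outside the vocabulary
            target[idx] = 1.0
    return target

# Example: an uploaded photo tagged "#Dog #beach #sunset"
print(hashtags_to_target(["#Dog", "#beach", "#sunset"]))
# -> [1.0, 0.0, 1.0, 0.0]  ("#sunset" falls outside the vocabulary)
```

The label noise this introduces (missing or irrelevant hashtags) is the trade-off for scaling far beyond what humans can annotate by hand.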
As a result of training on such a large corpus, we have produced state-of-the-art results that are 1% to 2% better than any known published system on the ImageNet benchmark.
More importantly, as I said, this isn't just for papers and demos; we're doing this for the real world.
So we've taken these concepts and deployed them in production right now, protecting people every day on Facebook.
And you can understand intuitively why this is so helpful for the sorts of problems we need to solve: not only are we training on 3.5 billion images, which is about 10x more than the published state of the art, but we're able to do it on a large number of categories, because people hashtag a lot of different things.
And so not only is our accuracy improved, so we get the answer right, but we're also able to produce much more fine-grained labels.
Here are some real-world examples from the prior system and the new system, where you get much more fine-grained labels across a variety of images.
And this is why it's so impactful to the systems we use to protect people against bad content.