Veenome: The future of interacting with video is here
I'm Molly Wood from CNET.com, here at South by Southwest 2012 in Austin, Texas.
I'm Ron Antonio.
With us today is Kevin Lenane, the founder and CEO of Veenome.
This is a really cool video product.
We talk a lot about how video is the future.
Can you describe for people what Veenome is doing?
It's really simple.
Veenome just tells you what's in a video.
You can find out what's in the video, the products, the people, the brands, and then you can use that data for commerce, for targeted advertising, and for search and discovery.
So you say it's really simple, but you guys have built your own technology, right, to do actual video scanning and pull out not just subjects but people.
The concept is simple; the machinery behind it is not, right?
At the end of the day you're going to get data on what's in your video, but the process to get there is actually really complex.
So it's really...
Sorry, I didn't mean to interrupt.
Go ahead, keep going.
So basically how we do this: we take the whole video, let's say it's ten seconds long, and find the key frames, the points where things are changing by a certain percentage.
We then generate tags for each of those key frames, then look at all that data linearly and ask what relationships we can find among those tags. Say one tag says car and another one says Mercedes.
We can tie those together into Mercedes car and provide more detailed tags, and that's really the key.
If I just tell you there's a car in the video, it's neat, but it's much more interesting and useful to know what kind of car it is.
Anyone can say it's a car; you might want to know the make and model.
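As a rough illustration of the pipeline described above (keyframe detection by percentage change, then combining generic and specific tags), here is a minimal Python sketch. Everything in it, the pixel-diff measure, the 30% threshold, and the brand-to-category table, is a hypothetical stand-in for illustration, not Veenome's actual code.

```python
def frame_difference(a, b):
    # Fraction of pixel values that changed between two same-size frames.
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)

def find_keyframes(frames, threshold=0.3):
    # Keep a frame whenever it differs from the last kept frame by more
    # than the threshold ("where things are changing by a certain percentage").
    keyframes = [frames[0]]
    for frame in frames[1:]:
        if frame_difference(keyframes[-1], frame) > threshold:
            keyframes.append(frame)
    return keyframes

def combine_tags(frame_tags, brand_to_category):
    # Tie a specific tag to a generic one seen in the same clip:
    # "Mercedes" + "car" -> "Mercedes car".
    seen = set().union(*frame_tags)
    combined = set(seen)
    for brand, category in brand_to_category.items():
        if brand in seen and category in seen:
            combined.add(brand + " " + category)
    return combined

frames = [[0, 0, 0, 0], [0, 0, 0, 1], [5, 5, 5, 5], [5, 5, 5, 5]]
print(find_keyframes(frames))  # only frames 0 and 2 pass the 30% cut
print(sorted(combine_tags([{"car"}, {"Mercedes", "car"}], {"Mercedes": "car"})))
```

Real systems would of course run a visual recognizer to produce the per-frame tags; the point here is only the two-stage shape of the process.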
Now, I got the chance to go to your website and play with it, and you have examples of how this technology works.
It's amazing, because you're literally dragging your mouse around items and the tag changes: for example, there's a MacBook Air in the frame, and then it changes to an iPhone.
Is there any human interaction involved, or is this all a computer brain interpreting these images and determining what's on the screen?
So there is a human element, if you want it.
The way this product works, it's really a B2B kind of product, in the sense that I don't want to build a site that people come to to watch videos.
I want to build a way for people to index their videos and productize their videos.
We don't really want to create a destination site.
So we allow people to take this technology and use it on their own.
It's a platform, so if they want to, they can brand it as their own.
So say it's my video on my site, and I know what kind of jeans we're wearing in it.
I can say these are Levi's jeans in the case that we can't identify them.
So there's an option to manually correct, and just about everyone we're talking to about this is really interested in being able to control that, for brand-safety reasons and because you might never be able to tell exactly what kind of sweater I'm wearing.
If you want to add manual tags, you can.
So you have a catalog of tags already, right?
You've been pulling in all kinds of data about what a Mercedes looks like, that kind of thing, but you're saying a business contracts with you and then they can fill in the gaps?
The problem, obviously, is that the world of possible things you can shoot video of is infinite, right?
So there's no way for us to always have everything right, and we allow for that.
It can be used for custom control, too.
If you want to do product replacement, things like that, in videos, the publisher can actually control that and say, just make all the pizza Domino's pizza.
So what are some potential uses? Commerce seems like the obvious one, right?
I see a product on a TV show...
How do you see companies implementing this in a consumer-facing way?
So originally I was kind of infatuated with the idea you saw, which is clickable video: just hover over something and buy it, right? It's really straightforward.
That's one of those things consumers really like, but what I found is that the engine that drives it, the API we're launching here at South by Southwest, is much more powerful, because you can use that API to do things like take this data and target advertising based on what's in the video.
You can connect videos together now.
So if I know I'm watching one video that has a bunch of iPhones in it, I can connect it to other videos with iPhones, so you're able to discover things in video more easily.
You can go from one video to the next and have them connected through a line of concepts, which I think is pretty powerful.
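The video-to-video connection described here can be sketched as simple tag-overlap scoring. This is an illustrative guess at the idea, not Veenome's API; the Jaccard measure, the index shape, and the cutoff are all assumptions.

```python
def jaccard(a, b):
    # Tag-set overlap: |intersection| / |union|.
    return len(a & b) / len(a | b)

def related_videos(video_id, index, min_score=0.2):
    # Rank every other indexed video by how many concepts it shares
    # with this one, so a viewer can hop from video to video by concept.
    target = index[video_id]
    scored = [
        (other, jaccard(target, tags))
        for other, tags in index.items()
        if other != video_id
    ]
    return sorted(
        (pair for pair in scored if pair[1] >= min_score),
        key=lambda pair: pair[1],
        reverse=True,
    )

index = {
    "review.mp4": {"iphone", "apple", "smartphone"},
    "unboxing.mp4": {"iphone", "apple", "case"},
    "cooking.mp4": {"pizza", "oven"},
}
print(related_videos("review.mp4", index))  # [('unboxing.mp4', 0.5)]
```

With content-derived tags like these, even an obscure clip with no human-written metadata can be linked to related videos.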
So you've potentially cracked the nut on video recommendation, on related videos.
I mean, you just won the Accelerator contest?
We just won it out here, actually.
We did. It's one of those things where I was very interested in the clickable-video concept, but I think at this early stage of Veenome we're going to do more business around uses of the API: things like connecting videos together, doing ad targeting, helping people find videos.
That same data can power search. If you've ever looked for something on YouTube that doesn't have a hundred thousand or a million views, there are no tags for it.
So there's no way of finding it unless you know the title.
So I've got to imagine: there are product placements on TV shows, and a lot of these TV networks are trying to find some sort of product that can bring them revenue, because a lot of people are on their computers at the same time as they're watching TV shows; they're doing two things at once.
Have you talked to TV networks about how they'd use this API?
We've seen apps that listen to what's playing on a TV and then say, OK, you're watching this show.
So my imagination runs a little wild: these networks could say, not only do we hear where you are in the show, but ads or recommendations get served to a second screen that knows exactly what's on the TV screen.
Yeah, that's another use of the API.
If you're not going to make this stuff clickable on the screen, maybe you can pipe that data into a second-screen app, so if you're watching a show you pull out your iPad and it syncs with the programming.
Now we know what you're watching and what point in the show it is, so we can say, as you're watching, here's a shirt you can buy, and the network or the video producer controls that experience; it's their app. That's another piece of it.
The reason the API is the stronger business case is that the big publishers have premium content.
They can control the experience; today they might not want me to put my little hover tag over their ten-million-dollar episodes.
So that's one of those things that I think is appealing for them.
So really what you're saying is there could be a bidding war for your company between CBS and Google.
Because when you describe the hover tags, recommending videos, dynamically changing related videos for discovery, things like that...
YouTube would be all over it.
I know, and I guess...
My theory on this is that the data in video is opaque right now: the video sits on a site and no one knows what's in it.
Someone is going to solve that problem.
You can't do anything interesting, really, truly interesting, with an experience around video...
Unless you have that data behind it, right, to be able to connect things and find out more.
So someone's going to do that.
Or search, right?
Yeah, exactly, search. So someone's going to do it, and we think we're going to be the first people to do it and do it well.
So I have to say it's super impressive, and it's kind of amazing.
Honestly, I feel like you're the first credible source I've talked to doing this, and for five or six years people have been publicly trying to figure out video search and indexing. What's different about your approach?
Well, I'm one of those people, you know...
I think a lot of it had to do with looking at the problem differently.
This is a little bit technical, but I think it's pretty straightforward.
The way the idea came about: I was building iPhone apps and then Android apps, and I was using image recognition for alternative data entry. So, take a picture of a credit card and pull the data off it, because the alternative, typing in sixteen digits, the expiration date, the security code, and the address, is not very fun.
You take a picture of the card and that saves some time, but what I found was that the experience was really fragile: you get a flash on the camera and you don't get the right data.
I started to see that I just needed more chances to get that picture right, and from that came: I need more frames, and that's video.
Now, the challenge with image recognition is that it's all about getting what you can from one frame.
Say I want to glean a pattern from this one frame, this one image, and figure out what the products in it are.
What we do is step back and look at all the frames as a whole, look at them together, and ask what we can know about all of these frames and how we can connect them, so that if I'm looking at, say, ten frames, maybe I can find out a bit more about what's in them, because I have so many different angles and so on.
So it's more about looking at the problem a little differently than it is about raw brainpower, because the reality is you could stare at one frame all day and never know it's an Apple iPad, never know it's a Mercedes-Benz car, but across thirty frames you might see it, and if you can connect that stuff, that's the insight I think is helping us.
And we have the processing power now; the computer horsepower is there to process that much.
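The many-frames-beat-one-frame idea described above can be sketched as a voting scheme over per-frame recognizer output: a label that no single frame is sure about still wins if it recurs across frames. The vote count and confidence thresholds here are invented for illustration; they are not Veenome's numbers.

```python
from collections import defaultdict

def aggregate(frame_detections, min_votes=3, min_mean_conf=0.4):
    # Pool per-frame (label, confidence) guesses across the whole clip:
    # a label must show up repeatedly with decent average confidence
    # before we trust it at the video level.
    votes = defaultdict(list)
    for detections in frame_detections:
        for label, conf in detections:
            votes[label].append(conf)
    return {
        label: sum(confs) / len(confs)
        for label, confs in votes.items()
        if len(confs) >= min_votes and sum(confs) / len(confs) >= min_mean_conf
    }

# Three frames each glimpse an iPad at middling confidence; one frame
# briefly hallucinates a toaster. Only the repeated label survives.
frames = [
    [("ipad", 0.5)],
    [("ipad", 0.25)],
    [("ipad", 0.75), ("toaster", 0.9)],
]
print(aggregate(frames))  # {'ipad': 0.5}
```

Staring at one frame rarely settles what an object is, but agreement across thirty frames is much harder to fake, which is the insight the interview is pointing at.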
So you're not trying to attach all-new data to videos; you're using the data that's in the video itself and matching it up.
I mean, with the volume of video out there, billions of videos, I think YouTube does five billion views a day, you can't have people filling in the tags by hand.
Unless they really want to.
There are more videos online than there are people in the world.
That seems pretty staggering, and they just keep coming, right? Every cellphone is a camera shooting video now, so the volume keeps growing with the market.
So now the $64,000 question: how accurate are you?
We get asked that a lot, and it's sort of a hard question, because I can't just say we're really, really accurate, super accurate.
It depends on the video.
But we do have an internal QA system.
Every time we come up with a new algorithm for how we're connecting the data together and how we're processing the frames, we do a blind QA. We're a start-up, so there's one person in the company who does QA.
They look at a frame and its tag and score it one to five on a sliding scale of specificity and accuracy.
So if I'm looking at a Mercedes-Benz, popsicle stick is a zero, Mercedes-Benz is a five, and car is a three.
There's a sliding scale, and that way we can see whether each new trick is improving things in terms of that score.
We think right now we're at about a four, like a B, of where we can be, and there's still room; that extra mile is a big deal, and that's part of the reason we provide the manual option.
If someone wants to do a quick find-and-replace and say, you know what, these are actually Pepsis, they can just do that.
And since we're using natural language processing on the tags already, you can just go in and say, make all the cans Pepsi cans, and forget it.
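The quick find-and-replace correction mentioned here might look something like the following sketch; the function name, index shape, and substring-matching rule are hypothetical, chosen only to make the "make all the cans Pepsi cans" idea concrete.

```python
def find_replace_tags(video_index, pattern, replacement):
    # Publisher-side bulk correction: swap any tag containing `pattern`
    # for the brand tag the publisher actually wants.
    return {
        video: {replacement if pattern in tag else tag for tag in tags}
        for video, tags in video_index.items()
    }

index = {"ad.mp4": {"soda can", "table"}, "demo.mp4": {"laptop"}}
print(find_replace_tags(index, "can", "Pepsi can"))
```

A production system would presumably match on normalized concepts rather than raw substrings, but the publisher-controlled override is the point.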
So you obviously have an amazing, intriguing product and algorithm. What's next for you guys?
I just really want to solve this problem.
It's one of those things where, like you said, a lot of companies have come and gone around this idea of clickable video and video search and video discovery, and I think there's a lot of stuff we can still do on video once we solve this problem.
I just really want to be part of that solution.
And also the bidding war between...
You know who.
That seems pretty good too.
We're not arguing with that.
All right, Kevin.
Super cool technology.
Thank you for talking to us.
You can find all of our South by Southwest interviews, and of course a lot more interesting video that will hopefully soon be scanned and indexed, at CNETTV.com.