Everything revealed at Elon Musk's Tesla Bot event

Aug 20, 2021

Tech Industry

Speaker 1: Real, uh, the Tesla bot will be real. Um, but, uh, basically if you think about what we're doing right now with the cars, uh, Tesla is arguably the world's biggest robotics company, 'cause our cars are like semi-sentient robots on wheels, with neural nets recognizing the world, understanding how to navigate through the world. Uh, it, it kind of makes sense to put that onto a humanoid form. Um, we're also quite good at, uh, sensors and batteries and [00:00:30] uh, actuators. So, uh, we think we'll probably have, uh, a prototype sometime next year, uh, that, uh, basically looks like this. Um, and it's intended to, um, uh, be friendly of course, um, and uh, navigate through a world, uh, built for humans and, uh, eliminate dangerous, repetitive and boring tasks. Um, we're setting [00:01:00] it such that it is, um, at a mechanical level, at a physical level, uh, you can run away from it, um, and, and most likely overpower it. So, uh, hopefully that doesn't ever happen, but, um, you never know. It's around, uh, five foot eight, um, uh, has sort of a, a screen where the head is for useful information. Um, but it's otherwise basically [00:01:30] got the Autopilot system in it. So it's, uh, got cameras, got eight cameras and um, yeah, uh, what we want to, uh, show today is that, uh, Tesla is, uh, much more than an electric car company, uh, that we have, uh, deep AI activity, uh, in, um, hardware, on the inference level, on the training level. Um, and, uh, basically we, I think we're, I think arguably the leaders [00:02:00] in real-world AI, as it applies to the real world, um, um, and those of you who have seen the Full Self-Driving, uh, beta, I, uh, can appreciate the rate at which the Tesla neural net is learning to, to drive.
Speaker 2: So here I'm showing the video of the raw inputs that come into the stack, and then the neural net processes that into the vector space. And you are seeing parts of that vector space rendered in the instrument cluster on the car.
Now, what I find kind of fascinating about this is that we are effectively [00:02:30] building a synthetic animal from the ground up. So the car can be thought of as an animal, it moves around, it senses the environment and, uh, you know, acts autonomously and intelligently. And we are building all the components from scratch in house. So we are building of course, all of the mechanical components, the body, the nervous system, which is all the electrical components and, for our purposes, the brain of the Autopilot. And specifically for this section, the synthetic visual cortex. We are processing just individual images and we're making a large number of predictions about these images. So for example, here, you can see predictions [00:03:00] of the stop sign, uh, the stop lines, uh, the lines, the edges, the cars, uh, the traffic lights, uh, the curbs here, uh, whether or not the car is parked, uh, all of the static objects like trash cans, cones, and so on. And everything here is coming out of the net, um, here in this case, out of the HydraNet. So that was all fine and great. But as we worked towards FSD, we quickly found that this is not enough. So where this first started to break was when we started to work on Smart Summon. Here, I am showing some of the predictions of only the curb detection [00:03:30] task, and I'm showing it now for every one of the cameras. So we'd like to wind our way around the parking lot to find a person who is summoning the car. Now, the problem is that you can't just drive on image-space predictions. You actually need to cast them out and form some kind of a vector space around you. Um, so we attempted to do this using C++ and developed, uh, what we call, uh, the occupancy tracker at the time. So here we see that the curb detections from the images are being stitched up across camera scenes, camera boundaries, and [00:04:00] over time. Now, there were two major problems, I would say, with this setup.
Number one, we very quickly discovered that tuning the occupancy tracker and all of its hyperparameters was extremely complicated. You don't want to do this explicitly by hand in C++. You want this to be inside the neural network and train that end to end. Number two, we very quickly discovered that the image space is not the correct output space. You don't want to make predictions in image space. You really want to make it directly in the vector space. So for example, here in this video, I'm showing single-camera predictions in orange and multi-camera predictions in blue. [00:04:30] And basically, you can't predict these cars if you are only seeing a tiny sliver of a car, so your detections are not going to be very good and their positions are not gonna be good, but a multi-camera network does not have an issue. Here's another video from a more nominal sort of situation. And we see that as these cars in this tight space cross camera boundaries, there's a lot of junk that enters into the predictions. And basically the whole setup just doesn't make sense, especially for very large vehicles like this one. And we can see that the multi-camera networks struggle significantly less with these kinds of predictions. So [00:05:00] here we are making predictions about the road boundaries in red, intersection areas in blue, um, road centers and so on. So we're only showing a few of the predictions here just to keep the visualization clean. Um, and yeah, this is, this is done by this spatial, uh, RNN. And this is only showing a single clip, single traverse, but you can imagine there could be multiple trips through here. And basically a number of cars, a number of clips could be collaborating to build this map, basically, and effectively an HD map, except it's not in a space of explicit [00:05:30] items. It's in a space of features of a recurrent neural network, which is kind of cool. I haven't seen that before.
So here's putting everything together. Uh, this is what our architecture roughly looks like today. So, um, we have raw images feeding on the bottom. They go through a rectification layer to correct for camera calibration and put everything into a common, uh, virtual camera. We pass them through, uh, residual networks to process them into a number of features at different scales. We fuse the multi-scale information with BiFPN. This goes through a transformer [00:06:00] module to re-represent it into the vector space and the output space. This feeds into a feature queue in time or space that gets processed by a video module like the spatial RNN, and then continues into the branching structure of the HydraNet with trunks and heads for all the different tasks.
Speaker 3: So here, uh, we are planning to do a lane change. Um, in this case, the car needs to do two back-to-back lane changes to make the left turn up ahead. For this, the car searches over, uh, different maneuvers. Um, so [00:06:30] the first one it searches is, uh, a lane change that's close by, but the, uh, car brakes pretty harshly, so it's pretty uncomfortable. The next maneuver it tries, that's the lane change a bit late. So it speeds up, goes beyond the other car, goes in front of the other cars and finds the lane change, but now it risks missing the left turn. We do thousands of such searches in a very short time span. Um, because these are all physics-based models, these futures are very easy to simulate. Uh, and in the end we [00:07:00] have a set of candidates and we finally choose one based on the optimality conditions of safety, comfort, and easily making the turn. So now the car has chosen this path and you can see that as the car executes this trajectory, uh, it pretty much matches what we had planned. The cyan plot on the right side here, um, that one is the actual velocity of the car, and the white line underneath it was the plan.
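Editor's illustration: the final step Speaker 3 describes, choosing one candidate maneuver on safety, comfort, and making the turn, can be sketched roughly as below. This is a minimal sketch under stated assumptions, not Tesla's planner: the `Candidate` fields, the weights, and the cost formula are all hypothetical, and the three example maneuvers only loosely mirror the ones mentioned in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One simulated maneuver. All fields are hypothetical stand-ins."""
    name: str
    min_gap_m: float        # closest distance to another car along the path
    peak_decel_mps2: float  # hardest braking the maneuver requires
    makes_turn: bool        # whether the left turn is still reachable


def cost(c: Candidate, w_safety: float = 10.0,
         w_comfort: float = 1.0, w_turn: float = 50.0) -> float:
    """Lower is better. Weights are illustrative assumptions."""
    safety = w_safety / max(c.min_gap_m, 0.1)   # small gaps are penalized
    comfort = w_comfort * c.peak_decel_mps2     # harsh braking is penalized
    turn = 0.0 if c.makes_turn else w_turn      # missing the turn is penalized
    return safety + comfort + turn


def choose(candidates: list[Candidate]) -> Candidate:
    """Pick the candidate with the lowest combined cost."""
    return min(candidates, key=cost)


candidates = [
    Candidate("early lane change, harsh braking", 8.0, 4.5, True),
    Candidate("late lane change, may miss turn", 12.0, 1.0, False),
    Candidate("moderate lane change", 10.0, 1.5, True),
]
best = choose(candidates)
print(best.name)
```

In a real system the candidate list would come from simulating many maneuvers forward in time, as the talk describes; here it is hard-coded to keep the selection step visible.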
So we are able to plan for 10 seconds here and able to match that, uh, when we see in hindsight. So this is a well-made plan. [00:07:30] So a single car driving through some location can sweep out some patch around the trajectory using this technique, but we don't have to stop there. So here we collected different clips, uh, from the same location, from different cars, maybe, uh, and each of them sweeps out some part of the road. The cool thing is we can bring them all together into a single giant optimization. So here these 16 different trips are organized, uh, using, uh, aligned using various features, such as road edges, lane [00:08:00] lines. All of them should agree with each other and also agree with all of the image-space observations. Together, this produces an effective way to label the road surface, not just where the car drove, but also in other locations that it hasn't driven yet. We don't have to stop at just the road surface. We can also reconstruct 3D static obstacles. Um, here, uh, this is, uh, a reconstructed, uh, 3D point cloud from our cameras. Um, the main innovation here is the density of the point cloud. Typically these points require texture, uh, to [00:08:30] form associations from one frame to the next frame. But here we are able to produce these points even on textureless surfaces, like the road surface or walls. Uh, and this is really useful to annotate arbitrary obstacles that, um, we can see on the, see in the world. Combining everything together, we can produce these amazing datasets that annotate, um, all of the road texture, all the static objects and all of the moving objects, even through occlusions, producing excellent kinematic, uh, labels.
Speaker 4: If we put all of it together, [00:09:00] we get our training-optimized chip, our D1 chip. This was entirely designed by the Tesla team internally, all the way from the architecture to GDS out and package.
This chip is like GPU-level compute with CPU-level flexibility, and twice the network-chip-level I/O bandwidth, but we didn't stop here. [00:09:30] We integrated the entire electrical, thermal and mechanical pieces out here to form our training tile, fully integrated, interfacing with a 52-volt DC input. It's unprecedented. This is an amazing piece of engineering. Our compute plane is completely orthogonal to power supply [00:10:00] and cooling. That makes high-bandwidth compute planes possible. What it is is a nine-petaflop training tile. This becomes our unit of scale for our system, and this, it's real.
