Muthu Kannan (Manki)
1,716 followers
Everything is Krishna's divine play (Sarvam Krishna Leela)

Muthu's posts

Sun sets behind Indian Ocean

Kovilpatti market sells pasta. I hear they sell broccoli and avocados too. I'm impressed.
12/01/2017

2017 starts with a healthy breakfast. Sauteed vegetables with fried eggs.

Grilled chicken, asparagus, and sweet potato

Post has shared content
I've seen a lot of uncritical comments in the tech press lately about how Tesla's massive number of miles driven gives them a huge data collection advantage over anyone else when it comes to training self-driving models.

I'd like to see more analysis of this claim.

From http://www.kurzweilai.net/googles-self-driving-car-gathers-nearly-1-gbsec , the Google self-driving car in 2013 gathered 750 MB/s (note: bytes, not bits) of sensor data. It's not clear to me whether this figure is measured before or after compression of the compressible data types (I assume any "image-like" data is probably compressible).

Let's assume this is before compression. Let's also assume that the overall data stream can be compressed 15:1 without being degraded enough to harm learning from it. (This number is from the "high quality" line on https://en.wikipedia.org/wiki/JPEG#Effects_of_JPEG_compression . JPEG is rather archaic as image compression goes, but then again, not all the data above is image data. This seems like at least a plausible value for napkin math.) This gives us 50 MB/s of data. That's ~1.5 TB of data in an 8-hour driving day.
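
The napkin math above can be checked with a short Python sketch. The inputs (750 MB/s raw, 15:1 compression, an 8-hour driving day) are the figures stated in the post. The constant and function names are made up for illustration, and the use of decimal units (1 TB = 1,000,000 MB) is an assumption.

```python
# Napkin math: raw sensor rate -> compressed rate -> data per driving day.
# All three inputs come from the post; names and decimal units are assumptions.
RAW_MB_PER_S = 750       # 2013 Google car sensor rate quoted in the post
COMPRESSION_RATIO = 15   # the post's assumed 15:1 compression
DRIVING_HOURS = 8        # the post's assumed driving day


def compressed_rate_mb_per_s(raw_mb_per_s, ratio):
    """Data rate after compressing the raw stream by the given ratio."""
    return raw_mb_per_s / ratio


def data_per_day_tb(rate_mb_per_s, hours):
    """Total data for one driving day, in decimal terabytes."""
    seconds = hours * 3600
    return rate_mb_per_s * seconds / 1_000_000  # 1 TB = 1,000,000 MB


rate = compressed_rate_mb_per_s(RAW_MB_PER_S, COMPRESSION_RATIO)
daily = data_per_day_tb(rate, DRIVING_HOURS)
print(f"{rate:.0f} MB/s compressed, {daily:.2f} TB per {DRIVING_HOURS}-hour day")
```

The result is 1.44 TB, which the post rounds to ~1.5 TB.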

Google can easily store this much data onboard and upload it when the cars return to home base. So they can actually use it all for training.

Tesla doesn't have this kind of a "home base" that all customer cars return to, and it's clear they're not storing or uploading anything like 50 MB/s of data. They're not uploading 5 MB/s, or 0.5 MB/s, across their customer base. Neither the network traffic to send this nor the data centers to store it are apparent. So most data Tesla is collecting must be discarded locally, and only a small amount is being sent upstream.
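
To get a feel for why those per-car rates look implausible at fleet scale, here is a rough sketch that multiplies them out. The fleet size and daily driving time below are purely hypothetical placeholders, not figures from the post or from Tesla; only the three per-car rates (50, 5, and 0.5 MB/s) come from the text. Swap in whatever numbers you consider realistic.

```python
# Fleet-scale storage sketch. HYPOTHETICAL_* values are illustrative
# placeholders only; they are NOT real Tesla figures and not from the post.
HYPOTHETICAL_FLEET_SIZE = 100_000      # assumed number of customer cars
HYPOTHETICAL_HOURS_PER_DAY = 1.0       # assumed driving time per car per day


def per_car_daily_gb(upload_mb_per_s, hours_per_day):
    """Data one car would upload per day, in decimal gigabytes."""
    return upload_mb_per_s * hours_per_day * 3600 / 1000


def fleet_daily_tb(upload_mb_per_s, hours_per_day, fleet_size):
    """Data the whole fleet would upload per day, in decimal terabytes."""
    return per_car_daily_gb(upload_mb_per_s, hours_per_day) * fleet_size / 1000


# The three per-car upload rates mentioned in the post.
for rate_mb_per_s in (50, 5, 0.5):
    total_tb = fleet_daily_tb(
        rate_mb_per_s, HYPOTHETICAL_HOURS_PER_DAY, HYPOTHETICAL_FLEET_SIZE
    )
    print(f"{rate_mb_per_s} MB/s per car -> {total_tb:,.0f} TB per day fleet-wide")
```

Under these placeholder assumptions, even the lowest rate adds up to hundreds of terabytes per day, which is the scale of infrastructure the post argues is not apparent.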

How useful is that small amount? It's hard to say, but given https://twitter.com/elonmusk/status/811738008969842689 , a neural net is involved in at least some areas. From what I know about neural nets, you can distribute a trained model to a local device that can use it with relatively low CPU/memory requirements, but actually training a new model needs to occur in a resource-intensive environment (read: Tesla HQ, not in your car). So the neural nets in question here aren't being trained on the actual car sensor data from all those cars Tesla has sold. It seems unlikely they are being trained on the fragments of data sent back, either, because those fragments are just not going to provide sufficiently high-bandwidth input to be useful (you can't train a vision algorithm on a written summary of a movie plot). I'd imagine Tesla is training them on data more like Google's: high-bandwidth data collected from a test fleet.

Note how Musk says "need to get a lot of road time". If all their existing data from their customers' cars was so useful, why couldn't they use that data? The obvious answer is that it's not, in fact, that useful. Either the sensor model is different, or the field data is far too highly compressed to train or validate things. I suspect the latter.

My conclusion is that Tesla's fleet of customer cars likely provides them far less benefit than everyone is uncritically assuming, when it comes to improving their software.

There are a lot of assumptions and leaps in the above. This isn't proof Tesla doesn't have a big advantage. But it does suggest that before the public and press trumpet Tesla's "clear lead" and repeat endlessly how Tesla's millions of miles driven are more than anyone else's, some reporters should dig into the links of the above chain to verify. Has anyone reported on Tesla's data collection bandwidth at various stages in the pipeline, how they compress or sample data, what they transmit back to HQ, how the data is used for model training, etc.? If yes, please link me, as that would be fascinating. If no, then next time you read someone state this data collection advantage as fact, ask yourself how certain that is.

In this, as with much of life, more skepticism and critical thinking would be useful.

You know your life has changed when a nursery rhyme is stuck in your head.

♪ ♫ bake me a cake as fast as you can ♫ ♪
