How are the big boys using big data?

By sjg / Posted on 07 June 2012

Last night Google held a press event unveiling the latest additions to the Google Maps/Earth arsenal: automatic 3D mapping from aerial imagery and a mobile StreetView capture backpack. We’ve already seen this kind of 3D mapping from companies like C3, which Apple acquired a few years back (expect to see that in iOS 6 next week), so this wasn’t the highlight of the event for me.

During the “Next Dimension of Maps” event, Google announced that the StreetView cars have driven over 5 million unique miles of road and collected 20 petabytes of data on their travels. Let’s stop and think about how much data that is. 20 petabytes is equivalent to 20,480 terabytes (TB), and a standard PC bought today usually comes with a 1TB hard drive. It’s as if you were sitting in one seat of Pittodrie Stadium in Aberdeen and Google had bought out the rest of the stadium. Size aside, though, it wasn’t really clear to me what they were doing with all that data until I read a recent article that really made me think.
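For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes the binary convention (1 PB = 1,024 TB), which is the one that gives the 20,480 figure, and it treats the 1TB drive as the "typical PC" size. The function and variable names are made up for illustration.

```python
# Assumption: binary convention, 1 petabyte = 1024 terabytes.
TB_PER_PB = 1024


def petabytes_to_terabytes(pb):
    """Convert petabytes to terabytes using the binary convention."""
    return pb * TB_PER_PB


streetview_pb = 20   # figure quoted at the Google event
drive_tb = 1         # assumed hard drive size of a standard PC

# How many 1TB PCs would it take to hold the StreetView data?
drives_needed = petabytes_to_terabytes(streetview_pb) // drive_tb
print(drives_needed)
```

With the decimal convention (1 PB = 1,000 TB) the answer would be 20,000 instead, which is still roughly one stadium's worth of PCs.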

Pretty lonely up there in Pittodrie Stadium

As Google drives around the world taking images of the streets, it also collects Lidar data: a high-detail, 360-degree laser scan of the surrounding area. This gives an accurate point cloud of the buildings, signs, surfaces etc. around the car. You may have seen the rectangular box in StreetView showing you the front of buildings. So why are they collecting this data? Google has never said.

Google's self-driving car which I saw in April 2012

Well, another little project that hit the headlines was Google’s autonomous, self-driving cars. These cars also use Lidar to detect the surrounding traffic, people crossing the street etc., to give the car’s software an accurate image of the car’s current location. This is where it gets interesting. Have Google used the StreetView data they collected to train the self-driving car’s software?

What the self-driving car can see.

It makes perfect sense! Drive around all the roads in the world and gather the data to train the AI in the car on all the possibilities: junctions, road types, crossings, signs, international differences. If you can see it on the road, chances are a StreetView car has seen it as well. The big question is which came first. Was it Google’s intention to use StreetView data to train a car? Or did some sharp engineer say, “I know what to do with this data” in his 20% time? We will never know the answer, but we can all learn a lesson from Google.

When you plan a project that collects data, collect as much of it as possible. Don’t throw away data you have collected just because you think it’s not valuable now. Chances are that somewhere down the line it may be useful to someone.




