A trio of Stanford computer scientists have developed a deep learning model to geotag Google Street View images, meaning it can usually determine where a photo was taken just by looking at it.
The software is said to perform well enough to beat the top players in GeoGuessr, a popular online guessing game.
That’s not to say the academics’ model can pinpoint exactly where a street-level photo was taken. It can, however, reliably determine the country and make a good guess – within 15 miles of the correct location a fair amount of the time, though more often than not it lands farther out than that.
In a preprint paper titled “PIGEON: Predicting Image Geolocation,” Lukas Haas, Michal Skreta, and Silas Alberti describe how they developed PIGEON.
It is an image geolocation model built on their own pre-trained CLIP model, called StreetCLIP. The model is complemented by a set of semantic geocells – demarcated areas of land, similar to counties or provinces, that take into account region-specific details such as road markings, quality of infrastructure, and road signs – and by ProtoNets, a technique for classification from only a handful of examples.
PIGEON recently competed against Trevor Rainbolt, a top GeoGuessr player known simply as Rainbolt on YouTube, and won.
The boffins claim in their paper that PIGEON is the “first AI model that consistently beats human players in GeoGuessr, ranking in the top 0.01 percent of players.” Some 50 million or more people have played GeoGuessr, we’re told.
Alberti, a doctoral student at Stanford, told The Register: “It was a bit like our little DeepMind competition,” a reference to Google’s claim that its DeepMind AlphaCode system can write code comparable to human programmers.
“I think it was the first time the AI beat the best human in the world at GeoGuessr,” he said, noting that Rainbolt had prevailed in two previous matches with AI systems.
Image geolocation has become something of an art among open source investigators, thanks to the work of journalistic research organizations like Bellingcat. The success of PIGEON shows it is also a science – one with significant privacy implications.
While PIGEON was trained to geotag Street View images, Alberti believes this technique can make it easier to geotag almost any image, at least outdoors. He said he and his colleagues had tried the system with imagery datasets that didn’t include Street View imagery and it worked very well.
The other kind of intelligence
Alberti recounted a discussion with a representative from an open-source intelligence platform who expressed interest in their geolocation technology. “We think it’s likely that our method can also be applied to these scenarios,” he said.
When asked whether this technology will make it even harder to conceal where footage was captured, Alberti said that if you are on any street, successful geolocation becomes very likely because there are so many telltale signs of where you are.
“I was asked the other day, ‘What if you’re on a street somewhere in the middle of nature?’” he said. “Even there you have a lot of signs of where you might be, like the way the leaves are, the sky, the color of the ground. These can definitely tell you which country, or which part of a country, you’re in, but you probably can’t locate the city in question. I think indoor photos will probably still be very difficult to locate.”
Alberti said one of the main reasons PIGEON works well is because it relies on OpenAI CLIP as the base model.
“Many other geolocation models before just trained a model from scratch, or used an ImageNet-based model,” he said. “We noticed that by using CLIP as the base model, it has simply seen a lot more images and many more small details, and is therefore much better suited to the task.”
Alberti said using semantic geocells proved very important, because predicting raw coordinates tends to give poor results. “Even with CLIP as your base model, you’ll land in the ocean most of the time,” he said.
“We spent a lot of time optimizing these geocells, for example, making them proportional to population density in certain areas, and making them respect different administrative boundaries at multiple levels.”
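One way to picture geocells that are proportional to population density is a quadtree-style split: keep dividing a region until no cell holds more than a fixed number of training images, so image-dense areas end up with many small cells and sparse areas with a few large ones. The sketch below is a naive, hypothetical illustration of that balancing idea only – PIGEON’s actual semantic geocells also respect administrative boundaries at multiple levels, which this toy version ignores.

```python
# Hypothetical sketch: density-balanced geocells via recursive splitting.
# A cell (lat_min, lat_max, lon_min, lon_max) is split into quadrants
# until it contains at most `cap` training points.

def split_cells(points, box, cap=2):
    lat_min, lat_max, lon_min, lon_max = box
    inside = [(la, lo) for la, lo in points
              if lat_min <= la < lat_max and lon_min <= lo < lon_max]
    if len(inside) <= cap:
        # Keep the cell only if it actually contains training data.
        return [box] if inside else []
    lat_mid = (lat_min + lat_max) / 2
    lon_mid = (lon_min + lon_max) / 2
    cells = []
    for la0, la1 in ((lat_min, lat_mid), (lat_mid, lat_max)):
        for lo0, lo1 in ((lon_min, lon_mid), (lon_mid, lon_max)):
            cells.extend(split_cells(inside, (la0, la1, lo0, lo1), cap))
    return cells

# A dense cluster in Europe and a lone point in Australia: Europe gets
# several small cells, Australia one large cell.
pts = [(48.8, 2.3), (48.9, 2.4), (51.5, -0.1), (52.5, 13.4), (-33.9, 151.2)]
cells = split_cells(pts, (-90.0, 90.0, -180.0, 180.0))
print(len(cells), "cells")
```

Because every split keeps only non-empty quadrants, the resulting partition adapts its resolution to wherever the training imagery actually is.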
Haas, Skreta, and Alberti also devised a loss function – which measures the gap between the algorithm’s output and the expected output – that reduces the prediction penalty when the predicted geocell is close to the actual one. They then apply a meta-learning algorithm that refines location predictions within a given geocell to improve accuracy.
“That way we can sometimes match images up to about a kilometer away,” Alberti said.
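A distance-aware penalty of this kind can be sketched as label smoothing over geocells: instead of a one-hot target, probability mass is spread to nearby cells according to haversine distance, so cross-entropy punishes a near miss less than a guess on the wrong continent. The exponential smoothing and the `tau` temperature below are illustrative assumptions, not the paper’s exact formulation.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points, in kilometers.
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def soft_geocell_targets(true_idx, centroids, tau=75.0):
    # Distance-aware label smoothing: geocells near the true cell get
    # part of the probability mass. `tau` (km) is an assumed scale.
    true_lat, true_lon = centroids[true_idx]
    weights = [math.exp(-haversine_km(true_lat, true_lon, lat, lon) / tau)
               for lat, lon in centroids]
    total = sum(weights)
    return [w / total for w in weights]

def smoothed_cross_entropy(log_probs, targets):
    # Cross-entropy against the smoothed target distribution: a model
    # that puts mass on a neighboring cell is penalized less than one
    # that picks a cell on the other side of the planet.
    return -sum(t * lp for t, lp in zip(targets, log_probs))
```

With two geocells a few kilometers apart and one in another hemisphere, the nearby cell retains a large share of the target mass while the distant one gets essentially none.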
As Skreta noted in the Rainbolt video, PIGEON currently guesses 92 percent of countries correctly and has a median distance error of 44 km, which translates to a GeoGuessr score of 4,525. According to the research paper, the bird-themed model places roughly 40 percent of its guesses within 25 km of the target.
Game on. ®