Gravatar API lets you scrape millions of user profiles
Yesterday, I reported on BleepingComputer how the Gravatar API makes it easy for web scrapers and bots to automatically enumerate Gravatar profiles and download user data.
While the official documentation indicates that profiles should only be accessible to someone who knows the user’s username or account email, a security researcher has found a hidden route that renders that guideline moot.
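To make the contrast concrete, here is a minimal sketch of the documented lookup, assuming the MD5 email-hash scheme Gravatar described at the time: the profile URL is derived from a hash of the trimmed, lowercased email address, so you cannot build it without already knowing that address. The helper names `email_hash` and `profile_url_for_email` are hypothetical, chosen for illustration.

```python
import hashlib

# Same host as the URLs quoted in this article.
PROFILE_BASE = "http://en.gravatar.com"

def email_hash(email: str) -> str:
    """Hypothetical helper: MD5 hex digest of the trimmed, lowercased email
    (assumes the MD5 scheme from Gravatar's documentation of the time)."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def profile_url_for_email(email: str, fmt: str = "json") -> str:
    """Hypothetical helper: documented-style profile URL keyed by email hash."""
    return f"{PROFILE_BASE}/{email_hash(email)}.{fmt}"

# Whitespace and letter case do not change the resulting URL.
print(profile_url_for_email("  MyEmailAddress@example.com "))
```

Because the hash is one-way, the documented route only works for someone who already has the email address, which is exactly the barrier the hidden route bypasses.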
Italian security researcher Carlo Di Dato discovered that hidden API routes like the ones shown below allow a web scraper or bot to enumerate every single Gravatar profile and collect its public data.
http://en.gravatar.com/1.json
http://en.gravatar.com/2.json
http://en.gravatar.com/3.json
Similar endpoints return data in other formats too. For example, my own Gravatar profile (ID 178860042) has an XML representation in addition to the JSON one:
http://en.gravatar.com/178860042.xml
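The URL patterns above show why this is so easy to automate: with sequential numeric IDs, a bot needs nothing more than a counter. The sketch below only builds URLs and sends no requests. It assumes IDs start at 1 (as in the examples above), and it limits formats to the JSON and XML ones shown here, although Gravatar may support others. The function names are hypothetical.

```python
from typing import Iterator

BASE = "http://en.gravatar.com"
# Only the two representations shown in this article; others may exist.
FORMATS = ("json", "xml")

def profile_url_for_id(profile_id: int, fmt: str = "json") -> str:
    """Hypothetical helper: profile URL addressed by numeric ID, no email needed."""
    if profile_id < 1:
        raise ValueError("profile IDs are assumed to start at 1")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BASE}/{profile_id}.{fmt}"

def enumerate_profile_urls(start: int, stop: int, fmt: str = "json") -> Iterator[str]:
    """Yield profile URLs for IDs start..stop-1; enumeration is just a loop."""
    for profile_id in range(start, stop):
        yield profile_url_for_id(profile_id, fmt)

for url in enumerate_profile_urls(1, 4):
    print(url)
```

Compared with the email-hash route, nothing secret goes into these URLs, so every profile is reachable by simply incrementing a number.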
According to a user on Y Combinator’s Hacker News, upwards of 194.13 million Gravatar profiles are open to scraping because of this intentional design flaw.
This figure is, of course, only an approximation, as profile IDs beyond that number continued to return valid Gravatar user data.
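It is not clear how the Hacker News user arrived at the figure. One plausible way to estimate such a number is to probe for the highest ID that still returns a profile, using exponential probing followed by binary search. The sketch below takes a hypothetical `exists(id)` predicate in place of real HTTP requests. It assumes IDs are dense (valid up to some maximum, invalid after it), and that assumption is exactly why any such count is only approximate.

```python
from typing import Callable

def highest_valid_id(exists: Callable[[int], bool], start: int = 1) -> int:
    """Estimate the largest ID for which exists(id) is True.

    Assumes every ID up to the maximum is valid and none after it. Gaps
    (e.g. deleted accounts) break this assumption and skew the result.
    """
    if start < 1 or not exists(start):
        raise ValueError("start must be a valid ID >= 1")
    # Exponential probing: double the bound until it hits an invalid ID.
    lo, hi = start, start * 2
    while exists(hi):
        lo, hi = hi, hi * 2
    # Binary search keeping exists(lo) True and exists(hi) False.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if exists(mid):
            lo = mid
        else:
            hi = mid
    return lo

print(highest_valid_id(lambda i: i <= 194_130_000))
```

Each call needs only a few dozen probes, but a single gap can mislead it: if IDs up to 1,000 exist except 512, the search stops at 511. Real ID spaces with gaps can therefore produce a figure well short of the true count.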
It is unlikely that Gravatar will patch this design flaw.
Because the user data returned by the API is already public, this does not constitute a data leak or privacy violation.
However, as Di Dato told BleepingComputer, this design choice makes it easier for anyone to enumerate user accounts and engage in mass data collection.
Referring to a Gravatar user whose data Di Dato was able to view, the researcher said, “Of course, Mr. Stephen knows that registering on Gravatar, his data will be publicly accessible. What I’m almost sure he doesn’t know, is that I was able to retrieve this data querying Gravatar in a way which should not be possible.”
He continued, telling BleepingComputer, “As Gravatar states in its guides, I should have Mr. Stephen’s email address or his Gravatar user name to perform the query. Without this information, it should have been almost impossible for me to get Mr. Stephen’s data, right?”
Design choices that enable enumeration of public user data may not seem like an obvious security issue. However, what is to stop a spammer from abusing this one to build a marketing list of over a million contacts, especially from profiles that expose the user’s email address and phone number?