Kaggle fixes vulnerability that disclosed private leaderboard data via API
Kaggle, an online community of data scientists and machine learning (ML) practitioners, has been exposing private competition data due to a misconfigured API.
Kaggle, a Google subsidiary, frequently hosts competitions with monetary rewards for its community.
This week, multiple Kaggle users reported that they could see private leaderboard fields in the JSON returned by HTTP responses, in one of the biggest data leaks.
This has left many contestants concerned about the legitimacy of contests held so far and the monetary rewards earned by their winners.
Private leaderboard results exposed in HTTP responses
As first reported by Analytics India Magazine, researchers observed Kaggle had been exposing private leaderboard results via its HTTP API.
Ideally, these fields should be kept private until a competition ends to prevent cheating.
In a forum thread titled “Is private leaderboard accessible?”, more community members shared their experience.
Luca Massaron, a senior data science and modelling expert, demanded, “At this point Kaggle should make a public statement explaining for how long that has been exposed and if that has been exploited by someone.”
“When such a thing happens in a company, we call it a ‘data breach,’ and this is truly a data breach for the importance of the critical information that leaked out without anyone being aware of it,” said Massaron.
But Kaggle has told Security Report that a data breach never took place.
Its official response describes the issue as a vulnerability that let users view their own private leaderboard scores and ranks (but not those of others) before a competition deadline.
Kaggle users could open their web browser’s built-in developer tools, such as those in Chrome, and inspect network requests.
When viewing the asynchronous requests and responses under the XHR tab, the getUser API response revealed JSON fields with private data.
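As a rough illustration of what spotting such fields could look like, the sketch below walks a JSON payload and lists any non-null keys whose names contain "private". The payload and its field names (`privateScore`, `privateRank`, and so on) are hypothetical; the actual structure of the getUser response was not published in the reports.

```python
import json


def find_private_fields(node, path=""):
    """Return (path, value) pairs for non-null keys whose name contains 'private'."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if "private" in key.lower() and value is not None:
                hits.append((child, value))
            # Keep descending so nested objects are checked too.
            hits.extend(find_private_fields(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_private_fields(item, f"{path}[{index}]"))
    return hits


# Hypothetical payload; these field names are assumptions for illustration only.
sample = json.loads(
    '{"userName": "example", "submission": '
    '{"publicScore": 0.91, "privateScore": 0.87, "privateRank": null}}'
)

for field_path, value in find_private_fields(sample):
    print(field_path, "=", value)
```

A check along these lines could also run as an automated test against API responses, so that a field meant to stay hidden fails the build rather than reaching users.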
This means contestants could view their own private scores and potentially use them to cheat their way to the top, rendering moot the hard work put in by honest contestants.
“This is a very serious and unfortunate issue. Winning or getting at top of Kaggle competitions [requires a] huge workload and dedication. Knowing that people could easily probe the private results to adjust their model and submissions can be discouraging,” said Sarigne.
Kaggle aware and working on a fix
A Kaggle representative responded within the same thread:
“Please know we’re aware of this, looking into it, and working on it! It may take a little bit of time for us to release a full statement as we evaluate the full breadth and depth of the issue, including which competitions may have been impacted and the full extent of the impact on leaderboard results,” said staff member Addison Howard.
The bug appears to have been fixed, according to multiple users, who are now seeing null being returned instead of confidential data.
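Kaggle has not described how the fix was implemented. One common way to get the behaviour users reported is to blank out private fields on the server until the competition deadline has passed. The following is a minimal sketch under that assumption; the function name and field names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical field names; the real API schema is not public.
PRIVATE_FIELDS = {"privateScore", "privateRank"}


def redact_private_fields(record, deadline, now=None):
    """Return a copy of record with private fields set to None before the deadline."""
    now = now or datetime.now(timezone.utc)
    if now >= deadline:
        # Competition is over: private results may be shown.
        return dict(record)
    return {
        key: (None if key in PRIVATE_FIELDS else value)
        for key, value in record.items()
    }


record = {"publicScore": 0.91, "privateScore": 0.87, "privateRank": 12}
deadline = datetime.now(timezone.utc) + timedelta(days=7)

# None is serialised as JSON null, matching what users reported seeing.
print(json.dumps(redact_private_fields(record, deadline)))
```

Redacting at the point where the response is built, rather than hiding the values in the page, keeps them out of anything visible in the browser's network inspector.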
Although the bug may have been discovered only yesterday and was patched rapidly, it casts doubt on the legitimacy of competitions held so far and on whether Kaggle's awards have gone to deserving, honest winners.
“An investigation into the origin of the issue revealed this bug may have been live as early as November 18, 2019. Our preliminary findings indicate that this was not widely accessed. We have identified one individual who we believe used this bug to gain an advantage in a past competition and will continue our investigation and take appropriate actions if warranted. Thankfully, we expect the results of the majority of closed competitions to remain unchanged,” Kaggle said in its statement.
“There is no sugarcoating how long this issue was live and we are profoundly disappointed this bug slipped past our checks and tests. Your hard work depends on the integrity of our platform and we do not take this responsibility lightly,” the statement continued.
Update 31-December-2020: Included Kaggle’s official response above and corrected that this issue occurred from a vulnerability, not a data breach.