Image: Bernd Dittrich (unsplash)
Scraped data on 2.6 million Duolingo users has appeared on a hacking forum, giving malicious actors what they need to run targeted phishing campaigns using the exposed details.
Duolingo is one of the world's leading language-learning platforms, with more than 74 million monthly active users.
In January 2023, someone offered the data of 2.6 million Duolingo users for sale on the now-defunct Breached hacking forum for $1,500.
The dataset mixes public information, such as login names and real names, with non-public information, including email addresses and internal details tied to the Duolingo service.
A user's real name and login name are publicly visible on their Duolingo profile, but the email addresses in the dataset raise the risk considerably, because they allow the public data to be used in malicious campaigns.
When the data was first offered for sale, Duolingo told The Record that it had been scraped from public profiles and that the company was investigating whether further protective measures were needed. However, Duolingo did not address the fact that the data also included email addresses, which are not publicly visible.
As first spotted by VX-Underground, the dataset of 2.6 million users has resurfaced on a new version of the Breached hacking forum, priced at 8 site credits, or roughly $2.13. A post on the forum announced that the Duolingo dataset was available for download.
The data was collected through a publicly accessible application programming interface (API). The API has been openly known since March 2023, with several people publicizing its existence and explaining how to use it.
Submitting a username to this API returns JSON output containing the user's public profile. More significantly, the same API could also be queried with an email address to check whether it belonged to an active Duolingo account.
BleepingComputer confirmed that the API remains freely accessible online despite reports of its misuse. The scraper fed the API a large number of email addresses, presumably exposed in earlier data breaches, to check which ones belonged to Duolingo accounts. The confirmed addresses were then used to build a dataset that merges public and non-public information.
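To make the exposure concrete, the following is a minimal Python sketch of why an email lookup matters. It is a toy, in-memory model and does not reflect Duolingo's actual API: the account records, field names, and the functions `lookup_permissive`, `lookup_restricted`, and `match_emails` are all hypothetical. It shows how a lookup that accepts an email address turns a list of previously leaked emails into confirmed accounts, and how refusing email queries removes that signal.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Account:
    username: str   # public
    real_name: str  # public
    email: str      # non-public


# Hypothetical sample records; not real user data.
ACCOUNTS = [
    Account("lingo_fan", "Alex Example", "alex@example.com"),
    Account("owl_lover", "Sam Sample", "sam@example.com"),
]


def public_profile(account: Account) -> dict:
    """Return only the fields that are visible on a public profile."""
    return {"username": account.username, "name": account.real_name}


def lookup_permissive(username: Optional[str] = None,
                      email: Optional[str] = None) -> Optional[dict]:
    """Toy lookup that accepts either a username or an email address."""
    for account in ACCOUNTS:
        if username is not None and account.username == username:
            return public_profile(account)
        if email is not None and account.email == email:
            return public_profile(account)
    return None


def lookup_restricted(username: Optional[str] = None,
                      email: Optional[str] = None) -> Optional[dict]:
    """Assumed hardening: email queries are refused outright."""
    if email is not None:
        return None  # same answer whether or not the account exists
    return lookup_permissive(username=username)


def match_emails(lookup: Callable[..., Optional[dict]],
                 candidate_emails: list) -> dict:
    """Map each candidate email to a profile when the lookup confirms it."""
    matched = {}
    for email in candidate_emails:
        profile = lookup(email=email)
        if profile is not None:
            matched[email] = profile
    return matched


leaked = ["alex@example.com", "nobody@example.com"]
print(match_emails(lookup_permissive, leaked))  # one email is tied to a profile
print(match_emails(lookup_restricted, leaked))  # no email is confirmed
```

In this sketch the restricted lookup still serves public profiles by username, so the only thing removed is the ability to link a private email address to a public identity. Whether such a restriction would suit a real service depends on how its own clients use the endpoint.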
Another threat actor shared details of their own scrape carried out through the API. They advised anyone planning to use the data for phishing to pay attention to specific fields that indicate a Duolingo user has elevated permissions, which makes those accounts more valuable targets.
BleepingComputer contacted Duolingo to ask why the API remains accessible but has not yet received a response.
The Perspective on Scraped Data
Companies often treat scraped data as a minor concern because most of it is already public, even though compiling it is not necessarily easy.
However, when public data is combined with private details such as phone numbers and email addresses, the exposed information becomes riskier and could violate data protection regulations.
Case in point: in 2021, Facebook suffered a major data leak after a flaw in the API behind its "Add Friend" feature was abused to link phone numbers to accounts, affecting 533 million users. Ireland's data protection authority later fined Facebook €265 million ($275.5 million) over the leak.
More recently, a Twitter API bug was used to scrape the public details and email addresses of millions of user accounts, an incident that also drew the attention of data protection regulators.