Report: World-Renowned Education Company Exposes Details of Over 100,000 Students Worldwide in Massive Data Breach
vpnMentor's research team has discovered a data breach related to McGraw Hill, an education publishing company based in the USA.
McGraw Hill’s online education platform is used by universities across the USA and Canada to host and facilitate online classes. As a result of the breach, hundreds of thousands of students were potentially exposed to a range of online attacks. Their private data, including their grades and personal information, was made available to anyone with a web browser.
Our team discovered two misconfigured Amazon Web Services (AWS) S3 buckets apparently belonging to McGraw Hill: one production bucket with more than 47 million files and 12TB+ of data, as well as one non-production bucket with more than 69 million files and 10TB+ of data. In total, the buckets contained more than 22TB of data and over 117 million files.
Data Breach Summary
|Headquarters|New York City, USA|
|Industry|Education, Publishing, E-learning|
|Size of data|22+ TB|
|Suspected no. of files|Over 117,500,000|
|No. of people exposed|Hundreds of thousands|
|Types of data exposed|PII (names, email addresses, grades, and more) from students and staff, course content, McGraw Hill’s backend files (incl. PDFs, spreadsheets, and image files), and much more|
|Potential impact|Privacy violation, phishing attempts, doxing, violation of FERPA regulations|
|Data storage format|AWS S3 bucket|
McGraw Hill is one of the big three American education content publishing companies. According to their website, they’ve been in the publishing industry since McGraw Publishing Company and Hill Publishing Company merged in 1917.
The company has been building online education software for years, but the Covid-19 pandemic accelerated this branch of the company’s business. Students used this software to access lectures, upload homework, and much more.
Timeline of Discovery and Owner Reaction
- Date discovered: June 12, 2022
- Date of 1st contact attempt with McGraw Hill: June 13, 2022
- Date of 2nd contact attempt and follow up to previous contacts: June 15, 2022
- Date of 3rd contact attempt and follow up to previous contacts: June 20, 2022
- Follow up to previous contacts: June 22, 2022 and June 24, 2022
- Date of 4th contact attempt: June 27, 2022
- Dates we contacted US-CERT (via email and reporting form): June 27, 2022; June 29, 2022; July 01, 2022; July 04, 2022
- Date of 5th contact attempt and follow up to previous contacts: June 29, 2022
- Follow up to previous contacts: July 01, 2022 and July 04, 2022
- Date we contacted hosting service (AWS): July 07, 2022
- Date of Response: July 09, 2022
- Follow up sent to AWS: August 16, 2022
- Contact made with McGraw Hill through their website’s live chat. Received contact information for senior cybersecurity director and sent email: September 8, 2022
- Follow up sent to senior cybersecurity director: September 19, 2022
- Follow up sent to senior cybersecurity director, and response received: September 21, 2022
- Date of Action: Sensitive files removed from the public buckets on July 20, 2022, according to McGraw Hill senior cybersecurity director
It appears that McGraw Hill was using AWS S3 buckets to store data collected from their online education service.
In this case, our team originally discovered two unsecured Amazon Web Services (AWS) S3 buckets containing over 22TB of files and data. Upon investigating, we determined that the data belonged to McGraw Hill’s online learning platform, which was connected to the AWS account.
Once we confirmed that McGraw Hill was responsible for the data breach, we contacted the company to notify them and offer our assistance. After our initial attempt at outreach failed, we tried the company again and expanded our outreach to more departments within McGraw Hill, including their chief information security officer.
With no reply from anyone we contacted at McGraw Hill, we emailed the United States Computer Emergency Readiness Team (US-CERT).
In total, we messaged McGraw Hill nine times between June 13 and July 4, but we never received a reply. Additionally, we reached out to US-CERT four times between June 27 and July 4, but did not receive a reply.
On July 7, we contacted AWS to inform them about their customer’s exposed buckets, and we followed up on August 16. It is important to note that Amazon isn’t responsible for the misconfiguration.
After getting no response, we tried other ways to contact the company. By using McGraw Hill’s live chat on their website, we received contact information for their senior cybersecurity director. We emailed him on September 8, then sent follow-up emails on September 19 and 21 after receiving no response.
On September 21, McGraw Hill’s senior cybersecurity director responded to our messages and told us the sensitive files had been removed from the public buckets on July 20.
Example of Entries in the S3 Bucket
This breach from McGraw Hill was significant in both the amount of data exposed and the number of people and organizations it could affect. If malicious or criminal actors discovered the exposed data, it could bring harm to students, teachers, universities, and McGraw Hill itself.
Types of files we saw in the breach include:
- Excel sheets listing student names, email addresses, and grades;
- Files showing students’ completed assignments, grades, and performance reports;
- Files showing syllabi from teachers;
- Reading material for certain courses;
- Private digital keys from McGraw Hill;
- Source code from McGraw Hill.
Leaked digital keys mean bad actors could decrypt data from McGraw Hill, or even access their servers. Meanwhile, leaked source code allows bad actors to more easily search for other vulnerabilities.
We estimate that this exposure potentially affected hundreds of thousands of students. In the limited sample we researched, the number of records varied from ten to tens of thousands of students per file. Because of the number of files exposed, and because we only reviewed a small sample in line with ethical rules, the actual total number of affected students could be far higher than our estimate.
This breach exposed students from universities across the US, Canada, and elsewhere, including:
- Johns Hopkins University
- University of California, Los Angeles
- University of Toronto (Canada)
- University of Michigan
- McGill University
- University of Illinois
- Washington University in St Louis
The following screenshots are samples of the types of data exposed, which we’ve redacted in order to protect the identities of the people affected. In order to confirm that the data related to real individuals instead of data from a platform test, we used publicly available information to verify a small sample of records in the database. Taking the PII data from numerous records, we found the social profiles of students on various social media platforms that matched the records in McGraw Hill’s open buckets.
Data Breach Impact
We are unable to determine if any malicious hackers found the unsecured buckets before McGraw Hill deleted the sensitive files. The exposed data would have been enough for skilled hackers to commit many of the most common forms of fraud or online attack against the students exposed, including:
- Identity theft
- Phishing campaigns
- Doxing and harassment
- Many more…
Even if the exposed data wasn’t sufficient on its own to exploit for criminal gain, it could still be used to carry out complex phishing campaigns.
In a phishing campaign, criminals send victims fake emails imitating real businesses and organizations. By building the victim’s trust, they hope to trick them into any of the following actions:
- Providing additional PII (e.g., Social Security numbers) or private information (e.g., bank account details) that can be used in the fraudulent activities listed above.
- Entering debit or credit card details into a fake payment portal so they can be scraped and used by criminals or sold on the dark web.
- Clicking a link that installs malicious software, such as spyware or ransomware, on the user’s device.
Due to the number of people exposed in this data breach, cybercriminals would only need to successfully scam a small fraction for any criminal scheme to be considered successful. Furthermore, once this information is out in the open, it may be used against the victim repeatedly for the rest of their life.
The data breach is also a violation of privacy for students using McGraw Hill’s software and services. It exposed sensitive information like completed assignments and performance reports.
An important part of the university experience is having a safe environment to learn and progress. However, a data breach like this violates that trust by exposing students’ efforts and failures to anyone: family, friends, future education facilities, potential employers, and so on.
Furthermore, under US federal law, student education records are official and confidential documents, by virtue of the Family Educational Rights and Privacy Act (FERPA). A student’s grades may not be released or posted in any personally identifiable way without prior written permission from the student. As a result, by exposing these records, McGraw Hill may be in direct violation of FERPA, and could face enforcement actions from the relevant US government bodies.
Impact for McGraw Hill and its Clients
On top of exposing data from the students, this data breach also exposed some sensitive information for McGraw Hill.
In the files, we saw items such as digital keys and source code, which could cause additional harm for McGraw Hill in the future. For example, bad actors could use digital keys to access McGraw Hill’s servers and cause further breaches of data. Meanwhile, exposed source code makes it much easier for hackers to find vulnerabilities in a product or database and gain access to highly sensitive areas which data security protocols would typically protect.
In addition, a breach like this can damage consumer trust in the McGraw Hill brand and the brand of the universities which use its products.
Advice from the Experts
It’s important to note that open, publicly viewable S3 buckets are not a flaw of Amazon Web Services. They’re usually the result of an error by the owner of the bucket. Amazon provides detailed instructions to AWS users to help them secure S3 buckets and keep them private.
In the case of McGraw Hill, the quickest way to fix this error would be to:
- Make the bucket private and add authentication protocols.
- Follow AWS access and authentication best practices.
- Add more layers of protection to their S3 bucket to further restrict who can access it from every point of entry.
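As a rough illustration of the first step, the sketch below builds the four S3 Block Public Access settings and runs a simplified check for bucket policy statements that grant access to everyone. It is a minimal sketch, not McGraw Hill’s configuration: the bucket name, account ID, and example policy are hypothetical, and the check is a heuristic that is much simpler than the way AWS itself decides whether a bucket is public.

```python
def block_public_access_config() -> dict:
    """Return the four S3 Block Public Access settings, all enabled.

    A dict of this shape is what boto3's put_public_access_block call
    accepts as PublicAccessBlockConfiguration; nothing is sent to AWS here.
    """
    return {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }


def _as_list(value):
    """Policy fields may hold a single item or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def public_statements(policy: dict) -> list:
    """Return Allow statements with a wildcard principal and no Condition.

    Simplified heuristic (an assumption for illustration): any condition at
    all is treated as a restriction, which is not always true in practice.
    """
    flagged = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        is_wildcard = principal == "*" or (
            isinstance(principal, dict) and "*" in _as_list(principal.get("AWS"))
        )
        if is_wildcard and "Condition" not in stmt:
            flagged.append(stmt)
    return flagged


# Hypothetical policy: the first statement is world-readable, the second
# is limited to a single (made-up) AWS account.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-course-files/*",
        },
        {
            "Sid": "InternalRead",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-course-files/*",
        },
    ],
}

if __name__ == "__main__":
    for stmt in public_statements(EXAMPLE_POLICY):
        print("Publicly readable statement:", stmt["Sid"])
```

Enabling all four settings at the account or bucket level is intended to override public ACLs and policies even if one is added later by mistake, which is why it is usually the first change to make.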
For The Public
If you think you’ve interacted with McGraw Hill recently and are concerned about how this breach might impact you, contact the company directly to find out what steps it's taking to protect your data.
About Us and Previous Reports
vpnMentor is the world’s largest VPN review website. Our research lab is a pro bono service that strives to help the online community defend itself against cyber threats while educating organizations on protecting their users’ data.
Our ethical security research team has discovered and disclosed some of the most impactful data breaches in recent years.
This has included an enormous data breach exposing over 10,000 students from a document verification program in Israel and India. We also revealed that personal information like financial documents and health records was exposed when a public agency in Makati, Philippines experienced a data breach. You may also want to read our VPN Leak Report and Data Privacy Stats Report.
The purpose of this web mapping project is to help make the internet safer for all users. We never sell, store, or expose any information we encounter during our security research.