Have I Been Pwned Alerts You to Data Breaches Even When the Websites Won’t

Gail Lobel Rand, Technical Editor and Interviewer. Updated on July 02, 2023

Despite advances in data security, many of the world’s largest companies still fall prey to data breaches. Troy Hunt, the founder of Have I Been Pwned, tells us how these breaches occur, how stolen data is used, its impact on corporations and individuals, and most importantly – gives us the tools to know if our personal data has been compromised and how to best protect ourselves after the fact.

How did you get into online security?

My background is in software development, later specializing in application architecture. When I worked for a large pharmaceutical company, they were outsourcing all their development to low-cost vendors in cheap Asia-Pacific markets and getting the sort of terrible software you'd expect from the lowest bidder. I saw loads of different security vulnerabilities which to me were really obvious but to them, not so much. So, as a way of educating others, I started blogging about security vulnerabilities and how to address them in application code. At the time, there really wasn't much good stuff out there for software developers, so this fulfilled the need and the rest is history.

What are the most common security vulnerabilities you uncovered?

All the obvious stuff like SQL injection, insecure direct object references, poor password storage, lack of transport layer encryption. Basically, everything in the OWASP Top 10, the most canonical document about how to do application security right, was consistently being done wrong.
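To make the first item on that list concrete, here is a minimal sketch of SQL injection and its standard fix, a parameterized query. It is not taken from the interview: the table, the sample rows, and the function names are all hypothetical, and Python's built-in sqlite3 module stands in for whatever database a real application would use.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two made-up users (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_user_vulnerable(conn, name):
    """Builds the SQL by string concatenation, so input can rewrite the query."""
    query = "SELECT email FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def find_user_safe(conn, name):
    """Passes the input as a bound parameter, so it is only ever treated as data."""
    query = "SELECT email FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = build_demo_db()
payload = "nobody' OR '1'='1"  # classic injection string

leaked = find_user_vulnerable(conn, payload)   # the WHERE clause becomes always true
blocked = find_user_safe(conn, payload)        # no user has this literal name

print("vulnerable query returned:", leaked)
print("parameterized query returned:", blocked)
```

With the concatenated query, the payload turns the filter into a condition that matches every row, so all email addresses come back. The parameterized version looks for a user whose name is literally the payload string and finds none.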

But companies spend a lot of money on data protection. How is it so easily compromised?

It's an interesting thing. We see large companies with big security budgets focusing on things like critical systems and internal development, but they don't necessarily extend that focus to everything else in the periphery. Then there is the question of when security is applied. Many organizations still have their software “built” (makes air quotes), finished, and tested by a vendor, and only apply security once it’s delivered. So their security review identifies a whole bunch of vulnerabilities at the worst possible time: the end of the project.

The further down the project lifecycle you go, the more expensive it is to fix these defects by going back, rewriting code, redoing integration tests, user acceptance testing, etc. I mean that's just a terrible way to do security. I want to help companies avoid that vulnerability in the first place.

How do companies know when they've been breached and what data has been stolen?

(Laughs) Well, very often, companies know they've been breached when I tell them! It is not unusual that the first an organization knows of a data breach is when someone like me approaches them with their database. Most organizations, even some of the world's largest or most significant websites, are just ill-equipped to identify when an intrusion is taking place and data is being exfiltrated.

Think of Sony Pictures. I mean that attack took place over an extended period, siphoned out a massive amount of data from lots of different systems and the first they knew of it was when employees saw “hacked by guardians of peace” on their screens.

What type of data are hackers after?

Depending on their motives, anything on the internet is a target. Usernames and passwords are particularly valuable because password reuse means you potentially have the key to lots of other sites. Obviously, anything of a financial nature, such as credit card details and bank account information is an old favorite. Increasingly, information which is digitized within organizations, especially anything that's state-backed, is very advantageous for corporate espionage. But there's plenty of people out there, mostly kids, that only want a trophy. They don't care how valuable the data is; they don't care what sort of website it is, they’re happy just to get in and deface a site.

How does stolen data impact companies and individuals?

It depends on the type of data. If the information is not sensitive, it may have a minor impact on the individual but a great impact on the breached organization by damaging its reputation. For Sony Pictures, it was unreleased movies, which are valuable intellectual property.

Credentials can be used to break into accounts, for sending spam, and for very targeted phishing. In the case of Ashley Madison, we saw blackmail attempts which were effectively just mail merges using data from the breach. Unfortunately, this resulted in resignations, divorces, and even suicides, so obviously, the impact can be very severe.

Does a company have to notify individuals of data breaches?

That depends on your jurisdiction. In Australia, for example, we have pretty light mandatory disclosure laws; in fact, we only got our first mandatory disclosure laws last year. If a company has revenue of less than 3,000,000 Australian dollars a year, which is more than 90% of Australian businesses, or if the breached data is unlikely to cause serious harm, then they are not required to disclose.

Under the GDPR in Europe, the rules are much more stringent. There's a more privacy-centric sentiment that personal data belongs to the individual rather than the company, and any misuse should be disclosed. I think that organizations, regardless of where they are located, should recognize that even if it's just my email address, it's MY email address and if you lose or expose it, I should be notified.

Why did you start Have I Been Pwned, and what service does it provide?

I was doing a lot of data breach analysis back around 2013 and saw some interesting patterns. For example, an email address that appeared in multiple data breaches often had the same password in each. We already knew that people use the same credentials to log on to multiple sites, but it was interesting to actually see the data. Another thing I found curious was that a lot of people who had been in data breaches were unaware of it. So I wanted to create an aggregation service where people could get a better view of their overall footprint and how much of their data has been exposed. That was the genesis for Have I Been Pwned.
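The aggregation idea can be sketched in a few lines. This is an illustration only, assuming each breach is available as a plain list of email addresses; it does not describe how Have I Been Pwned is actually built, and the breach names, addresses, and function names are all made up.

```python
from collections import defaultdict


def build_breach_index(breaches):
    """Map each normalized email address to the set of breaches it appears in.

    `breaches` is a hypothetical dict of {breach_name: [email, ...]}.
    """
    index = defaultdict(set)
    for breach_name, emails in breaches.items():
        for email in emails:
            index[email.strip().lower()].add(breach_name)
    return index


def breaches_for(index, email):
    """Return a sorted list of breach names containing the given address."""
    return sorted(index.get(email.strip().lower(), set()))


# Made-up sample data for illustration.
sample_breaches = {
    "ExampleForum": ["alice@example.com", "bob@example.com"],
    "ExampleShop": ["Alice@Example.com ", "carol@example.com"],
}

index = build_breach_index(sample_breaches)
print(breaches_for(index, "alice@example.com"))
print(breaches_for(index, "dave@example.com"))
```

Normalizing case and whitespace before indexing means the same address is matched even when different breaches record it slightly differently, which is what lets one lookup show a person's overall footprint.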

How do you acquire breached databases?

Initially, I went out and grabbed publicly available information. Then over time, as the project started to get a lot of public support, more and more people came forward and provided me with data from a particular breach. Just this weekend I had someone pop up out of the blue and send a huge amount of data from seven different breaches, most of them containing tens of millions of records each.

It does raise the question: who are these people? Are they the bad guys? Are they people who are just trading information? The reality is that it's a bit of everything. I'm sure that in some cases they are the people who actually compromised the system and pulled the data, but in most cases, it's people who make a hobby of anonymously collecting and monitoring data breaches.

There are also plenty of professional white hat security researchers who proactively contact organizations to tell them their data is exposed, often offering guidance on how to protect it in future. Once the breach has been fixed, they give me the data and ask to be acknowledged as the source, as a way of publicizing their work.

What is a paste?

A paste is text which is literally just pasted onto a website. You can go to a service like Pastebin, paste in some text, save it, and you've got a URL you can share with others. Data from breaches often appears first in pastes, since hackers will paste a segment of the data as proof that they’ve accessed the system.

While this is a good early indicator of breached data, there's often a lot of junk in pastes as well, so whenever an email address is found, we provide a link to the paste so you can decide if you need to do anything.
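As a rough illustration of what finding an email address in a paste can involve, the sketch below pulls address-like strings out of raw paste text. The regular expression is a deliberately simple assumption, not the matching logic the service uses, and the sample paste is invented.

```python
import re

# A simple, permissive pattern for address-like strings (an assumption for
# illustration; real-world email matching is considerably messier).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_in_paste(text):
    """Return the unique, lower-cased email addresses found in paste text."""
    return sorted({match.lower() for match in EMAIL_PATTERN.findall(text)})


# Invented paste content: credential pairs mixed with junk lines.
sample_paste = """alice@example.com:hunter2
just some junk text with no address
Bob@Example.org;password1
alice@example.com:hunter2
"""

found = emails_in_paste(sample_paste)
print(found)
```

Because pastes contain a lot of junk, a match like this only says that an address appears in the text, not that the paste is a genuine breach, which is why linking back to the paste for a human to judge makes sense.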

What actions do you recommend for an individual whose data has been stolen?

It really depends on the nature of the data. If it's a password that’s been reused in other places, then it needs to be changed on every single site. A password manager is a good tool to make them all unique. If it's a password on just that single site, it’s a lot easier. Additionally, once sites are aware of a data breach, they usually reset all passwords anyway. If it's something more personal like a credit card, go ahead and cancel the card. If it's your home address, your phone number or your date of birth, well those are things that you're not going to change just because of a data breach, but it’s a good idea to utilize a credit monitoring service since that is the sort of data used for identity theft.

What is the “Hack Yourself First” workshop?

I’ve run the “Hack Yourself First” 2-day workshop about 80 times all over the world during the last four years. It’s designed to help software developers, sysadmins, IT pros, etc. understand how their security vulnerabilities are exploited so they can better protect themselves. I take participants through the full lifecycle: here’s how SQL injection works, here's how attackers get data out, and of course, here's how to write code so it doesn't happen to you.


About the Author

Gail Lobel Rand Technical Editor and Interviewer

Gail’s first PC was a TRS-80 which required a cassette tape to boot up. In the decades that followed, she created and developed websites, emails, and banners as the perfect way to combine her love for design, technology, and writing.
