LinkedIn Explains Data Scraping Amid Reports of More Data Hacks and Breaches
In the past few months, there have been various reports of major LinkedIn data hacks, with huge databases of user information offered on the dark web to the highest bidder.
In April, Cyber News reported that 500 million LinkedIn users’ personal information was put up for sale on various hacking forums, while just last month another set, which allegedly contains information from 700 million LinkedIn profiles, also became available online.
In each case, LinkedIn has denied that this indicates a breach of its security and has instead pointed to “data scraping” as the culprit: the (mostly legal) process of gathering publicly available information from platforms on a large scale, then combining it with material from other sources to create larger data sets.
As LinkedIn explained in response to the most recently reported leak:
“Our teams investigated a set of alleged LinkedIn data that was put up for sale. We want to make it clear that this is not a data breach and that no private LinkedIn member data has been disclosed. Our initial research found that this data was scraped from LinkedIn and other various websites and includes the same data reported earlier this year in our scraping update from April 2021.”
However, despite these explanations, a certain level of user anxiety remains. That’s why, as part of an effort to provide more context about what actually happened and what action to take, LinkedIn today released an overview of how data scraping works and what users can do to better protect their LinkedIn profiles in the future.
According to LinkedIn:
“Scraping has been around since the beginning of the Internet, but it has increased dramatically in scope and complexity. What we hear about most today is unauthorized scraping, which uses code and automated collection methods to make up to thousands of queries per second and bypass technical blocks in order to take data without permission. Scraped data can be collected from multiple websites, combined and sold in bulk for phishing and other campaigns designed to trick you into disclosing private information.”
LinkedIn has worked for years to prevent third parties from scraping its user data and is even going to the Supreme Court to stop a particular company from collecting public information from LinkedIn profiles for its own purposes. But this case hasn’t gone in LinkedIn’s favor so far. Even if it wanted to legally block data scraping completely, it can’t, which in some ways limits its ability to respond.
An important consideration here is how much data LinkedIn makes publicly available. LinkedIn could further restrict access to user information, which would also limit scraping, but that would make members harder to find in the app, through search engines, and by other means, which would limit the platform’s broader uses.
For example, LinkedIn currently shows your name and job title to all searchers unless you’ve made your profile private. That data is then accessible to search engines, which can help increase findability. LinkedIn could limit this further, but if you ever want to be found for relevant searches both inside and outside the platform, which is a major value proposition of the app, it has to keep some level of this information available to users and search tools.
As such, LinkedIn is caught somewhere in between as it manages how much profile data it makes publicly available and how much it hides behind privacy settings. Users also have a choice about how much of their personal data they make publicly available.
“Spend some time looking at what information you’ve added, from contact details to your work history, and familiarizing yourself with your settings. Also, check out your public profile page to understand what information may be public and to make sure it is exactly what you want to display to search engines and other off-LinkedIn services. You can narrow down or adjust the selection if you want.”
Even so, unauthorized scraping is not a breach or “hack”.
“Scraping does not mean that an attacker has succeeded in breaking into secure systems, circumventing firewalls or accessing protected network information. Unauthorized scraping can mean malicious actors gather a lot of data and use it in ways you did not expect.”
LinkedIn uses bot detection and rate limiting tools to curb such activity, but the most important point LinkedIn wants to highlight is that these reported incidents are not the result of hacking or data breaches as such. Users can further restrict their data to ease concerns, but scraping in some form is likely to always be around.
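To give a rough sense of what rate limiting means in practice, here is a minimal Python sketch of a sliding-window limiter that caps how many requests a single client can make in a given period. LinkedIn has not published how its own tools work, so this is only an illustration of the general technique: the class name `SlidingWindowLimiter`, the thresholds, and the client identifier are all hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical example: allow at most max_requests per client per window."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def allow(self, client_id):
        """Return True if this request is within the limit, False otherwise."""
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject or challenge this request
        hits.append(now)
        return True


if __name__ == "__main__":
    # Assumed policy for illustration only: 5 requests per second per client.
    limiter = SlidingWindowLimiter(max_requests=5, window_seconds=1.0)
    for i in range(8):
        print(i, limiter.allow("203.0.113.7"))
```

A scraper firing thousands of queries per second from one address would hit a cap like this almost immediately, which is why large-scale scrapers tend to spread requests across many addresses and accounts, and why rate limiting is usually paired with bot detection rather than used alone.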
LinkedIn is still pursuing a lawsuit against hiQ Labs over its use of LinkedIn member data, which could set a precedent that would give platforms more power over data scraping. The fact is, however, that some data will always be publicly available, and when it is, third parties will try to use these sources to build databases that they can sell on to marketing firms.
This is an important technical distinction, and a good example of how the digital landscape is evolving and how laws are still catching up in many ways.
But to be clear, these records are not the result of a hack of LinkedIn, and you can limit your exposure through your own profile settings.