Here's What You Should Know About Data Scraping

Chelsea Caltuna
February 17, 2020

Data scraping has become an integral part of doing business on the internet, but it often sits in a legal and ethical gray area. If you’re thinking about scraping financial data for personal or business use, here’s what you need to know about what’s accepted, what’s discouraged, and what’s illegal.


What is data scraping? 

Data scraping is the act of extracting information from websites. It’s typically done with specialized software, often called a scraper, that downloads web pages, pulls out the relevant fields, and funnels them into a database, where the data can be analyzed or repurposed for the scraper’s website or business operations.
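To make the extract-and-store pipeline concrete, here’s a minimal sketch using only Python’s standard library. It assumes a hypothetical page that lists prices in a simple HTML table; the table layout, the `prices` table name, and the sample HTML are all illustrative assumptions. A real scraper would first download the page over HTTP, and should only do so where scraping is permitted.

```python
import sqlite3
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows, which have no <td> cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def store_prices(html, conn):
    """Extract (symbol, price) rows from the HTML and insert them into SQLite."""
    parser = TableCellParser()
    parser.feed(html)
    # Hypothetical schema for illustration only.
    conn.execute("CREATE TABLE IF NOT EXISTS prices (symbol TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO prices (symbol, price) VALUES (?, ?)",
        [(symbol, float(price)) for symbol, price in parser.rows],
    )
    conn.commit()
    return len(parser.rows)


# Hypothetical page content standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>Symbol</th><th>Price</th></tr>
  <tr><td>ABC</td><td>101.25</td></tr>
  <tr><td>XYZ</td><td>47.10</td></tr>
</table>
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_prices(SAMPLE_HTML, conn)
    for row in conn.execute("SELECT symbol, price FROM prices ORDER BY symbol"):
        print(row)
```

Once the rows are in a database, they can be queried, joined with other data, or fed into analysis like any other dataset.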

There are plenty of widely accepted uses for data scraping. Search engines like Google and Bing scrape data to index and rank web content, which brings traffic and visibility to the sites they crawl. Businesses can also legitimately scrape data for price comparisons, market research, weather data, real estate listings, and more.

Some uses, however, are discouraged or even downright illegal. These include passing scraped data off as your own, using a competitor’s scraped prices to deliberately undercut and weaken it, and stealing copyrighted content. Misused scraping can do serious damage to the site being scraped, from tanking its SEO rankings (for instance, when copied content competes with the original in search results) to inflicting severe financial losses.

Even if you’re not acting with malicious intent, you can run into legal and ethical issues. If any of the following traps applies, think twice about scraping data from that source:

Trap #1: The data you want to scrape isn’t publicly accessible. 

Let’s start here, because this is a big one. It can be illegal to scrape nonpublic data – that is, data that isn’t freely available to everyone on the internet. This might apply to data that requires a login or payment to view or is only for a company’s internal use. “Stealing” data or publishing content that wasn’t meant to be published can open you up to serious legal action from the affected business. 
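One practical guardrail is to have your scraper stop whenever a response signals that the content isn’t public. The sketch below is a hypothetical heuristic, not a legal test: it treats HTTP 401 (authentication required), 402 (payment required), and 403 (forbidden), or a redirect to what looks like a sign-in URL, as nonpublic. The sign-in URL patterns are assumptions, and a page that passes the check can still be off-limits.

```python
# HTTP status codes suggesting the content is not open to the general public:
# 401 Unauthorized, 402 Payment Required, 403 Forbidden.
NON_PUBLIC_STATUSES = {401, 402, 403}

# Hypothetical URL fragments that suggest we were redirected to a sign-in page.
LOGIN_HINTS = ("/login", "/signin", "/sign-in")


def looks_publicly_accessible(status_code: int, final_url: str) -> bool:
    """Heuristic check to run on a response before extracting any data.

    final_url is the URL after any redirects were followed.
    """
    if status_code in NON_PUBLIC_STATUSES:
        return False
    if any(hint in final_url.lower() for hint in LOGIN_HINTS):
        return False
    # Only treat successful (2xx) responses as candidates for scraping.
    return 200 <= status_code < 300


if __name__ == "__main__":
    print(looks_publicly_accessible(200, "https://example.com/quotes"))
    print(looks_publicly_accessible(200, "https://example.com/login?next=/quotes"))
```

If the check fails, the safest move is to skip the page entirely rather than look for a way around the login or paywall.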

Trap #2: The site you’re scraping has anti-scraping protections. 

Is the data you’re looking at fair game for scraping? The website’s security measures can give you a clue. While website owners can’t prevent every possible scraping attack, they can put safeguards such as CAPTCHAs in place to fight scrapers. When in doubt, contact the webmaster and ask for permission to crawl the site. Just because you can bypass a website’s security measures doesn’t mean there won’t be consequences.
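Beyond CAPTCHAs, many sites publish a robots.txt file stating which paths automated crawlers may visit and, sometimes, how long to wait between requests. It isn’t a security measure, but checking it is a cheap first step before you crawl. This sketch uses Python’s standard `urllib.robotparser` on a hypothetical robots.txt body; the user agent name and URLs are made up for illustration. Honoring robots.txt doesn’t replace reading the site’s terms or asking the webmaster.

```python
from urllib.robotparser import RobotFileParser


def crawl_policy(robots_txt: str, user_agent: str, url: str):
    """Return (allowed, crawl_delay) for url according to a robots.txt body.

    crawl_delay is None when the file doesn't specify one.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


# Hypothetical robots.txt for an example site.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /members/
Crawl-delay: 5
"""

if __name__ == "__main__":
    print(crawl_policy(EXAMPLE_ROBOTS, "my-research-bot", "https://example.com/quotes/ABC"))
    print(crawl_policy(EXAMPLE_ROBOTS, "my-research-bot", "https://example.com/members/portfolio"))
```

In practice you’d fetch the file from the site’s root (for example with `RobotFileParser.set_url` and `read`) and treat a disallowed path as a firm "no."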

Trap #3: You’re running afoul of a site’s terms & conditions or copyrights. 

Many websites include terms that prohibit automated data scraping. Those terms can be legally binding, so if the business whose data you’re targeting considers you a threat, there’s nothing stopping it from pursuing legal action (sensing a pattern here?). Plus, while the data itself may not be copyrighted, the “creative arrangement” of it can be, for instance the way it is selected and laid out on a web page. Copying that arrangement with a web scraper can constitute copyright infringement.

Trap #4: Your scraping methods harm a website. 

Web scrapers typically want to pull data as fast as possible, but this can cause major problems for the website being scraped. Because scraping software can send far more requests per second than a human could, it can overload the site’s servers and degrade its performance. Slowing down or even taking down a server can expose you to claims of trespass to chattels. You might also unintentionally compromise a company’s website, servers, or databases, leaving them open to more dangerous cyberattacks.
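The usual remedy is to throttle your scraper so it never sends requests faster than a fixed rate. Here’s a minimal sketch of a throttle that enforces a minimum interval between requests to one site. The interval is an assumption you’d choose per site (or take from a robots.txt `Crawl-delay`, where one is given), and the clock and sleep functions are injectable only so the logic can be tested without real waiting.

```python
import time


class Throttle:
    """Enforces a minimum interval between consecutive requests to one site."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last_request = None

    def wait(self):
        """Block until at least min_interval_s has passed since the previous call."""
        now = self._clock()
        if self._last_request is not None:
            remaining = self.min_interval_s - (now - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request = now


if __name__ == "__main__":
    throttle = Throttle(1.0)  # at most one request per second (illustrative value)
    for page in range(3):
        throttle.wait()
        print("fetching page", page)  # the actual HTTP request would go here
```

Calling `wait()` right before every request keeps the pace steady no matter how quickly your parsing code runs.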

Trap #5: You’re relying on scraping for long-term use. 

Scraping is not a reliable long-term solution for your data needs. The websites you scrape might choose to block you, or your scraper might break, an inevitable side effect of websites constantly changing their layouts. Pulling reliable, up-to-date data through a scraper requires constant monitoring and maintenance, plus the expertise to fix it when it stops working.
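Monitoring is easier if a broken scraper fails loudly instead of quietly storing bad data. A minimal sketch, assuming a hypothetical price scraper whose records should each have a `symbol` string and a `price` float: it checks every record before storage and raises an error you can alert on when the page layout appears to have changed. It won’t keep your scraper from breaking, but it tells you when it has.

```python
class LayoutChangedError(Exception):
    """Raised when scraped records no longer match the expected shape."""


# Hypothetical schema: the fields this price scraper expects on every record.
EXPECTED_FIELDS = {"symbol": str, "price": float}


def validate_records(records):
    """Check every scraped record and raise instead of passing bad data along."""
    if not records:
        raise LayoutChangedError("no records extracted; the page layout may have changed")
    for i, record in enumerate(records):
        for field, expected_type in EXPECTED_FIELDS.items():
            if field not in record:
                raise LayoutChangedError(f"record {i} is missing field {field!r}")
            if not isinstance(record[field], expected_type):
                raise LayoutChangedError(
                    f"record {i} field {field!r} is {type(record[field]).__name__}, "
                    f"expected {expected_type.__name__}"
                )
    return records


if __name__ == "__main__":
    print(validate_records([{"symbol": "ABC", "price": 101.25}]))
```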

What are the alternatives to scraping financial data? 

If you’re wary of the legal and ethical consequences of scraping data (and you most definitely should be, depending on your intended use), there are alternatives. Some developers simply scrape data to build out their proof of concept and switch to a legitimate data source afterward. If that’s the case, check out our developer sandbox – you get access to more than 10 million data points for free, for as long as you need them. Test the data to make sure it fits your use case or build your proof of concept, and easily switch to the production environment when you find the right product. 

Were you looking into scraping because financial data is too expensive? We totally get it. That’s why we offer hundreds of data feeds and bulk downloads in flexible formats, without the massive price tags. Plus, we offer first-class support for our products and access methods (including our API and Excel add-in), so you don’t have to spend hours on maintenance. Many of our feeds include redistribution rights, allowing you to use the data on your own website or in your business operations without the threat of legal repercussions.

Ready to find your perfect alternative to scraped data? Check out our Financial Data Marketplace.
