Data quality is, to use the technical term, pretty dang difficult. Businesses come to us all the time because they’ve received poor quality data from other providers – sometimes even from the biggest names in data.
Bad data quality is also one of the many answers to a question we hear often: “Why shouldn’t I just get data from a free (or super cheap) source?”
In this blog, we’re going to share why data quality matters, why it’s so difficult, and how Intrinio is tackling it for our customers.
You may have heard that data is the new oil (or the new gold, or a similar analogy). Regardless of the comparison, there’s no denying that data is extremely valuable in today’s business climate. Our customers leverage Intrinio’s fundamental data and market data for quantitative investing, robo-advisor platforms, and much more.
Low-quality data is, essentially, bad information. Whether you’re using financial data for internal investments or an external, public-facing app (our most common use cases), poor data quality is a major issue. It affects the accuracy of your decisions – or those of your users – which can translate to a very real monetary loss.
For example, incorrect fundamentals data can make a company seem like a much sounder investment than it is – or cause you to miss out on a strong opportunity. There are large-scale consequences to poor data quality, which is why we place such a priority on improving our quality over time.
Data is often compared to oil and gold because it’s not necessarily valuable in its raw form – it needs to be processed to be usable. All financial data providers have roughly the same raw data to start with, but their processing methods can cause a big divergence in quality.
Many firms still use manual mapping. Processing is outsourced, often overseas, to hundreds or thousands of people who map SEC data to templates by hand. Think about how often you make typos; this approach is extremely vulnerable to human error. Firms that use manual mapping may have data quality infrastructure in place to check this data and flag issues, but:
1) Cheap providers probably won’t have any data quality infrastructure
2) Big providers will, but it’s going to be wildly expensive for your business
Intrinio doesn’t use manual mapping – we leverage advanced machine learning to process data more quickly and much more cost-effectively. This doesn’t completely eliminate data quality concerns; the huge variation in the way public companies file guarantees that things won’t always line up correctly. Hence the need for our own data quality infrastructure.
Since there’s far too much data for our human team to pore over by hand, we built a proprietary data quality system that automates most of the process. After we standardize our data, the system flags potential errors and suggests fixes. One of our data experts reviews each suggestion and can verify and apply it as needed.
What gets flagged as an error? Here are a few examples:
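To make the flagging step concrete, here's a minimal sketch of what automated checks on standardized fundamentals might look like. This is purely illustrative (the field names and thresholds are hypothetical, not Intrinio's actual rules), but it shows the general idea: encode accounting identities and sanity checks as code, and collect human-readable flags for an expert to review.

```python
# Hypothetical validation checks -- illustrative only, not Intrinio's
# actual rules. A fundamentals record is modeled as a plain dict.

def flag_errors(fundamentals: dict) -> list:
    """Return human-readable quality flags for a single filing period."""
    flags = []

    # The balance sheet must balance: assets = liabilities + equity
    # (allowing a small tolerance for rounding in the filing).
    assets = fundamentals.get("total_assets")
    liabilities = fundamentals.get("total_liabilities")
    equity = fundamentals.get("total_equity")
    if None not in (assets, liabilities, equity):
        if abs(assets - (liabilities + equity)) > 0.01 * abs(assets):
            flags.append("balance sheet does not balance")

    # Values that should never be negative.
    for field in ("total_assets", "shares_outstanding"):
        value = fundamentals.get(field)
        if value is not None and value < 0:
            flags.append(f"{field} is negative")

    # A sudden 10x jump versus the prior period often signals a
    # units error (e.g., thousands vs. millions).
    prior = fundamentals.get("prior_revenue")
    current = fundamentals.get("revenue")
    if prior and current and current / prior > 10:
        flags.append("revenue jumped more than 10x vs. prior period")

    return flags
```

A record that passes every check returns an empty list; anything else is routed to a human reviewer rather than published as-is.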
Rather than constantly trying to fix one-off data quality issues (which is time-consuming and ineffective), we take a systematic approach to resolving problems (which is still time-consuming, but effective). First off, we practice “conservative publishing.” We won’t publish data with known, unresolved errors. When you retrieve data from our API, you’ll know it’s high quality and won’t cause issues with your systems down the road.
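The "conservative publishing" policy described above amounts to a simple gate: records carrying unresolved quality flags are withheld rather than served. A hypothetical sketch (field names are illustrative, not Intrinio's schema):

```python
# Illustrative sketch of conservative publishing: only records with no
# unresolved quality flags are exposed through the API. The
# "unresolved_flags" field name is hypothetical.

def publishable(records):
    """Filter out any record that still has unresolved quality flags."""
    return [r for r in records if not r.get("unresolved_flags")]

filings = [
    {"ticker": "AAA", "unresolved_flags": []},
    {"ticker": "BBB", "unresolved_flags": ["balance sheet mismatch"]},
]
# Only the AAA filing would be served; BBB is held back until fixed.
```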
Recently, we deployed a strategy that automatically restandardizes historical fundamentals with our most advanced code and machine learning models. This reduces the number of fundamentals left unpublished under our conservative publishing policy, without sacrificing quality.
Each filing season presents new challenges. We’re constantly evolving to keep up with how companies file and improving our data quality processes and infrastructure. We make it easy for our customers to flag potential data quality issues through our built-in ticketing system. Our data quality team will either fix the issue immediately or roll it up into a more comprehensive data quality update.
Actually, yes. We’ve spent years building an extensive suite of data quality technology and infrastructure, and we’ve decided to share that expertise with our customers. Our engineers can build data quality infrastructure for enterprises that need higher quality in their own financial datasets.
We would love to work with your business. See our quality for yourself – request a consultation with our team to get a free demo and trial of our data.