If you are a fintech founder, engineer, data scientist, quant, investor, market data manager, or project lead at a financial institution - or if you are building an investment-style app - this blog is for you.
You are most likely going to need to acquire data for whatever you are building: stock price history, streaming options quotes, fundamental metrics - you name it. These data feeds may seem expensive (depending on who you ask), and it can be tempting to think you can just scrape the data off the internet or go straight to the source yourself. This is a common mistake. Most people don't know what goes on behind the scenes to get reliable data delivered to end users - it's a whole business in and of itself - and it's worth investing in good data so that you don't distract yourself from your mission by trying to wrangle it on your own.
I wrote this blog so that you can see exactly what you are getting when you work with a data partner like Intrinio, and can feel confident that you are getting your money's worth by investing in good data.
So, what really goes on behind the scenes at a financial data company? You’re about to find out.
Here are the functions that a good financial data partner will take care of for you:
The very first step in the process of providing a great data feed is sourcing the data. This may seem trivial because you know exactly what data you need. But it takes time to stay on top of the latest trends, identify exactly where it's most efficient to source data from, and maintain relationships with source providers. It's also important to remember that data sourcing is an ongoing process - exchanges, data providers, and vendors change their rules on a regular basis, and SEC regulations might mean you need to find a new data source in a hurry. It takes a lot of work to stay on top of sources.
Next up is the ETL process. ETL stands for extract, transform, and load - the technical term for the data integration process. You definitely need an experienced engineer for this, since it involves connecting to the source and getting the data loaded into a database. It's not wildly complex from an engineering perspective, but it does take time, talent, and resources.
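To make the ETL step concrete, here is a minimal sketch in Python - a toy pipeline with a made-up CSV feed, not Intrinio's actual process - that extracts rows from a raw feed, transforms the fields into consistent types, and loads them into SQLite:

```python
import csv
import sqlite3
from io import StringIO

def etl(raw_csv: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Extract rows from a CSV feed, transform types, and load into SQLite."""
    # Extract: parse the raw feed
    rows = csv.DictReader(StringIO(raw_csv))
    # Transform: normalize symbols and coerce prices to floats
    records = [(r["ticker"].upper(), r["date"], float(r["close"])) for r in rows]
    # Load: write into a prices table
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS prices (ticker TEXT, date TEXT, close REAL)")
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", records)
    conn.commit()
    return conn

feed = "ticker,date,close\naapl,2023-01-03,125.07\nmsft,2023-01-03,239.58\n"
conn = etl(feed)
```

A real pipeline adds retries, schema checks, and incremental loads - this is just the skeleton that every integration shares.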
Like data sourcing, you can't just build an integration and forget about it unless you work with an experienced provider. Field names, update frequencies, FTP and API protocols, and scraped sources change frequently. A good provider will shield you from these changes, but free or cheap sources will not, and you will need to invest in developer resources to make sure your integration does not fail.
After the data is loaded into a database, there are multiple ways that it needs to be validated.
For example, are there any duplications that need to be removed? Are the right dates attached? Does everything add up how it should? Are there any crazy outliers? Did we correctly infer associations between different pieces of data? This can often be automated, but it requires fine tuning, updating, energy, talent, time and resources. An experienced data provider will handle this process before the data gets to your team, reducing the time and money you need to spend making sure data is correct before it reaches your end users.
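A simplified version of those checks might look like this - a sketch with made-up records and a crude median-based outlier rule, not a production validator:

```python
import statistics

def validate(records):
    """Flag duplicates, malformed dates, and suspicious outliers; return a list of issues."""
    issues = []
    seen = set()
    median_close = statistics.median(r["close"] for r in records)
    for r in records:
        key = (r["ticker"], r["date"])
        if key in seen:
            issues.append(("duplicate", key))
        seen.add(key)
        # ISO dates like "2023-01-03" are 10 characters with two dashes
        if len(r["date"]) != 10 or r["date"].count("-") != 2:
            issues.append(("bad_date", key))
        # Crude outlier rule: a close more than 10x the median is suspicious
        if r["close"] <= 0 or r["close"] > 10 * median_close:
            issues.append(("outlier", key))
    return issues

records = [
    {"ticker": "AAPL", "date": "2023-01-03", "close": 125.07},
    {"ticker": "AAPL", "date": "2023-01-03", "close": 125.07},   # duplicate row
    {"ticker": "MSFT", "date": "20230103", "close": 239.58},     # malformed date
    {"ticker": "TSLA", "date": "2023-01-03", "close": 10850.0},  # suspicious spike
]
issues = validate(records)
```

Each rule here is trivial on its own; the hard part is tuning hundreds of rules like these across dozens of feeds and keeping them current.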
Validation is just one piece of the puzzle after getting your hands on the data. Standardization is the next step. This is especially important for complex historical datasets like fundamental data, but it applies to all datasets.
For example, has a company changed the way it files its income statement and started reporting a line item in a different place? Is operating revenue not adding up? That data is going to need to be standardized so it's comparable across companies and time periods. Another example: you might see crazy outliers in a stock price dataset that don't make sense. Trust me when I say that keeping standardization and cleaning of data up and running requires specialty engineering skills and advanced tools like machine learning and complex algorithmic processing. It's a big lift. Even keeping up with ticker changes can be a full-time job. Every month, hundreds of companies are acquired, go out of business, or change tickers. A data provider will update these tickers and other metadata for you, but if you try to manage it yourself, you will need to dedicate daily staff hours to making sure your data is standardized, classified, and tagged properly.
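Just the ticker-change piece of that work implies maintaining a change log and resolving symbols through it. A minimal sketch, with a hypothetical change table (Facebook's June 2022 switch from FB to META as the example entry):

```python
# Hypothetical change log of (effective_date, old_ticker, new_ticker) entries
TICKER_CHANGES = [
    ("2022-06-09", "FB", "META"),
]

def current_ticker(ticker: str, as_of: str) -> str:
    """Resolve a symbol to its current ticker as of an ISO date.

    Lexicographic comparison works here because ISO dates sort chronologically.
    """
    for effective, old, new in TICKER_CHANGES:
        if ticker == old and as_of >= effective:
            ticker = new
    return ticker
```

In production this table changes every month, entries chain (a symbol can change more than once), and the same maintenance applies to CUSIPs, FIGIs, and sector classifications.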
Sourcing, storing, validating, standardizing, and cleaning data is great, but that work needs to happen constantly. The data needs to be updated on a continuous basis. Again, this can be automated in many cases, but it does require engineering work, money, and time. If something breaks in the integration, you sure as hell want to know that an engineer or a process will make sure the file comes through.
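The simplest piece of that monitoring is a freshness check - did the file arrive when it should have? A minimal sketch (the thresholds and alerting wiring are up to you):

```python
from datetime import datetime, timedelta, timezone

def is_stale(last_update: datetime, max_age: timedelta) -> bool:
    """True if a feed hasn't updated within its expected window - time to page an engineer."""
    return datetime.now(timezone.utc) - last_update > max_age

# A daily feed that last updated two days ago should trip the alert
two_days_ago = datetime.now(timezone.utc) - timedelta(days=2)
stale = is_stale(two_days_ago, max_age=timedelta(days=1))
```

You would run a check like this on a schedule for every feed, with per-feed windows (seconds for real-time quotes, a day for end-of-day prices, a quarter for fundamentals).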
We just covered how it's necessary to continuously update the data - which means the data can change at any time. That makes having a robust data quality system in place paramount.
For example, did a stock price come through wonky? Maybe it's a corporate action, a missed split, or an adjustment? You need engineering solutions, like an alert for any data point that falls outside a standard deviation, or you're in big trouble. You must constantly monitor for data that falls outside what's expected, and either react with automated systems or provide feedback to those who can fix it manually.
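That standard-deviation alert can be sketched in a few lines - a toy z-score-style filter, with the threshold chosen arbitrarily for illustration:

```python
import statistics

def flag_anomalies(prices, threshold=2.0):
    """Return prices more than `threshold` sample standard deviations from the mean."""
    mean = statistics.fmean(prices)
    stdev = statistics.stdev(prices)
    return [p for p in prices if abs(p - mean) > threshold * stdev]

# A $500 print in a series that normally trades around $100 gets flagged
flagged = flag_anomalies([100.0, 101.0, 99.0, 102.0, 100.0, 500.0])
```

Real quality systems are more careful - they compare against rolling windows, account for corporate actions before flagging, and route alerts to humans when automation can't decide.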
Keep in mind that the work in every step we've gone through so far is required for every type of data you are consuming. That can mean dozens, if not hundreds, of data feeds. All of that data is going to need to be stored, which costs money. I'm talking about terabytes of cloud storage. Not cheap. In addition, the CPUs required for sourcing, ETL, validation, standardization, continuous updates, and data quality are nothing to sneeze at.
Storage, compute, and networking can sink your time, energy, and bank accounts if it’s not a part of your core business. Intrinio is wise to these challenges and invests in preprocessing data and other methods to reduce our clients’ processing and storage costs. If you do it alone or work with a low quality provider, you might end up spending more on servers than you do on licensing.
This is all a lot of engineering work, and you do not want your engineers wasting their time sourcing data in house. These steps in the data management process require specific engineering skills. If your tech talent has to learn and focus on data management it just distracts you from your own mission.
Ok, now that we’re through the backend engineering parts of the journey, let’s talk about following the rules. Just getting your hands on the right data, in the right format, isn’t enough to keep you from lawsuits or worse. Financial data can be a heavily regulated space and if you aren’t an expert in navigating it, you will need help.
There are three extremely important concepts in this category that a good data partner will handle for you: Permissioning, Entitlements, and Exchange/Vendor reporting.
If your data provider doesn't handle permissioning on your behalf, you need to handle it internally, making sure only the correct users or teams have access to the data you are pulling. This can be a big deal if you are only supposed to allow access to certain individuals or clients. Most data vendors or sources have different pricing and rules for internal and external use, and for individual and business use. Sometimes this is called entitlement, and it's very important in the data industry. A data provider won't let you mess this up, but if you handle it yourself, you are on the hook to make sure you get it right.
The TLDR? Working with stock exchanges is complicated. If the data you need originates from a stock exchange, you will most likely have to navigate exchange agreements (they are massive legal documents), get approved to access the data, pay exchange fees, pay per-user fees, and more. You'll also have to monitor for any legal or regulatory changes. For example, the stock exchange MEMX just started requiring and charging exchange and per-user fees. If you weren't paying attention, your business would have been out of compliance.
A data partner can help walk you through these steps, own the relationship with the exchange, and make sure you are following the rules. Doing this on your own is needlessly complicated, time consuming, and risky.
Still not convinced that it’s worth working with a data partner to handle all of this? Let’s talk about developer tools.
If you manage to get a handle on everything we covered, you’ll probably need to wrap it up into internal APIs. Real-time data? You’ll need something like a WebSocket in order to deliver this data to your platform, website, or model. Developer tools are critical for data integrations, and a good data partner will also have that covered for you. If you go it alone, you’re going to have to build all of these tools yourself.
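Connection handling aside, the heart of any real-time WebSocket integration is a message dispatcher on your side of the pipe. A minimal sketch, assuming a hypothetical JSON message schema (`type`, `ticker`, `price` are made-up field names, not any provider's actual format):

```python
import json

def handle_message(raw: str, callbacks: dict) -> None:
    """Parse an incoming message and dispatch it to the handler registered for its type."""
    msg = json.loads(raw)
    handler = callbacks.get(msg.get("type"))
    if handler:
        handler(msg)

# Route incoming quote messages into a buffer your app can consume
quotes = []
handle_message('{"type": "quote", "ticker": "AAPL", "price": 189.25}',
               {"quote": quotes.append})
```

A production client adds the actual socket connection, authentication, reconnect logic, and backpressure handling - all of which a good data partner's SDK ships for you.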
There’s a reason engineers love Intrinio, and if you choose a good data partner like us, your engineers will love you too.
Documentation is also critical, and any developer tools you build will have to be documented; otherwise, if you have any turnover, your next team member will have to start from scratch. Trust me when I tell you that the SEC doesn't have its data warehouse documented. Neither does Yahoo Finance, if you are attempting to scrape the data. It's not pretty. If you try to source on your own this way, the engineers navigating it for you will be flying blind, without any source docs to work from. Good documentation can make all the difference when integrating data.
This includes Software Development Kits, or SDKs - easy ways to get started pulling data in seconds, and they make developers very happy. Code samples, code tutorials, sample data - it's all important.
Another critical process happening behind the scenes at a data company is the continuous enhancement of data products. New product features roll out from time to time that can be valuable additions to your investment models or provide new insights to your users. Without a team consistently focused on improving data feeds, your app, platform, or algo can go stale.
Last but not least is support. Things break. A lot. Most founders, engineers, product managers, and executives know this. Doing data on your own? There's nobody to call. Your team will be derailed, sometimes for days, figuring out where a missing file is, why an SDK stopped working, whether your code can be optimized for lower latency, or how to fill out your exchange paperwork. With the right data partner (like Intrinio), you have instant chat support, plus a dedicated account manager and sales engineer to help you navigate day-to-day issues. We even chat with some of our customers in Slack and over text, and our ticketing response time is awesome.
Would you design your own project management software or rebuild QuickBooks to keep your business going? When you take a trip, you don't build an airplane - you buy a ticket. The same goes for data. If your business needs data, buy it, don't build it. Let's divide and conquer so you can keep innovating and building the future of finance. Intrinio is an excellent data provider that specializes in custom data solutions. If you need a data partner to help with all of this, please reach out to one of our data experts today!