We often hear prospects (particularly those at financial startups) insist that they don’t need to pay for fundamental data because they can get financial statements for free from the SEC.
Is that true? Sure! It’s also a prime example of “you get what you pay for.” In this blog, we’ll outline three reasons why free SEC data doesn’t cut it for any real business need.
While companies are required to file in XBRL, that’s only the very first step to making data more transparent and accessible. There is a huge variety of ways that companies can file statements with the SEC, and while that flexibility may benefit companies and their accountants, it doesn’t make life any easier for you.
Long story short, the data that comes straight from the SEC isn’t very clean or consistent. In fact, the SEC does not validate most of the data it receives to ensure that companies are using the right XBRL concepts or the correct values.
You can download XBRL files from the SEC, but in their raw form, they’re essentially gibberish. You have to write huge amounts of code to process them. We know because we’ve spent years building our processor (more on that later).
While you can try to parse that data, you’ll soon run into the real obstacle: what works for one filing won’t work for all of them. Since every company files differently, your code will break on some filings, and each fix you make may cause new issues elsewhere in the process.
Processing SEC data is complex, and there’s no way around it. Imagine the process of manufacturing steel. It requires raw ingredients to be processed and separated, then processed and separated again. Each process has several sub-processes, and nothing operates independently. If processes don’t occur in the right order, there can be huge complications down the line.
Processing XBRL data is so nuanced that even after years of refining our infrastructure, we don’t get every filing right the first time. Our system has built-in data quality checks that will alert us if something doesn’t seem right. We also practice “conservative publishing,” which means we won’t publish data that has known, unresolved errors. Every time we upgrade our code, we rerun all uncovered data to see if we can improve the results.
If you’re making investment decisions off this data, or guiding your customers to make investment decisions, it has to be accurate. Hedge funds have large teams that work through this data. Other data providers outsource cleaning and standardization to thousands of employees who do the work manually. Smaller companies simply don’t have the manpower to understand XBRL data and make it usable for their purposes.
The SEC delivers data, but not much else. If you experience an issue, there’s no one to turn to for help. When you’re in the early stages and have a small team, this is especially challenging.
Intrinio provides a suite of tools to help businesses get the most out of their fundamental data. You can choose from multiple access methods, including our Web API, direct database access via Snowflake, bulk file downloads, and FTP. We also offer software development kits in six languages, including Python and R, as well as documentation and help articles to get you up and running.
Our engineers walk your team through the integration process and provide ongoing technical support. When you partner with Intrinio, you get:
Our fundamentals product includes much more than as-reported and standardized financial statement data. You also get access to:
Imagine sourcing, cleaning, and maintaining all of that data by yourself, and you can see why pulling data from the SEC isn’t the easiest or cheapest solution for your company.
We’ve spent years honing our process to deliver SEC filings in a usable format, and we’re constantly making improvements. Here’s a high-level overview of our process:
The SEC has an RSS page that alerts us to new filings. For each filing we cover (including 10-K, 10-Q, 8-K, 20-F, and 40-F filings), we download five to six documents from the SEC. These are all unique to the company that filed. We upload the XBRL files to our cloud storage platform and flag them for processing.
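As a rough illustration of the first step, the sketch below filters a feed of new filings down to the form types we cover. The feed layout, the `SAMPLE_FEED` contents, and the `covered_filings` helper are all assumptions made for this example; the real SEC feed is structured differently and would need its own parsing rules.

```python
import xml.etree.ElementTree as ET

# Form types named in the paragraph above.
COVERED_FORMS = {"10-K", "10-Q", "8-K", "20-F", "40-F"}

# Hypothetical, simplified feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example filings feed</title>
    <item>
      <title>10-K - Example Corp</title>
      <link>https://example.com/filings/1</link>
    </item>
    <item>
      <title>S-1 - Another Inc</title>
      <link>https://example.com/filings/2</link>
    </item>
    <item>
      <title>8-K - Third Co</title>
      <link>https://example.com/filings/3</link>
    </item>
  </channel>
</rss>"""


def covered_filings(feed_xml):
    """Return (form_type, link) pairs for feed items whose form type is covered."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # Assumed title convention: "<form type> - <company name>".
        form_type = title.split(" - ", 1)[0].strip()
        if form_type in COVERED_FORMS:
            results.append((form_type, link))
    return results


if __name__ == "__main__":
    for form_type, link in covered_filings(SAMPLE_FEED):
        print(form_type, link)
```

In a real pipeline, each returned link would be the starting point for downloading the filing's documents and flagging them for processing.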
We download the files from cloud storage and run them through our XBRL Processor. We built our processor from scratch, which allows us to upgrade our code over time and fix any bugs we find. The XBRL Processor turns the XBRL files into normalized data for our SQL database. Now, we can join data and make queries, which is difficult to do with raw XBRL.
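To show why normalized data is easier to work with, here is a minimal sketch using an in-memory SQLite database. The two-table schema (`filings` and `facts`), the company names, and the values are invented for this example and are not our production schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical normalized layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE filings (
            id INTEGER PRIMARY KEY,
            company TEXT,
            form TEXT,
            period_end TEXT
        );
        CREATE TABLE facts (
            filing_id INTEGER REFERENCES filings(id),
            concept TEXT,
            value REAL
        );
        """
    )
    # Made-up rows: two companies, each with one annual filing.
    conn.executemany(
        "INSERT INTO filings VALUES (?, ?, ?, ?)",
        [
            (1, "Alpha Co", "10-K", "2023-12-31"),
            (2, "Beta Inc", "10-K", "2023-06-30"),
        ],
    )
    conn.executemany(
        "INSERT INTO facts VALUES (?, ?, ?)",
        [
            (1, "Revenues", 100.0),
            (1, "NetIncomeLoss", 10.0),
            (2, "Revenues", 250.0),
        ],
    )
    return conn


def revenue_by_company(conn):
    """Join facts to their filings and return one revenue row per company."""
    return conn.execute(
        """
        SELECT f.company, f.period_end, x.value
        FROM filings AS f
        JOIN facts AS x ON x.filing_id = f.id
        WHERE x.concept = 'Revenues'
        ORDER BY f.company
        """
    ).fetchall()


if __name__ == "__main__":
    print(revenue_by_company(build_demo_db()))
```

A single join answers a cross-company question here, which is the kind of query that is hard to express against a folder of raw XBRL documents.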
In this step, we take the raw data and match it to “roles,” or XBRL concepts. The XBRL taxonomy doesn’t specify which data belongs to which role, so we infer it through machine learning.
This can be much trickier than it sounds. A fundamental is a unique combination of a company ID and a period (for balance sheets) or duration (for income statements and cash flow statements). Some filings have multiple comparable periods – for instance, when a company also reports its income statement from the previous year. There are numerous subtleties that require advanced infrastructure and some human review to navigate.
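One way to picture a fundamental is as a hashable key. The sketch below is an assumption about how such a key could be modeled, not a description of our internal data model; `FundamentalKey`, `make_key`, and the `"ACME"` identifier are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FundamentalKey:
    """Identifies one fundamental: a company, a statement, and a time span."""

    company_id: str
    statement: str  # "balance_sheet", "income_statement", or "cash_flow"
    end: date  # period end (the instant itself, for balance sheets)
    start: Optional[date] = None  # None for balance sheets, set for durations


def make_key(company_id, statement, end, start=None):
    """Build a key, enforcing instant versus duration semantics."""
    is_instant = statement == "balance_sheet"
    if is_instant and start is not None:
        raise ValueError("balance sheets describe a point in time, not a duration")
    if not is_instant and start is None:
        raise ValueError("income and cash flow statements need a start date")
    return FundamentalKey(company_id, statement, end, start)


# One hypothetical 10-K: current-year and prior-year income statements,
# plus the year-end balance sheet.
keys = {
    make_key("ACME", "income_statement", date(2023, 12, 31), date(2023, 1, 1)),
    make_key("ACME", "income_statement", date(2022, 12, 31), date(2022, 1, 1)),
    make_key("ACME", "balance_sheet", date(2023, 12, 31)),
    # Same fundamental as the first entry; the set keeps only one copy.
    make_key("ACME", "income_statement", date(2023, 12, 31), date(2023, 1, 1)),
}

if __name__ == "__main__":
    print(len(keys), "distinct fundamentals")
```

Because the key is frozen and hashable, a prior-year statement repeated in a later filing resolves to the same fundamental rather than creating a duplicate.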
There are 17,000 concepts in the US GAAP taxonomy that companies can use to report values. Since they can also create custom tags, we’ve seen more than 30,000 distinct concepts reported. Our mapping strategy takes these thousands of XBRL concepts and fits them into roughly 400 Intrinio concepts.
This enables direct comparison across companies. No two companies report the same way, and even a single company can change the way it reports over time. That makes it almost impossible to use raw XBRL data in a comparative way across time periods and between companies. You can use the XBRL data to view values if you understand the 17,000 concepts and their subtle differences, but it’s much smoother to compare 400 well-defined concepts.
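The idea of collapsing many reported concepts into a few standardized ones can be sketched with a lookup table. The standardized names (`total_revenue`, `net_income`), the tiny `CONCEPT_MAP`, and the custom tag `acme_WidgetRevenue` are placeholders for this example; a real mapping covers thousands of concepts and needs rules for cases this sketch ignores.

```python
# Hypothetical mapping from reported XBRL concepts to standardized names.
CONCEPT_MAP = {
    "Revenues": "total_revenue",
    "SalesRevenueNet": "total_revenue",
    "RevenueFromContractWithCustomerExcludingAssessedTax": "total_revenue",
    "NetIncomeLoss": "net_income",
}


def standardize(reported):
    """Split reported facts into standardized values and unmapped tags.

    `reported` maps a reported concept name to its value. If two reported
    tags map to the same standardized name, the later one wins here; a real
    pipeline would need an explicit rule for that case.
    """
    standardized = {}
    unmapped = []
    for tag, value in reported.items():
        target = CONCEPT_MAP.get(tag)
        if target is None:
            unmapped.append(tag)  # e.g. a company-specific custom tag
        else:
            standardized[target] = value
    return standardized, unmapped


# Two made-up companies that report revenue under different concepts.
company_a = {"Revenues": 100.0, "NetIncomeLoss": 10.0}
company_b = {"SalesRevenueNet": 250.0, "NetIncomeLoss": 30.0, "acme_WidgetRevenue": 5.0}

if __name__ == "__main__":
    print(standardize(company_a))
    print(standardize(company_b))
```

After standardization both companies expose the same two keys, so their revenue can be compared directly, while the custom tag is surfaced for review instead of being silently dropped.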
We have multiple strategies for standardization. Our team uses machine learning to run each strategy and chooses the most effective result. Once we pick a strategy, we standardize that data by taking the value from the strategy and adding it to the fundamental (along with the correct data tag). On any given filing, there are six to eight fundamentals (unique combinations of time period and statement) that we have to validate.
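To make the "run several strategies, keep the best" idea concrete, here is a deliberately simplified sketch. It swaps our machine learning selection for a plain rule-based score (how closely assets equal liabilities plus equity), and the two strategies, the `imbalance` score, and the sample values are all assumptions for illustration.

```python
def strategy_direct(raw):
    """Use the reported totals as-is."""
    return {
        "assets": raw["Assets"],
        "liabilities": raw["Liabilities"],
        "equity": raw["StockholdersEquity"],
    }


def strategy_derived_liabilities(raw):
    """Derive liabilities from the combined liabilities-and-equity total."""
    equity = raw["StockholdersEquity"]
    return {
        "assets": raw["Assets"],
        "liabilities": raw["LiabilitiesAndStockholdersEquity"] - equity,
        "equity": equity,
    }


def imbalance(result):
    """Score a result by how far it is from assets = liabilities + equity."""
    return abs(result["assets"] - (result["liabilities"] + result["equity"]))


def pick_best(raw, strategies):
    """Run every strategy and return (name, result) for the lowest imbalance."""
    candidates = [(s.__name__, s(raw)) for s in strategies]
    return min(candidates, key=lambda pair: imbalance(pair[1]))


# Made-up filing in which the reported liabilities total is off by 10.
raw_filing = {
    "Assets": 500.0,
    "Liabilities": 290.0,
    "StockholdersEquity": 200.0,
    "LiabilitiesAndStockholdersEquity": 500.0,
}

if __name__ == "__main__":
    name, result = pick_best(raw_filing, [strategy_direct, strategy_derived_liabilities])
    print(name, result)
```

The same shape extends to any number of strategies and any scoring rule; only the `imbalance` function would change if a different measure of quality were used.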
For each standardized fundamental, we have to review the following checklist:
Due to differences in the way companies report, fiscal periods may not align. For example, one company may have a fiscal year that aligns with the calendar year, while another company’s fiscal year may end in June. Without this data quality check, you could be comparing different fiscal periods without even realizing it.
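A simple way to see the misalignment is to measure how many days two reporting periods share. The sketch below is an illustration under assumed inputs, not our actual quality check; the `overlap_days` and `is_aligned` helpers and the 90 percent threshold are hypothetical.

```python
from datetime import date


def overlap_days(a_start, a_end, b_start, b_end):
    """Number of calendar days shared by two inclusive date ranges."""
    latest_start = max(a_start, b_start)
    earliest_end = min(a_end, b_end)
    return max(0, (earliest_end - latest_start).days + 1)


def is_aligned(a, b, min_share=0.9):
    """True if period b covers at least `min_share` of period a.

    `a` and `b` are (start, end) tuples of dates. The 0.9 default is an
    arbitrary threshold chosen for this example.
    """
    a_length = (a[1] - a[0]).days + 1
    return overlap_days(a[0], a[1], b[0], b[1]) / a_length >= min_share


# Fiscal year 2023 for a calendar-year company and for a June year-end company.
calendar_fy = (date(2023, 1, 1), date(2023, 12, 31))
june_fy = (date(2022, 7, 1), date(2023, 6, 30))

if __name__ == "__main__":
    print(overlap_days(*calendar_fy, *june_fy))
    print(is_aligned(calendar_fy, june_fy))
```

Both periods are labeled fiscal year 2023, yet they share only about half of their days, which is exactly the kind of mismatch that is easy to miss when comparing raw filings.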
We also set the most recently stated data for a fundamental as the “primary.” Since companies often report previous periods in each filing, if any historical data is restated in a given 10-K or 10-Q, we replace the existing data in our system with the updated values.
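The restatement rule can be sketched as "latest filing wins" for each fundamental. The record layout, the `select_primary` helper, and the sample values below are assumptions made for this example.

```python
from datetime import date


def select_primary(records):
    """Return {key: record}, keeping the most recently filed record per key.

    Each record is a dict with a 'key' identifying the fundamental, the
    'filed' date of the filing it came from, and the reported 'value'.
    """
    primary = {}
    for record in records:
        current = primary.get(record["key"])
        if current is None or record["filed"] > current["filed"]:
            primary[record["key"]] = record
    return primary


# Made-up history: FY2022 revenue is restated in the following year's 10-K.
records = [
    {"key": ("ACME", "FY2022", "revenue"), "filed": date(2023, 2, 15), "value": 100.0},
    {"key": ("ACME", "FY2023", "revenue"), "filed": date(2024, 2, 14), "value": 120.0},
    {"key": ("ACME", "FY2022", "revenue"), "filed": date(2024, 2, 14), "value": 98.0},
]

if __name__ == "__main__":
    for key, record in select_primary(records).items():
        print(key, record["value"])
```

Keeping the filing date on every record means the earlier, superseded value can still be stored for audit purposes even though only the restated one is marked as primary.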
Lastly, we calculate 100+ metrics and ratios for our users, such as dividend yield, market capitalization, and price to earnings.
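For reference, the three metrics named above are commonly defined as in the sketch below. These are the textbook formulas rather than a statement of our exact calculation method, and the input values are made up.

```python
def market_cap(price, shares_outstanding):
    """Market capitalization: share price times shares outstanding."""
    return price * shares_outstanding


def price_to_earnings(price, earnings_per_share):
    """Price-to-earnings ratio; undefined (None) when EPS is zero or negative."""
    if earnings_per_share <= 0:
        return None
    return price / earnings_per_share


def dividend_yield(annual_dividend_per_share, price):
    """Dividend yield: annual dividends per share divided by share price."""
    return annual_dividend_per_share / price


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    price, shares, eps, dividend = 50.0, 1_000_000, 2.5, 1.0
    print("Market cap:", market_cap(price, shares))
    print("P/E:", price_to_earnings(price, eps))
    print("Dividend yield:", dividend_yield(dividend, price))
```

Each formula mixes a market price with a value from the financial statements, so the ratios are only as reliable as the standardized fundamentals behind them.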
Our clients have saved years of development time by relying on our XBRL knowledge, advanced processing infrastructure, and first-class support. Request a consultation with our team to learn how we can deliver clean, high-quality fundamental data for your business.