What Is XBRL (eXtensible Business Reporting Language)?

July 21, 2022

Perhaps a better way to answer this question: How did we get to XBRL?

XBRL is an acronym for eXtensible Business Reporting Language (well, sort of; using "X" instead of "E" just sounds cooler).

XBRL is based strongly on XML (eXtensible Markup Language), a standard that's been around for almost 25 years. And just for fun, because it will be referenced later, the acronym HTML stands for HyperText Markup Language.

The simplest way to think of the differences between HTML and XML is this: HTML is a means of transmitting data visually, while XML is a means of transmitting the data itself.
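
As a rough illustration of that difference, the sketch below stores the same number in an HTML table row and in an XML element. Both snippets, the element names, and the value are invented for this example.

```python
import xml.etree.ElementTree as ET

# The same number expressed two ways (both snippets are made up for illustration).
html_snippet = "<tr><td>Revenue</td><td>1,250</td></tr>"
xml_snippet = "<report><Revenue unit='USD' scale='millions'>1250</Revenue></report>"

# In HTML the meaning lives in the layout: we have to know that the second
# cell of this row is the value, and then strip the display formatting.
# (ElementTree can read this row only because the snippet is well-formed.)
cells = [td.text for td in ET.fromstring(html_snippet).iter("td")]
html_value = int(cells[1].replace(",", ""))

# In XML the element name and attributes describe the data itself.
node = ET.fromstring(xml_snippet).find("Revenue")
xml_value = int(node.text)

print(html_value, xml_value, node.get("unit"), node.get("scale"))
```

Both paths reach the same number, but only the XML version says what the number is and what unit it is in without relying on its position on the page.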

That gets us to the point of XBRL: To transmit business information in a format easily consumable by computers. Given the complexity of financial reporting, XBRL is an ideal format for doing this. In effect, a set of XBRL files represents a standalone database of information describing financial condition, risk, profit and loss, equity, liabilities, assets, events, explanations (excuses?) and more.

Why have XBRL at all?

In two simple words: transparency and consistency. Before financial reports were filed in XBRL, they were just PDF, HTML, or text displays of data. Before that they were printed and mailed to shareholders.

Analyzing mailed reports would be incredibly time-consuming and subject to wide differences in the way companies described accounting concepts, even in the same company from year to year.

Once electronic reports (HTML and PDF) were available, part of the problem was solved (filings could now be consumed by computers), but not all of it. Too much variation in reporting still existed and, even worse, teasing data out of a PDF document or HTML table is almost impossible to do reliably.

XBRL solved most of these problems -- it represents a standardized way for companies across the world to file data that could be extracted, processed, and analyzed with a much higher degree of automation and efficiency.

How to consume this data

While it's certainly easier to analyze data from an XBRL filing than it is from a PDF, it's still not easy.
To follow the data from production to consumption, imagine an investor who wants to research and analyze the performance of a wide range of companies over several years:

Companies' accountants create and file reports in XBRL
Various agencies make these reports available (the SEC in the United States)
The investor downloads the reports and then...

Herein lies the problem: how does one read data out of an XBRL filing?

You'd need to write an application that transforms XBRL into something useful. For example, suppose you wanted to compare gross profit for each company. Since gross profit is measured over a duration (each quarter, year to date, etc.), you'd have to identify overlapping periods between the companies, figure out which concepts they use to report revenue, figure out which currency each value is in and convert it to a common currency, and determine whether the company reported additional revenue from subsidiaries that's already included in the "main" revenue. There's a lot to it.
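
One small piece of that work, measuring how much two companies' reporting periods overlap, might look like the sketch below. The `overlap_days` helper and both fiscal calendars are hypothetical; real filings would supply the dates through their contexts.

```python
from datetime import date

def overlap_days(a_start, a_end, b_start, b_end):
    """Number of days two inclusive date ranges share (0 if they do not overlap)."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return max(0, (end - start).days + 1)

# Hypothetical first quarters for two companies with different fiscal calendars.
company_a = (date(2022, 1, 1), date(2022, 3, 31))
company_b = (date(2021, 12, 26), date(2022, 3, 26))

shared = overlap_days(*company_a, *company_b)
length_a = (company_a[1] - company_a[0]).days + 1

# Share of company A's quarter that company B's quarter also covers.
print(shared, length_a, round(shared / length_a, 2))
```

A comparison tool would still need a rule for how much overlap is enough to treat two periods as comparable, and that threshold is a judgment call rather than something XBRL defines.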

Four pillars of XBRL

XBRL creates associations between different items much like a database does, but does not have the rigid referential integrity and ease of querying that traditional databases have.

As a result, extracting data from an XBRL report involves a lot of deduplication, inference, and disambiguation. A report really can't be used on its own; it's a source from which usable data is created.

To oversimplify, think of each of the following items as a pillar: everything else is built on top of them. They are the equivalent of tables in a database.

Roles: Statements, Notes, Disclosures, Documents

Concepts: Revenue, Gross Profit, Short Term Investments (and so many more)

Contexts: Instants (point in time) or durations (period of time)

Facts: A value that is reported for a particular concept and context

Once these elements have been created, associations between them can be confidently identified (roles have concepts, concepts have facts, facts have contexts, contexts have dates). Many other associations, such as fiscal year and quarter, must be inferred.
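
To make three of the pillars concrete, here is a minimal sketch that reads contexts and facts out of a tiny hand-written instance document and joins them. The `xbrli` namespace is the standard XBRL instance namespace; the `ex` namespace, the concept names, the entity identifier, and the values are invented. Roles are left out because they are defined in the filing's taxonomy files rather than in the instance itself.

```python
import xml.etree.ElementTree as ET

XBRLI = "{http://www.xbrl.org/2003/instance}"

# A tiny hand-written instance; everything in the "ex" namespace is invented.
INSTANCE = """
<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
            xmlns:ex="http://example.com/taxonomy">
  <xbrli:context id="FY2021">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/id">0001</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2021-01-01</xbrli:startDate>
      <xbrli:endDate>2021-12-31</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <xbrli:context id="AsOf2021">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/id">0001</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period><xbrli:instant>2021-12-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <xbrli:unit id="USD"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <ex:Revenues contextRef="FY2021" unitRef="USD" decimals="0">1000</ex:Revenues>
  <ex:Cash contextRef="AsOf2021" unitRef="USD" decimals="0">250</ex:Cash>
</xbrli:xbrl>
"""

root = ET.fromstring(INSTANCE.strip())

# Contexts: map each context id to (kind, start, end).
contexts = {}
for ctx in root.findall(f"{XBRLI}context"):
    period = ctx.find(f"{XBRLI}period")
    instant = period.findtext(f"{XBRLI}instant")
    if instant is not None:
        contexts[ctx.get("id")] = ("instant", instant, instant)
    else:
        contexts[ctx.get("id")] = (
            "duration",
            period.findtext(f"{XBRLI}startDate"),
            period.findtext(f"{XBRLI}endDate"),
        )

# Facts: any top-level element that points at a context via contextRef.
facts = []
for el in root:
    ref = el.get("contextRef")
    if ref is None:
        continue  # context and unit elements carry no contextRef
    concept = el.tag.split("}", 1)[1]  # drop the "{namespace}" prefix
    facts.append((concept, contexts[ref], el.text))

for fact in facts:
    print(fact)
```

Even in this toy case the value of a fact is meaningless until it is joined to its context, which is the kind of association the paragraph above describes.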

In a nutshell: even though extracting the raw data is not particularly difficult, establishing the associations between roles, concepts, contexts, and facts presents many challenges. The data is not very usable until then.
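
As one example of the inference involved, a context only carries dates, so whether a duration is a quarter or a full year has to be worked out. The sketch below guesses from the length of the period. The `classify_duration` helper, its labels, and the ten-day tolerance are assumptions made for illustration, not part of XBRL or of Intrinio's process.

```python
from datetime import date

# Approximate lengths, in days, of common reporting periods (an assumption).
PERIOD_TARGETS = (
    ("quarter", 91),
    ("half-year", 182),
    ("nine-months", 273),
    ("annual", 365),
)

def classify_duration(start, end, tolerance_days=10):
    """Guess the kind of reporting period from an inclusive date range."""
    days = (end - start).days + 1
    for label, target in PERIOD_TARGETS:
        if abs(days - target) <= tolerance_days:
            return label
    return "other"

print(classify_duration(date(2021, 1, 1), date(2021, 3, 31)))
print(classify_duration(date(2020, 9, 27), date(2021, 9, 25)))  # a 52-week fiscal year
```

A heuristic like this says nothing about which fiscal quarter a period is; that needs the company's fiscal year end as well, which is part of why these associations are hard to establish.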

Not yet mentioned, but also worth noting: XBRL has a concept of discoverable taxonomies, which is where concepts are defined. A new US-GAAP taxonomy comes out once a year and includes more than 15,000 distinct concepts, with hundreds or even thousands added or removed each year. American Depositary Receipts (ADRs) that file with the SEC also use the IFRS taxonomy. In fact, some ADRs use both US-GAAP and IFRS in the same filing. Last but not least, the "eXtensible" part of XBRL lets companies create their own taxonomies with concepts that describe something that doesn't quite fit into US-GAAP or IFRS. In Intrinio's data, across all SEC filings in a calendar year and the combination of US-GAAP, IFRS, and custom taxonomies, more than 45,000 distinct XBRL concepts were used.

Standardization

Assuming that all the data has been retrieved, deduplicated, disambiguated, and inferred, that data can now be used to compare one company to another, or a company's current filing to those prior. Except...

It can't. For any given "generalized" accounting idea, "Operating Revenue" for example, there might be a dozen or more different ways a company reports that data.

Intrinio uses AI and machine learning to create this generalization, and has a fixed set of Intrinio data tags, one of which is for operating revenue (conveniently named "operatingrevenue"). Looking at 1,000 recent concepts that were categorized into this tag, the most common XBRL concept ("RevenueFromContractWithCustomerExcludingAssessedTax") still accounts for only 20% of actual revenue. Other XBRL concepts used include "Revenue", "Revenues1", and even "MassivelyMultiPlayerOnlineRolePlayingGamesRevenue".
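
The sketch below is not Intrinio's process, which relies on AI and machine learning. It swaps in a hard-coded lookup table purely to show the shape of the result: many reported concepts collapsing into one standardized tag. The concept names and the "operatingrevenue" tag come from the text above; the `standardize` helper and the unmapped "Goodwill" entry are only there for illustration.

```python
from collections import Counter

# Stand-in for the real categorization: a fixed concept-to-tag table.
CONCEPT_TO_TAG = {
    "RevenueFromContractWithCustomerExcludingAssessedTax": "operatingrevenue",
    "Revenue": "operatingrevenue",
    "Revenues1": "operatingrevenue",
    "MassivelyMultiPlayerOnlineRolePlayingGamesRevenue": "operatingrevenue",
}

def standardize(concept):
    """Return the standardized tag for a reported concept, or None if unmapped."""
    return CONCEPT_TO_TAG.get(concept)

reported = [
    "RevenueFromContractWithCustomerExcludingAssessedTax",
    "Revenues1",
    "MassivelyMultiPlayerOnlineRolePlayingGamesRevenue",
    "Goodwill",  # not in the table, so it stays unmapped here
]

tag_counts = Counter(standardize(c) for c in reported)
print(tag_counts)
```

A static table like this would need constant upkeep as taxonomies change and companies add custom concepts, which is one reason a learned approach is attractive.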

Complicating standardization even further, many companies have a line item for total combined Operating Revenue and provide additional detail that adds up to that total. Take the concept "RentalIncomeOperating", for example. This is a concept that legitimately describes operating revenue, but if the company also reported a true "Operating Revenue" line item, the rental income would be double-counted.
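
A minimal sketch of the double-counting problem follows. It assumes we already know which reported concept is the total and which are detail lines, which is the hard part in practice. "RentalIncomeOperating" is from the text above; the other concept names, the values, and the helper are invented.

```python
def total_without_double_counting(reported, total_concept, detail_concepts):
    """Prefer the reported total; fall back to summing detail lines if it is absent."""
    if total_concept in reported:
        return reported[total_concept]
    return sum(reported.get(c, 0) for c in detail_concepts)

# Hypothetical filing: a total line plus two detail lines that add up to it.
reported = {
    "Revenues": 500,
    "RentalIncomeOperating": 120,
    "OtherOperatingRevenueExample": 380,
}
details = ["RentalIncomeOperating", "OtherOperatingRevenueExample"]

naive = sum(reported.values())  # treats every line as independent revenue
careful = total_without_double_counting(reported, "Revenues", details)
print(naive, careful)
```

Here the naive sum reports twice the real revenue, because every detail line is already inside the total.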

It gets further muddied by dimensions (which have not been mentioned yet). Dimensions in XBRL let a company offer further detail on a particular fact. Apple's income statement includes separate dimensions for products and services along with a line item that combines the two. Not all companies that report dimensions have a "non-dimensionalized" counterpart, and even when they do, the dimensioned values don't always add up to it.
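
The check described above can be sketched as follows. Facts are modeled as simple (concept, member, value) tuples, where a member of None marks the non-dimensionalized fact. The member names, the values, and the `check_dimension_total` helper are invented for illustration and are not taken from any real filing.

```python
def check_dimension_total(facts, concept):
    """Compare a concept's non-dimensionalized fact with the sum of its dimensioned facts."""
    total = next((v for c, m, v in facts if c == concept and m is None), None)
    parts = sum(v for c, m, v in facts if c == concept and m is not None)
    if total is None:
        return ("no_total", None, parts)
    status = "ok" if total == parts else "mismatch"
    return (status, total, parts)

# Hypothetical facts: one combined line plus a product/service breakdown.
facts = [
    ("Revenues", None, 1000),
    ("Revenues", "ProductMember", 800),
    ("Revenues", "ServiceMember", 200),
    ("CostOfRevenue", "ProductMember", 500),  # breakdown with no combined line
]

print(check_dimension_total(facts, "Revenues"))
print(check_dimension_total(facts, "CostOfRevenue"))
```

The "no_total" and "mismatch" outcomes correspond to the two situations named in the paragraph: a breakdown without a combined line, and a breakdown that does not add up.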

The bottom line: those 45,000 XBRL concepts ultimately get compressed into fewer than 1,000 standardized tags. This is akin to the lossy compression used by many image formats. In graphics, that kind of compression matters for speed and cost; in standardization, it matters for comparability. The loss of resolution is nowhere near what the numbers suggest (1,000 tags from 45,000 concepts would seem to imply a 97.8% loss) because Intrinio is ultimately standardizing these concepts into groups of extremely similar things. We can't directly compare Apple's sale of phones with ExxonMobil's revenue from oil drilling, but we can put both of these things in the same bucket and compare those buckets, and it turns out to be very effective.

An easier way

Intrinio's standardized fundamentals completely handle this process:

As XBRL filings are made available at the SEC, "as reported" data is extracted, associations created, and inferences made.
The "as reported" data goes through Intrinio's standardization process, which uses human-guided AI and ML to convert those reported tags (US-GAAP, IFRS, custom) to Intrinio standardized tags.
Extensive validation occurs, using calculation definitions provided in the filing to ensure, for example, that totals match and no double-counting has occurred. Problems are flagged and reviewed.
This data is made available through web APIs and SDKs -- fully standardized data for every filing for more than ten years.
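
The validation step in the list above can be pictured with the sketch below. It assumes the calculation definitions from a filing have already been read into a dictionary that maps each parent concept to its weighted children (XBRL calculation relationships carry a weight, typically 1 or -1). The concept names, the values, the tolerance, and the `find_calculation_problems` helper are illustrative and are not Intrinio's implementation.

```python
# Parent concept -> list of (child concept, weight). Invented for illustration.
CALCULATIONS = {
    "GrossProfit": [("Revenues", 1.0), ("CostOfRevenue", -1.0)],
}

def find_calculation_problems(values, calculations, tolerance=0.5):
    """Return (parent, reported, expected) for every total that does not match its children."""
    problems = []
    for parent, children in calculations.items():
        if parent not in values:
            continue  # nothing reported for this total, so nothing to check
        expected = sum(weight * values[child] for child, weight in children if child in values)
        if abs(values[parent] - expected) > tolerance:
            problems.append((parent, values[parent], expected))
    return problems

good = {"Revenues": 1000, "CostOfRevenue": 600, "GrossProfit": 400}
bad = {"Revenues": 1000, "CostOfRevenue": 600, "GrossProfit": 450}

print(find_calculation_problems(good, CALCULATIONS))
print(find_calculation_problems(bad, CALCULATIONS))
```

Anything this check returns would be a candidate for the flag-and-review step rather than an automatic correction.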
