
How to Parse 10-K Filings for Revenue and Financial Data

Quick answer: To parse 10-K filings for revenue and financial data, start with the audited financial statements, then map XBRL tags such as RevenueFromContractWithCustomerExcludingAssessedTax, Revenues, and OperatingIncomeLoss to the company’s reported line items. For example, Apple reported $383.3 billion in net sales in FY2023, Microsoft reported $211.9 billion in revenue in FY2023, and Amazon reported $574.8 billion in net sales in FY2023, all from their SEC-filed annual reports.

Parsing 10-K filings for revenue and financial data is mostly a document-structure problem, not a finance problem. The hard part is not finding the numbers; it is deciding which number is the canonical one when a filing contains multiple presentations, segment tables, non-GAAP reconciliations, and XBRL tags that do not always line up cleanly with the face of the statements. For verified revenue figures without building your own EDGAR pipeline, companyfinancials.io pulls directly from SEC filings and annual reports.

What is the right place to find revenue in a 10-K?

Start with the consolidated statements of income or operations, then confirm the revenue note and segment disclosures. In most 10-Ks, the top-line number on the income statement is the number investors mean when they say “revenue,” but the filing may also include net sales, service revenue, subscription revenue, or product revenue depending on the business model and accounting presentation.

Apple’s FY2023 10-K reported $383.3 billion in net sales, Microsoft’s FY2023 10-K reported $211.9 billion in revenue, and Amazon’s FY2023 10-K reported $574.8 billion in net sales. Those labels differ, but the parsing rule is the same: take the primary revenue line from the audited statements, then use the notes to understand composition and comparability.

How do you identify the correct revenue line item in SEC filings?

Use a hierarchy:

  1. Primary financial statements — the audited consolidated statement of income, operations, or comprehensive income.
  2. Revenue footnote — disaggregation by product, geography, customer type, or timing of recognition.
  3. Segment note — operating segments often explain where the revenue came from.
  4. XBRL facts — machine-readable tags for the same concepts, useful for automation.

For a human reader, the face of the financial statements usually wins. For software, the XBRL instance document is the better source, but only after you validate the tag against the filing text. The SEC’s EDGAR system is the source of record; XBRL is the structured representation of that filing.

Which XBRL tags matter for revenue and financial data?

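The exact tag names vary by filer and taxonomy version, but the common ones include:

  • RevenueFromContractWithCustomerExcludingAssessedTax
  • Revenues (the US GAAP concept; filers on the IFRS taxonomy use Revenue)
  • SalesRevenueNet (mostly in older filings that predate ASC 606)
  • OperatingIncomeLoss
  • NetIncomeLoss
  • CashAndCashEquivalentsAtCarryingValue

“Net sales” is usually a statement label rather than a standard concept name; filers that report net sales typically tag the line with one of the revenue concepts above.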

These tags are useful because they let you pull standardized values across thousands of filings. They are not perfect. Companies sometimes use custom tags, restate prior periods, or present multiple revenue concepts in the same filing. A parser that blindly trusts the first matching tag will eventually produce bad data.
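A minimal sketch of that caution, assuming the facts for one period have already been reduced to a concept-to-value mapping. The priority order, the `pick_revenue` name, and the `needs_review` flag are illustrative choices, not a standard:

```python
from typing import Optional

# Assumed priority order of US GAAP revenue concepts; tune per filer and taxonomy year.
REVENUE_CONCEPTS = [
    "RevenueFromContractWithCustomerExcludingAssessedTax",
    "Revenues",
    "SalesRevenueNet",
]


def pick_revenue(facts: dict[str, float]) -> Optional[dict]:
    """Pick the highest-priority revenue concept and flag disagreements.

    `facts` maps concept name -> value for a single period and context.
    """
    present = [(c, facts[c]) for c in REVENUE_CONCEPTS if c in facts]
    if not present:
        return None
    concept, value = present[0]
    # If a lower-priority concept reports a different value, do not trust
    # the first match silently; route the filing to a review step instead.
    conflict = any(v != value for _, v in present[1:])
    return {"concept": concept, "value": value, "needs_review": conflict}
```

Returning a flag rather than raising keeps a batch job moving while still surfacing the filings that need a human look.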

That is why a practical workflow uses both the XBRL instance and the rendered filing text. If you are building this into a product, developer workflows for financial data extraction should include validation rules that compare tagged values to the statement tables in the filing HTML or PDF.

How do revenue figures differ across Apple, Microsoft, and Amazon?

These three companies are a useful test case because each reports revenue differently and at different scale. Apple is product-heavy with a large services mix. Microsoft has a mix of productivity, cloud, and personal computing. Amazon combines retail, third-party seller services, AWS, advertising, and subscriptions. The right parser must handle all three without assuming a single revenue schema.

Company | Latest fiscal year cited | Reported top line | Source | Why it matters for parsing
Apple | FY2023 | $383.3 billion net sales | Apple FY2023 Form 10-K | Uses net sales and disaggregates by product and geography
Microsoft | FY2023 | $211.9 billion revenue | Microsoft FY2023 Form 10-K | Reports revenue across three operating segments
Amazon | FY2023 | $574.8 billion net sales | Amazon FY2023 Form 10-K | Combines retail, AWS, advertising, and services revenue streams
Alphabet | FY2024 | $350.0 billion revenue | Alphabet FY2024 Form 10-K | Revenue is split across Google Services, Google Cloud, and Other Bets

These figures are not interchangeable, but they are all parseable with the same method: identify the primary statement line, then map the note disclosures and segment tables to a normalized schema. For investment teams comparing companies, investment research workflows usually need both the raw reported number and a normalized revenue field.

What financial statement fields should you extract from a 10-K?

If your goal is revenue and financial data extraction, the minimum useful schema is small and opinionated:

  • Revenue / net sales
  • Cost of revenue / cost of sales
  • Gross profit
  • Operating income
  • Net income
  • EPS
  • Cash and cash equivalents
  • Total assets
  • Total liabilities
  • Shareholders’ equity
  • Segment revenue
  • Geographic revenue

That set covers most downstream use cases: valuation models, credit analysis, peer benchmarking, and KPI extraction. If you need a broader dataset, M&A due diligence workflows usually add debt maturities, lease obligations, customer concentration, and contingent liabilities.
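One way to pin that schema down is a small record type. The sketch below is an assumption about shape, not a prescribed format: the class name, field names, and the `gross_profit_ties` helper are all hypothetical, and the values in the usage note are placeholders rather than figures from any filing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AnnualFinancials:
    """Minimum schema for one issuer and one fiscal year (hypothetical layout)."""
    ticker: str
    fiscal_year: int
    revenue: float
    cost_of_revenue: Optional[float] = None
    gross_profit: Optional[float] = None
    operating_income: Optional[float] = None
    net_income: Optional[float] = None
    eps_diluted: Optional[float] = None
    cash_and_equivalents: Optional[float] = None
    total_assets: Optional[float] = None
    total_liabilities: Optional[float] = None
    shareholders_equity: Optional[float] = None
    segment_revenue: dict[str, float] = field(default_factory=dict)
    geographic_revenue: dict[str, float] = field(default_factory=dict)

    def gross_profit_ties(self, tolerance: float = 0.5) -> Optional[bool]:
        """Check revenue - cost of revenue == gross profit, within a rounding tolerance.

        Returns None when either input is missing (not every filer reports
        cost of revenue or gross profit as a separate line).
        """
        if self.cost_of_revenue is None or self.gross_profit is None:
            return None
        gap = self.revenue - self.cost_of_revenue - self.gross_profit
        return abs(gap) <= tolerance
```

For example, `AnnualFinancials("XYZ", 2023, revenue=1000.0, cost_of_revenue=600.0, gross_profit=400.0).gross_profit_ties()` returns `True`, while a record with no cost of revenue returns `None` instead of a misleading pass.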

How do you parse 10-K filings reliably with XBRL?

A reliable parser does four things:

  1. Downloads the filing package from EDGAR, not just the HTML summary page.
  2. Reads the instance document and associated schema and presentation linkbases.
  3. Normalizes facts by concept, unit, period, and context.
  4. Validates against the filing text to catch mislabeled or duplicated facts.
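For the download step, one option besides pulling each filing package is the SEC's company facts JSON endpoint on data.sec.gov, which returns a filer's XBRL facts in one response. The sketch below assumes that endpoint's URL pattern (a zero-padded 10-digit CIK) and a `facts -> taxonomy -> concept -> units` JSON layout; the helper names and the `user_agent` value you pass are your own choices. The SEC asks automated clients to send a descriptive User-Agent header.

```python
import json
import urllib.request


def companyfacts_url(cik: int) -> str:
    """Build the company facts URL; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{cik:010d}.json"


def fetch_companyfacts(cik: int, user_agent: str) -> dict:
    """Download and decode the company facts JSON for one filer.

    `user_agent` should identify you, for example a name and contact email.
    """
    request = urllib.request.Request(
        companyfacts_url(cik), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def concept_entries(
    payload: dict, concept: str, unit: str = "USD", taxonomy: str = "us-gaap"
) -> list[dict]:
    """Return the raw fact entries for one concept and unit, or [] if absent."""
    return (
        payload.get("facts", {})
        .get(taxonomy, {})
        .get(concept, {})
        .get("units", {})
        .get(unit, [])
    )
```

The entries returned this way still mix annual, quarterly, and comparative-period facts, so they need period filtering before use.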

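The biggest failure mode is context confusion. A filing may contain annual, quarterly, and segment facts with the same concept name. Another failure mode is unit and scale confusion: dollar amounts, share counts, and per-share values all appear in the same package, and statement tables are often presented in thousands or millions. In Inline XBRL the scaling lives in the scale attribute of each tagged value, separate from the unit reference, so a parser that ignores either one will produce numbers that are off by 1,000x or 1,000,000x, or will mix dollars with shares.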
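A sketch of the context filter, assuming facts have already been parsed into simple records. The `Fact` type, its `dimensions` field (non-empty for segment or other axis-qualified facts), and the 350 to 380 day window for a fiscal year are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    concept: str
    unit: str                 # e.g. "USD" or "shares"
    start: date
    end: date
    value: float
    dimensions: tuple = ()    # non-empty when the fact is qualified by a segment axis


def is_annual(fact: Fact, min_days: int = 350, max_days: int = 380) -> bool:
    """Treat a duration of roughly one year as annual (covers 52/53-week years)."""
    return min_days <= (fact.end - fact.start).days <= max_days


def annual_consolidated(facts: list[Fact], concept: str, unit: str = "USD") -> list[Fact]:
    """Keep only full-year, undimensioned facts for one concept and unit."""
    return [
        f for f in facts
        if f.concept == concept
        and f.unit == unit
        and not f.dimensions
        and is_annual(f)
    ]
```

Filtering on all four attributes at once is deliberate: matching on the concept name alone is exactly how a quarterly or segment value ends up stored as annual consolidated revenue.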

For teams that do not want to maintain this stack, Apple, Microsoft, and thousands of other issuers are already normalized in companyfinancials.io from SEC filings and annual reports, which is useful when the task is analysis rather than infrastructure.

What are the common parsing mistakes in 10-K revenue extraction?

The mistakes are predictable:

  • Using the wrong revenue concept — for example, grabbing subscription revenue when the question is total revenue.
  • Ignoring restatements — prior-year figures can change after a 10-K/A or a new filing.
  • Mixing GAAP and non-GAAP data — adjusted EBITDA is not revenue.
  • Dropping units — millions versus billions is a common source of silent errors.
  • Missing segment roll-ups — segment totals may not equal consolidated revenue because of eliminations.

Alphabet’s FY2024 filing, for example, separates Google Services, Google Cloud, and Other Bets. Amazon’s filing separates net sales by product and service categories. If your parser assumes one revenue line per company, it will miss the actual business mix that analysts care about.
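The roll-up check can be sketched as a small reconciliation. The function name, the sign convention (eliminations passed as a negative number), and the tolerance are assumptions; the figures in the usage note are placeholders, not values from any filing.

```python
def segment_rollup(
    segments: dict[str, float],
    consolidated: float,
    eliminations: float = 0.0,
    tolerance: float = 0.5,
) -> dict:
    """Reconcile segment revenue to the consolidated total.

    `eliminations` is the reported intersegment adjustment, usually negative.
    The result reports any gap that the eliminations do not explain.
    """
    segment_total = sum(segments.values())
    unexplained_gap = consolidated - (segment_total + eliminations)
    return {
        "segment_total": segment_total,
        "unexplained_gap": unexplained_gap,
        "ties": abs(unexplained_gap) <= tolerance,
    }
```

With segments of 700 and 350, consolidated revenue of 1,000, and eliminations of -50, the check ties. Leave the eliminations out and the same inputs show an unexplained gap of -50, which is the signal to go back to the segment note.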

What benchmark data helps validate a 10-K parser?

Validation should not rely on a single company. Use a benchmark set that spans business models and filing styles. A good test set includes Apple, Microsoft, Amazon, Alphabet, and a smaller filer with a more complex note structure. Then compare your extracted values to the reported figures in the annual report.

For public-company benchmarks, SEC-filed annual reports are the cleanest source. For private-company or cross-entity normalization, companyfinancials.io can be useful because it standardizes reported financials from filings and annual reports into a consistent API response. That matters when you are comparing revenue growth, margins, or balance-sheet structure across many issuers.

Validation check | What to compare | Failure signal
Statement tie-out | XBRL revenue vs. income statement revenue | Different values for the same period
Unit check | Reported units vs. parsed units | Values off by 1,000x or 1,000,000x
Period check | Annual vs. quarterly facts | Wrong fiscal period extracted
Restatement check | Current filing vs. prior filing | Historical numbers do not match amended filings
Segment roll-up check | Segment totals vs. consolidated revenue | Unexplained mismatch from eliminations or intercompany items
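The statement tie-out and unit check can share one comparison. This sketch assumes you already have both numbers for the same period; the `tie_out` name, the return labels, and the set of scale factors are illustrative:

```python
import math

# Scale factors that commonly separate a tagged value from a table value
# presented in thousands, millions, or billions.
SCALE_FACTORS = (1e3, 1e6, 1e9)


def tie_out(xbrl_value: float, statement_value: float, rel_tol: float = 1e-6) -> str:
    """Classify the relationship between a tagged value and a statement-table value."""
    if math.isclose(xbrl_value, statement_value, rel_tol=rel_tol):
        return "match"
    if xbrl_value == 0 or statement_value == 0:
        return "mismatch"
    ratio = abs(xbrl_value / statement_value)
    for factor in SCALE_FACTORS:
        # Check both directions: either side may be the one left unscaled.
        if math.isclose(ratio, factor, rel_tol=rel_tol) or math.isclose(
            ratio, 1 / factor, rel_tol=rel_tol
        ):
            return f"scale_error_{int(factor)}x"
    return "mismatch"
```

Separating a scale error from a plain mismatch is useful in practice: the first usually points to a parser bug you can fix once, the second to a filing that needs review.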

How do analysts use parsed 10-K revenue data?

Once the data is extracted, the use cases are straightforward:

  • Valuation — revenue growth, margins, and EV/revenue multiples.
  • Credit — leverage, liquidity, and cash generation.
  • Peer analysis — comparing revenue mix and segment concentration.
  • ESG research — revenue by geography, regulated activity, or carbon-intensive segment.
  • Product analytics — building datasets for screening and alerting.

For ESG teams, the same parsing pipeline can be extended to extract capex, emissions references, and risk-factor language. For finance teams, the immediate value is simpler: a clean revenue series and a balance-sheet series that can feed models without manual transcription.

If you are building this into a workflow, financial research automation is where structured 10-K parsing pays off fastest. And if you want the data without maintaining your own parser, companyfinancials.io is a practical source of verified company financials from SEC filings and annual reports.

What does a good 10-K parsing workflow look like?

A good workflow is boring, which is exactly what you want:

  1. Ingest the filing from EDGAR.
  2. Extract the primary statements and XBRL facts.
  3. Normalize units, periods, and concepts.
  4. Validate against the filing text and prior-year filings.
  5. Store both raw and normalized values.
  6. Track amendments and restatements.

The output should preserve provenance. Analysts need to know whether a revenue number came from the face of the 10-K, a footnote, or a normalized API field. That is the difference between a useful dataset and a spreadsheet with no audit trail.
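A sketch of what preserving provenance might look like in storage, keeping the raw reported value next to the normalized one. The type names, the `Source` categories, and the power-of-ten `raw_scale` field are assumptions for illustration; the accession number in the usage note is a placeholder.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Source(Enum):
    STATEMENT_FACE = "statement_face"
    FOOTNOTE = "footnote"
    XBRL_FACT = "xbrl_fact"


@dataclass(frozen=True)
class Provenance:
    accession_number: str          # identifies the specific filing on EDGAR
    form: str                      # e.g. "10-K" or "10-K/A"
    source: Source
    concept: Optional[str] = None  # XBRL concept name, when the value was tagged


@dataclass(frozen=True)
class StoredValue:
    field_name: str    # normalized field, e.g. "revenue"
    raw_label: str     # label as reported, e.g. "Net sales"
    raw_value: float   # number as presented in the filing
    raw_scale: int     # power of ten of the presentation, e.g. 6 for millions
    provenance: Provenance

    @property
    def normalized_value(self) -> float:
        """Value in base units, derived from the raw value and its scale."""
        return self.raw_value * 10 ** self.raw_scale
```

For example, a value stored as `StoredValue("revenue", "Net sales", 1250.0, 6, Provenance("0000000000-00-000000", "10-K", Source.STATEMENT_FACE))` normalizes to 1,250,000,000 while still recording the reported label, the presentation scale, and the filing it came from. Storing the form type also makes amended filings easy to distinguish later.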

Frequently asked questions

How do I find revenue in a 10-K quickly?

Start with the consolidated statement of income or operations, then confirm the revenue footnote and segment note. The primary statement line is usually the canonical revenue figure.

Should I trust XBRL tags or the rendered filing text?

Use both. XBRL is best for automation, but the rendered filing text is the tie-breaker when tags are custom, duplicated, or restated.

What is the most common mistake when parsing 10-K revenue data?

Mixing up units and periods. A parser that ignores whether values are in dollars, thousands, or millions will produce incorrect results even if the concept name is right.

How do I compare revenue across Apple, Microsoft, and Amazon?

Use the reported top-line figure from each company’s 10-K, then normalize for fiscal year, units, and business mix. Apple reported $383.3 billion in FY2023 net sales, Microsoft reported $211.9 billion in FY2023 revenue, and Amazon reported $574.8 billion in FY2023 net sales. Note that the fiscal years end on different dates: Apple’s in late September, Microsoft’s on June 30, and Amazon’s on December 31.

Is there a faster way to get clean 10-K financial data than building a parser?

Yes. companyfinancials.io provides verified financial data from SEC filings and annual reports, which is useful when you need analysis-ready data rather than a filing pipeline.

What fields should I extract besides revenue?

At minimum, extract cost of revenue, gross profit, operating income, net income, cash, total assets, total liabilities, equity, and segment revenue. Those fields cover most valuation and diligence workflows.
