Unlock Insights from Financial Texts
A public repository of parsed textual data from SEC filings, including major sections and financial statement notes from 10-K and 10-Q reports, designed to accelerate business research.
Based on the paper: Codesso, M., Hoitash, R., & Hoitash, U. (2025). Textual Financial Data Repository and Python Code for Machine Learning, AI, and Textual Analyses.
A Better Way to Work with Financial Data
We solve the most common roadblocks in textual analysis so you can focus on research, not data wrangling.
Save Time and Effort
Stop reinventing the wheel. Our data is pre-parsed, so you don't need to write and maintain complex extraction scripts.
Ensure Consistency
Using a standardized data source enhances the comparability and replicability of research across different studies and teams.
Comprehensive Data
Access key sections (MD&A, Risk Factors) and all financial statement notes from 10-Ks and 10-Qs, covering all firms from 2008 onward.
Pre-calculated Metrics
Jumpstart your analysis with ready-to-use metrics like readability scores, sentiment, word counts, and more, without any programming.
Novel Word Lists
Use new dictionaries for COVID-19 and Human Capital, developed with a unique, event-based methodology for higher relevance.
Open-Source Python Code
Access the full codebase to understand our methodology, verify results, or extend the tools for your own custom research needs.
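To give a feel for the kind of metrics mentioned above (word counts and dictionary-based measures), here is a minimal Python sketch. It is not the repository's code: the function names, the tiny `EXAMPLE_TERMS` word list, and the sample sentence are all hypothetical and exist only for illustration. The actual dictionaries and metric definitions are documented in the paper and the codebase.

```python
import re

# Hypothetical mini word list for illustration only;
# this is NOT the repository's COVID-19 dictionary.
EXAMPLE_TERMS = {"pandemic", "lockdown", "quarantine"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def text_metrics(text, terms=EXAMPLE_TERMS):
    """Compute a few simple text metrics for one passage.

    Assumed definitions (simplified): sentences are split on
    '.', '!' or '?', and a term hit is an exact single-word match.
    """
    words = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    hits = sum(1 for w in words if w in terms)
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "term_hits": hits,
        "term_share": hits / len(words) if words else 0.0,
    }


# Made-up sample text, not taken from any filing.
sample = "The pandemic disrupted our supply chain. Lockdown orders closed two plants."
metrics = text_metrics(sample)
print(metrics)
```

The same function could be applied to each parsed section or note to build a firm-level table. Real filings need more careful sentence splitting (abbreviations, decimals, headings) than this sketch attempts.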