Financial 10-K Text Agent
An auditable financial NLP research pipeline for testing whether SEC 10-K disclosures contain out-of-sample predictive information about future volatility and abnormal-return targets.
This project sits at the intersection of financial NLP, empirical asset pricing, rolling out-of-sample validation, and research audit. It is not a RAG demo, a generic sentiment classifier, or an AI trading bot.
Scale
Current Release Metrics
Main finding
Preregistered Primary Prediction
The preregistered primary prediction specification uses Ridge regression to predict future 20-day realized volatility and is evaluated on ALL_SPLITS Rank IC.
The result is exploratory out-of-sample evidence that 10-K text features carry ranking information about future 20-day realized volatility. It is prediction evidence, not evidence of tradable alpha.
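To make the evaluation metric concrete, the following is a minimal sketch of a Rank IC computed as the Spearman rank correlation between predicted and realized values in one cross-section. The function names are illustrative and not taken from the project, and how the project aggregates this statistic into ALL_SPLITS Rank IC is not shown here.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN when either input has zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def rank_ic(predictions, realized):
    """Spearman rank correlation between predictions and realized targets."""
    return pearson(average_ranks(predictions), average_ranks(realized))
```

For example, `rank_ic([0.10, 0.30, 0.20, 0.40], [0.012, 0.020, 0.025, 0.030])` evaluates to 0.8, because only the two middle observations are swapped in rank order. A Rank IC depends only on ordering, which is why it supports a ranking claim rather than a claim about prediction levels.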
Model comparison
Best Observed Exploratory Prediction
The strongest observed model-comparison result is reported separately from the preregistered primary claim.
| Model | Target | Rank IC | NW t-stat | RMSE |
|---|---|---|---|---|
| XGBoost | realized_volatility_1_20 | 0.3133 | 6.8479 | 0.00834 |
This is exploratory model-comparison evidence, not the preregistered primary claim.
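As a rough illustration of the NW t-stat column, the sketch below computes a Newey-West (Bartlett-kernel) t-statistic for the mean of a series, such as a sequence of per-period Rank IC values. It assumes the statistic is a test of the mean IC against zero; the lag length and the exact series used in the project are not stated in this document, so `lags` is a free parameter here.

```python
def newey_west_tstat(series, lags):
    """t-statistic of the sample mean with a Newey-West (HAC) standard error."""
    n = len(series)
    mu = sum(series) / n
    dev = [x - mu for x in series]

    def gamma(k):
        # Sample autocovariance at lag k, normalized by n.
        return sum(dev[t] * dev[t - k] for t in range(k, n)) / n

    long_run_var = gamma(0)
    for k in range(1, lags + 1):
        weight = 1 - k / (lags + 1)  # Bartlett kernel weight
        long_run_var += 2 * weight * gamma(k)

    if long_run_var <= 0:
        raise ValueError("long-run variance is not positive; t-stat undefined")
    return mu / (long_run_var / n) ** 0.5
```

With `lags=0` this reduces to the usual t-statistic with a population-variance standard error. When the series is positively autocorrelated, adding lags widens the standard error and lowers the t-statistic, which is the reason for using it on overlapping or persistent evaluation windows.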
Pipeline
From SEC Filing to Audited Result
Official 10-K filings and timestamps.
Business, Risk Factors, Legal Proceedings, MD&A.
Future volatility and abnormal-return targets.
Rolling train / validation / test windows.
LM tone plus train-window-only TF-IDF/SVD.
Baselines, Ridge, and XGBoost.
Rank IC, Newey-West, and portfolio diagnostics.
Coverage, manifests, registry, checksums.
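The train-window-only feature step in the pipeline above can be sketched as follows. This is a simplified stand-in, assuming whitespace tokenization and a smoothed IDF weighting; it omits the SVD reduction, and the function names `fit_idf` and `transform` are hypothetical rather than the project's own.

```python
import math
from collections import Counter


def fit_idf(train_docs):
    """Learn the vocabulary and smoothed IDF weights from training documents only."""
    n_docs = len(train_docs)
    doc_freq = Counter()
    for doc in train_docs:
        doc_freq.update(set(doc.lower().split()))
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def transform(doc, idf):
    """Weight a document using the fitted vocabulary; unseen terms are dropped."""
    counts = Counter(tok for tok in doc.lower().split() if tok in idf)
    return {term: count * idf[term] for term, count in counts.items()}


# Vocabulary and weights come from the training window alone.
idf = fit_idf(["risk litigation", "risk growth"])
# "cyber" never appeared in training, so it cannot influence the test features.
test_vector = transform("risk cyber growth", idf)
```

The point of the design is that terms which first appear in later filings never enter the vocabulary or the weights used to featurize earlier-trained models, so the features carry no information from the validation or test windows.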
Research controls
Audit-Backed Workflow
Leakage Control
Rolling splits, filing-time alignment, label-window checks, and train-window-only vocabularies reduce look-ahead bias.
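One way to picture the label-window check is the sketch below, which assumes filings are indexed by an integer trading day and that a 20-day label is only known once its window has closed. The `Filing` class and `split_train_test` function are illustrative names, and the validation window is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filing:
    ticker: str
    filing_day: int    # trading-day index on which the 10-K became public
    horizon: int = 20  # label window length in trading days (assumed)

    @property
    def label_end_day(self):
        return self.filing_day + self.horizon


def split_train_test(filings, test_start, test_end):
    """Train only on filings whose label window closes before the test window opens."""
    train = [f for f in filings if f.label_end_day < test_start]
    test = [f for f in filings if test_start <= f.filing_day < test_end]
    return train, test


filings = [Filing("A", 100), Filing("B", 190), Filing("C", 205)]
train, test = split_train_test(filings, test_start=200, test_end=260)
```

In this example filing B is excluded from both sets: it was filed before the test window opens, but its 20-day label would only be observed after day 200, so using it for training would leak future information.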
Model Comparison
Historical mean, industry mean, Ridge, and XGBoost are compared under rolling out-of-sample evaluation.
Multiple Testing
568 tested specifications are disclosed with Bonferroni, Holm, and Benjamini-Hochberg FDR adjustments.
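For readers unfamiliar with the three adjustments, here is a minimal sketch of how adjusted p-values can be computed from a list of raw p-values. It illustrates the standard procedures and is not the project's code; the p-values in the usage line are made up for illustration.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjustment (controls family-wise error rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Smallest p-value is scaled by m, the next by m - 1, and so on.
        running_max = max(running_max, (m - rank) * pvals[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment (controls false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):
        idx = order[rank]
        running_min = min(running_min, pvals[idx] * m / (rank + 1))
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03]  # illustrative values only
adjusted = {
    "bonferroni": bonferroni(raw),
    "holm": holm(raw),
    "bh": benjamini_hochberg(raw),
}
```

With 568 specifications, the Bonferroni factor alone is 568, so a raw p-value must be below roughly 0.05 / 568 to survive at the 5% level. Holm is never more conservative than Bonferroni, and Benjamini-Hochberg targets the expected share of false discoveries rather than the chance of any false discovery.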
Audit Trail
The package includes audit reports, coverage waterfalls, manifests, vocabulary hashes, prediction-scale checks, and checksums.
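A checksum manifest of the kind listed above can be sketched with the standard library alone. The layout below (relative path mapped to a SHA-256 hex digest) is an assumption for illustration, as are the function names; the project's actual manifest format is not described in this document.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by POSIX-style relative path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the sorted names of files that are missing, added, or changed."""
    current = build_manifest(root)
    names = set(manifest) | set(current)
    return sorted(n for n in names if manifest.get(n) != current.get(n))
```

An empty list from `verify_manifest` means every artifact still matches its recorded digest, which lets a reader confirm that reported numbers were produced from the published files.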
Artifacts
Inspect the Evidence
The homepage links to compact public artifacts. Raw filings, API responses, full price panels, and private intermediate datasets are intentionally not committed.
Usage Boundary
This is an applied-grade exploratory research run. It does not claim formal CRSP/WRDS-equivalent asset-pricing evidence, a survivorship-free research-grade universe, a production trading system, proven tradable alpha, or investment advice.
Portfolio outputs are diagnostic only. The preregistered primary portfolio specification did not establish formal tradable alpha.
Contribution
My Contributions
Defined the empirical question, primary target, and preregistered result structure.
Built document, price, label, and split artifacts across a 50-firm 10-K panel.
Implemented dictionary tone and train-window-only TF-IDF/SVD features.
Compared baselines, Ridge, and XGBoost under rolling OOS evaluation.
Added coverage, leakage, multiple-testing, and artifact-integrity reports.
Reported volatility-prediction evidence while treating portfolio results as diagnostics.