A tool for analyzing and summarizing SEC 10-K filings using TF-IDF text summarization. The application extracts, processes, and visualizes financial data for major tech companies (AAPL, GOOG, NVDA, TSLA, etc.), pulled directly from the SEC EDGAR database.
- Automated SEC Data Scraping: Fetch 10-K filings directly from SEC EDGAR database
- Intelligent Text Summarization: TF-IDF based summarization with customizable compression levels
- Interactive Web Interface: Streamlit-powered UI for easy interaction
- Financial Data Visualization: Dynamic charts showing key financial metrics over time
- Multi-Section Analysis: Process different sections of 10-K filings (Part I, Item 7A, Item 9A)
- Data Processing Pipeline: Complete workflow from raw HTML to structured analysis
Clone the repository:
git clone https://github.com/omkarsoak/Financial-Report-Summarizer.git
cd Financial-Report-Summarizer
Install the dependencies:
pip install streamlit pandas matplotlib nltk requests beautifulsoup4 html2text
Download the required NLTK data from a Python session:
import nltk
nltk.download('punkt')
nltk.download('stopwords')
Start the app:
streamlit run src/app.py
The application will start on http://localhost:8501
- Select Company: Choose from AAPL, GOOG, NVDA, or TSLA
- Set Summarization Level:
  - 0.1-0.9: Large summary (more detailed)
  - 1.0: Medium summary (balanced)
  - 1.1-2.0: Small summary (highly compressed)
- Generate Summaries: Automatic processing of different 10-K sections
- View Financial Charts: Interactive visualizations of key metrics
- Navigate: Use built-in navigation to move between sections
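The summarization level above can be pictured as a multiplier on a score cut-off. The sketch below is one plausible reading, not the app's confirmed implementation: it assumes each sentence is scored by the mean TF-IDF of its words and kept only if its score reaches `level` times the average score, so a higher level yields a shorter summary. The function names, the tiny stopword list, and the naive sentence splitter are all illustrative stand-ins (the app itself uses NLTK's `punkt` and `stopwords` data).

```python
import math
import re
from collections import Counter

# Small stand-in stopword list (assumption); the app downloads NLTK's 'stopwords'.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "our", "we"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def score_sentences(sentences):
    """Mean TF-IDF per sentence, treating each sentence as one document."""
    docs = [tokenize(s) for s in sentences]
    n_docs = len(docs)
    doc_freq = Counter(w for d in docs for w in set(d))
    scores = []
    for words in docs:
        if not words:
            scores.append(0.0)
            continue
        tf = Counter(words)
        total = sum((tf[w] / len(words)) * math.log(n_docs / doc_freq[w]) for w in tf)
        scores.append(total / len(tf))
    return scores


def summarize(text, level=1.0):
    """Keep sentences scoring at least `level` times the average sentence score."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    if not scores:
        return ""
    threshold = level * (sum(scores) / len(scores))
    return " ".join(s for s, sc in zip(sentences, scores) if sc >= threshold)
```

Under this assumption, `summarize(text, 0.5)` never returns less text than `summarize(text, 1.5)`, which matches the "large" versus "small" labels in the list above.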
The application follows this data flow:
SEC EDGAR → HTML Files → Text Extraction → TF-IDF Processing → Summary Generation → Visualization
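The "HTML Files → Text Extraction" step of this flow might look roughly like the sketch below. It uses only the Python standard library so that it runs on its own; the project lists `beautifulsoup4` and `html2text` as dependencies, so its actual extraction code likely differs. `TextExtractor`, `html_to_text`, and `extract_section` are hypothetical names, and locating a section by its "Item" heading is an assumption about how sections are found.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Flatten an HTML document into a single space-joined text string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def extract_section(text, start_label, end_label):
    """Return the text between two headings (case-insensitive), or '' if absent."""
    pattern = re.escape(start_label) + r"\b(.*?)" + re.escape(end_label) + r"\b"
    match = re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""
```

For example, `extract_section(html_to_text(page), "Item 7A", "Item 8")` returns the text between those two headings. Real filings usually repeat the item names in a table of contents, so a production version would need to skip that first match.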
A simplified version of the app is available at src/tf_idf_summarizer_simplified.py. To run:
python tf_idf_summarizer_simplified.py --ticker NVDA --n 1.5
The output is generated in the directory ./tf-idf-summary
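For readers who want to see how a command line like this could be wired up, here is a minimal sketch using `argparse`. Only the `--ticker` and `--n` flags and the `./tf-idf-summary` directory come from the command above; the default level of 1.0, the helper names, and the output file name are assumptions made for illustration and may not match the actual script.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Summarize a 10-K filing with TF-IDF (sketch).")
    parser.add_argument("--ticker", required=True, help="Company ticker, e.g. NVDA")
    # Default of 1.0 is an assumption (the 'medium' level).
    parser.add_argument("--n", type=float, default=1.0, help="Summarization level")
    return parser.parse_args(argv)


def write_summary(summary, ticker, out_dir="./tf-idf-summary"):
    """Write the summary text into the output directory and return the file path."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    target = out_path / f"{ticker}_summary.txt"  # hypothetical file name
    target.write_text(summary, encoding="utf-8")
    return target
```

Calling `parse_args(["--ticker", "NVDA", "--n", "1.5"])` yields `ticker="NVDA"` and `n=1.5`, mirroring the example invocation.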


