Skip to content

US Oil Stocks vs EIA Energy Data correlationΒ #2

@kchandan

Description

@kchandan

πŸ“ Architecture Diagram

          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β”‚       Data Sources        β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚                   β”‚                    β”‚
Stocks (Yahoo Finance) β”‚   Energy Prices    β”‚
 CSV (XOM, CVX, COP)   β”‚  (EIA API / Excel) β”‚
                       β”‚  WTI, Natural Gas  β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜                    β”‚
                 β”‚                          β”‚
         β”Œβ”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”
         β”‚        Databricks Notebook (ETL)         β”‚
         β”‚------------------------------------------β”‚
         β”‚ - Fetch stock data (CSV)                 β”‚
         β”‚ - Fetch energy data (JSON / Excel)       β”‚
         β”‚ - Normalize schema (date, values)        β”‚
         β”‚ - Handle missing weekends/holidays       β”‚
         β”‚ - Join stock & energy tables on date     β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚
                 β–Ό
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚ Databricks Delta    β”‚
        β”‚   Managed Tables    β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                  β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚   Analysis & Correlation   β”‚
        β”‚----------------------------β”‚
        β”‚ - Compute returns (stocks) β”‚
        β”‚ - Compute daily deltas     β”‚
        β”‚ - Correlation (Pearson)    β”‚
        β”‚ - Aggregate weekly/monthly β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                  β”‚
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
     β”‚  Visualization Layer     β”‚
     β”‚--------------------------β”‚
     β”‚ - Line charts (Stock vs  β”‚
     β”‚   Oil/NatGas prices)     β”‚
     β”‚ - Scatter plot (Returns) β”‚
     β”‚ - Heatmap (Correlations) β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ”Ή How the Flow Works

  1. Data Sources

    • Stock data from Yahoo Finance (CSV download).
    • Energy prices from EIA API (JSON) or EIA XLS reports.
  2. Databricks Notebook (Python)

    • Ingests both datasets.
    • Standardizes dates & formats.
    • Resamples to daily/weekly granularity.
    • Joins stock & energy data on date.
  3. Delta Tables

    • stocks_raw β†’ raw daily stock prices.
    • energy_raw β†’ raw oil/natural gas prices.
    • finance_energy_corr β†’ correlation results.
  4. Analysis

    • Calculate daily returns for stocks.
    • Compare with daily % change in oil/natgas prices.
    • Run Pearson correlation for each stock vs commodity.
  5. Visualization

    • Time-series line plots (stock vs oil).
    • Scatter plots (daily change vs stock returns).
    • Correlation heatmap across all companies.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions