How to parse hierarchical revenue segments? #675

abdukhashimov · 2026-03-04T08:03:20Z

abdukhashimov
Mar 4, 2026

I was able to get the revenue segments from XBRL reports, and I want to assign parent/child relationships to them.

My code is below:

```python
from edgar import Company

# Filing, calculate_start_date, concept_filter and REVENUE_CONCEPTS_PRIORITY
# are my own definitions and are not shown in this post.


def get_xbrl_data(ticker: str, filing: Filing):
    if filing.period_end_date is None:
        raise ValueError("period_end_date is none")

    company = Company(ticker)
    filings = company.get_filings(
        form=filing.form,
        filing_date=filing.filing_date,
        amendments=False,
        date=filing.period_end_date,
        accession_number=filing.accession_number,
    )

    if filings.empty:
        raise ValueError(f"No filings found for {ticker}")

    filing_obj = filings.get_filing_at(0)
    xb = filing_obj.xbrl()
    if xb is None:
        raise ValueError("xbrl is empty")

    # Query revenue facts for the reporting period, largest values first
    result = (
        xb.query(
            include_contexts=True,
            include_dimensions=True,
            include_element_info=True,
        )
        .by_statement_type("IncomeStatement")
        .by_date_range(
            start_date=calculate_start_date(
                filing.period_end_date,
                filing.form,
            ),
            end_date=filing.period_end_date,
        )
        .by_custom(concept_filter(included_keys=REVENUE_CONCEPTS_PRIORITY))
        .sort_by(
            column="numeric_value",
            ascending=False,
        )
    )

    df = result.to_dataframe().drop_duplicates()

    df.to_csv(f"{ticker}.csv")
```

It outputs the revenues like this:
```text
                      dimension                                   label  numeric_value
0                       NaN                                Revenues   2.809500e+10
1  srt:ProductOrServiceAxis  Total revenues from sales and services   2.753200e+10
2  srt:ProductOrServiceAxis                     Automotive Revenues   2.120500e+10
3  srt:ProductOrServiceAxis                        Automotive sales   2.035900e+10
4  srt:ProductOrServiceAxis                      Services and other   3.475000e+09
5  srt:ProductOrServiceAxis           Energy generation and storage   3.415000e+09
6  srt:ProductOrServiceAxis     Energy generation and storage sales   3.281000e+09
7  srt:ProductOrServiceAxis                      Automotive leasing   4.290000e+08
8  srt:ProductOrServiceAxis           Automotive regulatory credits   4.170000e+08
9  srt:ProductOrServiceAxis   Energy generation and storage leasing   1.340000e+08
```

I would like the output to show the parent/child relationships, something like this:

🔷 TOTAL REVENUES: $28.095B
│
├─ 🚗 AUTOMOTIVE SEGMENT
│  │
│  ├─ 🔹 Automotive Revenues: $21.205B [Parent]
│  │  ├─ • Automotive sales:              $20.359B [Child]
│  │  ├─ • Automotive leasing:             $0.429B [Child]
│  │  └─ • Automotive regulatory credits:  $0.417B [Child]
│  │
│  └─ 🔹 Services and Other: $3.475B [Child of Automotive Segment]
│     (used vehicles, non-warranty maintenance, collision repair, 
│      parts, paid Supercharging, insurance, retail merchandise)
│
└─ 🔋 ENERGY GENERATION & STORAGE SEGMENT: $3.415B [Parent]
   │
   ├─ • Energy generation and storage sales:   $3.281B [Child]
   └─ • Energy generation and storage leasing: $0.134B [Child]

In the documentation I saw output like this:

Revenue:
  Products                           $298,085M
    iPhone                           $201,183M
    Mac                               $29,357M
    iPad                              $26,694M
    Wearables, Home and Accessories   $40,851M
  Services                            $92,950M

How can I build such a hierarchy?

dgunning · 2026-03-04T17:43:50Z

dgunning
Mar 4, 2026
Maintainer

Great question! The key insight is that xbrl.query() returns flat facts with no hierarchy. To get the parent-child tree structure, use Statement.to_dataframe() instead — it preserves the XBRL presentation tree and gives you hierarchy columns.

Quick Solution

from edgar import Company

company = Company("TSLA")
filing = company.get_filings(form="10-K").latest()
xbrl = filing.xbrl()

# Use the Statement object — hierarchy is preserved automatically
income = xbrl.statements.income_statement()
df = income.to_dataframe()

The DataFrame includes these hierarchy columns:

Column	What it tells you
`level`	Nesting depth (0=root, 1=section, 2=line item, 3=sub-item)
`abstract`	True for section headers, False for actual values
`parent_concept`	The calculation parent this rolls up to
`parent_abstract_concept`	The presentation section header
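
As a minimal sketch of how the `level` column can be turned into explicit nesting, the snippet below walks rows in presentation order and keeps a stack of the currently open ancestors. It uses plain dicts in place of DataFrame rows (with a real frame you could pass `df.to_dict("records")`), and the sample labels and levels are illustrative only, not taken from a filing.

```python
def nest_by_level(rows):
    """Nest rows (given in presentation order) using their integer 'level'."""
    roots = []
    stack = []  # chain of currently open ancestors, shallowest first
    for row in rows:
        node = {"label": row["label"], "level": int(row["level"]), "children": []}
        # Close every ancestor that is at the same depth or deeper than this row.
        while stack and stack[-1]["level"] >= node["level"]:
            stack.pop()
        if stack:
            stack[-1]["children"].append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots


# Illustrative rows only (hypothetical labels and levels)
sample_rows = [
    {"label": "Income Statement", "level": 0},
    {"label": "Revenues", "level": 1},
    {"label": "Automotive sales", "level": 2},
    {"label": "Automotive leasing", "level": 2},
    {"label": "Cost of revenues", "level": 1},
]

tree = nest_by_level(sample_rows)
```

This only recovers the presentation nesting. It does not help when dimensional rows all share one level, which is the case discussed later in this thread.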

Build the Tree

date_cols = [c for c in df.columns if c.startswith("20")]
latest = date_cols[0] if date_cols else None

for _, row in df.iterrows():
    indent = "  " * int(row["level"])
    label = row["label"]
    if row["abstract"]:
        print(f"{indent}{label}:")
    elif latest:
        value = row[latest]
        if value and value == value:  # not NaN
            print(f"{indent}{label:50s} ${value:>15,.0f}")

Find Revenue Children

Use parent_concept to find all line items that roll up to a total:

# Find the total revenue concept
revenue_rows = df[df["label"].str.contains("revenue", case=False, na=False) & ~df["abstract"]]
total_concept = revenue_rows.iloc[-1]["concept"]

# Get all children
children = df[df["parent_concept"] == total_concept]
print(children[["label", latest]].to_string(index=False))
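
If you need more than the direct children, one option is to index every row by its `parent_concept` once and then walk the index. The sketch below assumes each row has `concept` and `parent_concept` keys, as in the table above. The concept names are hypothetical placeholders, and plain dicts stand in for DataFrame rows.

```python
from collections import defaultdict


def children_by_parent(rows):
    """Map each parent concept to the list of concepts that roll up to it."""
    index = defaultdict(list)
    for row in rows:
        parent = row.get("parent_concept")
        # Missing parents may be None or NaN; only string values are real parents.
        if isinstance(parent, str):
            index[parent].append(row["concept"])
    return dict(index)


def descendants(index, concept):
    """Return all concepts below `concept`, depth first."""
    found = []
    for child in index.get(concept, []):
        found.append(child)
        found.extend(descendants(index, child))
    return found


# Hypothetical concept names, for illustration only
sample_rows = [
    {"concept": "Revenues", "parent_concept": None},
    {"concept": "AutomotiveRevenues", "parent_concept": "Revenues"},
    {"concept": "AutomotiveSales", "parent_concept": "AutomotiveRevenues"},
    {"concept": "ServicesAndOther", "parent_concept": "Revenues"},
]

index = children_by_parent(sample_rows)
print(descendants(index, "Revenues"))
```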

For dimensional breakdowns (segment/geography data), use the detailed view:

df_detailed = income.to_dataframe(view="detailed")

Documentation & Tutorial

We just added docs covering this exact use case:

Guide: Understanding Statement Hierarchy — full walkthrough of hierarchy columns
Notebook: Parse Revenue Segment Hierarchies — runnable Colab notebook with Apple and Microsoft examples
API Reference: to_dataframe() metadata columns

Thanks for raising this — it helped us identify a documentation gap!

1 reply

abdukhashimov Mar 5, 2026
Author

I see the issue: TSLA reports multiple revenue line items, and we are probably parsing all of them because they all relate to revenue.

(Dollars in millions)	2025	2024	2023	2025 vs 2024 $	2025 vs 2024 %	2024 vs 2023 $	2024 vs 2023 %
Automotive sales	$65,821	$72,480	$78,509	$(6,659)	(9)%	$(6,029)	(8)%
Automotive regulatory credits	1,993	2,763	1,790	(770)	(28)%	973	54%
Automotive leasing	1,712	1,827	2,120	(115)	(6)%	(293)	(14)%
Total automotive revenues	69,526	77,070	82,419	(7,544)	(10)%	(5,349)	(6)%
Services and other	12,530	10,534	8,319	1,996	19%	2,215	27%
Total automotive & services	82,056	87,604	90,738	(5,548)	(6)%	(3,134)	(3)%
Energy generation and storage	12,771	10,086	6,035	2,685	27%	4,051	67%
Total revenues	$94,827	$97,690	$96,773	$(2,863)	(3)%	$917	1%

My intention is to parse only the root, or root > sub-root, segments that generate revenue. I am not sure whether that is technically possible.

One more example, the Apple case:

        4:Products                                           $113,743,000,000
        4:iPhone                                             $ 85,269,000,000
        4:Mac                                                $  8,386,000,000
        4:iPad                                               $  8,595,000,000
        4:Wearables, Home and Accessories                    $ 11,493,000,000
        4:Services                                           $ 30,013,000,000

I would like to have Products as the parent of iPhone, Mac and iPad, or else only the iPhone, Mac and iPad lines together with the rest, without Products.

(Dollars in millions)	2025	2024
iPhone®	$85,269	$69,138
Mac®	8,386	8,987
iPad®	8,595	8,088
Wearables, Home and Accessories	11,493	11,747
Services	30,013	26,340
Total net sales	$143,756	$124,300

I know I am asking a lot 😃. Can you guide me on how to get such a response? I have tried a couple of different approaches. One more idea is to ask an LLM to build the hierarchy, but I am not sure about it yet, as it takes a very long time to think.

Thanks for the answer.

dgunning · 2026-03-05T23:38:05Z

dgunning
Mar 5, 2026
Maintainer

Great question — you've found a real gap in edgartools. Here's what's happening and a workaround.

Why the segments come out flat

When edgartools builds dimensional rows (segment members) for a statement, it puts them all at the same nesting level. The XBRL definition linkbase does define a member hierarchy — "Automotive sales" is a child of "Automotive Revenues" — but edgartools doesn't use that hierarchy when building the DataFrame. We have an issue open to fix this properly.

In the meantime, here's a workaround.

Workaround: value-math tree builder

Reported segment values are expected to balance: a parent's value should equal the sum of its children, within rounding. We can use that to reconstruct the tree:

from edgar import Company
from itertools import combinations


def revenue_segments(income_statement, period=None):
    """Build a revenue segment tree from an income statement."""
    df = income_statement.to_dataframe()
    date_cols = [c for c in df.columns if c.startswith('20')]
    period = period or date_cols[0]

    total = df[~df['abstract'] & ~df['dimension']].iloc[0]
    segments = df[(df['concept'] == total['concept']) & df['dimension'] & df[period].notna()]

    items = [(total['label'], float(total[period]))]
    items += [(r['label'], float(r[period])) for _, r in segments.iterrows()]

    tol = max(abs(v) for _, v in items) * 1e-6
    decomps = {}
    for i, (_, vi) in enumerate(items):
        if vi <= 0: continue
        cands = [(j, vj) for j, (_, vj) in enumerate(items) if j != i and 0 < vj < vi]
        for size in range(2, min(len(cands) + 1, 8)):
            for combo in combinations(cands, size):
                if abs(sum(v for _, v in combo) - vi) < tol:
                    decomps.setdefault(i, []).append(frozenset(j for j, _ in combo))

    children = {}
    assigned = set()
    def assign(parent):
        for child_set in sorted(decomps.get(parent, []), key=len):
            if not child_set & assigned:
                children[parent] = sorted(child_set, key=lambda j: items[j][1], reverse=True)
                assigned.update(child_set)
                for c in children[parent]: assign(c)
                return
    assign(0)

    def build(i):
        label, val = items[i]
        return {'label': label, 'value': val,
                'children': [build(c) for c in children.get(i, [])]}
    return build(0)


# Usage
company = Company("TSLA")
filing = company.get_filings(form="10-K").latest()
income = filing.xbrl().statements.income_statement()

tree = revenue_segments(income)
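
`revenue_segments` returns data rather than printing it. As a rough sketch, a small renderer like the one below could turn the nested dict into box-drawing output of the shape shown in this thread. `format_usd` and `render_tree` are hypothetical helper names, the rounding rule (billions with three decimals, otherwise whole millions) is an assumption, and the sample figures are copied from the Tesla output in this thread.

```python
def format_usd(value):
    """Format dollars as $x.xxxB, or $xxxM below one billion (assumed convention)."""
    if abs(value) >= 1e9:
        return f"${value / 1e9:.3f}B"
    return f"${value / 1e6:.0f}M"


def render_tree(node, prefix="", is_root=True, is_last=True):
    """Return the lines of a box-drawing rendering of a nested dict tree."""
    text = f"{node['label']}: {format_usd(node['value'])}"
    if is_root:
        lines = [text]
        child_prefix = ""
    else:
        connector = "└── " if is_last else "├── "
        lines = [prefix + connector + text]
        # Keep a vertical bar under this node only if more siblings follow it.
        child_prefix = prefix + ("    " if is_last else "│   ")
    children = node["children"]
    for i, child in enumerate(children):
        lines.extend(render_tree(child, child_prefix, False, i == len(children) - 1))
    return lines


# One subtree, in the dict shape returned by revenue_segments
energy = {
    "label": "Energy generation and storage",
    "value": 4_649_000_000,
    "children": [
        {"label": "Energy generation and storage sales", "value": 4_388_000_000, "children": []},
        {"label": "Energy generation and storage leasing", "value": 261_000_000, "children": []},
    ],
}

print("\n".join(render_tree(energy)))
```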

Tesla output:

Revenues: $46.801B
├── Automotive Revenues: $37.256B
│   ├── Automotive sales: $34.990B
│   ├── Automotive regulatory credits: $1.332B
│   └── Automotive leasing: $934M
├── Services and other: $4.896B
└── Energy generation and storage: $4.649B
    ├── Energy generation and storage sales: $4.388B
    └── Energy generation and storage leasing: $261M

Apple output:

Net sales: $383.285B
├── Products: $298.085B
│   ├── iPhone: $200.583B
│   ├── Wearables, Home and Accessories: $39.845B
│   ├── Mac: $29.357B
│   └── iPad: $28.300B
└── Services: $85.200B
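
One caveat with value matching is that more than one subset can sum to the same parent. The sketch below, using the Apple figures above in millions of dollars, lists every subset that adds up to net sales. `find_decompositions` is a hypothetical helper written for this illustration and is not part of edgartools.

```python
from itertools import combinations


def find_decompositions(parent_value, candidates, rel_tol=1e-6, max_size=7):
    """Return every subset (2..max_size items) of candidates summing to parent_value."""
    tol = abs(parent_value) * rel_tol
    names = list(candidates)
    matches = []
    for size in range(2, min(len(names), max_size) + 1):
        for combo in combinations(names, size):
            if abs(sum(candidates[n] for n in combo) - parent_value) <= tol:
                matches.append(combo)
    return matches


# Apple figures from the output above, in millions of dollars
apple = {
    "Products": 298_085,
    "iPhone": 200_583,
    "Wearables, Home and Accessories": 39_845,
    "Mac": 29_357,
    "iPad": 28_300,
    "Services": 85_200,
}

matches = find_decompositions(383_285, apple)
for combo in matches:
    print(combo)
```

Two subsets match here: Products plus Services, and the five items that exclude Products. Preferring the smallest subset first, as the `sorted(..., key=len)` call in `assign` does, selects the first one and so keeps Products as an intermediate parent. Filings where unrelated items happen to sum to the same value could still be grouped incorrectly.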

The function returns a nested dict with label, value, and children — easy to feed into a tree component or convert to a DataFrame with level/parent columns.
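
As a minimal sketch of that conversion, the nested dict can be flattened into rows that carry `level` and `parent` fields. `flatten_tree` is a hypothetical helper name, and the sample tree reuses the Apple figures from this thread. The resulting list of dicts could be passed to `pd.DataFrame(rows)` if a DataFrame is needed.

```python
def flatten_tree(node, level=0, parent=None):
    """Flatten a nested {'label', 'value', 'children'} dict into a list of rows."""
    rows = [{"label": node["label"], "value": node["value"],
             "level": level, "parent": parent}]
    for child in node["children"]:
        rows.extend(flatten_tree(child, level + 1, node["label"]))
    return rows


def leaf(label, value):
    """Convenience constructor for a node without children."""
    return {"label": label, "value": value, "children": []}


# Apple figures from this thread, in dollars
apple_tree = {
    "label": "Net sales", "value": 383_285_000_000,
    "children": [
        {"label": "Products", "value": 298_085_000_000, "children": [
            leaf("iPhone", 200_583_000_000),
            leaf("Wearables, Home and Accessories", 39_845_000_000),
            leaf("Mac", 29_357_000_000),
            leaf("iPad", 28_300_000_000),
        ]},
        leaf("Services", 85_200_000_000),
    ],
}

rows = flatten_tree(apple_tree)
for row in rows:
    print(row["level"], row["parent"], row["label"], row["value"])
```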

We'll fix edgartools to surface this hierarchy natively from the definition linkbase so you won't need the workaround.

4 replies

abdukhashimov Mar 6, 2026
Author

Thanks for the amazing suggestion. I lacked the knowledge to build the above hierarchy by summation, and probably the Python skills too.

I have modified your code a bit. It works perfectly fine with AAPL and TSLA, but it fails with JNJ, so I will post my solution later tonight.

abdukhashimov Mar 19, 2026
Author

Better late than never.

The following is an example file for LLY:
LLY_revenue_segments.csv

The workflow is roughly as follows:

Group by Dimension
- Group DataFrame rows by the dimension column
- Initialize report counter
Process Each Dimension (loop):
Print Report Header
- Show report number and dimension name
Filter by Ticker
- Split group into:
  - Rows where member starts with <ticker>:
  - All other rows
- Print counts for each
Check for Ticker Data
- If no ticker rows: print "No members found" and continue to next dimension
Build Tree
- Parse full_dimension_label paths from ticker rows
- Construct hierarchical TreeNode structure
- Nest nodes according to path depth
Geographic Check
- If dimension name contains "geographic": skip validation (continue to next dimension)
Validate Revenue
- Sum all tree node values
- Compare to total_revenue within tolerance (1e-6)
- If sums match: print "No overflow detected"
Decompose (if overflow)
- Find combinations of child nodes that sum to parent value
- Build parent-child relationships
- Reconstruct tree with proper hierarchy
- Print resulting tree structure

import pandas as pd
import sys
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Tuple, Any, Optional, Dict


@dataclass
class TreeNode:
    """Tree node with id, label, value and children."""

    id: str
    label: str
    value: float = 0.0
    children: List["TreeNode"] = field(default_factory=list)

    @staticmethod
    def overflow_exist(nodes: List["TreeNode"], total_revenue: float) -> bool:
        """Check if sum of node values overflows total revenue."""
        parent_revenues = sum(node.value for node in nodes)
        tol = abs(total_revenue) * 1e-6
        return abs(parent_revenues - total_revenue) > tol

    def print(self, indent: int = 0, prefix: str = "") -> None:
        """Print the tree node in a hierarchical format."""
        print(f"{prefix}{'  ' * indent}{self.label}: ${self.value:,.0f}")

        for child in self.children:
            child.print(indent + 1, prefix)


def print_dict_tree(node: Dict[str, Any], indent: int = 0, prefix: str = "") -> None:
    """Print a tree represented as a dictionary."""
    label = node.get("label", "Unknown")
    value = node.get("value")
    children = node.get("children", [])

    if value is not None:
        print(f"{prefix}{'  ' * indent}{label}: ${value:,.0f}")
    else:
        print(f"{prefix}{'  ' * indent}{label}")

    for child in children:
        print_dict_tree(child, indent + 1, prefix)


def parse_full_dimension_path(label: str) -> List[Tuple[str, str]]:
    """Parse full_dimension_label into a list of (key, value) tuples."""

    if pd.isna(label) or not label.strip():
        return [("_others", "_others")]

    def parse_part(part: str) -> Tuple[str, str]:
        if ":" in part:
            key, value = part.split(":", 1)
            return (key.strip(), value.strip())
        return (part, part)

    result = [parse_part(p.strip()) for p in str(label).split(",")]
    return result or [("_others", "_others")]


def build_tree(df: pd.DataFrame, max_depth: Optional[int] = 6) -> List[TreeNode]:
    """Build nested tree from DataFrame with TreeNode structure."""
    root_nodes: List[TreeNode] = []

    def get_or_create_child(children_list: List[TreeNode], node_id: str) -> TreeNode:
        """Find or create a child node by id."""
        for child in children_list:
            if child.id == node_id:
                return child
        new_child = TreeNode(id=node_id, label=node_id)
        children_list.append(new_child)
        return new_child

    def insert(
        path: List[Tuple[str, str]], label: str, value: float, depth: int = 0
    ) -> None:
        if not path or depth != 0:
            return

        _, val = path[0]
        root_node = get_or_create_child(root_nodes, val)

        if len(path) == 1 or (max_depth is not None and depth >= max_depth):
            root_node.value = value
            root_node.label = label
            return

        insert_into_node(root_node, path[1:], label, value, depth + 1)

    def insert_into_node(
        node: TreeNode,
        remaining_path: List[Tuple[str, str]],
        label: str,
        value: float,
        depth: int,
    ) -> None:
        """Insert into a node's children."""
        if not remaining_path or (max_depth is not None and depth >= max_depth):
            node.value = value
            node.label = label
            return

        key, val = remaining_path[0]
        intermediate = get_or_create_child(node.children, key)

        if len(remaining_path) > 1:
            insert_into_node(intermediate, remaining_path[1:], label, value, depth + 1)
            return

        leaf = get_or_create_child(intermediate.children, val)
        leaf.value = value
        leaf.label = label

    for _, row in df.iterrows():
        label_value = row.get("full_dimension_label")
        # Treat missing values (None or NaN) as an empty label; str(NaN) would yield "nan"
        path = parse_full_dimension_path(
            "" if pd.isna(label_value) else str(label_value)
        )
        label = path[-1][1] if path else "Unknown"
        numeric_value = float(row.get("numeric_value", 0) or 0)

        insert(path, label, numeric_value)

    return root_nodes


def decompose(nodes: List[TreeNode], total_revenue: float) -> TreeNode:
    items = [("total_revenue", float(total_revenue))]
    items.extend((node.label, node.value) for node in nodes if node.value > 0)

    # Use the absolute value so the tolerance stays positive, matching overflow_exist
    tol = abs(float(total_revenue)) * 1e-6
    decomps = {}

    for parent_idx, (_, parent_value) in enumerate(items):
        if parent_value <= 0:
            continue

        eligible_children = [
            (child_idx, child_value)
            for child_idx, (_, child_value) in enumerate(items)
            if child_idx != parent_idx and 0 < child_value < parent_value
        ]

        for combo_size in range(2, min(len(eligible_children) + 1, 8)):
            for child_combo in combinations(eligible_children, combo_size):
                if abs(sum(val for _, val in child_combo) - parent_value) > tol:
                    continue

                decomps.setdefault(parent_idx, []).append(
                    frozenset(child_idx for child_idx, _ in child_combo)
                )

    children = {}
    assigned = set()

    def assign(parent):
        for child_set in sorted(decomps.get(parent, []), key=len):
            if child_set & assigned:  # Invalid - skip early
                continue

            # Valid - process and return
            children[parent] = sorted(
                child_set,
                key=lambda idx: items[idx][1],
                reverse=True,
            )

            assigned.update(child_set)
            for c in children[parent]:
                assign(c)
            return

    assign(0)

    def build(idx):
        label, val = items[idx]
        return TreeNode(
            id=str(idx),
            label=label,
            value=val,
            children=[build(child_idx) for child_idx in children.get(idx, [])],
        )

    return build(0)


def main():
    if len(sys.argv) < 3:
        print("Usage: python parse.py <csv_file> <ticker>")
        sys.exit(1)

    csv_file = sys.argv[1]
    ticker = sys.argv[2]
    ticker_prefix = f"{ticker.lower()}:"

    try:
        items = pd.read_csv(csv_file)
    except FileNotFoundError:
        print(f"Error: File '{csv_file}' not found.")
        sys.exit(1)
    except Exception as e:
        print(f"Error reading file: {e}")
        sys.exit(1)

    total_row = items[~items["is_dimensioned"]]
    total_revenue = 0

    if not total_row.empty:
        total_revenue = total_row.iloc[0]["numeric_value"]

    # Group rows by dimension and number the reports as we iterate
    grouped = items.groupby("dimension")
    report_num = 1

    for dimension, group in grouped:
        print(f"\n{'=' * 70}")
        print(f"Report {report_num}: Dimension: {dimension}")
        print(f"{'=' * 70}")

        # Split group by ticker prefix in member column
        ticker_mask = group["member"].str.startswith(ticker_prefix, na=False)
        ticker_group = group[ticker_mask]
        other_group = group[~ticker_mask]

        print(f"\nTotal items in dimension: {len(group)}")
        print(f"  - {ticker} members: {len(ticker_group)}")
        print(f"  - Other members: {len(other_group)}")

        # Build tree only from ticker members
        if not ticker_group.empty:
            tree = build_tree(pd.DataFrame(ticker_group))

            is_geographic = "geographic" in str(dimension).lower()
            if not is_geographic:
                print("\nValidation:\n")
                overflow = TreeNode.overflow_exist(tree, total_revenue)
                if not overflow:
                    print("  No overflow detected")
                else:
                    print("  Overflow detected - decomposing...\n")
                    result = decompose(tree, total_revenue)
                    result.print()
            else:
                print("\n  Skipped validation (geographic dimension).")
        else:
            print(f"\n  No {ticker} members found in this dimension")

        report_num += 1


if __name__ == "__main__":
    main()

Answer selected by abdukhashimov

abdukhashimov Mar 19, 2026
Author

total_revenue: $17,600,800,000
  Cardiometabolic Health: $13,177,900,000
    Mounjaro: $6,515,100,000
    Zepbound: $3,588,100,000
    Other cardiometabolic health: $1,063,900,000
    Trulicity: $1,051,800,000
    Jardiance: $959,000,000
  Oncology: $2,407,600,000
    Verzenio: $1,470,200,000
    Other oncology: $937,400,000
  Immunology: $1,362,400,000
    Taltz: $901,500,000
    Other immunology: $460,900,000
  Other: $337,200,000
  Neuroscience: $315,700,000

Above is the example output

PS: I have the inner details as well; I just did not add them to the output.
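
A quick way to sanity-check a tree like this is to confirm that every parent equals the sum of its children. The sketch below does that recursively on a nested dict, using the LLY figures from the output above. `unbalanced_nodes` and `node` are hypothetical helper names, and the relative tolerance of 1e-6 mirrors the one used earlier in this thread.

```python
def unbalanced_nodes(tree, rel_tol=1e-6):
    """Return labels of nodes whose children do not sum to the node's own value."""
    bad = []
    children = tree.get("children", [])
    if children:
        total = sum(child["value"] for child in children)
        if abs(total - tree["value"]) > abs(tree["value"]) * rel_tol:
            bad.append(tree["label"])
        for child in children:
            bad.extend(unbalanced_nodes(child, rel_tol))
    return bad


def node(label, value, *children):
    """Convenience constructor for a tree node."""
    return {"label": label, "value": value, "children": list(children)}


# LLY figures from the output above, in dollars
lly = node(
    "total_revenue", 17_600_800_000,
    node("Cardiometabolic Health", 13_177_900_000,
         node("Mounjaro", 6_515_100_000),
         node("Zepbound", 3_588_100_000),
         node("Other cardiometabolic health", 1_063_900_000),
         node("Trulicity", 1_051_800_000),
         node("Jardiance", 959_000_000)),
    node("Oncology", 2_407_600_000,
         node("Verzenio", 1_470_200_000),
         node("Other oncology", 937_400_000)),
    node("Immunology", 1_362_400_000,
         node("Taltz", 901_500_000),
         node("Other immunology", 460_900_000)),
    node("Other", 337_200_000),
    node("Neuroscience", 315_700_000),
)

print(unbalanced_nodes(lly))
```

An empty list means every level balances. A non-empty list names the parents whose children need another look.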

abdukhashimov Mar 19, 2026
Author

@dgunning thanks a lot for the help, it was amazing guidance for me.
