E-Commerce Analytics

Executive Dashboards in Excel · Data Cleaning in Python

An end-to-end analytics project: raw Kaggle dataset through to interactive executive dashboards in Excel.

Key Insights

• £9.46M revenue analysed across 541k transactions
• 64.1% repeat customers generate 92.3% of revenue
• 14.8% order cancellation rate impacting sales

Dashboards

Dashboard 1 — Executive Commercial Overview

A slicer-driven dashboard that lets you drill into any month or view the full year at a glance. KPI cards with month-on-month change indicators, revenue trend, customer activity, AOV tracking, a 3-driver revenue decomposition, operational leakage metrics, and a retention donut.

Select a month, and everything updates. The KPI cards show that month's figures, the change indicators flip between green and red based on trend direction, and the revenue driver equation recalculates to decompose that month's revenue into Customers × Frequency × Basket Size.

Dashboard 2 — Customer & Product Deep Dive

A static full-year view focused on customer segmentation and product concentration. Repeat vs one-time revenue split, top 10 products by revenue, revenue per customer segment, and order frequency distribution.

The Problem

I had a raw transactional dataset from a UK online retailer (541,909 rows) and wanted to answer the kinds of questions a commercial team would actually ask:

How is revenue trending, and what is driving changes month to month?
How much of our revenue depends on repeat customers?
Where are we leaking revenue through cancellations and returns?
Is product revenue concentrated or diversified?
What does the customer frequency distribution look like?

The goal was to build something that looked and felt like a professional BI tool, but built entirely in Excel to demonstrate that you do not always need Tableau or Power BI to deliver strong analytical outputs.

The Dataset

The Online Retail Dataset from Kaggle. All transactions for a UK-based online retailer between December 2010 and December 2011. Each row is a line item on an invoice, with fields for product, quantity, price, customer ID, and country.

The raw data is messy. Cancellations mixed in with sales, returns coded as negative quantities, missing customer IDs, test transactions, bad debt adjustments, and non-product stock codes like postage and bank charges. It needed substantial cleaning before any analysis could happen.

The Process

1. Data Cleaning & Preparation (Python)

The full notebook is in data-cleaning/data_cleaning_notebook.ipynb and a detailed write-up of every decision is in cleaning-notes.md.

The key principle was flag, don't delete. Instead of dropping cancellations and returns, I created boolean flag columns (is_sale, is_cancelled, is_return, is_financial_adjustment, is_non_product, is_guest) so every row is preserved and the data can be filtered dynamically in Excel. This means I can calculate cancellation rate and clean revenue from the same dataset.

Highlights of the cleaning process:

CamelCase column names converted to snake_case (with a bug fix for acronyms like CustomerID which initially converted to customer_i_d)
Non-breaking space characters stripped from product descriptions
~25% of transactions had no customer ID — flagged as guest rather than deleted, preserving ~15% of revenue
5,268 genuine full-row duplicates identified and removed after manual verification
Transaction classification framework built using description pattern matching and stock code format to separate real product sales from financial adjustments, postage, and admin entries
Three revenue columns engineered: net_revenue, return_value, gross_sale_value

2. Data Modelling (Excel Power Pivot)

The cleaned data was loaded into Excel's Data Model through Power Query as fact_transactions. A small number of transformation steps were applied in Power Query to ensure correct data types (booleans, dates, text for ID columns) and to rename the revenue column to net_revenue.

Multiple pivot tables were built on a KPIs sheet, each filtered to different slices:

Monthly revenue breakdown with calculated columns for AOV, Revenue per Customer, Orders per Customer, and MoM % changes
Product revenue pivot showing the top 10 products by net revenue
Customer cohort pivot breaking customers into Repeat vs One-Time segments
Cancellation and return rate calculations from the flag columns

A CUBEVALUE reference table was added to provide slicer-proof monthly data. This reads directly from the Data Model, completely independent of the pivot tables, so charts and subtitle formulas always show the full 12-month trend regardless of which month the slicer has selected.

3. Dashboard 1 — Executive Commercial Overview

The interactive dashboard. Key technical details:

KPI Cards are rounded rectangle shapes with text boxes layered on top. Labels are typed in statically. Values are text boxes linked to GETPIVOTDATA helper cells that return the Grand Total of the filtered pivot — when the slicer selects a month, the Grand Total becomes that month's value.

Change indicators use Excel's icon set conditional formatting (green up-arrow, red down-arrow) applied to cells holding MoM % values. Separate conditional formatting rules change the font colour to match the icon. Fully automatic, no VBA needed. I used Claude to help work through the conditional formatting logic for dynamically colouring the text alongside the icon sets.

Subtitle formulas like "vs £0.51M prior" use INDEX/MATCH against the CUBEVALUE reference table. The formula detects whether a month is selected and shows "--" when no month is filtered. Working through the CUBEVALUE syntax for the Data Model pivots involved some troubleshooting with Claude, particularly getting the dimension member strings and filter criteria to match correctly.

Charts reference the CUBEVALUE column so they always display all 12 months. The revenue trend is a combo chart (area + line) with gradient fill, a dashed target line, and a highlighted endpoint marker.

The revenue driver equation recalculates dynamically. When a single month is selected, it shows that month's values from the reference table. When no month is selected, it divides the Grand Totals (total revenue / total invoices for AOV, total invoices / total customers for frequency) rather than summing monthly averages, which would be mathematically incorrect.

4. Dashboard 2 — Customer & Product Deep Dive

A simpler static dashboard for the full year, deliberately isolated from Dashboard 1's slicers. It pulls from the Cohort Analysis sheet (no pivot tables, no slicer connections) and the product pivot. Stacked bars, horizontal bar charts, and a histogram — all standard Excel charts formatted with the same dark colour palette.

What I Learned

GETPIVOTDATA with Data Model pivots is essential when pivot tables collapse under slicer filtering. Fixed cell references break because rows physically shift. GETPIVOTDATA finds values by field name regardless of layout.

CUBEVALUE is the answer when you need data completely independent of pivot filters. It reads the Data Model directly, and you can make it dynamic by concatenating cell references into the dimension strings. Getting the syntax right for boolean filter fields (is_sale, is_cancelled) took some experimentation.

You cannot sum averages. When the revenue driver equation shows the full year, AOV and Orders per Customer must be recalculated from the totals (total revenue / total invoices), not summed across 12 monthly values. This is a basic but easy-to-miss analytical principle.

MoM % comparison direction matters. The change indicators compare the current month's MoM % minus the prior month's MoM %. If the current month growth is -13% and last month was +0.5%, the difference is -13.5pp, so the arrow points down. Simply showing last month's +0.5% as a green up-arrow would be misleading.

Separating data from presentation makes everything easier. Keeping the raw data, pivot calculations, and visual dashboard on separate sheets means you can update source data without breaking the layout.

Tools Used

Python	Data cleaning and enrichment (pandas). Full notebook in the repo.
Power Query	Loading the cleaned CSV into Excel's Data Model with correct data types.
Power Pivot	Data Model for distinct counts, CUBEVALUE reference table.
Excel	Pivot tables, GETPIVOTDATA, CUBEVALUE, INDEX/MATCH, conditional formatting, shapes, charts.

No VBA or macros in the final dashboards. Everything runs on native Excel formulas and features.

Repository Contents

├── README.md
├── Dashboards.xlsx
├── screenshots/
│   ├── dashboard1-full-year.png
│   ├── dashboard1-filtered-march.png
│   ├── dashboard2-customer-product.png
│   ├── power-query-columns-left.png
│   ├── power-query-columns-right.png
│   └── dashboard-demo.gif
├── data-cleaning/
│   ├── data_cleaning_notebook.ipynb
│   └── cleaning-notes.md
├── data/
│   └── cleaned_data_sample.csv
└── docs/
    ├── data-dictionary.md
    └── methodology.md

Note: The Dashboards Excel workbook is included with the raw data removed to reduce file size. It contains all pivot tables, KPI calculations, the CUBEVALUE reference table, the Cohort Analysis sheet, and both dashboards with their shapes, charts, and formulas intact. The raw dataset is freely available on Kaggle and can be reloaded through the existing Power Query connection. A sample of the cleaned data from the Python output has also been added.

About Me

I am a Junior Analyst based in Nottingham, transitioning into data analytics after seven years as a self-employed personal trainer. I hold a Level 3 Diploma in Data Analytics with Power BI and DataCamp certifications, and I have been building self-directed projects in SQL, Power BI, and Excel to develop practical skills. I am ready to apply these skills in a work environment.

This project represents the kind of work I want to do professionally: taking messy real-world data, cleaning it properly, modelling it for analysis, and presenting it in a format that decision-makers can actually use.

Connect with me on LinkedIn

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

E-Commerce Analytics

Executive Dashboards in Excel · Data Cleaning in Python

Key Insights

Dashboards

Dashboard 1 — Executive Commercial Overview

Dashboard 2 — Customer & Product Deep Dive

The Problem

The Dataset

The Process

1. Data Cleaning & Preparation (Python)

2. Data Modelling (Excel Power Pivot)

3. Dashboard 1 — Executive Commercial Overview

4. Dashboard 2 — Customer & Product Deep Dive

What I Learned

Tools Used

Repository Contents

About Me

About

Uh oh!

Releases 1

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
data-cleaning		data-cleaning
data		data
docs		docs
screenshots		screenshots
Dashboards.xlsx		Dashboards.xlsx
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

E-Commerce Analytics

Executive Dashboards in Excel · Data Cleaning in Python

Key Insights

Dashboards

Dashboard 1 — Executive Commercial Overview

Dashboard 2 — Customer & Product Deep Dive

The Problem

The Dataset

The Process

1. Data Cleaning & Preparation (Python)

2. Data Modelling (Excel Power Pivot)

3. Dashboard 1 — Executive Commercial Overview

4. Dashboard 2 — Customer & Product Deep Dive

What I Learned

Tools Used

Repository Contents

About Me

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Contributors

Uh oh!

Languages