In this project I used a grocery database to answer five business questions. By using SQL to clean, change, and output tables from the database I was able to get the accurate and required data samples to answer the questions. After this the tables were exported to Excel where I created charts and graphs to visualize the data.
- Which product category generates the highest total sales across all countries over the months?
- Which 10 United States cities had the highest sales?
- Which Month had the highest sales?
- Which Day had the highest sales?
- What percentage of items sold include discounts?
https://www.kaggle.com/datasets/andrexibiza/grocery-sales-dataset
- queries/ -> written SQL code (queries & schema)
- output tables/ -> table outputs from code
- finished database screenshots/ -> screenshots of the database after all of my changes
- Grocery Dataset Project Charts & Tables -> Excel workbook containing tables, graphs, and charts
SELECT new_Products.Category, SUM(sales.Quantity * (new_Products.Price * (1 - sales.Discount))) AS total_sales FROM new_Products LEFT JOIN sales ON new_Products.ProductID = sales.ProductID GROUP BY new_Products.Category ORDER BY SUM(sales.Quantity * (new_Products.Price * (1 - sales.Discount))) DESC;
WITH s AS( SELECT new_Products.Category, sales.CustomerID, SUM(sales.Quantity * (new_Products.Price * (1 - sales.Discount))) AS total_sales FROM new_Products LEFT JOIN sales ON new_Products.ProductID = sales.ProductID GROUP BY new_Products.Category, sales.CustomerID ), d AS( SELECT * FROM cities LEFT JOIN customers ON cities.CityID = customers.CityID ) SELECT d.cityName, SUM(s.total_sales) AS sales FROM d LEFT JOIN s ON d.CustomerID = s.CustomerID GROUP BY d.cityName ORDER BY SUM(s.total_sales) DESC LIMIT 10;
- SQL: JOINs, CASE, GROUP BY, and aggregates
- Database: cleaned, created, and altered data
- Excel: conditional formatting, filtering, creating charts, and formulas
- SQL(SQLite)
- DB Browser for SQLite
- Excel