Retail Financial Analytics Pipeline (Python & Parquet)
Project Type
Personal project
Date
Mar 2025
Location
Hosted on GitHub
Role
Data Analyst
Client Industry
Retail
Tech Stack
Python 3.10+
Parquet
pandas
matplotlib
seaborn
numpy
The Client
This is a personal project that I built to practice my Python and data analysis skills. The dataset is modeled on a real retail dataset but has been fully synthesized for this project. All data points are fictional and do not represent any real company. The source CSV files were combined into a Parquet file to accommodate the data volume.
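The CSV-to-Parquet step might look roughly like the sketch below. The function names, the directory layout, and the assumption that all CSV files share the same columns are illustrative and not taken from the repository.

```python
from pathlib import Path

import pandas as pd


def combine_csvs(csv_dir, pattern="*.csv"):
    """Read every CSV matching `pattern` in `csv_dir` and stack the rows.

    Assumes (hypothetically) that all files share the same column layout.
    """
    paths = sorted(Path(csv_dir).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {csv_dir}")
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


def write_parquet(df, out_path):
    """Write the combined frame to a single Parquet file.

    pandas needs a Parquet engine such as pyarrow or fastparquet installed.
    """
    df.to_parquet(out_path, index=False)
```

A typical call would be `write_parquet(combine_csvs("data/raw"), "data/gl.parquet")`, where both paths are placeholders.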
The Challenge
This repository contains Python scripts for analyzing retail general ledger (GL) and financial data. The analysis covers data cleaning, aggregation, and visualization of top/bottom GL codes and brands, profit metrics (Gross Profit, EBIT, Net Profit), and key financial ratios (ROA, ROE, GP Margin).
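As a rough illustration of the profit metrics and ratios named above, the sketch below uses common textbook definitions. The class, its field names, and the exact formulas are assumptions; the project may map GL codes to these totals differently.

```python
from dataclasses import dataclass


@dataclass
class PeriodTotals:
    """Hypothetical totals for one reporting period (illustrative fields)."""
    sales: float
    cogs: float
    operating_expenses: float
    interest: float
    tax: float
    total_assets: float
    equity: float


def _ratio(numerator, denominator):
    # Return NaN instead of raising when the denominator is zero.
    return numerator / denominator if denominator else float("nan")


def profit_metrics(t: PeriodTotals) -> dict:
    """Compute profit figures and ratios using textbook definitions."""
    gross_profit = t.sales - t.cogs
    ebit = gross_profit - t.operating_expenses
    net_profit = ebit - t.interest - t.tax
    return {
        "gross_profit": gross_profit,
        "ebit": ebit,
        "net_profit": net_profit,
        "gp_margin": _ratio(gross_profit, t.sales),
        "roa": _ratio(net_profit, t.total_assets),
        "roe": _ratio(net_profit, t.equity),
    }
```

Returning NaN for a zero denominator is a design choice made for this sketch so that a period with no sales or equity does not stop a batch calculation.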
The Solution
Data cleaning and preprocessing of GL transactions
Top 10 GL codes by post amount and monthly trends
Gross Profit, EBIT, Net Profit analysis over time
Profit analysis by brand
Key financial ratios: ROA, ROE, GP Margin
Correlation analysis: Sales vs COGS, Marketing Spend vs Sales
Visualizations: bar charts, line charts, stacked area charts, lollipop charts, scatter plots with regression
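Several of the steps listed above (top GL codes, monthly trends, and correlation) can be sketched with pandas as follows. The column names `gl_code`, `post_date`, and `post_amount` are hypothetical, and using Pearson correlation is an assumption; the repository's scripts may be structured differently.

```python
import pandas as pd


def top_gl_codes(gl: pd.DataFrame, n: int = 10) -> pd.Series:
    """Total post amount per GL code, largest first, limited to n codes."""
    return gl.groupby("gl_code")["post_amount"].sum().nlargest(n)


def monthly_trend(gl: pd.DataFrame, codes) -> pd.DataFrame:
    """Month-by-code table of summed post amounts for the given codes."""
    subset = gl[gl["gl_code"].isin(codes)]
    # post_date must already be a datetime column.
    month = subset["post_date"].dt.to_period("M")
    totals = subset.groupby([month, "gl_code"])["post_amount"].sum()
    return totals.unstack(fill_value=0.0)


def pearson(df: pd.DataFrame, x: str, y: str) -> float:
    """Pearson correlation between two numeric columns."""
    return float(df[x].corr(df[y]))


# Tiny synthetic example (values invented for illustration only).
sample = pd.DataFrame({
    "gl_code": ["4000", "4000", "5000", "5000", "6100"],
    "post_date": pd.to_datetime(
        ["2025-01-15", "2025-02-10", "2025-01-20", "2025-02-05", "2025-01-31"]
    ),
    "post_amount": [1200.0, 1500.0, 700.0, 900.0, 300.0],
})
top = top_gl_codes(sample, n=2)
trend = monthly_trend(sample, top.index)
```

The `trend` table has one row per month and one column per GL code, so `trend.plot()` or `trend.plot.area()` would give a line or stacked area chart if matplotlib is installed.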
Click the GitHub image on the right to access the full project repository, which includes the dataset, Python scripts, Jupyter notebook, and README files.