Retail Financial Analytics Pipeline (Python & Parquet)

Project Type

Personal project

Date

Mar 2025

Location

Hosted on GitHub

Role

Data Analyst

Client Industry

Retail

Tech Stack

  • Python 3.10+

  • Parquet

  • pandas

  • matplotlib

  • seaborn

  • numpy

The Client

This is a personal project that I built to exercise my Python and data analysis skills. The dataset was modeled after a real retail dataset but has been fully synthesized for this project. All data points are fictional and do not represent any real company. The source CSV files were combined into a Parquet file to accommodate the data volume.
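As an illustration of the CSV-to-Parquet step mentioned above, here is a minimal sketch using pandas. The function names (combine_csvs, write_parquet) and the idea that all source files sit in one folder with the same columns are assumptions for this example, not details taken from the repository.

```python
from pathlib import Path

import pandas as pd


def combine_csvs(csv_dir: str, pattern: str = "*.csv") -> pd.DataFrame:
    """Read every CSV in csv_dir that matches pattern and stack them row-wise."""
    paths = sorted(Path(csv_dir).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {csv_dir}")
    # Assumption: every source file shares the same column layout.
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)


def write_parquet(df: pd.DataFrame, out_path: str) -> None:
    """Write the combined frame to one Parquet file."""
    # pandas needs a Parquet engine (pyarrow or fastparquet) installed for this call.
    df.to_parquet(out_path, index=False)
```

Typical usage would be write_parquet(combine_csvs("data/raw"), "data/gl.parquet"), where both paths are placeholders. Parquet stores data by column and compresses it, which usually makes a large ledger smaller on disk and faster to reload than the original CSV files.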

The Challenge

This repository contains Python scripts for analyzing retail general ledger (GL) and financial data. The analysis covers data cleaning, aggregation, and visualization of top/bottom GL codes and brands, profit metrics (Gross Profit, EBIT, Net Profit), and key financial ratios (ROA, ROE, GP Margin).
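To make the profit metrics and ratios concrete, the sketch below computes them from a single period's totals using the standard textbook definitions. The PeriodFinancials class, its field names, and the way expenses are grouped are hypothetical; the actual scripts may derive these figures differently from the GL data.

```python
from dataclasses import dataclass


@dataclass
class PeriodFinancials:
    """Hypothetical period totals; field names are assumptions for this sketch."""
    sales: float
    cogs: float
    operating_expenses: float
    interest_and_tax: float
    total_assets: float
    total_equity: float


def profit_metrics(p: PeriodFinancials) -> dict:
    """Return profit levels and key ratios for one period."""
    gross_profit = p.sales - p.cogs                # Sales less cost of goods sold
    ebit = gross_profit - p.operating_expenses     # Earnings before interest and tax
    net_profit = ebit - p.interest_and_tax         # Bottom line after interest and tax
    return {
        "gross_profit": gross_profit,
        "ebit": ebit,
        "net_profit": net_profit,
        "gp_margin": gross_profit / p.sales,       # Gross Profit / Sales
        "roa": net_profit / p.total_assets,        # Net Profit / Total Assets
        "roe": net_profit / p.total_equity,        # Net Profit / Shareholders' Equity
    }
```

The function assumes sales, total assets, and total equity are non-zero; a real pipeline would need to guard against periods where a denominator is zero.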

The Solution

  • Data cleaning and preprocessing of GL transactions

  • Top 10 GL codes by post amount and monthly trends

  • Gross Profit, EBIT, Net Profit analysis over time

  • Profit analysis by brand

  • Key financial ratios: ROA, ROE, GP Margin

  • Correlation analysis: Sales vs COGS, Marketing Spend vs Sales

  • Visualizations: bar charts, line charts, stacked area charts, lollipop charts, scatter plots with regression
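The "top 10 GL codes by post amount and monthly trends" step from the list above could be sketched with pandas roughly as follows. The column names gl_code, post_amount, and post_date are assumptions, and ranking by summed post amount is one possible reading of "top"; the repository may rank differently (for example, by absolute value).

```python
import pandas as pd


def top_gl_codes(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Return the n GL codes with the largest total post amount."""
    totals = df.groupby("gl_code")["post_amount"].sum()
    return totals.nlargest(n)


def monthly_trend(df: pd.DataFrame, codes) -> pd.DataFrame:
    """Build a month-by-GL-code table of summed post amounts for the given codes."""
    subset = df[df["gl_code"].isin(codes)].copy()
    # Bucket each transaction into its calendar month.
    subset["month"] = pd.to_datetime(subset["post_date"]).dt.to_period("M")
    return subset.pivot_table(
        index="month",
        columns="gl_code",
        values="post_amount",
        aggfunc="sum",
        fill_value=0,  # Months with no postings for a code show as 0
    )
```

Calling monthly_trend(df, top_gl_codes(df).index) gives a table with one row per month and one column per top GL code, which can be passed directly to a line or stacked area chart.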

Click the GitHub image on the right to access the full project repository, which includes the dataset, the Python scripts, the Jupyter notebook, and the README files.
