Welcome To My

Portfolio Website

www.shafayatsaad.vercel

Predictive Modeling for Hospital Readmission Risk Among Cardiovascular Patients

This project presents a data-driven framework for predicting six-month hospital readmission risk among cardiovascular patients, with a focus on the data scarcity typical of low-resource environments. It documents a complete machine learning workflow, beginning with the manual collection of a novel dataset of 3,867 patients that integrates demographic, clinical, and behavioral variables. Several predictive models are developed, compared, and validated, including gradient boosting (XGBoost), deep neural networks (TabNet), and stacked ensemble classifiers.

Clinical interpretability and transparency are emphasized throughout: the Explainable AI (XAI) tools SHAP and LIME are used to identify the key factors driving readmission risk. The result is a replicable, interpretable risk stratification framework that segments patients into low-, medium-, and high-risk groups, helping healthcare providers identify high-risk individuals and intervene early.
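The grouping of patients into low, medium, and high risk described above can be illustrated with a minimal sketch. It assumes a model has already produced a readmission probability per patient. The cut-offs `LOW_MAX` and `HIGH_MIN` and the function names are hypothetical, chosen only for illustration; the project does not state its actual thresholds.

```python
from collections import Counter

# Hypothetical cut-offs (assumptions, not values taken from the project).
LOW_MAX = 0.30   # probabilities below this are treated as low risk
HIGH_MIN = 0.60  # probabilities at or above this are treated as high risk


def risk_tier(probability, low_max=LOW_MAX, high_min=HIGH_MIN):
    """Map one predicted readmission probability to a risk tier label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability < low_max:
        return "low"
    if probability < high_min:
        return "medium"
    return "high"


def stratify(probabilities, low_max=LOW_MAX, high_min=HIGH_MIN):
    """Assign a risk tier to every probability in an iterable."""
    return [risk_tier(p, low_max, high_min) for p in probabilities]


# Toy scores, invented for this example only.
scores = [0.05, 0.30, 0.45, 0.60, 0.92]
tiers = stratify(scores)
tier_counts = Counter(tiers)  # how many patients fall into each tier
```

In practice the thresholds would likely be chosen with clinicians, for example by weighing the cost of a missed readmission against the capacity available for follow-up.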

Technologies Used

XGBoost (eXtreme Gradient Boosting)
TabNet (Attentive Interpretable Tabular Learning)
Stacked Ensemble Classifiers
SHAP (SHapley Additive exPlanations)
LIME (Local Interpretable Model-Agnostic Explanations)
SMOTE (Synthetic Minority Oversampling Technique)
MICE (Multivariate Imputation by Chained Equations)
KNN (K-Nearest Neighbors) Imputation
PCA (Principal Component Analysis)
Logistic Regression
Random Forest

Key Features

  • Based on a novel, manually collected dataset of 3,867 cardiovascular patients, integrating demographic, clinical, and behavioral data.
  • Implements a robust data preprocessing pipeline, including MICE-inspired KNN imputation for missing values and SMOTE to address class imbalance.
  • Develops and compares multiple machine learning models, including XGBoost, TabNet, and advanced stacking ensembles.
  • Employs advanced feature engineering techniques, such as Principal Component Analysis (PCA) and the generation of interaction features.
  • Integrates Explainable AI (XAI) tools (SHAP and LIME) to provide both global (feature importance) and local (patient-level) model interpretability.
  • Provides a risk stratification framework that segments patients into low-, medium-, and high-risk groups to support clinical decision-making.
  • Includes analysis of model fairness and generalizability, with evaluations of performance across different patient subgroups.
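
One simple way to approach the subgroup evaluation mentioned in the last item is to compute a metric separately for each patient subgroup and compare the results. The sketch below does this for recall (the share of actual readmissions the model catches). It is an illustrative assumption, not the project's actual fairness analysis: the function names, the subgroup labels, and the toy data are all hypothetical.

```python
from collections import defaultdict


def recall_by_group(y_true, y_pred, groups):
    """Return {group: recall}, where recall = TP / (TP + FN) for class 1.

    A group with no positive cases gets None, since recall is undefined.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            if pred == 1:
                true_pos[group] += 1
            else:
                false_neg[group] += 1
    recalls = {}
    for group in set(groups):
        positives = true_pos[group] + false_neg[group]
        recalls[group] = true_pos[group] / positives if positives else None
    return recalls


def recall_gap(recalls):
    """Largest difference in recall between any two groups with defined recall."""
    values = [r for r in recalls.values() if r is not None]
    return max(values) - min(values) if values else 0.0


# Toy labels, predictions, and subgroup tags, invented for this example only.
y_true = [1, 1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
groups = ["female", "female", "female", "male", "male", "male", "male", "female"]

recalls = recall_by_group(y_true, y_pred, groups)
gap = recall_gap(recalls)
```

A large gap would suggest the model misses readmissions more often in one subgroup than another, which is worth checking before relying on it for clinical decisions.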