
Risk modelling using tree-based models

Predicting whether a patient will die within 10 years

In this project, patients' medical data (age, blood pressure, sedimentation rate, race, etc.) are analyzed, visualized, and used to build decision tree (DT) and random forest (RF) models. Several techniques for dealing with missing values are compared: complete case analysis, mean imputation, and iterative imputation. All model variations are evaluated using the C-index.

keywords: risk-model, DT, RF, imputation, c-index, SHAP
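To illustrate the evaluation metric, here is a minimal sketch of the concordance index (C-index) for a binary outcome. It assumes 1 means the patient died within 10 years and that a higher score means higher predicted risk. The function name `c_index` and the toy data are hypothetical, not taken from the project's code. Only pairs of patients with different outcomes are compared: a pair is concordant when the patient who died has the higher risk score, and tied scores count as half.

```python
def c_index(y_true, scores):
    """Concordance index for a binary outcome (1 = event, e.g. death within 10 years)."""
    concordant = 0.0
    permissible = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            # Pairs with the same outcome carry no ranking information
            if y_true[i] == y_true[j]:
                continue
            permissible += 1
            # Identify which patient in the pair had the event
            event, no_event = (i, j) if y_true[i] == 1 else (j, i)
            if scores[event] > scores[no_event]:
                concordant += 1.0  # model ranked the pair correctly
            elif scores[event] == scores[no_event]:
                concordant += 0.5  # tied risk scores count as half
    if permissible == 0:
        raise ValueError("need at least one pair with different outcomes")
    return concordant / permissible


# Hypothetical toy example: 3 of the 4 informative pairs are ranked correctly
print(c_index([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))  # 0.75
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which makes it a convenient single number for comparing the DT and RF variants.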


Democracy, unemployment, and other things

Working with Gapminder datasets to draw conclusions

Gapminder is a great global project that aims to gather datasets on a wide range of topics. In this project, I use the unemployment rate, democracy index, and internet users datasets to answer questions and understand trends.

keywords: pandas, numpy, matplotlib, ILO, Gapminder

Simpson's paradox

When numbers lie!

In this project, Simpson's paradox is explored. It is a phenomenon in probability and statistics in which a trend appears in several different groups of data but disappears or reverses when these groups are combined. The dataset used is a university admission dataset in which female vs. male admission rates are compared at different levels (overall rate vs. major-specific rate), and the results change from one level to the other.
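To show how the reversal can happen, here is a minimal sketch using made-up admission counts for two hypothetical majors, "A" and "B". These numbers are purely illustrative and are not the project's dataset. Women have the higher admission rate within each major, but most women apply to the harder major, so their overall rate is lower.

```python
# Hypothetical counts (NOT the project's dataset): (admitted, applicants)
data = {
    "A": {"male": (80, 100), "female": (18, 20)},   # easy major, mostly male applicants
    "B": {"male": (4, 20), "female": (30, 100)},    # hard major, mostly female applicants
}


def per_major_rates(data):
    """Admission rate for each sex within each major."""
    return {
        major: {sex: admitted / applicants for sex, (admitted, applicants) in groups.items()}
        for major, groups in data.items()
    }


def overall_rates(data):
    """Admission rate for each sex after pooling all majors together."""
    totals = {}
    for groups in data.values():
        for sex, (admitted, applicants) in groups.items():
            a, n = totals.get(sex, (0, 0))
            totals[sex] = (a + admitted, n + applicants)
    return {sex: a / n for sex, (a, n) in totals.items()}


print(per_major_rates(data))  # women higher in A (0.9 vs 0.8) and in B (0.3 vs 0.2)
print(overall_rates(data))    # but lower overall: 48/120 = 0.4 vs 84/120 = 0.7
```

The pooled rate is a weighted average of the per-major rates, weighted by where each group applies, which is why the comparison can flip when the groups are combined.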