In this project, patients' medical data like age, blood pressure, sedimentation rate, race, etc.. are analyzed, visualized, and used to build a model using DT and RF using different techniques to deal with missing values like complete analysis and mean & iterative imputation. all models variations are evaluated using c-index
keywords: risk-model, DT, RF, imputation, c-index, SHAP
Gapminder is a great global project that aims to gather datasets on literally everything. In this project, I use datasets unemployment rate, democracy index, and internet users numbers to answer questions and understand trends.
keywords: pandas, numpy, matplotlib, ILO, Gapminder
In this project, simpson's paradox is explored. It is a phenomenon in probability and statistics, in which a trend appears in several different groups of data but disappears or reverses when these groups are combined. The dataset used is a university admission dataset where female vs males admission rates are explored on different levels (overall rate vs major specific rate) and results change from one level to another