How Data Preprocessing Impacts Machine Learning Models in Clinical Prediction

DEV Community·Carlos Peñalver Pérez·19 days ago

#python #datascience #machinelearning #dataset #clinical #class

One of the ideas I wanted to explore in this project was simple: how much does data preprocessing really affect the performance of Machine Learning models?

In clinical prediction problems, this question becomes especially relevant. A model may achieve good overall accuracy, but still fail to detect the most important cases: patients at risk. For that reason, I wanted to focus not only on accuracy, but also on metrics such as recall, F1-score and the behaviour of the model on minority classes.

The datasets

For this project, I worked with three public clinical datasets:

- Diabetes Dataset: used to predict diabetes from variables such as glucose, blood pressure, insulin, BMI and age.
- Healthcare Stroke Dataset: focused on predicting stroke risk using demographic, clinical and lifestyle-related variables.
- Thyroid Disease Dataset: related to thyroid disease detection using clinical, hormonal and categorical features.

Each dataset presented different challenges.…
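
The earlier point that good overall accuracy can hide missed at-risk patients can be illustrated with a small sketch. The labels below are synthetic and hypothetical (they come from none of the three datasets), and the "model" is a deliberately degenerate one that always predicts the majority class. The helper names (`confusion_counts`, `accuracy`, `recall`, `f1`) are introduced here only for illustration.

```python
# Hypothetical, synthetic labels: 95 healthy patients (0) and 5 at-risk patients (1).
y_true = [0] * 95 + [1] * 5
# A degenerate "model" that always predicts the majority class (healthy).
y_pred = [0] * 100


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def recall(y_true, y_pred):
    """Fraction of truly positive cases that the model detected."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1(y_true, y_pred):
    """Harmonic mean of precision and recall (0.0 when undefined)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.95
print(f"recall   = {recall(y_true, y_pred):.2f}")    # recall   = 0.00
print(f"f1       = {f1(y_true, y_pred):.2f}")        # f1       = 0.00
```

With a 95/5 class split, this model scores 95% accuracy while detecting none of the at-risk patients, which is why recall and F1 on the minority class are worth reporting alongside accuracy.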
