The "Robust" Data Scientist: Winning with Messy Data and Pingouin - KDnuggets

KDnuggets·https://www.facebook.com/kdnuggets·about 1 month ago

# Introduction

A harsh truth to begin with: textbook data science often falls apart in the real world. Concepts and techniques are taught on carefully curated, beautifully bell-curved variables, but as soon as we venture into real projects, we are hit with outliers, heavily skewed distributions, and stubbornly unequal variances. A previous article on building an exploratory data analysis (EDA) pipeline with Pingouin showed how to use statistical tests to detect when data violate assumptions such as homoscedasticity and normality. But what happens when the data fail those tests? Throwing the data away is not the solution; switching to robust methods is. This article explores the craft of applying robust statistics in data science workflows. These are mathematical methods built specifically to yield reliable, valid results even when the data do not meet classical assumptions or are riddled with outliers and noise.…
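To make "robust" concrete, here is a minimal sketch that uses only Python's standard library rather than Pingouin, so it illustrates the general idea and not Pingouin's API. It compares classical estimators (mean and standard deviation) with robust counterparts (median and median absolute deviation, MAD) on a small sample before and after injecting a single outlier. The data values, the `mad` helper, and its 1.4826 scale factor (which makes the MAD comparable to the standard deviation for normally distributed data) are illustrative assumptions, not taken from the article.

```python
import statistics


def mad(values, scale=1.4826):
    """Median absolute deviation (MAD) around the median.

    The default scale factor makes the MAD a consistent estimate of the
    standard deviation when the data are normally distributed.
    """
    center = statistics.median(values)
    return scale * statistics.median([abs(v - center) for v in values])


# Hypothetical, roughly bell-shaped measurements.
clean = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.0]
# The same sample with one gross outlier (e.g., a data-entry error).
messy = clean + [95.0]

for label, data in (("clean", clean), ("messy", messy)):
    print(
        f"{label}: mean={statistics.mean(data):.2f}, "
        f"median={statistics.median(data):.2f}, "
        f"stdev={statistics.stdev(data):.2f}, "
        f"MAD={mad(data):.2f}"
    )
```

With the single value of 95.0 added, the mean jumps from 10.0 to roughly 19.4 and the standard deviation grows from about 0.2 to more than 25, while the median stays at 10.0 and the MAD only moves from about 0.22 to about 0.30. That resistance to a small amount of contamination is what makes an estimator robust.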