Building Modern EDA Pipelines with Pingouin - KDnuggets


KDnuggets·https://www.facebook.com/kdnuggets·26 days ago

#datascience #ai #careeradvice #computervision #languagemodels #normality


# Introduction

Anyone who has spent enough time doing data science eventually learns the golden rule of downstream machine learning modeling: garbage in, garbage out (GIGO). For example, feeding a linear regression model highly collinear predictors, or running ANOVA on groups with unequal variances (heteroscedasticity), is the perfect recipe for unstable estimates and results that cannot be trusted. Exploratory data analysis (EDA) often leans on visualizations such as scatter plots and histograms, but these are not sufficient when the data must be rigorously validated against the mathematical assumptions of downstream analyses or models. Pingouin helps by bridging the gap between two well-known libraries in data science and statistics: SciPy and pandas. It can also be a great ally for building solid, automated EDA pipelines. This article teaches you how to build a holistic pipeline for rigorous, statistical EDA, validating several important data properties.…
