I built 'dfxpy' to reduce repetitive Pandas + ML preprocessing workflows

DEV Community·Sayantan Patra·26 days ago

Every data project starts with excitement. Then comes:

- missing values
- duplicate rows
- inconsistent column names
- encoding
- leakage checks
- skew analysis
- outlier handling
- repetitive preprocessing pipelines

After rebuilding the same workflow across notebooks and projects, I decided to create something reusable. So I built dfxpy, an open-source Python package focused on accelerating DataFrame workflows for machine learning, analytics, and research.

What dfxpy does

Automated Cleaning

- smart type inference
- missing value imputation
- duplicate removal
- snake_case normalization
- currency/percentage/date detection
- categorical encoding

ML Preparation

- feature/target splitting
- optional scaling
- target encoding
- date feature extraction
- class balancing

Diagnostics & Research

- leakage detection
- skewness + multicollinearity audits
- statistical profiling
- dataset lineage hashing
- publication-ready LaTeX exports

Workflow Utilities

- reusable transformation pipelines
- dataframe comparison tools
- schema validation
- standalone HTML EDA…
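To make the repetition concrete, here is a minimal sketch of three of the cleaning steps named above (snake_case column names, duplicate removal, missing value imputation) written by hand in plain pandas. It is not dfxpy's API, which this excerpt does not show; the helper names `to_snake_case` and `basic_clean`, the sample data, and the median/mode imputation rule are all assumptions chosen for illustration.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Convert a label such as 'Customer Name' or 'totalSpend' to snake_case."""
    # Collapse runs of non-alphanumeric characters into a single underscore.
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    # Insert an underscore at lower-to-upper camelCase boundaries.
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return name.strip("_").lower()


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical hand-rolled cleaner; illustrative only, not dfxpy code."""
    out = df.copy()
    out.columns = [to_snake_case(str(c)) for c in out.columns]
    out = out.drop_duplicates().reset_index(drop=True)
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Assumed rule: fill numeric gaps with the column median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Assumed rule: fill other gaps with the most frequent value.
            mode = out[col].mode(dropna=True)
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out


raw = pd.DataFrame(
    {
        "Customer Name": ["Ann", "Bob", "Bob", None],
        "totalSpend": [10.0, None, None, 30.0],
    }
)
clean = basic_clean(raw)
print(clean)
```

Even this small version carries decisions (which imputation rule, whether duplicates are dropped before or after imputation) that tend to get re-made in every notebook, which is the kind of repetition a reusable package is meant to absorb.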
