How I Built a Masking Tool Without Showing AI Any Real Data: Column-wise Shuffling as the Scaffold

DEV Community·J.S_Falcon·23 days ago

#phase #ai #python #privacy #column #batch

TL;DR

I never write code or send real data to LLMs, but I built a complete data-masking tool through AI collaboration. The technique: column-wise independent shuffling (Japan PPC's official anonymization method) plus Faker replacement.

Four phases: send column names → run shuffling batch → manually craft sample CSV → send sample for Faker batch + structural review.

Key discipline: survey naive ideas in industry terminology before having AI implement. That alone compresses code 10x. The output is a tool I trigger by double-click. I never read the Python.

1. The "Can't Send to LLM" Wall

Across my field notes, I've kept saying the same things: "Don't send business data to LLMs." "Only sanitized samples go to AI." But how exactly do I sanitize the data? That methodology has never been spelled out. So here it is: a self-asked, self-answered post. I wanted to build a new masking tool.…
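
As a rough illustration of the column-wise independent shuffling named in the TL;DR, here is a minimal sketch using only the Python standard library. It is not the author's tool: the function name `shuffle_columns`, the sample rows, and the column names are all hypothetical, and the Faker replacement step is left out. Each listed column is permuted on its own, so the set of values in a column is preserved while the link between values in the same row is broken.

```python
import random


def shuffle_columns(rows, columns, seed=None):
    """Return new rows in which each listed column is permuted independently.

    Hypothetical helper for illustration; `rows` is a list of dicts.
    """
    rng = random.Random(seed)
    shuffled = [dict(row) for row in rows]  # copy so the input stays untouched
    for col in columns:
        values = [row[col] for row in rows]
        rng.shuffle(values)  # a separate permutation for every column
        for row, value in zip(shuffled, values):
            row[col] = value
    return shuffled


# Made-up sample rows, not real data.
rows = [
    {"name": "Aoki", "dept": "Sales", "salary": 410},
    {"name": "Baba", "dept": "Legal", "salary": 520},
    {"name": "Chiba", "dept": "Sales", "salary": 380},
    {"name": "Doi", "dept": "IT", "salary": 600},
]

masked = shuffle_columns(rows, ["name", "dept", "salary"], seed=0)
for row in masked:
    print(row)
```

Shuffling alone leaves every original value in the output, only in a different row, which is presumably why the TL;DR pairs it with Faker replacement for columns that identify a person on their own.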
