How I Built a Masking Tool Without Showing AI Any Real Data: Column-wise Shuffling as the Scaffold

DEV Community·J.S_Falcon·23 days ago
#phase #ai #python #privacy #column #batch

TL;DR

I never write code and never send real data to LLMs, yet I built a complete data-masking tool through AI collaboration. The technique: column-wise independent shuffling (Japan PPC's official anonymization method) plus Faker replacement.

Four phases: send the column names, run the shuffling batch, manually craft a sample CSV, then send the sample for the Faker batch and a structural review.

Key discipline: restate your naive ideas in established industry terminology before having the AI implement them. That alone compresses the code 10x.

The output is a tool I trigger by double-click. I never read the Python.

1. The "Can't Send to LLM" Wall

Across my field notes, I've kept saying the same things: "Don't send business data to LLMs." "Only sanitized samples go to AI." But how exactly do I sanitize the data? That methodology has never been spelled out. So here it is: a self-asked, self-answered post.

I wanted to build a new masking tool.…
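To make the column-wise independent shuffling named in the TL;DR concrete, here is a minimal sketch using only the Python standard library. It is not the author's tool: the function name `shuffle_columns`, the `seed` parameter, and the sample rows are all hypothetical, and the Faker replacement step is left out. The idea is that every column is permuted on its own, so each column keeps its exact set of values while the link between values within a row is broken.

```python
import random


def shuffle_columns(rows, seed=None):
    """Shuffle every column independently of the others.

    `rows` is a list of dicts sharing the same keys (hypothetical layout).
    Each column keeps exactly the same values, but in a new order, so the
    combinations within a row no longer reflect the original records.
    A row can still end up unchanged by chance, especially on tiny inputs.
    """
    if not rows:
        return []
    rng = random.Random(seed)  # seed only for reproducible demos
    columns = list(rows[0].keys())
    shuffled = {}
    for col in columns:
        values = [row[col] for row in rows]
        rng.shuffle(values)  # a separate permutation per column
        shuffled[col] = values
    return [{col: shuffled[col][i] for col in columns} for i in range(len(rows))]


# Made-up placeholder rows, not real data.
rows = [
    {"name": "Alice", "dept": "Sales", "salary": "100"},
    {"name": "Bob", "dept": "IT", "salary": "200"},
    {"name": "Carol", "dept": "HR", "salary": "300"},
    {"name": "Dave", "dept": "Legal", "salary": "400"},
]
masked = shuffle_columns(rows, seed=42)
```

Shuffling alone still leaves real values in the output, just detached from their rows, which is presumably why the post pairs it with Faker replacement for identifying columns.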
