Why the Variable Name Is the Most Important Feature in Secrets Detection

DEV Community·Patience Mpofu·19 days ago

#pattern #machinelearning #variable #name #secrets #feature

Here's a question that sounds trivial until you think about it carefully. Are these two lines of code equally dangerous?

    checksum = "d8e8fca2dc0f896fd7cb4cb0031ba249"
    password = "d8e8fca2dc0f896fd7cb4cb0031ba249"

The string value is identical. The entropy is identical. Every character-level feature is identical. A regex scanner that matches only on the value treats them the same. A pure entropy scanner treats them the same. A human security engineer does not treat them the same, not even slightly. The first is almost certainly a file integrity hash. The second is almost certainly an exposed credential. The only difference is the variable name before the equals sign.

When I trained my secrets detector and examined the feature importances, the variable name risk score came out at 0.28, higher than Shannon entropy, higher than all character distribution features, higher than string length. The single most predictive signal for whether a string is a secret is not the string itself.…
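To make the comparison concrete, here is a minimal Python sketch of the two kinds of features discussed above. It is not the author's detector: the keyword lists, the weights, and the helper names (`shannon_entropy`, `name_risk_score`, `extract_features`) are hypothetical, chosen only to show that value-based features cannot separate the two lines while a name-based feature can.

```python
import math
import re
from collections import Counter

# Hypothetical keyword weights for illustration only; the article does not
# show how its variable name risk score is actually computed.
RISKY_KEYWORDS = {"password": 1.0, "passwd": 1.0, "secret": 0.9, "api_key": 0.9, "token": 0.8}
BENIGN_KEYWORDS = {"checksum", "hash", "digest", "md5"}

# Matches simple assignments of the form: name = "value"
ASSIGNMENT = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"\s*$')


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of the string in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def name_risk_score(name: str) -> float:
    """Score a variable name between 0 and 1 using the assumed keyword lists."""
    lowered = name.lower()
    # Assumption: a benign keyword overrides a risky one, so a name like
    # "password_hash" scores 0.0 in this sketch.
    if any(keyword in lowered for keyword in BENIGN_KEYWORDS):
        return 0.0
    # 0.1 is an arbitrary baseline for names that match no keyword.
    return max((w for k, w in RISKY_KEYWORDS.items() if k in lowered), default=0.1)


def extract_features(line: str) -> dict:
    """Split an assignment into value-based features and a name-based feature."""
    match = ASSIGNMENT.match(line)
    if match is None:
        raise ValueError(f"not a simple string assignment: {line!r}")
    name, value = match.groups()
    return {
        "entropy": shannon_entropy(value),
        "length": len(value),
        "name_risk": name_risk_score(name),
    }


checksum_features = extract_features('checksum = "d8e8fca2dc0f896fd7cb4cb0031ba249"')
password_features = extract_features('password = "d8e8fca2dc0f896fd7cb4cb0031ba249"')
print(checksum_features)
print(password_features)
```

Both dictionaries carry the same `entropy` and `length`, because those depend only on the string value. Only `name_risk` differs, which is the kind of signal a model would need in order to tell the two lines apart.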
