Need feedback on Two-stage ML approach for detecting and correcting mislabeled entity relationships (meters ↔ transformers)

Reddit r/datascience·u/Zestyclose_Candy6313·about 1 month ago

Hey everyone, I am working on a real-world data quality problem and would appreciate feedback on my modeling approach.

Context: I have a dataset of meters and their associated transformers (utility infrastructure). Some of these associations are incorrect, and the goal is to both detect and correct them.

Training data: I'm using ~20,000 manually reviewed meter–transformer associations:

- Correct association → label = 1
- Incorrect association → label = 0

For incorrect cases, I also augment the data with the correct transformer, e.g.:

- Meter1 | Trans1 | 0 (incorrect)
- Meter1 | Trans2 | 1 (corrected)
- Meter2 | Trans3 | 1 (correct)

Current baseline: I started with a logistic regression model (class_weight="balanced" due to ~37% incorrect vs 63% correct). Using a 0.20 threshold gives strong true negative performance (~98%), but only moderate recall.…
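To make the threshold trade-off in the baseline concrete, here is a minimal sketch of how a fixed cutoff turns scores into flags and how the resulting rates can be read. It rests on assumptions the truncated post does not confirm: that the 0.20 cutoff is applied to the predicted probability of label 1 (correct), and that "true negative performance" could mean either specificity or negative predictive value, so both are reported. The function name `threshold_report` and the toy labels and scores are hypothetical, not taken from the poster's data.

```python
from typing import Dict, Sequence


def threshold_report(
    y_true: Sequence[int],
    p_correct: Sequence[float],
    threshold: float = 0.20,
) -> Dict[str, float]:
    """Confusion counts and rates for a fixed probability cutoff.

    Label convention from the post: 1 = correct association, 0 = incorrect.
    Assumption: a pair is predicted correct when p_correct >= threshold,
    and flagged as incorrect otherwise.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_correct):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1  # incorrect pair that slipped through unflagged
        elif pred == 0 and y == 0:
            tn += 1  # incorrect pair that was flagged
        else:
            fn += 1  # correct pair that was wrongly flagged

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "tp": tp,
        "fp": fp,
        "tn": tn,
        "fn": fn,
        # Share of truly incorrect pairs that get flagged
        # (equivalently, recall of the incorrect class).
        "specificity": ratio(tn, tn + fp),
        # Share of flagged pairs that are truly incorrect.
        "npv": ratio(tn, tn + fn),
        # Share of truly correct pairs that are kept as correct.
        "recall_correct": ratio(tp, tp + fn),
    }


# Toy data, purely illustrative.
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
p_correct = [0.95, 0.80, 0.60, 0.35, 0.15, 0.10, 0.05, 0.40]
report = threshold_report(y_true, p_correct, threshold=0.20)
print(report)
```

Reporting specificity and negative predictive value side by side makes it clear which of the two a figure like ~98% refers to, since a low cutoff can push one up while leaving the other moderate.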
