I thought this was going to be easy. Take data from TikTok, Instagram, YouTube, Reddit, LinkedIn, Threads, X, and Facebook. Map it into one clean shape. Ship it. Move on.

That lasted about a day.

By day two, I had already learned the first ugly truth of social data engineering: the hard part is not collecting the data. The hard part is deciding what the same thing even means across platforms.

"Likes" are not always likes. "Views" are not always views. Some platforms expose shares publicly, some do not. One platform gives you `createTime`, another gives you ISO timestamps, another gives you nested objects with three different possible IDs depending on the endpoint.

If you're building a social dashboard, creator analytics tool, moderation workflow, or competitor monitoring system, you hit this wall fast.

So this post is the version I wish I had read earlier: what broke, what schema I kept, what I stopped trying to normalize, and how I now build a social media JSON schema without lying to myself.…
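To make the timestamp and ID mismatch concrete, here is a minimal Python sketch of the kind of coercion it forces on you. Everything named here is an assumption for illustration, not any platform's documented shape: the helper names `to_utc_datetime` and `first_present`, the candidate field paths, the guess that a numeric `createTime`-style value is a Unix epoch, the cutoff used to tell milliseconds from seconds, and the choice to treat an ISO string with no offset as UTC.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Sequence, Union


def to_utc_datetime(raw: Union[int, float, str]) -> datetime:
    """Coerce an epoch number or an ISO 8601 string into an aware UTC datetime."""
    if isinstance(raw, bool):
        raise TypeError("bool is not a timestamp")
    if isinstance(raw, str):
        text = raw.strip()
        try:
            raw = float(text)  # numeric strings are treated as epoch values
        except ValueError:
            # Older Python versions reject a trailing "Z", so rewrite it.
            if text.endswith(("Z", "z")):
                text = text[:-1] + "+00:00"
            parsed = datetime.fromisoformat(text)
            if parsed.tzinfo is None:
                # ASSUMPTION: a string with no offset is already UTC.
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed.astimezone(timezone.utc)
    seconds = float(raw)
    # ASSUMPTION: values above 1e11 are milliseconds, not seconds.
    if seconds > 1e11:
        seconds /= 1000.0
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def first_present(payload: Dict[str, Any], paths: Sequence[str]) -> Optional[str]:
    """Return the first non-empty value found at any dotted path, as a string."""
    for path in paths:
        node: Any = payload
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if node not in (None, ""):
            return str(node)
    return None


# Hypothetical payloads: the field names are made up for the example.
post_a = {"id": "abc123", "createTime": 1700000000}
post_b = {"video": {"id": 42}, "published_at": "2023-11-14T22:13:20Z"}

ID_PATHS = ["id", "video.id", "data.post_id"]  # hypothetical candidates

for post, ts_key in ((post_a, "createTime"), (post_b, "published_at")):
    print(first_present(post, ID_PATHS), to_utc_datetime(post[ts_key]).isoformat())
```

The point of the sketch is that the guesses live in one place and are written down. Once a heuristic like the milliseconds cutoff is spread across per-platform mappers, nobody remembers which fields were inferred and which were actually provided.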