I built an API for Korean chemical substance regulations. I wrote about the why and the architecture in a previous post. This post is about how I actually collected the data.

The source is data.go.kr, Korea's public data portal: the equivalent of data.gov, but with Korean-language documentation and some quirks that took weeks to work through.

The target dataset: every chemical substance registered under K-REACH (Korea's chemical regulation framework). About 47,000 substances with regulatory classifications, CAS numbers, Korean/English names, and GHS hazard data.

There is no bulk download. The API only supports search queries. You send a search term and get back matching results, paginated at 100 per page. To get everything, I had to figure out a search strategy that would cover the entire database without missing substances and without burning through the daily API call limit.

The search problem

The API has one useful parameter: searchGubun. Set it to 1 and you can search by substance name.…
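To make the pagination concrete, here is a minimal sketch of draining one search term page by page. It is not the code behind the API described in this post: `iter_results`, `make_fake_fetcher`, and the `fetch_page(term, page)` callable are hypothetical names introduced for illustration, and the real request (endpoint, service key, and paging parameters alongside `searchGubun`) is deliberately left out. The only facts taken from the post are the 100-results page size and the existence of a daily call limit, modeled here as an assumed `max_calls` cap.

```python
from typing import Callable, Dict, Iterator, List

PAGE_SIZE = 100  # page size stated in the post

# Hypothetical fetcher signature: (search_term, page_number) -> list of records.
# A real fetcher would call the search API with searchGubun=1 and the term.
FetchPage = Callable[[str, int], List[Dict]]


def iter_results(term: str, fetch_page: FetchPage, max_calls: int = 1000) -> Iterator[Dict]:
    """Yield every record for one search term, requesting pages until a short page."""
    page = 1
    calls = 0
    while calls < max_calls:
        rows = fetch_page(term, page)
        calls += 1
        yield from rows
        if len(rows) < PAGE_SIZE:
            # A short or empty page means there is nothing left for this term.
            return
        page += 1
    # Assumed safety cap so one broad term cannot eat the whole daily quota.
    raise RuntimeError(f"gave up on {term!r} after {max_calls} calls")


def make_fake_fetcher(total: int) -> FetchPage:
    """Stand-in for the real API: pretends every term matches `total` records."""
    def fetch(term: str, page: int) -> List[Dict]:
        start = (page - 1) * PAGE_SIZE
        stop = min(start + PAGE_SIZE, total)
        return [{"name": f"{term}-{i}"} for i in range(start, stop)]
    return fetch


if __name__ == "__main__":
    records = list(iter_results("acid", make_fake_fetcher(250)))
    print(len(records))
```

Passing the fetcher in as a callable keeps the paging logic testable without touching the network. Note that a term with an exact multiple of 100 matches costs one extra call, because the loop only stops once it sees a page with fewer than 100 rows.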