
How I Collected 47,000 Chemical Substances From a Korean Government API

DEV Community·Tagg·29 days ago

I built an API for Korean chemical substance regulations. I wrote about the why and the architecture in a previous post. This post is about how I actually collected the data.

The source is data.go.kr, Korea's public data portal: the equivalent of data.gov, but with Korean-language documentation and some quirks that took weeks to work through.

The target dataset: every chemical substance registered under K-REACH (Korea's chemical regulation framework). About 47,000 substances with regulatory classifications, CAS numbers, Korean/English names, and GHS hazard data.

There is no bulk download. The API only supports search queries. You send a search term and get back matching results, paginated at 100 per page. To get everything, I had to figure out a search strategy that would cover the entire database without missing substances and without burning through the daily API call limit.

The search problem

The API has one useful parameter: searchGubun. Set it to 1 and you can search by substance name.…
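To make the constraints concrete, here is a minimal Python sketch of paging through the results for one search term while tracking a daily call budget. Only searchGubun=1 and the 100-per-page size come from the post. The other parameter names (searchNm, pageNo, numOfRows), the fetch_page callable, and the budget dictionary are assumptions for illustration, not the API's documented interface.

```python
from typing import Callable, Dict, Iterator, List

PAGE_SIZE = 100  # from the post: results are paginated at 100 per page

# Hypothetical fetcher: takes query parameters, returns one page of rows.
# In practice this would wrap an HTTP call to the data.go.kr endpoint.
FetchPage = Callable[[Dict[str, str]], List[dict]]


class CallBudgetExceeded(RuntimeError):
    """Raised when the (assumed) daily call budget is used up."""


def iter_matches(
    term: str, fetch_page: FetchPage, budget: Dict[str, int]
) -> Iterator[dict]:
    """Yield every row matching `term`, one API call per page."""
    page = 1
    while True:
        if budget["remaining"] <= 0:
            raise CallBudgetExceeded(f"stopped at term={term!r}, page={page}")
        budget["remaining"] -= 1
        rows = fetch_page({
            "searchGubun": "1",           # from the post: search by name
            "searchNm": term,             # assumed parameter name
            "pageNo": str(page),          # assumed parameter name
            "numOfRows": str(PAGE_SIZE),  # assumed parameter name
        })
        yield from rows
        if len(rows) < PAGE_SIZE:
            return  # a short (or empty) page means this was the last one
        page += 1
```

Passing the fetcher in as a function keeps the paging logic testable without network access, and raising on an exhausted budget lets a collector record the term and page it stopped at and resume the next day.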
