Idea

While reproducing top solutions of a chemistry data competition, I started building a DuckDB community extension for handling chemistry data directly in SQL.

What it can do

- Parse SMILES, InChI, PDB and other chemistry formats directly, with no pandas and no RDKit on the side
- Plug into DuckDB's native CSV/Parquet/Iceberg/S3/HTTP readers, so ingestion + light preprocessing happens in one query

Background

What is chemistry data, anyway? One of the canonical forms is SMILES, a notation that encodes a molecular structure as a plain string. The standard library for reading and processing SMILES is RDKit. For example, ethanol is CCO in SMILES, and RDKit will give you the molecular formula C2H6O from it.

RDKit doesn't stop there: it also covers molecular weight, fingerprints, descriptors, substructure search, and so on. It's basically the must-have library for chemistry data work. (Internally, the algorithms come from a bunch of different papers...)

So you might say: do we even need a new extension?…
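
A quick aside on the ethanol example from the background, to show why a string like CCO is enough to recover C2H6O: hydrogens are implicit in SMILES, so each atom gets as many as its usual valence leaves free. The sketch below is a toy under loud assumptions. It is not how RDKit works and not part of the extension. The function name `toy_formula` is made up for illustration. It only understands C, N and O atoms, `=` and `#` bonds, and parenthesised branches (no rings, charges, aromatic atoms or bracket atoms).

```python
from collections import Counter

# Usual valences for a small part of the SMILES "organic subset" (assumption:
# only neutral C, N and O atoms are supported in this toy).
VALENCE = {"C": 4, "N": 3, "O": 2}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def toy_formula(smiles: str) -> str:
    """Molecular formula for a tiny acyclic SMILES subset (hypothetical helper)."""
    atoms = []    # element symbol of each atom, in reading order
    used = []     # total bond order already consumed by each atom
    stack = []    # atoms at which a branch "(" was opened
    prev = None   # index of the atom the next atom will bond to
    order = 1     # order of the pending bond (single unless written out)

    for ch in smiles:
        if ch in VALENCE:
            atoms.append(ch)
            used.append(0)
            cur = len(atoms) - 1
            if prev is not None:
                used[prev] += order
                used[cur] += order
            prev, order = cur, 1
        elif ch in BOND_ORDER:
            order = BOND_ORDER[ch]
        elif ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")

    counts = Counter(atoms)
    # Implicit hydrogens fill whatever valence the written bonds left free.
    counts["H"] = sum(VALENCE[a] - u for a, u in zip(atoms, used))

    # Hill order: carbon, hydrogen, then the remaining elements alphabetically.
    symbols = [s for s in ("C", "H") if counts[s]] + sorted(
        s for s in counts if s not in ("C", "H") and counts[s]
    )
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in symbols)


print(toy_formula("CCO"))  # C2H6O
```

Real SMILES has far more to it (rings, aromaticity, stereochemistry, isotopes), which is exactly why a proper toolkit exists and why this toy stops where it does.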