HERMES++ answers language queries while predicting roads

DEV Community·Papers Mache·19 days ago

The prevailing view has been that autonomous‑driving world models must choose between two extremes: a perception‑only pipeline that reconstructs the current bird’s‑eye‑view (BEV) layout, or a generative model that rolls future geometry forward without a semantic grasp of the scene. HERMES++ demonstrates that a single network can fill both roles, answering natural‑language queries while extrapolating the road ahead. Previously, scene‑understanding systems relied on dense BEV encoders tuned for detection and segmentation, whereas future‑prediction work such as point‑cloud roll‑outs treated the problem as a purely geometric sequence task, often ignoring high‑level intent. Large language models, meanwhile, excel at reasoning over text but have no built‑in notion of spatial dynamics, which leaves a gap between semantic instruction and physical simulation. HERMES++ closes that gap with three key mechanisms.…
