borgesian autoencoders
borges story where interpretability researchers keep raising the dictionary size of their SAE, trying to attain true monosemanticity: "leopard" isn't monosemantic, you need a feature for the specific leopard Dante saw, and so on until there's a feature for every actual and potential entity, past and future. but then they realize the SAE can't have a feature for each of its own features (every feature is itself an entity that needs a feature, so the dictionary would have to outgrow itself forever), and they quit ML to become gnostic hermits.
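for the curious, a minimal sketch of the setup the story riffs on: a standard ReLU + L1 sparse autoencoder in PyTorch, where d_sae (the dictionary width) is the knob the researchers keep cranking. the names and numbers here are illustrative assumptions, not anyone's actual code.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: encode activations into a wide sparse dictionary, decode back."""

    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)   # d_sae >> d_model for finer features
        self.dec = nn.Linear(d_sae, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))            # sparse feature activations
        return self.dec(f), f

d_model = 512
d_sae = 8 * d_model      # the expansion factor the story's researchers keep raising

sae = SparseAutoencoder(d_model, d_sae)
x = torch.randn(4, d_model)                    # a batch of model activations
x_hat, f = sae(x)

# reconstruction loss plus L1 sparsity penalty, the usual SAE objective
loss = ((x - x_hat) ** 2).mean() + 1e-3 * f.abs().mean()

# the punchline, in shapes: the decoder holds d_sae feature directions, each an
# entity in its own right; monosemantic features for those would need d_sae more
# latents, whose features need more still. d_sae never catches up with itself.
```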