borgesian autoencoders
borges story where interpretability researchers keep raising the dictionary size of their SAE, trying to attain true monosemanticity: "leopard" isn't monosemantic, you need a feature for the specific leopard Dante saw, and so on until there's a feature for every actual and potential entity, past and future. but then they realize the SAE can't have a feature for each of its own features (every feature is itself an entity that needs a feature, so the dictionary would have to outgrow itself forever), and they quit ML to become gnostic hermits.
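for the curious, a minimal sketch of the setup the story riffs on: a standard ReLU + L1 sparse autoencoder in PyTorch, where d_sae (the dictionary width) is the knob the researchers keep cranking. the names and numbers here are illustrative assumptions, not anyone's actual code.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: encode activations into a wide sparse dictionary, decode back."""

    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)   # d_sae >> d_model for finer features
        self.dec = nn.Linear(d_sae, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))            # sparse feature activations
        return self.dec(f), f

d_model = 512
d_sae = 8 * d_model      # the expansion factor the story's researchers keep raising

sae = SparseAutoencoder(d_model, d_sae)
x = torch.randn(4, d_model)                    # a batch of model activations
x_hat, f = sae(x)

# reconstruction loss plus L1 sparsity penalty, the usual SAE objective
loss = ((x - x_hat) ** 2).mean() + 1e-3 * f.abs().mean()

# the punchline, in shapes: the decoder holds d_sae feature directions, each an
# entity in its own right; monosemantic features for those would need d_sae more
# latents, whose features need more still. d_sae never catches up with itself.
```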