A starting point for making sense of task structure (in machine learning)
Toward A Mathematical Framework for Computation in Superposition
Decomposing Activations into Features: How Many and How do we Find Them? — A Survey
Searching for a model’s concepts by their shape – a theoretical framework
See my LessWrong profile and Google Scholar page for more.