SAEs trained on the same data don’t learn the same features

In this post, we show that when two TopK SAEs are trained on the same data, with the same batch order but with different random initializations, many latents in the first SAE have no close counterpart in the second, and vice versa. Indeed, only about 53% of the features are shared. Furthermore, many of these unshared latents are interpretable. We find that narrower SAEs have a higher feature overlap across random seeds, and that the overlap decreases as the size of the SAE increases.
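To make the notion of a "shared" latent concrete, here is a minimal sketch of one way such an overlap could be measured. It is an illustration under stated assumptions, not necessarily the procedure used in the post: it assumes each latent is represented by its decoder direction, that a "close counterpart" means a cosine similarity above a threshold, and that the threshold value (0.7 here) is a hypothetical choice. The function name `shared_fraction` is likewise invented for this example.

```python
import math

def _unit(v):
    """Return v scaled to unit length (v must be non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def shared_fraction(dec_a, dec_b, threshold=0.7):
    """Fraction of latents in SAE A with a close counterpart in SAE B.

    dec_a, dec_b: lists of decoder directions, one vector per latent.
    A latent in A counts as "shared" if its best cosine similarity with
    any latent in B is at least `threshold` (an assumed criterion).
    """
    a_unit = [_unit(v) for v in dec_a]
    b_unit = [_unit(v) for v in dec_b]
    shared = 0
    for a in a_unit:
        # Best cosine similarity between this latent and any latent in B.
        best = max(sum(x * y for x, y in zip(a, b)) for b in b_unit)
        if best >= threshold:
            shared += 1
    return shared / len(a_unit)

# Toy example: two of the four directions in A reappear (rescaled) in B.
toy_a = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
toy_b = [[2, 0, 0, 0], [0, 3, 0, 0], [0, 0, 1, 1], [0, 0, 1, -1]]
print(shared_fraction(toy_a, toy_b, threshold=0.9))
```

The measure is not symmetric, so calling `shared_fraction(dec_b, dec_a)` as well covers the "vice versa" direction. This greedy best-match version also lets several latents in A match the same latent in B; a one-to-one assignment would be a stricter alternative. For SAEs of realistic size, the same computation would normally be done with a vectorized matrix product rather than Python loops.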
