Introduction
Transformer networks (Vaswani et al. 2017) have a remarkable ability to capture complex patterns and structures in their training data. Understanding how these neural networks work is not only an inspiring research problem, but also a practical necessity given their widespread deployment to millions of people.
Transformer models consist of two major components: attention layers and MLP layers. While significant progress has been made in understanding attention layers, such as the work on Transformer circuits by Elhage et al. (2021), our understanding of MLP layers remains limited.
Interestingly, MLP layers are one of the few places in transformer networks where a privileged basis can be found (Elhage et al. 2021): the model favors these basis directions because the non-linearity is applied pointwise along them. The basis directions are referred to as neurons, and the outputs of the activation function (see Figure 1) are referred to as neuron activations. Neural networks tend to use neurons to represent important features (Karpathy, Johnson, and Fei-Fei 2015; Geva et al. 2020, 2022), making them a good starting point for understanding transformers.
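As a concrete illustration, the sketch below shows one way to read these activation-function outputs from a single layer, assuming the Hugging Face transformers implementation of Llama-3 (where each decoder layer exposes an mlp.act_fn module); the model name and layer index are just examples, and module paths may differ in other codebases.

```python
# Minimal sketch: read the activation-function outputs ("neuron activations")
# of one Llama-3 MLP layer with a forward hook. Assumes the Hugging Face
# `transformers` Llama implementation, where each decoder layer exposes
# `mlp.act_fn`; module paths may differ elsewhere.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

captured = {}

def hook(module, inputs, output):
    # `output` is act_fn(gate_proj(x)): one value per MLP neuron per token.
    captured["acts"] = output.detach()

layer_idx = 15  # any of the 32 layers
handle = model.model.layers[layer_idx].mlp.act_fn.register_forward_hook(hook)

with torch.no_grad():
    batch = tokenizer("The pain finally subsided.", return_tensors="pt")
    model(**batch)

handle.remove()
print(captured["acts"].shape)  # (batch, sequence_length, 14336)
```

The tensor captured by the hook has one value per MLP neuron per token; those per-token values are exactly what the dataset ranks snippets by.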
In this work, we release a dataset of text snippets that strongly activate MLP neurons in the Llama-3-8B model. We chose the Llama-3-8B model for its strong evaluation performance and real-world usefulness.
We show examples of meaningful features discoverable with the dataset, and expect that many more can be found. We also anticipate that automated systems built on LLMs could greatly help uncover features from the dataset, as shown by Bills et al. (2023) and Bricken et al. (2023). By open-sourcing our work, we enable others to easily create similar datasets for other transformer models. To facilitate exploration of Llama-3 features, we provide a simple web interface at https://neuralblog.github.io/llama3-neurons/neuron_viewer.html.
Dataset
Overview. The dataset contains more than 14 million text snippets drawn from the FineWeb-Edu dataset (Guilherme et al. 2024). There are 32 snippets for each of the 458,752 MLP neurons in the Llama-3-8B model. Each snippet is 64 tokens long and strongly activates the corresponding neuron at the token in the middle of the snippet. See examples here and here. We describe how we constructed the dataset in the next section.
Open access. The data is freely available as the llama3-8b-mlp-neurons dataset on Hugging Face. Please note that we do not claim any copyright over the text snippets.
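For convenience, here is a minimal sketch of loading the snippets with the datasets library. The repo id below is a guess based on the dataset name and the columns are not spelled out here, so consult the dataset card for the actual identifiers and schema.

```python
# Minimal sketch of loading the snippet dataset with the `datasets` library.
# The repo id is hypothetical (inferred from the dataset name); inspect the
# dataset card on Hugging Face for the real identifier and column names.
from datasets import load_dataset

ds = load_dataset("neuralblog/llama3-8b-mlp-neurons", split="train")  # hypothetical repo id
print(ds)      # shows the actual columns and number of rows
print(ds[0])   # e.g. a layer index, a neuron index, and one 64-token snippet
```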
Example features. We found interesting features at all layers of the network. At lower layers, we discovered neurons triggered by a single word or subword, such as here and here. In higher layers, we observed neurons activated by more abstract concepts. For example, a neuron in layer 15 is triggered by the idea of “something being removed or relieved to alleviate a negative situation.” Another neuron in layer 24 activates when discussing highly intelligent, smart, and successful individuals. Interestingly, we found a neuron that is highly active when encountering a broken word.
Use the neuron viewer to explore other neurons by changing the layer and neuron index.
Method
We used the open-weight Llama-3-8B base model from Meta, a powerful model that can run on a single GPU. Since Meta did not release its training and evaluation data, we instead used the recently released FineWeb-Edu dataset (Guilherme et al. 2024), specifically the sample-10BT subset, which provides approximately 10 billion tokens.
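For reference, below is a minimal sketch of streaming that subset and cutting 128-token segments from its documents, as described in the next paragraph. The hub id HuggingFaceFW/fineweb-edu with the sample-10BT configuration is assumed to be the standard public one, the sampling strategy is one plausible choice rather than our exact pipeline, and tokenizer stands for the Llama-3-8B tokenizer loaded elsewhere.

```python
# Minimal sketch: stream the FineWeb-Edu sample-10BT subset and cut random
# 128-token segments from its documents. The hub id and configuration name
# are assumed to be the standard public ones; `tokenizer` is the Llama-3-8B
# tokenizer loaded elsewhere.
import random
from datasets import load_dataset

fineweb = load_dataset(
    "HuggingFaceFW/fineweb-edu", name="sample-10BT", split="train", streaming=True
)

def random_segments(tokenizer, seg_len=128):
    for doc in fineweb:
        ids = tokenizer(doc["text"], add_special_tokens=False)["input_ids"]
        if len(ids) < seg_len:
            continue  # skip documents shorter than one segment
        start = random.randrange(len(ids) - seg_len + 1)
        yield ids[start : start + seg_len]
```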
To collect neuron activations, we randomly sampled 128-token segments from the dataset and fed them into the network, recording the MLP neuron activations over the last 64 tokens of each segment. To reduce memory and disk usage during data collection, we kept, for each neuron, only the top 32 snippets whose 96th token produced the highest activation. Additionally, we stored only the position of each sequence in the source data rather than the sequence itself, which also significantly reduced memory usage.
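The per-neuron bookkeeping can be done with a running top-k buffer. The sketch below is an illustrative reconstruction under the assumptions above, not our exact collection code: activations from all layers are flattened into one vector per segment, only the activation at the tracked token position is considered, and only the segment index is stored so that the text can be looked up later.

```python
# Minimal sketch of per-neuron top-k bookkeeping: for every neuron, keep the
# K highest activations seen so far at the tracked token position, together
# with the index of the segment that produced them, so the snippet text can
# be recovered later from the segment's position alone.
import torch

NUM_NEURONS = 32 * 14336  # 458,752 MLP neurons in Llama-3-8B
K = 32

top_acts = torch.full((NUM_NEURONS, K), float("-inf"))
top_segments = torch.full((NUM_NEURONS, K), -1, dtype=torch.long)

def update(acts_at_position: torch.Tensor, segment_index: int):
    """acts_at_position: (NUM_NEURONS,) activations at the tracked token."""
    global top_acts, top_segments
    acts = acts_at_position.detach().float().to(top_acts.device)
    # Append the new candidate as an extra column, then keep the best K per neuron.
    cand_acts = torch.cat([top_acts, acts[:, None]], dim=1)
    cand_segs = torch.cat(
        [top_segments,
         torch.full((NUM_NEURONS, 1), segment_index, dtype=torch.long)],
        dim=1,
    )
    top_acts, idx = cand_acts.topk(K, dim=1)
    top_segments = torch.gather(cand_segs, 1, idx)
```

After the full pass, top_segments[n] identifies the 32 segments whose tracked token most strongly activates neuron n, and the corresponding snippets can be re-extracted from the source data.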
Note that the Llama-3-8B model has 32 layers, and each layer contains 14,336 MLP neurons, resulting in a total of 458,752 neurons in the entire network.
In total, we fed 4 million examples into the model and collected the top 32 examples for each MLP neuron across the network. The entire process took approximately 12 hours to complete on an A100 GPU with 48GB of RAM.
In the end, we obtained 32 snippets for each neuron, each 64 tokens long (32 prefix tokens, the token with the high activation, and 31 suffix tokens). The result is a dataset of more than 14 million text snippets, each of which strongly activates a specific MLP neuron at the token in the middle of the snippet.
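Concretely, given a stored segment and the index of its activating token, the snippet is a fixed 64-token window around that token; a small sketch (the helper name is ours):

```python
# Minimal sketch: recover a 64-token snippet around the activating token
# (32 prefix tokens, the activating token, 31 suffix tokens).
def snippet_around(token_ids, center):
    # `center` is the index of the activating token within the segment.
    assert 32 <= center and center + 32 <= len(token_ids)
    return token_ids[center - 32 : center + 32]  # 64 tokens; activating token at offset 32
```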
Conclusion
We hope the release of a dataset of text snippets that strongly activate MLP neurons in the Llama-3-8B model will facilitate research into understanding real-world large language models. We also expect that using this dataset together with LLM assistance can greatly extend our understanding of Llama-3 neurons. We encourage everyone to visit our neuron viewer page and explore the neurons themselves, which can greatly improve one's intuition about how these models work internally.
Citation
@misc{nguyen2024neurons,
author = {Nguyễn, Thông},
title = {Llama-3-8B {MLP} {Neurons}},
date = {2024-06-09},
url = {https://neuralblog.github.io/llama3-neurons},
langid = {en},
abstract = {We release a dataset of text snippets that strongly
activate MLP neurons in the Llama-3-8B model. Using this dataset, we
demonstrate examples of meaningful features that can be discovered.
We open-source our work, enabling others to easily create similar
datasets for other transformer models and uncover important
features. We also create a simple web interface to facilitate the
exploration of MLP neurons.}
}