One of the major attractions of large language models (LLMs) is that they encode information about the real world. But the world is constantly changing, and an LLM's information is only as fresh as the data it was trained on.
Training an LLM can take months, even when the work is parallelized across 1,000 servers, so AI researchers have sought alternative ways to update LLMs' knowledge. One of these is to directly edit targeted layers of an LLM to improve its performance on a particular knowledge-based task. This is a task-specific solution, not a general one, but it takes hours to implement rather than months.
Existing direct layer editing techniques generally require either manual selection of the layers to be edited or a time-consuming procedure to determine the layers where editing will do the most good. Last week, at the 2024 meeting of the European Chapter of the Association for Computational Linguistics (EACL), we presented a new method for automatically selecting the layers to be edited, which provides more reliable updates than previous automated methods.
Compared to the prevailing method of manual layer selection, it also limits regression, or post-update decline in performance on data that the model previously handled correctly. On some datasets, our method, which we call SaLEM (for salient-layers editing model), reduced regression by an order of magnitude while offering similar accuracy on new data.
Identification of layers
We consider the case in which an LLM is fine-tuned on a particular task, such as determining whether one input sentence logically entails another or whether a passage counts as evidence for or against a claim. In such cases, the model input is typically a few sentences of text, and the output is a classification such as "entailment" or "supported".
In the prior approach to manual layer selection, known as causal tracing, the first token of each training sample is fed to the model, then the first and second, then the first, second, and third, and so on. Then the whole process is repeated with one of the model's layers masked out. This two-step analysis, in turn, must be repeated for each layer of the network, a time-consuming procedure.
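To make that cost concrete, here is a rough structural sketch, not a faithful causal-tracing implementation; `run_prefix` and `run_prefix_masked` are hypothetical stand-ins for a forward pass with and without a given layer's contribution. The prefix loop runs once per token, and the whole pass repeats for every layer.

```python
def causal_trace(tokens, num_layers, run_prefix, run_prefix_masked):
    """Score each layer by how much masking it changes the model's output."""
    effects = []
    for layer in range(num_layers):
        diff = 0.0
        # Growing prefixes: token 1; tokens 1-2; tokens 1-2-3; and so on.
        for i in range(1, len(tokens) + 1):
            clean = run_prefix(tokens[:i])
            masked = run_prefix_masked(tokens[:i], layer)
            diff += abs(clean - masked)
        effects.append(diff)  # one full prefix sweep per layer
    return effects
```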
In our approach, we instead prepare an "editing dataset" consisting of input-output pairs drawn from three groups: (1) pass samples, for which the existing model outputs the correct answers; (2) fail samples, for which the existing model outputs the wrong answers; and (3) adapt samples, which are semantically equivalent to the fail samples but phrased differently.
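A minimal sketch of how such an editing set could be assembled; `predict` and `paraphrase` are hypothetical helpers (the existing model's classifier and a paraphrase generator), not part of our published pipeline.

```python
from dataclasses import dataclass

@dataclass
class EditSample:
    text: str
    target: str
    group: str  # "pass", "fail", or "adapt"

def build_edit_dataset(predict, labeled_samples, paraphrase):
    """Sort samples into pass/fail and pair each fail with an adapt sample."""
    edit_set = []
    for text, target in labeled_samples:
        group = "pass" if predict(text) == target else "fail"
        edit_set.append(EditSample(text, target, group))
        if group == "fail":
            # Adapt samples: same meaning as the fail sample, new wording.
            edit_set.append(EditSample(paraphrase(text), target, "adapt"))
    return edit_set
```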
For each sample, we calculate the loss between the existing model's output and the target output, along with the corresponding gradients, the changes to the model weights that would make the correct output more likely. We then average the gradients over each layer of the model and across all training samples. The layer with the highest average gradient, the layer that requires the biggest change to account for new facts about the world, is the one we edit.
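A minimal PyTorch sketch of that selection step, assuming a model whose forward pass returns logits; the data loader and loss function here are illustrative.

```python
import torch
from collections import defaultdict

def select_salient_layer(model, edit_loader, loss_fn):
    """Pick the layer whose parameters have the largest average gradient."""
    totals = defaultdict(float)
    count = 0
    for inputs, targets in edit_loader:
        model.zero_grad()
        loss = loss_fn(model(inputs), targets)  # loss against the target output
        loss.backward()  # gradients: the weight changes that would
                         # make the correct output more likely
        for name, param in model.named_parameters():
            if param.grad is not None:
                totals[name] += param.grad.abs().mean().item()
        count += 1
    # The layer with the highest average gradient is the one to edit.
    return max(totals, key=lambda name: totals[name] / count)
```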
Editing
To edit the selected layer, we use the MEND method, proposed by Stanford University researchers in 2022. With MEND, a second machine learning model, the editor model, is trained to take gradients as input and output parameter edits.
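In outline, the editor is just a small network that maps gradient features to an edit. The sketch below is illustrative only, not the exact MEND architecture, which works on the gradients' low-rank factors, as the next paragraph describes.

```python
import torch.nn as nn

class GradientEditor(nn.Module):
    """Maps (flattened) gradient features to a parameter edit of equal size."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, grad_features):
        # Input: gradient features; output: an edit to apply to the weights.
        return self.net(grad_features)
```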
But rather than the raw gradients, the editor model's input is a low-rank approximation of the gradients, which reduces the dimensionality of the data by identifying the axes along which most of the variance occurs. This is something like teasing out the underlying causes of the larger gradients, which helps the model generalize better. We also protect against overfitting by aggregating gradients in batches of 10 before calculating their low-rank approximation.
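One way to picture the low-rank step, using a truncated SVD over an aggregated batch of gradient matrices; MEND's actual factorization differs in its details, and the `rank` value here is an arbitrary illustration.

```python
import torch

def low_rank_gradient(grad_batch, rank=4):
    """Aggregate a batch of gradients and keep the top singular directions."""
    g = torch.stack(grad_batch).mean(dim=0)  # aggregate, e.g., 10 gradients
    U, S, Vh = torch.linalg.svd(g, full_matrices=False)
    # Keep only the axes along which most of the variance occurs.
    return U[:, :rank], S[:rank], Vh[:rank, :]
```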
We use two training objectives to train the editor: one maximizes the likelihood of correct answers on inputs from the fail and adapt sets, and the other minimizes output divergence on inputs from the pass set. The second objective helps prevent regression.
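A sketch of how the two objectives could be combined, with KL divergence standing in for the divergence measure and `locality_weight` as an assumed balancing coefficient.

```python
import torch.nn.functional as F

def editor_loss(edited_logits_new, targets_new,
                edited_logits_pass, original_logits_pass,
                locality_weight=1.0):
    # (1) Maximize the likelihood of correct answers on fail and adapt inputs.
    edit_term = F.cross_entropy(edited_logits_new, targets_new)
    # (2) Minimize divergence from the original outputs on pass inputs,
    #     which is what guards against regression.
    locality_term = F.kl_div(
        F.log_softmax(edited_logits_pass, dim=-1),
        F.softmax(original_logits_pass, dim=-1),
        reduction="batchmean",
    )
    return edit_term + locality_weight * locality_term
```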
In the original MEND paper, the Stanford researchers used this approach to edit the top three layers of a fine-tuned LLM, a reasonable heuristic for trading off editing efficacy, or correction of outputs, against prevention of regression. Because SaLEM identifies the one layer most implicated in the needed model update, it can match MEND's performance on new data. But because it changes parameters in one layer rather than three, it reduces regression.
Experiments
We evaluated SaLEM on six datasets used to fine-tune LLMs on natural-language-processing tasks. Four of the datasets involved natural-language inference, one was a question-answering dataset, and one was a dataset for the standard LLM task of next-token prediction. For the question-answering and generation tasks, we compared SaLEM and the baselines on four different LLM architectures. We measured performance using both editing accuracy, or post-editing accuracy on the new data, and drawdown, which measures regression on the old data.
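For concreteness, the two metrics can be read as follows; this is a sketch, and the exact definitions in the paper may differ.

```python
def editing_accuracy(post_edit_preds, new_targets):
    """Fraction of the new data the edited model now gets right."""
    hits = sum(p == t for p, t in zip(post_edit_preds, new_targets))
    return hits / len(new_targets)

def drawdown(old_acc_before_edit, old_acc_after_edit):
    """Regression: accuracy lost on the old data after the edit."""
    return old_acc_before_edit - old_acc_after_edit
```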
On the inference tasks, SaLEM matched the editing accuracy of the best performers but had significantly better drawdown, several times better than the second best on two of the datasets. On the other two tasks, SaLEM finished second on both measures to an approach called editable neural networks (ENN). But ENN requires two copies of the LLM to run at the same time, which is resource intensive. Indeed, for two of the four LLM architectures we tested, we were unable to run ENN at all because of its computational requirements.
In ongoing work, we are (1) exploring enriching the editing dataset with better fail samples and their semantic and counterfactual equivalents and (2) investigating a better weight update mechanism, to inform the editor without loading the full model into memory, as we currently do.