Using teacher knowledge at inference time to improve the student model
Knowledge distillation (KD) is one of the most effective ways to deploy large language models in settings where low latency is important. KD involves transferring the knowledge contained in large models (“teachers”) to smaller models (“students”). Because of their size, student models are typically more efficient than teacher models, but they are often less powerful. In …