In deep learning, fine-tuning is an approach to transfer learning in which the parameters of a pre-trained neural network model are trained on new data.[1] Fine-tuning can be done on the entire neural network, or on only a subset of its layers, in which case the layers that are not being fine-tuned are "frozen" (i.e., not changed during backpropagation).[2] A model may also be augmented with "adapters" that consist of far fewer parameters than the original model, and fine-tuned in a parameter-efficient way by tuning the weights of the adapters and leaving the rest of the model's weights frozen.[3]
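Both ideas can be illustrated with a short sketch. The following minimal PyTorch example (the backbone, layer sizes, and bottleneck width are hypothetical stand-ins, not taken from the cited works) freezes a stand-in pre-trained network and trains only a small residual adapter:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual add."""
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual connection

# Hypothetical stand-in for a real pre-trained backbone.
pretrained = nn.Sequential(
    nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128)
)

# Freeze every pre-trained weight so backpropagation leaves it unchanged.
for param in pretrained.parameters():
    param.requires_grad = False

model = nn.Sequential(pretrained, Adapter(128))

# Only the adapter's (trainable) parameters are handed to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
```

Because the optimizer sees only the adapter's parameters, gradient updates never touch the frozen backbone, which is what makes the approach parameter-efficient.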
For some architectures, such as convolutional neural networks, it is common to keep the earlier layers (those closest to the input layer) frozen, as they capture lower-level features, while later layers capture higher-level features more closely tied to the task the model is fine-tuned for.[2][4]
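As a sketch of this pattern, the example below freezes the early stages of a torchvision ResNet-18 and leaves the later stages and classifier head trainable (assuming torchvision ≥ 0.13 for the `weights` argument; the 10-class head is an arbitrary illustration):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Earlier blocks (closest to the input) capture low-level features; freeze them.
for module in [model.conv1, model.bn1, model.layer1, model.layer2]:
    for param in module.parameters():
        param.requires_grad = False

# Later blocks (layer3, layer4) stay trainable, and the classifier head is
# replaced with a fresh layer for the downstream task (here, 10 classes).
model.fc = nn.Linear(model.fc.in_features, 10)
```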
Models that are pre-trained on large, general corpora are usually fine-tuned by reusing their parameters as a starting point and adding a task-specific layer trained from scratch.[5] Fine-tuning the full model is also common and often yields better results, but is more computationally expensive.[6]
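The trade-off can be made concrete with a sketch under stated assumptions: `backbone` stands in for a pre-trained encoder whose parameters are reused, and `head` is a task-specific layer initialized from scratch; both are hypothetical.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(128, 128), nn.ReLU())  # hypothetical encoder
head = nn.Linear(128, 5)  # new task-specific layer, trained from scratch

# Option 1: train only the new head (cheaper).
for p in backbone.parameters():
    p.requires_grad = False
opt_head_only = torch.optim.Adam(head.parameters(), lr=1e-3)

# Option 2: full fine-tuning - unfreeze and update everything
# (often better results, but more computationally expensive).
for p in backbone.parameters():
    p.requires_grad = True
opt_full = torch.optim.Adam(
    list(backbone.parameters()) + list(head.parameters()), lr=1e-5
)
```

A smaller learning rate is commonly used for full fine-tuning to avoid overwriting the pre-trained weights too aggressively.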
^ Liu, Haokun; Tam, Derek; Muqeeth, Mohammed; Mohta, Jay; Huang, Tenghao; Bansal, Mohit; Raffel, Colin A. (2022). Koyejo, S.; Mohamed, S.; Agarwal, A.; Belgrave, D.; Cho, K.; Oh, A. (eds.). "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning". Advances in Neural Information Processing Systems. Vol. 35. Curran Associates, Inc. pp. 1950–1965.
^ Zeiler, Matthew D.; Fergus, Rob (2013). "Visualizing and Understanding Convolutional Networks". ECCV. arXiv:1311.2901.
^ Dodge, Jesse; Ilharco, Gabriel; Schwartz, Roy; Farhadi, Ali; Hajishirzi, Hannaneh; Smith, Noah (2020). "Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping". arXiv:2002.06305.
^ Yu, Yue; Zuo, Simiao; Jiang, Haoming; Ren, Wendi; Zhao, Tuo; Zhang, Chao (2020). "Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach". Association for Computational Linguistics. arXiv:2010.07835.