[Other] Specializing smaller language models towards multi-step reasoning

Tavneet · Posted: the day before yesterday, 12:52

Specializing smaller language models towards multi-step reasoning
DOI: https://dl.acm.org/doi/10.5555/3618408.3618828
@inproceedings{10.5555/3618408.3618828,
author = {Fu, Yao and Peng, Hao and Ou, Litu and Sabharwal, Ashish and Khot, Tushar},
title = {Specializing smaller language models towards multi-step reasoning},
year = {2023},
publisher = {JMLR.org},
abstract = {The surprising ability of Large Language Models (LLMs) to perform well on complex reasoning with only few-shot chain-of-thought prompts is believed to emerge only in very large-scale models. We show that such abilities can, in fact, be distilled down from GPT-3.5 (≥ 175B) to T5 variants (≤11B). We propose model specialization, to specialize the model's ability towards a target task. The hypothesis is that large models (commonly viewed as larger than 100B) have strong modeling power such that they can perform a large spectrum of tasks. Small models (commonly viewed as smaller than 10B) have limited model capacity, but if we specialize their capacity towards a target task, the model can achieve decent performance improvements. We use multi-step math reasoning as our testbed because it is a very typical emergent ability. We show two important aspects of model abilities: (1) balancing language model's performance on multiple tasks is a delicate matter, as improvements on one task may compromise other tasks; (2) yet by intentionally paying the price of decreased generic ability, we can clearly improve across different model scales smaller than 10B towards a specialized multi-step math reasoning ability. We further give comprehensive discussions about important design choices for better generalization, including the data format mixture and the start model checkpoint. We hope our practice and discoveries can serve as an important attempt towards specialized smaller models in the new research paradigm set by LLMs.},
booktitle = {Proceedings of the 40th International Conference on Machine Learning},
articleno = {420},
numpages = {10},
location = {Honolulu, Hawaii, USA},
series = {ICML'23}
}
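For a concrete picture of what the "specialization" described in the abstract looks like in practice, here is a minimal sketch of the general recipe: take chain-of-thought targets produced by a large teacher model and fine-tune a small T5-family student on them with the standard seq2seq objective. The checkpoint name, toy data, and hyperparameters below are illustrative assumptions only, not the authors' actual pipeline (which also studies the data format mixture and the start checkpoint).

```python
# Minimal sketch of chain-of-thought distillation into a small seq2seq model.
# Assumptions: a Hugging Face T5 checkpoint and a toy, hard-coded CoT dataset;
# in the paper the targets come from a large teacher (GPT-3.5).
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-base"  # any T5 variant well under 11B, per the paper's setting
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toy distillation data: (question, teacher chain-of-thought + answer) pairs.
train_pairs = [
    (
        "Q: Tom has 3 boxes with 4 apples each. How many apples does he have?",
        "Tom has 3 boxes. Each box has 4 apples. 3 * 4 = 12. The answer is 12.",
    ),
]

def collate(batch):
    questions, targets = zip(*batch)
    enc = tokenizer(list(questions), padding=True, return_tensors="pt")
    labels = tokenizer(list(targets), padding=True, return_tensors="pt").input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    enc["labels"] = labels
    return enc

loader = DataLoader(train_pairs, batch_size=1, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for epoch in range(1):
    for batch in loader:
        loss = model(**batch).loss  # standard seq2seq cross-entropy on the CoT targets
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        print(f"loss: {loss.item():.4f}")
```

At inference time one would decode the full chain of thought and extract the final answer; as the abstract notes, the gain on multi-step math comes at the cost of some generic ability.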

