[Other] Specializing smaller language models towards multi-step reasoning

Tavneet · Posted: the day before yesterday, 12:52

Specializing smaller language models towards multi-step reasoning
DOI: https://dl.acm.org/doi/10.5555/3618408.3618828
@inproceedings{10.5555/3618408.3618828,
author = {Fu, Yao and Peng, Hao and Ou, Litu and Sabharwal, Ashish and Khot, Tushar},
title = {Specializing smaller language models towards multi-step reasoning},
year = {2023},
publisher = {JMLR.org},
abstract = {The surprising ability of Large Language Models (LLMs) to perform well on complex reasoning with only few-shot chain-of-thought prompts is believed to emerge only in very large-scale models. We show that such abilities can, in fact, be distilled down from GPT-3.5 (≥ 175B) to T5 variants (≤11B). We propose model specialization, to specialize the model's ability towards a target task. The hypothesis is that large models (commonly viewed as larger than 100B) have strong modeling power such that they can perform a large spectrum of tasks. Small models (commonly viewed as smaller than 10B) have limited model capacity, but if we specialize their capacity towards a target task, the model can achieve decent performance improvements. We use multi-step math reasoning as our testbed because it is a very typical emergent ability. We show two important aspects of model abilities: (1) balancing language model's performance on multiple tasks is a delicate matter, as improvements on one task may compromise other tasks; (2) yet by intentionally paying the price of decreased generic ability, we can clearly improve across different model scales smaller than 10B towards a specialized multi-step math reasoning ability. We further give comprehensive discussions about important design choices for better generalization, including the data format mixture and the start model checkpoint. We hope our practice and discoveries can serve as an important attempt towards specialized smaller models in the new research paradigm set by LLMs.},
booktitle = {Proceedings of the 40th International Conference on Machine Learning},
articleno = {420},
numpages = {10},
location = {Honolulu, Hawaii, USA},
series = {ICML'23}
}
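For a concrete picture of what the "specialization" described in the abstract looks like in practice, here is a minimal sketch of the general recipe: take chain-of-thought targets produced by a large teacher model and fine-tune a small T5-family student on them with the standard seq2seq objective. The checkpoint name, toy data, and hyperparameters below are illustrative assumptions only, not the authors' actual pipeline (which also studies the data format mixture and the start checkpoint).

```python
# Minimal sketch of chain-of-thought distillation into a small seq2seq model.
# Assumptions: a Hugging Face T5 checkpoint and a toy, hard-coded CoT dataset;
# in the paper the targets come from a large teacher (GPT-3.5).
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-base"  # any T5 variant well under 11B, per the paper's setting
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toy distillation data: (question, teacher chain-of-thought + answer) pairs.
train_pairs = [
    (
        "Q: Tom has 3 boxes with 4 apples each. How many apples does he have?",
        "Tom has 3 boxes. Each box has 4 apples. 3 * 4 = 12. The answer is 12.",
    ),
]

def collate(batch):
    questions, targets = zip(*batch)
    enc = tokenizer(list(questions), padding=True, return_tensors="pt")
    labels = tokenizer(list(targets), padding=True, return_tensors="pt").input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    enc["labels"] = labels
    return enc

loader = DataLoader(train_pairs, batch_size=1, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for epoch in range(1):
    for batch in loader:
        loss = model(**batch).loss  # standard seq2seq cross-entropy on the CoT targets
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        print(f"loss: {loss.item():.4f}")
```

At inference time one would decode the full chain of thought and extract the final answer; as the abstract notes, the gain on multi-step math comes at the cost of some generic ability.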

