Research Trends in Large Language Models and Mathematical Reasoning


  • O.W. Kwon (Language Intelligence Research Section) ;
  • J.H. Shin (Language Intelligence Research Section) ;
  • Y.A. Seo (Language Intelligence Research Section) ;
  • S.J. Lim (Language Intelligence Research Section) ;
  • J. Heo (Language Intelligence Research Section) ;
  • K.Y. Lee (Language Intelligence Research Section)
  • Published : 2023.12.01

Abstract

Large language models show promise for handling reasoning problems, although their underlying solving mechanisms remain unclear. They are expected to establish a new paradigm in artificial intelligence and in society as a whole. However, a major challenge of large language models is the massive resources required for training and operation. To address this issue, researchers are actively exploring compact models that retain the capabilities of large language models while notably reducing the model size; these research efforts focus mainly on improving pretraining, instruction tuning, and alignment. Chain-of-thought prompting, in turn, is a technique aimed at enhancing the reasoning ability of large language models: given a problem, the model produces an answer through a series of intermediate reasoning steps. By guiding the model through a multistep problem-solving process, chain-of-thought prompting may improve the model's reasoning skills. Mathematical reasoning, a fundamental aspect of human intelligence, has played a crucial role in advancing large language models toward human-level performance and is therefore being widely explored in the context of large language models. This line of research extends to domains such as geometry problem solving, tabular mathematical reasoning, visual question answering, and other areas.
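The chain-of-thought idea described above can be sketched in a few lines of code: a few-shot prompt is built from a worked exemplar that spells out intermediate reasoning steps, and the final answer is then extracted from the model's generated rationale. The exemplar text, the `build_cot_prompt` helper, and the `extract_answer` regex below are illustrative assumptions, not taken from any specific system.

```python
import re

# One worked exemplar with explicit intermediate steps; the model is
# expected to imitate this step-by-step style for the new question.
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend the worked exemplar so the model reasons step by step."""
    return COT_EXEMPLAR + f"Q: {question}\nA:"

def extract_answer(generation: str) -> str:
    """Pull the final numeric answer from a 'The answer is N.' rationale."""
    match = re.search(r"The answer is\s+(-?\d+(?:\.\d+)?)", generation)
    return match.group(1) if match else ""

prompt = build_cot_prompt(
    "A cafeteria had 23 apples. They used 20 and bought 6 more. "
    "How many apples do they have?"
)
# A hypothetical model completion containing intermediate steps:
completion = "The cafeteria had 23 apples. 23 - 20 = 3. 3 + 6 = 9. The answer is 9."
print(extract_answer(completion))  # → 9
```

The key design point is that only the exemplar changes between standard few-shot prompting and chain-of-thought prompting: the exemplar answer carries the reasoning trace, which is what elicits multistep problem solving.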

Acknowledgement

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (Ministry of Science and ICT) in 2023 [RS-2023-00216011, Research on core technologies of composite artificial intelligence capable of human-like conceptual understanding and reasoning].
