Towards a small language model powered chain-of-reasoning for open-domain question answering

  • Jihyeon Roh (Language Intelligence Research Group, Electronics and Telecommunications Research Institute) ;
  • Minho Kim (Language Intelligence Research Group, Electronics and Telecommunications Research Institute) ;
  • Kyoungman Bae (Language Intelligence Research Group, Electronics and Telecommunications Research Institute)
  • Received : 2023.08.26
  • Accepted : 2023.12.20
  • Published : 2024.02.20

Abstract

We focus on open-domain question-answering tasks that involve a chain of reasoning, which are primarily implemented using large language models. With an emphasis on cost-effectiveness, we designed EffiChainQA, an architecture centered on the use of small language models. We employed a retrieval-based language model to address the limitations of large language models, such as the hallucination issue and the lack of updated knowledge. To enhance reasoning capabilities, we introduced a question decomposer that leverages a generative language model and serves as a key component in the chain-of-reasoning process. To generate training data for our question decomposer, we leveraged ChatGPT, which is known for its data augmentation ability. Comprehensive experiments were conducted using the HotpotQA dataset. Our method outperformed several established approaches, including the Chain-of-Thought approach, which is based on large language models. Moreover, our results are on par with those of state-of-the-art Retrieve-then-Read methods that utilize large language models.
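The decompose-retrieve-read pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper uses trained small language models for each component, whereas the functions below (`toy_decompose`, `toy_retrieve`, `toy_read`, and the `#PREV` placeholder convention) are hypothetical stand-ins chosen for this sketch.

```python
from typing import Callable, List

def chain_of_reasoning(
    question: str,
    decompose: Callable[[str], List[str]],  # question -> ordered sub-questions
    retrieve: Callable[[str], str],         # sub-question -> supporting passage
    read: Callable[[str, str], str],        # (sub-question, passage) -> answer
) -> str:
    """Answer a multi-hop question by solving its sub-questions in order,
    substituting each intermediate answer into the next sub-question."""
    answer = ""
    for sub_q in decompose(question):
        # Later hops may reference the previous hop's answer via a placeholder.
        sub_q = sub_q.replace("#PREV", answer)
        passage = retrieve(sub_q)
        answer = read(sub_q, passage)
    return answer

# Toy components over a two-fact corpus, for illustration only.
corpus = {
    "Scott Derrickson": "Scott Derrickson is an American director.",
    "Ed Wood": "Ed Wood was an American director.",
}

def toy_decompose(q: str) -> List[str]:
    # A real decomposer would be a generative model; this is hard-coded.
    return ["What nationality is Scott Derrickson?",
            "Was Ed Wood also #PREV?"]

def toy_retrieve(q: str) -> str:
    # A real retriever would rank passages; this matches entity names.
    for name, passage in corpus.items():
        if name in q:
            return passage
    return ""

def toy_read(q: str, passage: str) -> str:
    return "American" if "American" in passage else "unknown"

print(chain_of_reasoning(
    "Were Scott Derrickson and Ed Wood of the same nationality?",
    toy_decompose, toy_retrieve, toy_read,
))  # prints "American"
```

The key design point, per the abstract, is that each component can be a small, specialized model rather than one large model prompted end to end; the loop structure itself carries the chain of reasoning.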

Acknowledgement

This research was supported by the Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (no. 2022-0-00369, [Part 4] Development of AI Technology to Support Expert Decision-making that can Explain the Reasons/Grounds for Judgment Results based on Expert Knowledge).

References

  1. J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, and D. Zhou, Chain-of-Thought prompting elicits reasoning in large language models, Adv. Neural Info. Process. Syst. 35 (2022), 24824-24837.
  2. J. Maynez, S. Narayan, B. Bohnet, and R. McDonald, On faithfulness and factuality in abstractive summarization, arXiv preprint, 2020, DOI 10.48550/arXiv.2005.00661
  3. A. Lazaridou, E. Gribovskaya, W. Stokowiec, and N. Grigorev, Internet-augmented language models through few-shot prompting for open-domain question answering, arXiv preprint, 2022, DOI 10.48550/arXiv.2203.05115
  4. H. He, H. Zhang, and D. Roth, Rethinking with retrieval: faithful large language model inference, arXiv preprint, 2022, DOI 10.48550/arXiv.2301.00303
  5. X. Ma, Y. Gong, P. He, H. Zhao, and N. Duan, Query rewriting for retrieval-augmented large language models, arXiv preprint, 2023, DOI 10.48550/arXiv.2305.14283
  6. W. Shi, S. Min, M. Yasunaga, M. Seo, R. James, M. Lewis, L. Zettlemoyer, and W. Yih, REPLUG: retrieval-augmented blackbox language models, arXiv preprint, 2023, DOI 10.48550/arXiv.2301.12652
  7. G. Izacard, P. Lewis, M. Lomeli, L. Hosseini, F. Petroni, T. Schick, J. Dwivedi-Yu, A. Joulin, S. Riedel, and E. Grave, Atlas: Few-shot learning with retrieval augmented language models, arXiv preprint, 2022, DOI 10.48550/arXiv.2208.03299
  8. S. Min, W. Shi, M. Lewis, X. Chen, W. Yih, H. Hajishirzi, and L. Zettlemoyer, Nonparametric masked language modeling, (Findings of the Association for Computational Linguistics), 2023, pp. 2097-2118, DOI 10.18653/v1/2023.findings-acl.132
  9. W. Yu, Retrieval-augmented generation across heterogeneous knowledge, (Proc. NAACL: Human Language Technologies: Student Research Workshop), 2022, pp. 52-58, DOI 10.18653/v1/2022.naacl-srw.7.
  10. E. Perez, P. Lewis, W. Yih, K. Cho, and D. Kiela, Unsupervised question decomposition for question answering, (Proc. Empirical Methods in Natural Language Processing), 2020, pp. 8864-8880.
  11. H. Dai, Z. Liu, W. Liao, X. Huang, Y. Cao, Z. Wu, L. Zhao, S. Xu, W. Liu, N. Liu, S. Li, D. Zhu, H. Cai, L. Sun, Q. Li, D. Shen, T. Liu, and X. Li, AugGPT: leveraging ChatGPT for text data augmentation, arXiv preprint, 2023, DOI 10.48550/arXiv.2302.13007
  12. X. Wang, J. Wei, D. Schuurmans, Q. V. Le, E. H. Chi, S. Narang, A. Chowdhery, and D. Zhou, Self-consistency improves chain of thought reasoning in language models, (International Conference on Learning Representations, Kigali, Rwanda), 2023.
  13. G. Izacard and E. Grave, Leveraging passage retrieval with generative models for open domain question answering, (Proc. of the 16th Conf. of the European Chapter of the Association for Computational Linguistics), 2021, pp. 874-880.
  14. A. Asai, M. Gardner, and H. Hajishirzi, Evidentiality-guided generation for knowledge-intensive NLP tasks, (Proc. of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA), 2022, pp. 2226-2243.
  15. S. Hofstatter, J. Chen, K. Raman, and H. Zamani, Fid-light: efficient and effective retrieval-augmented text generation, arXiv preprint, 2022, DOI 10.48550/arXiv.2209.14290
  16. Y. Levine, O. Ram, D. Jannai, B. Lenz, S. Shalev-Shwartz, A. Shashua, K. Leyton-Brown, and Y. Shoham, Huge frozen language models as readers for open-domain question answering, (ICML 2022 Workshop on Knowledge Retrieval and Language Models), 2022.
  17. S. Zheng, J. Huang, and K. C.-C. Chang, Why does ChatGPT fall short in answering questions faithfully? arXiv preprint, 2023, DOI 10.48550/arXiv.2304.10513
  18. Z. Deng, Y. Zhu, Y. Chen, M. Witbrock, and P. Riddle, Interpretable AMR-based question decomposition for multi-hop question answering, arXiv preprint, 2022, DOI 10.48550/arXiv.2206.08486
  19. Y. Liu, S. Yavuz, R. Meng, D. Radev, C. Xiong, and Y. Zhou, HPE: Answering complex questions over text by hybrid question parsing and execution, arXiv preprint, 2023, DOI 10.48550/arXiv.2305.07789
  20. J. Li, M. Ren, Y. Gao, and Y. Yang, Ask to understand: question generation for multi-hop question answering, arXiv preprint, 2022, DOI 10.48550/arXiv.2203.09073
  21. S. Min, V. Zhong, L. Zettlemoyer, and H. Hajishirzi, Multi-hop reading comprehension through question decomposition and rescoring, (Proc. Annual Meeting of the Association for Computational Linguistics, Florence, Italy), 2019, pp. 6097-6109.
  22. M. Bevilacqua, R. Blloshmi, and R. Navigli, One spring to rule them both: symmetric AMR semantic parsing and generation without a complex pipeline, (Proc. AAAI Technical Track on Speech and Natural Language Processing), Vol. 35, 2021, pp. 12564-12573.
  23. E. Chung and J. G. Park, Sentence-chain based seq2seq model for corpus expansion, ETRI J. 39 (2017), no. 4, 455-466.
  24. H. You, R. Sun, Z. Wang, L. Chen, G. Wang, H. A. Ayyubi, K.-W. Chang, and S.-F. Chang, IdealGPT: iteratively decomposing vision and language reasoning via large language models, arXiv preprint, 2023, DOI 10.48550/arXiv.2305.14985
  25. S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. R. Narasimhan, and Y. Cao, ReAct: Synergizing reasoning and acting in language models, (International Conference on Learning Representations), 2022.
  26. W. Yu, Z. Zhang, Z. Liang, M. Jiang, and A. Sabharwal, Improving language models via plug-and-play retrieval feedback, arXiv preprint, 2023, DOI 10.48550/arXiv.2305.14002
  27. P. Lu, B. Peng, H. Cheng, M. Galley, K.-W. Chang, Y. N. Wu, S.-C. Zhu, and J. Gao, Chameleon: Plug-and-play compositional reasoning with large language models, arXiv preprint, 2023, DOI 10.48550/arXiv.2304.09842
  28. Y. Qin, S. Hu, Y. Lin, W. Chen, N. Ding, G. Cui, Z. Zeng, Y. Huang, C. Xiao, C. Han, and Y. R. Fung, Tool learning with foundation models, arXiv preprint, 2023, DOI 10.48550/arXiv.2304.08354
  29. T. Schick, J. Dwivedi-Yu, R. Dessi, R. Raileanu, M. Lomeli, L. Zettlemoyer, N. Cancedda, and T. Scialom, Toolformer: Language models can teach themselves to use tools, arXiv preprint, 2023, DOI 10.48550/arXiv.2302.04761
  30. K. Ma, H. Cheng, X. Liu, E. Nyberg, and J. Gao, Open-domain question answering via chain of reasoning over heterogeneous knowledge, (Findings of the Association for Computational Linguistics: EMNLP), 2022, pp. 5360-5374.
  31. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, (Proc. NAACL-HLT, Minneapolis, MN, USA), 2019, pp. 4171-4186.
  32. T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, Language models are few-shot learners, Adv. Neural Info. Process. Syst. 33 (2020), 1877-1901.
  33. G. Lample and A. Conneau, Cross-lingual language model pretraining, arXiv preprint, 2019, DOI 10.48550/arXiv.1901.07291
  34. T. Kwiatkowski, J. Palomaki, O. Redfield, M. Collins, A. Parikh, C. Alberti, D. Epstein, I. Polosukhin, J. Devlin, K. Lee, and K. Toutanova, Natural questions: a benchmark for question answering research, Trans. Assoc. Comput. Linguist. 7 (2019), 452-466.
  35. Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. W. Cohen, R. Salakhutdinov, and C. D. Manning, HotpotQA: a dataset for diverse, explainable multi-hop question answering, (Proc. Conf. Empirical Methods in Natural Language Processing), 2018, pp. 2369-2380, DOI 10.18653/v1/D18-1259.