Authors:
(1) Zhan Ling, UC San Diego and equal contribution;
(2) Yunhao Fang, UC San Diego and equal contribution;
(3) Xuanlin Li, UC San Diego;
(4) Zhiao Huang, UC San Diego;
(5) Mingu Lee, Qualcomm AI Research;
(6) Roland Memisevic, Qualcomm AI Research;
(7) Hao Su, UC San Diego.
Table of Links
Motivation and Problem Formulation
Deductively Verifiable Chain-of-Thought Reasoning
Conclusion, Acknowledgements and References
A Deductive Verification with Vicuna Models
C More Details on Answer Extraction
E More Deductive Verification Examples
D Prompts
D.1 Prompt for Direct Reasoning Chain Verification Without Natural Program Format
For the results in Tab. 2 of the main paper, we use “Do you think the above reasoning process is correct? Let’s think step by step.” as the zero-shot prompt to verify an entire reasoning chain at once. We also design a two-shot prompt for reasoning chain verification, shown in Tab. 12, which covers one correct and one incorrect reasoning chain.
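As a rough illustration of the zero-shot setting, the sketch below appends the verification question to a problem and its reasoning chain. Only the question text comes from this section; the function name, the argument names, and the blank-line layout between the parts are assumptions made for the example, not the exact formatting used in the experiments.

```python
# Zero-shot verification question quoted in Sec. D.1.
ZERO_SHOT_SUFFIX = (
    "Do you think the above reasoning process is correct? "
    "Let's think step by step."
)


def build_whole_chain_prompt(question: str, reasoning_chain: str) -> str:
    """Assemble a prompt that asks a model to verify a whole chain at once.

    The ordering (question, chain, verification question) and the blank-line
    separators are assumptions for illustration.
    """
    parts = [question.strip(), reasoning_chain.strip(), ZERO_SHOT_SUFFIX]
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Hypothetical toy problem, used only to show the assembled prompt.
    print(build_whole_chain_prompt(
        "Tom has 3 apples and buys 2 more. How many apples does he have?",
        "Tom starts with 3 apples. 3 + 2 = 5. The answer is 5.",
    ))
```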
D.2 Prompts for Reasoning Chain Generation in the Natural Program Format
To instruct models to generate reasoning chains in the Natural Program format that facilitates step-by-step deductive verification, we have designed four distinct prompts to address different types of problems. These include:
- Math word problems, as illustrated in Tab. 13, covering GSM8K, MATH, and AddSub datasets.
- Math word problems with multiple-choice options, illustrated in Tab. 14, covering the AQuA dataset.
- Date-related problems, illustrated in Tab. 15, covering the Date dataset.
- Last Letters problems, illustrated in Tab. 16, covering the Last Letters dataset.
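The dataset-to-prompt assignment above can be summarized as a small lookup. This is a minimal sketch: the dictionary and function names are hypothetical, and the values are only the table labels from this appendix, not the prompt texts themselves.

```python
# Which appendix table holds the Natural Program generation prompt
# for each dataset, as listed in Sec. D.2.
PROMPT_TABLE_BY_DATASET = {
    "GSM8K": "Tab. 13",
    "MATH": "Tab. 13",
    "AddSub": "Tab. 13",
    "AQuA": "Tab. 14",
    "Date": "Tab. 15",
    "Last Letters": "Tab. 16",
}


def prompt_table_for(dataset: str) -> str:
    """Return the table label of the generation prompt for a dataset."""
    try:
        return PROMPT_TABLE_BY_DATASET[dataset]
    except KeyError:
        raise ValueError(
            f"no Natural Program prompt listed for dataset {dataset!r}"
        ) from None
```

Six datasets share four prompts because the three plain math word problem datasets use the same one.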
D.3 Prompt for Deductive Verification Following Natural Program Format and Step-by-Step Decomposition
We have designed a general one-shot prompt for the deductive verification of a single reasoning step on different datasets, as shown in Tab. 17. This prompt serves to instruct language models to generate the deductive validity of each reasoning step as illustrated in Sec. 4.2 and the top-right box of Fig. 1 of the main paper.
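A minimal sketch of how a single one-shot prompt might be reused across reasoning steps is given below. The exemplar text of Tab. 17 is not reproduced here, so `one_shot_exemplar` is a placeholder argument; `ask_model` is a hypothetical callable standing in for a language model call that returns a validity verdict, and the way the exemplar and the step are joined is an assumption.

```python
from typing import Callable, List


def build_step_prompt(one_shot_exemplar: str, step: str) -> str:
    """Prefix one reasoning step with the shared one-shot exemplar.

    The blank-line separator is an assumption for illustration.
    """
    return f"{one_shot_exemplar.strip()}\n\n{step.strip()}"


def verify_each_step(
    steps: List[str],
    one_shot_exemplar: str,
    ask_model: Callable[[str], bool],
) -> List[bool]:
    """Query the (hypothetical) model once per step; return one verdict per step."""
    return [ask_model(build_step_prompt(one_shot_exemplar, s)) for s in steps]
```

Each step is checked in its own query, so the same exemplar can serve every dataset while the step text varies.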
This paper is available on arXiv under the CC BY 4.0 DEED license.