Code: https://github.com/RUCBM/DeepCritic
Authors: Wenkai Yang, Jingwen Chen, Yankai Lin, Ji-Rong Wen

Comments
SmartDazi: Kai is my God in AI.
ma-xu: Big congrats!
Keven16 (reply): Thanks, Xu~
giustino (reply): I didn't understand.
yjh415: I made an audio overview so I can listen to the paper during my commute: https://youtu.be/1yQ3iFanmLs?si=lOSp4pbFPzv7Ug3Z
I hope sharing it here doesn't violate community policy :D

DeepCritic: Deliberate Critique with Large Language Models
Published on May 1, 2025
#1 Paper of the day
Abstract
A novel two-stage framework using Qwen2.5-72B-Instruct enhances LLMs' math critique ability by generating detailed step-wise critiques and applying reinforcement learning, resulting in better error identification and refinement.
As Large Language Models (LLMs) are rapidly evolving, providing accurate
feedback and scalable oversight on their outputs becomes an urgent and critical
problem. Leveraging LLMs as critique models to achieve automated supervision is
a promising solution. In this work, we focus on studying and enhancing the math
critique ability of LLMs. Current LLM critics critique each step too
superficially, which leads to low judgment accuracy and gives the LLM
generator too little feedback to correct its mistakes. To tackle this issue,
we propose a two-stage framework for developing LLM critics that deliberately
critique each reasoning step of a math solution. In the first stage, we use
Qwen2.5-72B-Instruct to generate 4.5K long-form critiques as seed data for
supervised fine-tuning. Each seed critique consists of deliberate step-wise
critiques that include multi-perspective verifications as well as in-depth
critiques of the initial critiques for each reasoning step. Then, we perform
reinforcement learning on the fine-tuned model with either existing
human-labeled data from PRM800K or our automatically annotated data obtained
via Monte Carlo sampling-based correctness estimation, to further incentivize
its critique ability. Our developed critique model built on Qwen2.5-7B-Instruct
not only significantly outperforms existing LLM critics (including the
same-sized DeepSeek-R1-distill models and GPT-4o) on various error
identification benchmarks, but also more effectively helps the LLM generator
refine erroneous steps through more detailed feedback.
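
The abstract does not spell out how the Monte Carlo sampling-based correctness estimation works. The sketch below shows one common way such step labels can be produced, assumed here rather than taken from the paper: for each solution prefix, sample several completions and record the fraction that reach the reference answer, then treat the first step whose estimate falls to a threshold as the erroneous one. The names `mc_step_correctness`, `first_error_step`, `toy_rollout`, the `num_rollouts` parameter, and the zero threshold are all hypothetical.

```python
from typing import Callable, List, Optional


def mc_step_correctness(
    steps: List[str],
    rollout: Callable[[List[str]], str],
    reference_answer: str,
    num_rollouts: int = 8,
) -> List[float]:
    """For each prefix steps[:i+1], estimate how often a sampled
    completion of that prefix ends at the reference answer."""
    scores = []
    for i in range(len(steps)):
        prefix = steps[: i + 1]
        hits = sum(
            rollout(prefix) == reference_answer for _ in range(num_rollouts)
        )
        scores.append(hits / num_rollouts)
    return scores


def first_error_step(scores: List[float], threshold: float = 0.0) -> Optional[int]:
    """Return the 0-based index of the first step whose estimate is at or
    below the threshold, or None if every step stays above it."""
    for i, score in enumerate(scores):
        if score <= threshold:
            return i
    return None


def toy_rollout(prefix: List[str]) -> str:
    # Hypothetical stand-in for an LLM generator: once a flawed step is in
    # the prefix, every completion ends at the wrong answer.
    return "0" if any("WRONG" in step for step in prefix) else "42"


steps = ["step 1 ok", "step 2 ok", "step 3 WRONG", "step 4 follows"]
scores = mc_step_correctness(steps, toy_rollout, "42", num_rollouts=4)
error_index = first_error_step(scores)
```

In practice `rollout` would call a sampling LLM generator, so the estimates would be noisy and the choice of `num_rollouts` and threshold would trade labeling cost against label quality.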