------------------------- METAREVIEW ------------------------
RECOMMENDATION: accept
TEXT: The authors introduce an analysis of large language models (LLMs) for conversational recommendation tasks. They construct a new dataset by scraping Reddit and demonstrate that LLMs outperform existing fine-tuned conversational recommendation models without fine-tuning. They also present a number of important findings about LLM-based CRS. Overall, this work presents a good contribution to CIKM this year, and the authors can incorporate the reviewers' comments to improve the quality of this submission.

----------------------- REVIEW 1 ---------------------
SUBMISSION: 1929
TITLE: Large Language Models as Zero-Shot Conversational Recommenders
AUTHORS: Zhankui He, Zhouhang Xie, Rahul Jha, Harald Steck, Dawen Liang, Yesu Feng, Bodhisattwa Majumder, Nathan Kallus and Julian Mcauley

----------- Strengths -----------
- A newly created dataset for conversational recommendation
- An analysis of LLMs for conversational recommendation

----------- Weaknesses -----------
- The paper only provides quantitative analyses of LLMs for conversational recommendation. A more important question is how LLMs work for conversational recommendation.
- The paper only investigates the behavior of LLMs with zero-shot learning.
- The paper lacks investigation of other prompting methods, e.g., chain-of-thought (CoT).

----------- Overall evaluation -----------
SCORE: 1 (weak accept)
----- TEXT:
The paper introduces an analysis of LLMs for conversational recommendation. To do so, the authors first create a new conversational recommendation dataset and then analyze the behavior of LLMs in conversational recommendation scenarios.
- In Finding 3, LLMs may generate out-of-dataset items. Removing generated out-of-dataset items from LLMs is not entirely fair for comparison. The authors could provide results for both settings in Figure 4: with and without removing out-of-dataset items.
- Some findings are quite obvious, e.g., Findings 4 and 7.
- The analysis should cover important issues of LLMs such as hallucination, trust, and correctness.

----------- Best paper -----------
SELECTION: no

----------------------- REVIEW 2 ---------------------
SUBMISSION: 1929
TITLE: Large Language Models as Zero-Shot Conversational Recommenders
AUTHORS: Zhankui He, Zhouhang Xie, Rahul Jha, Harald Steck, Dawen Liang, Yesu Feng, Bodhisattwa Majumder, Nathan Kallus and Julian Mcauley

----------- Strengths -----------
The paper presents a novel way of using LLMs as zero-shot conversational recommenders, adding to the understanding of the capabilities of LLMs in a relatively unexplored area. The creation of the new dataset, Reddit-Movie, is an important contribution to the field, and it helps validate the authors' claims on a large scale. The detailed analysis of the LLMs' recommendation mechanisms gives valuable insights into the functioning of these models. The identification of evaluation problems in current CRS models and the presentation of a solution (removal of repeated items) is another strength.

----------- Weaknesses -----------
While the Reddit-Movie dataset is an interesting contribution, it is scraped from a single platform (Reddit), which may limit the diversity of the data. For instance, it may not represent how people converse on other platforms or in other cultural contexts. The authors mention that LLMs suffer from limitations such as popularity bias and sensitivity to geographical regions, but the paper does not propose any concrete solutions to these issues. The paper emphasizes the role of LLMs' content/context knowledge in their superior performance, but it does not fully explore how to enhance the collaborative knowledge of these models, which is an important aspect of recommender systems.
The study is largely empirical, and it would be beneficial to have more theoretical insights or models that explain the observed behaviors of the LLMs in the CRS setting.

----------- Overall evaluation -----------
SCORE: 1 (weak accept)
----- TEXT:
The paper presents a comprehensive empirical study of the performance of large language models (LLMs) in conversational recommendation tasks. The authors make three main contributions. First, they constructed a new dataset, Reddit-Movie, for conversational recommendation by scraping Reddit, creating the largest public real-world conversational recommendation dataset to date. Second, in terms of evaluation, the authors discovered that LLMs outperform existing fine-tuned conversational recommendation models, even without fine-tuning, across the new dataset and two existing conversational recommendation datasets. They also observed a repeated-item shortcut in current conversational recommender system (CRS) evaluation protocols, leading to spurious conclusions about current CRS recommendation abilities. Third, the authors performed various probing tasks to investigate the reasons behind the impressive performance of LLMs in conversational recommendation, finding that LLMs mainly utilize their superior content/context knowledge, rather than their collaborative knowledge, to make recommendations. The authors conclude that while LLMs are effective for CRS tasks, there remain challenges in evaluation and datasets, as well as potential issues such as debiasing, for future CRS design with LLMs.

----------- Best paper -----------
SELECTION: no

----------------------- REVIEW 3 ---------------------
SUBMISSION: 1929
TITLE: Large Language Models as Zero-Shot Conversational Recommenders
AUTHORS: Zhankui He, Zhouhang Xie, Rahul Jha, Harald Steck, Dawen Liang, Yesu Feng, Bodhisattwa Majumder, Nathan Kallus and Julian Mcauley

----------- Strengths -----------
1. The paper answers an important question about the effectiveness of zero-shot LLMs for conversational recommendation.
2. The methodology used is sound and clearly explained.
3. The authors also release a new, larger conversational dataset for movie recommendations based on Reddit discussions rather than crowdsourced workers.

----------- Weaknesses -----------
1. It would be interesting to see if LLMs could be made to perform better collaborative-filtering-like recommendations by explicitly prompting them to do so.
2. Reddit discussions from movie-related subreddits tend not to reflect how most people talk about movies. The dataset introduced in the paper is useful for recommendation evaluation but not very useful for training conversational models.

----------- Overall evaluation -----------
SCORE: 2 (accept)
----- TEXT:
The paper presents a thorough analysis of the zero-shot conversational recommendation capabilities of LLMs. The authors first show that zero-shot LLMs outperform state-of-the-art fine-tuned models for CRS. They then investigate the types of input that LLMs can utilize more effectively than previous models. The authors also point out a limitation in previous methods' evaluation approach: the test items to be predicted were not always disjoint from the items that had already appeared in the conversation.
Strengths: The paper answers an important question about the effectiveness of zero-shot LLMs for conversational recommendation. The methodology used is sound and clearly explained. The authors also release a new, larger conversational dataset for movie recommendations based on Reddit discussions rather than crowdsourced workers.
Suggestions: It would be interesting to see if LLMs could be made to perform better collaborative-filtering-like recommendations by explicitly prompting them to do so.

----------- Best paper -----------
SELECTION: no