This is an obvious-in-hindsight idea that should have been implemented long ago. It parallels the "sentence-numbering trick" used in the Relevance Extractor.
Currently, `DocChatAgent.answer_from_docs(query, passages)` (where `passages` are already relevant extracts from chunks, pulled using the LLM) sends this prompt to the LLM:
```
Answer the QUERY based on the PASSAGES, and append CITE SOURCES you have used,
showing for each source, the SOURCE and EXTRACTS, where EXTRACTS should at most
contain the first 3 and last 3 words of each extract.

PASSAGES:
{passages}

QUERY:
{query}
```
This results in an LLM response that looks like:
```
In the year 2050, GPT10 was released. Additionally, all countries merged into Lithuania.

SOURCE: wikipedia
EXTRACTS: In the year ... GPT10 was released.

SOURCE: almanac
EXTRACTS: In the year ... merged into Lithuania.

SOURCE: world history, 2070 edition
EXTRACTS: All countries had ... back in 2050
```
There are many issues with this:
- Having the LLM generate (even partial) extracts is wasteful (token cost), slow, and yields incomplete extracts, since we try to save tokens by generating only the first/last few words of each one.
- When the response is long, several references may be involved, but the above scheme lumps them all at the end rather than attaching granular references to the specific parts of the response they support, so we can't tell which parts of the response came from which reference.
This can be much improved by instead doing this:
- number the passages sent in the prompt: `[1]`, `[2]`, etc.
- ask the LLM to cite sources using markdown footnote notation, e.g. `[^1][^3]`
- have the code extract the full, detailed cited texts and display them (again in markdown footnote syntax) after the LLM generates its answer; a rough sketch follows below
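As a minimal sketch of the prompt-building side (the function name and prompt wording here are illustrative assumptions, not actual DocChatAgent code):

```python
# Hypothetical sketch: number the passages and request footnote-style citations.
# The function name and prompt wording are assumptions, not actual Langroid code.
def build_numbered_prompt(query: str, passages: list[str]) -> str:
    numbered = "\n\n".join(
        f"[{i}] {p}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the QUERY based on the numbered PASSAGES below.\n"
        "Cite the passages you used with markdown footnote notation,\n"
        "e.g. [^1] or [^2][^5], placed right after the relevant statement.\n"
        "Do NOT quote the passage text itself.\n\n"
        f"PASSAGES:\n{numbered}\n\n"
        f"QUERY:\n{query}\n"
    )
```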
So the idea is to just have the LLM generate granular, numerical citations, and let the code extract the detailed source text (so we don't spend LLM tokens on it).
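The extraction side might look roughly like this (again a sketch, assuming each passage has an associated source name, which in practice would come from the chunk metadata):

```python
import re

# Hypothetical sketch: pull the [^i] markers out of the LLM's answer and
# append the full text of only the cited passages. `sources[i-1]` is assumed
# to hold the source name of passage i (chunk metadata in practice).
def append_citations(answer: str, passages: list[str], sources: list[str]) -> str:
    cited = sorted({int(n) for n in re.findall(r"\[\^(\d+)\]", answer)})
    footnotes = "\n\n".join(
        f"[^{i}] {sources[i - 1]}\n{passages[i - 1]}"
        for i in cited
        if 1 <= i <= len(passages)
    )
    return f"{answer}\n\nSOURCES:\n\n{footnotes}" if footnotes else answer
```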
This will result in a response that is much more like a standard footnote or reference format:
```
In the year 2050, GPT10 was released [^1]. Additionally, all countries merged into Lithuania [^2][^5].

SOURCES:

[^1] wikipedia
In the year 2050, GPT10 was released.

[^2] almanac
In the year 2050, all countries merged into Lithuania.

[^5] world history, 2070 edition
All countries had already become part of Lithuania, back in 2050
```
Note the granular citations. Also, unlike the existing approach, the citations are detailed, not mere snippets, and are not generated by the LLM: the code looks them up from the LLM's numerical citations.
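Putting the two sketches above together on the toy example (with the LLM call stubbed out, since how it is invoked is outside this sketch):

```python
passages = [
    "In the year 2050, GPT10 was released.",
    "In the year 2050, all countries merged into Lithuania.",
]
sources = ["wikipedia", "almanac"]

prompt = build_numbered_prompt("What happened in 2050?", passages)
# answer = llm.generate(prompt)  # however the LLM is actually invoked
answer = "GPT10 was released [^1], and all countries merged into Lithuania [^2]."
print(append_citations(answer, passages, sources))
```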