webui : handle PDF input (as text or image) + convert pasted long content to file #13562
Very cool!
const pdf = await pdfjs.getDocument(buffer).promise;
const numPages = pdf.numPages;
const textContentPromises: Promise<TextContent>[] = [];
for (let i = 1; i <= numPages; i++) {
Just confirming: does pdfjs really number the pages from 1 instead of 0?
Yes, it counts from 1. Quite strange, but every tutorial online does the same.
For example, on stackoverflow: https://stackoverflow.com/questions/16480469/how-to-display-whole-pdf-not-only-one-page-with-pdf-js
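
For reference, here is a minimal sketch of the 1-based loop from the diff above. It assumes only the members the diff already touches (`numPages`, `getPage`) plus `getTextContent()`; the `*Like` interfaces and the helper names `pageNumbers` and `extractPageTexts` are hypothetical stand-ins for illustration, not pdfjs-dist's real type definitions or this PR's code.

```typescript
// Hypothetical structural types covering only the members used below.
// They stand in for the pdfjs-dist types and are NOT the library's definitions.
interface TextItemLike { str?: string }
interface TextContentLike { items: TextItemLike[] }
interface PageLike { getTextContent(): Promise<TextContentLike> }
interface PdfLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>;
}

// Page numbers are 1-based: the valid range is 1..numPages inclusive.
function pageNumbers(numPages: number): number[] {
  return Array.from({ length: numPages }, (_, i) => i + 1);
}

// Fetch the text content of every page in parallel, then join each page's
// text items into one string per page (items without `str` become "").
async function extractPageTexts(pdf: PdfLike): Promise<string[]> {
  const contents = await Promise.all(
    pageNumbers(pdf.numPages).map(async (n) => {
      const page = await pdf.getPage(n);
      return page.getTextContent();
    })
  );
  return contents.map((c) => c.items.map((item) => item.str ?? "").join(" "));
}
```

With a real document, this would presumably be called as `extractPageTexts(await pdfjs.getDocument(buffer).promise)`, mirroring the first line of the diff.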

Some notes about this feature (and general UX commentary):

In recent builds of the llama.cpp webui, it is uncertain whether your first user message actually gets sent. There is a bug where the message sometimes disappears the instant you send it; it is lost, and the model hallucinates a response because the conversation was sent without a user message. This happens often enough that I usually draft the first message of a conversation in a separate text editor and paste it in, or I edit inside the web UI and copy it out to another editor in case it gets lost.

So this feature was a little jarring: pasting suddenly stopped inserting the text, which meant I could no longer edit it in the UI directly or see what I had pasted. I did not find the setting myself, so it wasn't until I found this PR that I was able to disable it. It would be better UX if the pasted content item itself offered an option to insert the text inline, or even to disable the feature. Also, the UI now requires you to type some extra text; you cannot send a message containing only the pasted content. And when the feature is enabled, you cannot paste while the model is generating, which is strange.

Of course, I am grateful that I can disable this completely in the settings, but it caused some initial frustration.
It's very unclear from what you said. Does it disappear entirely from the view? What is the "uncertainty" that you are talking about (i.e. is the reason unknown)? In any case, a video recording of what happened would be much better.
I'd love to see https://github.com/docling-project/docling and this webui integrated, though I don't know how feasible that is. Would any docling folks be interested in this? @dolfim-ibm @vagenas
It would definitely be a nice integration. Let’s follow up on it.
Yes, I agree... I'll see if I can capture a video of it. But here's what happens:
And what happens is that the prompt I just entered never appears as a user message. All I see is the assistant's pending-message animation, and the response is a hallucination (exactly what happens depends on the model, but usually it responds to the system message). So the prompt is lost. I mention uncertainty because this behaviour does not always happen, and it only happens on the first user message. I don't know exactly how to reproduce it yet, but I'll see if I can. Sometimes it happens several times in a row (when I keep clicking "New conversation").

Edit: I'm not sure if this is specifically what causes it, but clicking "New conversation" several times seems to make it more likely to occur. I'll post a video of this shortly in a new issue. See #13622
pdfjs-dist is an Apache-2.0 licensed package, so it requires a NOTICE file with the license in this repo and in the distributed package.
Supersedes #11647
On second thought, I think PDF parsing should be built-in functionality because of its popularity.
This PR adds the ability to: