webui : handle PDF input (as text or image) + convert pasted long content to file #13562

ngxson · 2025-05-15T10:32:47Z

Supersedes #11647

On second thought, I think PDF parsing should be built-in functionality because of how popular the format is.

This PR adds the ability to:

Upload a PDF to the web UI; it will be parsed as either image or text (configurable via Settings)
Paste long content and have it converted into a file (matching the UX of Claude)
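
For readers skimming the thread, the paste behaviour can be sketched roughly as below. This is only an illustration, not the code from this PR: the `PastedFile` shape, the `handlePaste` name, the `pasted.txt` file name, and the convention that a threshold of 0 turns the feature off are all assumptions made for the example.

```typescript
// Hypothetical attachment shape; not taken from the PR.
interface PastedFile {
  name: string;
  content: string;
}

// Either the text stays in the input box, or it becomes a file attachment.
type PasteResult =
  | { kind: "inline"; text: string }
  | { kind: "file"; file: PastedFile };

// `threshold` is an assumed setting (in characters); 0 disables the conversion.
function handlePaste(text: string, threshold: number): PasteResult {
  if (threshold > 0 && text.length > threshold) {
    // Long paste: wrap it in a file instead of inserting it into the prompt.
    return { kind: "file", file: { name: "pasted.txt", content: text } };
  }
  // Short paste, or feature disabled: keep the text editable in the input box.
  return { kind: "inline", text };
}
```

Keeping the decision in one pure function like this would make it easy to expose the threshold in Settings and to switch the conversion off entirely.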

ggerganov

Very cool!

ggerganov · 2025-05-15T11:29:13Z

tools/server/webui/src/components/useChatExtraContext.tsx

+  const pdf = await pdfjs.getDocument(buffer).promise;
+  const numPages = pdf.numPages;
+  const textContentPromises: Promise<TextContent>[] = [];
+  for (let i = 1; i <= numPages; i++) {


Just confirming that the pdfjs really numbers the pages from 1 instead of 0?

Yes, it counts from 1. Quite strange, but every tutorial online does the same.

For example, on stackoverflow: https://stackoverflow.com/questions/16480469/how-to-display-whole-pdf-not-only-one-page-with-pdf-js
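
A minimal sketch of the 1-based page loop discussed above, for anyone who wants to try it in isolation. The `*Like` interfaces are stand-ins declared here so the snippet compiles without pdfjs-dist installed; they only mirror the members used (`numPages`, `getPage`, `getTextContent`, and `str` on text items). The `extractText` name and the choice to join items with a space are assumptions, not what the PR does.

```typescript
// Structural stand-ins for the parts of pdfjs-dist used in this sketch.
interface TextItemLike {
  str?: string; // marked-content items carry no `str`
}
interface TextContentLike {
  items: TextItemLike[];
}
interface PageLike {
  getTextContent(): Promise<TextContentLike>;
}
interface DocumentLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>; // page numbers start at 1
}

// Returns one string per page, in page order.
async function extractText(pdf: DocumentLike): Promise<string[]> {
  const pending: Promise<TextContentLike>[] = [];
  // Pages are numbered 1..numPages, so the loop is inclusive at both ends.
  for (let i = 1; i <= pdf.numPages; i++) {
    pending.push(pdf.getPage(i).then((page) => page.getTextContent()));
  }
  const contents = await Promise.all(pending);
  return contents.map((c) => c.items.map((item) => item.str ?? "").join(" "));
}
```

With the real library, the object resolved from `pdfjs.getDocument(buffer).promise` should satisfy `DocumentLike`, so it could be passed in directly.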

mashdragon · 2025-05-18T03:58:07Z

Some notes about this feature (and general UX commentary):

In recent builds of the llama.cpp webui, there is some uncertainty as to whether your first user prompt message actually gets sent. There is a bug where sometimes your message instantly disappears when you try sending it, and it's lost, and the model hallucinates something because you've suddenly sent a conversation without a user message.

This happens often enough that I often draft my first message in a conversation in a separate text editor and paste it in. Or, I edit inside the web UI and then copy it out to another text editor in case it gets lost. So this feature was a little jarring to me when suddenly it stopped pasting the text and that meant I could no longer edit in the UI directly or see what I pasted. For some reason I did not locate the setting myself so it wasn't until I found this PR that I was able to disable it. It would be better UX if there were some option to insert the pasted text in or even disable the pasted text feature directly from the pasted content item. Also the UI requires you to type some extra text now and you cannot just send a message with the pasted content only.

Of course I am grateful I can disable this completely in the settings, but it caused some initial frustration.

Also, if the paste feature is enabled, you cannot paste while the model is generating, which is strange.

ngxson · 2025-05-18T07:16:54Z

There is a bug where sometimes your message instantly disappears when you try sending it

It's very unclear from what you said. Does it disappear entirely from the view? What is the "uncertainty" that you are talking about (i.e. is the reason unknown)? In any case, a video recording of what happened would be much better.

ericcurtin · 2025-05-18T11:12:03Z

I'd love to see https://github.com/docling-project/docling and this webui integrated together, I don't know how possible that is though. Would any docling folks be interested in this @dolfim-ibm @vagenas ?

dolfim-ibm · 2025-05-18T12:04:40Z

I'd love to see https://github.com/docling-project/docling and this webui integrated together, I don't know how possible that is though. Would any docling folks be interested in this @dolfim-ibm @vagenas ?

It would definitely be a nice integration. Let's follow up on it.

mashdragon · 2025-05-18T19:08:24Z

@ngxson

Does it disappear entirely from the view? What is the "uncertainty" that you are talking about (i.e. is the reason unknown)? In any case, a video recording of what happened would be much better.

Yes, I agree... I'll see if I can capture a video of it. But here's what happens:

I open a new chat
Write a prompt (often, paste something in and add some surrounding instruction text)
Press Enter to submit

And what happens is that the prompt I just entered never appears as a user message. All I see is the assistant message pending animation, and the response is a hallucination (depends on the model exactly what happens, but usually it responds to the system message). So, the prompt is lost.

I mention uncertainty because this behaviour does not always happen, and it only happens on the first user message. I don't know exactly how to reproduce it yet, but I'll see if I can. Sometimes it happens several times in a row (when I keep clicking new conversation).

Edit: I'm not sure if this is specifically causing it, but clicking "New conversation" several times seems to make it more likely to occur... I'll post a video of this shortly in a new issue. See #13622

prabhu · 2025-05-20T14:31:00Z

pdfjs-dist is an Apache-2.0 licensed package, so requires a NOTICE file in this repo and distributed package with the license.

ngxson added 2 commits May 15, 2025 12:23

webui : handle PDF input (as text or image)

71ac85b

handle the case where pdf image + server without mtmd

f8ed8dc

ngxson requested a review from ggerganov May 15, 2025 10:32

fix bug missing pages

8e92b20

github-actions bot added examples server labels May 15, 2025

ngxson mentioned this pull request May 15, 2025

server : (webui) add support for .pdf file upload #11647

Closed

ngxson requested a review from slaren May 15, 2025 10:38

ngxson changed the title ~~webui : handle PDF input (as text or image)~~ webui : handle PDF input (as text or image) + convert pasted long content to file May 15, 2025

ggerganov approved these changes May 15, 2025


slaren approved these changes May 15, 2025


ngxson merged commit 3cc1f1f into ggml-org:master May 15, 2025
6 checks passed

xunjieliu mentioned this pull request May 16, 2025

Reddit News Daily 2025-05-16 xunjieliu/reddit-daily-news#76

Open
