Server allow /completion and /embedding · Issue #3815 · ggml-org/llama.cpp · GitHub
Server allow /completion and /embedding #3815
Closed
@christianwengert

Description


Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Feature Description

When I start the server as follows:

./server -m wizardlm-70b-v1.0.q4_K_S.gguf --threads 8 -ngl 100  -c 4096 --embedding

and make a request to /embedding

curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{"content": "Building a website can be done in 10 simple steps:"}'

I get back, as expected, the vector of embeddings. Now if I make a request to /completion as follows:

curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'

I'd expect the normal completion to still work, but all I get back is the embedding of the prompt (I tested it with the examples above, and the same vector is returned in both cases).
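For reference, the two requests above can also be reproduced programmatically. A minimal sketch in Python using only the standard library, assuming the server is running on localhost:8080 with --embedding as shown above (the response-shape check is a heuristic, not part of the documented API):

```python
import json
import urllib.request

BASE = "http://localhost:8080"  # server started with --embedding as above


def post(path, payload):
    """POST a JSON payload to the llama.cpp server and return the parsed JSON response."""
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def looks_like_embedding(response):
    """Heuristic: an embedding response carries a numeric vector, not generated text."""
    vec = response.get("embedding")
    return isinstance(vec, list) and all(isinstance(x, (int, float)) for x in vec)


# Usage (with a running server):
# emb = post("/embedding", {"content": "Building a website can be done in 10 simple steps:"})
# comp = post("/completion", {"prompt": "Building a website can be done in 10 simple steps:",
#                             "n_predict": 128})
# The reported bug: looks_like_embedding(comp) is True even though a text completion was expected.
```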

Motivation

I guess having both normal completion and the possibility of just getting embeddings makes sense in a lot of applications with the server.

Metadata

Labels

enhancement (New feature or request)