Visual Question Answering (VQA) is the task of answering open-ended questions based on an image. The input to models supporting this task is typically a combination of an image and a question, and the output is an answer expressed in natural language.
Some noteworthy use case examples for VQA include:
- Accessibility applications for visually impaired individuals.
- Education: posing questions about visual materials presented in lectures or textbooks. VQA can also be utilized in interactive museum exhibits or historical sites.
- Customer service and e-commerce: VQA can enhance user experience by letting users ask questions about products.
- Image retrieval: VQA models can be used to retrieve images with specific characteristics. For example, the user can ask “Is there a dog?” to find all images with dogs from a set of images.
The VisualQnA example is implemented using the component-level microservices defined in GenAIComps. The flow chart below shows the information flow between different microservices for this example.
---
config:
flowchart:
nodeSpacing: 400
rankSpacing: 100
curve: linear
themeVariables:
fontSize: 50px
---
flowchart LR
%% Colors %%
classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef invisible fill:transparent,stroke:transparent;
style VisualQnA-MegaService stroke:#000000
%% Subgraphs %%
subgraph VisualQnA-MegaService["VisualQnA MegaService "]
direction LR
LVM([LVM MicroService]):::blue
end
subgraph UserInterface[" User Interface "]
direction LR
a([User Input Query]):::orchid
Ingest([Ingest data]):::orchid
UI([UI server<br>]):::orchid
end
LVM_gen{{LVM Service <br>}}
GW([VisualQnA GateWay<br>]):::orange
NG([Nginx MicroService]):::blue
%% Questions interaction
direction LR
Ingest[Ingest data] --> UI
a[User Input Query] --> |Need Proxy Server|NG
a[User Input Query] --> UI
NG --> UI
UI --> GW
GW <==> VisualQnA-MegaService
%% Embedding service flow
direction LR
LVM <-.-> LVM_gen
This example guides you through how to deploy a LLaVA-NeXT (Open Large Multimodal Models) model on Intel Gaudi2, Intel Xeon Scalable Processors and AMD EPYC™ Processors. We invite contributions from other hardware vendors to expand the OPEA ecosystem.
The VisualQnA service can be effortlessly deployed on Intel Gaudi2 or Intel Xeon Scalable Processors and AMD EPYC™ Processors.
The table below lists currently available deployment options. They outline in detail the implementation of this example on selected hardware.
| Category | Deployment Option | Description |
|---|---|---|
| On-premise Deployments | Docker compose | VisualQnA deployment on Xeon |
| VisualQnA deployment on Gaudi | ||
| VisualQnA deployment on AMD EPYC | ||
| VisualQnA deployment on AMD ROCm | ||
| Kubernetes | Helm Charts | |
| GMC |
| Deploy Method | LLM Engine | LLM Model | Hardware |
|---|---|---|---|
| Docker Compose | TGI, vLLM | llava-hf/llava-v1.6-mistral-7b-hf | Intel Xeon |
| Docker Compose | TGI, vLLM | llava-hf/llava-1.5-7b-hf | Intel Gaudi |
| Docker Compose | TGI, vLLM | llava-hf/llava-v1.6-mistral-7b-hf | AMD EPYC |
| Docker Compose | TGI, vLLM | Xkev/Llama-3.2V-11B-cot | AMD ROCm |
| Helm Charts | TGI, vLLM | llava-hf/llava-v1.6-mistral-7b-hf | Intel Gaudi |
| Helm Charts | TGI, vLLM | llava-hf/llava-v1.6-mistral-7b-hf | Intel Xeon |


