VisualQnA

Visual Question and Answering

Visual Question Answering (VQA) is the task of answering open-ended questions based on an image. The input to models supporting this task is typically a combination of an image and a question, and the output is an answer expressed in natural language.

Some noteworthy use case examples for VQA include:

Accessibility applications for visually impaired individuals.
Education: posing questions about visual materials presented in lectures or textbooks. VQA can also be utilized in interactive museum exhibits or historical sites.
Customer service and e-commerce: VQA can enhance user experience by letting users ask questions about products.
Image retrieval: VQA models can be used to retrieve images with specific characteristics. For example, the user can ask “Is there a dog?” to find all images with dogs from a set of images.

Architecture

The VisualQnA example is implemented using the component-level microservices defined in GenAIComps. The flow chart below shows the information flow between different microservices for this example.

---
config:
  flowchart:
    nodeSpacing: 400
    rankSpacing: 100
    curve: linear
  themeVariables:
    fontSize: 50px
---
flowchart LR
    %% Colors %%
    classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef invisible fill:transparent,stroke:transparent;
    style VisualQnA-MegaService stroke:#000000

    %% Subgraphs %%
    subgraph VisualQnA-MegaService["VisualQnA MegaService "]
        direction LR
        LVM([LVM MicroService]):::blue
    end
    subgraph UserInterface[" User Interface "]
        direction LR
        a([User Input Query]):::orchid
        Ingest([Ingest data]):::orchid
        UI([UI server<br>]):::orchid
    end


    LVM_gen{{LVM Service <br>}}
    GW([VisualQnA GateWay<br>]):::orange
    NG([Nginx MicroService]):::blue


    %% Questions interaction
    direction LR
    Ingest[Ingest data] --> UI
    a[User Input Query] --> |Need Proxy Server|NG
    a[User Input Query] --> UI
    NG --> UI
    UI --> GW
    GW <==> VisualQnA-MegaService


    %% Embedding service flow
    direction LR
    LVM <-.-> LVM_gen

This example guides you through how to deploy a LLaVA-NeXT (Open Large Multimodal Models) model on Intel Gaudi2, Intel Xeon Scalable Processors and AMD EPYC™ Processors. We invite contributions from other hardware vendors to expand the OPEA ecosystem.

Deployment Options

The VisualQnA service can be effortlessly deployed on Intel Gaudi2 or Intel Xeon Scalable Processors and AMD EPYC™ Processors.

The table below lists currently available deployment options. They outline in detail the implementation of this example on selected hardware.

Category	Deployment Option	Description
On-premise Deployments	Docker compose	VisualQnA deployment on Xeon
		VisualQnA deployment on Gaudi
		VisualQnA deployment on AMD EPYC
		VisualQnA deployment on AMD ROCm
	Kubernetes	Helm Charts
		GMC

Validated Configurations

Deploy Method	LLM Engine	LLM Model	Hardware
Docker Compose	TGI, vLLM	llava-hf/llava-v1.6-mistral-7b-hf	Intel Xeon
Docker Compose	TGI, vLLM	llava-hf/llava-1.5-7b-hf	Intel Gaudi
Docker Compose	TGI, vLLM	llava-hf/llava-v1.6-mistral-7b-hf	AMD EPYC
Docker Compose	TGI, vLLM	Xkev/Llama-3.2V-11B-cot	AMD ROCm
Helm Charts	TGI, vLLM	llava-hf/llava-v1.6-mistral-7b-hf	Intel Gaudi
Helm Charts	TGI, vLLM	llava-hf/llava-v1.6-mistral-7b-hf	Intel Xeon

Name		Name	Last commit message	Last commit date
parent directory ..
assets/img		assets/img
benchmark/performance		benchmark/performance
docker_compose		docker_compose
docker_image_build		docker_image_build
kubernetes		kubernetes
tests		tests
ui		ui
Dockerfile		Dockerfile
README.md		README.md
README_miscellaneous.md		README_miscellaneous.md
visualqna.py		visualqna.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Visual Question and Answering

Table of Contents

Architecture

Deployment Options

Validated Configurations

FilesExpand file tree

VisualQnA

Directory actions

More options

Directory actions

More options

Latest commit

History

VisualQnA

Folders and files

parent directory

README.md

Visual Question and Answering

Table of Contents

Architecture

Deployment Options

Validated Configurations