Getting Started With Gemini - Prompt Engineering Guide

Gemini is Google's latest AI model with multimodal capabilities, achieving state-of-the-art performance on various benchmarks, including MMLU and MMMU. It comes in three sizes: Ultra, Pro, and Nano, each designed for different tasks and efficiency levels. The model excels in crossmodal reasoning, handling text, images, audio, and video, and is capable of advanced tasks like coding, reasoning, and information extraction.

Uploaded by

kevin.get.back.to.work

Getting Started with Gemini

In this guide, we provide an overview of the Gemini models and how to effectively prompt and use them. The guide also includes capabilities, tips, applications, limitations, papers, and additional reading materials related to the Gemini models.

Introduction to Gemini
Gemini is the newest and most capable AI model from Google DeepMind. It is built with multimodal capabilities from the ground up and showcases impressive crossmodal reasoning across text, images, video, audio, and code.
Gemini comes in three sizes:
Ultra - the most capable of the model series and good for highly complex tasks
Pro - considered the best model for scaling across a wide range of tasks
Nano - an efficient model for on-device, memory-constrained tasks and use cases; it comes in 1.8B (Nano-1) and 3.25B (Nano-2) parameter versions, distilled from larger Gemini models and quantized to 4-bit.
According to the accompanying technical report, Gemini advances the state of the art in 30 of 32 benchmarks covering tasks such as language, coding, reasoning, and multimodal reasoning.
It is the first model to achieve human-expert performance on MMLU (a popular exam benchmark) and claims state of the art in 20 multimodal benchmarks. Gemini Ultra achieves 90.0% on MMLU and 62.4% on the MMMU benchmark, which requires college-level subject knowledge and reasoning.
The Gemini models are trained to support a 32k context length and are built on top of Transformer decoders with efficient attention mechanisms (e.g., multi-query attention). They support textual input interleaved with audio and visual inputs and can produce text and image outputs.
The models are trained on both multimodal and multilingual data such as web documents, books, and code, including image, audio, and video data. The models are trained jointly across all modalities and show strong crossmodal reasoning capabilities and even strong capabilities in each domain.

Gemini Experimental Results

Gemini Ultra achieves its highest accuracy when combined with approaches like chain-of-thought (CoT) prompting and self-consistency, which help deal with model uncertainty.
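As a rough illustration of how self-consistency can deal with uncertainty, the sketch below takes the final answers extracted from several sampled chain-of-thought responses and returns the majority vote. It also shows an uncertainty-routed variant that falls back to the greedy answer when the samples do not agree strongly enough. The function names and the 0.6 threshold are hypothetical choices for this sketch, not values taken from the technical report, and the sampling itself is assumed to have happened elsewhere.

```python
from collections import Counter
from typing import Sequence


def majority_vote(answers: Sequence[str]) -> tuple[str, float]:
    """Return the most common answer and the fraction of samples that agree with it."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / len(answers)


def uncertainty_routed_answer(
    sampled_answers: Sequence[str],
    greedy_answer: str,
    threshold: float = 0.6,  # hypothetical value; would be tuned on held-out data
) -> str:
    """Use the majority vote when agreement is high, otherwise fall back to greedy."""
    answer, agreement = majority_vote(sampled_answers)
    return answer if agreement >= threshold else greedy_answer


if __name__ == "__main__":
    # Four sampled CoT answers to a multiple-choice question (made-up data).
    samples = ["B", "B", "A", "B"]
    print(uncertainty_routed_answer(samples, greedy_answer="A"))
```

With three of four samples agreeing, the agreement (0.75) clears the threshold and the majority answer is used; with a more even split, the greedy answer would be returned instead.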
As reported in the technical report, Gemini Ultra improves its performance on MMLU from 84.0% with greedy sampling to 90.0% with an uncertainty-routed chain-of-thought approach (involving CoT and majority voting) with 32 samples, while it improves only marginally, to 85.0%, with 32 chain-of-thought samples alone. Similarly, CoT and self-consistency achieve 94.4% accuracy on the GSM8K grade-school math benchmark. In addition, Gemini Ultra correctly implements 74.4% of the HumanEval code completion problems. Below is a table summarizing the results of Gemini and how the models compare to other notable models.
The Gemini Nano models also show strong performance on factuality (i.e., retrieval-related tasks), reasoning, STEM, coding, multimodal, and multilingual tasks.
Besides standard multilingual capabilities, Gemini shows great performance on multilingual math and summarization benchmarks like MGSM and XLSum, respectively.
The Gemini models are trained on a sequence length of 32K and are found to retrieve correct values with 98% accuracy when queried across the context length. This is an important capability to support new use cases such as retrieval over documents and video understanding.
The instruction-tuned Gemini models are consistently preferred by human evaluators on important capabilities such as instruction following, creative writing, and safety.

Gemini Multimodal Reasoning Capabilities

Gemini is trained natively multimodal and exhibits the ability to combine capabilities across modalities with the reasoning capabilities of the language model. Capabilities include, but are not limited to, information extraction from tables, charts, and figures. Other interesting capabilities include discerning fine-grained details from inputs, aggregating context across space and time, and combining information across different modalities.
Gemini consistently outperforms existing approaches across image understanding tasks such as high-level object recognition, fine-grained transcription, chart understanding, and multimodal reasoning. Some of the image understanding and generation capabilities also transfer across a diverse set of global languages (e.g., generating image descriptions using languages like Hindi and Romanian).

Text Summarization

While Gemini is trained as a multimodal system, it possesses many of the capabilities present in modern large language models like GPT-3.5, Claude, and Llama. Below is an example of a simple text summarization task using Gemini Pro. We are using Google AI Studio for this example with a temperature value of 0.

Prompt:

Your task is to summarize an abstract into one sentence.
Avoid technical jargon and explain it in the simplest of words.
Abstract: Antibiotics are a type of medication used to treat bacterial infections. They work by either killing the
bacteria or preventing them from reproducing, allowing the body’s immune system to fight off the infection.
Antibiotics are usually taken orally in the form of pills, capsules, or liquid solutions, or sometimes administered
intravenously. They are not effective against viral infections, and using them inappropriately can lead to antibiotic
resistance.

Gemini Pro Output:

Antibiotics are medicines used to kill or stop the growth of bacteria causing infections, but they don't work against
viruses.

Here is the screenshot of how the task and model response (highlighted) looks inside Google AI Studio.
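The same summarization request can also be issued from Python rather than Google AI Studio. The following is a minimal sketch, assuming the google-generativeai library and the "gemini-pro" model name that this guide uses later on; the helper names build_summary_prompt and summarize are made up for illustration, and the API call needs network access and a valid key.

```python
def build_summary_prompt(abstract: str) -> str:
    """Assemble the one-sentence summarization prompt shown above."""
    return (
        "Your task is to summarize an abstract into one sentence.\n"
        "Avoid technical jargon and explain it in the simplest of words.\n"
        f"Abstract: {abstract.strip()}"
    )


def summarize(abstract: str, api_key: str) -> str:
    """Send the prompt to Gemini Pro with temperature 0 and return the text reply."""
    # Imported lazily so the prompt helper can be used without the library installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(
        model_name="gemini-pro",  # model name as used in this guide
        generation_config={"temperature": 0},
    )
    return model.generate_content(build_summary_prompt(abstract)).text
```

Keeping the prompt construction separate from the API call makes it easy to inspect or log the exact text being sent before spending a request on it.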

Information Extraction

Here is another example of a task that analyzes a piece of text and extracts the desired information. Keep in mind that this uses zero-shot prompting, so the result is not perfect, but the model performs relatively well.

Prompt:

Your task is to extract model names from machine learning paper abstracts. Your response is an array of the model
names in the format [\"model_name\"]. If you don't find model names in the abstract or you are not sure, return
[\"NA\"]
Abstract: Large Language Models (LLMs), such as ChatGPT and GPT-4, have revolutionized natural language processing
research and demonstrated potential in Artificial General Intelligence (AGI). However, the expensive training and
deployment of LLMs present challenges to transparent and open academic research. To address these issues, this project
open-sources the Chinese LLaMA and Alpaca…

Gemini Pro Output:

[\"LLMs\", \"ChatGPT\", \"GPT-4\", \"Chinese LLaMA\", \"Alpaca\"]

Visual Question Answering

Visual question answering involves asking the model questions about an image passed as input. The Gemini models show different multimodal reasoning capabilities for image understanding over charts, natural images, memes, and many other types of images. In the example below, we provide the model (Gemini Pro Vision accessed via Google AI Studio) a text instruction and an image which represents a snapshot of this prompt engineering guide.
The model responds "The title of the website is "Prompt Engineering Guide"." which seems like the correct answer based on the question given.
Here is another example with a different input question. Google AI Studio allows you to test with different inputs by clicking on the {{}} Test input option above. You can then add the prompts you are testing in the table below.

Feel free to experiment by uploading your own image and asking questions. It's reported that Gemini Ultra can do a lot better at
these types of tasks. This is something we will experiment more with when the model is made available.
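For readers who prefer the API over Google AI Studio, a visual question answering request might look roughly like the sketch below. It assumes the google-generativeai library and Pillow are installed and that the "gemini-pro-vision" model name is still available; the helper names, the question text, and the screenshot.png path are all hypothetical.

```python
VISION_MODEL = "gemini-pro-vision"  # model name from this guide; may have been superseded


def build_vqa_parts(question: str, image) -> list:
    """Pair a text instruction with an image, in the order they are sent."""
    if not question.strip():
        raise ValueError("question must not be empty")
    return [question.strip(), image]


def ask_about_image(question: str, image_path: str, api_key: str) -> str:
    """Ask a question about a local image file; needs network access and a key."""
    # Lazy imports keep the helper above free of third-party dependencies.
    import google.generativeai as genai
    import PIL.Image

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(VISION_MODEL)
    image = PIL.Image.open(image_path)
    return model.generate_content(build_vqa_parts(question, image)).text


# Example call (hypothetical file name and question):
# print(ask_about_image("What is the title of the website?", "screenshot.png", "YOUR_API_KEY"))
```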

Verifying and Correcting

Gemini models display impressive crossmodal reasoning capabilities. For instance, the figure below demonstrates a solution to a
physics problem drawn by a teacher (left). Gemini is then prompted to reason about the question and explain where the student
went wrong in the solution if they did so. The model is also instructed to solve the problem and use LaTeX for the math parts. The
response (right) is the solution provided by the model which explains the problem and solution with details.
Rearranging Figures
Below is another interesting example from the technical report showing Gemini's multimodal reasoning capabilities to generate
matplotlib code for rearranging subplots. The multimodal prompt is shown on the top left, the generated code on the right, and the
rendered code on the bottom left. The model is leveraging several capabilities to solve the task such as recognition, code
generation, abstract reasoning on subplot location, and instruction following to rearrange the subplots in their desired positions.

Video Understanding
Gemini Ultra achieves state-of-the-art results on various few-shot video captioning tasks and zero-shot video question answering.
The example below shows that the model is provided a video and text instruction as input. It can analyze the video and reason
about the situation to provide an appropriate answer or in this case recommendations on how the person could improve their
technique.
Image Understanding
Gemini Ultra can also take few-shot prompts and generate images. For example, as shown in the example below, it can be prompted with one example of interleaved image and text where the user provides information about two colors and image suggestions. The model then takes the final instruction in the prompt and responds with the colors it sees together with some ideas.

Modality Combination
The Gemini models also show the ability to process a sequence of audio and images natively. From the example, you can observe that the model can be prompted with a sequence of audio and images. The model is then able to send back a text response that takes the context of each interaction into account.

Gemini Generalist Coding Agent

Gemini is also used to build a generalist agent called AlphaCode 2 that combines its reasoning capabilities with search and tool use to solve competitive programming problems. AlphaCode 2 ranks within the top 15% of entrants on the Codeforces competitive programming platform.

Few-Shot Prompting with Gemini

Few-shot prompting is a prompting approach that is useful for indicating to the model the kind of output you want. This helps in various scenarios, such as when you want the output in a specific format (e.g., a JSON object) or style. Google AI Studio also enables this in the interface. Below is an example of how to use few-shot prompting with the Gemini models.
We are interested in building a simple emotion classifier using Gemini. The first step is to create a "Structured prompt" by clicking on "Create new" or "+". The few-shot prompt will combine your instructions (describing the task) and the examples you have provided. The figure below shows the instruction (top) and examples we are passing to the model. You can set the INPUT text and OUTPUT text to have more descriptive indicators. The example below uses "Text:" and "Emotion:" as the input and output indicators, respectively.

The entire combined prompt is the following:

Your task is to classify a piece of text, delimited by triple backticks, into the following emotion labels: ["anger",
"fear", "joy", "love", "sadness", "surprise"]. Just output the label as a lowercase string.
Text: I feel very angry today
Emotion: anger
Text: Feeling thrilled by the good news today.
Emotion: joy
Text: I am actually feeling good today.
Emotion:

You can then test the prompt by adding inputs under the "Test your prompt" section. We are using the "I am actually feeling good today." example as input and the model correctly outputs the "joy" label after clicking on "Run". See the example in the figure below:
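Outside of Google AI Studio, the same combined prompt can be assembled in code. The sketch below builds the few-shot prompt from the instruction and the two examples in this section and adds a small check on the returned label. The function names and the "unknown" fallback are assumptions made for illustration; sending the prompt to the model is left out.

```python
LABELS = ["anger", "fear", "joy", "love", "sadness", "surprise"]

INSTRUCTION = (
    "Your task is to classify a piece of text, delimited by triple backticks, "
    "into the following emotion labels: "
    '["anger", "fear", "joy", "love", "sadness", "surprise"]. '
    "Just output the label as a lowercase string."
)

# Few-shot examples taken from the combined prompt in this section.
EXAMPLES = [
    ("I feel very angry today", "anger"),
    ("Feeling thrilled by the good news today.", "joy"),
]


def build_few_shot_prompt(text, examples=EXAMPLES, instruction=INSTRUCTION):
    """Join the instruction, the labeled examples, and the new input into one prompt."""
    lines = [instruction]
    for example_text, label in examples:
        lines.append(f"Text: {example_text}")
        lines.append(f"Emotion: {label}")
    lines.append(f"Text: {text}")
    lines.append("Emotion:")  # left open for the model to complete
    return "\n".join(lines)


def parse_label(raw_output: str) -> str:
    """Normalize the model reply; return 'unknown' if it is not one of the labels."""
    label = raw_output.strip().lower()
    return label if label in LABELS else "unknown"


if __name__ == "__main__":
    print(build_few_shot_prompt("I am actually feeling good today."))
```

Validating the reply against the label list is a cheap guard for cases where the model answers with extra words or a label outside the requested set.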

Library Usage

Below is a simple example that demonstrates how to prompt the Gemini Pro model using the Gemini API. You need to install the google-generativeai library and obtain an API key from Google AI Studio. The example below is the code to run the same information extraction task used in the sections above.

"""
At the command line, only need to run once to install the package via pip:

$ pip install google-generativeai
"""

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Set up the model: temperature 0 with top_k 1 makes decoding effectively greedy
generation_config = {
    "temperature": 0,
    "top_p": 1,
    "top_k": 1,
    "max_output_tokens": 2048,
}

# Block content rated medium or above in each harm category
safety_settings = [
    {
        "category": "HARM_CATEGORY_HARASSMENT",
        "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
        "category": "HARM_CATEGORY_HATE_SPEECH",
        "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
        "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
]

model = genai.GenerativeModel(model_name="gemini-pro",
                              generation_config=generation_config,
                              safety_settings=safety_settings)

# Adjacent string literals are concatenated into a single prompt string
prompt_parts = [
    "Your task is to extract model names from machine learning paper abstracts. "
    "Your response is an array of the model names in the format [\\\"model_name\\\"]. "
    "If you don't find model names in the abstract or you are not sure, return [\\\"NA\\\"]"
    "\n\nAbstract: Large Language Models (LLMs), such as ChatGPT and GPT-4, have revolutionized natural "
    "language processing research and demonstrated potential in Artificial General Intelligence (AGI). "
    "However, the expensive training and deployment of LLMs present challenges to transparent and open "
    "academic research. To address these issues, this project open-sources the Chinese LLaMA and Alpaca…",
]

response = model.generate_content(prompt_parts)
print(response.text)

Output:

[\"LLMs\", \"ChatGPT\", \"GPT-4\", \"Chinese LLaMA\", \"Alpaca\"]

References

Introducing Gemini: our largest and most capable AI model
How it’s Made: Interacting with Gemini through multimodal prompting
Welcome to the Gemini era
Prompt design strategies
Gemini: A Family of Highly Capable Multimodal Models - Technical Report
Fast Transformer Decoding: One Write-Head is All You Need
Google AI Studio quickstart
Multimodal Prompts
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases
A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

Last updated on April 24, 2025
