
Reflexive Prompt Engineering

A Framework for Responsible Prompt Engineering and Interaction Design

Christian Djeffal*
Technical University of Munich, Germany, christian.djeffal@tum.de

Responsible prompt engineering has emerged as a critical framework for ensuring that generative artificial intelligence (AI) systems serve
society's needs while minimizing potential harms. As generative AI applications become increasingly powerful and ubiquitous, the way
we instruct and interact with them through prompts has profound implications for fairness, accountability, and transparency. This article
examines how strategic prompt engineering can embed ethical and legal considerations and societal values directly into AI interactions,
moving beyond mere technical optimization for functionality.

This article proposes a comprehensive framework for responsible prompt engineering that encompasses five interconnected components:
prompt design, system selection, system configuration, performance evaluation, and prompt management. Drawing from empirical
evidence, the paper demonstrates how each component can be leveraged to promote improved societal outcomes while mitigating potential
risks. The analysis reveals that effective prompt engineering requires a delicate balance between technical precision and ethical
consciousness, combining systematic rigor and a focus on functionality with a nuanced understanding of social impact.

Through examination of real-world and emerging practices, the article illustrates how responsible prompt engineering serves as a crucial
bridge between AI development and deployment, enabling organizations to fine-tune AI outputs without modifying underlying model
architectures. This approach aligns with broader "Responsibility by Design" principles, embedding ethical considerations directly into the
implementation process rather than treating them as post-hoc additions. The article concludes by identifying key research directions and
practical guidelines for advancing the field of responsible prompt engineering.

CCS CONCEPTS • Codes of ethics • Interaction techniques • Computing literacy

Additional Keywords and Phrases: Prompt Engineering, Responsible AI, AI Ethics, Human-AI Interaction, AI Governance,
Accountability, Transparency

* Professor for Law, Science and Technology at Technical University Munich. © Christian Djeffal 2025. This article has been accepted for the ACM FAccT Conference on Fairness, Accountability, and Transparency. It is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record will be published in the proceedings of the ACM FAccT.
1 INTRODUCTION

1.1 Who is accountable for generative AI?


The rapid advancement of generative AI technologies has ushered in an era of unprecedented capabilities, but also
mounting concerns about their responsible deployment [1–4]. While these technologies offer remarkable opportunities for
innovation, recent incidents have highlighted the complex challenges of ensuring their responsible use. A striking example
emerged in early 2024 when Google's Gemini AI image generator produced historically inaccurate representations,
generating images that misrepresented historical figures and events in an apparent overcorrection for diversity and
inclusion [5, 6]. This incident, which led to the immediate suspension of the system's people-generation capabilities and a
public acknowledgment of failure by Google's leadership [7, 8], serves as a powerful illustration of how even well-intentioned AI implementations can go awry without proper oversight and responsible use practices. One key element in that regard is to focus on the right points of intervention and the right actors when assigning responsibility.
As the discourse around AI safety and ethics intensifies, there is growing recognition that accountability must extend
beyond the technical architects of these systems [9–12]. While considerable attention has been paid to the responsibilities
of AI developers and companies, a critical gap exists in our understanding of how deployers - particularly those engaging
in prompt engineering - can contribute to responsible AI deployment. Prompt engineering, the practice of crafting and
refining inputs to generate desired outputs from AI systems, has emerged as a crucial interface between human intent and
AI capability. However, despite its significance, there remains a notable absence of structured frameworks to guide
responsible prompt engineering practices. In the absence of such a framework, it is hard to understand, evaluate, and
compare the many contributions to responsible prompt engineering that have been made in academia and in practice. This
paper addresses this gap through a comprehensive narrative review, examining how prompt engineering can be approached
responsibly to mitigate risks and enhance the beneficial deployment of generative AI technologies. By analyzing existing
practices, incidents, and emerging guidelines, the goal is to develop a framework that organizes and structures the various
aspects of responsible prompt engineering and allows for an assessment of the current state of the art. This will hopefully
contribute to a foundation that empowers users to engage with these powerful tools in ways that promote fairness,
accountability, and transparency.

1.2 Research question and methodology


This article examines how organizations can systematically implement and evaluate responsible prompt engineering
practices through an integrated framework that addresses technical, legal, ethical, and social considerations. The
investigation focuses on three interconnected dimensions. First, the analysis examines the essential components of prompt
engineering practice, exploring the dimensions deployers can engage with when shaping the system's output. Second, the
research explores how existing responsible prompt engineering practices enhance implementation across different
organizational contexts. Third, the analysis identifies critical gaps between current prompt engineering practices and
responsible AI principles, while highlighting emerging opportunities for enhancing responsibility in AI deployment. This
examination reveals how and to what extent responsible prompt engineering can serve as a crucial bridge between AI
development and deployment, enabling organizations to fine-tune AI outputs without modifying underlying model
architectures.
This narrative review examines responsible prompt engineering practices through a systematic analysis of academic
literature, technical documentation, and practitioner insights. The rapidly evolving nature of prompt engineering and its
emerging responsible practices necessitated a flexible yet rigorous approach to synthesize current knowledge and identify conceptual frameworks [13–15]. The literature search encompassed multiple academic databases including arXiv, IEEE
Xplore, and ACM Digital Library, complemented by targeted searches on Google Scholar, DuckDuckGo, and Semantic
Scholar. I employed various combinations of search terms centered around “responsible,” “ethical,” and “legal” in
conjunction with "prompt" as well as "prompt engineering" and "prompt design". The review covered publications from
2019 through early 2025, focusing exclusively on English-language materials. The inclusion criteria prioritized sources
that contributed to understanding prompt engineering fundamentals and responsible practices. I extracted and processed
information using Citavi reference management software, employing thematic analysis to identify recurring concepts and
emerging patterns. This approach allowed me to develop a comprehensive framework organizing prompt engineering into
five key components: design, selection, configuration, evaluation, and management. The analysis revealed an evolving
scope, particularly regarding evaluation methods and system configuration aspects. The framework emerged iteratively
through careful examination of how different sources conceptualized and approached responsible prompt engineering
practices. When encountering conflicting findings or approaches, I incorporated them into the framework while noting
their complementary nature, as various prompt engineering techniques can often be combined effectively. The following
aspects characterize the author’s position concerning this research question. {ANONYMIZED}

2 THE CONCEPT OF RESPONSIBLE PROMPT ENGINEERING


Before delving into the analysis of responsible prompt engineering practices, we must establish two essential
foundations. First, we need a clear and concise definition of responsible prompt engineering and its core components.
Second, we must examine prompt engineering's dual significance: both as a critical element in AI development and as a
framework for responsible design principles.

2.1 Definition
A prompt serves as an instruction to a generative AI model, directing the model to produce specific outputs [16–18].
These prompts can take various forms, including text, images, video, or audio inputs, reflecting the multimodal capabilities
of contemporary AI systems [19, 20]. Modern generative AI models, primarily built on transformer architectures, excel at
processing and producing diverse and complex content across these modalities. These models utilize attention mechanisms
that enable them to selectively focus on and weigh the most relevant parts of input data while processing information,
similar to human cognitive processes.
Prompt engineering is more than working on instructions to generative AI. A review of the literature and guides on prompt engineering shows that it encompasses a comprehensive approach to optimizing interactions with generative AI systems through five essential components. First, prompt design focuses on systematically crafting instructions to maximize desired outputs [20, 21]. This involves developing templates, techniques, and design patterns, ranging from preconfigured prompting structures, to techniques that can be applied in various circumstances like chain-of-thought reasoning, to specific patterns like "Let's think step-by-step". The goal is to craft prompts that effectively
communicate intended tasks to the AI system. Second, system selection requires strategic decisions about which AI models
to employ based on their documented capabilities [22, 23]. This selection process can rely on established benchmarks such
as FrontierMath for mathematical reasoning or MMLU for general knowledge, as well as user-based comparisons displayed
on various leaderboards. Third, system configuration involves adapting model parameters to optimize performance for
specific use cases. This includes adjusting settings such as temperature parameters, which control the balance between
predictability and creativity in outputs. Lower temperature values produce more conservative, consistent responses, while
higher values generate more diverse and creative outputs. Fourth, performance evaluation encompasses systematic assessment of prompt effectiveness against predetermined evaluation criteria. This includes analyzing output quality,
consistency, and alignment with intended objectives through both automated metrics and human-in-the-loop assessment
protocols [24, 25]. Fifth, prompt management involves implementing systematic approaches to organizing, tracking, and
improving prompts over time. This includes implementing version control systems for prompts, maintaining detailed
records of configuration settings, and tracking performance outcomes [26]. Proper documentation enables knowledge
sharing, facilitates continuous improvement, and supports accountability in prompt engineering practices. Documentation
can establish protocols for various aspects of prompt engineering, including standardized formats for recording prompt
versions, test results, and modification histories. The five components of prompt engineering can be summarized as
follows.

1. Prompt Design. Goal: craft and improve instructions. Example: include chain of thought.
2. Performance Evaluation. Goal: assess quality. Example: define criteria and rate output.
3. System Configuration. Goal: adapt settings. Example: temperature, Top-P.
4. Model and Agent Selection. Goal: choose model and agent. Example: choose an open source LLM.
5. Prompt Management. Goal: organize and manage. Example: documentation and version tracking.

Figure 1: Components of Prompt Engineering

Prompt engineering could be conceptualized as an art and a science. It embodies a unique duality, combining creativity
with rigorous methodology [19]. This hybrid nature reflects both the complexity of human-AI interaction and the emerging
maturity of the field. As an art form, prompt engineering requires creative intuition and craftsmanship. The creative
dimension manifests in the nuanced understanding of language, context, and model behavior that experienced prompt
engineers develop over time [27]. This artistic aspect becomes evident in the subtle choices of wording, tone, and structure
that can dramatically influence model outputs. Like skilled writers, prompt engineers develop an intuitive feel for how to
frame instructions effectively, often drawing on metaphorical thinking and creative problem-solving to overcome model
limitations. This creative dimension becomes particularly crucial when dealing with edge cases or novel applications where
established approaches prove insufficient. However, prompt engineering increasingly embraces scientific rigor through
systematic experimentation and empirical validation [28]. Structured experiments allow prompt engineers to test hypotheses about prompt effectiveness across different contexts and tasks. These experiments take place in controlled
testing environments where researchers systematically vary factors such as prompt structure, length, and complexity while
maintaining constant conditions for other variables. Through quantitative metrics, researchers measure output quality,
examining dimensions such as accuracy, relevance, and consistency across different prompting strategies.
Responsible prompt engineering transforms these technical practices by integrating ethical, legal, and social
considerations into the prompt design process [29–31]. This approach moves beyond purely functional optimization to
address broader societal implications and ethical concerns. While traditional prompt engineering might focus solely on
performance metrics, responsible practices examine the wider implications of AI system deployment. It seeks proportionate
outcomes that mitigate between functionality, efficiency, and ethical, legal, and social concerns. The methodology of
responsible prompt engineering adapts technical strategies to serve objectives of responsibility, considering dimensions
such as fairness, accountability, and transparency. This might involve modifying prompts to prevent discriminatory
outcomes, implementing additional validation steps to ensure accessibility, or designing prompts that actively promote
inclusive representation [32–35]. Responsible prompt engineering can aim at various standards, from minimum requirements of responsibility to a proactive realization of specific values through generative AI. It therefore speaks both to the inherent risks and limitations of generative AI and to its inherent potential to realize ethical, legal, and social values. While this contribution focuses on responsible practices in use, it is acknowledged that critical examinations of the flaws, limitations, and shortcomings of systems are also necessary and often part of responsible prompt engineering practices [36]. It is also acknowledged that prompt engineering cannot make up for every flaw or risk inherent in generative AI. The task of responsible prompt engineering is, in fact, to raise deployers' awareness and to test how far prompt engineering in all its aspects can help mitigate shortcomings and realize the potential of systems.

2.2 Relevance
Prompt engineering can play a pivotal role in ensuring responsible deployment of generative AI systems by addressing
fundamental questions of accountability and control. The current technological landscape, characterized by large language
models with bounded capabilities, positions prompt engineering as a critical interface between human intent and machine
output. Although this dynamic may evolve as AI systems impose greater restrictions on what users can do in the context of agentic AI [37, 38] or gain more autonomy through artificial general intelligence [39], the present architecture of generative
AI systems makes prompt engineering particularly significant for three key reasons that will be explored in detail.
First, the versatile nature of generative AI systems enables prompt engineers to produce an extensive range of outputs [40, 41], exemplified by their ability to generate sophisticated code without requiring advanced programming expertise. This dual-edged capability simultaneously democratizes access to powerful tools while raising significant concerns, for example regarding cybersecurity [42, 43]. The EU AI Act [44] formally recognizes this broad applicability, defining general-purpose AI systems as those possessing "significant generality" and capable of performing diverse tasks across various applications and downstream systems, as outlined in Article 3(63). This expansive capability creates distinct accountability challenges, particularly regarding system deployment and use. The EU AI Act acknowledges this complexity by establishing separate regulatory frameworks for general-purpose AI and by introducing a dual accountability structure that encompasses both providers (developers) and deployers (users). This stresses the importance of holding deployers accountable as well. This
distinction is crucial because deployers, who may lack the technical expertise of developers [45], require clear guidance
for responsible system use. The limited ability of providers to anticipate all possible applications of general-purpose AI
systems further emphasizes the importance of responsible prompt engineering as a framework for ethical deployment.

Second, prompt engineering occupies a unique position in the AI development cycle, bridging the gap between model
development and practical deployment [18]. Unlike traditional approaches such as model retraining or fine-tuning, which
require substantial computational resources and technical expertise, prompt engineering offers a more accessible and
sometimes efficient method to influence AI behavior. For instance, in healthcare applications, medical professionals can
adapt language models to specific diagnostic contexts without requiring deep machine learning expertise or expensive
computational infrastructure [46, 47]. The strategic value of prompt engineering lies in its ability to achieve sophisticated
model adaptations through non-invasive means, preserving the underlying architecture while enabling significant
improvements in output quality. This approach facilitates rapid iteration, cross-domain knowledge transfer, and the
development of reusable patterns that can shape future AI applications across diverse fields. However, this powerful
position comes with both opportunities and responsibilities. While prompt engineering enables practitioners to mitigate
risks and ensure ethical, legal, and socially responsible AI outputs, it can also potentially circumvent built-in model
safeguards through techniques of prompt hacking [48–50]. This dual nature underscores the critical importance of treating
prompt engineering as a key component requiring systematic responsibility consideration and responsible implementation
practices, particularly as the patterns and templates developed today will significantly influence future AI applications
across global contexts.
Third, prompt engineering plays a pivotal role in shaping how generative AI systems interact with the world, making it
a crucial leverage point for implementing responsible AI design principles. The concept of "Responsibility by Design"
provides a comprehensive framework for embedding ethical considerations directly into technical systems during their
development, rather than treating them as post-deployment considerations [51, 52]. This approach transforms how we think
about AI responsibility, moving beyond simple compliance checkboxes toward creating inherently ethical and socially
beneficial systems. In the context of prompt engineering, Responsibility by Design manifests through three key
mechanisms [52]. First, it requires anticipatory governance - systematically identifying and addressing potential risks and
ethical challenges before they emerge [53, 54]. For instance, prompt designers must consider how their instructions might
be misused or produce unintended consequences across different cultural contexts. Second, it demands inclusive
stakeholder engagement, ensuring that diverse voices and perspectives inform prompt design decisions [55, 56]. This might
involve consulting with affected communities, subject matter experts, and end-users to understand potential impacts and
necessary safeguards. Third, it requires building in responsive adaptation capabilities, allowing prompting practices to
evolve as new ethical challenges emerge. The practical implementation of these principles requires fundamental changes
in how organizations approach prompt engineering. Rather than treating design considerations as constraints, they become
core design criteria that shape how prompts are constructed, tested, and deployed. This involves developing new workflows
that integrate responsibility assessment tools, establishing feedback mechanisms for continuous improvement, and creating
comprehensive training programs that emphasize both technical excellence and ethical awareness [57, 58].

3 RESPONSIBLE PRACTICES
After examining the foundations and significance of responsible prompt engineering, we now turn to its practical
implementation across the five key categories previously established. While existing literature offers valuable insights into
specific aspects, the proposed framework makes it possible to systematically organize best practices and to reveal gaps for further
research and experimentation.

3.1 Prompt design: techniques
Prompt design represents the core practice commonly associated with prompt engineering - the systematic crafting of
inputs to optimize generative AI system performance. While traditional approaches focus primarily on enhancing output
quality and reliability, responsible prompt engineering necessitates a more nuanced evaluation of these techniques through
an ethical and accountability lens. This expanded perspective does not diminish the effectiveness of established methods
but rather enriches them through critical reflection and purposeful adaptation.
To illustrate how conventional techniques can be modified to align with responsible engineering principles, we examine
two widely adopted and empirically validated approaches: exemplar-based prompting and chain-of-thought methodology.
These cases are particularly instructive as they highlight the intersection between technical efficacy and ethical
considerations, demonstrating how responsibility frameworks can enhance rather than constrain engineering practices.
These examples, however, also highlight the need to systematically review all prompt engineering techniques for their
potentials, but also for specific risks.

3.1.1 Examples
Examples in prompt engineering leverage the unique capability of large language models to perform in-context learning
- a process fundamentally different from traditional model training. While traditional training involves updating model
parameters through backpropagation, in-context learning occurs entirely during the model's forward pass, using only its
existing parameters to identify and apply patterns from provided examples. When examples are included in a prompt, the
model performs a form of Bayesian inference to recognize relevant concepts from its pre-training and creates temporary
task-specific representations without any permanent parameter changes [59]. This remarkable ability allows users to guide
model behavior simply by demonstrating desired input-output relationships in the prompt, making complex AI capabilities
accessible without requiring technical expertise in model training. When providing examples, it is most effective to include
pairs of both inputs and their corresponding outputs, allowing the model to understand how to transform given information
into the desired result. Depending on the task, one can also include only the desired outputs. The effectiveness of examples
depends on their quality, relevance, and ability to demonstrate the full range of desired variations, as these characteristics
directly influence how the model interprets and applies the demonstrated patterns [60].
This reference to the range of desired variations leads to some of the most prominent issues of uses of AI in society:
equality, fairness and discrimination. Therefore, when giving examples, it is important to think about potential effects of
such prompts on different groups that can be impacted by the prompt. While such evaluations are highly contextual,
practical reviews of prompt engineering have highlighted many practices to improve results. A fundamental approach is
maintaining balanced representation across different demographics in few-shot prompts, which helps improve model
generalization. This includes using diverse examples that cover various aspects of the problem space while avoiding
clustering similar examples together [61]. Counterfactual data augmentation serves as a
powerful technique, where variations of examples are created by flipping attributes like gender or race to identify group-
specific biases [62]. For instance, when writing job descriptions, comparing outputs with different demographic indicators
can reveal hidden biases - as demonstrated in experiments where identical prompts specifying different universities
(Howard versus Harvard) produced notably different results [63]. Example debiasing can be further enhanced through
anonymization and careful attribute replacement [64]. This involves defining potentially biased data points and replacing
them with neutral alternatives - such as using "person" instead of gender-specific terms [61]. When constructing example
sets, it is crucial to randomize their ordering rather than grouping similar ones together, as clustering can reinforce existing biases. The effectiveness of example-based debiasing is particularly evident in domain-specific applications. For instance,
in recruitment contexts, providing diverse examples of successful candidates across different demographics helps prevent
the model from developing stereotypical associations [65]. Similarly, when generating performance reviews, using
balanced examples that focus on work deliverables rather than personality traits helps mitigate demographic-based bias
[63]. Another large area of concern is copyright. Of course, copyrighted examples cannot be used without
authorization [66].
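
To make the practices above concrete, the following minimal sketch illustrates counterfactual augmentation and order randomization for a few-shot prompt in Python. The swap table, the example pairs, and the prompt format are illustrative assumptions, not prescriptions from the cited sources.

import random

# Illustrative swap table for creating counterfactual variants of examples.
GENDER_SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
                "Mr.": "Ms.", "Ms.": "Mr."}

def counterfactual(text: str) -> str:
    # Flip gendered tokens to create a counterfactual variant of an example.
    return " ".join(GENDER_SWAPS.get(token, token) for token in text.split())

examples = [
    ("Summarize the reference letter for Mr. Jones.", "Strong analytical skills."),
    ("Summarize the reference letter for Ms. Smith.", "Strong analytical skills."),
]

# Add counterfactual variants so demonstrations are balanced across attributes,
# then shuffle so that similar examples are not clustered together.
augmented = examples + [(counterfactual(i), o) for i, o in examples]
random.shuffle(augmented)

prompt = "\n\n".join(f"Input: {i}\nOutput: {o}" for i, o in augmented)
prompt += "\n\nInput: <new case>\nOutput:"

Comparing model outputs across such counterfactual pairs can surface group-specific differences of the kind reported in the Howard versus Harvard experiment cited above.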

3.1.2 Chain of thought


Chain of thought prompting represents a significant advancement in how we interact with large language models,
enabling them to tackle complex reasoning tasks by breaking them down into intermediate steps. This technique mirrors
human cognitive processes, allowing models to show their reasoning before reaching a conclusion [67, 68]. The approach
works by encouraging language models to generate a series of logical steps that lead to a final answer, similar to how
humans solve complex problems. When provided with a few examples demonstrating this step-by-step reasoning, large
language models can naturally adopt this problem-solving strategy [67, 68]. For instance, when solving mathematical word
problems, a model using chain of thought prompting can achieve state-of-the-art accuracy, even surpassing specially
trained models. Recent research has demonstrated that chain of thought prompting significantly enhances model
performance across various domains, including arithmetic reasoning, commonsense understanding, and symbolic
manipulation [68, 69].
Chain-of-thought prompting can be strategically enhanced to promote responsible AI development by explicitly
incorporating ethical checkpoints and responsibility considerations into the reasoning workflow. By breaking down
complex decisions into discrete steps, organizations can embed responsibility validation processes that evaluate aspects
like potential biases, fairness implications, and privacy concerns at each stage of the reasoning chain. This can happen
either by including

• specific steps, like "What barriers or assumptions might affect different groups in this reasoning process?" [70];
• general instructions for each step, like "Assess potential impacts at each step of the reasoning";
• or a final overall evaluation of the result as a separate step, like "Plan verification questions to fact-check this draft" [71].
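
A minimal sketch of how the first two options could be wired into a reusable chain-of-thought template is given below; the wording of the checkpoint questions and the template structure are illustrative assumptions rather than validated formulations.

# Sketch: chain-of-thought template with a responsibility checkpoint per step
# and a final verification pass. All wording is illustrative, not validated.
COT_TEMPLATE = """Question: {question}

Let's think step-by-step. After each step, assess potential impacts:
what barriers or assumptions might affect different groups here?

Before stating the final answer, plan verification questions to
fact-check the draft, then answer them.

Final answer:"""

def build_prompt(question: str) -> str:
    return COT_TEMPLATE.format(question=question)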

Chain of Thought prompting can also help to tackle legal and ethical tasks specifically. It significantly improves legal
analysis by mirroring established legal reasoning frameworks like IRAC (Issue, Rule, Application, Conclusion) [72, 73].
By breaking down complex legal questions into discrete analytical steps, lawyers and legal AI systems can systematically
evaluate cases, interpret statutes, and apply precedents with greater precision [72]. Chain of Thought prompting can assist
in ethical decision-making by decomposing complex moral dilemmas into manageable components that can be
systematically evaluated [74]. Such structured approaches help identify potential biases, assess fairness implications, and
consider multiple stakeholder perspectives throughout the reasoning process. By incorporating explicit ethical checkpoints
into the decision-making workflow, organizations can ensure that ethical considerations become an integral part of the
process rather than an afterthought, leading to more responsible and well-reasoned outcomes [75, 76].
Chain-of-thought prompting, where AI models generate step-by-step explanations of their reasoning, was initially
celebrated as a breakthrough in AI transparency [77]. However, critical analysis reveals a concerning disconnect: the
narrative explanations produced by these systems may not accurately reflect their internal decision-making processes [78].
This discrepancy creates what some researchers describe as an "illusion of transparency" [78] - a coherent but potentially misleading representation of the system's actual operations. This misalignment between displayed reasoning and actual computation raises serious concerns for responsible AI development. There is evidence that models can generate plausible-sounding explanations even when their internal processes follow entirely different paths [79]. More troublingly, these
explanations can be convincing even when the underlying computation is flawed or based on spurious correlations. This
phenomenon is particularly problematic in high-stakes applications where understanding the true basis of AI decisions is
crucial.

3.2 Prompt design 2: patterns


Prompts are often designed for re-use. Therefore, they could be conceptualized as design patterns [80]. Accordingly,
another responsible prompting activity is to reuse such patterns, or at least parts of them, in order to draw on existing example prompts for various situations. However, it is advisable to test and evaluate them thoroughly. There are
several sources to draw from regarding prompts in general [81, 82] or responsible prompts more specifically [83, 84].
System prompts operate behind the scenes, creating a layer of computation that influences output without being directly
visible in the model's responses [85]. This hidden processing can enable non-trivial computations while maintaining a
seamless user experience. While some providers have open sourced their system prompts [86, 87], other system prompts
have allegedly been obtained through techniques of prompt injection and published [88, 89]. All of these prompts can be
analyzed from a perspective of attempts to practice responsible prompt engineering. Other patterns and sources thereof
have been referred to above (3.1.2).
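
As an illustration of treating prompts as reusable design patterns, the following sketch parameterizes a template with explicit placeholders; the field names and the fairness clause are illustrative assumptions.

from string import Template

# Sketch: a reusable prompt pattern with explicit placeholders. The fairness
# constraint is baked into the pattern so every instantiation inherits it.
SUMMARY_PATTERN = Template(
    "You are a $role.\n"
    "Task: summarize the following $doc_type for $audience.\n"
    "Constraints: use neutral, person-first language and flag any claim "
    "that cannot be verified from the text.\n\n"
    "$document"
)

prompt = SUMMARY_PATTERN.substitute(
    role="benefits caseworker",
    doc_type="application file",
    audience="the applicant",
    document="...",
)

As recommended above, any such pattern should still be tested and evaluated thoroughly before reuse in a new context.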

3.3 Evaluation
Prompt engineering evaluation encompasses systematic approaches to assess and refine how effectively prompts guide
AI models to produce desired outputs [25]. This evaluation process has become increasingly critical as large language
models are deployed across various domains, from code generation to content analysis [90, 91]. The evaluation of prompts
requires examining multiple dimensions simultaneously. At its core, the process involves analyzing the accuracy and
reliability of AI-generated responses, while also measuring how well the prompts align with intended tasks and objectives
in other dimensions. This includes assessing both the technical performance metrics and the broader implications of prompt
design [36, 92]. A fundamental aspect of evaluation involves systematic testing with different prompt variations to
understand their effectiveness. This process typically employs both qualitative and quantitative techniques to
comprehensively assess prompts across various stages of development. This involves careful documentation of assessment
methods, criteria, and findings to enable accountability and facilitate continuous improvement [36, 92].
Beyond technical performance, responsible prompt engineering evaluation must consider ethical dimensions and
potential societal impacts. This includes examining prompts for potential biases, assessing privacy implications, and
ensuring compliance with relevant regulatory frameworks. Evaluators must verify that prompts respect privacy boundaries,
maintain data protection standards, and uphold principles of transparency and fairness, and address other weaknesses like
hallucinations [36, 65, 93, 94]. The integration of responsibility considerations into prompt engineering evaluation
necessitates examining how prompts might affect different stakeholder groups and implementing safeguards against
potential harmful applications. This includes assessing how prompts handle sensitive topics or potentially controversial
subjects while maintaining appropriate ethical boundaries [36, 92].
Therefore, the question of who gets to evaluate is key in the context of responsible prompt engineering. This could be
either

• the prompt engineer or team itself,
• generative AI models,
• or third parties like deployers or those affected by generative AI.

Especially when the stakes are high for some people, this might require including those affected by generative AI in order to understand their perspectives on potentially harmful impacts, but also to gather their feedback on how to improve the prompt. In responsible design, stakeholders, and in particular vulnerable groups, should also be included proactively in idea development and design choices [95–97].
Careful consideration is needed when using models for evaluation purposes. The evaluation of prompts requires
examining multiple dimensions simultaneously, including accuracy, reliability, and alignment with intended objectives.
Current AI systems struggle with providing comprehensive assessments across these dimensions [98]. The challenge is
particularly evident in cases requiring deep contextual understanding and creative reasoning [99]. The key biases identified
in language model evaluation include:

• Position bias - Models tend to favor responses based on their sequential position rather than their actual quality or
content [100]
• Verbosity bias - Models show preference for longer, more detailed responses regardless of the actual content quality
or relevance [101]
• Self-enhancement bias - Models demonstrate a tendency to rate their own outputs more favorably compared to
outputs from other sources [102]
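
Position bias in particular can be partially controlled for by judging each pair of outputs twice with the order swapped, as in the following sketch; the judge interface is an assumed stand-in for a call to an evaluating model, not an existing API.

# Sketch: order-swapped pairwise evaluation to control for position bias.
# `judge` is an assumed callable that returns "A" or "B" for two candidates.
def debiased_preference(judge, output_a: str, output_b: str) -> str:
    first = judge(candidate_a=output_a, candidate_b=output_b)   # original order
    second = judge(candidate_a=output_b, candidate_b=output_a)  # swapped order
    if first == "A" and second == "B":
        return "A"    # consistent preference for output_a
    if first == "B" and second == "A":
        return "B"    # consistent preference for output_b
    return "tie"      # order-dependent verdicts indicate position bias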

The choice of evaluation methods, including stakeholder involvement where feasible, should be guided by available
resources, application context, and potential risks, with particular attention to cases where automated evaluation might
miss crucial qualitative or ethical considerations.

3.4 System configuration


System configuration in prompt engineering represents a critical aspect of responsible AI deployment, encompassing
various parameters and settings that influence how generative AI models process and respond to inputs. This configuration
process involves careful calibration of multiple technical elements to ensure optimal model performance, yet it also carries
a responsibility dimension. The foundation of system configuration lies in controlling the model's output generation
through key parameters. Temperature control stands as a fundamental configuration element, determining the balance
between creativity and predictability in model responses. Lower temperature settings produce more focused and
deterministic outputs, while higher values increase response variability and creativity [24, 103, 104]. Whenever accuracy and reliability matter from a standpoint of responsible prompt engineering, correspondingly conservative settings are called for. A
comprehensive implementation framework for system configuration should incorporate both technical and ethical
considerations. This includes establishing clear guidelines for parameter adjustment, implementing monitoring systems for
performance evaluation, and maintaining documentation of configuration changes. Such frameworks have proven essential
in ensuring consistent and responsible AI deployment across various applications. The impacts of such configuration choices should likewise be assessed from a perspective of responsibility.
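
As a minimal sketch of such a configuration choice, assuming an OpenAI-style chat completions client, an accuracy-critical deployment might pin conservative sampling parameters and record them for accountability; the model name and parameter values are illustrative, not recommendations.

from openai import OpenAI

client = OpenAI()
settings = {"temperature": 0.2, "top_p": 0.9}  # conservative, more deterministic
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user", "content": "List the statutory filing deadlines."}],
    **settings,
)
# Log the settings alongside the output so configuration changes stay auditable.
print(settings, response.choices[0].message.content)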

3.5 Model selection


Model selection forms a foundational pillar in responsible prompt engineering, where
strategic choices about AI model deployment must be guided by comprehensive performance metrics (benchmarks) that
encompass not only technical capabilities but also ethical, societal, and environmental considerations.

3.5.1 Model selection
The foundation of responsible prompt engineering begins with understanding model selection. When organizations or
developers choose an AI model, they are essentially selecting a specific AI. This choice will determine how the system
processes and responds to inputs. It is analogous to selecting the right tool for a specific task - just as one would not use a
sledgehammer to hang a picture frame, selecting an inappropriately powerful or insufficiently capable AI model can lead
to suboptimal or potentially harmful outcomes. The technical aspects of model selection extend beyond mere performance
metrics. While processing power and response speed are important considerations, responsible model selection must also
account for the model's ability to handle diverse inputs, its tendency to produce biased outputs, and its overall reliability.
The societal implications of model selection thus ripple far beyond technical specifications like latency or the
respective context window, touching on fundamental aspects of fairness, accessibility, and social justice.
A particularly crucial consideration is the environmental impact of model deployment [105, 106]. Larger language
models, while potentially more capable, require significant computational resources and energy consumption. This
environmental cost must be weighed against the actual benefits provided by more powerful models. In many cases, smaller,
more efficient models might serve the intended purpose while maintaining a more sustainable footprint. This applies also
to other environmental questions of resources like water or waste [105, 107, 108].
The intersection of model selection and prompt engineering also raises important questions about transparency and
accountability. When organizations implement AI systems, they must consider how their model choices affect their ability
to explain decisions, audit processes, and maintain accountability to stakeholders. This becomes particularly relevant in
contexts where AI systems influence important decisions about individuals' lives, such as in healthcare, employment, or
financial services. This is all the more important as the practices of model providers regarding transparency and open
source vary to a large extent [109]. Privacy considerations add another layer of complexity to responsible model selection.
Different applications vary in their ability to protect sensitive information and maintain data confidentiality regarding
training data. The degree of control an organization has over AI models, including the option for on-premises deployment,
can be a decisive factor. The relative importance of these considerations in decision-making processes varies significantly
based on specific use cases and organizational requirements.

3.5.2 Benchmarking
Benchmarks serve as essential tools for evaluating and comparing AI models' performance, helping organizations make
informed decisions about which models best suit their needs. In the context of prompt engineering, benchmarks assess how
well models respond to different types of instructions and their ability to generate accurate, relevant outputs. When
selecting models for prompt engineering applications, practitioners must consider multiple performance dimensions. These
include the model's ability to reason, solve complex problems, and generate natural-sounding content. However, it is crucial
to recognize that small differences in benchmark scores might not translate into significant real-world improvements, and
factors like cost-effectiveness and speed often prove more practical for specific use cases. Also, the nondeterministic nature
of generative AI systems presents unique challenges for benchmarking, as these models may produce different outputs
even with almost identical prompts. This variability necessitates multiple evaluation runs to capture the range of potential
behaviors and ensure consistent performance [110]. Continuous evaluation throughout development helps detect
unintended changes in output and maintain alignment with aspects of responsible AI.
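
A minimal sketch of such repeated-run evaluation follows; generate and passes are assumed stand-ins for a call to the model under test and a predetermined evaluation criterion.

# Sketch: estimate output consistency across repeated runs of the same prompt.
def consistency_rate(generate, passes, prompt: str, runs: int = 20) -> float:
    # Fraction of runs whose output satisfies the predefined criterion.
    results = [passes(generate(prompt)) for _ in range(runs)]
    return sum(results) / len(results)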
Beyond traditional performance metrics, there is growing recognition of the importance of benchmarking ethical, legal,
and social aspects of AI systems. These frameworks evaluate models across multiple dimensions, including fairness, bias
mitigation, and social impact. For instance, discrimination benchmarks assess how AI systems might affect different demographic groups, examining both direct and indirect forms of bias [111]. These evaluations help ensure that prompt
engineering practices do not perpetuate or amplify existing societal biases. Environmental sustainability has emerged as a
critical benchmark dimension for responsible AI development. The life cycle assessment of AI models encompasses data
collection, experimentation, training, and deployment phases, each contributing to the overall carbon footprint [112].
International initiatives are increasingly incorporating sustainability metrics into their AI evaluation frameworks [112].
From a prompt engineering perspective, such benchmarks can give first indications regarding sensitive issues. Especially in the case of environmental sustainability, they provide information about impacts that cannot otherwise be evaluated at the prompting level.

3.6 Prompt management: documentation


Documenting prompts has emerged as a crucial practice in the responsible development and deployment of AI systems.
Much like traditional software documentation, prompt documentation serves as a comprehensive record of how AI models
are instructed to perform specific tasks, ensuring transparency and reproducibility of results. This is particularly relevant
if generative AI is used in the context of decisions that need to be explained to their addressees, even if the system was just
used to prepare the decision. Documentation in prompt engineering encompasses recording not only the prompts
themselves but also their intended purposes, outcomes, and iterations. This practice is particularly vital because prompt
outputs can vary significantly across different models, sampling settings, and even different versions of the same model
[113]. By maintaining detailed records, organizations can track the evolution of their prompts, understand what works and
what does not, and ensure consistency in AI interactions. A comprehensive prompt documentation system typically
captures several key elements. The fundamental components include the prompt's name or identifier, its version history,
creation and modification dates, the specific AI model used, and detailed performance notes [114]. This basic information
helps teams maintain oversight of their AI interactions and enables systematic improvement of prompt effectiveness.
Organizations typically employ two primary approaches to prompt documentation. The reduced documentation method
focuses on tracking basic elements like AI tools used and their general purposes, while extensive documentation captures
complete prompt-output pairs and detailed chat histories [114]. For practical implementation, prompts should be stored in
easily accessible text formats rather than screenshots, with proper version control systems in place to track modifications
[114]. Effective prompt documentation requires a systematic approach to recording and organizing information. Teams
should maintain a centralized repository, such as a spreadsheet, where each prompt's complete history can be tracked [113].
This documentation should include not just the prompt text but also contextual information about its purpose, any specific
parameters or settings used, and notes about its performance in different scenarios [114].
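
A minimal sketch of such a record, mirroring the elements named above (identifier, version, model, parameters, performance notes), could look as follows; the exact schema is an illustrative assumption.

from dataclasses import dataclass, field

@dataclass
class PromptRecord:
    # Core identification and provenance fields for one prompt version.
    name: str
    version: str
    model: str                 # exact model and version the prompt was run on
    prompt_text: str
    purpose: str
    parameters: dict = field(default_factory=dict)  # e.g. temperature, top_p
    performance_notes: str = ""
    created: str = ""
    modified: str = ""

record = PromptRecord(
    name="casework-summary",
    version="1.3.0",
    model="gpt-4o-2024-08-06",   # illustrative
    prompt_text="You are a benefits caseworker. Summarize ...",
    purpose="Summarize application files for applicants",
    parameters={"temperature": 0.2, "top_p": 0.9},
    performance_notes="Spot-checked for demographic bias on a Jan 2025 test set.",
)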
Documentation plays a vital role in ensuring accountability and transparency in AI systems. By maintaining detailed
records of prompts and their outcomes, organizations can better understand how their AI systems react, identify potential
biases, and make necessary adjustments to improve fairness and accuracy [115]. This practice also facilitates collaboration
among team members and helps maintain consistency in AI interactions across different applications and use cases. Even
if generative AI is not directly tasked with making a decision, but only used as part of a decision support system, documentation of prompts might be valuable in order to explain how the decision came about. This is evidenced in Art. 86
of the EU AI Act, which provides for a right to explanation when the decision is taken “on the basis of the output from a
high-risk AI system”. This includes an explanation “of the role of the AI system in the decision-making procedure and the
main elements of the decision taken.” When using generative AI, a very good and tangible way to communicate how the
system was used is to include the prompts or parts of them.

4 CONCLUSIONS
Responsible prompt engineering has emerged as a critical framework for ensuring that generative artificial intelligence
systems serve society's needs while minimizing potential harms. As these AI systems become increasingly powerful and
ubiquitous, the way we instruct and interact with them through prompts carries profound implications for fairness,
accountability, and transparency (2.1). The analysis reveals that effective prompt engineering requires a delicate balance
between technical precision and ethical consciousness, combining systematic rigor with a nuanced understanding of social
impact. Through examination of real-world practices, this article demonstrates how responsible prompt engineering serves
as a crucial bridge between AI development and deployment, enabling organizations to steer AI outputs without modifying
underlying model architectures (2.2).
The research highlights five interconnected components essential for responsible prompt engineering: prompt design,
system selection, system configuration, performance evaluation, and prompt management. Each component plays a vital
role in promoting improved societal outcomes while mitigating potential risks. The framework emphasizes the importance
of documentation, systematic evaluation, and careful consideration of ethical implications throughout the prompt
engineering process (3). Thus, this work contributes to the growing discourse on AI responsibility by providing practical
guidelines for implementing responsible prompt engineering practices. The findings suggest that organizations must move
beyond viewing prompt engineering as merely a technical skill and instead recognize it as a crucial component of
responsible AI deployment. As generative AI continues to evolve, the principles and practices outlined in this article offer
a foundation for ensuring these powerful tools serve society's needs while upholding ethical standards and promoting
fairness. The implications of this research extend beyond technical implementation, touching on fundamental questions of
accountability, transparency, and social justice in AI systems. As we continue to integrate these technologies into various
aspects of society, responsible prompt engineering will play an increasingly vital role in shaping how AI systems interact
with and impact human lives (2.2, 3.5.1).
The growing recognition of prompt engineering as a core competency in AI literacy underscores its importance beyond
technical domains [116, 117]. As educational institutions and organizations incorporate prompt engineering into their
curricula and training programs [83, 118], the need for responsible practices becomes increasingly apparent. Furthermore,
prompt engineering also serves as an experimental methodology to systematically explore and document both the
capabilities and limitations of generative AI systems, enabling organizations to better understand their practical boundaries
and potential applications. This evolution reflects a broader shift in how we approach AI development, moving from purely
technical optimization toward a more holistic understanding that encompasses ethical considerations and societal impacts.
The analysis reveals that responsible prompt engineering represents both an opportunity and a necessity. As an opportunity,
it offers a practical framework for embedding ethical considerations directly into AI interactions without requiring
modifications to underlying model architectures. As a necessity, it provides essential guardrails for ensuring that AI
systems serve societal needs while minimizing potential harms. This framework can function as an aid in prompt
engineering, for example when building master prompts to organize knowledge around what has worked and how the
components interact with each other. However, it can also aid further research in this space. There remains significant
potential for deeper investigation into specific aspects of implementation, evaluation metrics, and long-term impacts.
Future work might explore how responsible prompt engineering practices evolve alongside advancing AI capabilities, and
how these practices can be effectively scaled across different organizational contexts and application domains.

ACKNOWLEDGMENTS
[OMITTED FOR REASONS OF ANONYMIZATION]
The following AI models have been used for improving language and style of this contribution including spelling or
grammar checks and corrections, avoiding repetitions, translating, improving the order of the arguments and summarizing
parts of this text and other texts to be paraphrased and cited: Claude 3.5 Sonnet, deepl.com, ChatGPT 4o, Grammarly,
Microsoft Word

REFERENCES

[1] Danielle Allen and E. G. Weyl. 2024. The Real Dangers of Generative AI. Journal of Democracy 35, 1, 147–162. DOI:
https://doi.org/10.1353/jod.2024.a915355.
[2] Julia Black. 2012. The Role of Risk in Regulatory Processes. In The Oxford handbook of regulation, Robert
Baldwin, Martin Cave and Martin Lodge, Eds. Oxford University Press, Oxford U. K., 302–348. DOI:
https://doi.org/10.1093/oxfordhb/9780199560219.003.0014.
[3] Andrés Domínguez Hernández, Shyam Krishna, Antonella M. Perini, Michael Katell, S. J. Bennett, Ann Borda,
Youmna Hashem, Semeli Hadjiloizou, Sabeehah Mahomed, Smera Jayadeva, Mhairi Aitken, and David Leslie. 2024.
Mapping the individual, social and biospheric impacts of Foundation Models. In Proceedings of the 2024 ACM
Conference on Fairness, Accountability, and Transparency, 776–796. DOI: https://doi.org/10.1145/3630106.3658939.
[4] Partha P. Ray. 2023. ChatGPT: A comprehensive review on background, applications, key challenges, bias,
ethics, limitations and future scope. Internet of Things and Cyber-Physical Systems 3, 121–154. DOI:
https://doi.org/10.1016/j.iotcps.2023.04.003.
[5] BBC. 2024. Google to fix AI picture bot after 'woke' criticism. BBC News (Feb. 2024).
[6] David Gilbert. 2024. Google’s ‘Woke’ Image Generator Shows the Limitations of AI. wired (Feb. 2024).
[7] Nico Grant. 2024. Google Says It Fixed Its A.I. Image Generator. The New York Times (Aug. 2024).
[8] Prabhakar Raghavan. 2024. Gemini image generation got it wrong. We'll do better. Google (Feb. 2024).
[9] Daniel J. Bogiatzis-Gibbons. 2024. Beyond Individual Accountability: (Re-)Asserting Democratic Control of AI.
In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. Association for
Computing Machinery, 74–84. DOI: https://doi.org/10.1145/3630106.3658541.
[10] Gloria Miller. 2022. Stakeholder-accountability model for artificial intelligence projects. Journal of Economics
and Management, 446–494.
[11] Zoe Porter, Annette Zimmermann, Phillip Morgan, John McDermid, Tom Lawton, and Ibrahim Habli. 2022.
Distinguishing two features of accountability for AI technologies. Nat Mach Intell 4, 9, 734–736. DOI:
https://doi.org/10.1038/s42256-022-00533-0.
[12] Stephen C. Slota, Kenneth R. Fleischmann, Sherri Greenberg, Nitin Verma, Brenna Cummings, Lan Li, and
Chris Shenefiel. 2023. Many hands make many fingers to point: challenges in creating accountable AI. AI & Soc
38, 4, 1287–1299. DOI: https://doi.org/10.1007/s00146-021-01302-0.
[13] Trisha Greenhalgh, Sally Thorne, and Kirsti Malterud. 2018. Time to challenge the spurious hierarchy of
systematic over narrative reviews? European Journal of Clinical Investigation 48, 6, e12931. DOI:
https://doi.org/10.1111/eci.12931.
[14] Dave Harris. 2020. Literature Review and Research Design: A Guide to Effective Research Practice. Research
Methods. Routledge, London, New York.
[15] Javeed Sukhera. 2022. Narrative Reviews: Flexible, Rigorous, and Practical. Journal of graduate medical
education 14, 4, 414–417. DOI: https://doi.org/10.4300/JGME-D-22-00480.1.
[16] Xavier Amatriain. 2024. Prompt Design and Engineering: Introduction and Advanced Methods.
[17] Sabit Ekin. 2023. Prompt Engineering For ChatGPT: A Quick Guide To Techniques, Tips, And Best Practices.
[18] Shubham Vatsal and Harsh Dubey. 2024. A Survey of Prompt Engineering Methods in Large Language Models
for Different NLP Tasks. DOI: https://doi.org/10.48550/arXiv.2407.12994.
[19] Joseph Lindley and Roger Whitham. 2024. From Prompt Engineering to Prompt Craft. DOI:
https://doi.org/10.1145/3689050.3704424.
[20] Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li,
Aayush Gupta, HyoJung Han, Sevien Schulhoff, Pranav S. Dulepet, Saurav Vidyadhara, Dayeon Ki, Sweta
Agrawal, Chau Pham, Gerson Kroiz, Feileen Li, Hudson Tao, Ashay Srivastava, Hevander Da Costa, Saloni
Gupta, Megan L. Rogers, Inna Goncearenco, Giuseppe Sarli, Igor Galynker, Denis Peskoff, Marine Carpuat,

Jules White, Shyamal Anadkat, Alexander Hoyle, and Philip Resnik. 2024. The Prompt Report: A Systematic
Survey of Prompting Techniques.
[21] James Phoenix and Mike Taylor. 2024. Prompt Engineering for Generative AI: Future-proof inputs for reliable
AI outputs at scale. O'Reilly, Sebastopol.
[22] Dan Cleary. 2025. Strategies for Managing Prompt Sensitivity and Model Consistency (January 2025). Retrieved
January 22, 2025 from https://www.prompthub.us/blog/strategies-for-managing-prompt-sensitivity-and-model-
consistency-.
[23] t2informatik GmbH. 2023. What is Prompt Engineering? Smartpedia (Jan. 2023).
[24] Sunil Ramlochan. 2024. Complete Guide to Prompt Engineering with Temperature and
Top-p. Prompt Engineering (Aug. 2024).
[25] Research Team. 2024. Unveiling the Secrets: The Art of Evaluating Prompt Engineering Strategies.
Threatshare.ai (May 2024).
[26] Zijie J. Wang, Aishwarya Chakravarthy, David Munechika, and Duen H. Chau. 2024. Wordflow: Social Prompt
Engineering for Large Language Models. DOI: https://doi.org/10.48550/arXiv.2401.14447.
[27] Denis Federiakin, Dimitri Molerov, Olga Zlatkin-Troitschanskaia, and Andreas Maur. 2024. Prompt engineering as a
new 21st century skill. Frontiers in Education 9. DOI: https://doi.org/10.3389/feduc.2024.1366434.
[28] Chirag Shah. 2024. From Prompt Engineering to Prompt Science With Human in the Loop. DOI:
https://doi.org/10.48550/arXiv.2401.04122.
[29] Navveen Balani. 2025. Ethical Prompt Engineering: A Pathway To Responsible AI Usage (January 2025).
Retrieved from https://navveenbalani.dev/index.php/articles/ethical-prompt-engineering-a-pathway-to-
responsible-ai-usage/.
[30] Vagner F. de Santana. 2024. Challenges and Opportunities for Responsible Prompting. CHI EA '24: Extended Abstracts
of the CHI Conference on Human Factors in Computing Systems, 1–4. DOI:
https://doi.org/10.1145/3613905.3636268.
[31] Adam M. Victor. 2024. Prompt Engineering: The Key to Ethical AI Conversations (2024).
[32] Andrei-Victor Chisca, Andrei-Cristian Rad, and Camelia Lemnaru. 2024. Prompting Fairness: Learning Prompts
for Debiasing Large Language Models. In Proceedings of the Fourth Workshop on Language Technology for
Equality, Diversity, Inclusion, 52–62.
[33] Satyam Dwivedi, Sanjukta Ghosh, and Shivam Dwivedi. 2023. Breaking the Bias: Gender Fairness in LLMs
Using Prompt Engineering and In-Context Learning. Rupkatha Journal on Interdisciplinary Studies in Humanities 15, 4. DOI:
https://doi.org/10.21659/rupkatha.v15n4.10.
[34] Felix Friedrich, Katharina Hämmerl, Patrick Schramowski, Jindrich Libovicky, Kristian Kersting, and Alexander
Fraser. 2024. Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering
May Not Help You.
[35] Rachel Skilton and Alison Cardinal. 2024. Inclusive Prompt Engineering: A Methodology for Hacking Biased AI
Image Generation. SIGDOC '24: Proceedings of the 42nd ACM International Conference on Design of
Communication, 76–80. DOI: https://doi.org/10.1145/3641237.3691655.
[36] Amalia Foka. 2024. A Framework for Critical Evaluation of Text-to-Image Models: Integrating Art Historical
Analysis, Artistic Exploration, and Critical Prompt Engineering.
[37] Zane Durante, Qiuyuan Huang, Naoki Wake, Ran Gong, Jae S. Park, Bidipta Sarkar, Rohan Taori, Yusuke Noda,
Demetri Terzopoulos, Yejin Choi, Katsushi Ikeuchi, Hoi Vo, Li Fei-Fei, and Jianfeng Gao. 2024. Agent AI:
Surveying the Horizons of Multimodal Interaction.
[38] Yonadav Shavit, Sandhini Agarwal, Miles Brundage, Steven Adler, Cullen O'Keefe, Rosie Campbell, Teddy Lee,
Pamela Mishkin, Tyna Eloundou, Alan Hickey, Katarina Slama, Lama Ahmad, Paul McMillan, Alex Beutel,
Alexandre Passos, and David G. Robinson. 2023. Practices for Governing Agentic AI Systems. Retrieved from https://
cdn.openai.com/papers/practices-for-governing-agentic-ai-systems.pdf.
[39] Seth Baum. 2017. A Survey of Artificial General Intelligence Projects for Ethics, Risk, and Policy. Global Catastrophic
Risk Institute Working Paper 17-1. Global Catastrophic Risk Institute. DOI: https://doi.org/10.2139/ssrn.3070741.

[40] Isaac Triguero, Daniel Molina, Javier Poyatos, Javier Del Ser, and Francisco Herrera. 2024. General Purpose
Artificial Intelligence Systems (GPAIS): Properties, definition, taxonomy, societal implications and responsible
governance. Information Fusion 103, 1–16. DOI: https://doi.org/10.1016/j.inffus.2023.102135.
[41] Justin D. Weisz, Michael Muller, Jessica He, and Stephanie Houde. 2023. Toward General Design Principles for
Generative AI Applications.
[42] Victoria Arkhurst. 2023. Security Risks, Bias, AI Prompt Engineering (2023).
[43] Maanak Gupta, Charankumar Akiri, Kshitiz Aryal, Eli Parker, and Lopamudra Praharaj. 2023. From ChatGPT to
ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy. IEEE Access 11, 80218–80245. DOI:
https://doi.org/10.1109/ACCESS.2023.3300381.
[44] European Parliament and Council of the European Union. 2024. Regulation (EU) 2024/1689 Laying down
harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain Union Legislative
Acts.
[45] J. D. Zamfirescu-Pereira, Richmond Y. Wong, Bjoern Hartmann, and Qian Yang. 2023. Why Johnny Can’t
Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts. In Proceedings of the 2023 CHI
Conference on Human Factors in Computing Systems. ACM Digital Library. Association for Computing
Machinery, New York, 1–21. DOI: https://doi.org/10.1145/3544548.3581388.
[46] Louie Giray. 2023. Prompt Engineering with ChatGPT: A Guide for Academic Writers. Annals of biomedical
engineering 51, 12, 2629–2633. DOI: https://doi.org/10.1007/s10439-023-03272-4.
[47] Jiajia Yuan, Peng Bao, Zifan Chen, Mingze Yuan, Jie Zhao, Jiahua Pan, Yi Xie, Yanshuo Cao, Yakun Wang,
Zhenghang Wang, Zhihao Lu, Xiaotian Zhang, Jian Li, Lei Ma, Yang Chen, Li Zhang, Lin Shen, and Bin Dong.
2023. Advanced prompting as a catalyst: Empowering large language models in the management of
gastrointestinal cancers. TIME 1, 2, 100019. DOI: https://doi.org/10.59717/j.xinn-med.2023.100019.
[48] Baha Rababah, Shang Wu, Matthew Kwiatkowski, Carson Leung, and Cuneyt G. Akcora. 2024. SoK: Prompt
Hacking of Large Language Models.
[49] Sander Schulhoff. 2025. Prompt Hacking: Understanding Types and Defenses for LLM Security (January 2025).
Retrieved January 22, 2025 from https://learnprompting.org/docs/prompt_hacking/introduction.
[50] Nextgov. 2017. The People Who Fight Hacking and Cybercrime Are Turning to Designers For Help. Retrieved June 8, 2017
from http://www.nextgov.com/cybersecurity/2017/05/people-who-fight-hacking-and-cybercrime-are-turning-
designers-help/138009/.
[51] Wouter Eggink, Deger Ozkaramanli, Cristina Zaga, and Nicola Liberati. 2020. Setting the Stage for Responsible
Design. In DRS2020: Synergy. Proceedings of DRS. Design Research Society. DOI:
https://doi.org/10.21606/drs.2020.116.
[52] Bernd C. Stahl, Simisola Akintoye, Lise Bitsch, Berit Bringedal, Damian Eke, Michele Farisco, Karin Grasenick,
Manuel Guerrero, William Knight, Tonii Leach, Sven Nyholm, George Ogoh, Achim Rosemann, Arleen Salles,
Julia Trattnig, and Inga Ulnicane. 2021. From Responsible Research and Innovation to responsibility by design.
Journal of Responsible Innovation 8, 2, 175–198. DOI: https://doi.org/10.1080/23299460.2021.1955613.
[53] David H. Guston. 2014. Understanding 'anticipatory governance'. Social Studies of Science 44, 2, 218–242. DOI:
https://doi.org/10.1177/0306312713508669.
[54] Daniel Sarewitz. 2011. Anticipatory Governance of Emerging Technologies. In The Growing Gap Between
Emerging Technologies and Legal-Ethical Oversight. The Pacing Problem, Gary E. Marchant, Braden R.
Allenby and Joseph R. Herkert, Eds. The International Library of Ethics, Law and Technology, 7. Springer
Netherlands, Dordrecht, 95–105.
[55] Keld Bødker, Finn Kensing, and Jesper Simonsen. 2004. Participatory IT design: Designing for business and
workplace realities. MIT Press, Cambridge, Mass.
[56] Elizabeth Bondi, Lily Xu, Diana Acosta-Navas, and Jackson A. Killian. 2021. Envisioning Communities: A
Participatory Approach Towards AI for Social Good. AIES '21: Proceedings of the 2021 AAAI/ACM Conference
on AI, Ethics, and Society, 425–436. DOI: https://doi.org/10.1145/3461702.3462612.
[57] Lee A. Bygrave. 2022. Security by Design: Aspirations and Realities in a Regulatory Context. OLR 8, 3, 126–
177. DOI: https://doi.org/10.18261/olr.8.3.2.

[58] Mireille Hildebrandt. 2015. Legal Protection by Design: Objections and Refutations. Legisprudence 5, 2, 223–
248.
[59] Sang M. Xie, Aditi Raghunathan, Percy Liang, and Tengyu Ma. 2021. An Explanation of In-context Learning as
Implicit Bayesian Inference.
[60] DigitalOcean. 2025. Prompt Engineering Best Practices: Tips, Tricks, and Tools (January 2025). Retrieved
January 22, 2025 from https://www.digitalocean.com/resources/articles/prompt-engineering-best-practices.
[61] Karolina Luzniak. 2023. Preventing Bias in Generative AI: How to Ensure Models’ Fairness and Accuracy?
(2023). Retrieved from https://neoteric.eu/blog/preventing-bias-in-generative-ai-how-to-ensure-models-fairness-
and-accuracy/.
[62] Promptfoo. 2024. Preventing Bias & Toxicity in Generative AI (2024). Retrieved from https://
www.promptfoo.dev/blog/prevent-bias-in-generative-ai/.
[63] Kieran Snyder. Mindful AI: Crafting prompts to mitigate the bias in generative AI. Retrieved January 21, 2025 from
https://textio.com/blog/mindful-ai-crafting-prompts-to-mitigate-the-bias-in-generative-ai.
[64] Alexander Pettersson and Melanie Paschke. 2024. Ethical Prompting for Generative AI: A how-to guide for
students at ETH Zurich. ETH Zurich, Zurich.
[65] Tobias Dengel. 2024. To Prevent Generative AI Hallucinations and Bias, Integrate Checks and Balances. Big
Data Wire (Aug. 2024).
[66] Harvard University IT. 2025. Getting started with prompts for text-based Generative AI tools (January 2025).
Retrieved January 22, 2025 from https://huit.harvard.edu/news/ai-prompts.
[67] Zheng Chu, Jingchang Chen, Qianglong Chen, Weijiang Yu, Tao He, Haotian Wang, Weihua Peng, Ming Liu,
Bing Qin, and Ting Liu. 2023. Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning:
Advances, Frontiers and Future.
[68] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny
Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.
[69] Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, Delip Rao, Eric Wong, Marianna Apidianaki, and Chris
Callison-Burch. 2023. Faithful Chain-of-Thought Reasoning.
[70] Alex Bennet. 2025. Examples of Ethical Prompting for ChatGPT and Artificial Intelligence (2025). Retrieved
January 17, 2025 from https://www.thoughtmedia.com/ethical-prompting/.
[71] Lance Eliot. 2023. Latest Prompt Engineering Technique Chain-Of-Verification Does A Sleek Job Of Keeping
Generative AI Honest And Upright (2023). Retrieved from https://www.forbes.com/sites/lanceeliot/2023/09/23/
latest-prompt-engineering-technique-chain-of-verification-does-a-sleek-job-of-keeping-generative-ai-honest-and-
upright/.
[72] Aditya Kuppa, Nikon Rasumov-Rahe, and Marc Voses. 2021. Chain of Reference prompting helps LLM to
think like a lawyer.
[73] Fangyi Yu, Lee Quartey, and Frank Schilder. 2022. Legal Prompting: Teaching a Language Model to Think Like
a Lawyer.
[74] Jay J. Caughron, Alison L. Antes, Cheryl K. Stenmark, Chase E. Thiel, Xiaoqian Wang, and Michael D.
Mumford. 2011. Sensemaking Strategies for Ethical Decision-making. Ethics & Behavior 21, 5, 351–366. DOI:
https://doi.org/10.1080/10508422.2011.604293.
[75] David Miller. 2024. Understanding Prompt Bias and How to Overcome It (2024). Retrieved from https://
futureskillsacademy.com/blog/prompt-bias-in-ai/.
[76] Vidisha Vijay. 2024. Mitigating AI bias with prompt engineering — putting GPT to the test (2024). Retrieved
from https://venturebeat.com/ai/mitigating-ai-bias-with-prompt-engineering-putting-gpt-to-the-test/.
[77] Tongshuang Wu, Michael Terry, and Carrie J. Cai. 2022. AI Chains: Transparent and Controllable Human-AI
Interaction by Chaining Large Language Model Prompts. In CHI Conference on Human Factors in Computing
Systems. ACM Digital Library. Association for Computing Machinery, New York,NY,United States, 1–22. DOI:
https://doi.org/10.1145/3491102.3517582.
[78] Miles Turpin, Julian Michael, Ethan Perez, and Samuel R. Bowman. 2023. Language Models Don't Always Say
What They Think: Unfaithful Explanations in Chain-of-Thought Prompting. arXiv.org abs/2305.04388. DOI:
https://doi.org/10.48550/arXiv.2305.04388.

[79] Chirag Agarwal, Sree H. Tanneru, and Himabindu Lakkaraju. 2024. Faithfulness vs. Plausibility: On the
(Un)Reliability of Explanations from Large Language Models.
[80] Riikka Koulu and Jörg Pohle. 2024. Legal Design Patterns: New Tools for Analysis and Translations Between
Law and Technology. DISO 3, 2, 1–13. DOI: https://doi.org/10.1007/s44206-024-00109-y.
[81] PromptHero. 2025. Search prompts for Stable Diffusion, ChatGPT & Midjourney (January 2025). Retrieved
January 17, 2025 from https://prompthero.com/.
[82] The Prompt Index. AI Prompt Database. Retrieved January 18, 2025 from https://www.thepromptindex.com/
prompt-database.
[83] Maastricht University. 2025. AI Prompt Library (January 2025). Retrieved January 17, 2025 from https://
www.maastrichtuniversity.nl/about-um/education-at-um/edlab/ai-education-maastricht-university/ai-prompt-
library.
[84] promptsty.com. 2024. Prompts For Artificial Intelligence Ethics: Essential Guide - PromptsTY (2024). Retrieved
January 17, 2025 from https://promptsty.com/prompts-for-artificial-intelligence-ethics/.
[85] Tian Y. Liu, Stefano Soatto, Matteo Marchi, Pratik Chaudhari, and Paulo Tabuada. 2024. Meanings and Feelings
of Large Language Models: Observability of Latent States in Generative AI.
[86] Anthropic. 2025. System Prompts - Anthropic (January 2025). Retrieved January 17, 2025 from https://
docs.anthropic.com/en/release-notes/system-prompts.
[87] GenAIScript. 2025. System Prompts (January 2025). Retrieved January 17, 2025 from https://microsoft.github.io/
genaiscript/reference/scripts/system/.
[88] Elias Bachaalany. 2025. TheBigPromptLibrary: A collection of prompts, system prompts and LLM instructions
(January 2025). Retrieved January 17, 2025 from https://github.com/0xeb/TheBigPromptLibrary.
[89] Vlad Alex. 2025. ChatGPT-System-Prompts (January 2025). Retrieved January 17, 2025 from https://
github.com/mustvlad/ChatGPT-System-Prompts.
[90] Krishna Ronanki, Beatriz Cabrero-Daniel, Jennifer Horkoff, and Christian Berger. 2023. Requirements
Engineering using Generative AI: Prompts and Prompting Patterns.
[91] Hari Subramonyam, Divy Thakkar, Jürgen Dieber, and Anoop Sinha. 2024. Content-Centric Prototyping of
Generative AI Applications: Emerging Approaches and Challenges in Collaborative Software Teams.
[92] Farouq Sammour, Jia Xu, Xi Wang, Mo Hu, and Zhenyu Zhang. 2024. Responsible AI in Construction Safety:
Systematic Evaluation of Large Language Models and Prompt Engineering.
[93] S. M. T. I. Tonmoy, S. M. M. Zaman, Vinija Jain, Anku Rani, Vipula Rawte, Aman Chadha, and Amitava Das.
2024. A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models.
[94] Yannan Li. 2024. Reduce AI Illusion Based on Data Science Technology and Prompt Engineering. Applied and
Computational Engineering.
[95] Douglas Schuler, Aki Namioka, and Lucy A. Suchman. 1993. Participatory Design: Principles and Practices.
CRC Press, Mahwah.
[96] Clay Spinuzzi. 2005. The Methodology of Participatory Design. Technical Communication 52, 2, 163–174.
[97] Andy Stirling. 2008. “Opening Up” and “Closing Down”: Power, Participation, and Pluralism in the Social
Appraisal of Technology. Science, Technology, & Human Values 33, 2, 262–294. DOI:
https://doi.org/10.1177/0162243907311265.
[98] Simon Thorne. Understanding and Evaluating Trust in Generative AI and Large Language Models for
Spreadsheets. In Proceedings of the European Spreadsheet Risks Interest Group, Simon Thorne, Ed. EuSpRiG,
65–78.
[99] Tomáš Ráčil, Petr Gallus, and Tomáš Šlajs. 2024. Efficiency Divide: Comparative Analysis of Human & Neural
Network Algorithm Development. European Conference on Cyber Warfare and Security, 683–692.
[100] Mehdi Ben Amor, Michael Granitzer, and Jelena Mitrović. 2023. Impact of Position Bias on Language Models in
Token Classification. DOI: https://doi.org/10.1145/3605098.3636126.
[101] Keita Saito, Akifumi Wachi, Koki Wataoka, and Youhei Akimoto. 2023. Verbosity Bias in Preference Labeling
by Large Language Models.
[102] Wenda Xu, Guanglei Zhu, Xuandong Zhao, Liangming Pan, Lei Li, and William Y. Wang. 2024. Pride and
Prejudice: LLM Amplifies Self-Bias in Self-Refinement.

[103] Lance Eliot. 2024. Knowing About Temperature Settings When Using Generative AI Is Hot Stuff For Prompt
Engineering (July 2024). Retrieved January 18, 2025 from https://www.forbes.com/sites/lanceeliot/2024/07/29/
knowing-about-temperature-settings-when-using-generative-ai-is-hot-stuff-for-prompt-engineering/.
[104] Sander Schulhoff. 2025. Understanding Temperature, Top P, and Maximum Length in LLMs (January 2025).
Retrieved January 22, 2025 from https://learnprompting.org/docs/intermediate/configuration_hyperparameters.
[105] Kate Crawford. 2024. Generative AI's environmental costs are soaring - and mostly secret. Nature 626, 8000,
693. DOI: https://doi.org/10.1038/d41586-024-00478-x.
[106] Mél Hogan. 2024. The Fumes of AI. Critical AI 2, 1. DOI: https://doi.org/10.1215/2834703X-11205231.
[107] Pengfei Li, Jianyi Yang, Mohammad A. Islam, and Shaolei Ren. 2023. Making AI Less "Thirsty": Uncovering
and Addressing the Secret Water Footprint of AI Models.
[108] Oxford Analytica. 2024. AI will exacerbate water scarcity. Emerald Expert Briefings. Emerald.
[109] Rishi Bommasani, Kevin Klyman, Sayash Kapoor, Shayne Longpre, Betty Xiong, Nestor Maslej, and Percy
Liang. 2024. The Foundation Model Transparency Index v1.1.
[110] Michelle Avery. 2024. From Benchmarks to Red-Teaming: Ensuring Robust and Responsible AI (2024).
Retrieved from https://www.willowtreeapps.com/insights/genai-benchmarking-and-red-teaming.
[111] Angelina Wang, Aaron Hertzmann, and Olga Russakovsky. 2024. Benchmark suites instead of leaderboards for
evaluating AI fairness. Patterns (New York, N.Y.) 5, 11, 101080. DOI:
https://doi.org/10.1016/j.patter.2024.101080.
[112] Restack. Benchmarking AI in Sustainability. Retrieved May 21, 2025 from https://www.restack.io/p/sustainable-
ai-answer-benchmarking-ai-sustainability-cat-ai.
[113] Lee Boonstra. 2024. Documenting Your Prompts: A Best Practice for Success (2024). Retrieved from https://
medium.com/google-cloud/documenting-your-prompts-a-best-practice-for-success-1278f2c0344e.
[114] S Kingson. 2024. How Technical Writers Can Master Prompt Engineering (2024). Retrieved from https://
document360.com/blog/prompt-engineering-for-technical-writers/.
[115] Thalia Khan, Albert Tanjaya, Jacob Pratt, and John Howell. Transparency Through Documentation: A Pathway
to Safer AI. Retrieved May 21, 2025 from https://partnershiponai.org/transparency-through-documentation-a-
pathway-to-safer-ai/.
[116] Leo S. Lo. 2023. The Art and Science of Prompt Engineering: A New Literacy in the Information Age. Internet
Reference Services Quarterly 27, 4, 203–210. DOI: https://doi.org/10.1080/10875301.2023.2227621.
[117] P. Korzyński, G. Mazurek, Pamela Krzypkowska, and Artur Kurasiński. 2023. Artificial intelligence prompt
engineering as a new digital competence: Analysis of generative AI technologies such as ChatGPT.
Entrepreneurial Business and Economics Review, 25–38.
[118] Gonzaga University School of Law. 2024. Generative AI & Legal Research: A guide for students and faculty on
using generative AI as a tool for legal research and writing (2024). Retrieved from https://
libguides.law.gonzaga.edu/c.php?g=1374374.
