WO2025219231A1

WO2025219231A1 - Computer-implemented means and methods for the de novo design of antibodies targeting a specific epitope

Info

Publication number: WO2025219231A1
Application number: PCT/EP2025/059986
Authority: WO
Inventors: Frederic Rousseau; Joost Schymkowitz; Rob VAN DER KANT; Zhongyao Zhang; Maarten Dewilde; Luis Serrano Pubul; Francisco Javier DELGADO BLANCO; Damiano CIANFERONI
Original assignee: Katholieke Universiteit Leuven; Vlaams Instituut voor Biotechnologie VIB; Institucio Catalana de Recerca i Estudis Avancats ICREA; Fundacio Privada Centre de Regulacio Genomica CRG
Current assignee: Katholieke Universiteit Leuven; Vlaams Instituut voor Biotechnologie VIB; Institucio Catalana de Recerca i Estudis Avancats ICREA; Fundacio Privada Centre de Regulacio Genomica CRG
Priority date: 2024-04-15
Filing date: 2025-04-11
Publication date: 2025-10-23
Anticipated expiration: 2026-10-15

Abstract

The present invention relates to methods and systems in the biomedical field, within the subfields of computational biology and bioinformatics. More specifically, the invention provides a computer-implemented algorithm that can produce a panel of antibodies directed to an epitope of choice.

Description

JoSc-FrRo/denovoabody/842

COMPUTER-IMPLEMENTED MEANS AND METHODS FOR THE DE NOVO DESIGN OF ANTIBODIES TARGETING A SPECIFIC EPITOPE

Field of the invention

The present invention relates to methods and systems in the biomedical field, within the subfields of computational biology and bioinformatics. More specifically, the invention provides a computer-implemented algorithm that can produce a panel of antibodies directed to an epitope of choice.

Introduction to the invention

Antibody-based therapeutics have emerged as pivotal agents against various diseases, substantially expanding the therapeutic options available to patients and clinicians. Over the last decade, the vast majority of biologic license applications approved by the FDA have been for therapeutic antibodies. Traditional methodologies for antibody discovery, rooted in animal immunization, hybridoma technologies and phage display, have been crucial in establishing the potential and positioning of antibody-based therapeutics. However, these approaches face fundamental challenges such as suboptimal developability, limited and unpredictable cross-reactivity of the resulting antibodies, and difficulty in targeting conserved binding epitopes. These limitations call for a shift towards innovative approaches that address the evolving demands of antibody development. Classical antibody discovery processes are often time-consuming and labor-intensive, and entail sequential stages that prioritize antigen binding over other critical parameters such as manufacturability and long-term stability. This sequential approach results in significant effort being spent in later stages to optimize these parameters independently, increasing costs and timelines and sometimes compromising the initial affinities.
Furthermore, ethical concerns regarding the use of animal models for immunization, and the challenges associated with immune responses in host organisms, pose additional hurdles. In this context, structure-based de novo antibody design has emerged as a promising approach, leveraging computational methods to predict and optimize antibody-antigen interactions. Central to this approach is the ability to accurately model the three-dimensional structures of antibodies and their target antigens, allowing rational design and optimization of binding interfaces. Through a combination of computational molecular docking, sequence optimization and experimental validation of the designed models, structure-based de novo antibody design has the potential to streamline antibody development and to facilitate the discovery of antibodies with tailor-made properties such as enhanced binding affinity and stability. In the prior art, US2020/168293 discloses a method for generating novel antibodies binding at naturally occurring protein-binding sites, guided by pre-identified interactions with a cognate binder. Thus, US2020/168293 uses structural data from a known binder of the target protein to select, from a database, a more focused set of candidate antibody structures with a relatively high average affinity for the same target epitope.

In the present invention, we approached structure-based de novo antibody design through two major in silico steps: first, a molecular docking step that brings a template antibody into proximity with a predefined epitope; and second, a design step in which a sequence is built that can accommodate binding to the epitope while retaining foldability of the antibody in isolation.
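As a rough illustration of the selection logic behind the design step described above (favorable binding to the epitope while the antibody stays foldable and non-aggregating), the sketch below filters and ranks candidate designs. It is a minimal sketch under stated assumptions, not the EvolveX implementation: the `Design` record, its field names, the threshold values and the use of hard cut-offs are all hypothetical. In practice the scores would come from tools such as FoldX (interaction energy, stability) and TANGO (aggregation propensity), which are named in the following paragraph; here they are simply given as numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """One in silico CDR design; field names and units are hypothetical placeholders."""
    name: str
    dg_binding: float     # interaction energy with the epitope, kcal/mol (lower = tighter)
    ddg_stability: float  # folding-energy change vs. the template, kcal/mol (lower = more stable)
    aggregation: float    # aggregation-propensity score, e.g. a TANGO total (lower = better)


def select_designs(designs, max_ddg=1.0, max_aggregation=0.0, top_n=10):
    """Keep designs that stay foldable and non-aggregating, then rank by binding energy.

    The default thresholds are illustrative assumptions, not values from the source.
    """
    kept = [
        d for d in designs
        if d.ddg_stability <= max_ddg and d.aggregation <= max_aggregation
    ]
    # Most favorable (most negative) interaction energy first.
    kept.sort(key=lambda d: d.dg_binding)
    return kept[:top_n]


if __name__ == "__main__":
    pool = [
        Design("d1", dg_binding=-12.4, ddg_stability=0.3, aggregation=0.0),
        Design("d2", dg_binding=-15.1, ddg_stability=2.5, aggregation=0.0),   # too destabilizing
        Design("d3", dg_binding=-10.8, ddg_stability=-0.4, aggregation=0.0),
        Design("d4", dg_binding=-14.0, ddg_stability=0.1, aggregation=35.0),  # aggregation-prone
    ]
    for d in select_designs(pool):
        print(d.name, d.dg_binding)
```

Hard cut-offs keep the example short; a weighted combined score would be an equally plausible way to trade binding against stability.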
The design step can be validated by removing the sequence information from a known complex of an antibody bound to a protein ligand and evaluating whether sequences can be designed that bind the same epitope, an approach termed re-paratoping. Here we report an in silico antibody design pipeline, termed EvolveX, for the generation of antibodies targeting predefined epitopes on protein targets. The EvolveX pipeline addresses the main in silico issues in the de novo design of antibodies against a predefined target: docking of the antibody onto the target, optimization of the structure of the bound antibody, and design of the sequences of the complementarity determining regions (CDRs) of the antibody, while maintaining good thermodynamic stability of the antibody and avoiding the introduction of aggregation propensity. Structure optimization is achieved using ModelX (Cianferoni D et al (2020) Bioinformatics 36(14): 4208-4210); interaction energy and antibody stability are evaluated with the empirical force field FoldX (Schymkowitz J et al (2005) Nucleic Acids Res. 33: W382-W388 and Delgado J et al (2019) Bioinformatics 35(20): 4168-4169); and aggregation propensity is evaluated with TANGO (Fernandez-Escamilla AM et al (2004) Nat. Biotechnol. 22: 1302-1306). The efficacy of the antibody design step is validated through a series of experimental assays, including phage display and biophysical characterization, demonstrating the successful design of VHHs starting from both high- and low-affinity protein complexes. We further explored the flexibility of the EvolveX pipeline by re-engineering a nanobody towards an unrelated protein target, the human interleukin-9 receptor alpha (hIL-9Rα) interface, obtaining binders with low nanomolar affinity. Overall, our findings underscore the potential of structure-based de novo antibody design as a powerful tool in biotechnology and drug discovery.

Figure legends

Figure 1.
Library design, screening, hit selection and validation of VHHs designed with EvolveX to bind mouse Vsig4. (A) Graphical representation of the EvolveX in silico pipeline, in which crystal structures of the high-affinity complex between mouse Vsig4 (mVsig4) and VHH_WT were used as templates to design in silico CDR2 and CDR3 sequences that should bind mVsig4 with high affinity while displaying good stability. A small library was subsequently screened using phage display. (B) Colonies picked after phage display selection were tested for binding to mVsig4 using AlphaScreen. VHH_WT was used as a positive control; colonies picked after two different rounds of selection were screened. (C) Off-rates, determined using BLI, of the top 12 unique sequences ranked by AlphaScreen signal to mVsig4. (D) Example SPR binding curves of 1 of the 4 hits binding to mVsig4 with low nanomolar affinity. (E) Affinity constants of all 4 hits and VHH_WT binding to mVsig4, determined using BLI and SPR. (F) ΔG values determined through curve fitting of the data shown in G. (G) Melting temperatures (T_m) for the 4 hits and VHH_WT, based on intrinsic tryptophan fluorescence (ITF) during a temperature ramp. (H) Aggregation onset temperatures (T_agg) for the 4 hits and VHH_WT, based on right-angle light scattering (RALS) during a temperature ramp. (I) Sequence alignment indicating the sequence differences compared to VHH_WT. CDR2 and CDR3 are indicated by blue boxes, positions identical to VHH_WT are indicated with a dot (.), and positions explored with the EvolveX pipeline are displayed in bold on the sequence of VHH_WT.

Figure 2. Binding curves (colored) of VHH_WT (A) and VHH_h1-VHH_h5 (B-F) against hVsig4 (BLI), with fitting curves (black).

Figure 3. Hotspot grafting method for docking VHH templates onto hIL-9R models at the predicted interface with IL-2RG. A) Hotspots are identified on epitopes of known protein ligands using FoldX.
B) Side chains of hotspots are superimposed on potential hotspots on the target epitope. C) The original protein ligand is removed, leaving the docked template on the target epitope. D) FoldX is used to model energetically favorable side-chain conformations, using the Rotacloud command. E) The same template is docked on all the different conformations. F) Not only is the backbone close, but key interactions with the VHH are grafted along. G) The IL-9 signaling complex. IL-9 binds with high affinity to IL-9Rα; this heterodimer then interacts with IL-2RG with low affinity. Binding the interface between IL-9Rα and IL-2RG should inhibit signaling. H) Hotspots at this interface were selected for hotspot-based docking. I) Clusters of preferable docks remain and are subjected to antibody design.

Figure 4. Binding curves of VHH_i1-VHH_i8 (A-H) against hIL-9Rα (BLI).

Figure 5. Library design, screening, hit selection and validation of VHHs designed with EvolveX to bind human Vsig4. (A) Graphical representation of the EvolveX in silico pipeline, in which crystal structures of the low-affinity complex between human Vsig4 (hVsig4) and VHH_WT were used as templates to design in silico CDR2 and CDR3 sequences that should bind hVsig4 with high affinity while displaying good stability. A small library was subsequently screened using phage display. (B) Colonies picked after phage display selection were tested for binding to hVsig4 using ELISA. VHH_WT was used as a positive control; colonies picked after two different rounds of selection were screened. (C) Off-rates of the 82 unique sequences, determined using BLI. (D) Example SPR binding curve of 1 of the 5 hits (VHH_h4) binding to hVsig4 with low nanomolar affinity. (E) Affinity constants of all 5 hits and VHH_WT binding to hVsig4, determined using BLI and SPR. (F) ΔG values determined through curve fitting of the data shown in G.
(G) Melting temperatures (T_m) for the 5 hits and VHH_WT, based on intrinsic tryptophan fluorescence (ITF) during a temperature ramp. (H) Aggregation onset temperatures (T_agg) for the 5 hits and VHH_WT, based on right-angle light scattering (RALS) during a temperature ramp. (I) Sequence alignment indicating the sequence differences compared to VHH_WT. CDR2 and CDR3 are indicated by blue boxes, positions identical to VHH_WT are indicated with a dot (.), and positions explored with the EvolveX pipeline are displayed in bold on the sequence of VHH_WT.

Figure 6. De novo in silico design of single-digit nanomolar binders to human interleukin-9 receptor alpha (hIL-9Rα) using hotspot grafting and antibody design. (A) Graphical representation of the EvolveX in silico pipeline, in which crystal structures of the low-affinity complex between human interleukin-9 receptor alpha (hIL-9Rα) and VHH_WT were used as templates to dock VHH_WT onto an AlphaFold model of hIL-9Rα using one hotspot as an anchoring point. Subsequently, all three CDRs were designed in silico to create sequences able to bind hIL-9Rα with high affinity while displaying good stability. A small library was subsequently screened using phage display. (B) Colonies picked after phage display selection were tested for binding to hIL-9Rα using AlphaScreen. VHH_WT was used as a positive control; colonies picked after two different rounds of selection were screened. (C) Off-rates of the 82 unique sequences, determined using BLI. (D) Example BLI binding curve of 1 of the 5 hits (VHH_i1) binding to hIL-9Rα with low nanomolar affinity. (E) Affinity constants of all 5 hits and VHH_WT binding to hIL-9Rα, determined using BLI. (F) Hits were tested for binding to mVsig4, hVsig4 and hIL-9Rα using AlphaScreen, confirming specificity. (G) Thermal unfolding curve for the highest-affinity hit, based on intrinsic tryptophan fluorescence (ITF) during a temperature ramp.

Table 1.
Biophysical characterization data of VHH_h1-VHH_h5.

Table 2. Overview of IL-9Rα binders with their respective binding affinities.

Detailed description of the invention

Definitions

In order that the present description can be more readily understood, certain terms are first defined. Additional definitions are set forth throughout the detailed description. The present invention is described with respect to particular embodiments and with reference to certain drawings, but the invention is not limited thereto and is limited only by the claims. Any reference signs in the claims shall not be construed as limiting the scope. The drawings described are only schematic and are non-limiting. In the drawings, the size of some of the elements may be exaggerated and not drawn to scale for illustrative purposes. It is to be noted that the term “a” or “an” entity refers to one or more of that entity; for example, “a nucleotide sequence” is understood to represent one or more nucleotide sequences. As such, the terms “a” (or “an”), “one or more” and “at least one” can be used interchangeably herein. Furthermore, “and/or” where used herein is to be taken as specific disclosure of each of the two specified features or components with or without the other. Thus, the term “and/or” as used in a phrase such as “A and/or B” herein is intended to include “A and B”, “A or B”, “A” (alone), and “B” (alone). Likewise, the term “and/or” as used in a phrase such as “A, B, and/or C” is intended to encompass each of the following aspects: A, B, and C; A, B, or C; A or C; A or B; B or C; A and C; A and B; B and C; A (alone); B (alone); and C (alone). Where an indefinite or definite article is used when referring to a singular noun, e.g. “a”, “an” or “the”, this includes a plural of that noun unless something else is specifically stated.
Furthermore, the terms first, second, third and the like in the description and in the claims are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein. It is understood that wherever aspects or embodiments are described herein with the language “comprising”, otherwise analogous aspects or embodiments described in terms of “consisting of” and/or “consisting essentially of” are also provided. Where the term “comprising” is used in the present description and claims, it does not exclude other elements or steps. Unless specifically defined herein, all terms used herein have the same meaning as they would to one skilled in the art of the present invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure is related. For example, the Concise Dictionary of Biomedicine and Molecular Biology, Juo, Pei-Show, 2nd ed., 2002, CRC Press; The Dictionary of Cell and Molecular Biology, 3rd ed., 1999, Academic Press; and the Oxford Dictionary of Biochemistry and Molecular Biology, Revised, 2000, Oxford University Press, provide one of skill with a general dictionary of many of the terms used in this disclosure. Practitioners are particularly directed to Sambrook et al., Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Press, Plainview, New York (2012); and Ausubel et al., Current Protocols in Molecular Biology (Supplement 100), John Wiley & Sons, New York (2012), for definitions and terms of the art.
The definitions provided herein should not be construed to have a scope less than understood by a person of ordinary skill in the art. Units, prefixes, and symbols are denoted in their Système International d'Unités (SI) accepted form. Numeric ranges are inclusive of the numbers defining the range. Unless otherwise indicated, nucleotide sequences are written left to right in 5' to 3' orientation. Amino acid sequences are written left to right in amino to carboxy orientation. The headings provided herein are not limitations of the various aspects of the disclosure, which can be had by reference to the specification as a whole. Accordingly, the terms defined immediately below are more fully defined by reference to the specification in its entirety. The term “about” is used herein to mean approximately, roughly, around, or in the region of. When the term “about” is used in conjunction with a numerical range, it modifies that range by extending the boundaries above and below the numerical values set forth. In general, the term “about” can modify a numerical value above and below the stated value by a variance. For example, a dissociation rate constant (k_off) of about 1.50 × 10^-2 s^-1 implies that k_off is within the range of 1.45 × 10^-2 to 1.55 × 10^-2 s^-1. The term “antibody” as used herein refers to an immunoglobulin (Ig) molecule, or a molecule comprising an immunoglobulin (Ig) domain, that specifically binds an antigen. “Antibodies” can be intact immunoglobulins derived from natural or recombinant sources, and can be immunoreactive portions of intact immunoglobulins; non-limiting examples are VHHs, Fabs and scFvs. Antibodies are typically tetramers of immunoglobulin molecules. The term “immunoglobulin (Ig) domain” as used herein refers to a globular region of an antibody chain, or to a polypeptide that essentially consists of such a globular region.
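The k_off example in the definition of “about” above can be expressed as an explicit window check. The sketch below assumes the variance is given as an absolute half-width (0.05 × 10^-2 s^-1 in the example); the text does not fix a general rule, so the half-width is a caller-supplied, hypothetical parameter. It also includes the standard 1:1 binding relation K_D = k_off / k_on, which links the off-rates and affinity constants reported in the figure legends; the k_on value in the usage lines is illustrative only.

```python
def about_window(nominal, half_width):
    """(low, high) bounds for 'about nominal', given an absolute half-width (assumption)."""
    return nominal - half_width, nominal + half_width


def is_about(value, nominal, half_width):
    """True if value lies inside the inclusive 'about' window."""
    low, high = about_window(nominal, half_width)
    return low <= value <= high


def dissociation_constant(k_off, k_on):
    """Equilibrium K_D in M from k_off (1/s) and k_on (1/(M*s)), assuming 1:1 binding."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


if __name__ == "__main__":
    print(about_window(1.50e-2, 0.05e-2))          # roughly (0.0145, 0.0155)
    print(is_about(1.52e-2, 1.50e-2, 0.05e-2))     # True
    print(dissociation_constant(1.5e-2, 1.5e6))    # roughly 1e-08 M, i.e. about 10 nM
```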
Immunoglobulin domains are characterized in that they retain the immunoglobulin fold (Ig fold as named herein) characteristic of antibody molecules, which consists of a two-layer sandwich of about seven to nine antiparallel β-strands arranged in two β-sheets, optionally stabilized by a conserved disulphide bond. The term “immunoglobulin (Ig) domain” includes “immunoglobulin constant domain” and “immunoglobulin variable domain” (abbreviated as “IVD”), wherein the latter means an immunoglobulin domain essentially consisting of four “framework regions”, which are referred to in the art and herein below as “framework region 1” or “FR1”, “framework region 2” or “FR2”, “framework region 3” or “FR3”, and “framework region 4” or “FR4”, respectively; these framework regions are interrupted by three “complementarity determining regions” or “CDRs”, which are referred to in the art and herein below as “complementarity determining region 1” or “CDR1”, “complementarity determining region 2” or “CDR2”, and “complementarity determining region 3” or “CDR3”, respectively. Thus, the general structure or sequence of an immunoglobulin variable domain can be indicated as follows: FR1 - CDR1 - FR2 - CDR2 - FR3 - CDR3 - FR4. It is the immunoglobulin variable domain(s) (IVDs) that confer specificity to an antibody for the antigen by carrying the antigen-binding site. An “immunoglobulin domain” of this application also includes “immunoglobulin single variable domains” (abbreviated as “ISVD”), equivalent to the term “single variable domains”, and defines molecules wherein the antigen binding site is present on, and formed by, a single immunoglobulin domain. This sets immunoglobulin single variable domains apart from “conventional” immunoglobulins or their fragments, wherein two immunoglobulin domains, in particular two variable domains, interact to form an antigen binding site.
Typically, in conventional immunoglobulins, a heavy chain variable domain (VH) and a light chain variable domain (VL) interact to form an antigen binding site. In this case, the complementarity determining regions (CDRs) of both VH and VL will contribute to the antigen binding site, i.e. a total of 6 CDRs will be involved in antigen binding site formation. In view of the above definition, the antigen-binding domain of a conventional 4-chain antibody (such as an IgG, IgM, IgA, IgD or IgE molecule; known in the art) or of a Fab fragment, a F(ab')2 fragment, an Fv fragment such as a disulphide linked Fv or a scFv fragment, or a diabody (all known in the art) derived from such conventional 4-chain antibody, would normally not be regarded as an immunoglobulin single variable domain, as, in these cases, binding to the respective epitope of an antigen would normally not occur by one (single) immunoglobulin domain but by a pair of (associated) immunoglobulin domains such as light and heavy chain variable domains, i.e., by a VH-VL pair of immunoglobulin domains, which jointly bind to an epitope of the respective antigen. In contrast, immunoglobulin single variable domains are capable of specifically binding to an epitope of the antigen without pairing with an additional immunoglobulin variable domain. The binding site of an immunoglobulin single variable domain is formed by a single VH/VHH or VL domain. Hence, the antigen binding site of an immunoglobulin single variable domain is formed by no more than three CDRs.
As such, the single variable domain may be a light chain variable domain sequence (e.g., a VL-sequence) or a suitable fragment thereof; or a heavy chain variable domain sequence (e.g., a VH-sequence or VHH sequence) or a suitable fragment thereof; as long as it is capable of forming a single antigen binding unit (i.e., a functional antigen binding unit that essentially consists of the single variable domain, such that the single antigen binding domain does not need to interact with another variable domain to form a functional antigen binding unit). In one embodiment of the invention, the immunoglobulin single variable domains are heavy chain variable domain sequences (e.g., a VH-sequence); more specifically, the immunoglobulin single variable domains can be heavy chain variable domain sequences that are derived from a conventional four-chain antibody or heavy chain variable domain sequences that are derived from a heavy chain antibody. For example, the immunoglobulin single variable domain may be a (single) domain antibody (or an amino acid sequence that is suitable for use as a (single) domain antibody), a “dAb” or dAb (or an amino acid sequence that is suitable for use as a dAb) or a Nanobody (as defined herein, and including but not limited to a VHH); other single variable domains, or any suitable fragment of any one thereof. In particular, the immunoglobulin single variable domain may be a Nanobody (as defined herein) or a suitable fragment thereof. Note: Nanobody®, Nanobodies® and Nanoclone® are registered trademarks of Ablynx N.V. For a general description of Nanobodies, reference is made to the further description below, as well as to the prior art cited herein, such as e.g. described in WO2008/020079.
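The FR1 - CDR1 - FR2 - CDR2 - FR3 - CDR3 - FR4 layout defined above, which a single variable domain such as a VHH also follows, can be made concrete by cutting a sequence at given CDR boundaries. This is a minimal sketch that assumes the CDR boundaries are already known (for example from a numbering scheme such as Kabat or IMGT, whose CDR boundaries differ); it does not annotate CDRs itself, and the toy sequence in the usage check is not a real antibody.

```python
REGION_NAMES = ["FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4"]


def split_variable_domain(sequence, cdr_spans):
    """Split a variable-domain sequence into FR1-CDR1-FR2-CDR2-FR3-CDR3-FR4.

    cdr_spans: three (start, end) pairs, 0-based and end-exclusive,
    in N- to C-terminal order (boundaries supplied by the caller).
    """
    if len(cdr_spans) != 3:
        raise ValueError("expected exactly three CDR spans")
    cuts = [0]
    for start, end in cdr_spans:
        # Each CDR must start after the previous region ends and stay inside the sequence.
        if not (cuts[-1] <= start <= end <= len(sequence)):
            raise ValueError("CDR spans must be ordered, non-overlapping and inside the sequence")
        cuts.extend([start, end])
    cuts.append(len(sequence))
    # Eight cut points give seven consecutive segments: four FRs interleaved with three CDRs.
    segments = [sequence[a:b] for a, b in zip(cuts, cuts[1:])]
    return dict(zip(REGION_NAMES, segments))
```

Because the segments are contiguous, joining the returned values in order always reproduces the input sequence.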
Immunoglobulin domains herein also include “VHH domains”, also known as VHHs, VHH antibody fragments and VHH antibodies, which were originally described as the antigen-binding immunoglobulin (Ig) (variable) domain of “heavy chain antibodies” (i.e., of “antibodies devoid of light chains”; Hamers-Casterman et al (1993) Nature 363: 446-448). The term “VHH domain” has been chosen to distinguish these variable domains from the heavy chain variable domains that are present in conventional 4-chain antibodies (which are referred to herein as “VH domains”) and from the light chain variable domains that are present in conventional 4-chain antibodies (which are referred to herein as “VL domains”). For a further description of VHHs and Nanobodies, reference is made to the review article by Muyldermans (Reviews in Molecular Biotechnology 74: 277-302, 2001), as well as to the following patent applications, which are mentioned as general background art: WO 94/04678, WO 95/04079 and WO 96/34103 of the Vrije Universiteit Brussel; WO 94/25591, WO 99/37681, WO 00/40968, WO 00/43507, WO 00/65057, WO 01/40310, WO 01/44301, EP 1134231 and WO 02/48193 of Unilever; WO 97/49805, WO 01/21817, WO 03/035694, WO 03/054016 and WO 03/055527 of the Vlaams Instituut voor Biotechnologie (VIB); WO 03/050531 of Algonomics N.V. and Ablynx N.V.; WO 01/90190 by the National Research Council of Canada; WO 03/025020 (= EP 1433793) by the Institute of Antibodies; as well as WO 04/041867, WO 04/041862, WO 04/041865, WO 04/041863, WO 04/062551, WO 05/044858, WO 06/40153, WO 06/079372, WO 06/122786, WO 06/122787 and WO 06/122825 by Ablynx N.V., and the further published patent applications by Ablynx N.V. As described in these references, Nanobodies (in particular VHH sequences and partially humanized Nanobodies) can in particular be characterized by the presence of one or more “Hallmark residues” in one or more of the framework sequences.
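The “Hallmark residues” mentioned above sit at framework positions; the FR2 positions most often discussed (37, 44, 45 and 47, Kabat numbering) are named later in this section. A minimal sketch, assuming residues have already been assigned Kabat numbers, that reports where a sequence departs from the residues usually found at those positions in human VH domains (V37, G44, L45, W47, the FR2 residues that face VL). These reference residues are general background knowledge rather than values taken from this document, and a difference only flags a position for inspection; it is not by itself a classification rule.

```python
# Residues typically found in human VH at the FR2 Hallmark positions (Kabat numbering).
HUMAN_VH_FR2 = {"37": "V", "44": "G", "45": "L", "47": "W"}


def hallmark_differences(residues_by_kabat):
    """Return {position: (observed, human_vh)} for FR2 Hallmark positions that differ.

    residues_by_kabat: dict mapping a Kabat label (e.g. '44') to a one-letter residue.
    Positions missing from the input are skipped rather than reported.
    """
    diffs = {}
    for pos, human in HUMAN_VH_FR2.items():
        observed = residues_by_kabat.get(pos)
        if observed is not None and observed != human:
            diffs[pos] = (observed, human)
    return diffs
```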
A further description of Nanobodies, including humanization and/or camelization of Nanobodies, as well as other modifications, parts or fragments, derivatives or “Nanobody fusions”, multivalent constructs (including some non-limiting examples of linker sequences) and different modifications to increase the half-life of Nanobodies, and their preparation, can be found e.g. in WO 08/101985 and WO 08/142164. “Domain antibodies”, also known as “Dabs”, “Domain Antibodies”, and “dAbs” (the terms “Domain Antibodies” and “dAbs” being used as trademarks by the GlaxoSmithKline group of companies) have been described in e.g. EP 0368684, Ward et al. (Nature 341: 544-546, 1989), Holt et al. (Trends in Biotechnology 21: 484-490, 2003) and WO 03/002609, as well as for example WO 04/068820, WO 06/030220, WO 06/003388 and other published patent applications of Domantis Ltd. Domain antibodies essentially correspond to the VH or VL domains of non-camelid mammals, in particular of human 4-chain antibodies. In order to bind an epitope as a single antigen binding domain, i.e., without being paired with a VL or VH domain, respectively, specific selection for such antigen binding properties is required, e.g. by using libraries of human single VH or VL domain sequences. Domain antibodies have, like VHHs, a molecular weight of approximately 13 to approximately 16 kDa and, if derived from fully human sequences, do not require humanization for e.g. therapeutic use in humans. It should also be noted that single variable domains can be derived from certain species of shark (for example, the so-called “IgNAR domains”, see for example WO 05/18629).
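The molecular weight of approximately 13 to 16 kDa quoted above follows from domain length, since each residue adds on average somewhat over 100 Da. A minimal sketch that estimates the average (not monoisotopic) mass of a sequence from standard average residue masses plus one water; it ignores disulphide bonds, tags and post-translational modifications, so it is an approximation, and the function name is hypothetical.

```python
# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_DA = 18.01524  # added once for the free N- and C-termini


def average_mass_da(sequence):
    """Approximate average mass (Da) of an unmodified polypeptide chain.

    Raises KeyError for characters outside the 20 standard one-letter codes.
    """
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_DA


if __name__ == "__main__":
    # Divide by 1000 to express a domain's mass in kDa for comparison with the text.
    print(round(average_mass_da("GA"), 3))
```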
Immunoglobulin single variable domains such as Domain antibodies and Nanobodies (including VHH domains and humanized VHH domains) represent in vivo matured macromolecules upon their production, but can be further subjected to affinity maturation by introducing one or more alterations in the amino acid sequence of one or more CDRs, which alterations result in an improved affinity of the resulting immunoglobulin single variable domain for its respective antigen, as compared to the respective parent molecule. Affinity-matured immunoglobulin single variable domain molecules of the invention may be prepared by methods known in the art, for example, as described by Marks et al. (Biotechnology 10: 779-783, 1992), Barbas et al. (Proc. Natl. Acad. Sci. USA 91: 3809-3813, 1994), Shier et al. (Gene 169: 147-155, 1995), Yelton et al. (J. Immunol. 155: 1994-2004, 1995), Jackson et al. (J. Immunol. 154: 3310-3319, 1995), Hawkins et al. (J. Mol. Biol. 226: 889-896, 1992), and Johnson and Hawkins (Affinity maturation of antibodies using phage display, Oxford University Press, 1996). The process of designing/selecting and/or preparing a polypeptide, starting from an immunoglobulin single variable domain such as a Domain antibody or a Nanobody, is also referred to herein as “formatting” said immunoglobulin single variable domain; and an immunoglobulin single variable domain that is made part of a polypeptide is said to be “formatted” or to be “in the format of” said polypeptide. Examples of ways in which an immunoglobulin single variable domain can be formatted, and examples of such formats, for instance to avoid glycosylation, will be clear to the skilled person based on the disclosure herein. Immunoglobulin single variable domains such as Domain antibodies and Nanobody® (including VHH domains) can be subjected to humanization, i.e. an increase in the degree of sequence identity with the closest human germline sequence.
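Affinity maturation as described above starts from alterations in one or more CDRs. One simple way to enumerate such alterations is a saturation scan that generates every single substitution within a chosen CDR range. This is a hypothetical helper for illustration, not a method prescribed by the cited references; the mutation labels use 1-based sequence indices, not Kabat numbers.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_point_mutants(sequence, start, end):
    """Yield (label, mutant) for every single substitution in sequence[start:end].

    start/end are 0-based and end-exclusive; labels look like 'C2A'
    (wild-type residue, 1-based position, substituted residue).
    """
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("invalid region")
    for i in range(start, end):
        wild_type = sequence[i]
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield f"{wild_type}{i + 1}{aa}", sequence[:i] + aa + sequence[i + 1:]
```

A region of n residues yields 19 × n single mutants, while allowing all substitutions at k fixed positions simultaneously gives 19^k variants, which is why combinatorial CDR libraries grow quickly.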
In particular, humanized immunoglobulin single variable domains, such as Nanobody® (including VHH domains) may be immunoglobulin single variable domains in which at least one amino acid residue is present (and in particular, at least one framework residue) that is and/or that corresponds to a humanizing substitution (as defined further herein). Potentially useful humanizing substitutions can be ascertained by comparing the sequence of the framework regions of a naturally occurring VHH sequence with the corresponding framework sequence of one or more closely related human VH sequences, after which one or more of the potentially useful humanizing substitutions (or combinations thereof) thus determined can be introduced into said VHH sequence (in any manner known per se, as further described herein) and the resulting humanized VHH sequences can be tested for affinity for the target, for stability, for ease and level of expression, and/or for other desired properties. In this way, by means of a limited degree of trial and error, other suitable humanizing substitutions (or suitable combinations thereof) can be determined by the skilled person. Also, based on what is described before, (the framework regions of) an immunoglobulin single variable domain, such as a Nanobody® (including VHH domains) may be partially humanized or fully humanized. Humanized immunoglobulin single variable domains, in particular Nanobody®, may have several advantages, such as a reduced immunogenicity, compared to the corresponding naturally occurring VHH domains. By humanized is meant mutated so that immunogenicity upon administration in human patients is minor or non-existent. The humanizing substitutions should be chosen such that the resulting humanized amino acid sequence and/or VHH still retains the favourable properties of the VHH, such as the antigen-binding capacity.
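The comparison step described above (aligning a VHH framework with a closely related human VH sequence and listing the differing positions as potential humanizing substitutions) can be sketched as a dictionary comparison. The sketch assumes both sequences are already numbered on a common scheme, with keys such as the Kabat label "82a"; the residues in the usage check are made up, and which differences are actually substituted still requires the experimental testing described in the text.

```python
def humanizing_candidates(vhh, human, positions=None):
    """Return {position: (vhh_residue, human_residue)} wherever the two differ.

    vhh, human: dicts mapping a numbering label (e.g. Kabat '82a') to a residue.
    positions: optional iterable restricting the comparison, e.g. framework positions only.
    Positions present in only one of the two dicts are ignored.
    """
    shared = set(vhh) & set(human)
    if positions is not None:
        shared &= set(positions)
    return {pos: (vhh[pos], human[pos]) for pos in shared if vhh[pos] != human[pos]}
```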
Based on the description provided herein, the skilled person will be able to select humanizing substitutions or suitable combinations of humanizing substitutions which optimize or achieve a desired or suitable balance between the favourable properties provided by the humanizing substitutions on the one hand and the favourable properties of naturally occurring VHH domains on the other hand. Such methods are known to the skilled person. A human consensus sequence can be used as the target sequence for humanization, but other means are also known in the art. One alternative is a method wherein the skilled person aligns a number of human germline alleles, for instance but not limited to IGHV3 alleles, and uses said alignment to identify residues suitable for humanization in the target sequence. Also, a subset of human germline alleles most homologous to the target sequence may be aligned as a starting point to identify suitable humanisation residues. Alternatively, the VHH is analyzed to identify its closest homologue among the human alleles, and this homologue is used for humanisation construct design. A humanisation technique applied to Camelidae VHHs may also be performed by a method comprising the replacement of specific amino acids, either alone or in combination. Said replacements may be selected based on what is known from the literature, from known humanization efforts, from human consensus sequences compared to the natural VHH sequences, or from the human alleles most similar to the VHH sequence of interest. As can be seen from the data on VHH entropy and VHH variability given in Tables A-5 to A-8 of WO 08/020079, some amino acid residues in the framework regions are more conserved between human and Camelidae than others. Generally, although the invention in its broadest sense is not limited thereto, any substitutions, deletions or insertions are preferably made at positions that are less conserved.
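Per-position variability, as in the entropy and variability tables cited above, can be computed from an alignment of human germline alleles or VHH sequences. This sketch uses Shannon entropy in bits together with the most frequent residue as a consensus; it assumes the sequences are already aligned to equal length (a gap character such as '-' simply counts as another symbol), and whether it matches the exact definition used in the cited tables is not confirmed here. Low-entropy columns are the conserved positions, where the text advises against substitutions.

```python
import math
from collections import Counter


def column_profile(aligned_sequences):
    """Return [(consensus_residue, entropy_bits), ...], one entry per alignment column."""
    if not aligned_sequences:
        raise ValueError("need at least one sequence")
    length = len(aligned_sequences[0])
    if any(len(s) != length for s in aligned_sequences):
        raise ValueError("sequences must be pre-aligned to equal length")
    profile = []
    for column in zip(*aligned_sequences):
        counts = Counter(column)
        n = len(column)
        # Shannon entropy: 0 for a fully conserved column, higher for variable ones.
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        # Ties resolve to the residue seen first in the column.
        consensus = counts.most_common(1)[0][0]
        profile.append((consensus, entropy))
    return profile
```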
Also, generally, amino acid substitutions are preferred over amino acid deletions or insertions. For instance, a human-like class of Camelidae single domain antibodies contains the hydrophobic FR2 residues typically found in conventional antibodies of human origin or from other species, but compensates for this loss of hydrophilicity by other substitutions at position 103, replacing the conserved tryptophan residue present in VH domains from double-chain antibodies. As such, peptides belonging to these two classes show a high amino acid sequence homology to human VH framework regions, and said peptides might be administered to a human directly without expectation of an unwanted immune response therefrom, and without the burden of further humanization. Indeed, some Camelidae VHH sequences display a high sequence homology to human VH framework regions, and therefore said VHH might be administered to patients directly without expectation of an immune response therefrom, and without the additional burden of humanization. Suitable mutations, in particular substitutions, can be introduced during humanization to generate a polypeptide with reduced binding to pre-existing antibodies (reference is made for example to WO 2012/175741 and WO2015/173325), for example at one or more of the positions 11, 13, 14, 15, 40, 41, 42, 82, 82a, 82b, 83, 84, 85, 87, 88, 89, 103 or 108. The amino acid sequences and/or VHH of the invention may be suitably humanized at any framework residue(s), such as at one or more Hallmark residues (as defined below), at one or more other framework residues (i.e. non-Hallmark residues), or at any suitable combination thereof.
Depending on the host organism used to express the amino acid sequence, VHH or polypeptide of the invention, such deletions and/or substitutions may also be designed in such a way that one or more sites for posttranslational modification (such as one or more glycosylation sites) are removed, as will be within the ability of the person skilled in the art. Alternatively, substitutions or insertions may be designed so as to introduce one or more sites for attachment of functional groups (as described herein), for example to allow site-specific pegylation. In some cases, at least one of the typical Camelidae hallmark residues with hydrophilic characteristics at positions 37, 44, 45 and/or 47 is replaced (see WO2008/020079 Table A-03). Another example of humanization includes substitution of residues in FR1, such as positions 1, 5, 11, 14, 16 and/or 28; in FR3, such as positions 73, 74, 75, 76, 78, 79, 82b, 83, 84, 93 and/or 94; and in FR4, such as positions 103, 104, 108 and/or 111 (see WO2008/020079 Tables A-05 to A-08; all numbering according to Kabat). An “epitope”, as used herein, refers to an antigenic determinant of a polypeptide, constituting a binding site or binding pocket on a target molecule (e.g. a protein to which an immunoglobulin or part thereof, antibody, Fab, VHH or ISVD binds). “Binding” means any interaction, be it direct or indirect. A direct interaction implies a contact (e.g. physical or chemical) between two binding partners. An indirect interaction means any interaction whereby the interaction partners interact in a complex of more than two molecules. An interaction can be completely indirect (e.g. two molecules are part of the same complex with the help of one or more bridging molecules but do not bind in the absence of the bridging molecule(s)). An interaction may be partly direct or partly indirect: there is still a direct contact between two interaction partners, but such contact is e.g.
not stable, and is stabilized by the interaction with one or more additional molecules. The term “binding pocket” or “binding site” refers to a region of a molecule or molecular complex that, as a result of its shape and charge, associates with another chemical entity, compound, protein, peptide, antibody, Fab, single domain antibody, ISVD or VHH. An epitope could comprise 1, 2 or 3 amino acids in a spatial conformation that is unique to the epitope. Generally, an epitope consists of at least 4, 5, 6 or 7 such amino acids, and more usually consists of at least 8, 9 or 10 such amino acids. Methods of determining the spatial conformation of amino acids are known in the art and include, for example, X-ray crystallography and multi-dimensional nuclear magnetic resonance. A “conformational epitope”, as used herein, refers to an epitope comprising amino acids in a spatial conformation that is unique to a folded 3-dimensional conformation of a polypeptide. Generally, a conformational epitope consists of amino acids that are discontinuous in the linear sequence but that come together in the folded structure of the protein. However, a conformational epitope may also consist of a linear sequence of amino acids that adopts a conformation that is unique to a folded 3-dimensional conformation of the polypeptide (and not present in a denatured state). In protein complexes, conformational epitopes consist of amino acids, discontinuous in the linear sequences of one or more polypeptides, that come together upon folding of the different polypeptides and their association in a unique quaternary structure. Similarly, conformational epitopes may also consist of a linear sequence of amino acids of one or more polypeptides that come together and adopt a conformation that is unique to the quaternary structure. The term “conformation” or “conformational state” of a protein refers generally to the range of structures that a protein may adopt at any instant in time.
One of skill in the art will recognize that determinants of conformation or conformational state include a protein's primary structure as reflected in its amino acid sequence (including modified amino acids) and the environment surrounding the protein. The conformation or conformational state of a protein also relates to structural features such as protein secondary structure (e.g., α-helix, β-sheet, among others), tertiary structure (e.g., the 3-dimensional folding of a polypeptide chain) and quaternary structure (e.g., interactions of a polypeptide chain with other protein subunits). Posttranslational and other modifications to a polypeptide chain, such as ligand binding, phosphorylation, sulfation, glycosylation or attachment of hydrophobic groups, among others, can influence the conformation of a protein. Furthermore, environmental factors, such as pH, salt concentration, ionic strength and osmolality of the surrounding solution, and interaction with other proteins and co-factors, among others, can affect protein conformation. The conformational state of a protein may be determined either by a functional assay for activity or binding to another molecule, or by physical methods such as X-ray crystallography, NMR or spin labelling, among other methods. For a general discussion of protein conformation and conformational states, reference is made to Cantor and Schimmel, Biophysical Chemistry, Part I: The Conformation of Biological Macromolecules, W.H. Freeman and Company, 1980, and Creighton, Proteins: Structures and Molecular Properties, W.H. Freeman and Company, 1993. A “paratope”, as used herein, refers to the antigen-binding site, i.e. the part of an antibody that recognizes and binds to an antigen.
The term “affinity”, as used herein, generally refers to the degree to which an antibody or other binding protein (as defined further herein) binds to a target protein so as to shift the equilibrium of target protein and binding protein toward the presence of a complex formed by their binding. Thus, for example, where an antibody and an antigen are combined in relatively equal concentration, an antibody of high affinity will bind to the antigen so as to shift the equilibrium toward a high concentration of the resulting complex. The equilibrium dissociation constant KD (also written K_D) is commonly used to describe the affinity between a ligand and a target protein, or between an antibody and its antigen. KD is the ratio koff/kon for the antibody and its antigen and thus measures the propensity of a complex to fall apart into its component molecules. The association rate constant (kon, also written k_on) characterizes how quickly the antibody binds to its target. The dissociation rate constant (koff, also written k_off and also referred to as kdis, Kdis, Kd or kd) measures how quickly an antibody dissociates from its target and is expressed in s^-1, i.e. as the fraction of complexes that dissociate per second. Hence, the lower koff is, the higher the affinity for the target: koff, and thus also KD, is inversely related to affinity. A high-affinity interaction is characterized by a low KD, fast recognition (high kon) and high stability of the formed complexes (low koff). “Amino acids” as used herein refer to the structural units (monomers) that make up proteins. They join together to form short polymer chains called peptides or longer chains called either polypeptides or proteins. These chains are linear and unbranched, with each amino acid residue within the chain attached to two neighbouring amino acids. Twenty amino acids encoded by the universal genetic code are naturally incorporated into polypeptides and are called proteinogenic or natural amino acids.
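The relation KD = koff/kon described above can be illustrated with a short calculation. The sketch below assumes kon in M^-1 s^-1 and koff in s^-1 so that KD comes out in molar units; the two binders and their rate constants are hypothetical values chosen only to show that, at equal kon, a 10-fold slower off-rate gives a 10-fold lower KD.

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """Return KD = koff / kon in molar units.

    k_on: association rate constant in M^-1 s^-1
    k_off: dissociation rate constant in s^-1
    """
    if k_on <= 0 or k_off < 0:
        raise ValueError("k_on must be positive and k_off non-negative")
    return k_off / k_on


# Hypothetical kinetic values for two binders with the same on-rate
binders = {"binder_1": (1e5, 1e-3), "binder_2": (1e5, 1e-4)}
for name, (k_on, k_off) in binders.items():
    kd_nM = dissociation_constant(k_on, k_off) * 1e9  # convert M to nM
    print(f"{name}: KD = {kd_nM:.1f} nM")
# binder_2 dissociates 10x more slowly, so its KD is 10x lower (higher affinity)
```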
Natural amino acids or naturally occurring amino acids are Glycine (Gly or G), Alanine (Ala or A), Valine (Val or V), Leucine (Leu or L), Isoleucine (Ile or I), Methionine (Met or M), Proline (Pro or P), Phenylalanine (Phe or F), Tryptophan (Trp or W), Serine (Ser or S), Threonine (Thr or T), Asparagine (Asn or N), Glutamine (Gln or Q), Tyrosine (Tyr or Y), Cysteine (Cys or C), Lysine (Lys or K), Arginine (Arg or R), Histidine (His or H), Aspartic Acid (Asp or D) and Glutamic Acid (Glu or E). As used herein, the terms “nucleic acid”, “nucleic acid sequence” or “nucleic acid molecule” are used interchangeably and refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Nucleic acids may have any three-dimensional structure and may perform any function, known or unknown. Non-limiting examples of nucleic acids include a gene, a gene fragment, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, control regions, isolated RNA of any sequence, nucleic acid probes, and primers. The nucleic acid molecule may be linear or circular. The nucleic acid may comprise a promoter, an intron, an enhancer region, a polyadenylation site, a translation initiation site, 5’ or 3’ untranslated regions, a reporter gene, a selectable marker or the like. The nucleic acid may comprise single-stranded or double-stranded DNA or RNA. The nucleic acid may comprise modified bases or a modified backbone. A nucleic acid that is up to about 100 nucleotides in length is often also referred to as an oligonucleotide. “Nucleotides” as used herein refer to the building blocks of oligonucleotides and polynucleotides, and for the purposes of the present invention include both naturally occurring and non-naturally occurring nucleotides.
In nature, nucleotides, such as DNA and RNA nucleotides, comprise a sugar moiety (deoxyribose in DNA, ribose in RNA), a nucleobase moiety and one or more phosphate groups (which are absent in nucleosides). A nucleotide without a phosphate group is called a “nucleoside” and is thus a compound comprising a nucleobase moiety and a sugar moiety. As used herein, “nucleobase” means a group of atoms that can be linked to a sugar moiety to create a nucleoside that is capable of incorporation into an oligonucleotide, and wherein the group of atoms is capable of bonding with a complementary naturally occurring nucleobase of another oligonucleotide or nucleic acid. Naturally occurring nucleobases of RNA or DNA comprise the purine bases adenine (A) and guanine (G), and the pyrimidine bases thymine (T), cytosine (C) and uracil (U). An “expression cassette” as used herein comprises any nucleic acid construct capable of directing the expression of a gene/coding sequence of interest that is operably linked to a promoter of the expression cassette. Expression cassettes are generally DNA constructs preferably including (5’ to 3’ in the direction of transcription): a promoter region, a polynucleotide sequence, homologue, variant or fragment thereof operably linked with the transcription initiation region, and a termination sequence including a stop signal for RNA polymerase and a polyadenylation signal. It is understood that all of these regions should be capable of operating in the biological cells to be transformed, such as prokaryotic or eukaryotic cells. The promoter region comprising the transcription initiation region, which preferably includes the RNA polymerase binding site, and the polyadenylation signal may be native to the biological cell to be transformed or may be derived from an alternative source, where the region is functional in the biological cell. Such cassettes can be constructed into a “vector”.
The term “vector”, or alternatively “vector construct”, “expression vector” or “gene transfer vector”, is intended to refer to a nucleic acid molecule capable of transporting another nucleic acid molecule to which it has been linked, and includes any vector known to the skilled person, including, but not limited to, plasmid vectors, cosmid vectors, phage vectors such as lambda phage, viral vectors such as adenoviral, AAV or baculoviral vectors, and artificial chromosome vectors such as bacterial artificial chromosomes (BAC), yeast artificial chromosomes (YAC) or P1 artificial chromosomes (PAC). Expression vectors comprise plasmids as well as viral vectors and generally contain a desired coding sequence and appropriate DNA sequences necessary for the expression of the operably linked coding sequence in a particular host organism (e.g., bacteria, yeast, plant, insect or mammal) or in in vitro expression systems. Cloning vectors are generally used to engineer and amplify a certain desired DNA fragment and may lack functional sequences needed for expression of the desired DNA fragments. The construction of expression vectors for use in transfecting cells is also well known in the art and can thus be accomplished via standard techniques (see, for example, Sambrook, Fritsch and Maniatis, in: Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1989; Gene Transfer and Expression Protocols, pp. 109-128, ed. E. J. Murray, The Humana Press Inc., Clifton, N.J.; and the Ambion 1998 Catalog (Ambion, Austin, Tex.)).
The terms “identical” or percent “identity” in the context of two or more nucleic acid or amino acid sequences refer to two or more sequences that are the same or have a specified percentage of nucleotides or amino acid residues, respectively, that are the same when compared and aligned (introducing gaps, if necessary) for maximum correspondence, not considering any conservative amino acid substitutions as part of the sequence identity. The percent identity can be measured using sequence comparison software or algorithms or by visual inspection. Various algorithms and software are known in the art that can be used to obtain alignments of nucleotide or amino acid sequences. The term “percent sequence identity”, “% sequence identity”, “percent identity” or “% identity” between two polynucleotide or polypeptide sequences refers to the number of identical matched positions shared by the sequences over a comparison window, taking into account additions or deletions (i.e. gaps) that must be introduced for optimal alignment of the two sequences. A matched position is any position where an identical nucleotide or amino acid is present in both the target and the reference sequence. Gaps present in the target sequence are not counted, since gaps are not nucleotides or amino acids. Likewise, gaps present in the reference sequence are not counted, since the nucleotides or amino acids of the target sequence are counted, not those of the reference sequence. One non-limiting example of a sequence alignment algorithm is the algorithm described in Karlin et al., 1990, Proc. Natl. Acad. Sci., 87:2264-2268, as modified in Karlin et al., 1993, Proc. Natl. Acad. Sci., 90:5873-5877, and incorporated into the NBLAST and XBLAST programs (Altschul et al., 1991, Nucleic Acids Res., 25:3389-3402). In certain aspects, Gapped BLAST can be used as described in Altschul et al., 1997, Nucleic Acids Res. 25:3389-3402.
BLAST-2, WU-BLAST-2 (Altschul et al., 1996, Methods in Enzymology, 266:460-480), ALIGN, ALIGN-2 (Genentech, South San Francisco, California) and Megalign (DNASTAR) are additional publicly available software programs that can be used to align sequences. In certain aspects, the percent identity between two nucleotide sequences is determined using the GAP program in the GCG software package (e.g., using a NWSgapdna.CMP matrix, a gap weight of 40, 50, 60, 70 or 90 and a length weight of 1, 2, 3, 4, 5 or 6). In certain alternative aspects, the GAP program in the GCG software package, which incorporates the algorithm of Needleman and Wunsch (J. Mol. Biol. 48:444-453 (1970)), can be used to determine the percent identity between two amino acid sequences (e.g., using either a BLOSUM62 matrix or a PAM250 matrix, a gap weight of 16, 14, 12, 10, 8, 6 or 4 and a length weight of 1, 2, 3, 4 or 5). Alternatively, in certain aspects, the percent identity between nucleotide or amino acid sequences is determined using the algorithm of Myers and Miller (CABIOS, 4:11-17 (1989)). For example, the percent identity can be determined using the ALIGN program (version 2.0) with a PAM120 weight residue table, a gap length penalty of 12 and a gap penalty of 4. One skilled in the art can determine appropriate parameters for maximal alignment with particular alignment software. In certain aspects, the default parameters of the alignment software are used. One skilled in the art will appreciate that the generation of a sequence alignment for the calculation of a percent sequence identity is not limited to binary sequence-sequence comparisons exclusively driven by primary sequence data. Sequence alignments can also be derived from multiple sequence alignments. One suitable program to generate multiple sequence alignments is ClustalW2, available from www.clustal.org. Another suitable program is MUSCLE, available from www.drive5.com/muscle/.
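The GAP program mentioned above incorporates the global alignment algorithm of Needleman and Wunsch. A minimal sketch of that algorithm follows. It assumes a deliberately simple scoring scheme (+1 for a match, -1 for a mismatch, a linear penalty of -2 per gap position) instead of the BLOSUM62, PAM250 or NWSgapdna.CMP matrices and the separate gap weight and length weight used by GCG GAP; the function name and default values are illustrative.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str, int]:
    """Global alignment of a and b with a linear gap penalty.

    Returns the two aligned strings ('-' marks a gap) and the optimal score.
    """
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Traceback from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, best_score = needleman_wunsch("ACGT", "AGT")
print(aligned_a)   # ACGT
print(aligned_b)   # A-GT
print(best_score)  # 1
```

When several alignments tie, the traceback order (diagonal first, then gap in the second sequence) decides which one is returned, so different tools can report different but equally scoring alignments.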
ClustalW2 and MUSCLE are alternatively available, e.g., from the EBI (European Bioinformatics Institute). In certain aspects, the percentage identity “X” of a first nucleotide sequence to a second nucleotide sequence is calculated as 100 x (Y/Z), where Y is the number of nucleotide residues scored as identical matches in the alignment of the first and second sequences (as aligned by visual inspection or a particular sequence alignment program) and Z is the total number of residues in the second sequence. If the first sequence is longer than the second sequence, the percent identity of the first sequence to the second sequence will be higher than the percent identity of the second sequence to the first sequence. Different regions within a single polynucleotide target sequence that align with a polynucleotide reference sequence can each have their own percent sequence identity. It is noted that the percent sequence identity value is rounded to the nearest tenth. For example, 80.11, 80.12, 80.13 and 80.14 are rounded down to 80.1, while 80.15, 80.16, 80.17, 80.18 and 80.19 are rounded up to 80.2. It is also noted that the length value will always be an integer. According to the present application, the degree of identity between a given reference nucleotide sequence and a nucleotide sequence which is a homologue of said given nucleotide sequence will preferably be at least 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%. The degree of identity is given preferably for a nucleic acid region which is at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90% or 100% of the entire length of the reference nucleic acid sequence.
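The 100 x (Y/Z) formula and the rounding rule described above can be sketched as follows, assuming the two sequences have already been aligned (for example with a global aligner) and are written with '-' for gaps. Python's `decimal` module with half-up rounding is used so that a value such as 80.15 rounds to 80.2 as stated, without binary floating-point surprises; the function names are illustrative. The example also reproduces the asymmetry noted above: when the first sequence is longer, its identity to the second sequence is higher than the reverse.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_tenth(value: Decimal) -> Decimal:
    """Round to one decimal place, rounding halves up (80.15 -> 80.2)."""
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def percent_identity(first: str, second: str) -> Decimal:
    """X = 100 x (Y / Z) for two pre-aligned sequences ('-' marks a gap).

    Y: identical matched positions (gap columns never count as matches)
    Z: total number of residues in the second sequence
    """
    if len(first) != len(second):
        raise ValueError("sequences must be aligned to the same length")
    y = sum(1 for p, q in zip(first, second) if p == q and p != "-")
    z = sum(1 for q in second if q != "-")
    if z == 0:
        raise ValueError("second sequence contains no residues")
    return round_to_tenth(Decimal(100) * Decimal(y) / Decimal(z))


print(percent_identity("ACGT", "A-GT"))  # 100.0 (first sequence to second)
print(percent_identity("A-GT", "ACGT"))  # 75.0 (second sequence to first)
print(round_to_tenth(Decimal("80.14")), round_to_tenth(Decimal("80.15")))  # 80.1 80.2
```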
For example, if the reference nucleic acid sequence consists of 200 nucleotides, the degree of identity is given preferably for at least 20, at least 40, at least 60, at least 80, at least 100, at least 120, at least 140, at least 160, at least 180, or 200 nucleotides, preferably contiguous nucleotides. In a particular embodiment, the degree/percentage of similarity or identity is given for the entire length of the reference nucleic acid sequence. The term “amino acid identity” as used herein refers to the extent that sequences are identical on an amino acid-by-amino acid basis over a window of comparison. Thus, a “percentage of sequence identity” is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical amino acid residue (e.g., Ala, Pro, Ser, Thr, Gly, Val, Leu, Ile, Phe, Tyr, Trp, Lys, Arg, His, Asp, Glu, Asn, Gln, Cys and Met) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity. According to the present application, the degree of identity between a given reference amino acid sequence and an amino acid sequence which is a homologue of said given amino acid sequence will preferably be at least 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%. The degree of identity is given preferably for an amino acid region which is at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90% or 100% of the entire length of the reference amino acid sequence.
For example, if the reference amino acid sequence consists of 200 amino acids, the degree of identity is given preferably for at least 20, at least 40, at least 60, at least 80, at least 100, at least 120, at least 140, at least 160, at least 180, or 200 amino acids, preferably contiguous amino acids. In a particular embodiment, the degree/percentage of similarity or identity is given for the entire length of the reference amino acid sequence. “Homologue” or “homologues” of a protein encompass peptides, oligopeptides, polypeptides, proteins and enzymes having amino acid substitutions, deletions and/or insertions relative to the unmodified protein in question and having biological and functional activity similar to that of the unmodified protein from which they are derived. The term “defined by SEQ ID No. X” or “as depicted in SEQ ID No. X” as used herein refers to a biological sequence consisting of the sequence of amino acids or nucleotides given in SEQ ID No. X. For instance, a protein defined in/by SEQ ID No. X consists of the amino acid sequence given in SEQ ID No. X. A further example is an amino acid sequence comprising SEQ ID No. X, which refers either to an amino acid sequence longer than the amino acid sequence given in SEQ ID No. X but entirely comprising the amino acid sequence given in SEQ ID No. X (wherein the amino acid sequence given in SEQ ID No. X can be located N-terminally or C-terminally in the longer amino acid sequence, or can be embedded in the longer amino acid sequence), or to an amino acid sequence consisting of the amino acid sequence given in SEQ ID No. X. De novo antibody design is expected to drive the next generation of antibody discovery, such as therapeutic antibody discovery, because it offers epitope selection and allows simultaneous multiparameter optimization.
In the present invention we have developed an in silico (or computational) platform for de novo antibody design, designated herein as EvolveX, based on an algorithm (ModelX) that enables complementarity determining region (CDR) design and antibody backbone moves using the empirical force field FoldX, and a genetic algorithm that optimizes both target binding affinity and thermodynamic stability of the designed antibodies. To demonstrate the applicability and potential of our pipeline, we posed and met three antibody design challenges: (1) improve the stability of a single-domain VHH antibody fragment (nanobody) that binds with high affinity to mouse Vsig4 (mVsig4); (2) generate anti-mVsig4 variants that bind to the human Vsig4 ortholog (hVsig4) with high affinity while retaining their improved protein stability parameters; (3) redesign the improved anti-mVsig4 antibodies, such as VHHs, to bind to the human Interleukin-9 receptor, a new target unrelated to Vsig4, at a predefined epitope with single-digit nanomolar affinity. Collectively, our approach will open the way for complete de novo design of stable and efficacious VHHs against predefined binding epitopes on diverse protein antigens. Exemplary embodiments will be described more fully hereinafter. It should be understood that such systems, computer readable media and methods may be embodied in many different forms and should not be construed as limited to the example embodiments set forth herein. Rather, these example embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the claims to those of ordinary skill in the art. As used herein, the term “full length native protein” refers to a protein that is in its native or natural state and unaltered by any denaturing agent such as heat, chemical mutation or enzymatic reactions.
A wild-type protein would be considered a full-length native protein. The term “full-length native protein sequence”, as used herein, refers to the amino acid sequence found in the full-length native protein. As used herein, “mutation” refers to a change in the amino acid sequence of a native protein. Mutations can be described by using the native sequence and then identifying the specific amino acids that have been changed. A “mutant” refers to the protein that contains the mutation. A full-length mutant sequence refers to the full amino acid sequence of the mutant protein, instead of describing the mutant by the amino acids that differ from the native protein. Terms such as “first”, “second” and “within” are used merely to distinguish one component (or part of a component or state of a component) from another. Such terms are not meant to denote a preference or a particular orientation and are not meant to limit embodiments of the disclosure. In the following detailed description of the example embodiments, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one of ordinary skill in the art that embodiments of the disclosure may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.
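The EvolveX platform described above couples force-field-based CDR design with a genetic algorithm that optimizes both target binding affinity and thermodynamic stability. The sketch below illustrates only the generic genetic-algorithm idea, under explicit assumptions: it is not EvolveX or ModelX code and does not call FoldX. The CDR positions, allowed residues, randomly generated placeholder energy tables, equal weights, additive scoring, population size, generation count and the tournament, crossover and mutation operators are all hypothetical choices for illustration. Designs are reported as native residue, position and mutant residue (for example S31Y), one common way of describing mutations relative to the native sequence as defined above.

```python
import random

# Hypothetical CDR positions near the epitope and, for each, the residues that
# survived a single-mutation scan (the wild-type residue is listed first).
ALLOWED = {
    31: ["S", "Y", "W"],
    33: ["A", "F"],
    52: ["T", "R", "K", "E"],
    99: ["G", "Y", "D"],
    101: ["L", "I"],
}
WILD_TYPE = {pos: aas[0] for pos, aas in ALLOWED.items()}

# Placeholder per-mutation energy changes (arbitrary units, lower is better)
# standing in for force-field stability and binding calculations.
_rng = random.Random(0)
DDG_STABILITY = {(p, aa): 0.0 if aa == WILD_TYPE[p] else _rng.uniform(-1.0, 1.5)
                 for p, aas in ALLOWED.items() for aa in aas}
DDG_BINDING = {(p, aa): 0.0 if aa == WILD_TYPE[p] else _rng.uniform(-2.0, 1.0)
               for p, aas in ALLOWED.items() for aa in aas}


def score(design: dict[int, str], w_stab: float = 1.0, w_bind: float = 1.0) -> float:
    """Weighted sum of stability and binding terms (assumed additive)."""
    return sum(w_stab * DDG_STABILITY[(p, aa)] + w_bind * DDG_BINDING[(p, aa)]
               for p, aa in design.items())


def crossover(a, b, r):
    """Uniform crossover: each position inherits from either parent."""
    return {p: a[p] if r.random() < 0.5 else b[p] for p in ALLOWED}


def mutate(design, r, rate=0.2):
    """Resample a position from its allowed residues with probability `rate`."""
    return {p: r.choice(ALLOWED[p]) if r.random() < rate else aa
            for p, aa in design.items()}


def tournament(pop, r, k=3):
    """Pick the best (lowest-scoring) of k randomly sampled designs."""
    return min(r.sample(pop, k), key=score)


def evolve(generations=500, pop_size=40, elite=2, seed=1):
    r = random.Random(seed)
    pop = [dict(WILD_TYPE)]  # start from wild type plus random designs
    pop += [{p: r.choice(aas) for p, aas in ALLOWED.items()} for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score)
        nxt = [dict(d) for d in pop[:elite]]  # elitism: best designs survive
        while len(nxt) < pop_size:
            child = crossover(tournament(pop, r), tournament(pop, r), r)
            nxt.append(mutate(child, r))
        pop = nxt
    return min(pop, key=score)


best = evolve()
label = " ".join(f"{WILD_TYPE[p]}{p}{aa}" for p, aa in sorted(best.items())
                 if aa != WILD_TYPE[p]) or "(wild type)"
print("best design:", label, "score:", round(score(best), 2))
```

Because this toy score is additive, each position could in fact be optimized independently; a population-based search becomes useful when, as with energies computed on the full antibody-epitope structure, the effects of combined mutations are not simply additive.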
“Phage display technology” is well known in the art and rests on five key ideas: (a) bacteriophages can express heterologous peptides or polypeptides fused to their coat proteins when transduced in host bacteria; (b) given a multitude of such peptides or polypeptides, a library of recombinant phages can be created that display all these variants on a coat protein of choice; (c) such a library of phages can be screened as a population for the ability to bind (recognize) a target molecule in vitro; (d) the binders can be separated from the non-binders by washing the latter away, in a process akin to washing sand away from gold dust (panning); and (e) the isolated binders can be analyzed for the sequence of the variant encoded within their genome. Since its conceptualization (Smith G P, 1985), this technology has proven invaluable for a variety of investigations, including antibody discovery, epitope mapping, protein interaction site mapping, enzyme substrate discovery and molecular evolution (reviewed in Burton D R, 1995; Azzazy H M and Highsmith W E, 2002; and Almagro JC et al. (2019) Phage Display Libraries for Antibody Therapeutic Discovery and Development, Antibodies 8, 44).
The present invention provides in a first embodiment a computer-implemented method to produce an antibody directed to an epitope present in a target protein, comprising the following steps:
a. selecting an epitope from a 3-dimensional model of a target protein, wherein said epitope consists of a subset of amino acids present in said target protein,
b. generating molecular docks between all possible side chain conformations of each amino acid present in said epitope and antibodies present in a 3-dimensional library of antibody-polypeptide ligand interactions,
c. eliminating molecular docks which form backbone clashes, with the application of a ForceField algorithm,
d. selecting, in the remaining molecular docks, between 5 and 60 amino acid positions present in all the CDR sequences of the antibodies docked to the amino acids present in the epitope, wherein said amino acid positions are at less than 8 ångström from the amino acid positions of the epitope,
e. mutating with a ForceField algorithm each amino acid position selected in step d), one at a time, into all natural amino acids, discarding unstable antibodies and unpreferred interactions with the epitope, and obtaining a subset of allowed mutations,
f. applying a genetic algorithm for between 500 and 2000 generations to combine allowed mutations from step e) with random mutated positions in said selected CDR amino acid positions of step d), and for each generation calculating with a ForceField algorithm the stability of the antibody, the volume of cavities in the antibody and the volume of the cavities at the interface between the antibody and the epitope,
g. obtaining a set of antibody sequences for which the conditions of a favorable interaction energy with the epitope are obtained, which sequences have no cavities at the interface between the epitope and the antibody and no cavities in the antibody itself, wherein said favorable conditions are defined by a user-defined preset threshold parameter in the ForceField algorithm.
In yet another embodiment the invention provides a computer-implemented method for producing a set of antibody sequences directed against an epitope present in a target protein, the method comprising the steps of:
(a) selecting an epitope from a three-dimensional atomic-resolution model of the target protein, wherein the epitope comprises a subset of amino acids of said target protein,
(b) generating molecular dockings between all residues present in the selected epitope and antibodies derived from a three-dimensional library of antibody–polypeptide ligand interactions,
(c) eliminating molecular dockings that result in unfavorable interactions,
(d) identifying, for each docked antibody, between 5 and 60 amino acid positions located in the complementarity-determining regions (CDRs), wherein each identified position is located at less than 8 ångströms from an amino acid position of the selected epitope,
(e) introducing mutations, in each of the docked antibodies, at each amino acid position identified in step (d), one position at a time, to all possible natural amino acids, and selecting a set of allowed mutations based on a multi-parameter criterion, said multi-parameter criterion comprising:
- calculating the stability of the antibody using an empirical ForceField algorithm and/or
- calculating the stability of interactions between epitope and antibody using an empirical ForceField algorithm,
(f) applying, over 200-20,000 generations, for each docked antibody, a hybrid computational algorithm, said hybrid algorithm combining a Monte Carlo sampling method with a genetic algorithm to combine the identified set of allowed mutations of step (e),
(g) producing a set of antibody sequences from the resulting combined mutations where user-defined threshold parameters for score distributions are met, said scores comprising:
- calculating the stability of the antibody with combined mutations using an empirical ForceField algorithm and/or
- calculating the stability of interactions between epitope and antibody with accumulated mutations using an empirical ForceField algorithm and/or
- calculating the aggregation propensity of the antibody with combined mutations of regions in unfolded polypeptide chains.
In yet another embodiment the invention provides a computer-implemented method for producing a set of antibody sequences directed against an epitope present in a target protein, the method comprising the steps of:
(a) selecting an epitope from a three-dimensional atomic-resolution model of the target protein, wherein the epitope comprises a subset of amino acids of said target protein,
(b) generating molecular dockings between all residues present in the selected epitope and antibodies derived from a three-dimensional library of antibody–polypeptide ligand interactions,
(c) eliminating molecular dockings that result in unfavorable interactions,
(d) identifying, for each docked antibody, between 5 and 60 amino acid positions located in the complementarity-determining regions (CDRs), wherein each identified position is located at less than 8 ångströms from an amino acid position of the selected epitope,
(e) introducing mutations, in each of the docked antibodies, at each amino acid position identified in step (d), one position at a time, to all possible natural amino acids, and selecting a set of allowed mutations based on a multi-parameter criterion, said multi-parameter criterion comprising:
- calculating the stability
of the antibody using an empirical ForceField} _{algorithm and/or} JoSc-FrRo/denovoabody/842 -_{calculating the stability of interactions between epitope and antibody using} an empirical ForceField algorithm, (f) applying, over 200-20,000 generations, for each docked antibody, a genetic algorithm to combine the identified set of allowed mutations of step (e), (_{g) producing a set of antibody sequences from the resulting combined mutations where} user-defined threshold parameters for score distributions are met, said scores c_{omprise :} _{- calculating the stability of the antibody with combined mutations using an} _{empirical ForceField algorithm and/or} _{- calculating the stability of interactions between epitope and antibody with} _{accumulated mutations using an empirical ForceField algorithm and/or} _{- calculating the aggregation propensity of the antibody with combined} _{mutations of regions in unfolded polypeptide chains.} In yet another embodiment the invention provides a method wherein between steps d) and e) as herein _{before backbone moves are generated for the CDR sequences.} In yet another embodiment in step (f) the genetic algorithm is applied between 200 to 1000, between 300 to 1000, between 400 to 1000, between 500 to 1000, between 500 to 2000, between 500 to 5000, between 600 to 1000, between 600 to 2000, between 600 to 3000, between 600 to 4000, between 600 _{to 5000, between 1000 to 5000, between 1000 to 10.000 generations.} In yet another embodiment the elimination of molecular dockings that result in unfavorable _{interactions is carried out with a ForceField algorithm.} _{The term “backbone moves” refers to the exploration of alternative backbones for the CDR regions of} the antibodies. _{The term ‘genetic algorithm’ in computer science refers to a metaheuristic inspired by the process} _{of natural selection that belongs to the larger class of evolutionary algorithms. 
Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on biologically inspired operators such as mutation, crossover, and selection. In a genetic algorithm, a population of candidate solutions (here, mutations in CDR sequences) to an optimization problem is evolved toward better solutions. Each candidate solution has a set of properties which can be mutated and altered; traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a population of randomly generated individuals (here, variant CDR sequences) and is an iterative process, with the population in each iteration called a generation. In each generation, the fitness of every individual in the population is evaluated; the fitness is usually the value of the objective function in the optimization problem being solved (here, stability and affinity). The fitter individuals are stochastically selected from the current population, and each individual's genome is modified (recombined and possibly randomly mutated) to form a new generation. The new generation of candidate solutions is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations (here, for example, between 200 and 20,000) has been produced, or a satisfactory fitness level has been reached for the population. An excellent overview of the design and applications of genetic algorithms is available in Katoch S et al (2021) Multimedia Tools and Applications 80:8091-8126.
The term “a Monte Carlo sampling method” or “a Monte Carlo method” or “a Monte Carlo experiment” refers to a computational algorithm that relies on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be deterministic in principle.
Monte Carlo methods are mainly used in three distinct problem classes: optimization, numerical integration, and generating draws from a probability distribution.
The term “unfavorable interactions” in step (c) refers to “backbone clashes” or “steric clashes”, i.e. an unphysical overlap of newly positioned backbone atoms with other backbone atoms. In other words, backbone clashes occur when there is steric hindrance between the backbone atoms of proteins. Such clashes can exist in models, but not in real proteins. Unfavorable interactions can be measured with a ForceField, but a ForceField is not strictly needed. Generally, the distance is measured between the atoms of the backbone of the antibody and the atoms of the ligand, and when this distance is smaller than the sum of the Van der Waals radii, a clash exists.
ForceField algorithms are well known in the art. In the context of molecular modelling, a force field is a computational model that is used to describe the forces between atoms (or collections of atoms) within molecules or between molecules, as well as in crystals. Force fields are a variety of interatomic potentials. More precisely, the force field refers to the functional form and parameter sets used to calculate the potential energy of a system at the atomistic level.
One example of a ForceField algorithm is FoldX, an empirical force field designed for evaluating the impact of mutations on protein stability, folding, and dynamics. It allows rapid assessment of how amino acid substitutions affect protein properties. Another example of a ForceField algorithm is Rosetta, which employs a physics-based force field and sampling techniques to explore conformational space.
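The clash criterion above (an interatomic distance below the sum of the Van der Waals radii) and the 8 ångström cutoff used to pick CDR positions in step (d) are both simple distance tests. The following is a minimal sketch in plain Python. It assumes atoms are given as (element, x, y, z) tuples and uses illustrative Bondi radii; the document does not say which radii set, if any, is used. The `tolerance` parameter and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Illustrative Bondi van der Waals radii in ångström (assumption: the
# document does not specify a radii set).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

Atom = Tuple[str, float, float, float]  # (element, x, y, z)
Point = Tuple[float, float, float]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two 3D points."""
    return math.dist(a, b)


def find_clashes(antibody_atoms: List[Atom], ligand_atoms: List[Atom],
                 tolerance: float = 0.0) -> List[Tuple[int, int]]:
    """Return (antibody_index, ligand_index) pairs closer than the sum of vdW radii.

    `tolerance` (hypothetical) subtracts an allowed overlap from the limit;
    0.0 applies the plain sum-of-radii criterion described in the text.
    """
    clashes = []
    for i, (el_a, *xyz_a) in enumerate(antibody_atoms):
        for j, (el_b, *xyz_b) in enumerate(ligand_atoms):
            limit = VDW_RADII[el_a] + VDW_RADII[el_b] - tolerance
            if distance(xyz_a, xyz_b) < limit:
                clashes.append((i, j))
    return clashes


def interface_positions(cdr_residues: Dict[int, List[Point]],
                        epitope_coords: List[Point],
                        cutoff: float = 8.0) -> List[int]:
    """CDR positions having any atom closer than `cutoff` Å to any epitope atom."""
    selected = [
        position
        for position, coords in cdr_residues.items()
        if any(distance(c, e) < cutoff for c in coords for e in epitope_coords)
    ]
    return sorted(selected)
```

For example, a backbone N at the origin and a ligand O at 2.5 Å are flagged, because 2.5 Å is below 1.55 + 1.52 = 3.07 Å. A residue 5 Å from the epitope is selected under the 8 Å cutoff, while one 15 Å away is not. Practical tools commonly allow a small overlap tolerance, because the strict sum-of-radii test also flags normal contacts such as hydrogen bonds.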
In yet another embodiment the invention provides a computer-implemented method wherein an antibody phage display library is constructed with the antibody sequences produced in step g) of the method of claim 1 as described herein before.
In yet another embodiment the invention provides a computer-readable storage medium which stores computer-executable instructions that, when executed by at least one processor, cause the processor to perform the in silico methods as described herein before.
In yet another embodiment the invention provides an apparatus comprising control circuitry configured to perform the in silico methods of the invention.
The “in silico methods” of the invention refer to the method steps excluding the construction of the antibody phage display library.
Systems of the disclosure can include an intranet-based computer system that is capable of communicating with various software. A computer system includes any type of computing device or communication device. Examples of such a system can include, but are not limited to, supercomputers, a processor array, a distributed parallel system, a desktop computer with LAN, WAN, Internet or intranet access, a laptop computer with LAN, WAN, Internet or intranet access, a smartphone, a server, a server farm, an Android device (or equivalent), a tablet, and a personal digital assistant (PDA). Further, as discussed above, such a system can have corresponding software (e.g., user software, sensor device software).
The software of one system can be a part of, or operate separately but in conjunction with, the software of another system. Embodiments of the disclosure include a storage repository. The storage repository can be a persistent storage device (or set of devices) that stores software and data. Examples of a storage repository can include, but are not limited to, a hard drive, flash memory, some other form of solid-state data storage, or any suitable combination thereof. The storage repository can be located on multiple physical machines, each storing all or a portion of the database, AI platform, protocols, algorithms, or other stored data according to some example embodiments. Each storage unit or device can be physically located in the same or in a different geographic location. In embodiments, the storage repository may be stored locally, or on cloud-based servers such as Amazon Web Services. In one or more example embodiments, the storage repository stores one or more databases, AI Platforms, protocols, algorithms, and stored data. The protocols can include any of a number of communication protocols that are used to send, receive, or send and receive data between the processor, datastore, memory and the user. A protocol can be used for wired and/or wireless communication. Examples of protocols can include, but are not limited to, Modbus, PROFIBUS, Ethernet, and fiber optic. Systems of the disclosure can include a hardware processor. The processor of the computer executes software, algorithms, and firmware in accordance with one or more example embodiments. The processor can be a central processing unit, a multi-core processing chip, a system on a chip (SoC), a multi-chip module including multiple multi-core processing chips, or other hardware processor in one or more example embodiments. The processor is known by other names, including but not limited to a computer processor, a microprocessor, and a multi-core processor.
The processor can also be an array of processors. In one or more example embodiments, the processor executes software instructions stored in memory. Such software instructions can include generating machine learning models, executing machine learning models, performing analysis on data received from the database, and so forth. The memory includes one or more cache memories, main memory, or any other suitable type of memory. The memory can include volatile or non-volatile memory. The processing system can be in communication with a computerized data storage system which can be stored in the storage repository. The data storage system can include a non-relational or relational data store, such as a MySQL or other relational database. Other physical and logical database types could be used. The data store may be a database server, such as Microsoft SQL Server, Oracle, IBM DB2, SQLite, or any other database software, relational or otherwise. The data store may store the information identifying syntactical tags and any information required to operate on syntactical tags. In some embodiments, the processing system may use object-oriented programming and may store data in objects. In these embodiments, the processing system may use an object-relational mapper (ORM) to store the data objects in a relational database. The systems and methods described herein can be implemented using any number of physical data models. In one example embodiment, a relational database management system (RDBMS) can be used. In those embodiments, tables in the RDBMS can include columns that represent coordinates. The tables can have pre-defined relationships between them. The tables can also have adjuncts associated with the coordinates. In embodiments, the systems of the disclosure can include one or more I/O (input/output) devices that allow a user to enter commands and information into the system, and also allow information to be presented to the user or other components or devices.
Examples of input devices include, but are not limited to, a keyboard, a cursor control device (such as a mouse), a microphone, a touchscreen, and a scanner. Examples of output devices include, but are not limited to, a display device (e.g., a display, a monitor, or projector), speakers, outputs to a lighting network (such as a DMX card), a printer, and a network card. For example, the input devices can be used to enter data on native proteins, mutation sequences and assays. The input devices can also be used to enter desired functional data for a protein. The output devices can be used to output analysis data and/or engineered protein sequences resulting from AI protein design. Various techniques are described herein in the general context of software. Generally, software includes routines, programs, objects, components, data structures, and so forth that perform particular tasks or implement particular abstract data types. An implementation of these modules and techniques can be stored on or transmitted across some form of computer-readable media. Computer-readable media are any available non-transitory media accessible by a computing device. By way of example, and not limitation, computer-readable media include computer storage media. Embodiments of the disclosure use protein feature encodings to add physical or biological knowledge to amino acid sequences to create representations amenable to machine learning. As the choice of encoding varies based on the size and diversity of the input, as well as the task, several encoding methods can be implemented, allowing users to test and select the encodings most relevant to their problem.
An AI Platform can be developed which can include the following encodings, for example: one-hot, autoencoders, amino acid property encoders, learned BLOSUM/MSA evolutionary encodings, sequence mutation representation relative to WT, secondary structure / solvent accessible surface area encodings, learned AA embeddings, POOL, Phoenix, and/or structural / graph / topological encodings. The above-described embodiments of the present invention can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. It should be appreciated that any component or collection of components that perform the functions described above can be generically considered as one or more controllers that control the above-discussed functions. The one or more controllers can be implemented in numerous ways, such as with dedicated hardware, or with general purpose hardware (e.g., one or more processors) that is programmed using microcode or software to perform the functions recited above. One or more processors may be interconnected by one or more networks in any suitable form, including as a local area network or a wide area network, such as an enterprise network or the Internet. Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks, or fiber optic networks.
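One-hot encoding, the first encoding listed for the AI Platform, represents each residue as a 20-dimensional indicator vector, so a sequence of length L becomes an L × 20 binary matrix. The following is a minimal sketch. It assumes a fixed alphabetical ordering of the 20 standard amino acids (a hypothetical choice; any fixed order works) and rejects non-standard residues.

```python
from typing import List

# Hypothetical fixed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence: str) -> List[List[int]]:
    """Encode a protein sequence as an L x 20 binary matrix, one row per residue."""
    matrix = []
    for residue in sequence.upper():
        if residue not in INDEX:
            raise ValueError(f"non-standard residue: {residue!r}")
        row = [0] * len(AMINO_ACIDS)
        row[INDEX[residue]] = 1  # single 1 at the residue's alphabet position
        matrix.append(row)
    return matrix
```

With this ordering, `one_hot("GW")` has a 1 in column 5 of the first row (Gly) and in column 18 of the second row (Trp). A fixed alphabet keeps columns comparable across sequences, which the downstream machine learning models require.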
One or more algorithms for controlling methods or processes provided herein may be embodied as a readable storage medium (or multiple readable media) (e.g., a computer memory, one or more floppy discs, compact discs (CD), optical discs, digital video disks (DVD), magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other tangible storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various methods or processes described herein. In some embodiments, a computer readable storage medium may retain information for a sufficient time to provide computer-executable instructions in a non-transitory form. Such a computer readable storage medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the methods or processes described herein. As used herein, the term "computer-readable storage medium" encompasses only a computer-readable medium that can be considered to be a manufacture (e.g., article of manufacture) or a machine. Alternatively, or additionally, methods or processes described herein may be embodied as a computer readable medium other than a computer-readable storage medium, such as a propagating signal. The terms "program" or "software" are used herein in a generic sense to refer to any type of code or set of executable instructions that can be employed to program a computer or other processor to implement various aspects of the methods or processes described herein. 
Additionally, it should be appreciated that according to one aspect of this embodiment, one or more programs that when executed perform a method or process described herein need not reside on a single computer or processor but may be distributed in a modular fashion amongst several different computers or processors to implement various procedures or operations. Executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments. Also, data structures may be stored in computer-readable media in any suitable form. Non-limiting examples of data storage include structured, unstructured, localized, distributed, short-term and/or long-term storage. Non-limiting examples of protocols that can be used for communicating data include proprietary and/or industry standard protocols (e.g., HTTP, HTML, XML, JSON, SQL, web services, text, spreadsheets, etc., or any combination thereof). For simplicity of illustration, data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a computer-readable medium that conveys relationship between the fields. However, any suitable mechanism may be used to establish a relationship between information in fields of a data structure, including using pointers, tags, or other mechanisms that establish relationship between data elements.
Discussion
The advent of computational methods for antibody design is revolutionizing the field, allowing for the targeted creation of antibodies with specific binding capabilities.
In the present invention we demonstrate the efficacy of an in silico antibody design pipeline, EvolveX, in designing antibodies, such as Fabs and VHHs, targeting predefined epitopes on protein targets. Our results highlight the versatility and robustness of the approach, as evidenced by successful sequence design efforts starting from both high- and low-affinity complexes, as well as by targeting a novel protein interface. The success of the in silico antibody design pipeline is underscored by the validation experiments conducted for the high- and low-affinity complexes of VHH_WT with mouse and human Vsig4, respectively. Through a combination of sequence and structure optimization and biophysical experimental validation, EvolveX was able to design small, targeted collections yielding high-affinity VHHs. Notably, the designed VHHs consistently exhibited significantly enhanced thermal, colloidal, and chemical stability, indicating the robustness of the computational design process. Furthermore, our approach extends beyond known protein complexes, as demonstrated by the successful design of VHHs targeting the hIL-9Rα interface. By employing a novel docking method that leverages a database of hotspot residues derived from crystal structures of antibody-ligand complexes, we were able to design VHHs with high affinity for hIL-9Rα. This represents a significant advancement, as it showcases the capability of our computational framework to design antibodies targeting novel protein interfaces without relying on existing experimentally determined structural data. The designed VHHs exhibited not only high affinity but also thermodynamic stability comparable to or better than that of the wild-type antibody.
Interestingly, comparison of the designed VHHs targeting mouse and human Vsig4 revealed differences in the mutation patterns within the complementarity-determining regions (CDRs), indicating the need for tailored design strategies depending on the target species. Additionally, our invention shows interdependence between CDR2 and CDR3 sequences, as attempts to generate VHHs with only partial CDR sequences were unsuccessful, highlighting the importance of the complete paratope in antigen recognition. In conclusion, the present invention shows the power of computational antibody design in generating VHHs with desired binding properties. The combination of molecular modelling, sequence optimization, and experimental validation offers a systematic and efficient approach for antibody engineering, with broad implications for industrial antibody development, drug development and diagnostics. While several embodiments of the present invention have been described and illustrated herein, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the functions and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the present invention. More generally, those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the teachings of the present invention is/are used. Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein.
It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, the invention may be practiced otherwise than as specifically described and claimed. The present invention is directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the scope of the present invention. As used herein in the specification and in the claims, the phrase "at least one," in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase "at least one" refers, whether related or unrelated to those elements specifically identified.
Thus, as a non-limiting example, "at least one of A and B" (or, equivalently, "at least one of A or B," or, equivalently "at least one of A and/or B") can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc. In the claims, as well as in the specification above, all transitional phrases such as "comprising," "including," "carrying," "having," "containing," "involving," "holding," and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases "consisting of" and "consisting essentially of" shall be closed or semi-closed transitional phrases, respectively, as set forth in the United States Patent Office Manual of Patent Examining Procedures, Section 2111.03.
Examples
Example 1: In silico rebuilding of an antibody paratope starting from a high affinity complex.
Wen Y et al (2017, Immunobiology 222(6):807-813) generated a VHH, Nb119 (in the current application further referred to as VHH_WT (SEQ ID NO: 1)), through classical immunization against the extracellular IgV domain of the mouse Vsig4 protein. This VHH showed high affinity towards mouse Vsig4 (mVsig4, K_D ~5 nM), but a thousand-fold lower affinity towards human Vsig4 (hVsig4, K_D ~5 µM). The structure of the antibody-antigen complex was determined by X-ray crystallography for both orthologs. We used the crystal structures of the complex of the VHH with mVsig4 (VHH_WT-mVsig4, PDB IDs: 5IMM and 5IMO) to validate the capacity of EvolveX to rebuild an antibody paratope starting from a high affinity complex (Figure 1A).
To this end, we removed the sequence information from CDR2 and CDR3 of the VHH_WT-mVsig4 complexes, since these regions are most involved in binding to mVsig4: each amino acid residue in CDR2 and CDR3 (except Proline and Glycine) that resided within 8 Å of the ligand was mutated to Alanine. From this all-alanine starting point, the compatibility of each of the selected positions with all 20 amino acids was explored using the PSSM function in FoldX, which yields the impact of each point mutation on both the thermodynamic stability of the VHH and the interaction energy with the ligand (using the Analyse Complex function of FoldX) (see Materials and Methods for details). Based on these numbers, random starting sequences were generated to be compatible with the complex structure. These starting sequences were used as input for a genetic algorithm, which we ran for 500 generations featuring a recombination rate of 10% and a single point mutation rate of 90% for each generation. Selection was based on the Metropolis criterion applied to the interaction energy computed by FoldX, with preset ceilings for the thermodynamic stability of the VHH (see Fig. 1A and Materials and Methods). The resulting sequences were filtered based on thresholds for the FoldX prediction of interaction energy and VHH stability, as well as TANGO-predicted aggregation propensity (Fernandez-Escamilla AM et al (2004) Nat. Biotechnol. 22, 1302-1306). Additionally, Z-scores were calculated based on densities derived from publicly available crystal structures of VHHs bound to protein ligands, for cavities within the VHH, cavities at the interface of VHHs with their ligand, and VHH net charge (see Materials and Methods). Finally, a set of 6000 sequences with the lowest summed Z-scores was obtained. These sequences were cloned into a phage display library.
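The selection scheme of Example 1 can be illustrated in miniature: a genetic algorithm with a 10% recombination rate, a 90% single point mutation rate, and Metropolis acceptance on interaction energy under a stability ceiling. This is a sketch under stated assumptions, not the EvolveX implementation. The two scoring callables are hypothetical placeholders for FoldX interaction-energy and stability calls. Applying one move per individual per generation, single-point crossover, and the `temperature` value are all assumptions.

```python
import math
import random
from typing import Callable, Dict, List


def metropolis_accept(new_energy: float, old_energy: float,
                      temperature: float, rng: random.Random) -> bool:
    """Metropolis criterion: always accept lower (better) energies,
    accept worse ones with probability exp(-dE / T)."""
    delta = new_energy - old_energy
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def evolve(population: List[List[str]],
           allowed: Dict[int, List[str]],
           interaction_energy: Callable[[List[str]], float],
           stability: Callable[[List[str]], float],
           stability_ceiling: float,
           generations: int = 500,
           recombination_rate: float = 0.10,
           temperature: float = 1.0,
           seed: int = 0) -> List[List[str]]:
    """Evolve CDR variants; individual[i] is the residue at sorted(allowed)[i].

    Each generation, every individual proposes one move: single-point
    crossover with a random partner (probability `recombination_rate`) or a
    point mutation drawn from that position's allowed residues (otherwise).
    `stability` is assumed to be an energy-like score where larger means less
    stable, so proposals above `stability_ceiling` are rejected outright.
    Requires at least two designable positions.
    """
    rng = random.Random(seed)
    positions = sorted(allowed)
    pop = [list(ind) for ind in population]  # copy; the input is not modified
    for _ in range(generations):
        for k, current in enumerate(pop):
            candidate = list(current)
            if rng.random() < recombination_rate:
                partner = rng.choice(pop)
                cut = rng.randrange(1, len(positions))
                candidate[cut:] = partner[cut:]
            else:
                idx = rng.randrange(len(positions))
                candidate[idx] = rng.choice(allowed[positions[idx]])
            if stability(candidate) > stability_ceiling:
                continue  # too destabilizing: keep the current individual
            if metropolis_accept(interaction_energy(candidate),
                                 interaction_energy(current), temperature, rng):
                pop[k] = candidate
    return pop
```

As a toy usage, take `allowed = {0: ["A", "Y"], 1: ["A", "W"], 2: ["A", "F"]}` and score interaction energy as minus the number of aromatic residues. At a low temperature the population converges to all-aromatic variants. A stability function that penalizes Trp, combined with a ceiling of 0, keeps position 1 as Ala instead. The low temperature makes the search nearly greedy; raising it lets the Monte Carlo step accept occasional uphill moves, which helps escape local minima.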
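The final selection step of Example 1 sums Z-scores for cavities within the VHH, cavities at the interface, and net charge, and keeps the sequences with the lowest totals. The following sketch rests on several assumptions. The text derives its Z-scores from densities over public VHH-ligand crystal structures; here each Z-score is simply computed against the mean and sample standard deviation of a reference list. Every feature is also assumed to be oriented so that lower is better. The function names and the feature keys are hypothetical.

```python
import statistics
from typing import Dict, List


def z_score(value: float, reference: List[float]) -> float:
    """Standard score of `value` relative to a reference distribution."""
    mean = statistics.fmean(reference)
    sd = statistics.stdev(reference)  # sample standard deviation
    return (value - mean) / sd


def rank_by_summed_z(candidates: Dict[str, Dict[str, float]],
                     references: Dict[str, List[float]],
                     keep: int) -> List[str]:
    """Return the `keep` candidate IDs with the lowest summed Z-scores.

    `candidates` maps an ID to its feature values; `references` maps each
    feature name to reference values (e.g. from known VHH-ligand complexes).
    """
    totals = {
        cid: sum(z_score(features[name], references[name]) for name in references)
        for cid, features in candidates.items()
    }
    return sorted(totals, key=totals.get)[:keep]
```

In Example 1 the equivalent of `keep` would be 6000. Summing standardized scores places features with different units (cavity volumes, charge) on a common scale before ranking.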
After enriching the library for mVsig4 binders, mVsig4 binding was analysed using the Amplified Luminescent Proximity Homogeneous Assay (AlphaScreen, Figure 1B). This revealed clones with AlphaScreen signal-to-noise ratios (S/N ratio) similar to or higher than that of VHH_WT. For these VHH variants, we determined the rate constant governing dissociation from mVsig4 (kd) using BioLayer Interferometry (BLI) (Figure 1C), which was quite similar in all cases. Binding affinities (K_D) of the candidates for mVsig4 were determined using BLI (Figures 1D and 1E) and Surface Plasmon Resonance (SPR) (Figure 1E), revealing that four VHH candidates have affinities close to that of VHH_WT (Figure 1E). The four VHHs were subjected to chemical denaturation analysis using urea and guanidine hydrochloride. Curve fitting indicated a significant increase in stability of about -2 kcal/mol for all four designed sequences (Figure 1F) compared to VHH_WT. Thermal unfolding temperatures (T_m) and aggregation onset temperatures (T_agg) were determined simultaneously using Intrinsic Tryptophan Fluorescence (ITF) (Figure 1G) and Right-Angle Light Scattering (RALS) (Figure 1H), respectively. The designed sequences showed increased thermal stability, of about 5 °C in both T_m and T_agg, and the relative order in stability agrees well between the chemical and thermal denaturation experiments. Although EvolveX computationally explored 5 positions in CDR2 and 12 in CDR3 (~1×10²² theoretical combinations), the four selected candidates were very similar to each other in sequence, differing only at two positions where we found chemically similar amino acids: a Tyr or Phe at position 107 and an Asp or a Glu at position 112, respectively (Figure 1I). Interestingly, the two most stable VHH sequences, VHH_2 and VHH_4, contain a Glu at position 112, whereas having a Phe or a Tyr at position 107 does not seem to play an important role in stability.
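The figure of ~1×10²² theoretical combinations follows from allowing each of the 5 + 12 = 17 designable positions any of the 20 natural amino acids, which gives 20^17. The quick check below assumes every amino acid is permitted at every position. It therefore ignores the Pro/Gly exclusion and the allowed-mutation filter, which shrink the space actually searched.

```python
def sequence_space(n_positions: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences when each position can take any residue."""
    return alphabet_size ** n_positions


space = sequence_space(5 + 12)  # 5 CDR2 positions + 12 CDR3 positions
print(f"{space:.2e}")           # 1.31e+22
```

Against this space, the 6000-sequence library represents a vanishingly small fraction. This is why the stability and interaction-energy filters, not exhaustive enumeration, do the work of narrowing the candidates.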
Compared to wild type, the designed variants carried two mutations in the CDR2 loop and 5 or 6 in the CDR3 loop. In other words, in the N-terminal part of CDR2 EvolveX explores the WT sequence, which is subsequently selected during phage display. The tryptophan at position 52 in CDR2 is involved in cation-π stacking with Arginine 39 of mVsig4, which strongly distinguishes the hit sequences from the rest of the designed library. Additionally, the aggregation propensity of the hit sequences, as predicted with TANGO, is low. The other two variables used to filter designed sequences, i.e. VHH stability and interaction energy predicted by FoldX, show no clear differentiation between the selected hits and the designed library. The successful identification of multi-parametrically optimized sequences with EvolveX demonstrates the capability of our computational method to design VHH CDR sequence combinations when starting from the structure of a high affinity complex.
Example 2: In silico design of optimal CDR2/3 sequences targeting human Vsig4.
In a next step, we set out to identify high-affinity binders starting from a low-affinity complex, using structural templates obtained from the crystal structures of the low-micromolar-affinity complex between VHH_WT and human Vsig4 (PDB IDs 5IMK and 5IML, Figure 5A). The backbone conformation of CDR3 was adapted using the Bridging command in ModelX (Cianferoni D. et al (2020) Bioinformatics 36(14):4208-10), using 5IMK as a starting point, which added 14 starting complex structures (see Materials and Methods). Sequence design was carried out as described above, yielding 6000 candidate sequences (Figure 5A). The mouse VHH_m and human VHH_h sequences were combined into one phage library, which was selected against hVsig4. We randomly picked colonies from both libraries for sequencing, yielding 82 unique sequences, which we ranked for binding to hVsig4 using classical ELISA.
All 56 unique sequences were tested for off-rate using BLI (Figure 5C). The 3 clones with the best off-rates (VHH_h1 (SEQ ID NO: 2), VHH_h2 (SEQ ID NO: 3) and VHH_h3 (SEQ ID NO: 4)), the clone with the highest ELISA signal (VHH_h4 (SEQ ID NO: 5)) and the clone with the highest expression level (VHH_h5 (SEQ ID NO: 6)) were selected for purification and subsequent biophysical characterization. Remarkably, the measured affinities of the designed sequences were in the low nanomolar range, an improvement in affinity for human Vsig4 of up to 1000-fold over VHH_WT (Figure 5D and 5E). All VHHs were subjected to chemical and thermal denaturation. Chemical denaturation by urea (Figure 5F) and guanidine hydrochloride (Figure 5G) indicated that the five designed sequences were stabilized by -1.5 to -3.5 kcal/mol compared with the original VHH_WT (Table 1). Both T_m and T_agg were significantly improved for all designed sequences compared to VHH_WT (Table 1). VHH_h2, VHH_h4 and VHH_h5 exhibited the strongest binding to hVsig4 (KD_BLI and KD_SPR ~ 5 nM), which is 1000-fold better than VHH_WT (KD_BLI ~ 5 µM) (Figure 5E). VHH_h1 and VHH_h3 showed weaker K_D values (10-40 nM) for hVsig4 than the first three candidates, while still binding substantially better than VHH_WT. An overview is shown in Table 2. All five nanobodies share a common set of mutations (WN to PF, DK to NR, F to D and E to N), and VHH_h2 and VHH_h5 differ only at one position. Interestingly, when comparing the sequences analyzed for hVsig4 and mVsig4, we found some positions that were mutated (although to different residues) in both cases, in CDR2 (GS) and CDR3 (D and E). In the case of hVsig4, more positions were mutated, indicating that larger sequence changes were needed to achieve high-affinity binding.
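The shared mutations listed above can be read directly from the dot notation of Table 1. The following sketch is a hypothetical helper, not part of the described pipeline: it expands each designed AbM CDR against VHH_WT, treating '.' as the wild-type residue, and intersects the mutation sets of the five hVsig4 binders. Positions are numbered from 1 within each CDR string, not by a standard antibody numbering scheme.

```python
# AbM CDR sequences of VHH_WT and the dot notation of Table 1 ('.' = VHH_WT residue).
WT = {"CDR2": "AIRWNGGSTY", "CDR3": "GRWDKYGSSFQDEYDY"}
DESIGNS = {
    "VHH_h1": {"CDR2": "..GPF.YE..", "CDR3": "...NR..FED..NF.."},
    "VHH_h2": {"CDR2": "..SPF..N..", "CDR3": "...NR..FED..NF.."},
    "VHH_h3": {"CDR2": "..GPF..T..", "CDR3": "...NR..EQD.YN.E."},
    "VHH_h4": {"CDR2": "..GPF..H..", "CDR3": "...NR..FYD..N..."},
    "VHH_h5": {"CDR2": "..GPF.....", "CDR3": "...NR..FED..NF.."},
}

def mutations(wt, design):
    """Return mutations such as 'W4P', numbered from 1 within the CDR string."""
    if len(wt) != len(design):
        raise ValueError("aligned CDR strings must have equal length")
    return {f"{w}{i}{d}" for i, (w, d) in enumerate(zip(wt, design), start=1)
            if d != "." and d != w}

def shared_mutations(cdr):
    """Mutations present in every design for the given CDR."""
    per_design = [mutations(WT[cdr], d[cdr]) for d in DESIGNS.values()]
    return set.intersection(*per_design)

for cdr in ("CDR2", "CDR3"):
    print(cdr, sorted(shared_mutations(cdr)))
```

Run as is, it reports W4P and N5F for CDR2 and D4N, K5R, F10D and E13N for CDR3, i.e. the WN to PF, DK to NR, F to D and E to N changes described in the text.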
To elucidate the biophysical basis of the increased stability of the designed sequences over the original VHH, we studied VHH_WT and VHH_h4 (the most stable binder to hVsig4) by nuclear magnetic resonance (NMR) spectroscopy. VHH_WT features a well-dispersed heteronuclear single-quantum correlation (HSQC) spectrum, in which nearly all peaks could be unambiguously assigned. Interestingly, most residues in the three CDRs showed no backbone amide resonances. Such a paucity of CDR signals has been reported for several other VHHs and attributed to pronounced flexibility and conformational heterogeneity of the CDR loops (1-3). The HSQC spectrum of VHH_h4 is similar to that of VHH_WT but contains more peaks, some of which could be assigned to residues in the CDR3 loop. In addition, several VHH_h4 resonances not observed in the NMR spectra of VHH_WT (and showing very weak signals in triple-resonance NMR experiments, precluding their unambiguous assignment) could belong to other CDRs. Overall, it appears that VHH_h4 undergoes much less extensive conformational exchange upon target binding than the WT. The secondary structures of VHH_WT and VHH_h4 predicted from the backbone chemical shifts are very similar and agree well with the X-ray structure, which testifies to the high structural conservation of the VHH fold outside the CDRs.
Example 3: In silico design pipeline for the de novo design of antibodies against defined targets.
A much bigger challenge in antibody design is the de novo design of a single-domain or full-length antibody against a defined epitope, since this involves docking an antibody scaffold onto the antigen, remodeling the loops and searching for sequences that mediate high-affinity binding. A docking method based on hotspot grafting was developed to dock VHH templates onto predefined epitope sidechains overlapping with the intended interface.
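The geometric core of hotspot grafting is a rigid-body superposition: the transform that maps a hotspot side chain from a database complex onto a rotamer of the chosen epitope residue is applied to the whole antibody of that complex. The text performs this superposition in YASARA; the sketch below instead uses the standard Kabsch (SVD) algorithm in NumPy as an illustration, with coordinates passed as plain (N, 3) arrays of corresponding atoms. The function names are hypothetical.

```python
import numpy as np

def kabsch(mobile, target):
    """Return rotation r and translation t minimising RMSD between mobile @ r.T + t and target.

    mobile, target: (N, 3) arrays of corresponding atoms, e.g. side-chain atoms of a
    hotspot residue in a database complex (mobile) and of one rotamer of the chosen
    epitope residue on the target protein (target).
    """
    mobile_c, target_c = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mobile_c).T @ (target - target_c)  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))           # -1 would mean a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = target_c - mobile_c @ r.T
    return r, t

def graft(antibody_xyz, hotspot_xyz, epitope_xyz):
    """Place the antibody of a database complex using the hotspot-to-epitope transform."""
    r, t = kabsch(hotspot_xyz, epitope_xyz)
    return antibody_xyz @ r.T + t
```

For the case described in Example 4, the mobile atoms would be the side chain of Ile111 of hVsig4 from the VHH_WT complex and the target atoms one modelled conformation of Ile208 on hIL-9Rα.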
The docking method places a template VHH near the epitope of choice and ensures that at least some of the interactions between the antibody and the hotspot residue are grafted simultaneously. All crystal structures of complexes between VHHs and protein ligands are first analyzed using FoldX to identify which amino acids on the epitope are predicted to contribute most to the interaction, termed hotspot residues (Figure 3A). The sidechains of these hotspot residues are varied by modelling their energetically likely rotamers using the Rotacloud command in FoldX (Figure 3D, 3E and 3F). The hotspot database is then used to dock template VHHs onto the amino acids at the interface of the target protein (Figure 3B and 3C).
Example 4: Validation of our de novo design pipeline by targeting human Interleukin-9 receptor alpha
To validate our approach, we selected as a target the crystal structure (PDB ID 7OX5) and AlphaFold models (Q01113) of human Interleukin-9 receptor alpha (IL-9Rα). For de novo design, an epitope was selected in the stem region of hIL-9Rα proximal to the membrane domain, where the receptor would be expected to interact with the equivalent stem region of the Interleukin-2 receptor gamma (γcR/IL-2Rγ), and outside of the hIL-9 binding site (Figure 3G). Antibody binding at this interface would be expected to inhibit IL-9 downstream signaling, which could be a therapeutically beneficial approach in asthma and atopic diseases generally, where IL-9 is seen as one of the key players in disease severity (Figure 3H). For this purpose, the crystal structure of the binary hIL-9:IL-9Rα complex (PDB ID 7OX5) and AlphaFold models of the human IL-9Rα structure were used. The 1.2M docks resulting from the hotspot residue docking approach described above were filtered based on the number of interacting residues and backbone clashes, leaving 900 docks. These docks were grouped by hotspot residue and template structure, yielding 15 groups of docks.
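The filtering and grouping of docks described above can be sketched as a two-pass filter followed by grouping on (hotspot residue, template). The thresholds are those listed under "Ligand hotspot-based docking" in Materials and methods, and the keys reuse the FoldX variable names quoted there; the `rotamer_frequency`, `hotspot` and `template` fields and the `repair` hook (standing in for RepairPDB followed by re-scoring) are hypothetical.

```python
from collections import defaultdict

def passes_stage1(dock):
    """First pass: enough interface residues and no backbone clashes."""
    return (dock["Interface.Residues"] >= 20
            and dock["Interface.Residues.BB.Clashing"] <= 0)

def passes_stage2(dock):
    """Second pass, applied after interface repair and re-scoring."""
    return (dock["Interface.Residues.VdW.Clashing"] <= 1
            and dock["Interface.Residues"] >= 26
            and dock["Van.der.Waals.clashes"] <= 1
            and dock["rotamer_frequency"] >= 20)   # hypothetical field name

def filter_and_group(docks, repair=lambda d: d):
    """Two-pass filter, optional repair in between, then group by (hotspot, template)."""
    survivors = [repair(d) for d in docks if passes_stage1(d)]
    survivors = [d for d in survivors if passes_stage2(d)]
    groups = defaultdict(list)
    for d in survivors:
        groups[(d["hotspot"], d["template"])].append(d)
    return dict(groups)
```

A representative dock from each resulting group can then be inspected visually, as done for the 15 groups in the text.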
Representative docks for each group were visually inspected for viability, judged on the CDRs involved, VHH orientation, interaction potential and proximity to glycosylation sites (Figure 3I). Interestingly, among the groups of docks that met our criteria, 72 different docks coincidentally used VHH_WT bound to hVsig4 as a template. We therefore selected this VHH_WT as our starting model, which closes the design process: we could redesign a nanobody against a target while keeping its K_D and improving its physical properties, increase its affinity for an ortholog of the target, and, more challenging still, modify it to recognize a completely unrelated target. Ile111 of hVsig4 in complex with VHH_WT was superimposed on different conformations of Ile208 in structural models of hIL-9Rα. This group was used as the starting point for the antibody design effort. The same sequence design method as for the mouse and human Vsig4 complexes was used for these docked templates, except that here all three CDRs were explored simultaneously instead of only CDR2 and CDR3. As in the mVsig4 case, the backbone conformation of the CDRs was not adapted for the 72 starting complexes (Figure 6A). The top 12405 sequences with the lowest summed Z-scores were cloned into a phage display library, which was selected for binding to hIL-9Rα over several rounds of phage display. Finally, several hits were evaluated, and the best hit VHH showed single-digit nanomolar affinity for hIL-9Rα (Figure 6D). BLI measurements showed binding affinities of the same order of magnitude for multiple hit VHHs (Figure 6E and Table 2). To test whether the hit de novo designed sequences bind specifically to hIL-9Rα, AlphaScreen was performed for hIL-9Rα, hVsig4 and mVsig4 (Figure 6F). None of the sequences that bind hIL-9Rα bind to either hVsig4 or mVsig4. Additionally, VHH_WT does not show any AlphaScreen signal for hIL-9Rα.
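The selection by summed Z-score used to build the library can be sketched as follows. Per Materials and methods, the three variables are VHH intracavities (Cav_binder), interface cavities (Cav_Int) and VHH net charge (ChAb), standardized against known VHH complexes in the PDB; models with any Z-score above 2 are dropped and the rest are ranked by the sum. The record layout, the function names and the use of the sample standard deviation are assumptions of this sketch.

```python
import statistics

FEATURES = ("Cav_binder", "Cav_Int", "ChAb")

def reference_stats(reference):
    """Mean and sample standard deviation of each feature over reference VHH complexes."""
    return {f: (statistics.mean(r[f] for r in reference),
                statistics.stdev(r[f] for r in reference)) for f in FEATURES}

def rank_by_summed_z(models, reference, n_keep, n_negative, z_max=2.0):
    """Return (library, negative controls): lowest and highest summed Z-scores."""
    stats = reference_stats(reference)
    scored = []
    for m in models:
        z = [(m[f] - stats[f][0]) / stats[f][1] for f in FEATURES]
        if max(z) > z_max:            # any variable more than z_max SD above reference
            continue
        scored.append((sum(z), m))
    scored.sort(key=lambda pair: pair[0])
    library = [m for _, m in scored[:n_keep]]
    negatives = [m for _, m in scored[-n_negative:]] if n_negative else []
    return library, negatives
```

Keeping the highest-scoring survivors as negative controls mirrors the AffinityMouse/NegativeControlMouse split described in Materials and methods.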
Most hit VHHs showed unfolding temperatures similar to that of VHH_WT (Figure 6G). Sequence logos show similar patterns among hit VHHs. SEQ ID NO: 7, 8, 9, 10, 11, 12, 13 and 14 of Table 2 depict 8 different hIL-9Rα binders.
Materials and methods
Vsig4 preparation
Mouse recombinant Vsig4 (mVsig4) used in phage selection, screening, and VHH affinity measurement was bought from Sino Biological (50187-M08H-100) and biotinylated with EZ-Link NHS-PEG4-Biotin (Thermo Scientific, A39259) according to the manufacturer's instructions. The mVsig4 material used for crystallization (amino acids 20-139, with C-terminal His tags) was cloned into plasmid pVDS101 (a modified version of pHEN6), expressed in the E. coli TG1 strain, and purified from the periplasmic extract on a Ni-charged column followed by size exclusion chromatography (SEC) (Superdex 75 16/600 column). Human recombinant Vsig4 (hVsig4) used in phage selection, screening, and VHH affinity measurement (amino acids 20-137, with C-terminal Avitag and His tags) was cloned into the pVDS101 plasmid and expressed in the E. coli AVB101 strain (AVIDITY) to allow in vivo biotinylation, following the suggested protocol. The protein was then extracted from the periplasm and purified by Ni-NTA chromatography. The hVsig4 protein used in crystallization (amino acids 20-137, with C-terminal His tags) was cloned into the pLTDA104 plasmid (also a modified version of pHEN6), expressed in the E. coli TG1 strain and purified following the same procedures as for mVsig4.
Production of recombinant hIL-9Rα
Production of recombinant hIL-9Rα was done as previously described (de Vos et al (2022) BioRxiv 2022.2012.2030.522308). In brief, hIL-9Rα (NP_002177.2, residues 40-261) with an N-terminal His-tag followed by a thrombin cleavage site was expressed in E. coli BL21(DE3) using the pET15b-hIL-9Rα expression construct.
Following refolding based on an in-house protocol (2), hIL-9Rα was purified by immobilized metal affinity chromatography (IMAC) followed by SEC. The N-terminal His-tag was removed using thrombin (Merck). Undigested protein was removed by IMAC, and the cleaved protein was purified via SEC in HBS pH 7.4 buffer.
In vitro biotinylation of recombinant hIL-9Rα
In vitro biotinylation of hIL-9Rα was done using the EZ-Link™ Sulfo-NHS-LC-Biotinylation Kit (Thermo Scientific™) following the manufacturer's protocol. Excess biotin was removed by a desalting step using a HiPrep 26/10 Desalting column (Cytiva). The biotinylated protein was aliquoted and flash-frozen in liquid nitrogen.
AlphaScreen
The AlphaScreen FLAG (M2) detection kit (PerkinElmer, 6760613M) was used for the initial screening of the mVsig4 hits and hIL-9Rα hits. In short, in a 384-well OptiPlate (PerkinElmer, 6007290), 5 µL of diluted biotinylated mVsig4 or hIL-9Rα, 5 µL of diluted periplasmic extract from the different clones (prepared in a 96-well plate), and 10 µL of anti-FLAG-conjugated acceptor beads (50 µg/mL) were incubated together for 1 h at room temperature in the dark. Then 5 µL of streptavidin donor beads (100 µg/mL) was added and incubated in the dark for 25-30 minutes. The signals were read (excitation: 680 nm; emission: 520-620 nm) using the EnVision multilabel plate reader (PerkinElmer).
ELISA screen
Human Vsig4 hits were first screened in an ELISA assay. In brief, 100 µL of biotinylated hVsig4 (2 µg/mL) was coated onto the ELISA plate in advance. The plates were blocked with PBS/1% BSA for 2 h at room temperature. Subsequently, 100 µL of periplasmic extract was added per well and incubated for 1.5 h at room temperature. Then 100 µL/well of anti-FLAG-HRP antibody (Sigma-Aldrich A8592) (1:2500 dilution) was added and incubated for 1 more hour. Finally, TMB solution (50 µL/well) was used for detection, and after 1-5 minutes, 50 µL/well of H₂SO₄ was added to stop the reaction.
The plates were then read at a wavelength of 450 nm in a plate reader.
Biolayer interferometry (BLI) off-rate assay
The clones with unique sequences and significant AlphaScreen or ELISA signal (signal/noise, S/N > 3) were further selected, re-arrayed in a 96-well plate, and screened by a BLI off-rate assay on an Octet RED96 (Sartorius). Ni-NTA biosensors (Sartorius, 18-5101) were first used to assess the expression of the different clones. After equilibration of the biosensors for at least 10 minutes in 1x Sartorius/Octet kinetic buffer and an extra 120 seconds for the baseline, the biosensors were dipped into 10x diluted periplasmic extracts for 300 seconds, followed by 300 seconds of dissociation in 1x kinetic buffer. Afterwards, streptavidin (SA) biosensors (Sartorius, 18-5019) were used to measure the off-rates of the different clones. With the same pre-wet procedure and baseline setting, biotinylated Vsig4 or IL-9Rα material was first loaded on the biosensors, after which the tips were sequentially submerged in baseline wells, association wells with diluted periplasm samples (300 s), and finally dissociation wells (300 s). Data were processed with the Octet RED analysis software.
VHH expression and purification
VHH candidates used for initial affinity measurement were in the vector pLTDA118 (a modified pHEN6 vector with a C-terminal 3xFlag/6xHis tag) and expressed in TG1 cells following the standard protocol (3). These VHHs were purified either by Ni-NTA chromatography or with the AmMag™ SA Plus System (Genscript, L01013) with Ni magnetic beads (Genscript, L00776). VHHs used in the stability study and crystallization were cloned into the vector pLTDA104 (a modified pHEN6 vector with a C-terminal 6xHis tag) and expressed in TG1 cells following the same procedure. These VHHs were then purified on a Ni-charged column followed by SEC.
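Because the dissociation phase of a 1:1 interaction does not depend on analyte concentration, an off-rate screen can rank clones from crude periplasmic extracts whose VHH concentration is unknown. The sketch below assumes a single-exponential decay R(t) = R0·exp(-kd·t) of a baseline-corrected dissociation trace and estimates kd with a log-linear least-squares fit. The Octet analysis software used in the text applies its own fitting, so this is only an illustration, and the function names are hypothetical.

```python
import numpy as np

def fit_off_rate(time_s, response_nm):
    """Estimate kd (1/s) assuming R(t) = R0 * exp(-kd * t).

    time_s starts at the beginning of the dissociation step; the response must be
    baseline-corrected and strictly positive for the log transform.
    """
    slope, _intercept = np.polyfit(time_s, np.log(response_nm), 1)
    return -slope

def half_life(kd):
    """Complex half-life in seconds for first-order dissociation."""
    return np.log(2.0) / kd
```

A log-linear fit weights late, low-signal points more heavily than a direct non-linear fit would, which is acceptable for ranking clones but less so for reporting precise kd values.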
Biolayer interferometry (BLI) affinity measurement
Purified VHHs were used to determine the affinity for mVsig4 and hVsig4 on an Octet RED96 (Sartorius). Briefly, streptavidin (SA) biosensors (Sartorius, 18-5019) were first equilibrated for at least 10 minutes in 1x Sartorius/Octet kinetic buffer. Subsequently, the tips were dipped into the biotinylated Vsig4 target (3-5 µg/mL) for 30-60 seconds. Afterwards, the tips were sequentially submerged in 1x kinetic buffer for the baseline (120 s), in VHH dilution samples for the association (300 s), and finally back in 1x kinetic buffer for the dissociation (300 s). Fitting and binding kinetics determination were performed with a 1:1 interaction model in the Octet RED analysis software.
Surface plasmon resonance (SPR) affinity measurement
Binding kinetics and affinities of the various VHHs for human and murine Vsig4 were evaluated by surface plasmon resonance on a Biacore T200 instrument (Cytiva) with a running buffer composed of 10 mM HEPES, 150 mM NaCl and 0.005% Tween 20. The assay format involved ligand capture on a Series S Sensor Chip SA. Briefly, the biotinylated ligands were immobilized by non-covalent capture (binding to streptavidin) following the instructions provided with the Cytiva chip. mVsig4 was captured on flow cell 2 and hVsig4 on flow cell 3, leaving flow cell 1 as a subtractive reference. The capture level was targeted between 100 and 200 resonance units for mVsig4 and between 1200 and 1300 resonance units for hVsig4. A serial dilution of each VHH was flowed over the immobilized Vsig4 (50 µL/min for 1 minute) and allowed to dissociate for 4 minutes. The capture surface was regenerated with a 60-s injection of 3 M MgCl2 at 50 µL/min. A 3-fold dilution series of each VHH variant was used to analyze binding to mVsig4. For VHH_WT and VHH_m1-VHH_m4, the concentration series ranged from 0.25 nM to 20 nM.
For VHH_h1-VHH_h5, the concentration series ranged from 6.2 nM to 500 nM. A 3-fold dilution series of each VHH variant was likewise used to analyze binding to hVsig4. For VHH_WT and VHH_m1, the concentration series ranged from 3.7 nM to 300 nM; for VHH_h1-VHH_h5 and VHH_m4, from 0.25 nM to 20 nM; and for VHH_m2 and VHH_m3, from 2.1 nM to 167 nM. All sensorgrams were analyzed using a 1:1 Langmuir binding model with software supplied by the manufacturer to calculate the kinetic and binding constants.
Thermal stability measurement
The unfolding temperature (T_m) and aggregation onset temperature (T_agg) of VHHs at 1.0 mg/mL were determined using an UNcle device (Unchained Labs, CA, USA). More specifically, T_m and T_agg were determined simultaneously via intrinsic fluorescence (ITF) and static light scattering (SLS) measurements, respectively, during gradual heating from 15 to 95 °C at 0.3 °C/min. An excitation wavelength of 280 nm was used for both measurements, with emission recorded from 250 to 720 nm for the intrinsic fluorescence and at 266 nm for SLS. The intrinsic fluorescence values between 320 and 370 nm were used to calculate barycentric mean values (BCMs), resulting in a smooth curve. Curve fitting in RStudio was used to calculate the T_m value, corresponding to the inflection point of the sigmoid curve. The T_agg value was determined using manually set thresholds, to allow proper comparison between variants while ignoring baseline noise. All samples were measured at least in triplicate.
Chemical stability measurement
Chemical denaturation was performed with both urea (0-9.2 M) and guanidine hydrochloride (GdnHCl, 0-5.5 M), using a gradient of 24 steps.
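The 24-step denaturant gradient can be planned as per-well volumes of denaturant stock, protein stock and buffer. The sketch below assumes linear spacing and a hypothetical final volume of 100 µL per well (neither is stated in the text); the stock concentrations (10 M urea, 6 M GdnHCl, protein at 1 mg/mL) and the final protein concentration (0.04 mg/mL) are those given in the protocol. The function name is hypothetical.

```python
def gradient_plan(max_denaturant_m, stock_denaturant_m, n_steps=24,
                  final_volume_ul=100.0, protein_stock=1.0, protein_final=0.04):
    """Per-well (conc_M, denaturant_uL, protein_uL, buffer_uL) for a denaturant gradient.

    Assumptions: linear spacing and final_volume_ul (100 uL) are not stated in the protocol.
    """
    protein_ul = final_volume_ul * protein_final / protein_stock  # 25-fold protein dilution
    plan = []
    for i in range(n_steps):
        conc = max_denaturant_m * i / (n_steps - 1)
        den_ul = final_volume_ul * conc / stock_denaturant_m
        buffer_ul = final_volume_ul - den_ul - protein_ul
        if buffer_ul < 0:
            raise ValueError(f"{conc:.2f} M is not reachable from this stock")
        plan.append((round(conc, 3), round(den_ul, 2), round(protein_ul, 2), round(buffer_ul, 2)))
    return plan

urea = gradient_plan(9.2, 10.0)  # 0-9.2 M from a 10 M urea stock
gdn = gradient_plan(5.5, 6.0)    # 0-5.5 M from a 6 M GdnHCl stock
```

The check on the buffer volume shows why the stocks must exceed the top of each gradient: the highest urea step already uses 92% of the well volume for denaturant.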
The different steps were pipetted by a robot (Opentrons OT-2) from a protein stock solution of 1 mg/mL in PBS, to a final protein concentration of 0.04 mg/mL, and from freshly prepared denaturant stock solutions of 10 M urea or 6 M GdnHCl. All stock solutions were filtered through a 0.22 µm filter. After pipetting, plates were spun down in a centrifuge at 1000 g for 2 minutes. The plates were incubated for 2 hours at 25 °C with a lid to avoid evaporation of the sample. Samples were measured using the standard protocol on the SUPR-CM (ProteinStable).
Sequence design: Template generation.
The original PDB structures of the wildtype VHH were downloaded from the PDB: two low-affinity complexes of wildtype VHH bound to human Vsig4 (hVsig4; 5imk and 5iml) and two high-affinity complexes of wildtype VHH bound to mouse Vsig4 (mVsig4; 5imm and 5imo). Sidechains were subsequently optimized using the FoldX command RepairPDB. Backbones were optimized using the ModelX command Vibrate, which generates rotations in 1-degree steps from -20 to 20 degrees around an imaginary axis along N and C for every amino acid, approximating phi/psi torsional rotations while reducing the number of rotations to one angle. The hVsig4-bound structure 5imk was the starting point to explore alternative backbones for CDR3 using the Bridging command in ModelX. ModelX can ingest customized peptide fragment databases adapted to different modeling requirements.
The peptide fragments were used to graft CDR3 loop conformations, generating conformational variability; in our case, we created a database in situ from 120000 X-ray structures with a resolution ≤ 2.5 Å. In total, 18544 backbone moves were generated and selected as follows: models with backbone-backbone clashes were first filtered out, and models with more than 36 contacts were then retained. The remaining 85 models were ranked by FoldX interaction energy and 14 were selected, resulting in 14 structures with slight movements in CDR3. The resulting 22 structures (the 8 wildtype templates and these 14 CDR3 variants) were subjected to de novo sequence design of CDR2 and CDR3.
Sequence design: Explore positions.
Each amino acid in CDR2 or CDR3 (according to the Chothia numbering scheme (Chothia C and Lesk AM (1987) J Mol Biol 196, 901-917)) within 8 Ångström of the ligand was mutated separately to every possible amino acid using BuildModel, after which the effect of each mutation on stability and interaction energy was calculated using the FoldX commands BuildModel and AnalyzeComplex, respectively.
Sequence design: Reduce search space.
To reduce the sequence space, positions were filtered. Positions were removed where only the wildtype amino acid did not result in major destabilization of the VHH (ddG_binder < 1 kcal/mol). Positions were also removed where the wildtype amino acid was best for VHH stability (ddG_binder rank = 1) and where mutations had little effect on VHH stability (ddG_binder variation < 0.1). Positions where the wildtype amino acid is proline were not considered either. Mutations that destabilize the VHH (ddG_binder > 1 kcal/mol) or weaken the interaction (ddG_AC > 1 kcal/mol) were removed from the search space.
Sequence design: Genetic algorithm.
Each start point is evolved through a process based on a genetic algorithm (Katoch S et al (2020) Multimedia Tools and Applications 80:8091-8126).
The 50 start points (threads) for each template are used as input for the genetic algorithm. In each generation, 10% of the 50 threads are recombined with each other and 90% are subjected to a point mutation. Recombination is done by randomizing the identities of each parent for each explored position and splitting the resulting list of mutations at a random point. Point mutations are made at a random position, with the new amino acid for that position selected semi-randomly according to a frequency table based on amino acid distributions in paratopes (Akbar R et al (2021) Cell Rep. 34, 108856).
The effect of each mutation event is calculated using the FoldX commands BuildModel, AnalyzeComplex and Stability, and with TANGO (Fernandez-Escamilla AM et al (2004) Nat. Biotechnol. 22, 1302). Thresholds are set manually for VHH stability (dG_binder) and VHH intraclashes (IntraClash_binder). A mutation is rejected if the difference with the parent is positive and the value of the mutant is above the manually set threshold. If the mutant passes these checks, the main driver is interaction energy: a mutant that improves the interaction energy (ddG_AC < 0 kcal/mol) is accepted, and the Metropolis criterion is applied when the interaction energy is not improved (ddG_AC >= 0 kcal/mol). If the mutant is accepted, it is used as input for the next generation; if it is rejected, the parent is used as input. EvolveX is run for 500 generations each time: once for the 8 wildtype templates, both original and "vibrated" with the ModelX shake command to generate slight backbone moves, and twice for the 14 backbone-moved templates based on 5imk. In total, about 500K mutations were accepted.
Sequence selection
R scripting was used to select from the 500K accepted sequences. Only sequences with a satisfactory interaction energy with the ligand (dG_AC < -17.5 kcal/mol), good VHH stability (dG_binder < -5 kcal/mol) and low predicted aggregation propensity (total TANGO score < 450) were considered.
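The first selection pass amounts to three hard cut-offs on the predicted scores. A minimal sketch, assuming each accepted model is a record keyed by the variable names used in the text; the `TANGO` key (standing for the total TANGO score) and the function name are hypothetical.

```python
# Cut-offs quoted in the text; 'TANGO' is a hypothetical key for the total TANGO score.
CUTOFFS = {"dG_AC": -17.5, "dG_binder": -5.0, "TANGO": 450.0}

def first_pass(models):
    """Keep models with strong predicted binding, good VHH stability and low aggregation."""
    return [m for m in models
            if m["dG_AC"] < CUTOFFS["dG_AC"]
            and m["dG_binder"] < CUTOFFS["dG_binder"]
            and m["TANGO"] < CUTOFFS["TANGO"]]
```

All three comparisons are strict, matching the "<" used for each threshold in the text.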
For the remaining 58213 models, VHH intracavities (Cav_binder), interface cavities (Cav_Int) and VHH net charge (ChAb) were calculated using YASARA (Krieger E and Vriend G (2014) Bioinformatics 30(20), 2981-2982). Z-scores were calculated for all three variables based on their distributions in known VHH complexes in the PDB. Models with a Z-score above 2 for any of the three variables were excluded, leaving 38204 models. The three Z-scores were then summed. The 6119 models with the lowest summed Z-score designed to bind mouse Vsig4 were selected for inclusion in the library (AffinityMouse), and the 203 with the highest summed Z-score were selected as a negative control (NegativeControlMouse). The 6104 models with the lowest summed Z-score designed to bind human Vsig4 were selected for inclusion in the final library (AffinityHuman), and the 207 with the highest summed Z-score were selected as a negative control (NegativeControlHuman). The unique sequences from the pooled models formed a library of 12405 sequences in total, consisting of 3200 unique CDR2 sequences and 9884 unique CDR3 sequences.
Ligand hotspot database generation
All single-domain antibodies bound to a protein ligand were extracted from the SAbDab database. Each complex was repaired using the FoldX command RepairPDB. Subsequently, the energy of each amino acid in the ligand was calculated using the FoldX command SequenceDetail, with and without the antibody present, yielding the contribution of each amino acid on the protein ligand (epitope) to the interaction with the antibody. A database was made of all ligand amino acids that contributed less than -1.5 kcal/mol to the interaction energy.
De novo design of IL-9Rα VHHs: Docking template generation
The crystal structure of the IL-9Rα/IL-9 complex was used as a structural ligand template, consisting of 8 IL-9R models.
Additionally, AlphaFold2 was used to generate structural ligand templates, both for monomeric IL-9Rα and for the IL-9Rα/IL-2Rγ heterodimer, each yielding 5 relaxed and unrelated models.
De novo design of IL-9Rα VHHs: Ligand hotspot-based docking
Eleven potential hotspot epitope amino acids on IL-9Rα were selected at the predicted interaction site with IL-2Rγ: Ala211, Arg198, Glu210, Glu213, Ile201, Ile208, Leu207, Phe212, Thr177, Trp206 and Val204. The hotspot database was used to superpose each hotspot ligand amino acid sidechain on the different conformations of the target epitope residue sidechains using YASARA, resulting in 1.2M docks. The FoldX command AnalyseComplex was used to calculate the interaction energy variables of each dock. Docks were removed according to the following criteria: Interface.Residues < 20, Interface.Residues.BB.Clashing > 0. The remaining 80K docks were repaired at the interface with the FoldX command RepairPDB, and AnalyseComplex was used again to calculate the interaction energy variables for all remaining docks. Docks were removed according to the following criteria: Interface.Residues.VdW.Clashing > 1, Interface.Residues < 26, Van.der.Waals.clashes > 1, frequency of rotamer found < 20. The remaining 900 docks consisted of 15 different grafted template antibodies in different conformations on different template IL-9Rα models, 72 of which used 5IML and 5IMK, coincidentally the VHH we used for the hVsig4 and mVsig4 campaigns.
De novo design of IL-9Rα VHHs: Sequence design and selection
Sequence design and selection were identical to the mVsig4 and hVsig4 design, except that the final sequences were additionally filtered on the RMSD of all CDRs between the structure predicted by NanobodyBuilder2 or IgFold and the template VHH used in the genetic algorithm.
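Since the IL-9Rα campaign reused the genetic algorithm described under "Sequence design: Genetic algorithm", its generation step and acceptance rule are sketched below. This is not the EvolveX code: FoldX and TANGO scoring is replaced by a user-supplied `score` callable, the paratope frequency table by optional `weights` (uniform by default), the recombination step is interpreted as one-point crossover over the explored positions, and the Metropolis temperature kT = 0.6 kcal/mol (about RT at 298 K) is an assumption.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def accept(parent, child, thresholds, kT=0.6, rng=random):
    """Acceptance rule described in the text; kT (kcal/mol) is an assumed value."""
    for key, limit in thresholds.items():
        # Reject if the mutant is worse than its parent AND beyond the manual threshold.
        if child[key] - parent[key] > 0 and child[key] > limit:
            return False
    ddg_ac = child["dG_AC"] - parent["dG_AC"]
    if ddg_ac < 0:
        return True                                 # interaction energy improved
    return rng.random() < math.exp(-ddg_ac / kT)    # Metropolis criterion otherwise

def point_mutate(seq, positions, rng=random, weights=None):
    """Mutate one explored position; weights would encode a paratope frequency table."""
    pos = rng.choice(positions)
    aa = rng.choices(AMINO_ACIDS, weights=weights)[0]
    return seq[:pos] + aa + seq[pos + 1:]

def crossover(a, b, positions, rng=random):
    """One-point crossover over the explored positions (an interpretation of the text)."""
    cut = rng.randrange(1, len(positions)) if len(positions) > 1 else 0
    child = list(a)
    for p in positions[cut:]:
        child[p] = b[p]
    return "".join(child)

def evolve(threads, positions, score, thresholds, generations=500,
           recombine_fraction=0.1, kT=0.6, rng=random):
    """Evolve all threads in parallel; returns the final sequences and their scores."""
    threads = list(threads)
    scores = [score(s) for s in threads]
    n_rec = min(len(threads), max(2, round(recombine_fraction * len(threads))))
    for _ in range(generations):
        snapshot = list(threads)                      # parents of this generation
        rec = rng.sample(range(len(threads)), n_rec)  # ~10% recombine, the rest mutate
        for i, parent in enumerate(snapshot):
            if i in rec:
                partners = [j for j in rec if j != i] or [i]
                child = crossover(parent, snapshot[rng.choice(partners)], positions, rng)
            else:
                child = point_mutate(parent, positions, rng)
            child_score = score(child)
            if accept(scores[i], child_score, thresholds, kT, rng):
                threads[i], scores[i] = child, child_score
    return threads, scores
```

In this sketch `score` would return a dict with `dG_AC` plus the thresholded keys (for example `dG_binder` and `IntraClash_binder`); passing 50 threads per template and 500 generations matches the settings given in the text.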
Table 1
Name     CDR2        CDR3              KD_BLI (nM)  KD_SPR (nM)  Tm (°C)  Tagg (°C)  Cm urea (M)  Cm GdnHCl (M)
VHH_WT   AIRWNGGSTY  GRWDKYGSSFQDEYDY  3190         12800        79.3     81.7       7            3.0
VHH_h1   ..GPF.YE..  ...NR..FED..NF..  14.2         5.7          86.2     86.1       ✰            3.5
VHH_h2   ..SPF..N..  ...NR..FED..NF..  4.5          4.6          89.6     89.5       ✰            3.3
VHH_h3   ..GPF..T..  ...NR..EQD.YN.E.  37.6         13.5         91.0     90.2       ✰            3.3
VHH_h4   ..GPF..H..  ...NR..FYD..N...  5.6          4.8          90.0     88.7       ✰            3.6
VHH_h5   ..GPF.....  ...NR..FED..NF..  4.1          2.8          85.8     85.2       ✰            3.3
(Note: K_D values are for hVsig4. CDR definition: AbM; '.' denotes the VHH_WT residue; ✰ means the protein was not denatured under the most extreme conditions.)
Table 2

Claims

1. A computer-implemented method for producing a set of antibody sequences directed against an epitope present in a target protein, the method comprising the steps of:
(a) selecting an epitope from a three-dimensional atomic-resolution model of the target protein, wherein the epitope comprises a subset of amino acids of said target protein,
(b) generating molecular dockings between all residues present in the selected epitope and antibodies derived from a three-dimensional library of antibody–polypeptide ligand interactions;
(c) eliminating molecular dockings that result in unfavorable interactions,
(d) identifying, for each docked antibody, between 5 and 60 amino acid positions located in the complementarity-determining regions (CDRs), wherein each identified position is located at less than 8 ångströms from an amino acid position of the selected epitope,
(e) introducing mutations, in each of the docked antibodies, at each amino acid position identified in step (d), one amino acid at a time, to all possible natural amino acids, and selecting a set of allowed mutations based on a multi-parameter criterion, said multi-parameter criterion comprising:
- calculating the stability of the antibody using an empirical ForceField algorithm and/or
- calculating the stability of interactions between epitope and antibody using an empirical ForceField algorithm,
(f) applying, over 200-20,000 generations, for each docked antibody, a genetic algorithm to combine the identified set of allowed mutations of step (e),
(g) producing a set of antibody sequences from the resulting combined mutations where user-defined threshold parameters for score distributions are met, said scores comprising:
- calculating the stability of the antibody with combined mutations using an empirical ForceField algorithm and/or
- calculating the stability of interactions between epitope and antibody with accumulated mutations using an empirical ForceField algorithm and/or
- calculating the aggregation propensity of the antibody with combined mutations of regions in unfolded polypeptide chains.
2. A computer-implemented method according to claim 1 wherein between steps d) and e) alternative backbones are explored for the CDR sequences present in the antibodies.
3. A computer-implemented method according to claims 1 or 2 wherein in step (f) a hybrid computational algorithm is applied, said hybrid algorithm combining a Monte Carlo sampling method with a genetic algorithm to combine the identified set of allowed mutations of step (e) as specified in claim 1.
4. A computer-implemented method according to claims 1, 2 or 3 wherein an antibody phage display library is constructed with the antibody sequences produced in step g).
5. A computer-readable storage medium which stores computer-executable instructions that, when executed by at least one processor, cause the processor to perform the method of claims 1, 2 or 3.
6. An apparatus comprising control circuitry configured to perform the method of claims 1, 2 or 3.