FI131721B1

FI131721B1 - A structural protein, a medical product, an electrospun filament, photonic crystals, metamaterial, thermoresponsive glass and a method for preparing a product

Info

Publication number: FI131721B1
Application number: FI20236041A
Authority: FI
Inventors: Pezhman Mohammadi; Caj Södergård; Timo Laakko; Anssi Laukkanen; Merja Penttilä
Original assignee: Teknologian Tutkimuskeskus Vtt Oy
Priority date: 2023-09-20
Filing date: 2023-09-20
Publication date: 2025-10-16
Also published as: FI20236041A1; WO2025062064A1

Description

A structural protein, a medical product, an electrospun filament, photonic crystals, metamaterial, thermoresponsive glass and a method for preparing a product

Field of the application

The present application relates to elastin-like structural proteins and to products comprising the elastin-like structural proteins. The present application relates to methods for preparing the products.

Background

Taking inspiration from high-performance protein-based structural biomaterials such as silk, elastin, and resilin has led to the development of advanced functional materials. The key aspect in biomimetic materials engineering of such structural proteins has mainly been the determination and use of their primary sequences. This is often followed by randomized or rational modifications of the sequences toward desired functionalities. While the sequence information provides the foundation for our understanding of the structure-function relationships, the structural information remains elusive and lags significantly behind. This is because the constituent protein building blocks often exhibit partially disordered tendencies with tandem repeating motifs of structured and unstructured domains.

Typically, such proteins contain an ensemble of dynamically reconfigurable conformations that lack defined secondary structures, and they have proven challenging to probe both in silico and in vitro. In contrast, the structured regions show α-helical structure with a conformationally switchable property (α-helical to β-sheet), which is crucially important during the material processing steps and for the ultimate biophysical properties. Considering that these unique α-helical motifs are in a metastable state, their production, storage, and processing are strictly controlled under evolutionary optimized physiochemical perturbance that prevents premature conformational conversion. This underlines the notion that the functionality of structural proteins transcends the linear sequence and static structure, and is intricately intertwined with a myriad of highly optimized processing conditions, as well as the dynamic and complex interplay of environmental factors that govern diverse aspects of protein behaviour, including folding, molecular self-assembly, phase behaviour, and mechanical properties, which collectively shape the functional performance and adaptability of structural proteins.

Given that it is very challenging to mimic complex physio-ecological conditions in a laboratory setup, such structural proteins are often found to be significantly unstable and to exhibit properties inferior to those of their biological counterparts.

WO2019/006374A1 discloses partially ordered polypeptides, which include a plurality of disordered domains and a plurality of structured domains. The disordered domain may comprise an amino acid sequence of [VPGXG]m, wherein X is Val, or Ala, or a mixture of Ala and Val, and wherein m is an integer from 1 to 50.

WO2021/163445 discloses compositions and methods that can provide complex protein-based structures that can be used in biomedical applications. The composition comprises: a disordered polypeptide having a transition temperature (Tt) and comprising an amino acid sequence of [VPGX¹G]m, wherein X¹ is any amino acid except proline and m is 10 to 500; and a partially ordered polypeptide (POP) having a transition temperature of heating (Tt-heating) and a transition temperature of cooling (Tt-cooling), and comprising a plurality of disordered domains, wherein each disordered domain includes an amino acid sequence of [VPGX²G]n, wherein X² is any amino acid except proline and n is 1 to 200, and a plurality of structured domains, wherein each structured domain includes a polyalanine domain, the polyalanine domain comprising at least 5 alanine residues and having at least about 50% of the amino acids in an alpha-helical conformation, wherein the disordered polypeptide's Tt is at least + 1°C compared to the POP's Tt-heating.

Modular Self-Assembling Peptide Platform with a Tunable Thermoresponsiveness via a Single Amino Acid Substitution, Jeong et al., Adv. Funct. Mater. 2018, 28, 1803114 discloses elastin-like peptide amphiphiles (ELPAs) that exhibit a temperature-responsiveness that can be easily tuned via a single N-terminal amino acid substitution at the final step of peptide synthesis.

Self-Assembly of Stimuli-Responsive Biohybrid Synthetic-b-Recombinant Block Copolypeptides, Le Fer et al., Biomacromolecules 2019, 20, 254-272, discloses synthesis and original thermoresponsive behavior of hybrid diblock copolypeptides composed of synthetic and recombinant polypeptides.

Random and oriented electrospun fibers based on a multicomponent, in situ clickable elastin-like recombinamer system for dermal tissue engineering, González de Torre et al., Acta Biomaterialia 72 (2018) 137-149, discloses a system to obtain fibers from clickable elastin-like recombinamers (ELRs) that crosslink in situ during the electrospinning process itself, with no need for any further treatment to stabilize them.

Rapid micropatterning by temperature-triggered reversible gelation of a recombinant smart elastin-like tetrablock-copolymer, Martin et al., Soft Matter, 2010, 6, 1121-1124, discloses a simple, fast, water-based method to obtain micropatterned biocompatible gels from a recently described family of elastin-like amphiphilic multiblock copolymers that combines reversible thermogelling properties under mild, physiological conditions with a means of replica molding.

CN106220793 discloses a super compression electrical conductivity and magnetism response gel robot which comprises a robot body and three to eight side arms connected to the robot body, the side arm material comprises an elastin compound crystal glue-ferric oxide magnetic nano-particle composite material which comprises elastin compound crystal glue and ferric oxide magnetic nano particles loaded on the elastin compound crystal glue, and the elastin compound crystal glue comprises elastin, gelatin and carbon nano tube. Besides, the invention further relates to a preparation method and application of the super compression electrical conductivity and magnetism response gel robot.

There is a need to find new stable structural proteins and stable structural protein-based materials with new or enhanced properties. There is also a need to find new products exhibiting new or enhanced properties and methods for preparing such products.

Summary

The present approach could overcome challenges of the prior art. It is noteworthy that a relatively small set of innovative protein sequences emerged from a vast pool of initial short helical sequences. These protein sequences could be effectively expressed in expression systems, yielding stable structural proteins with novel properties. This intricate process demanded the successful selection, development, and training of a sophisticated algorithm-based design system, just to identify potential candidate sequences.

A de novo design assisted through machine learning approach and advanced computational modelling was utilized. Having the capability to de novo design arbitrary sequences and simultaneously predict their structures using physical principles that govern molecular grammar of protein folding in a high throughput fashion provides an unimaginable source of data-driven inquiry and scientific analysis. In this new paradigm, sequence, as well as structural designs, could be guided from the bottom-up toward specific function by taking advantage of the entire protein sequence space to avoid instability constraints for most recombinantly produced structural proteins in vitro while maintaining the motif's functionality. To further overcome current limitations, a hybrid biomimetic - de novo design could be used. With this approach, a property wish list could be prepared to guide the de novo design.

Accordingly, a substantially fast and reliable targeted design workflow is outlined herein to predict a series of arbitrary α-helical conformations for templating new materials inspired by tropoelastin's poly-alanine domain. De novo designed α-helices were introduced as the guest motifs, with maintained conformational state and functionality but with greater molecular stability than their biological counterparts, for diverse technical applications. This was carried out by integrating multiscale computational modeling (MCM) and convolutional neural network (CNN) deep neural network models (Fig. 1). The best performing de novo designs in terms of structural stability were encoded into extensively studied intrinsically disordered pentapeptide repeating motifs of tropoelastin. The new protein variants were produced recombinantly. Detailed experimental characterization revealed that all the incorporated de novo helices maintained their structural conformation as predicted. The elastin-like polypeptide (ELP) variants showed tuneable supramolecular self-assembly and phase behaviour, evaluated both in silico and in vitro.

Also disclosed, as a proof of concept, is the use of some of the variants in discrete biocompatible and biodegradable functional structural materials. These include i) programmable thermoresponsive injectable matrices, ii) multiscale drug-encapsulating vehicles with controlled release, iii) multifunction wound coverage apparatus, iv) all-aqueous-based biobased photoresists for printing 2D/3D microscopic architectures such as photonic crystals and soft protein-based micro-robotics, and v) transmittance modulators for smart windows for solar light and heat regulation.

More particularly, a new class of partially disordered structural proteins, i.e. elastin-like polypeptides (ELP), was found through hybrid biomimetic - de novo design. This was mediated through the integration of computational modelling, deep neural networks, and recombinant DNA technology. This generalizable approach involves incorporating a series of de novo-designed sequences with α-helical conformation and genetically encoding them into biologically inspired intrinsically disordered repeating motifs. Disclosed is the effective translation of the predicted molecular designs into discrete structural and functional materials.

The present disclosure provides a structural protein comprising 2 or more repeating amino acid sequence units consisting of -a motif (VPGVG)n, wherein n is 2 or more, and -an alpha helical amino acid sequence selected from SEQ ID NOs: 6-10 and/or an alpha helical amino acid sequence having at least 90% sequence identity with an amino acid sequence selected from SEQ ID NOs: 6-10.

The present disclosure also provides a medical product comprising a pharmaceutical compound and one or more of the structural proteins.

The present disclosure also provides particles comprising one or more of the structural proteins and having an average diameter in the range of 20 nm to 3 µm.

The present disclosure also provides an electrospun filament comprising one or more of the structural proteins.

The present disclosure also provides a protein-based micro-robot comprising magnetic particles and one or more of the structural proteins.

[...] comprising one or more of the structural proteins.

The present disclosure also provides photonic crystals comprising one or more of the structural proteins.

The present disclosure also provides metamaterial comprising one or more of the structural proteins.

The present disclosure also provides a thermoresponsive glass comprising a layer of material comprising reversible anisotropic interconnected mesoglobular network of one or more of the structural proteins arranged to undergo phase separation by the effect of temperature to obtain a change in transmittance.

The present disclosure also provides a method for preparing a product, the method comprising -providing one or more of the structural proteins in a dispersion or a solution, -forming the dispersion or the solution into a product comprising one or more of the structural proteins.

The main embodiments are characterized in the independent claims. Various embodiments are disclosed in the dependent claims. The embodiments and examples recited in the claims and the specification are mutually freely combinable unless otherwise explicitly stated.

The present structural proteins show high molecular stability and exhibit properties, which enable obtaining a variety of products with different functional properties.

The present structural proteins can be used in a variety of high-added-value biomedical applications including thermoresponsive injectable matrices, drug encapsulation with controlled drug delivery and/or release, for example for use in inhalers; all-aqueous-based photoresists; and in smart windows.

Products comprising the present structural proteins, such as spherical, filamentous and/or porous products, can be prepared with a variety of methods. Preparation methods may include aerosol forming, electrospinning, printing, photolithography, additive manufacturing and the like.

The present structural proteins can also be used in applications relating to cell adhesion, growth, and proliferation. The polypeptides exhibited no cytotoxicity, which makes them suitable for use in biomedical applications.

The present structural proteins can be used in tissue engineering by supporting cell growth and regenerative cell migration, which enables providing safe and effective materials and compounds, such as for use in therapeutic applications.

The present structural proteins can be used for making structurally stable scaffolds with mechanical strength and flexibility, such as wound healing products.

The present structural proteins can be used in medical applications, such as applications involving bioactive agents, such as active pharmaceutical ingredients.

Formulations providing controlled release, administration and/or delivery of the bioactive agents can be provided.

The present structural proteins can be used in applications utilizing electrospinning, for example as a spinning dope. These may produce smooth micro and/or nanofilaments, or filaments with higher structural complexity.

The present structural proteins can be used as biocompatible and biodegradable photoresist in photonic, electronic, tissue engineering, and soft micro-robotics applications. The structural proteins can be incorporated in 3D printing ink and products can be formed by additive manufacturing.

For example, the present structural proteins can be used in microfabrication of magnetically controlled soft protein-based biodegradable micro-robots, which may be used for targeted delivery and diagnostics.

The present structural proteins can be used as a self-activating compound in smart windows with the ability to self-modulate the amount of solar radiation passing through the windows without the need for human intervention.

Brief description of the figures

Figure 1 shows an overview of the AI-empowered material scientist (AIMS) protocol for generating hybrid biomimetic - de novo designed elastin-like polypeptides (ELP) and their use in protein-based-material engineering. The flowchart includes α-helix design objectives (a) and suggestions from the expert in the loop (b). (c) AIMS-GATER carried out data mining to identify homologous templates from the provided sequence or structural information based on elastin's α-helical motifs. The hydrogen bond estimation algorithm (DSSP) is used to compute the relevant structural conformation if there are no secondary structure homologs for the suggested sequences. From all the obtained models, conformationally sensitive dihedral angles (φ and ψ angles) were calculated. (d) As an input, the corresponding 2D inter-residue angular torsion was fed into the deep neural network as the training set, until the hidden correlation between the sequence, dihedral angles, and labeled property scales became visible. This correlation guides the subsequent step of generating completely new helices, followed by predicting their dihedral angles. Structural stability for all the newly predicted helices was assessed using atomistic molecular dynamics simulation in a two-step procedure of energy minimization and relaxation. (e) Best performing AIMS predicted helices (AI.PHn) were selected by the expert in the loop based on the desired properties and the required functionalities. Selected helices were incorporated with an unstructured region of an elastin-like polypeptide (UnsELP) to make full-length hybrid proteins (see Methods and Table 1). Different in silico and in vitro tests were performed to validate the supramolecular self-assembly, structure, kinetics, and mechanics of the hybrid proteins. (f) The use of UnsELP-AI.PHn for a broad range of material applications.
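The backbone dihedral angles mentioned for Figure 1 (commonly written φ and ψ) can be computed from atomic coordinates with standard vector geometry. The following is a minimal sketch under stated assumptions, not the AIMS implementation: the function names `dihedral` and `phi_psi` and the input layout (one dict per residue with `N`, `CA` and `C` coordinates) are hypothetical choices made for illustration.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (range -180..180) defined by four 3D points."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


def phi_psi(residues):
    """Return (phi, psi) per residue; None where the angle is undefined at a chain end.

    residues: list of dicts with 'N', 'CA' and 'C' coordinates (hypothetical layout).
    """
    angles = []
    for i, res in enumerate(residues):
        phi = (dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"])
               if i > 0 else None)
        psi = (dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"])
               if i < len(residues) - 1 else None)
        angles.append((phi, psi))
    return angles
```

Here φ is taken over the atoms C(i-1), N(i), CA(i), C(i) and ψ over N(i), CA(i), C(i), N(i+1), which is the usual definition; the first residue has no φ and the last has no ψ.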

Figure 2 shows screening and selection for the best performing AIMS predicted helices (AI.PHn) based on desired physicochemical properties and conformational stability. (a) The panels show 178 AI.PHn candidates with lengths ranging from 20 to 40 amino acids and their labeled properties. Only the predicted candidates with a helix propensity score above 0.7 are shown and used in this study. (b) The plot shows the AIMS prediction score for all the AI.PHn in the pool with or without the first and last residues at the N and C-terminus that are prone to structural fluctuation, thus lowering the overall prediction accuracy. Only the AI.PHn with a prediction score above 0.8 were selected for this study. (c) To identify any structurally unstable candidates within the pool, all the AI.PHn were subjected to atomistic MD simulation. Only the candidates with an average RMSD value below 3 Å were selected for further analysis after an initial 20 ns followed by an extended 100 ns simulation time. (d) Representative comparison between two cases with the RMSD value above and below 3 Å. The panel also shows the overlaying conformations at various time points during MD simulation and 2D RMSD plots for each case. (e) The atomistic structure of the 10 best-producing variants after 100 ns MD simulation and their corresponding RMSD profile.

Figure 3 shows that all the predicted de novo designs maintained their helical conformation under experimental conditions after incorporation with the unstructured elastin-like polypeptide to make full-length hybrid proteins. (a) The predicted atomistic models of the ten best-producing intact UnsELP-AI.PHn, obtained using AlphaFold. The orange colour corresponds to AI.PHn, whereas the colour blue corresponds to UnsELP. (b) Experimental circular dichroism (CD) spectrum for the alanine-rich AI.PHn variants. The panel also demonstrates predicted CD signals for the corresponding models by averaging the last 100 trajectories of a 100 ns MD simulation. (c) Experimental and predicted CD signals for the alanine-less AI.PHn variants. (d) Calculated secondary structure from the experimental CD spectrum for all the corresponding samples in b and c. The percentage contribution from each fold is labelled and color-coded accordingly. (e) Attenuated total reflection - Fourier transform infrared spectroscopy (ATR-FTIR) spectrum and peak deconvolution identifying corresponding secondary structures of UnsELP-AI.PH45 in solution. (f) 1D and 2D small-angle x-ray scattering (SAXS) signal from the UnsELP-AI.PH45. The experimental data (blue empty circles) was fitted with simulated SAXS data using the ATSAS software package, indicated with the red solid line. The panel also shows a low-resolution ab initio model calculated from experimental data. The atomistic model of UnsELP-AI.PH45 with its major dimensions is also indicated. (g) 1D wide-angle x-ray scattering (WAXS) signal for the UnsELP-AI.PH45. The panel shows peak deconvolution, indicating characteristic peaks corresponding to helical conformations coloured orange versus blue for the unstructured region, as well as the water scattering signal. The panel also shows simulated WAXS in solution based on explicit-solvent all-atom molecular dynamics.
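The three screening cut-offs described for Figure 2 (helix propensity above 0.7, prediction score above 0.8, average RMSD below 3 Å) amount to a simple filter cascade. The sketch below only illustrates that logic: the `Candidate` record, its field names, and the example values are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record; field names are illustrative, not from the source.
    name: str
    helix_propensity: float
    prediction_score: float
    mean_rmsd_angstrom: float


def screen(candidates, min_propensity=0.7, min_score=0.8, max_rmsd=3.0):
    """Keep candidates that pass all three thresholds quoted for Figure 2."""
    return [
        c for c in candidates
        if c.helix_propensity > min_propensity
        and c.prediction_score > min_score
        and c.mean_rmsd_angstrom < max_rmsd
    ]


# Made-up example values, for illustration only.
pool = [
    Candidate("cand_A", 0.85, 0.91, 1.8),  # passes all three cut-offs
    Candidate("cand_B", 0.65, 0.95, 1.2),  # fails helix propensity
    Candidate("cand_C", 0.90, 0.75, 2.0),  # fails prediction score
    Candidate("cand_D", 0.88, 0.93, 4.1),  # fails RMSD stability
]
selected = [c.name for c in screen(pool)]
```

Keeping the thresholds as keyword arguments makes it easy to see how the size of the selected pool depends on each cut-off.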

Figure 4 shows small-angle x-ray scattering (SAXS) spectra for the UnsELP-AI.PHns. The X-axis shows q (Å⁻¹), and the Y-axis shows intensity (a.u.).

Figure 5 shows wide-angle x-ray scattering (WAXS) spectra for the UnsELP-AI.PHns. The X-axis shows q (Å⁻¹), and the Y-axis shows intensity (a.u.).

Figure 6 shows tuneable supramolecular self-assembly and phase behaviour of full-length de novo designed UnsELP-AI.PHn. (a) Representative LLPS diagram for the UnsELP-AI.PH171 at different protein versus salt concentrations. The diagram was constructed according to the turbidity readout at 600 nm, which detects coacervation accompanying phase separation of the protein in the solution at various temperatures ranging from 25 to 90°C with 5°C increments. (b) The optical density (OD) readout as a function of temperature exhibits sharp and reversible phase behaviour. The onset of coacervation temperature, as well as coacervate dissociation (Tt-heating and Tt-cooling), directly correlate with the physicochemical properties of each AI.PHn. The concentration of the protein was [...] µg/ml and the salt concentration was 2 M in all cases. (c) Calculated Tt-heating, Tt-cooling, and ΔTt-hysteresis for all the variants as in a. (d) The phase shift, Tt-heating, Tt-cooling, and ΔTt-hysteresis strongly correlate with the concentration of the protein in the solution. (e) Thermal cycling (heated and cooled) of 50 µg/ml UnsELP-AI.PH171 with 2 M salt, illustrating no perceptible changes in thermal behaviour and repeatable hysteresis. (f) Monitoring dynamics of phase separation by measuring the apparent hydrodynamic diameter (dh) for all the variants before and after coacervation using dynamic light scattering (DLS). Mean diameters are represented by solid lines and the standard deviations as shaded regions with similar color coding (N = 30).
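As an illustration of how transition temperatures and hysteresis might be read from turbidity curves such as those described for Figure 6, the sketch below takes the transition temperature as the point where the optical density crosses the midpoint between its minimum and maximum. This midpoint criterion, the sign convention (hysteresis as Tt-heating minus Tt-cooling), the function names, and the example readings are all assumptions made for illustration; the source does not state how these values were derived.

```python
def transition_temperature(temps, od):
    """Temperature at which OD first crosses the midpoint of its range.

    Uses linear interpolation between neighbouring readings. The midpoint
    criterion is an assumption, not the method stated in the source.
    """
    half = (min(od) + max(od)) / 2.0
    points = list(zip(temps, od))
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        crosses = (y0 - half) * (y1 - half) <= 0
        if crosses and y0 != y1:
            return t0 + (half - y0) * (t1 - t0) / (y1 - y0)
    raise ValueError("curve never crosses the midpoint of its OD range")


def hysteresis(heating, cooling):
    """Return (Tt-heating, Tt-cooling, difference) from two (temps, od) curves."""
    tt_heating = transition_temperature(*heating)
    tt_cooling = transition_temperature(*cooling)
    return tt_heating, tt_cooling, tt_heating - tt_cooling


# Made-up OD600 readings at 5 degree steps, for illustration only.
heating_curve = ([25, 30, 35, 40, 45, 50], [0.0, 0.0, 0.1, 0.9, 1.0, 1.0])
cooling_curve = ([50, 45, 40, 35, 30, 25], [1.0, 1.0, 1.0, 0.9, 0.1, 0.0])
tt_h, tt_c, delta = hysteresis(heating_curve, cooling_curve)
```

With readings taken only every 5°C, the interpolated value is limited by the step size, so a finer temperature ramp around the transition would give a sharper estimate.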

Figure 7 shows sequence-dependent complex viscoelasticity with increased mechanical properties. (a) Side-by-side comparison of phase-separated full-length UnsELP versus UnsELP-AI.PH45 using high-resolution SEM images. Grayscale images were falsely coloured for better visibility. (b) Nanoindentation of both coacervate types after dehydration. From left to right the panel shows representative loading-unloading curves, distribution of modulus, and hardness, extracted from one hundred independent measurements (N=100). (c) The ensemble-averaged mean square displacement (MSD) of 1 µm polystyrene tracer particles (PTP) was acquired for the UnsELP-AI.PH45 at 100 µg/ml before and after the phase transition using dynamic light scattering (DLS). The panel also shows the plot of the normalized position autocorrelation function of the PTP and a conceptual scheme for the motion of a single PTP in a relevant micro-rheology setup. (d) The frequency-dependent complex viscosity of the UnsELP-AI.PH171 after condensation. (e) Representative plot of the frequency-dependent viscoelastic moduli for the UnsELP-AI.PH171 after condensation. The crossover frequencies are indicated by black dashed lines. The elastic regime is shaded orange versus light blue for the viscous regime. (f) The zero-shear viscosity of all the other variants after condensation, ordered from lowest to highest. The panel also shows representative plots of the frequency-dependent viscoelastic moduli of all the other variants, arranged from lowest to highest viscosity.

Figure 8 shows large-scale molecular dynamics (MD) simulations of UnsELP-AI.PHn phase separation. (a) All-atom explicit solvent system (water, Na+, and Cl-) composed of 50 single repeats of UnsELP-AI.PH45 with a temperature ramp from 5 to 50°C with a 15°C increment every 50 ns. The panel shows snapshots taken throughout the length of the simulation to demonstrate the evolution of condensation as the temperature reaches and exceeds the LCST over time. (b) A closer inspection of the simulation reveals the multitude of molecular interactions. These include hydrophobic, van der Waals, π-π, cation-π, and hydrogen bonding interactions that facilitate higher-order structural organization between helix-helix, helix-coil, and coil-coil domains. (c) Calculated number of H-bonds between UnsELP and water, AI.PH45 and water, and between all the UnsELP-AI.PH45 throughout the length of the simulation. (d) Visual representation of the obstruction of the interactions between water molecules and the AI.PH45 in the presence of Na+ and Cl-. (e) Calculated solvent accessible surface area (SASA) and the RMSD throughout the length of the simulation. (f) Calculated cluster size and number of clusters throughout the length of the simulation. (g) Free energy profile of transient dimerization of helix-helix homodimers.

Figure 9 shows biocompatibility of the UnsELP-AI.PHn. (a) CCK-8 (WST-8) cytotoxicity assay of WI-38 and MDA-MB-231 cell lines cultured for 4 days on substrates coated with UnsELP-AI.PH45 (N=3 in all cases). Uncoated tissue culture plates (TCP) were used as control. No significant difference could be found between the coated and uncoated surfaces. The panel also shows representative phase contrast and fluorescence images of the MDA-MB-231 cultured on the coated substrate. (b) Cytotoxicity assays for surfaces coated with UnsELP-AI.PH87, UnsELP-AI.PH64 and UnsELP-AI.PH22 variants. The plot only shows the result for the MDA-MB-231 cell line. (c) The panel shows programmable depots made from UnsELP-AI.PH45. The reservoir on the left represents a non-coacervated formulation, whereas the one on the right shows the coacervated state. (d) The series of images on the left represent a depot with maintained self-assembly (first image T = 0 min, and last image T = 71 min), whereas on the right a depot with controlled disassembly (first image T = 0 min, and last image T = 2.3 min), corresponding to the non-coacervated and coacervated formulations shown in c respectively. (e) Stability and tissue incorporation test of UnsELP-AI.PH45 by making injections into the wingette's subcutaneous space of a sacrificed domestic chicken. The injection site is indicated with a solid circle and the area around the depot with a dashed line. The panel also shows X-ray computed microtomography (Micro-CT) from the site of injection.

Figure 10 shows various strategies for drug encapsulation with controlled delivery using the UnsELP-AI.PHn as the 3D scaffold. (a) Differential scanning calorimetry (DSC) of the UnsELP-AI.PH87 indicating its glass transition temperature (Tg), crystallization temperature (Tc), melting temperature (Tm), and degradation temperature (Td). (b) Thermogravimetric analysis (TGA), as well as differential thermal analysis (DTG), corresponding to a. (c) SEM micrographs illustrating aerosol-synthesized micro- and nanometer-size spherical particles made at six different temperatures below the critical glass transition of UnsELP-AI.PH87. (d) The paracetamol release profiles for the micro-nano particles fabricated at 100°C, 130°C, and 150°C. The panel also shows particle size distribution for each case. (e) The release profile for three different drugs using particles fabricated at 130°C. (f) SEM micrographs of various electrospun morphologies of UnsELP-AI.PH45. (g) Mean and standard deviation for tensile strength, Young's modulus, and tensile strain of morphologies corresponding to f. (h) The paracetamol release profiles from the morphologies corresponding to f.

Figure 11 shows versatile application of the UnsELP-AI.PHn as the next-generation biocompatible and biodegradable photoresist. (a) Surface micropatterning with a conventional photolithography approach using UnsELP-AI.PH64 as the photoresist. The panel shows bright field microscopy images at various exposure times. (b) Surfaces that are patterned with various shapes. (c) Fluorescent images (falsely colored) are overlaid on top of SEM micrographs for better visibility. (d) 3D printing of more complex geometries such as photonic crystals and woodpile structures using a 2-photon lithography system with nano- and microstructural features by using the UnsELP-AI.PH22 as the photoresist. Grayscale images were falsely colored for better visibility. (e) Schematic representation of the printing setup for manufacturing soft protein-based micro-robots (named “protobots”). The panel also exhibits how superparamagnetic nanoparticles (SPN) are embedded in the UnsELP-AI.PH22 scaffold through covalent crosslinking. (f) Dimensions and the shape of printed protobots. (

Figure 12 shows the use of thermoresponsive UnsELP-AI.PHn in smart windows with self-regulating behavior. (a) Schematic representation of the transmittance modulation of smart windows for solar light and heat regulation. The concept relies on the formation of a reversible anisotropic interconnected mesoglobular network of UnsELP-AI.PHn sandwiched between two panes of glass. (b) Image of two 10 cm² devices with different solution conditions fine-tuned to imitate changes in daylight temperature of early summer (left), or midsummer (right). (c) Transmittance spectra of the UnsELP-AI.PH45 at various temperatures covering wavelengths spanning the ultraviolet, visible, and near-infrared. (d) Transmittance modulation of UnsELP-AI.PH45 with the layer thickness from 1 to 10 mm at 15°C (<LCST) and 30°C (>LCST).

Figure 13 shows how, in the structured regions of elastin, alanine and lysine are dominantly present as conserved short α-helical conformations. The figure shows the most abundant sequences and their corresponding structures for human-origin elastin. All the structures were predicted using AlphaFold2.

Figure 14 shows 1800 de novo designed AI-predicted helix (AI.PHn) candidates assisted by AIMS with lengths ranging from 20 to 40 amino acids. Different labeled properties including helix propensity, polarity, hydrophobicity, hydrophilicity, pKa, bulkiness, and solvent accessibilities are indicated for each predicted candidate.

Figure 15 shows a comparison of various physicochemical properties such as length, hydrophobicity, hydrophilicity, polarity, pKa, bulkiness, solvent accessibility, helix propensity, and their average RMSD for 13 selected AI.PHn for microbial recombinant expression. n in the AI.PHn corresponds to the identification number of the predicted helices (Table 1 shows the full list). All the selected candidates contain alanine residues, similar to most naturally occurring silk-like sequences. Therefore these were named the alanine-rich group. Comparisons were scored according to various scales indicated above each panel.

Figure 16 shows a comparison of various physicochemical properties such as length, hydrophobicity, hydrophilicity, polarity, pKa, bulkiness, solvent accessibility, helix propensity, and their average RMSD for 12 selected AI.PHn for microbial recombinant expression. n in the AI.PHn corresponds to the identification number of the predicted helices (Table 1 shows the full list). None of the selected candidates contain alanine residues. This is in contrast with almost all the naturally occurring silk-like sequences. Therefore these were named the alanine-less group. Comparisons were scored according to various scales indicated above each panel.

I Figure 17 shows AIMS predicted structural conformation for 25 selected Al.PHn > (both alanine-rich as well as alanine-less). n in the AI.PHn corresponds to the 3 identification number of the predicted helices (Table 1 shows the full list).

O

Figure 18 shows results from experiments wherein all the selected de novo designs AI.PHn were incorporated and repeated four times into unstructured elastin-like polypeptide to make full-length hybrid proteins as ((VPGVG)15-AI.PHn)4. n in the AI.PHn corresponds to the identification number of the predicted helices (Table 1 shows the full list). The figure shows two Coomassie-stained SDS-PAGEs indicating the production level of all the target proteins from the whole-cell extract. Only the constructs indicated by stars were selected due to substantially higher yields, whereas all the other variants were excluded from the further study.

Figure 19 shows physicochemical properties of the final five successfully expressed alanine-rich AI.PHn variants. n in the AI.PHn corresponds to the identification number of the predicted helices (Table 1 shows the full list).

Figure 20 shows physicochemical properties of the final five successfully expressed alanine-less AI.PHn variants. n in the AI.PHn corresponds to the identification number of the predicted helices (Table 1 shows the full list).

Figure 21 shows a comparison between AIMS predicted structural conformation (colored) against AlphaFold (DeepMind) prediction (gray) for the 10 final AI.PHn produced successfully in Escherichia coli (E. coli). n in the AI.PHn corresponds to the identification number of the predicted helices (Table 1 shows the full list).

Figure 22 shows calculated RMSD for the 10 best producing variants after 100 ns MD simulation.

Figure 23 shows intra-residue interaction network during 100 ns MD simulation for all the alanine-rich variants UnsELP-AI.PH87, UnsELP-AI.PH142, UnsELP-AI.PH162, UnsELP-AI.PH171 and UnsELP-AI.PH45. Each residue is represented as a node. Two nodes are connected if their side-chains are in close proximity, equal to or less than ~0.75 Å.

Figure 24 shows intra-residue interaction network during 100 ns MD simulation for all the alanine-less variants UnsELP-AI.PH18, UnsELP-AI.PH20, UnsELP-AI.PH134, UnsELP-AI.PH64 and UnsELP-AI.PH22. Each residue is represented as a node. Two nodes are connected if their side-chains are in close proximity, equal to or less than ~0.75 Å.

Figure 25 shows intra-molecular hydrogen bonding network during 100 ns MD simulation for all the alanine-rich variants UnsELP-AI.PH87, UnsELP-AI.PH142, UnsELP-AI.PH162, UnsELP-AI.PH171 and UnsELP-AI.PH45. Each residue is represented as a node. Two nodes are connected if their side-chains are in close proximity, equal to or less than ~0.75 Å.

Figure 26 shows intra-molecular hydrogen bonding network during 100 ns MD simulation for all the alanine-less variants UnsELP-AI.PH18, UnsELP-AI.PH20, UnsELP-AI.PH22, UnsELP-AI.PH64 and UnsELP-AI.PH134. Each residue is represented as a node. Two nodes are connected if their side-chains are in close proximity, equal to or less than ~0.75 Å.

Figure 27 shows experimental CD spectrum for UnsELP-AI.PH45, UnsELP-AI.PH162, UnsELP-AI.PH171, UnsELP-AI.PH142 and UnsELP-AI.PH87 after one cycle of heating (20 to 50°C) and cooling (50 to 20°C).

Figure 28 shows experimental CD spectrum for UnsELP-AI.PH64, UnsELP-AI.PH22, UnsELP-AI.PH18, UnsELP-AI.PH20 and UnsELP-AI.PH134 after one cycle of heating (20 to 50°C) and cooling (50 to 20°C).

Figure 29 shows calculated secondary structure from averaged trajectories of 100 ns MD simulation using the PDBMD2CD tool corresponding to the atomistic models of the ten best producing intact UnsELP-AI.PHn variants as shown in Figure 3a modeled by AlphaFold2. Percentage contribution from each fold is labeled and color-coded accordingly.

Figure 30 shows attenuated total reflection Fourier transform infrared spectroscopy (ATR-FTIR) spectrum and peak deconvolution to identify corresponding secondary structures for all the variants.

Figure 31 shows experimental CD spectrum for two naturally occurring α-helices originated from human tropoelastin (Gene Bank ID: GBP83862.1) after one cycle of heating (20 to 50°C) and cooling (50 to 20°C). (a) UnsELP-H.TPE.HEL1 (TPE.H.HEL1: AAAAKSAAKVAAKAQLRAAA), and (b) UnsELP-H.TPE.HEL2 (TPE.H.HEL2: AAAAAAAAAAKAAKYGAAAGL).

Figure 32 shows experimental CD spectrum for two naturally occurring α-helices unrelated to tropoelastin (indicated herein as rational design) originated from Bag-worm silk (Gene Bank ID: GBP83861.1) and Euprosthenops australis Major ampullate 2 silk (Gene Bank ID: AM490169.1) after one cycle of heating (20 to 50°C) and cooling (50 to 20°C). (a) UnsELP-BGWS (BGWS: AAAAAAAAAEAAAAAAAAAAAA), and (b) UnsELP-EaMaSp1 (EaMaSp1: AAAAAAAAAAAAAAA).

Figure 33 shows thioflavin T (ThT) assay for various protein versus salt concentrations measured at 50°C corresponding to (a) UnsELP-TPE.H.HEL1 and (b) UnsELP-TPE.H.HEL2. The plot shows that, independent of protein and salt concentrations, conformational conversion occurs from α-helical toward the more energetically favored β-sheet conformation. Changes in the relative fluorescent intensity correlated with the concentration. Higher protein and salt concentrations produce higher fluorescent signal.

Figure 34 shows thioflavin T (ThT) assay for various protein versus salt concentrations measured at 50°C corresponding to (a) UnsELP-BGWS and (b) UnsELP-EaMaSp1. The plot shows that, independent of protein and salt concentrations, conformational conversion occurs from α-helical toward the more energetically favored β-sheet conformation. Changes in the relative fluorescent intensity correlated with the concentration. Higher protein and salt concentrations produce higher fluorescent signal.

Figure 35 shows time-dependent stability tests for all the de novo-designed ELPs as well as naturally occurring and rationally designed ELPs. The panel shows changes in the relative fluorescence intensity (RFU) of Thioflavin T (ThT) as a function of time over a period of 40 days (N = 3). An increase of RFU indicates conformational conversion to β-sheets and an arrested state.

Figure 36 shows high resolution SEM images of the UnsELP-AI.PHn illustrating elongated bicontinuous mesoglobular porous network with solid-like behavior.

Figure 37 shows differences in the diffusion dynamics of both types of coacervates over the same period using fluorescence recovery after photobleaching (FRAP) in the hydrated state corresponding to Figure 8a.

Figure 38 shows extracted stiffness from hundred independent measurements corresponding to Figure 8b.

Detailed description

In this specification, percentage values, unless specifically indicated otherwise, are based on weight (w/w, by weight, or wt%). If any numerical ranges are provided, the ranges include also the upper and lower values. The open term “comprise” also includes a closed term “consisting of” as one option. The diameters disclosed herein, unless specifically indicated otherwise, refer to the smallest diameter, and may be presented as average or number-average diameter and may be determined microscopically, such as by light microscopy and/or by electron microscopy. A suitable imaging software may be used. Disclosed dimensions may be measured by image analysis of microscope images, such as images from a light microscope, a field emission scanning electron microscope (FE-SEM), a transmission electron microscope (TEM), such as a cryogenic transmission electron microscope (CRYO-TEM), or an atomic force microscope (AFM).

A composition as used herein may refer to any composition comprising one or more of the structural proteins, which may be in any form disclosed herein, optionally one or more other substances, such as agents and/or structures, and optionally one or more solvents, supports and/or the like. A composition may be a solution or a dispersion. A formulation may refer to a composition formulated to be suitable for a specific use, such as for a specific medical use, for example an injectable formulation, inhalable formulation, oral formulation, wound healing formulation and/or the like, which formulations can be prepared from the present structural proteins and/or products comprising them. A composition or formulation may be in the form of a dry, paste, or liquid formulation, which may contain a suitable amount of solvent, such as water or aqueous solution.

The abbreviation AI.PHn used herein refers to AIMS predicted helices, which may be the present helical parts/sequences, wherein n is the number of the helical parts/sequences in the full protein.

The abbreviation UnsELP used herein refers to an unstructured region or sequence of an elastin-like polypeptide (ELP), which unstructured region or sequence includes motif (VPGVG)n.

The full-length sequence of the structural protein includes a plurality of the motifs and the helical sequences. The full-length sequence may also include further amino acids in the amino and/or in the carboxy terminus, such as a poly-histidine tag (His-6 tag), an antigenic epitope or a binding module, which facilitates purification. Carboxy-terminal further amino acids may comprise a poly-histidine tag HHHHHH, or a longer amino acid sequence, such as SSKETAAAKFERQHMDSLEHHHHHH, and amino-terminal further amino acids may comprise MGKETAAAKFERQHMDSSA, such as presented in SEQ ID NOs 11-28. One example provides a sequence that excludes the further amino acids in the amino terminus and/or in the carboxy terminus.

The present application provides a structural protein comprising an amino acid sequence unit comprising
-a motif (VPGVG)n, and
-an amino acid sequence selected from SEQ ID NOs:1-10 and/or an amino acid sequence having sequence similarity, such as at least 90% sequence similarity, for example at least 95% sequence similarity, with an amino acid sequence selected from SEQ ID NOs:1-10 or other suitable helical sequences disclosed herein.

The sequence similarity may be determined for example by BLASTP, and/or by using BLOSUM62 scoring matrix. The sequence similarity may be sequence identity. In examples the amino acid sequence has at least 97% sequence similarity with an amino acid sequence selected from SEQ ID NOs:1-10 or other suitable helical sequences disclosed herein.

The sequence identity between two amino acid sequences can be determined for example as the output of "longest identity" using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48:443-453) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet. 16: 276-277), for example version 6.6.0 or later. The parameters used are a gap open penalty of 10, a gap extension penalty of 0.5, and the EBLOSUM62 (EMBOSS version of BLOSUM62) substitution matrix. In order for the Needle program to report the longest identity, the -nobrief option must be specified in the command line. The output of Needle labeled "longest identity" is calculated as follows: (Identical Residues × 100) / (Length of Alignment − Total Number of Gaps in Alignment).

The sequence similarity may comprise insertion, deletion and/or substitution of two or one amino acids of an amino acid sequence selected from SEQ ID NOs:1-10 or other suitable helical sequences.
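The "longest identity" formula above can be illustrated with a minimal sketch. It assumes the two sequences have already been globally aligned (for example by Needle) and that gaps are written as "-"; the function name `longest_identity` is hypothetical and is not part of EMBOSS.

```python
def longest_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Computes (identical residues x 100) / (alignment length - gap positions).
    The alignment itself is assumed to come from an external tool.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = list(zip(aligned_a, aligned_b))
    # Alignment columns in which either sequence carries a gap symbol.
    gaps = sum(1 for x, y in pairs if x == gap or y == gap)
    # Columns in which both sequences carry the same residue.
    identical = sum(1 for x, y in pairs if x == y and x != gap)
    denominator = len(pairs) - gaps
    if denominator <= 0:
        raise ValueError("alignment contains no ungapped columns")
    return 100.0 * identical / denominator


if __name__ == "__main__":
    # Toy alignment for illustration only, not a sequence from this application.
    print(longest_identity("AAAAKDAKK", "AAAEKDAKK"))
```

Because gap columns are removed from the denominator, a sequence that differs only by an insertion or deletion still scores 100% under this particular measure.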

In examples the amino acid sequence is selected from SEQ ID NOs:1-10 and/or has the disclosed sequence similarity with an amino acid sequence selected from SEQ ID NOs:1-10 or other suitable helical sequences, and/or the amino acid sequence is encoded by a nucleotide sequence that hybridizes under high stringency conditions with the complement of the sequence encoding the mature polypeptide defined by the amino acid sequence selected from SEQ ID NOs:1-10.

Preferably the structural protein exhibits elastin-like properties, i.e. the structural protein is or may be considered as an elastin-like protein or polypeptide. The structural protein can exhibit other properties disclosed herein, such as the ability to self-assemble into an interconnected network through noncovalent interactions, preferably without covalent bonding.

The n is 1 or more, such as 2 or more. The n may be in the range of 2-100, such as in the range of 5-100, 10-100, 20-100, 40-100 or 41-100, or in the range of 2-80, 5-80, 10-80, 20-80, 40-80 or 41-80. Such a high number of repeats can be obtained by advanced expression systems, and provides products with increased size and yield. In many cases n may be in the range of 2-40, such as 2-20, 5-40, 5-20, or 21-40. In one embodiment n is 10-20, such as 15, so the motif may be (VPGVG)15.

The amino acid sequence unit may be a repeating unit. In an embodiment the structural protein comprises 2 or more, such as 2-30, 2-20, or 2-10 repeating amino acid sequence units.

In general not all amino acid sequences relating to the present types of elastin-like polypeptides could be successfully expressed in known expression systems, but with the presently specified sequences, such as SEQ ID NO:1-10, it was possible to provide successful and efficient expression with a good yield. The present structural proteins may be provided as isolated and/or purified form, such as from an expression system.

One example discloses a polynucleotide comprising a nucleotide sequence encoding the structural protein disclosed herein. One example discloses an expression vector comprising the nucleotide sequence. One example discloses a host cell comprising the expression vector. A skilled person can implement the expression system and express the proteins by using general knowledge in the field of molecular biology and related art.

A recombinant expression vector may comprise the polynucleotide, a promoter, and transcriptional and translational stop signals. The various nucleotide and control sequences may be joined together to produce a recombinant expression vector that may include one or more convenient restriction sites to allow for insertion or substitution of the polynucleotide encoding the polypeptide at such sites. Alternatively, the polynucleotide may be expressed by inserting the polynucleotide or a nucleic acid construct comprising the polynucleotide into an appropriate vector for expression. In creating the expression vector, the coding sequence is located in the vector so that the coding sequence is operably linked with the appropriate control sequences for expression.

The recombinant expression vector may be any vector, such as a plasmid or a virus, that can be subjected to recombinant DNA procedures and can provide expression of the polynucleotide. The vector may be a linear or closed circular plasmid.

A recombinant host cell may comprise the polynucleotide operably linked to one or more control sequences that direct the production of the present structural protein.

A construct or a vector comprising a polynucleotide is introduced into a host cell so that the construct or vector is maintained as a chromosomal integrant or as a self-replicating extrachromosomal vector as described earlier. The host cell may be any microbial cell useful in the recombinant production of the present structural protein, such as a prokaryotic cell or a fungal cell.

The amino acid sequence selected from SEQ ID NO:1-10 may be the same in each amino acid sequence unit (repeating unit). It may also be different, such as different in each amino acid sequence unit, or different in less than each amino acid sequence unit.

The helical sequences comprising any of the following amino acid sequences (SEQ ID NO: 1-10) may be included in the present structural proteins.

SEQ ID NO: 1 DAAAAAAAAAAAAAYAQKAAAAAAAKDAKK

SEQ ID NO: 2 DAAAAAAAAAAAAKYHDAAAAAAKDAKK

SEQ ID NO: 3 DAAAAAAAAAAAAYFHHAAAAKDAKK

SEQ ID NO: 4 DAAAAAAAAAAADFGDAAAAKDAKK

SEQ ID NO: 5 DAAAAAAAAAADAAAAADAKK

SEQ ID NO: 6 EEERRREKEREREEERRRKKK

SEQ ID NO: 7 EEEEEREKEDEEEEEEEKKE

SEQ ID NO: 8 EEELLKKEVVLLEELLEELEELL

SEQ ID NO: 9 EEELLKREEKLLLELLLLEEELEELEELL

SEQ ID NO: 10 EEQEEEEDLQEEEVLEEEEEEEEEQEEEEEEVVVTK

SEQ ID NOs 1-5 represent alanine-rich sequences having alanine content in the range of about 64-76% by number, and SEQ ID NOs 6-10 represent glutamate-rich sequences (also called alanine-less sequences) having glutamate (glutamic acid) content in the range of about 42-75% by number. The amino acid sequence having a sequence similarity with an amino acid sequence selected from SEQ ID NOs:1-10 may comprise one or more specific features and/or properties discussed herein for the identified sequences, such as an alanine-rich amino acid sequence, a glutamate-rich (glutamic acid rich) amino acid sequence or an alanine-less amino acid sequence, and/or other features and/or properties disclosed herein.
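The "content by number" used above is the count of one residue divided by the sequence length. The following is a minimal sketch of that calculation; the helper name `residue_percent` is hypothetical, and the two example sequences are copied from SEQ ID NO: 5 and SEQ ID NO: 7 as listed above.

```python
from collections import Counter


def residue_percent(sequence: str, residue: str) -> float:
    """Return the content of `residue` in `sequence` as a percentage by number."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return 100.0 * Counter(sequence)[residue] / len(sequence)


if __name__ == "__main__":
    # One alanine-rich and one glutamate-rich helix from the list above.
    helices = {
        "SEQ ID NO: 5": ("DAAAAAAAAAADAAAAADAKK", "A"),
        "SEQ ID NO: 7": ("EEEEEREKEDEEEEEEEKKE", "E"),
    }
    for name, (seq, residue) in helices.items():
        print(f"{name}: {residue} content {residue_percent(seq, residue):.1f}%")
```

A check of this kind could be applied to a similar sequence to see whether it still falls into the alanine-rich (64% or more alanine) or glutamate-rich (42% or more glutamate) group.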

In one example the amino acid sequence is an alanine-rich amino acid sequence, preferably having an alanine content of 64% or more, and it may be selected from SEQ ID NOs:1-5. The alanine-rich variants were found to correlate with faster formation of liquid-liquid phase separation (LLPS) than the alanine-less amino acid sequences. This includes accelerated formation of LLPS due to reduced free energy and stronger molecular interactions; LLPS can be initiated within a much lower range of salt concentration, protein concentration, and temperature, thus extending the duration of LLPS. The alanine-rich sequences provide faster self-assembly and better formation of the porous network.

In one example the amino acid sequence is a glutamate-rich amino acid sequence, preferably having a glutamate content of 42% or more, also called an alanine-less amino acid sequence, and it may be selected from SEQ ID NOs:6-10. The glutamate-rich/alanine-less variants may be preferred for applications where LLPS needs to be induced at higher temperatures (delayed) but faster hysteresis is also needed. For instance, this is particularly relevant in applications such as smart windows or injectable matrices.

The unstructured region may be the same for all the helical parts, such as (VPGVG)15. One example of an arrangement of the full-length sequence comprises ((VPGVG)15-helical sequence)4. In one embodiment the structural protein comprises four repeating amino acid sequence units comprising
-a motif (VPGVG)15, and
-an amino acid sequence selected from SEQ ID NOs:1-10 and/or an amino acid sequence having at least 90% sequence similarity with an amino acid sequence selected from SEQ ID NOs: 1-10.
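The ((VPGVG)15-helical sequence)4 arrangement can be sketched as plain string assembly. This is an illustration only: the function names `repeat_unit` and `full_length` are hypothetical, the terminal sequences are the optional further amino acids quoted in this description, and the helix used in the example is SEQ ID NO: 5.

```python
# Optional terminal sequences quoted in the description.
N_TERMINAL = "MGKETAAAKFERQHMDSSA"
C_TERMINAL = "SSKETAAAKFERQHMDSLEHHHHHH"


def repeat_unit(helix: str, motif: str = "VPGVG", n: int = 15) -> str:
    """One amino acid sequence unit: the motif repeated n times, then the helix."""
    return motif * n + helix


def full_length(helix: str, units: int = 4, n: int = 15,
                n_terminal: str = "", c_terminal: str = "") -> str:
    """Concatenate `units` identical repeat units, with optional terminal sequences."""
    return n_terminal + repeat_unit(helix, n=n) * units + c_terminal


if __name__ == "__main__":
    helix = "DAAAAAAAAAADAAAAADAKK"  # SEQ ID NO: 5
    protein = full_length(helix, n_terminal=N_TERMINAL, c_terminal=C_TERMINAL)
    print(len(protein))
    print(protein)
```

The sketch uses the same helix in every unit; the description also allows a different helix in each unit, which would only require passing a list of helices instead of one string.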

In embodiments the structural protein, or a mixture thereof, comprises one or more amino acid sequence(s) selected from SEQ ID NOs: 11-28 and/or one or more amino acid sequence(s) having at least 90% sequence similarity, such as at least 95% sequence similarity, for example at least 98% sequence similarity with an amino acid sequence selected from SEQ ID NOs: 11-28 and/or the amino acid sequence is encoded by a nucleotide sequence that hybridizes under high stringency conditions with the complement of the sequence encoding the mature polypeptide defined by the amino acid sequence selected from SEQ ID NOs: 11-28, with or without the N-terminal and/or C-terminal additional sequences.

One example (SEQ ID NO: 11) of amino acid sequence unit (1-repeat unit) (including SEQ ID NO: 1) comprises:

VPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAAYAQKAAAAAAAKDA
KK

One example (SEQ ID NO: 12) of 4-repeat unit (including SEQ ID NO: 1) comprises:

VPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAAYAQKAAAAAAAKDA
KKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGV
GVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAAYAQKAAAAAAAK
DAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVP
GVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAAYAQKAAAAAA
AKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVG
VPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAAYAQKAAAA
AAAKDAKK

The following sequences (SEQ ID NO: 13-28) are embodiments of amino acid sequences of the full structural proteins including four amino acid sequence units (repeat units) and additional sequences in the amino terminus (MGKETAAAKFERQHMDSSA) and in the carboxy terminus (SSKETAAAKFERQHMDSLEHHHHHH).

pPMA94 (SEQ ID NO: 13):

MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAA
AAAAAAAAYAQKAAAAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGD
AAAAAAAAAAAAAYAQKAAAAAAAKDAKKVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGDAAAAAAAAAAAAAYAQKAAAAAAAKDAKKVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGDAAAAAAAAAAAAAYAQKAAAAAAAKDAKKSSKETAAAKFE
RQHMDSLEHHHHHH

pPMA95 (SEQ ID NO: 14):

MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAA
AAAAAAAKYHDAAAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPG
VGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAA
AAAAAAAAAAKYHDAAAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVG
VPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVG
DAAAAAAAAAAAAKYHDAAAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVP
GVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVP
GVGDAAAAAAAAAAAAKYHDAAAAAAKDAKKSSKETAAAKFERQHMDSLE
HHHHHH

pPMA96 (SEQ ID NO: 15):

MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAA
AAAAAAAYFHHAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVG
VPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAA
AAAAAAAAYFHHAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGV
GVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAA
AAAAAAAAAYFHHAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPG
VGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAA
AAAAAAAAAAYFHHAAAAKDAKKSSKETAAAKFERQHMDSLEHHHHHH

pPMA97 (SEQ ID NO: 16):

MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAA
AAAAAADFGDAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAA
AAAAAADFGDAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAA
AAAAAADFGDAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAA
AAAAAADFGDAAAAKDAKKSSKETAAAKFERQHMDSLEHHHHHH

pPMA98 (SEQ ID NO: 17):

MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV

PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAA
AAAAADAAAAADAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVG
VPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAA
ADAAAAADAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGV
GVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAADAA
AAADAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPG
VGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAADAAAAAD
AKKSSKETAAAKFERQHMDSLEHHHHHH

pPMA100 (SEQ ID NO: 18):

MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEERRR
EKEREREEERRRKKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVG
VPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEERRREKER
EREEERRRKKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGV
GVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEERRREKEREREE
ERRRKKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPG
VGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEERRREKEREREEERRR
KKKSSKETAAAKFERQHMDSLEHHHHHH

pPMA102 (SEQ ID NO: 19):

MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEEEER
EKEDEEEEEEEKKEVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV

PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEEEEREKEDE
EEEEEEKKEVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEEEEREKEDEEEEEE
EKKEVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEEEEREKEDEEEEEEEKKES
SKETAAAKFERQHMDSLEHHHHHH

pPMA104 (SEQ ID NO: 20):

MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV

PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEELLK
KEVVLLEELLEELEELLVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPG
VGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEELLKKE
VVLLEELLEELEELLVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVG
VPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEELLKKEVV
LLEELLEELEELLVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVP
GVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEELLKKEVVLL
EELLEELEELLSSKETAAAKFERQHMDSLEHHHHHH

pPMA105 (SEQ ID NO: 21):

MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEELLK
REEKLLLELLLLEEELEELEELLVPGVGVPGVGVPGVGVPGVGVPGVGVP
GVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEE
ELLKREEKLLLELLLLEEELEELEELLVPGVGVPGVGVPGVGVPGVGVPG
VGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPG
VGEEELLKREEKLLLELLLLEEELEELEELLVPGVGVPGVGVPGVGVPGV
GVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGV
GVPGVGEEELLKREEKLLLELLLLEEELEELEELLSSKETAAAKFERQHM
DSLEHHHHHH

pPMA109 (SEQ ID NO: 22):

MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEQEEE
EDLQEEEVLEEEEEEEEEQEEEEEEVVVTKVPGVGVPGVGVPGVGVPGVG
VPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVG
VPGVGEEQEEEEDLQEEEVLEEEEEEEEEQEEEEEEVVVTKVPGVGVPGV
GVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGV
GVPGVGVPGVGVPGVGEEQEEEEDLQEEEVLEEEEEEEEEQEEEEEEVVV
TKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPG
VGVPGVGVPGVGVPGVGVPGVGVPGVGEEQEEEEDLQEEEVLEEEEEEEE
EQEEEEEEVVVTKSSKETAAAKFERQHMDSLEHHHHHH

pPMA112 (SEQ ID NO: 23):

MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAAA
AAAEAAAAAAAAAAAAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGV
GVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAAAAAA
EAAAAAAAAAAAAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVP
GVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAAAAAAEAA
AAAAAAAAAAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVG
VPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAAAAAAEAAAAA
AAAAAAASSKETAAAKFERQHMDSLEHHHHHH

pPMA99 (SEQ ID NO: 24):

MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEEEEE
KEEEEEEEEEEEEEEEEKKKEVPGVGVPGVGVPGVGVPGVGVPGVGVPGV
GVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEEE
EEKEEEEEEEEEEEEEEEEKKKEVPGVGVPGVGVPGVGVPGVGVPGVGVP
GVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEE
EEEEKEEEEEEEEEEEEEEEEKKKEVPGVGVPGVGVPGVGVPGVGVPGVG
VPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVG
EEEEEEKEEEEEEEEEEEEEEEEKKKESSKETAAAKFERQHMDSLEHHHH
HH

pPMA110 (SEQ ID NO: 25):

MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV

PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGLEEKKE
KEEEKKKHLHILKHELKRKKKKVPGVGVPGVGVPGVGVPGVGVPGVGVPG
VGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGLEE
KKEKEEEKKKHLHILKHELKRKKKKVPGVGVPGVGVPGVGVPGVGVPGVG
VPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVG
LEEKKEKEEEKKKHLHILKHELKRKKKKVPGVGVPGVGVPGVGVPGVGVP
GVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVP
GVGLEEKKEKEEEKKKHLHILKHELKRKKKKSSKETAAAKFERQHMDSLE
HHHHHH

pPMA86 (SEQ ID NO: 26):

MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIA
AAAAFGGAAAAAAAAAKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPG
VGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAAA
AAFGGAAAAAAAAAKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVG
VPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAAAAA
FGGAAAAAAAAAKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVP
GVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAAAAAFG
GAAAAAAAAAKSSKETAAAKFERQHMDSLEHHHHHH

pPMA88 (SEQ ID NO: 27):

MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV

PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIA
GAAAGFAAAAAAAKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAGAAAG
FAAAAAAAKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAGAAAGFAAAA
AAAKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAGAAAGFAAAAAAAKS
SKETAAAKFERQHMDSLEHHHHHH

pPMA90 (SEQ ID NO: 28):

MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGV
PGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIA
IAAAIAAAAAGQSAAAAAIAAKVPGVGVPGVGVPGVGVPGVGVPGVGVPG
VGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAA
AIAIAAAIAAAAAGQSAAAAAIAAKVPGVGVPGVGVPGVGVPGVGVPGVG
VPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVG
AAAAIAIAAAIAAAAAGQSAAAAAIAAKVPGVGVPGVGVPGVGVPGVGVP
GVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVP
GVGAAAAIAIAAAIAAAAAGQSAAAAAIAAKSSKETAAAKFERQHMDSLE
HHHHHH

The structural protein may be provided as and may refer to (a mixture of) two or more structural proteins disclosed herein, or the structural protein may be provided as and may refer to a structural protein comprising or consisting of only one type of structural protein.

Structural proteins may be obtained by the methods disclosed in the description and in the examples. The methods, or features thereof, may be combined, if applicable.

Disclosed is a method for generating de novo design sequences with alpha helical conformation, the method comprising
-providing sequences and/or structural information based on desired design objectives,
-data mining a sequence database to identify homologous templates based on the sequences and/or structural information,
-if no relevant secondary structure homologs are found to the provided sequences and/or structural information, carrying out a hydrogen bond estimation algorithm (DSSP) to compute the relevant structural conformation,
-calculating two conformationally sensitive dihedral geometrical angles, Ψ and Φ,
-obtaining the generated data set and using the data set to train a first CNN (Convolutional Neural Network) based deep neural network model,
-computing property values for each protein in the training set with the deep neural network model, preferably to obtain properties of each protein from the amino acid content, such as two or more of hydrophobicity, hydrophilicity, charge, bulkiness, pKa, polarity, solvent accessibility, and α-helix propensity,
-generating a number of new candidate sequences with the deep neural network model by altering target property values within allowed limits and setting how many different values are to be generated within the limits, preferably also altering the length of generated sequences,
-filtering the resulting candidate sequences against assigned requirements by
 -performing an initial property check for the generated sequences based on the required limits and threshold values for each input property that can be computed based on the sequence, and
 -performing a similarity check for generated sequences within the training set,
-predicting Ψ and Φ dihedral angles and the secondary structure by a second CNN (Convolutional Neural Network) based deep neural network model for all the de novo designed sequences predicted by the deep neural network model, preferably by using atomistic molecular dynamics simulation in a two-step procedure of energy minimization and relaxation to assess structural stability for all the newly predicted helices, wherein the second CNN (Convolutional Neural Network) based deep neural network model is trained by the same training set comprising primary and secondary structure, Ψ, and Φ dihedral angles,
-using the generated new sequences as input to the second CNN (Convolutional Neural Network) based deep neural network model,
-checking the resulting secondary structure against the constituent secondary structure, wherein if the accuracy is not within the required threshold value, such as about 0.9, ignoring the candidate sequence,
-after that, passing the remaining sequences with predicted structures into the MD simulation phase, which is used to further discriminate and filter for the highest stability of the generated novel sequence, and
-carrying out a final similarity check (allowed similarity proportion) against existing proteins (for example search against UNIPROT) before the MD simulation, and as result, providing only novel sequences as de novo design sequences and/or for further consideration.
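The filtering steps of the method (property check, similarity check against the training set, and the accuracy threshold of about 0.9) can be sketched as a simple cascade. This is an illustrative sketch only and not the AIMS implementation: every function name is hypothetical, the property and accuracy calculations are supplied by the caller as stand-ins for the neural network models, and the similarity measure is a crude position-wise identity rather than a real alignment.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Limits = Dict[str, Tuple[float, float]]


def within_limits(properties: Dict[str, float], limits: Limits) -> bool:
    """Initial property check: every limited property must lie inside its range."""
    return all(low <= properties[name] <= high
               for name, (low, high) in limits.items())


def identity_fraction(a: str, b: str) -> float:
    """Crude stand-in for a similarity search: position-wise matches over the longer length."""
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def filter_candidates(candidates: Iterable[str],
                      property_fn: Callable[[str], Dict[str, float]],
                      limits: Limits,
                      known_sequences: Iterable[str],
                      max_similarity: float,
                      accuracy_fn: Callable[[str], float],
                      min_accuracy: float = 0.9) -> List[str]:
    """Keep candidates that pass the property, similarity and accuracy checks, in that order."""
    known = list(known_sequences)
    kept = []
    for seq in candidates:
        if not within_limits(property_fn(seq), limits):
            continue  # outside the required property limits
        if any(identity_fraction(seq, k) > max_similarity for k in known):
            continue  # too similar to a sequence already in the reference set
        if accuracy_fn(seq) < min_accuracy:
            continue  # predicted secondary structure not accurate enough
        kept.append(seq)
    return kept
```

In such a cascade the cheap sequence-based checks run first, so that only the surviving candidates would be passed on to the more expensive structure prediction and MD simulation steps.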

The sequences and/or structural information may be provided by a human expert based on desired design objectives. This however is a step involving random information that cannot be fully controlled. In the present case, successful selection of suitable input data, as well as successful training of the neural network models, allowed obtaining suitable sequence candidates for further consideration and expression experiments.

The first CNN (Convolutional Neural Network) based deep neural network model may be the AIMS-GATHER as described herein, or the like model, which acts as an “encoder”. The second CNN (Convolutional Neural Network) based deep neural network model may be the AIMS-PROT as described herein, or the like model, which acts as a “decoder”.

For structural validation a suitable atomistic molecular dynamics simulation method may be used, such as CHARMM27 with an explicit TIP3P water model as the force field for the MD simulations. GROMACS or the like software may be used to run the simulations.

The present structural proteins can be provided as, be and/or act as or in biocompatible and/or biodegradable functional structural materials, which are useful in several applications, such as in pharmaceutical or other medical applications and in other applications, such as in functional materials for example for construction, automotive, electronics, optics, design and/or the like fields of technology. The properties of the present structural proteins could be modified and controlled, which enabled control over features such as rheological properties, optical properties, bioactive properties, and the like. The structural proteins can act as self-activating and/or self-assembling compounds in a variety of applications and uses.

It was found out that the helical parts of the present structural proteins can bind the proteins together and facilitate formation of elongated structures. This in one aspect has an impact to the properties of the present products. The present structural proteins are tuneable and can be provided to exhibit desired properties, such as tuneable supramolecular self-assembly, preferably by noncovalent intermolecular binding. The proteins can provide an elongated bicontinuous scaffold of an interconnected mesoglobular porous network.

The structural protein can be used for preparing a variety of products with a variety of methods, such as by electrospinning, by printing, by additive manufacturing, or by particle formation. The present disclosure provides methods for preparing products with one or more of the methods disclosed herein, the method comprising providing one or more of the structural proteins, preferably in a suitable form, such as in or as a solution or a dispersion, and forming into products by the selected method. The products may comprise only one type of the structural proteins, or the products may comprise two or more of the structural proteins, such as a mixture of two or more types of the structural protein. The “structural protein” as used herein may refer to one or more of the structural proteins.

The present disclosure provides a product comprising, consisting of and/or obtained from one or more of the structural proteins. The product comprises a form and/or a shape. The product may be any of the products disclosed herein.

Particles

The present structural proteins may be formulated into particles, and provided as or in particles. The particles may substantially comprise one or more of the structural proteins, such as the particles may be formed from raw material comprising the structural proteins, for example wherein the particles comprise at least 90% by weight or more, such as 95% by weight or more, for example 99% by weight or more of the structural protein. The particles may be used with other materials, such as wherein existing particles are coated and/or impregnated with a fluid or other material comprising the structural proteins.

Particles comprising the present structural proteins can be formed with a variety of particle forming methods, such as by electrospraying, by spray drying, by mechanical granulation, by fluidized bed granulation or agglomeration, by crystallization, by membrane emulsification, by aerosol formation methods, such as by atomizing, and/or by any other suitable methods.

Particles may be formed from a fluid or a liquid, such as from a dispersion or a solution comprising the structural protein. The dispersion or the solution may comprise one or more further agents, such as rheology modifiers and/or binders, such as organic polymers; bioactive agents; metal ions, such as cations and/or other substances commonly used in the art.

Micro- and/or nanometre size spherical particles comprising the present structural proteins can be prepared by using an atomizer, such as with a collision-type jet atomizer preferably connected to a heated laminar flow reactor and a low-pressure impact fractionator.

Spray drying may comprise transformation of a fluid material into dried particles, taking advantage of a gaseous hot drying medium. The process may comprise three major phases: atomization, droplet-to-particle conversion and particle collection. For example, a fluid (a liquid) is pumped to an atomizer, which breaks up the fluid feed into a spray of fine droplets. An atomizer is a device that produces a fine spray from a liquid. Then, the droplets are ejected into a drying gas chamber where the moisture vaporization occurs, resulting in the formation of dry particles. Finally, the dried particles are separated from the drying medium by using a suitable device, and recovered.

In one embodiment the particles have an average diameter in the range of 10 nm – 10 µm, such as 10 nm – 5 µm, 20 nm – 5 µm or 20 nm – 3 µm. The average diameter may be in the range of 20-1000 nm, such as 20-500 nm. Particles with a low diameter could be obtained, such as having an average diameter in the range of 20-200 nm, such as 50-150 nm, for example 50-80 nm or 100-130 nm. It was also possible to obtain particles with a higher average diameter, such as in the range of 1-10 µm, 1-5 µm or 1-3 µm. Figure 10c shows a variety of different particle sizes, which could be obtained with a low size distribution.

A majority of the particles may have a spherical shape and a diameter in the range disclosed herein, such as determined by scanning electron microscopy (SEM) and/or by laser diffraction analysis. The majority may refer to at least 50%, to at least 60%, to at least 70%, to at least 80%, to at least 90%, or to at least 95%, which may be determined by volume or by number. Preferably a majority of the particles have a diameter in the disclosed range, preferably determined by laser diffraction method. Regarding the shape, a majority, or preferably all or substantially all of the particles have the spherical shape, for example at least 80%, at least 90% or at least 95% have the spherical shape. The percentages and percentiles may refer to number of the particles (by number) or to volume of the particles (by volume).

The particle size and distribution of the particles can be determined by using any suitable particle size analyser, such as ones based on laser diffraction, and preferably by using a dedicated software, which is arranged to calculate and output the desired results. The particle size and distribution is preferably determined by using both electron microscopy, such as SEM, and laser diffraction analysis. The laser diffraction analysis or method can be carried out by using a suitable laser diffraction analyser or apparatus, and may be carried out in liquid suspension, such as in aqueous suspension. The laser diffraction apparatus may have a dedicated software which may provide desired measurement results.
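Whether a "majority" of particles falls within a diameter range depends on whether the share is counted by number or by volume. The following minimal Python sketch illustrates the difference, assuming spherical particles (so that volume scales with the cube of the diameter) and a list of individually measured diameters. The function name and the example diameters are hypothetical and are not measurement data from the disclosure.

```python
def fraction_in_range(diameters_nm, low_nm, high_nm, basis="number"):
    """Share of particles with low_nm <= diameter <= high_nm, by number or by volume."""
    if basis == "number":
        weights = [1.0 for _ in diameters_nm]
    elif basis == "volume":
        # For spheres the volume is proportional to d**3; the constant factor cancels out.
        weights = [d ** 3 for d in diameters_nm]
    else:
        raise ValueError("basis must be 'number' or 'volume'")
    total = sum(weights)
    inside = sum(w for d, w in zip(diameters_nm, weights) if low_nm <= d <= high_nm)
    return inside / total


# Hypothetical sample: four small particles and one large particle.
sample_nm = [50, 60, 70, 80, 1000]
by_number = fraction_in_range(sample_nm, 20, 200, basis="number")
by_volume = fraction_in_range(sample_nm, 20, 200, basis="volume")
print(f"by number: {by_number:.3f}, by volume: {by_volume:.5f}")
```

In this hypothetical sample most particles are within 20-200 nm by number, while the single large particle dominates the share by volume, which is why the basis should be stated together with the percentage.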

The present particles may be specified as having a very high sphericity and roundness, preferably close to 1 each, such as 0.90 or more, for example 0.95 or more. Sphericity is a measure of how closely the shape of an object resembles that of a perfect sphere. Roundness is the measure of how closely the shape of an object approaches that of a mathematically perfect circle.
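As an illustration of how such values can be computed, the sketch below uses two common definitions, which are assumptions here because the disclosure does not fix a formula: Wadell sphericity, π^(1/3)·(6V)^(2/3)/A for a body with volume V and surface area A, and a two-dimensional circularity, 4π·area/perimeter², as one possible measure of roundness from a projected particle outline.

```python
import math


def sphericity(volume, surface_area):
    """Wadell sphericity: surface area of the equal-volume sphere divided by the actual area."""
    return math.pi ** (1.0 / 3.0) * (6.0 * volume) ** (2.0 / 3.0) / surface_area


def circularity(area, perimeter):
    """2D circularity 4*pi*A/P**2, used here as an assumed stand-in for roundness."""
    return 4.0 * math.pi * area / perimeter ** 2


# Reference shapes: a sphere and a circle should score 1, a cube and a square lower.
r = 0.5
sphere = sphericity(4.0 / 3.0 * math.pi * r ** 3, 4.0 * math.pi * r ** 2)
cube = sphericity(1.0, 6.0)  # unit cube: volume 1, surface area 6
circle = circularity(math.pi * r ** 2, 2.0 * math.pi * r)
square = circularity(1.0, 4.0)  # unit square: area 1, perimeter 4
print(round(sphere, 3), round(cube, 3), round(circle, 3), round(square, 3))
```

With these definitions a value of 0.90 or 0.95 lies well above the cube and square reference cases, which gives a sense of how close to a sphere the stated thresholds are.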

Electrospun fibres and filaments

In one embodiment the product, such as the medical product, comprises or is in the form of electrospun filaments comprising one or more of the structural proteins.

Electrospinning is a fibre/filament production method that uses electric force to draw charged threads of polymer solutions or polymer melts, preferably through a needle, up to fibre diameters in the order of some hundred nanometres. The obtained fibres and/or filaments may be microfibres and/or microfilaments (MF, Fig. 10f), and/or nanofibres and/or nanofilaments (NF, Fig. 10f). The fibres and/or filaments may comprise integrated microbeads on microfilaments (MBF), nanoellipsoids on nanofilaments (NEF), or micro-ellipsoids on microfilaments (MEF), as shown in Fig. 10f. The average diameter of the filaments may be 300 nm or less, 200 nm or less, or 100 nm or less. The dimensions, such as diameters, and/or morphologies may be detected by electron microscopy, such as SEM.

In the present case a mixture of polymer and the present structural protein may be used as the raw material (polymer solution, dope), which may be a fluid or a liquid, such as a dispersion or a solution comprising the structural protein. More particularly the structural proteins may be included in a spinning dope, and they may have an effect on the properties of the spinning dope, such as on the rheology of the dope solution, which may facilitate the electrospinning process. Any suitable spinnable polymer commonly used in electrospinning may be used, such as water-soluble polymers and/or polymers soluble in organic solvents. Examples of water-soluble polymers include polyvinyl alcohol (PVA), poly(ethylene oxide) (PEO), poly(acrylic acid) (PAA), polyvinyl pyrrolidone (PVP), polyethyleneimine (PEI), polyacrylamide (PAM) and the like. Natural polymers, such as cellulose-based polymers, may be also used, such as ethyl cellulose. A combination of polymers may be also used. The polymers may be provided for example as a solution of (distilled) water, or (distilled) water and ethanol.

The polymer solution, which may be called a spinning dope, may be provided through a syringe connected to a needle by using a pump, and the needle tip and a metallic collector ground are provided with high voltage electric current, such as in the range of 15-20 kV. By altering parameters such as voltage, concentration of the polymer and/or other substances, needle size, and flow speed and the like, it is possible to obtain different properties, such as microfilaments, microbeads, microellipsoids, nanoellipsoids, nanofilaments and the like. The present structural proteins have an impact to the properties of the final electrospun fibres or filaments, and enable preparing new products and/or products with desired properties.

One or more bioactive agents, such as pharmaceutical compounds, may be included in the filaments, such as during the electrospinning or before or after it. For example the bioactive agent(s) may be added to spinning dope or the raw material used in the electrospinning, or the spun fibres/filaments may be treated with a solution or a dispersion of the bioactive agent, for example to impregnate and/or coat the filaments/fibres.

A filament is a term that may refer to a long fibre, and a filament yarn may refer to a yarn that is formed of one or more filaments running the length of the yarn. A fibre bundle presents an assembly of fibres that are aligned in a specific direction. In comparison, the yarn is a continuous length of any interlocked fibres. Electrospun nanofibres are mainly fabricated as randomly oriented fibre mats. Nanofibres may be converted into continuously twisted bundles, such as nanofibre yarns that can increase their mechanical strength. From electrospun nanofibres, two types of continuous nanofibre bundles could be created. Non-twisted nanofibre bundles are generally referred to as filament yarns, whereas twisted continuous fibre bundles exhibit all yarn characteristics. Nanofibres may be defined as fibres with an average diameter of less than 1000 nm.

The present disclosure provides an electrospun filament or fibre, such as nanofilament or nanofibre, or microfilament or microfibre, comprising one or more of the structural proteins, or a product obtained from the filaments or fibres, such as twisted or non-twisted bundles, or yarns. The electrospun filaments or fibres may be included in other materials, such as in a matrix and/or they may be in a form of a nonwoven.

Different products could be obtained by controlling and altering the spinning conditions and/or the dope composition. For example smooth and featureless micro- and/or nanofilaments could be obtained, which enable providing a fast one-step release, or products with a higher structural complexity, such as comprising incorporated beads-on-string, which enable providing a slow dual-step release profile, when the products were used for drug release.

In one example the product comprises an electrospun scaffold, which may be used for example in tissue engineering applications, and which can be penetrated with cells to treat or replace biological targets. In one example the product comprises a fibrous wound dressing, which can isolate the wound from microbial infections and provide additional features as discussed herein. In further examples the product comprises a suture, an implant, a transdermal patch, an oral form and the like medical product. The product may be also a cosmetic product, which may comprise one or more cosmetic agents.

Medical products

The present disclosure provides a medical product comprising a pharmaceutical compound and one or more of the structural proteins. The medical product may be a sustained-release and/or a controlled-release product and/or formulation, such as a depot. The pharmaceutical compounds, which may include one or more pharmaceutical compounds, may be encapsulated in a vehicle comprising the structural protein and/or included in a matrix comprising the structural protein. In examples the product, such as a medical product, comprises an injectable or an inhalable matrix and/or a (programmable) thermoresponsive matrix, such as a programmable thermoresponsive injectable or inhalable matrix.

A pharmaceutical compound may be any suitable pharmaceutical or therapeutical compound, such as active pharmaceutical ingredient (API), nucleic acid, such as deoxyribonucleic acid (DNA) and/or ribonucleic acid (RNA) or variants or derivatives thereof, which is/are to be administered to a subject, such as a human subject. The pharmaceutical compound may be administered by any suitable route, such as by injecting, orally, topically and the like routes. The pharmaceutical compound may be targeted to be delivered to a target in a body. The pharmaceutical compound(s) may be included in the medical products in amounts providing a desired therapeutic effect when administered to a subject in a predetermined dosage.

The pharmaceutical compound may comprise one or more of bioactive, such as anti-tumor, anti-cancer, anti-bacterial, for example antibiotic, anti-asthmatic, anti-viral, anti-inflammatory, anti-allergic and analgesic compound(s) or agent(s). In examples the pharmaceutical compound comprises or is an analgesic and/or an antipyretic compound, i.e. a painkiller, such as an opioid, non-opioid or a nonsteroidal anti-inflammatory drug (NSAID).

In one example the medical product comprises a wound covering product, such as a wound coverage apparatus and/or material, for example a wound dressing or part thereof, which may be multifunctional. The present structural proteins may be included as particles or fibres in the medical products or other products, and/or included in other materials, such as matrix materials and/or structural materials, such as textiles, nonwovens or the like, or polymeric materials, such as organic polymeric materials. The medical products may comprise forms and products disclosed herein, such as electrospun fibres or filaments.

The pharmaceutical composition may be a pharmaceutical formulation, and it may comprise one or more bioactive agents in combination with the structural protein. If a solvent is present, it may be water or aqueous liquid, and/or organic solvent.

In one embodiment the medical product is an injectable composition. The injectable composition may be in a form of a dispersion comprising the structural protein in a suitable form, such as in a form disclosed herein, for example in the form of particles. The injectable composition may be a thermoresponsive composition, such as programmable thermoresponsive composition. The injectable composition may be aqueous composition, and it may comprise one or more further agents, such as pH adjusting agents, buffering agents, osmolality adjusting agents, and/or other agents commonly used in the art.

An aqueous composition may have a water content of 50% by weight or more, such as 70% by weight or more, for example 80% by weight or more, or . The aqueous composition may refer to the medical compositions or to other compositions disclosed herein, such as solutions and/or dispersions.

A formulation, or a composition, may be in a form of a paste, wherein it may comprise solvent, usually water, in a suitable amount to form a paste, such as in an amount in the range of 10-50% by weight, for example 12-48% by weight, 20-48% by weight, or 12-30% by weight.

In one embodiment the medical product is an inhalable composition, such as a composition or formulation that can be used in an inhaler or a nebulizer. The pharmaceutical compound may be an asthma medication, such as a short-acting beta-agonist, an anticholinergic, an oral corticosteroid, or a combination quick-relief medicine comprising both an anticholinergic and a short-acting beta-agonist.

An inhalable composition may be dry or substantially dry composition, such as having a moisture content of 10% by weight or less, such as 8% by weight or less. An inhalable composition comprises the structural proteins in a suitable form, such as in a form disclosed herein, for example in the form of particles disclosed herein.

In one embodiment the medical product comprises or is in the form of particles having an average diameter in the range of 20 nm – 3 µm, or another subrange disclosed herein.

The medical products are preferably sterilized, such as autoclaved, irradiated and/or chemically sterilized. The medical products may be provided as packed in sterile packings, which enable maintaining the product as sterile and in a desired moisture content.

Micro-robots

More complicated products could be also formed from the present structural proteins, such as micro-robots or the like micro or nano devices, which can be actuated by using internal and/or external energy and/or stimulus. The micro-robot may be also considered as a medical product. Micro-robots can be fabricated by using techniques, such as micro-contact printing, laser lithography, photolithography or other additive manufacturing techniques.

Disclosed is a protein-based micro-robot comprising one or more of the structural proteins and means for actuating the micro-robot, such as internal and/or external means for actuating. The means for actuating may comprise magnetic substances, such as particles or other parts, which can react to external magnetic field, wherein external magnetic field may be controllably provided to actuate the micro-robot. The means for actuating may comprise magnetic (nano)particles, for example on the surface of the micro-robot, such as on the body of the micro-robot, and/or in the body of the micro-robot.

The present micro-robots can be actuated, such as steered and/or directed, externally by using controllable mechanisms such as magnetic fields or ultrasonic waves. Magnetic actuation may be used for in vivo applications, wherein the magnetic nanoparticles in the micro-robot are controllably affected by external magnetic field to cause the micro-robot to spin. This will cause the micro-robot to proceed, i.e. to swim in the fluid, preferably to a desired target. Alignment of the magnetic nanoparticles defines an easy axis normal to the helical axis, thereby allowing rotational motion under rotating magnetic fields. In one example magnetic particles, such as superparamagnetic nanoparticles (SPN), are embedded in a scaffold of the present structural proteins, preferably through covalent crosslinking.

The micro-robot may comprise a suitable shape, such as a double-helical shape, a spiral shape and/or a screw shape, preferably when the micro-robot is elongated, to enable movement of the robot in a fluid and/or a solution, such as in a bodily fluid. The micro-robot may be actuated to spin around its longitudinal/helical axis, for example by applying rotational magnetic fields, which causes the micro-robot to move in the fluid. In one example the micro-robot comprises a shape, such as a double-helical shape, enabling moving by rotational movement. In one example the micro-robot comprises a double spiral drill bit (DSDB) designed specifically for fast-forward velocity propulsion by exerting torque around the helical axis after applying rotational magnetic fields. The micro-robot may have a length in the range of 10-30 µm and/or a diameter in the range of 5-10 µm. The dimensions and/or shapes may be determined microscopically.

The micro-robot may be a (synthetic) microswimmer and/or it may be a cargo carrier. A microswimmer is a microscopic object with the ability to move in a fluid environment. The present protein-based micro-robots are soft, biocompatible, biodegradable and/or bioresorbable, thus being ideal for biomedical applications.

The micro-robots may be injectable and may be injected into a body of a subject or to other targets, which may be medical or non-medical. The micro-robots may be provided and/or injected as a dispersion. The micro-robots may be used for targeting delivery of genes, drugs (pharmaceuticals) and/or other cargo to a target, such as to a cancer cell or to other site of disease or disorder through blood vessels and/or other fluid channels and/or areas. The cargo may comprise one or more substances, such as pharmaceutical compound, such as API, nucleic acid, such as deoxyribonucleic acid (DNA) and/or ribonucleic acid (RNA) or variants or derivatives thereof, other bioactive substances or agents, and/or the like substances. The cargo may be integrated to the micro-robot, for example to the surface thereof, in the structure thereof, such as impregnated with the micro-robot, to a cargo compartment, and/or in another suitable manner, which may allow release of the cargo, especially in the target. For example the micro-robot may be arranged to be biodegraded and/or enzymatically degraded in the target and/or after a period of time in a biological environment, such as in a body of the subject.

Micropatterns

The present disclosure provides a surface comprising micropatterning comprising one or more of the structural proteins. The micropatterning may comprise one or more types of patterns and/or forms, such as stripes, honeycomb, pillars, cross lines, and the like. Micropatterns may be formed onto a surface by using techniques such as micro-contact printing, laser lithography, photolithography or other additive manufacturing techniques.

The micropatterns, also including nanopatterns, may comprise one or more patterns, such as repeating patterns, having one or more dimensions at a nanoscale and/or microscale range. For example the smallest dimension of a micropattern may be in the range of 100-5000 nm, such as 100-1000 nm, 100-3000 nm, 200-2000 nm, 500-5000 nm or 500-3000 nm. Examples of micropatterns can be seen in Figures 11a-d. The patterns include for example honeycomb patterns, lines, raster patterns, crossed lines, circles, dots, text characters, grids and the like.

The surface may be a surface of a substrate. Suitable substrates or surfaces include glass, crystals, metals, plastics, silanes, such as polyacrylamide and polydimethylsiloxane (PDMS).

Micropatterns may be used in several applications, such as electrical applications, biomaterial applications and medical applications. In cellular biology, micropatterns can be used to control the geometry of adhesion and substrate rigidity. Micropatterns enable for example precise and relatively rapid experiments controlling cell adhesion, cell migration, guidance, 3D confinement and microfabrication of microstructured chips.

Photonic crystals

The present disclosure provides photonic crystals comprising one or more of the structural proteins. Photonic crystals may be formed onto a surface by using techniques such as micro-contact printing, laser lithography, photolithography or other additive manufacturing techniques.

A photonic crystal may be defined as a structure, where the refractive index varies periodically causing changes in the optical properties. The intricate design of photonic crystal structures dictates their optical behavior enabling the control of the light propagation. Photonic crystals can be categorized based on their dimensionality, although these classifications are more about nomenclature than literal physical dimensions.

One-dimensional Photonic Crystals can be constructed through the layering of thin films. By depositing these layers atop one another, scientists can engineer unique optical properties. One-dimensional photonic crystals can be used in applications in various fields, from the creation of dielectric mirrors to the development of thin film optics. They can be employed in coatings that offer both low and high reflection on lenses and mirrors, and even in innovative color-changing paints and inks.
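For a one-dimensional photonic crystal built from two alternating thin films, the standard first-order Bragg condition at normal incidence places the reflection peak at a wavelength of 2·(n1·d1 + n2·d2), and a quarter-wave layer has a thickness of λ/(4·n). The following minimal Python sketch applies these textbook relations. The refractive indices and the target wavelength are hypothetical illustration values, not properties of the present structural proteins, and absorption and dispersion are ignored.

```python
def quarter_wave_thickness_nm(wavelength_nm, refractive_index):
    """Layer thickness giving an optical thickness of one quarter of the wavelength."""
    return wavelength_nm / (4.0 * refractive_index)


def bragg_wavelength_nm(n1, d1_nm, n2, d2_nm):
    """First-order reflection peak of a two-layer periodic stack at normal incidence."""
    return 2.0 * (n1 * d1_nm + n2 * d2_nm)


# Hypothetical design: two films with indices 1.5 and 1.3, reflection peak aimed at 550 nm.
target_nm = 550.0
n_high, n_low = 1.5, 1.3
d_high = quarter_wave_thickness_nm(target_nm, n_high)
d_low = quarter_wave_thickness_nm(target_nm, n_low)
peak_nm = bragg_wavelength_nm(n_high, d_high, n_low, d_low)
print(f"d_high = {d_high:.1f} nm, d_low = {d_low:.1f} nm, peak = {peak_nm:.1f} nm")
```

The sketch only locates the reflection peak; estimating how strongly a given number of layer pairs reflects would require, for example, a transfer-matrix calculation, which is not shown here.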

Two-dimensional Photonic Crystals can be fabricated through techniques such as photolithography or by drilling holes in a substrate, and these structures exhibit optical characteristics. Two-dimensional photonic crystals are indispensable in photonic crystal fibres, which have revolutionized fibre optic communication. They can serve as waveguides, enabling the efficient transmission of light signals across a range of applications.

Three-dimensional Photonic Crystals: The pinnacle of photonic crystal engineering, three-dimensional structures, offer a plethora of possibilities. Fabrication methods include drilling at various angles, stacking multiple 2-D layers, direct laser writing, and the intriguing approach of inducing self-assembly of spheres within a matrix and then dissolving these spheres. Three-dimensional photonic crystals are indispensable in optical computing and photovoltaic cells. They can provide complex structures, such as woodpile configurations or triply periodic minimal surfaces (TPMS), achieved through advanced 3D lithography techniques.

The versatility of photonic crystals extends to a myriad of applications, including but not limited to the following examples, which can be provided in a form of a device for implementing the application, wherein the device comprises the photonic crystals or a functional part comprising the photonic crystals.

One example provides three-dimensional optical crystals in applications and devices relating to optical communication. They can be used to control the propagation of light, to provide and enable the development of efficient photonic crystal fibres, to facilitate high-speed data transmission in fibre optic networks, and the like.

One example provides three-dimensional optical crystals in applications and devices relating to photonics integrated circuits. They can be used as subwavelength grating (SWG) waveguides, as components in photonic integrated circuits. These waveguides can serve various functions, from fibre-chip couplers to ultra-fast optical switches, and even biochemical sensors.

One example provides three-dimensional optical crystals in applications and devices relating to optical computing. Their unique properties enable the creation of novel optical components that can process information at the speed of light, thus revolutionizing computing technology.

One example provides three-dimensional optical crystals in applications and devices relating to photovoltaics. They can be applied in the development of advanced photovoltaic cells. By manipulating the behaviour of light, they can enhance the efficiency of solar energy conversion.

Metamaterial

The present disclosure provides metamaterial comprising one or more of the structural proteins. Metamaterials may be formed onto a surface by using techniques such as micro-contact printing, laser lithography, photolithography or other additive manufacturing techniques. The metamaterial or a product comprising the metamaterial may be designed and/or provided for applications for influencing wavelength of interest.

Metamaterials are engineered materials, which are designed to possess properties rarely observed in naturally occurring materials. They are formed by assembling various components, preferably organized in recurring patterns at scales smaller than the wavelengths of the phenomena they influence, ranging from nanoscale to macroscale. The metamaterial may be subwavelength grating (SWG) metamaterial. The patterns, which may comprise gratings, may have a grating period of less than 300 nm, such as in the range of 50-300 nm, for example 70-250 nm. The properties of the metamaterials are not governed by the characteristics of their constituent materials, but rather by the designs of their structures. These structures are intricately shaped, sized, oriented, and arranged to bestow upon them unique capabilities.
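The relation between a pattern dimension and the wavelength of interest can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the default limit of 0.1 wavelength mirrors the unit-cell criterion stated in this section, applying the same limit to a grating period is an assumption made for illustration, the 1550 nm wavelength is a hypothetical example, and the function names are not from the disclosure.

```python
def is_subwavelength(feature_nm, wavelength_nm, max_fraction=0.1):
    """True if the feature is no larger than max_fraction of the wavelength of interest."""
    return feature_nm <= max_fraction * wavelength_nm


def shortest_wavelength_nm(feature_nm, max_fraction=0.1):
    """Shortest wavelength for which the feature still meets the criterion."""
    return feature_nm / max_fraction


# Grating periods taken from the ranges named in the text, checked against a
# hypothetical wavelength of interest of 1550 nm.
for period_nm in (50, 70, 250, 300):
    ok = is_subwavelength(period_nm, 1550)
    limit = shortest_wavelength_nm(period_nm)
    print(f"period {period_nm} nm: meets criterion at 1550 nm = {ok}, shortest wavelength ~ {limit:.0f} nm")
```

Under this criterion the shorter grating periods suit wavelengths in the visible and near-infrared range, whereas the longer periods only qualify for longer wavelengths.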

The metamaterial may comprise a plurality of unit cells, which may comprise or be formed of the patterns, which may be nanopatterns or micropatterns, and wherein the unit cell may refer to a distance of two adjacent pattern units, wherein the unit cell has a dimension, such as a smallest dimension, of 0.1 wavelength of interest or less, such as less than 0.1 wavelength of interest.

Metamaterials are notable for their mechanical properties. Through engineering of their structures, they can exhibit unconventional behaviours, including negative stiffness and extreme elasticity, properties rarely encountered in natural materials. The present metamaterials can withstand specific mechanical stresses while simultaneously maintaining a low weight, small volume fraction, and compactness. This makes them ideal for applications, where optimizing structural integrity and minimizing weight are of paramount importance, allowing them to effectively dissipate impact forces, dampen vibrations, and serve as shock-absorbing materials.

The versatility of metamaterials extends to a variety of applications, including but not limited to the following examples, which can be provided in a form of a device for implementing the application, wherein the device comprises the metamaterial or a functional part comprising the metamaterial.

One example provides metamaterials in applications and devices relating to electromagnetic cloaking, which can be used for manipulating electromagnetic waves to make objects invisible or "cloaked" to specific frequencies for use in military stealth technology and concealing objects from radar.

One example provides metamaterials in applications and devices relating to antennas and sensors, such as an antenna or a sensor comprising the metamaterial. Metamaterials can be used for the design of compact and efficient antennas and sensors for various applications, including telecommunications, medical imaging, and remote sensing, for example to enhance the efficiency and performance of wireless communication devices, leading to faster data transfer and improved signal reception. The sensors can be used for environmental monitoring, such as detecting pollutants in air and water.

One example provides metamaterials in applications and devices relating to optics and imaging. They can be used for controlling the path of light in unusual ways, enabling the development of super-resolution imaging devices, perfect lenses, and optical cloaking devices.

One example provides metamaterials in applications and devices relating to energy harvesting. They can be used for capturing and manipulating energy from sources like sunlight or radio waves, leading to advancements in solar energy and wireless power transfer.

One example provides metamaterials in applications and devices relating to acoustic and vibration control. They can be used for controlling sound propagation, for noise reduction and acoustic cloaking. In vibration control, they can provide materials with damping properties, for instance in structural engineering.

One example provides metamaterials in applications relating to medical devices. They can be incorporated into medical devices for imaging, diagnostics, and treatment. They can improve the performance of medical imaging equipment like MRI and ultrasound machines.

One example provides metamaterials in applications and devices relating to aerospace, such as for use in the design of ultra-strong, lightweight materials with unique electromagnetic properties, making aircraft more fuel efficient and less detectable by radar, in the form of a thin coating or a structural component.

Manufacturing of products

The products disclosed herein may be formed and/or obtained from raw material comprising one or more of the structural proteins, such as fluid or liquid comprising the structural proteins, as disclosed herein. In many cases the raw material may comprise a dispersion or a solution, but other raw materials may be used where applicable, such as dry or dewatered material, and/or already formed products such as particles, filaments, fibres and the like, which may be combined with other materials, products and/or substances.

The products disclosed herein, where applicable, such as the protein-based micro-robots, the surface comprising micropatterning, photonic crystals, and the metamaterial, and/or parts thereof, may be obtained by any suitable method, such as by additive manufacturing, i.e. by 3D printing. Products or parts thereof may also be prepared by conventional printing and other methods, for example when patterns with low height are desired on a surface of a substrate, such as a sheet.

In such cases the structural protein may be provided in a suitable form for printing, such as inkjet printing, lithography, for example 3D lithography, microlithography, or laser lithography, or additive manufacturing/3D printing, preferably in the form of printing ink, such as in the form of an additive manufacturing ink.

The present disclosure provides a method for preparing a product, the method comprising
-providing one or more of the structural proteins in a fluid or a liquid, such as in a dispersion or a solution, or in any other suitable form,
-forming the fluid or the liquid, or the other form comprising one or more of the structural proteins, into a product comprising one or more of the structural proteins.

The forming may include dewatering and/or forming one or more shapes of the product. The expression “into a product” may comprise “into a form of a product”.

In embodiments the forming is carried out with one or more of printing, additive manufacturing, particle forming, electrospinning, and lithography, such as laser lithography or microlithography, for example photolithography. The product may be any of the products disclosed herein.

“Additive manufacturing” refers to methods and technologies that grow three-dimensional objects one superfine layer at a time. Each successive layer bonds to the preceding layer of melted or partially melted material. The objects are digitally defined by computer-aided design software. This information guides the path of a nozzle or print head as it precisely deposits material upon the preceding layer. The printing process may comprise material extrusion, wherein the structural proteins contained in the printing ink are extruded through the nozzle or print head. The printing process may also comprise using a laser, such as in a directed energy deposition method, wherein an electron beam gun or laser melts feedstock or powder comprising the structural proteins. Methods such as micro-contact printing, laser lithography, and photolithography may be considered additional manufacturing methods.

The fluid or the liquid, such as the dispersion or the solution, comprising the structural protein may be a printing ink. The printing ink may be water-based and it may contain one or more additional substances, such as rheology modifiers and/or binding agents, which may comprise one or more organic polymers; bioactive agents and the like.

One example provides a method for preparing a product, comprising
-providing a printing ink comprising one or more of the structural proteins,
-forming the product or part of the product by printing from the printing ink.

One example provides a method for preparing a product, comprising
-providing an additive manufacturing ink comprising one or more of the structural proteins,
-forming the product or part of the product by additive manufacturing from the additive manufacturing ink.

With printing methods it was possible to obtain structures such as photonic crystals and woodpile structures having a feature resolution down to 100 nm or below. Such structures were found to be durable, and they could maintain their structural integrity in solution and after drying and/or washing.

Lithographic methods can be used for preparing products. For example laser lithography, photolithography, micro-contact printing and surface micro-patterning were found useful methods for preparing products comprising patterns, such as micropatterns and woodpile structures. In photolithography the present structural proteins can be used as a photoresist.

The patterned products may be formed on a substrate, which is preferably rigid and has a smooth surface. Examples of suitable surfaces include glass, crystal, metal and the like.

The fluid or the liquid, such as the dispersion or the solution, comprising the structural protein may be used for electrospinning, particle forming and/or for other forming methods where applicable.

It was found that the present structural proteins, having high molecular stability, could tolerate high temperatures, such as a high solution temperature and/or the use of a laser, such as in laser lithography methods. Therefore, products can be formed with methods utilizing high temperatures, and the obtained products can be used in applications involving high temperatures.

One example provides a method for preparing a micro-robot, comprising
-providing an additive manufacturing ink comprising one or more of the structural proteins,
-forming a body of a micro-robot or part of the body by additive manufacturing from the additive manufacturing ink, and/or by methods such as micro-contact printing, laser lithography, or photolithography, preferably
-providing magnetic particles, and
-incorporating the magnetic particles into the body of the micro-robot, preferably by covalent crosslinking.

Thermoresponsive glass

The present structural proteins can be used as a self-activating compound in smart windows with the ability to self-modulate the amount of solar radiation passing through the windows without the need for human intervention, such as the use of blinds or curtains, or external stimuli, such as applied electric potential in electrochromic windows. Such products may be obtained by sandwiching a thin layer of the structural proteins, or of a material comprising them, between two panes of glass.

The present disclosure provides a thermoresponsive glass comprising one or more of the structural proteins. The thermoresponsive glass may comprise a layer comprising one or more of the present structural proteins between two sheets of glass. The thermoresponsive glass may be prepared by providing a layer comprising one or more of the present structural proteins between two sheets of glass. The layer may have a thickness in the range of 100-2000 µm, such as 500-2000 µm or 500-1500 µm.

Preferably the layer comprises material comprising reversible anisotropic interconnected mesoglobular network of the structural protein arranged to undergo phase separation by the effect of temperature to obtain a change in transmittance.

The thermoresponsive glass may be used in smart windows. The effect is based on the reversible thermoresponsive behaviour and coacervation of the present structural proteins into an ultra-white condensate made from a multiscale anisotropically interconnected mesoglobular network of low-refractive-index protein, such as in the range of 1.3-1.5, with the ability to scatter solar radiation covering wavelengths ranging from ultraviolet through visible to near-infrared (Fig. 12a-c, Fig. 6 and Fig. 36). The adaptive phase behaviour of the structural proteins provides the effect of darkening or lightening the glass, thus enabling darkening or lightening the interior of a building, for example. This enables for example modulating the indoor temperature for sustainable building design.

At temperatures below the LCST, UnsELP-AI.PHn remains soluble, and phase separation does not occur (Fig. 12b and 12d). As a result, smart windows retain their transparency, allowing solar radiation to penetrate and warm up the interior space. In contrast, when the temperature exceeds the LCST, phase separation of the structural proteins is triggered and the window becomes opaque, blocking the sun's rays and preventing excess heat accumulation indoors. As the temperature falls below the LCST once more, the condensates redissolve, restoring the transparency of the smart windows and allowing sunlight to pass through again. This smart window thus exhibits a self-regulating behaviour that is crucial for maintaining a comfortable indoor environment while reducing energy consumption. In addition, the device could be fine-tuned to a very specific geographical location, time of day, or month of the year by selecting UnsELP-AI.PHn variants exhibiting a higher or lower LCST, or by altering the solution conditions (protein and salt concentration) and even the device layer thickness (Fig. 6 and Fig. 12b).

The present disclosure provides use of one or more of the present structural proteins for preparing any one of the products disclosed herein, and/or for obtaining any one of the properties disclosed herein in a suitable application.

Examples

Learning the grammar of elastin’s ordered-disordered sequences

Elastin is a naturally occurring biological elastomer that provides elastic recoil properties to vertebrate tissue. The primary physiological function of elastin is maintaining structural stability after repetitive contraction and extension of the tissues over a lifetime. The primary constituent building block of elastin is a monomeric protein precursor known as tropoelastin. It shares noticeable similarities to other protein-based fibrous materials such as silk, resilin, or high molecular weight gluten. At the sequence level, it is highly repetitive and low in complexity, with the quintessential di-block copolymer architecture containing alternating hydrophobic and hydrophilic regions. At the structural level, it is made from consecutive unstructured (~80%) and structured (~20%) modules. Such heterogeneity of molecular conformation enables a high degree of mechanical elasticity and recoil.

In the intrinsically disordered regions, proline, glycine, and valine favour transient fluctuation that implies structural disorder. They can be arranged into motifs of di-, tri-, penta-, and hexapeptide segments (such as GV, PGV, GVA, GGV, VPGVG, and VAPGVG), with penta- and hexapeptide motifs being the most abundant interspecies repeats. In contrast, in the structured regions, alanine and lysine are dominantly present as conserved 12 to 21 residue long α-helical conformations. The A6 and AAAK motifs are found to be the most abundant repeats (Fig. 13). Both alanine and lysine, with their mobile dihedral angles and presence of amide hydrogen, exhibit a low entropic penalty for conformational confinement, thus promoting the formation of α-helices with a low degree of hydration by adopting a compact water-excluding core (Fig. 13).

The incorporation of structured domains is crucially important for the multiscale liquid-liquid phase separation (LLPS) of tropoelastins. While the unstructured regions alone form a micrometer-sized coalescing colloidal suspension of liquid-like droplets, the introduction of helices has an increased enthalpic and a decreased entropic effect on the LLPS process, mediated through water exclusion, self-association, and increased intermolecular interactions. In this case, the structured-unstructured conformation results in more arrested LLPS assemblies and in the formation of a highly elongated bicontinuous scaffold of an interconnected mesoglobular porous network.

The current limitation revolves around the lack of spatiotemporal control in vitro over the recombinantly produced building blocks containing the aforementioned motifs, which frequently culminates in premature aggregation and inferior properties, mandating groundbreaking advancements in the field. Generally, the α-helical conformation is found to be relatively stable under certain conditions, such as in the hydrophobic interior of a protein or in the presence of stabilizing interactions with other structural elements, for example in enzymes and transporters. The helices observed in structural proteins, in contrast, are intentionally designed by evolution to exist in a metastable state. While they can be stable and persist in their folded state under suitable conditions over a certain timescale, they can also be susceptible to transitioning to more thermodynamically favoured conformations, such as β-sheets and amyloid aggregates. This propensity for conformational changes is particularly pronounced in response to external influences, including fluctuations in temperature, alterations in pH, and changes in salt or protein concentrations, revealing the intricacies of their conformational dynamics and the delicate balance between stability and transition to alternative conformations. Thus, their conformational state is tightly regulated during expression, processing, and extracellular self-assembly, which cannot be easily mimicked under laboratory conditions.

Creating a library of de novo designed α-helical domains

The AI-empowered material scientist (AIMS) protocol predicted 1800 novel helices (Fig. 1) based on the criterion of having a length ranging from 20 to 40 residues (Fig. 13). Using the helix propensity feature map, the candidates were narrowed down to 178 AIMS predicted helices (AI.PHn, n = identification number of the helix) with a score above 0.7, while the rest were excluded from further analysis (Fig. 1a). Additionally, 34 candidates were excluded because their AIMS prediction accuracy score was below 0.8 or because they exhibited an average root mean square deviation (RMSD) above 3 Å after two-step atomistic molecular dynamics (MD) simulation (Fig. 1b-1d). In the end, from the pool of 144 de novo sequences, 25 variants were selected, each with distinct physicochemical features, for recombinant expression (Figs. 15-16). The complete list of selected variants is provided in Table 1. Of the set, 13 of the 25 were grouped as alanine-rich AI.PHn due to the abundance of alanine resembling naturally occurring motifs (Figs. 15-17). The remaining 12 were grouped as alanine-less AI.PHn comprising no alanine residues.
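The selection funnel described above (length window, helix propensity score, prediction accuracy, and RMSD after MD) can be illustrated with a minimal Python sketch. The `Candidate` record, its field names, and the demo values are hypothetical and are not part of the AIMS protocol itself; only the thresholds (20 to 40 residues, score above 0.7, accuracy of at least 0.8, RMSD of at most 3 Å) are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One predicted helix. All field names are hypothetical."""
    ident: int
    sequence: str
    helix_score: float     # helix propensity score
    accuracy: float        # prediction accuracy score
    rmsd_angstrom: float   # average RMSD after MD simulation, in angstrom


def passes_length(c, min_len=20, max_len=40):
    # Length criterion: 20 to 40 residues.
    return min_len <= len(c.sequence) <= max_len


def passes_propensity(c, threshold=0.7):
    # Candidates with a score above 0.7 are kept.
    return c.helix_score > threshold


def passes_stability(c, min_accuracy=0.8, max_rmsd=3.0):
    # Excluded if accuracy is below 0.8 or RMSD is above 3 angstrom.
    return c.accuracy >= min_accuracy and c.rmsd_angstrom <= max_rmsd


def select(candidates):
    """Apply the three filters in sequence and return each stage."""
    by_length = [c for c in candidates if passes_length(c)]
    by_propensity = [c for c in by_length if passes_propensity(c)]
    by_stability = [c for c in by_propensity if passes_stability(c)]
    return by_length, by_propensity, by_stability


# Made-up demo records, one per possible outcome.
demo = [
    Candidate(1, "A" * 25, 0.90, 0.95, 1.2),  # passes every filter
    Candidate(2, "A" * 10, 0.90, 0.95, 1.2),  # too short
    Candidate(3, "E" * 30, 0.60, 0.95, 1.2),  # propensity too low
    Candidate(4, "E" * 30, 0.85, 0.75, 1.2),  # accuracy too low
    Candidate(5, "K" * 30, 0.85, 0.90, 4.5),  # RMSD too high
]
by_length, by_propensity, by_stability = select(demo)
print(len(by_length), len(by_propensity), len(by_stability))
```

In the text the corresponding stage sizes are 1800, 178 and 144 (178 minus the 34 excluded candidates); the final choice of 25 variants was made on physicochemical diversity and is not modelled here.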

Recombinant production of full-length sequences by combining intrinsically disordered ELP and AI.PHn

Consequently, twenty-five hybrid de novo designed ELP variants were genetically engineered for recombinant expression in Escherichia coli. This was carried out by taking fifteen repeats of the intrinsically disordered ELP coding sequence, (VPGVG)15, followed by a single AI.PHn. The tandem repeat of the combined (VPGVG)15 and AI.PHn was then repeated four times to create the full-length hybrid protein sequences named UnsELP-AI.PHn, where “n” corresponds to the identification number of the predicted helix (Table 1). At the sequence level, the combination resulted in full-length UnsELP-AI.PHn containing 25±3% structured and 75±3% unstructured regions. Even though all 25 selected de novo designed helices passed stringent computational selection criteria, some of the corresponding full-length UnsELP-AI.PHn failed to be produced in laboratory experiments for a variety of reasons. Typical issues included lack of soluble expression, aggregation, and low yield. Out of the 25 initially selected candidates, only 15 were successfully expressed. In the end, however, only the 10 best-producing UnsELP-AI.PHn variants with substantially higher yields were selected (Fig. 2e and Figs. 18-20). Those that did not produce or had the lowest expression levels were excluded from further analysis.

Table 1. AIMS predicted helices.

Helix | Helix sequence | Full-length construct | Protein name | Expression
AI.PH4 | AAAAIAIAAAIIAAAAGAAQSAAAAAIAAK | ((VPGVG)15-AI.PH4)4 | UnsELP-AI.PH4 | [illegible]
AI.PH5 | EEELLKELEELELLELEEELL | ((VPGVG)15-AI.PH5)4 | UnsELP-AI.PH5 | None
AI.PH8 | AAAAIAGAAAIIAAAAAAAK | ((VPGVG)15-AI.PH8)4 | UnsELP-AI.PH8 | None
AI.PH18 | EEELLKKEVVLLEELLEELEELL | ((VPGVG)15-AI.PH18)4 | UnsELP-AI.PH18 | Yes - high
AI.PH20 | EEELLKREEKLLLELLLLEEELEELEELL | ((VPGVG)15-AI.PH20)4 | UnsELP-AI.PH20 | Yes - high
AI.PH22 | EEEEEREKEDEEEEEEEKKE | ((VPGVG)15-AI.PH22)4 | UnsELP-AI.PH22 | Yes - high
AI.PH36 | AAAAIAAAAAIAIAAAAAIAAK | ((VPGVG)15-AI.PH36)4 | UnsELP-AI.PH36 | None
AI.PH43 | EEELLKREEKLLEEEEEEEELLLEEELEELEELL | ((VPGVG)15-AI.PH43)4 | UnsELP-AI.PH43 | None
AI.PH45 | DAAAAAAAAAAAAYFHHAAAAKDAKK | ((VPGVG)15-AI.PH45)4 | UnsELP-AI.PH45 | Yes - high
AI.PH46 | EELLLKLLLLLELELLLELL | ((VPGVG)15-AI.PH46)4 | UnsELP-AI.PH46 | None
AI.PH48 | AAAAIAIAAAIIAAAAAAIAAAQSAAAAAIAAK | ((VPGVG)15-AI.PH48)4 | UnsELP-AI.PH48 | None
AI.PH64 | EEERRREKEREREEERRRKKK | ((VPGVG)15-AI.PH64)4 | UnsELP-AI.PH64 | Yes - high
AI.PH70 | AAAAIAGAAAGFAAAAAAAK | ((VPGVG)15-AI.PH70)4 | UnsELP-AI.PH70 | Yes - low
AI.PH87 | DAAAAAAAAAAAAKYHDAAAAAAKDAKK | ((VPGVG)15-AI.PH87)4 | UnsELP-AI.PH87 | Yes - high
AI.PH97 | AAAAIAIAAAIAAAAAGQSAAAAAIAAK | ((VPGVG)15-AI.PH97)4 | UnsELP-AI.PH97 | Yes - low
AI.PH102 | EEEEKKKEEKIKKKKKKKKKKKK | ((VPGVG)15-AI.PH102)4 | UnsELP-AI.PH102 | [illegible]
AI.PH123 | AAAAIAAAAAGGAAAAAAAAAK | ((VPGVG)15-AI.PH123)4 | UnsELP-AI.PH123 | Yes - low
AI.PH127 | LEEKKEKEEEKKKHLHILKHELKRKKKK | ((VPGVG)15-AI.PH127)4 | UnsELP-AI.PH127 | Yes - low
AI.PH129 | [first part illegible]KKDKK | ((VPGVG)15-AI.PH129)4 | UnsELP-AI.PH129 | [illegible]
AI.PH134 | [first part illegible]EEEEVWTK | ((VPGVG)15-AI.PH134)4 | UnsELP-AI.PH134 | [illegible]
AI.PH142 | DAAAAAAAAAAADFGDAAAAKDAKK | ((VPGVG)15-AI.PH142)4 | UnsELP-AI.PH142 | Yes - high
AI.PH144 | EEEEEEKEEEEEEEEEEEEEEEEKKKE | ((VPGVG)15-AI.PH144)4 | UnsELP-AI.PH144 | Yes - low
AI.PH162 | [first part illegible]AKK | ((VPGVG)15-AI.PH162)4 | UnsELP-AI.PH162 | [illegible]
AI.PH171 | DAAAAAAAAAADAAAAADAKK | ((VPGVG)15-AI.PH171)4 | UnsELP-AI.PH171 | Yes - high
AI.PH177 | AAAAIAIAAAIIAAAAVAAAQSAAAAAIAAK | ((VPGVG)15-AI.PH177)4 | UnsELP-AI.PH177 | None
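As an illustration of how the full-length sequences of Table 1 are assembled from the stated building blocks, the following Python sketch concatenates fifteen VPGVG repeats with one helix and repeats that unit four times. It is a sketch under assumptions: the function names are hypothetical, any additional residues such as tags, linkers or a start methionine are ignored because they are not described here, and the grouping rule simply treats any helix containing alanine as alanine-rich.

```python
ELP_PENTAPEPTIDE = "VPGVG"
ELP_REPEATS = 15      # (VPGVG)15, as stated in the text
TANDEM_REPEATS = 4    # ((VPGVG)15-AI.PHn)4


def build_construct(helix):
    """Return the ((VPGVG)15-helix)4 sequence for one helix."""
    unit = ELP_PENTAPEPTIDE * ELP_REPEATS + helix
    return unit * TANDEM_REPEATS


def structured_fraction(helix):
    """Fraction of residues that belong to the helical segments."""
    elp_length = len(ELP_PENTAPEPTIDE) * ELP_REPEATS
    return len(helix) / (elp_length + len(helix))


def helix_group(helix):
    """Simplified grouping: no alanine at all means alanine-less."""
    return "alanine-less" if "A" not in helix else "alanine-rich"


# Two helix sequences copied from Table 1.
helices = {
    "AI.PH45": "DAAAAAAAAAAAAYFHHAAAAKDAKK",
    "AI.PH22": "EEEEEREKEDEEEEEEEKKE",
}
for name, helix in helices.items():
    full = build_construct(helix)
    print(
        f"UnsELP-{name}: {len(full)} residues, "
        f"{100 * structured_fraction(helix):.1f}% structured, "
        f"{helix_group(helix)}"
    )
```

Because the ELP block has a fixed length of 75 residues, the structured share in this simple model depends only on the helix length, so longer helices give a larger structured fraction.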

AIMS predicted sequences showed α-helical conformation in vitro

The UnsELP region (i.e. VPGVG) is an intrinsically disordered sequence lacking a well-defined three-dimensional structure, and it thus shows a pronounced conformational ensemble mismatch with the AI.PHn. Nevertheless, understanding the conformational dynamics of both regions is crucially important to comprehending their overall functional features in solution. To this end, in silico and in vitro methods were combined to perform structural characterization of the intact sequences. As the starting point, atomistic models of the helices were first predicted for the ten best-producing UnsELP-AI.PHn using the well-established neural network AlphaFold2 for accuracy comparison (Fig. 21). The AIMS predicted secondary structures were found to be identical to the AlphaFold2 predictions. Similarly, AlphaFold2 was also used to predict the full-length UnsELP-AI.PHn (Fig. 3a). The results showed considerable predictive power, considering that the major portion of the intact UnsELP-AI.PHn sequences (about 75%) is regarded as a region of low predictive confidence (i.e. UnsELP). Nevertheless, in all cases, the AI.PHn regions were accurately predicted as α-helical conformations. To better evaluate the overall stability of the AI.PHn, 100 ns MD simulations were performed. All the AI.PHn maintained their conformation and eventually self-associated to form dimers or tetramers by the end of the simulation, except for UnsELP-AI.PH22 and UnsELP-AI.PH134. Constructing the intra-residue interaction network throughout the simulation period suggested that hydrophobic interactions as well as hydrogen bonding are the major contributors to the multimerization of the de novo designed α-helices (Figs. 22-26). In all cases, the UnsELP regions remained unstructured and adopted a collapsed configuration towards the end of the simulations.

These observations were followed up experimentally using various spectroscopic techniques. First, circular dichroism (CD) was performed. Qualitatively, the spectra from all the hybrid biomimetic de novo designed ELP variants strongly exhibited α-helical conformation (Fig. 3b and 3c). This is evident from two negative ellipticity bands positioned at 205-210 nm and 220-225 nm. The shape of the spectra did not change even during cyclic heating and cooling, suggesting high conformational stability and a high melting temperature (Figs. 27 and 28). The experimental CD spectra were further compared with predicted CD signals obtained by averaging the structure from the last 100 trajectories of the 100 ns MD simulations performed previously. Exceeding expectations, the predicted CD signals were in good agreement with the experimental measurements (Fig. 3b and 3c).

To gain a better understanding of the overall structural conformation, the percentages of secondary structures and the folds were quantitatively determined from the experimental CD spectra (Fig. 3d). The α-helices formed the second major conformation in all cases (up to 29%). As expected, the random coils or disordered structures (indicated as “Others”) formed the major overall percentage (36-48%). However, this was in contrast with the secondary structure calculated from the predicted CD signal, which indicated a 63-80% contribution (Fig. 29). This may indicate that the UnsELP region does not remain completely unstructured under experimental conditions. The UnsELP may partly adopt other conformations such as turns (<3%), distorted helix (<15%), and right-twisted antiparallel (<11%) (Fig. 3d). Such arguments were confirmed to be plausible after performing high-resolution attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy and deconvoluting the conformationally sensitive Amide-I peak (Fig. 3e and Fig. 30). The α-helical contribution remained between 19-39%, whereas the random coil, turn, and β-strand contributions together amounted to 40-60%.

To gain insight into the low-resolution solution structure of the protein, small-angle X-ray scattering (SAXS) spectra (Fig. 3f) were collected. The results indicated that the protein was present in a monodisperse form within the solution. The scattering data were then used to generate a simulated ab initio body model via direct fitting. This model was able to accurately account for the globular shape of the monodisperse protein. Further analyses, including low-angle (Guinier) and high-angle (Porod) analyses, demonstrated that the data had good quality and showed no significant aggregation. In our investigation of UnsELP-AI.PH45, we determined a Porod volume (Vp) of 135,746 Å³, a radius of gyration (Rg) of 33.46 ± 0.56 Å, and a maximum dimension (Dmax) of 108.75 Å. Subsequently, the scattering at wider angles was examined to probe the molecular structures (Fig. 3g). The wide-angle X-ray scattering (WAXS) diffraction patterns exhibited α-helical characteristics, which could be identified by two main peaks, one at around 1.55 Å⁻¹ and the other at around 0.61 Å⁻¹, corresponding to constructive interference of the (100) and (010) reflections, respectively. The (100) reflection arises from the periodicity of the α-helix structure, whereas the (010) reflection corresponds to the periodicity in the direction perpendicular to the helix axis, in good agreement with the simulated WAXS in solution based on explicit-solvent all-atom molecular dynamics (Fig. 3g). Additionally, the signal showed a much broader and diffuse halo ring above 1.55 Å⁻¹, indicative of mixed scattering from unstructured regions and water molecules. Similar trends were observed in the SAXS and WAXS signals measured for the other variants (Figs. 4 and 5).
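The scattering quantities quoted above can be related to real-space lengths with two standard relations: the repeat distance d = 2π/q for a peak at scattering vector q, and the Guinier approximation ln I(q) = ln I(0) - q²Rg²/3 at low angles. The Python sketch below applies both. It is not the analysis pipeline used for the measurements; the function names are hypothetical and the low-angle intensities are synthetic values generated from the reported Rg purely to show that the fit recovers it.

```python
import math


def d_spacing(q_inv_angstrom):
    """Real-space repeat distance d = 2*pi/q, in angstrom."""
    return 2.0 * math.pi / q_inv_angstrom


def guinier_rg(q_values, intensities):
    """Estimate Rg from the least-squares slope of ln I versus q^2.

    In the Guinier region the slope equals -Rg^2 / 3.
    """
    x = [q * q for q in q_values]
    y = [math.log(i) for i in intensities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return math.sqrt(-3.0 * slope)


# WAXS peak positions reported in the text, converted to distances.
for q in (1.55, 0.61):
    print(f"q = {q:.2f} 1/angstrom  ->  d = {d_spacing(q):.2f} angstrom")

# Synthetic Guinier-region data (q * Rg stays below about 1.3).
REPORTED_RG = 33.46  # angstrom, value reported for UnsELP-AI.PH45
q_values = [0.005 + 0.003 * i for i in range(11)]
intensities = [math.exp(-((q * REPORTED_RG) ** 2) / 3.0) for q in q_values]
print(f"Rg recovered from synthetic data: {guinier_rg(q_values, intensities):.2f} angstrom")
```

With these relations the two WAXS peaks correspond to repeat distances of roughly 4.05 Å and 10.3 Å.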

AI.PHn showed greater molecular stability than naturally occurring sequences

The conformational stability of the de novo designed α-helices was benchmarked against both naturally occurring and rationally engineered α-helices. Consequently, a set of four proteins was designed. Two of them were made from naturally occurring α-helices originating from the human tropoelastin helix (H.TPE.HELn) (GenBank ID: GBP83862.1), named UnsELP-H.TPE.HEL1 and UnsELP-H.TPE.HEL2. Two unrelated but naturally occurring α-helices originating from bagworm silk (GenBank ID: GBP83861.1) and Euprosthenops australis major ampullate 1 silk (GenBank ID: AM259067.1) were also used. They are indicated as rational design and named UnsELP-BGWS and UnsELP-EaMaSp1, respectively. As a comparison to the de novo designed UnsELP-AI.PHn, the CD spectra of these variants were measured in the same way after one cycle of heating and cooling (20→50→20 °C). CD signals corresponding to α-helical conformation were noted in all cases before the addition of salt (Figs. 31 and 32). However, three distinct differences were found after the salt was added. Firstly, the spectra immediately shifted toward β-sheet conformation. Secondly, the β-sheet signal became stronger as the temperature was ramped up. Finally, the conformational conversion was completely irreversible, even during the cooling cycles. To confirm these observations, the α-helix to β-sheet conversion was monitored using Thioflavin T (ThT), a fluorescence detection marker that specifically shows high affinity to β-sheet structures (Figs. 33 and 34). ThT exhibited enhanced intensity at 492 nm, characteristic of the formation of β-sheet structures, for various salt versus protein concentrations. The intensity was substantially higher for formulations with higher concentrations. Time-dependent stability tests were also performed for the hybrid de novo designed ELP variants against the naturally occurring and rationally designed ELPs using the ThT assay (Fig. 35). An increase in the relative fluorescence intensity (RFU) of ThT was noted as a function of time over a period of 40 days, indicative of conformational conversion to β-sheets, predominantly for the rationally designed and naturally occurring variants.

Tuneable phase behaviour and supramolecular self-assembly

The condensation responses of all the UnsELP-AI.PHn were also explored systematically by simultaneously changing the solution conditions, including the protein concentration, ionic strength, and temperature. Changes in the turbidity of the samples were recorded as changes in absorbance at 600 nm to detect condensation in solution, indicative of phase separation. This resulted in accurate phase diagrams that map distinct miscibility phases for each variant. It also made it possible to detect how modulating the solution conditions alters the critical phase separation temperature. About 1,170 solution conditions were tested for all the UnsELP-AI.PHn. The salt concentrations ranged from 0 to 2 M (pH 7.4), and the protein concentrations from 0 to 10 mg/ml (pH 7.4). All formulation combinations were then repeated at temperatures ranging from 20 to 90 °C in 5 °C increments, resulting in about 150 phase diagrams (Fig. 6d).

In all cases, the shape of the coacervation boundary was concave up-decreasing. Intermixing of the highest protein and salt concentrations in most cases changed the turbidity almost instantly, with the solutions transitioning from transparent to turbid in a matter of seconds. The observation was found to be predominantly true for a lower critical solution temperature (LCST) at 40-50 °C. Furthermore, it was observed that not all variants have the same phase separation kinetics. In order, UnsELP-AI.PH45 > UnsELP-AI.PH162 > UnsELP-AI.PH18 > UnsELP-AI.PH171 > UnsELP-AI.PH142 > UnsELP-AI.PH20 > UnsELP-AI.PH134 > UnsELP-AI.PH87 > UnsELP-AI.PH22 > UnsELP-AI.PH64 exhibited the fastest to slowest condensation dynamics. The order of the phase separation strongly correlated with the hydrophobicity of the residues and the length of the AI.PHn, but inversely correlated with the polarity, hydrophilicity, and charged residues in the AI.PHn (Figs. 19 and 20).
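The sequence descriptors named above (hydrophobicity, length, and charged residues of the helix) can be computed directly from the helix sequences. The Python sketch below does this for the helices of the fastest and the slowest variant in the stated order. The text does not say which hydrophobicity scale was used; the Kyte-Doolittle scale and the choice of D, E, K and R as the charged residues are assumptions made for illustration, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale, not stated in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGED = set("DEKR")  # assumption: histidine is not counted as charged


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[res] for res in sequence) / len(sequence)


def charged_fraction(sequence):
    """Fraction of residues that are D, E, K or R."""
    return sum(res in CHARGED for res in sequence) / len(sequence)


# Helices of the fastest (AI.PH45) and slowest (AI.PH64) variants,
# sequences copied from Table 1.
helices = {
    "AI.PH45": "DAAAAAAAAAAAAYFHHAAAAKDAKK",
    "AI.PH64": "EEERRREKEREREEERRRKKK",
}
for name, seq in helices.items():
    print(
        f"{name}: length={len(seq)}, "
        f"GRAVY={gravy(seq):.2f}, charged fraction={charged_fraction(seq):.2f}"
    )
```

Under these assumptions the alanine-rich AI.PH45 helix scores as more hydrophobic and less charged than the AI.PH64 helix, which consists only of charged residues.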

To better understand the onset of the assembly, one concentration combination close to but above the condensation boundary was tested (Fig. 6e-4i). This was implemented to minimize the rapid increase in turbidity and to obtain slower assembly kinetics for more reliable measurements. Each variant exhibited different phase behaviour depending on the helix type. However, at the general level, the alanine-rich UnsELP-AI.PHn variants showed relatively faster phase separation than the alanine-less variants (Fig. 6e, 6f). In addition, the transition temperature (Tt-heating) of the alanine-rich AI.PHn was found to be sharper than that of the alanine-less variants. In all cases, phase separation was found to be reversible, resulting in clear solutions upon cooling (Tt-cooling), with the condition that the Tt-cooling lies below the initial Tt-heating. This results in a lag between the Tt-heating and Tt-cooling, identified as thermal hysteresis (ΔTt-hysteresis). Importantly, the phase behaviour, Tt-heating, Tt-cooling, and ΔTt-hysteresis strongly correlated with the concentration of the proteins (Fig. 7g). It was found that these parameters could be modulated effectively to fine-tune the quality of their functionalities with a continuum of physicochemical properties. Furthermore, such condensates can be cyclically formed and dissolved with no perceptible changes in thermal behaviour and with repeatable hysteresis (Fig. 6h).
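The relation between Tt-heating, Tt-cooling and the thermal hysteresis can be illustrated with a short Python sketch that reads both transition temperatures off turbidity curves. The definition used here, the temperature at which the absorbance crosses half of its total change, is an assumption, since the text does not state how the transition temperatures were extracted. The function name and the absorbance values are hypothetical.

```python
def transition_temperature(temps, absorbances, fraction=0.5):
    """Temperature at which turbidity crosses a chosen fraction of its range.

    The crossing is located by linear interpolation between the two
    neighbouring points, scanning in the order the data were recorded.
    """
    low, high = min(absorbances), max(absorbances)
    level = low + fraction * (high - low)
    points = list(zip(temps, absorbances))
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        if a0 != a1 and (a0 - level) * (a1 - level) <= 0:
            return t0 + (level - a0) * (t1 - t0) / (a1 - a0)
    raise ValueError("turbidity never crosses the chosen level")


# Made-up absorbance readings at 600 nm for one heating and one cooling scan.
heating_t = [20, 25, 30, 35, 40, 45, 50]
heating_a = [0.0, 0.0, 0.0, 0.2, 1.0, 1.0, 1.0]
cooling_t = [50, 45, 40, 35, 30, 25, 20]
cooling_a = [1.0, 1.0, 1.0, 0.8, 0.0, 0.0, 0.0]

tt_heating = transition_temperature(heating_t, heating_a)
tt_cooling = transition_temperature(cooling_t, cooling_a)
hysteresis = tt_heating - tt_cooling
print(f"Tt-heating = {tt_heating:.3f} C, Tt-cooling = {tt_cooling:.3f} C, "
      f"hysteresis = {hysteresis:.3f} C")
```

A positive difference means that the solution clears at a lower temperature on cooling than the temperature at which it turned turbid on heating, which is the lag described in the text.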

Coacervation of UnsELP-AI.PHn relies on weakly interacting precursors

Following the observation, the dynamics of phase separation were monitored over the same temperature range by measuring the apparent hydrodynamic diameter (dh) of all the variants before and after coacervation using dynamic light scattering (DLS), and this was correlated with the changes in the turbidity of the samples (Fig. 6i). This also provided a direct readout of the nucleation kinetics from the onset to the point of reaching an equilibrium state. The experimental readouts suggested that condensation of the UnsELP-AI.PHn follows classical nucleation theory. This is predominantly evident from what seem to be two steps of nucleation-growth followed by a stationary phase (Fig. 6i).
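For orientation, an apparent hydrodynamic diameter is commonly obtained from DLS by converting the measured translational diffusion coefficient D with the Stokes-Einstein relation, dh = kT / (3πηD). The Python sketch below shows that conversion in both directions. It assumes this standard relation applies; the instrument processing actually used is not described here, the function names are hypothetical, and the temperature, viscosity and diameters are example values only.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23


def hydrodynamic_diameter(diffusion_m2_per_s, temperature_k, viscosity_pa_s):
    """Stokes-Einstein: diameter (m) of the equivalent diffusing sphere."""
    return BOLTZMANN_J_PER_K * temperature_k / (
        3.0 * math.pi * viscosity_pa_s * diffusion_m2_per_s
    )


def diffusion_coefficient(diameter_m, temperature_k, viscosity_pa_s):
    """Inverse relation: diffusion coefficient (m^2/s) for a given diameter."""
    return BOLTZMANN_J_PER_K * temperature_k / (
        3.0 * math.pi * viscosity_pa_s * diameter_m
    )


# Example conditions (assumed): water near 10 degrees Celsius.
TEMPERATURE_K = 283.15
WATER_VISCOSITY_PA_S = 1.31e-3  # approximate literature value

for diameter_nm in (9.0, 22.0):
    d_coeff = diffusion_coefficient(
        diameter_nm * 1e-9, TEMPERATURE_K, WATER_VISCOSITY_PA_S
    )
    print(f"dh = {diameter_nm:.0f} nm  ->  D = {d_coeff:.2e} m^2/s")
```

Larger assemblies diffuse more slowly, so growth of the condensates appears in DLS as a falling diffusion coefficient and a rising dh.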

As the first step, the average dh corresponding to the monomer size in water (10 °C) was measured. This ranged from 9 to 22 nm across the variants. The size of each variant correlated strongly with the length and the number of charged residues in the AI.PHn. For example, the alanine-less variants with a higher repulsive force, such as UnsELP-AI.PH22, UnsELP-AI.PH20 and UnsELP-AI.PH134, adopted a more extended shape, whereas alanine-rich variants, such as UnsELP-AI.PH45, UnsELP-AI.PH162 and UnsELP-AI.PH171, exhibited compact conformations. Surprisingly, a 3- to 4-fold increase in the dh values was noted after the addition of the salt without altering the solution temperature (10 °C), predominantly for the alanine-rich variants. This suggests the formation of precursor clusters of 3-4 weakly interacting monomers. In the first-order phase transition described by classical nucleation theory for a so-called supersaturated system, there is a nucleation barrier, and a precursor cluster must reach a critical size to grow and mature in response to changes in the solution conditions. If the system does not pass the nucleation barrier, that is, if the clusters do not exceed the critical size, condensation does not occur.
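For reference, the nucleation barrier and critical cluster size invoked above take the following textbook form in classical nucleation theory for a spherical cluster. The symbols r (cluster radius), γ (interfacial free energy per unit area) and Δg_v (bulk free energy gain per unit volume of the condensed phase) are introduced here for illustration only and are not quantities measured in this work.

```latex
\Delta G(r) = 4\pi r^{2}\gamma - \tfrac{4}{3}\pi r^{3}\,\Delta g_v ,
\qquad
\left.\frac{d\,\Delta G}{dr}\right|_{r^{*}} = 0
\;\Rightarrow\;
r^{*} = \frac{2\gamma}{\Delta g_v},
\qquad
\Delta G^{*} = \Delta G(r^{*}) = \frac{16\pi\gamma^{3}}{3\,\Delta g_v^{2}}
```

Clusters smaller than r* tend to redissolve, whereas clusters larger than r* lower their free energy by growing, which corresponds to passing the nucleation barrier.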

To test for such a phenomenon, condensation was examined at temperatures below and above the phase separation boundary. For that, the temperature of the solution was gradually raised to 15 °C and maintained there. The formation of larger assemblies of about 50-80 nm was observed. The average dh remained unchanged regardless of the incubation time and the UnsELP-AI.PHn variant. On the other hand, increasing the temperature to 20 °C almost instantly initiated the formation of large mesoscale assemblies. Only a few minutes into the equilibration, assemblies with sizes of about 100-130 nm were observed. The steady increase in assembly size implies that the proteins do not redissolve into the surrounding solution, which signifies the formation of structurally stable assemblies. Over the subsequent temperature ranges, the assemblies grew rapidly until reaching an equilibrium state. A substantial increase in the standard deviation was noted in the late stationary phase, to the point that the measurements could no longer be considered reliable. This is potentially due to the formation of non-uniformly sized assemblies and to the low resolution that DLS typically exhibits for highly polydisperse samples. On this basis, the critical cluster size distribution and the nucleation barrier above which phase separation occurs were defined to be about 100-150 nm.

Reconfigurable structural morphology with increased mechanical properties

Liquid-liquid phase separation (LLPS), also described as coacervation or condensation, is a ubiquitous mechanism in diverse biological functions such as subcellular membraneless organization and extracellular matrix assembly, and in many diseases. The phenomenon also plays a key role in forming intermediate supramolecular self-assemblies during the fabrication of most high-performance structural biomaterials such as fibres, adhesives, and composites. LLPS of elastin functions as an intermediate state during extracellular fibrillar matrix assembly. Moreover, diverse pseudo-biological analogues of elastin, i.e., ELPs, exhibit self-condensation properties. These analogues are created from unstructured motifs found in natural elastin and can be controlled by substituting or adding guest residues, which modulates the miscibility of distinct ELP phases. The ability to control the phase behaviour of ELPs enables adjustments in the critical phase separation temperature and demixing profile.

To delve deeper into the process of coacervation, the condensation of the helix-incorporated UnsELP-AI.PHn was investigated. UnsELP was also included for comparison. Spontaneous coacervation under various solution conditions was observed in both cases (Fig. 7a and Fig. 36). However, each variant formed a very distinct coacervate type with significant morphological differences. The UnsELP exhibited spinodal-decomposition-like condensates, reminiscent of liquid-like coacervate droplets formed after gravitational sedimentation, subsequent coalescence, and large-area surface wetting. In contrast, all the UnsELP-AI.PHn variants showed highly elongated, bicontinuous, interconnected mesoglobular porous networks with solid-like behaviour. Independent of helix type, all the UnsELP-AI.PHn showed the same

To discern the intermolecular interactions and diffusion kinetics of the two coacervate types, fluorescence recovery after photobleaching (FRAP) was employed on hydrated specimens (Fig. 37). FRAP revealed two very different recovery patterns. The UnsELP condensates exhibited rapid recovery, suggesting high diffusion dynamics, with recovery from 0 to 80% within 30 seconds, followed by a marginal increase to 90% over the next 90 seconds. In contrast, the UnsELP-AI.PHn demonstrated a nearly two-fold decrease in diffusion rate, resulting in a modest overall recovery of slightly below 38%, indicative of a kinetically arrested state.
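The recovery figures quoted above correspond to the usual FRAP summary quantities, the mobile fraction and the half-time of recovery. The following is a minimal sketch, assuming a normalized signal (pre-bleach intensity 1, post-bleach intensity 0). The time points and intensities are hypothetical values chosen only to mimic the reported trend, and the function names are illustrative.

```python
def mobile_fraction(f_pre, f_bleach, f_plateau):
    """Fraction of the bleached signal that recovers by the end of the experiment."""
    return (f_plateau - f_bleach) / (f_pre - f_bleach)


def half_time(times, signal):
    """Time at which the signal reaches half of its final recovery (linear interpolation)."""
    target = signal[0] + 0.5 * (signal[-1] - signal[0])
    points = list(zip(times, signal))
    for (t1, s1), (t2, s2) in zip(points, points[1:]):
        if s1 <= target <= s2 and s2 > s1:
            return t1 + (target - s1) * (t2 - t1) / (s2 - s1)
    raise ValueError("signal does not rise through the half-recovery level")


times = [0, 10, 20, 30, 60, 120]                     # seconds after bleaching
liquid_like = [0.00, 0.45, 0.68, 0.80, 0.86, 0.90]   # hypothetical UnsELP-like curve
arrested = [0.00, 0.15, 0.25, 0.30, 0.35, 0.37]      # hypothetical UnsELP-AI.PHn-like curve

for name, curve in (("liquid-like", liquid_like), ("arrested", arrested)):
    fraction = mobile_fraction(1.0, curve[0], curve[-1])
    print(f"{name}: mobile fraction={fraction:.2f}, t_half={half_time(times, curve):.1f} s")
```

A low mobile fraction, as in the second hypothetical curve, is the signature of the kinetically arrested state described above.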

To expand on these findings, nanoindentation was employed, a technique that until now has not been applied to supramolecular self-assembly and phase behaviour of structural proteins with reconfigurable structural morphology (Fig. 7a). The rationale was that these measurements provide a direct, quantitative readout of the differences in mechanical properties. The measurements from UnsELP-AI.PHn exhibited elevated elastic modulus, hardness, and stiffness, with peak statistical distributions of about 14.9 GPa, 1.3 GPa, and 4.3 x 10⁴, respectively. The corresponding values for the UnsELP were 13.4 GPa, 0.6 GPa, and 1.2 x 10⁴. The findings indicate that incorporating helical motifs markedly increases the mechanical properties (Fig. 7b).

Helix-dependent control over microrheological responses

Microrheology based on dynamic light scattering (DLS) was employed to probe how AI.PHn features govern the condensate viscoelastic properties at a sub-milligram concentration close to the onset of condensation with relatively fast self-assembly kinetics (Fig. 7c-7f). These measurements were performed in backscatter detection mode, which offers the smallest length scale of resolution, limited only by the wavelength of the laser. This made it possible to extract frequency-dependent shear moduli after phase separation of up to 10²-10³ Pa over a broad frequency range of 10²-10⁸ s⁻¹, which is inaccessible with conventional mechanical oscillatory rheology (<10? s⁻¹).

For the measurements, 1 µm polystyrene microspheres were dispersed in all the ELP solutions before triggering condensation. Under such experimental conditions, the motion of the tracer particles is driven solely by thermal fluctuations and gives rise to fluctuations in the scattering intensity (Fig. 7c). The photon autocorrelations of the scattered light were collected over a delay time range of ~10%-10"; the accessible frequency range within this window dictates the time scales over which the particle displacements are detected. The intensity autocorrelation can then be used to obtain the mean-squared displacement (MSD) of the tracer particles within the scattering volume over the given time lag (Fig. 7c). The MSD of the tracer particles in the solution can thus be used to extract the frequency-dependent shear modulus of the condensate according to the generalized Stokes-Einstein relation (Fig. 7d and 7e).
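The conversion from tracer MSD to shear moduli can be illustrated with a short sketch. It uses a common local power-law approximation of the generalized Stokes-Einstein relation (often attributed to Mason) for a three-dimensional MSD. The source does not state which numerical form was used, so the approximation, the function name and all numerical values below are assumptions for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def gser_moduli(lag_times, msd, radius, temperature):
    """Estimate (omega, G', G'') from a 3D tracer MSD via a local power-law GSER.

    Uses |G*(omega)| ~ kB*T / (pi * a * MSD(1/omega) * Gamma(1 + alpha)),
    where alpha is the local logarithmic slope of the MSD.
    """
    results = []
    for i in range(1, len(lag_times) - 1):
        # Central-difference estimate of alpha = d ln(MSD) / d ln(t).
        alpha = (math.log(msd[i + 1]) - math.log(msd[i - 1])) / (
            math.log(lag_times[i + 1]) - math.log(lag_times[i - 1])
        )
        omega = 1.0 / lag_times[i]
        g_abs = K_B * temperature / (math.pi * radius * msd[i] * math.gamma(1.0 + alpha))
        g_storage = g_abs * math.cos(math.pi * alpha / 2.0)
        g_loss = g_abs * math.sin(math.pi * alpha / 2.0)
        results.append((omega, g_storage, g_loss))
    return results


# Hypothetical check: a 1 um tracer (radius 0.5 um) diffusing in a Newtonian
# fluid of 5 cP at 20 C, for which the MSD is 6*D*t and G'' should equal eta*omega.
eta, radius, temperature = 5e-3, 0.5e-6, 293.15
diffusion = K_B * temperature / (6.0 * math.pi * eta * radius)
lags = [1e-6 * 10 ** (k / 4) for k in range(20)]
msd = [6.0 * diffusion * t for t in lags]
moduli = gser_moduli(lags, msd, radius, temperature)
```

For purely diffusive tracers the sketch returns a vanishing storage modulus and a loss modulus proportional to frequency, which is the Newtonian limit. A sub-diffusive MSD (alpha below 1) would instead yield a finite elastic component.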

Intriguingly, the complex viscosity of the resulting condensates at the mesoscale correlated directly with the sequence- and structure-encoded interactions of each AI.PHn at the microscopic level, with the recorded zero-shear viscosity ranging from 2 to 7 cP (Fig. 5f). Among all the variants, UnsELP-AI.PH18, UnsELP-AI.PH22 and UnsELP-AI.PH87 condensates exhibited the lowest viscosity with Newtonian fluid behaviour. Furthermore, their complex modulus was dominated by the viscous component (G' < G") throughout the entire frequency range (~10²-10⁸ Hz) (Fig. 7).

In contrast, the condensates of UnsELP-AI.PH87, UnsELP-AI.PH162, UnsELP-AI.PH20, UnsELP-AI.PH164, UnsELP-AI.PH134, UnsELP-AI.PH142, UnsELP-AI.PH45 and UnsELP-AI.PH171 exhibited greater viscosities and shear-thinning flow (Fig. 7f). Most importantly, the resulting condensates closely resembled Maxwell fluids with G' > G". This is evident from the elastically dominant response at shorter timescales versus liquid-like behaviour at longer timescales, with a single crossover between the two regimes marking the average lifetime of the network bond reconfiguration (Fig. 7f). In other words, the crossover frequency is the inverse of the terminal relaxation time of the ELP condensate network; at frequencies above it the condensate responds elastically, whereas at frequencies below it the response is viscous.
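The single crossover described above is the hallmark of a single-mode Maxwell model. A minimal sketch of that model follows; the plateau modulus and relaxation time are hypothetical numbers chosen for illustration and are not fitted to the data of Fig. 7f.

```python
def maxwell_moduli(omega, g0, tau):
    """Storage and loss moduli (G', G'') of a single-mode Maxwell fluid."""
    wt = omega * tau
    g_storage = g0 * wt**2 / (1.0 + wt**2)
    g_loss = g0 * wt / (1.0 + wt**2)
    return g_storage, g_loss


g0, tau = 100.0, 1e-3          # hypothetical plateau modulus (Pa) and relaxation time (s)
omega_c = 1.0 / tau            # crossover frequency, where G' equals G''

for omega in (0.1 * omega_c, omega_c, 10.0 * omega_c):
    g_storage, g_loss = maxwell_moduli(omega, g0, tau)
    regime = "elastic" if g_storage > g_loss else "viscous or crossover"
    print(f"omega={omega:.0f} rad/s  G'={g_storage:.1f} Pa  G''={g_loss:.1f} Pa  ({regime})")
```

In this model the crossover frequency is exactly the inverse of the relaxation time, so reading the crossover off a measured spectrum gives an estimate of the network bond lifetime.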

Importantly, the mechanical properties of the ELP condensates can evidently be modulated through the physicochemical properties of the AI.PHn. The stiffness of the condensate network correlated strongly with hydrophobicity, molecular weight, and length, and correlated inversely with polarity and hydrophilicity. This indicates that noncovalent crosslinking through the AI.PHn domains is crucial for network formation. Other factors, such as the percentage helicity and chemical crosslinking, also modulate the mechanical properties of the ELP network.

A multitude of molecular interactions govern the intricate process of LLPS

To gain deeper insights into the dynamics of LLPS, all-atom molecular dynamics (MD) simulations were employed (Fig. 8a-8f). Given the constraints associated with large-scale simulations of full-length sequences, and in contrast to the experimental characterization, a single repeat from the UnsELP-AI.PHn was adopted for all the MD simulations. Explicit-solvent systems (including Na⁺ and Cl⁻ ions) composed of either 10 or 50 single repeating motifs were simulated with a temperature ramp from 5 to 50 °C in 15 °C increments every 50 ns (Table 2).

The simulations successfully captured the process of phase separation during a thermal cycle for all the variants (Fig. 8a). Careful examination of the simulations showed that the mechanism of LLPS relies on numerous molecular interactions, including but not limited to hydrophobic, van der Waals, π-π, cation-π, and hydrogen-bonding interactions for almost all the variants, as well as electrostatic interactions in some cases (Fig. 8b). These interactions facilitate higher-order structural organization mainly between helix-helix and helix-coil domains, but also between coil-coil domains (Fig. 8b). At the onset of the simulation, the oligomers were well solvated and dispersed below the critical LLPS temperature (Fig. 8c-8e). As the temperature of the system gradually increased, inter-association and the formation of clusters were noted, which ultimately coalesced to form larger assemblies (Fig. 8a and 8f). This was most prominent for UnsELP-AI.PH45. During this transition, spherical clusters converted into denser rod-like assemblies, characterized by a less dynamic and more arrested state, and reached equilibrium by the end of the simulation.

The results indicated that, upon surpassing the transition temperature, the UnsELP-AI.PH45 undergoes a dehydration-induced collapse, leading to a reduction in its solubility and surface area (Fig. 8c). The presence of Na⁺ and Cl⁻ ions was shown to further promote this effect by hindering the network of interfacial interactions between water molecules and the UnsELP-AI.PH45 (Fig. 8d). The change in the balance between entropic costs and enthalpic gains arising from this collapse tends to drive a thermodynamic preference for elongated rod-like structures over compact spherical ones, resulting in the formation of supramolecular assemblies with distinct morphologies. This preference can be influenced by several factors, including the length, amino acid sequence, and domain composition of the protein. Qualitative differences were noted between the variants. Furthermore, the AI.PHn tends to act as a sticky multivalent domain, forming a stapler unit between the proteins by making intermolecular bridges that assist the formation of the elongated morphology.

While there were notable distinctions between the experimental results and the molecular dynamics (MD) simulations, a comparable trend towards the formation of elongated, rod-like assemblies could be observed. Specifically, the mesoglobular interconnected networks of UnsELP incorporating AI.PHn qualitatively exhibited a similar morphology, as shown in Fig. 7a and Fig. 36. In contrast, UnsELP alone adopted a more spherical conformation, indicative of a coacervated state with a liquid-like nature under laboratory conditions (Fig. 7a and Fig. 36).

The enthalpic stabilization of self-association through helix-helix, helix-coil, and coil-coil domains that contributes to the formation of the coacervates was also explored. Because the structural instability of complexes containing random coils causes significant variability in the calculated free energy and poses significant challenges for reliable analysis, the calculation focused exclusively on the free energy profile of transient dimerization of helix-helix homodimers for all AI.PHn variants (Fig. 8g). The findings indicate that the association of two helices is highly energetically favourable, with the minimum of the association profile between 0.5 and 1.5 nm, corresponding to ΔGassoc ranging from -5 to -75 kJ/mol. In the order UnsELP-AI.PH162 > UnsELP-AI.PH18 > UnsELP-AI.PH45 > UnsELP-AI.PH171 > UnsELP-AI.PH20 > UnsELP-AI.PH64 > UnsELP-AI.PH142 > UnsELP-AI.PH87 > UnsELP-AI.PH22 > UnsELP-AI.PH134, the variants exhibited the lowest to the highest ΔGassoc. Generally, the AI.PHn variants in the alanine-rich group exhibited lower free energy than those in the alanine-less group, correlating with the faster LLPS formation.
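To put the reported free energies in perspective, they can be compared with the thermal energy RT at the simulation temperature. The sketch below computes the Boltzmann factor exp(-ΔG/RT) at 323 K for the two ends of the reported range. Treating the PMF well depth directly as ΔG in this expression is a simplifying assumption made here for illustration, and the function name is hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol K)


def boltzmann_factor(delta_g_kj_mol, temperature_k):
    """Relative statistical weight exp(-dG/RT) of the associated state."""
    return math.exp(-delta_g_kj_mol / (R_KJ * temperature_k))


temperature = 323.0  # K, the temperature used for the PMF calculations
for delta_g in (-5.0, -75.0):  # the two ends of the reported range, kJ/mol
    ratio = -delta_g / (R_KJ * temperature)
    weight = boltzmann_factor(delta_g, temperature)
    print(f"dG={delta_g} kJ/mol = {ratio:.1f} RT -> weight={weight:.3g}")
```

At 323 K, RT is about 2.7 kJ/mol, so the weakest association is worth roughly 2 RT and the strongest roughly 28 RT. This spans the range from readily reversible contacts to contacts that are effectively permanent on the simulated timescale.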

Notably, no significant free energy barriers were observed during the formation of helix-helix complexes. The only free energy minimum observed was the global minimum, in which the helices were fully associated and maintained contact with each other along their longest dimension. It is also noteworthy that, while the free energy change for helix-coil and coil-coil complexes was not calculated quantitatively, the abundance, type, and number of molecular interactions involved suggest that the formation of these complexes is highly energetically favourable and of substantial importance during the LLPS process (Fig. 8b).

The energetically favourable formation of these complexes can contribute to the hysteresis observed during the cooling cycle (Fig. 6b and 6c). The stability of these complexes, driven by favourable molecular interactions, allows them to persist even after the external conditions that initiated their formation have changed, so that the transition between the two states is not immediate. This persistence leads to a time lag in the dissociation of the complexes, enhancing the observed hysteresis effect (Fig. 6b and 6c).

Diverse material engineering and application of biomimetic de novo designed ELPs

The UnsELP-AI.PHn can be used in a variety of high-added-value biomedical applications, including thermoresponsive injectable matrices, drug encapsulation with controlled release, all-aqueous-based photoresists, and self-activating compounds in smart windows (Fig. 9-12). To prove these concepts, four variants were selected, two from the alanine-rich set and two from the alanine-less set.

As the starting point, in vitro cytocompatibility was assessed by culturing lung-derived fibroblast cells (WI-38) and epithelial human breast cancer cells (MDA-MB-231) on glass substrates coated with either UnsELP-AI.PH45, UnsELP-AI.PH87, UnsELP-AI.PH22 or UnsELP-AI.PH64 (Fig. 9a and 9b). The assay used a colorimetric cell quantification kit (CCK-8 from Sigma-Aldrich) to detect the viability and proliferation of the cell lines over 4 days. No significant differences were found between the variants used for coating the substrates. All coatings supported cell adhesion, growth, and proliferation with no apparent cytotoxicity in comparison to uncoated tissue culture plates (TCP), indicating their potential use in biomedical applications (Fig. 9a and 9b).

The first tested application concerned the formation of programmable depots, given the ability of the UnsELP-AI.PHn to undergo a phase transition below body temperature (Fig. 6a-6d). The motivation is that the UnsELP-AI.PHn solutions remain liquid and can be injected, but shortly after equilibration at 37 °C they transform into mechanically stable viscoelastic hydrogels (Fig. 8c-8f and 9c). Unlike other thermally triggered ELP depots, which have very limited applications without chemical crosslinking, the UnsELP-AI.PHn has the advantage of self-assembling into a highly porous, multiscale interconnected network mediated through noncovalent interactions (Fig. 8a). A further advantage is that the depots can be programmed for sustained and prolonged self-assembly, or for controlled disassembly, solely by altering the salt and protein concentrations in the final injectable formulation (Fig. 9c and 9d). Such scaffolds can be used not only for controlled drug delivery but also for tissue engineering by supporting cell growth and regenerative cell migration. To this end, the stability and tissue incorporation of UnsELP-AI.PH45 with prolonged self-assembly were explored by injecting it into the subcutaneous space of a wingette of a sacrificed domestic chicken. Following the 1 ml injection of UnsELP-AI.PH45, diffusion and accumulation of the compound in the subcutaneous space were observed, resulting in the formation of an externally visible oval-shaped depot (Fig. 9e). Subsequent X-ray computed microtomography (micro-CT) analysis confirmed the uniform and seamless diffusion of the compound, which was associated with a high network density in the surrounding tissue. These findings provide insights into the stability and tissue incorporation of UnsELP-AI.PH45, which are essential considerations for its application in clinical settings.

The second tested application concerned drug encapsulation for controlled drug delivery with sustained release (Fig. 10). Multiscale 3D scaffolds with diverse morphologies were engineered for tuneable pharmacokinetics and biodistribution profiles with a prolonged retention period. Micro- and nanometre-sized spherical particles were first fabricated using a collision-type jet atomizer connected to a heated laminar flow reactor and a Berner-type low-pressure impact fractionator suitable for use in inhalers (Fig. 10a-10d). For that, UnsELP-AI.PH87 was used, and the synthesis was carried out at six different temperatures below its critical glass transition and degradation temperatures (Fig. 10a and 10b). The particle size (diameter) correlated inversely with the temperature of the flow reactor (the higher the temperature, the smaller the size). High-resolution SEM images showed sizes ranging from ~20 nm to ~3 µm in diameter (Fig. 10c). Acetylsalicylic acid, paracetamol, and levalbuterol were selected as the ligands for the drug release experiments because of their adequate water solubility and because they have been well characterized previously. The effect of particle size on the release profile was examined first (Fig. 10d). A maximum drug release of approximately 40 to 50% was obtained in all cases. However, the time to maximum release depended on the particle size. For the smallest particles, synthesized at 100 °C, fast release over about 2 days was recorded, whereas for larger particles the maximum release occurred between 6 and 10 days. The release profile was also shown to vary with the ligand, likely owing to differences in size, hydrophobicity, and strength of interaction with the UnsELP-AI.PH87 (Fig. 10e).

An alternative approach was also taken to make structurally stable scaffolds for uses in which mechanical strength and flexibility are crucially important, such as wound coverings (Fig. 10f and 10g). Using an electrospinning setup and varying the solution as well as the spinning conditions, diverse nonwoven morphologies with very distinct release profiles were fabricated using UnsELP-AI.PH45 as the spinning dope (Fig. 10h). These included either smooth and featureless micro-/nanofilaments with a fast one-step release, or structurally more complex beads-on-string filaments with the added advantage of a slow dual-step release profile (Fig. 8f).

The use of UnsELP-AI.PHn as a next-generation biocompatible and biodegradable photoresist for photonics, electronics, tissue engineering, and soft micro-robotics was also explored (Fig. 11). Given the high molecular stability of the UnsELP-AI.PHn and its resistance to increases in solution temperature, we hypothesized that the UnsELP-AI.PHn withstands exposure to a high-energy light source without undergoing molecular degradation. We tested this hypothesis first by carrying out conventional micro-contact printing for wide-area surface patterning (Fig. 11a-11c). Using different photomasks, we created diverse microstructures by exposing the spin-coated substrates to a high-intensity laser beam (about 900-1000 mW).

To further test the capability of our engineered photoresist for printing, we then used a state-of-the-art two-photon polymerization setup for the microfabrication of various shapes and morphologies. The results revealed the suitability of the ink for printing stable 3D structures of substantial complexity, such as photonic crystals and woodpile structures, with feature resolution down to ~100 nm (Fig. 11d). All the printed geometries maintained their structural integrity both in solution and after drying, and resisted vigorous washing treatments.

microfabrication of magnetically controlled soft protein-based micro-robots (named “protobots”) with proposed applications for targeted delivery and diagnostics (Fig. 11e-11h). The small scale of the protobots and the use of non-cytotoxic biodegradable building blocks facilitate their translation to clinical applications, with the added advantages of tuneable tissue interaction, enhanced device integration, and minimal immune response. The protobots were microfabricated in the shape of a double spiral drill bit (DSDB), designed specifically for fast forward propulsion by exerting torque around the helical axis under rotating magnetic fields (Fig. 11f). Chitosan-coated superparamagnetic iron oxide nanoparticles (50 nm) were used as the magnetic transducers. During the printing, they were physically immobilized through covalent crosslinking in the UnsELP-AI.PH22 scaffolding network (Fig. 11e). This enabled steering and actuation of the DSDB-protobots through torque-based magnetic propulsion along designated trajectories.

The biodegradability of the DSDB-protobots was also tested. Once a micromachine has accomplished its task, there must be an easy mechanism to prevent its accumulation in the body and so avoid undesirable chronic inflammation. One way is to physically retrieve the micromachines. However, the easier approach is to take advantage of the body's naturally occurring and highly sophisticated waste management system to decompose the protobots completely into their smallest constituent building blocks. Given that the protobots described are 99% protein-based, proteolytic enzymes can readily hydrolyse them (Fig. 11g). To test this hypothesis, protease enzymes including trypsin, papain and chymotrypsin were used, either as a cocktail or individually.

The use of UnsELP-AI.PHn as a self-activating compound in smart windows was also explored, with the ability to self-modulate the amount of solar radiation passing through the windows without the need for human intervention (the use of blinds or curtains) or external stimuli (an applied electric potential in electrochromic windows) (Fig. 12a). In lieu of these conventional methods for indoor thermal regulation, an approach is presented that is cost-effective, scalable, and environmentally sustainable. The present method involves sandwiching a millimetre-thin layer of UnsELP-AI.PHn between two panes of glass. The smart windows rely on the reversible thermoresponsive behaviour and coacervation of UnsELP-AI.PHn into an ultra-white condensate made of a multiscale, anisotropically interconnected mesoglobular network of low-refractive-index protein (n ≈ 1.3-1.5) with the ability to scatter solar radiation at wavelengths ranging from the ultraviolet through the visible to the near-infrared (Fig. 12a-12c, Fig. 6 and Fig. 36). The adaptive phase behaviour of the UnsELP-AI.PHn has the potential to darken or lighten the interior of a building, thereby modulating the indoor temperature for sustainable building design.

Methods

AIMS deep neural network architecture

Figure 1 shows an overview of the AIMS protocol for generating de novo designed sequences with α-helical conformations, assisted by the AIMS deep learning model. The protocol includes three main components: AIMS-GATHER, AIMS-GENERATE, and AIMS-PROT. As the first step, AIMS-GATHER performs data mining to identify homologous templates based on sequences or structural information provided by the expert in the loop according to the desired design objectives. If AIMS-GATHER is unable to find secondary structure homologs for the suggested sequences, the hydrogen bond estimation algorithm (DSSP) is used to compute the relevant structural conformation. This is followed by calculating two conformationally sensitive dihedral angles, Ψ and Φ. In the AIMS architecture, AIMS-GATHER provides the input training data for the AIMS-GENERATE and AIMS-PROT models, which are both deep neural network models based on convolutional neural networks (CNNs).
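The backbone dihedral angles mentioned above are each defined by four consecutive backbone atoms (Φ by C, N, Cα, C and Ψ by N, Cα, C, N of neighbouring residues). A minimal sketch of the geometric calculation follows, written in plain Python. The function names and test coordinates are illustrative assumptions, and this is not the code used in AIMS-GATHER.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four consecutive atom positions."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Hypothetical coordinates: cis (0 degrees), trans (180 degrees) and perpendicular.
p0, p1, p2 = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
cis = dihedral_deg(p0, p1, p2, (1.0, 1.0, 0.0))
trans = dihedral_deg(p0, p1, p2, (1.0, -1.0, 0.0))
perpendicular = dihedral_deg(p0, p1, p2, (1.0, 0.0, 1.0))
```

Applied along a backbone, the same function gives one (Φ, Ψ) pair per residue, which is the representation used as training data here.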

In the second step, the data set generated by AIMS-GATHER is used to train the AIMS-GENERATE deep neural network model. The generated training set consisted of 95000 protein entries with a maximum sequence length of 750. Further, AIMS-GENERATE computed property values for each protein in the training set. Thus, the model also learns the properties of each protein from its amino acid content (such as hydrophobicity, hydrophilicity, charge, bulkiness, pKa, polarity, solvent accessibility, and α-helix propensity). This is essential for generating new sequences with the desired functionality for a given application. To predict new sequences, the input to the model is the target secondary structure and property values. AIMS-GENERATE can generate any number of new candidate sequences. This is accomplished by varying the target property values within allowed limits and setting how many different values are to be generated within those limits. The length of the generated sequences can also be altered; it is determined by the length of the input secondary structure. The resulting candidate sequences are then checked and filtered against the assigned requirements (discrimination and filtering phase). First, an initial property check is performed on the generated sequences based on the required limits and threshold values for each input property that can be computed from the sequence. Further, a similarity check of the generated sequences against the training set is done.
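The initial property check can be pictured as computing sequence-derived properties and keeping only the candidates that fall within the required limits. The sketch below uses two such properties: the mean Kyte-Doolittle hydropathy and a simple net charge count (Lys and Arg as +1, Asp and Glu as -1). The choice of properties, the limits, the candidate sequences and the function names are assumptions for illustration and do not describe the actual AIMS-GENERATE implementation.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Average Kyte-Doolittle hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def net_charge(sequence):
    """Crude net charge at neutral pH: +1 for K/R, -1 for D/E (His ignored)."""
    return sum(sequence.count(aa) for aa in "KR") - sum(sequence.count(aa) for aa in "DE")


def passes_property_check(sequence, limits):
    """True if every computed property lies within its (low, high) limits."""
    properties = {"hydropathy": mean_hydropathy(sequence), "net_charge": net_charge(sequence)}
    return all(low <= properties[name] <= high for name, (low, high) in limits.items())


# Hypothetical limits and candidate sequences.
limits = {"hydropathy": (-1.0, 2.0), "net_charge": (-2, 2)}
candidates = ["AEAAAKEAAAKA", "DDDDEEEEGGSS"]
kept = [seq for seq in candidates if passes_property_check(seq, limits)]
```

In this example the alanine-rich candidate passes both limits, while the strongly acidic candidate is rejected on hydropathy and on charge.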

Finally, AIMS-PROT predicts the Ψ and Φ dihedral angles and the secondary structure for all the de novo designed sequences predicted by AIMS-GENERATE. Thus, AIMS-GENERATE acts as an “encoder” and AIMS-PROT as a “decoder” in the AIMS architecture. Atomistic molecular dynamics simulation is used in a two-step procedure of energy minimization and relaxation to assess the structural stability of all the newly predicted helices. For training the AIMS-PROT model, the same training set (primary and secondary structure, Ψ and Φ dihedral angles) is used.

For prediction, the newly generated sequences are used as input to AIMS-PROT. The resulting secondary structure is then checked against the target secondary structure. If the accuracy is not within the required threshold value (e.g. 0.9), the candidate sequence is discarded. The remaining sequences with predicted structures are then passed to the MD simulation phase, which is used to further discriminate and filter for the highest stability of the generated novel sequences. Before the MD simulation, a final similarity check (allowed similarity proportion) against existing proteins (for example, a search against UniProt) is also done, so that only novel sequences are included for further consideration.
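The accuracy check against the target secondary structure can be sketched as a per-residue agreement score followed by a threshold filter. The sketch assumes that accuracy means the fraction of positions with identical secondary-structure labels and that a candidate is kept when this fraction is at least the threshold. The source gives only the example threshold of 0.9, so the definition, the names and the data are illustrative.

```python
def structure_accuracy(predicted, target):
    """Fraction of residues whose predicted secondary-structure label matches the target."""
    if len(predicted) != len(target) or not target:
        raise ValueError("structures must be non-empty and of equal length")
    matches = sum(p == t for p, t in zip(predicted, target))
    return matches / len(target)


def filter_candidates(candidates, target, threshold=0.9):
    """Keep sequences whose predicted structure agrees with the target."""
    return [
        sequence
        for sequence, predicted in candidates
        if structure_accuracy(predicted, target) >= threshold
    ]


# Hypothetical target (H = helix, C = coil) and (sequence, predicted structure) pairs.
target = "HHHHHHHHHH"
candidates = [
    ("AEAAAKEAAA", "HHHHHHHHHC"),  # 9 of 10 positions match
    ("GSGSGPGSGS", "CCCCCHHHCC"),  # 3 of 10 positions match
]
accepted = filter_candidates(candidates, target)
```

Only the accepted sequences would then move on to the similarity check and the MD simulation phase.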

Molecular dynamic simulation for structural validation

Atomistic molecular dynamics simulations were carried out with the CHARMM27 force field and an explicit TIP3P water model, and GROMACS was used to run the simulations. The phi-psi angles predicted by AIMS-PROT were used to construct the starting 3D model for each helix. All the models of the predicted helices were converted to Protein Data Bank files (.pdb) using PyMOL. These 3D structures were used as the starting points for the MD simulations. Proteins whose conformation changed from the structure predicted by AIMS-PROT during the MD simulation were not considered stable. To assess stability, the structure of the protein during the last ten time steps of the MD simulation was compared to the structure predicted by AIMS-PROT. These comparisons were done by calculating the root mean square deviation (RMSD) of all the atoms. If the average RMSD over these last ten time steps was more than 3 Å, the protein was considered unstable and excluded from further evaluation. For all the novel structures, stability was first assessed after 20 ns of simulation. The proteins that were unstable at this point were discarded, and for the rest the simulation time was extended to 100 ns. The stability of each candidate was then assessed again, and proteins were discarded as unstable if the average RMSD value exceeded 3 Å.
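The stability criterion above amounts to averaging the all-atom RMSD over the last ten frames and comparing it with the 3 Å cut-off. The following is a minimal sketch in plain Python. It assumes the frames are already superimposed on the reference structure (no fitting step is included), and the coordinates and function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized coordinate lists (in angstrom)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal size")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def is_stable(reference, trajectory, n_last=10, threshold=3.0):
    """True if the mean RMSD over the last n_last frames does not exceed the threshold."""
    frames = trajectory[-n_last:]
    mean_rmsd = sum(rmsd(reference, frame) for frame in frames) / len(frames)
    return mean_rmsd <= threshold


def shifted(coords, dx):
    """Hypothetical helper: translate all atoms by dx along x."""
    return [(x + dx, y, z) for x, y, z in coords]


reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
stable_trajectory = [shifted(reference, 1.0) for _ in range(12)]
unstable_trajectory = [shifted(reference, 4.0) for _ in range(12)]
```

In practice the frames would first be aligned to the predicted structure, for example with the RMSD tools of the simulation package, before applying the cut-off.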

Multi-protein MD simulations and general MD setup for the phase separation

The ten most stable proteins were further used for the multi-protein MD simulations. The details of the simulation setup are given in Table 2. For one system, i.e., UnsELP-AI.PH45, we performed large-scale MD simulations with 50 peptides in the simulation box. The NaCl concentration was set to 1.0 mol/l. For the UnsELP-AI.PH45, we also performed additional simulations at 0.0 mol/l NaCl. After solvation and ion addition, all the systems were energy minimized (50000 steps with a steepest descent minimization algorithm), followed by 1 ns NVT and 1 ns NPT equilibrations at 298 K. In the production run, the temperature was increased from 298 to 323 K in 15 K steps of 50 ns each for a total simulation duration of 200 ns. For the UnsELP-AI.PH45, the simulations were additionally continued for a further 200 ns at 323 K.

Table 2. Setup details for the multi-protein MD simulations, where q is the charge per single protein, c is the protein concentration, and n is the number of proteins, water molecules, Na⁺ and Cl⁻ ions.

ID | q [e] | c [mg/ml] | n_protein | n_water | n_Na | n_Cl | Initial box size [nm³]
unseie | 0 | 25 | 14 | 179729 | 3512 | 3512 | 17.9 × 17.9 × 17.9
apr | 6 | 25 | 10 | 182215 | 3651 | 3571 | 18.4 × 18.1 × 18.4

Separately, in order to determine the free energy of helix-helix interaction, the structured part of the protein was extracted for all the constructs. For each one, the two helices were initially placed close to each other in a parallel manner and equilibrated in 20 ns NPT simulations. The final configuration was used to compute the potential of mean force (PMF). After the equilibration of each of the considered systems, the pulling simulation protocol was initiated. The force was applied to the centers of mass of both helices to enforce their dissociation. The corresponding force constant was 1000 kJ/mol/nm². The helix-helix distance was chosen as the reaction coordinate, along which 25-40 windows were selected, depending on the protein ID. The free energy was calculated using the umbrella sampling procedure. The data within each window were collected every 10 ps for a duration of 10 ns per window. The free energy profiles were constructed with the weighted histogram analysis method as implemented in GROMACS (gmx wham tool). Statistical uncertainties were estimated using Bayesian bootstrapping of complete histograms. The temperature was set to 323 K.
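
As an illustration of the umbrella sampling bookkeeping, the sketch below lays out evenly spaced windows along the helix-helix distance and evaluates a harmonic bias of the form U = ½·k·(ξ − ξ₀)². The force constant, sampling interval and window length are the values stated in the text; the harmonic form of the restraint, the distance range (1.0 to 4.0 nm), the even spacing and the window count of 31 are assumptions made only for this example.

```python
FORCE_CONSTANT = 1000.0    # kJ/mol/nm^2, value stated in the text
SAMPLING_INTERVAL_PS = 10  # data collected every 10 ps
WINDOW_LENGTH_PS = 10_000  # 10 ns of sampling per window


def window_centers(start_nm, stop_nm, n_windows):
    """Evenly spaced window centres along the reaction coordinate (assumed spacing)."""
    step = (stop_nm - start_nm) / (n_windows - 1)
    return [start_nm + i * step for i in range(n_windows)]


def harmonic_bias(distance_nm, center_nm, k=FORCE_CONSTANT):
    """Harmonic restraint energy in kJ/mol for a given helix-helix distance."""
    return 0.5 * k * (distance_nm - center_nm) ** 2


# Hypothetical distance range; the text only states that 25-40 windows were used.
centers = window_centers(1.0, 4.0, 31)
samples_per_window = WINDOW_LENGTH_PS // SAMPLING_INTERVAL_PS

print(len(centers), samples_per_window)
print(harmonic_bias(1.05, centers[0]))
```

With the stated sampling interval, each window contributes 1000 data points to its histogram before the histograms are combined by the weighted histogram analysis method.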

In all simulations, the Bussi et al. stochastic velocity rescaling algorithm was used to control the temperature, and the Parrinello-Rahman algorithm was used for the barostat, with time constants of 0.1 ps and 2.0 ps, respectively. The system pressure was set to 1 bar. The long-range electrostatic interactions were calculated using the particle mesh Ewald (PME) method, while the van der Waals interactions were described using the Lennard-Jones potential and a 10 nm cut-off. The LINCS algorithm was used to constrain the bonds between H and heavy atoms in the protein, while for water molecules the SETTLE algorithm was used. A 2 fs time step was used for integrating the equations of motion. VMD software, Chimera, and ChimeraX were used for the visualizations.

The cluster analysis, solvent-accessible surface area, RMSD, and hydrogen bonding were calculated using the built-in GROMACS tools. For the solvent-accessible surface area determination, the probe radius was 0.14 nm. Hydrogen bonding was assessed based on geometric criteria: the acceptor-donor distance was less than 0.35 nm and the H-bond angle was less than 30°. The interaction network was created using Cytoscape.
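
The geometric hydrogen-bond criterion can be written out as a short check. This is a minimal sketch rather than the GROMACS implementation: it assumes coordinates in nm and takes the H-bond angle to be the hydrogen-donor-acceptor angle, which is an assumption about the convention used; the function names are hypothetical.

```python
import math

MAX_DONOR_ACCEPTOR_NM = 0.35  # distance criterion stated in the text
MAX_ANGLE_DEG = 30.0          # angle criterion stated in the text


def angle_deg(v1, v2):
    """Angle between two 3D vectors in degrees."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    cosine = max(-1.0, min(1.0, dot / (norm1 * norm2)))  # guard against rounding
    return math.degrees(math.acos(cosine))


def is_hydrogen_bond(donor, hydrogen, acceptor):
    """Apply the distance and angle criteria to one donor-H...acceptor triplet."""
    distance = math.dist(donor, acceptor)
    donor_to_h = [h - d for h, d in zip(hydrogen, donor)]
    donor_to_acc = [a - d for a, d in zip(acceptor, donor)]
    return distance < MAX_DONOR_ACCEPTOR_NM and angle_deg(donor_to_h, donor_to_acc) < MAX_ANGLE_DEG


# Toy coordinates in nm (assumption): a linear, a too-long and a strongly bent arrangement.
donor, hydrogen = (0.0, 0.0, 0.0), (0.1, 0.0, 0.0)
print(is_hydrogen_bond(donor, hydrogen, (0.30, 0.0, 0.0)))
print(is_hydrogen_bond(donor, hydrogen, (0.40, 0.0, 0.0)))
print(is_hydrogen_bond(donor, hydrogen, (0.0, 0.30, 0.0)))
```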

Molecular cloning

Twenty-five engineered DNA sequences were synthesized by the GeneArt gene synthesis service from Thermo Fisher Scientific. All the sequences were codon-optimized for expression in Escherichia coli. Table 1 lists all the predicted sequences used in this work. The encoding sequences were engineered by combining a fifteen-fold repeat of the commonly used unstructured ELP sequence, (VPGVG)15, followed by a single AIMS-PROT-predicted helix. The tandem of (VPGVG)15 and the AI-predicted helix was then repeated four times in each case to create a full-length protein sequence named UnsELP-AIPHn, where "n" corresponds to the identification number of each predicted helix (Table 1). All the synthetic coding fragments were cloned in frame with a C-terminal 6xHis-tag (to facilitate purification) using seamless Golden Gate assembly in the pET-28a(+) (kanR) protein expression vector (Novagen). The 10-beta competent E. coli strain (New England Biolabs), with the genotype Δ(ara-leu) 7697 araD139 fhuA ΔlacX74 galK16 galE15 e14- ϕ80dlacZΔM15 recA1 relA1 endA1 nupG rpsL (StrR) rph spoT1 Δ(mrr-hsdRMS-mcrBC), was used for cloning purposes. Four different expression strains were used for protein production. The resulting expression plasmids were transformed into BL21 T7 Express™ (fhuA2 lacZ::T7 gene1 [lon] ompT gal sulA11 R(mcr-73::miniTn10--TetS)2 [dcm] R(zgb-210::Tn10--TetS) endA1 Δ(mcrC-mrr)114::IS10) (New England Biolabs), BL21 Star™ (F- ompT hsdSB (rB- mB-) gal dcm rne131 (DE3)) or E. coli strain BL21 (F- ompT hsdSB (rB- mB-) gal dcm (DE3)) (Thermo Fisher Scientific). In addition, ClearColi® BL21 (DE3) (F- ompT hsdSB (rB- mB-) gal dcm lon λ(DE3 [lacI lacUV5-T7 gene 1 ind1 sam7 nin5]) msbA148 ΔgutQ ΔkdsD ΔlpxL ΔlpxM ΔpagP ΔlpxP ΔeptA) (Lucigen) was used to produce endotoxin-free variants for animal and cytocompatibility studies. During cloning and expression, Luria-Bertani (LB) agar plates and LB medium were used with kanamycin (50 µg/ml) and ampicillin (100 µg/ml) when appropriate.
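
The construct design (fifteen VPGVG repeats followed by one predicted helix, with this unit repeated four times) can be sketched at the amino acid level as follows. The helix string used here is a hypothetical placeholder, not one of the sequences of Table 1, and the C-terminal 6xHis-tag is deliberately left out.

```python
ELP_MOTIF = "VPGVG"
ELP_REPEATS = 15   # fifteen-fold repeat of the ELP motif
UNIT_REPEATS = 4   # the ELP-helix unit is repeated four times


def build_construct(helix, elp_repeats=ELP_REPEATS, unit_repeats=UNIT_REPEATS):
    """Assemble ((VPGVG) x elp_repeats + helix) x unit_repeats as a one-letter sequence."""
    unit = ELP_MOTIF * elp_repeats + helix
    return unit * unit_repeats


# Placeholder helix (assumption); substitute the sequence of the chosen AIPHn helix.
placeholder_helix = "DAAAAKDAKK"
protein = build_construct(placeholder_helix)

print(len(protein))  # (15 * 5 + 10) * 4 = 340
```

Keeping the repeat counts as parameters makes it straightforward to explore other values of n in the (VPGVG)n motif or other numbers of repeating units.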

Protein production

To carry out the expression, single colonies were picked from freshly prepared LB plates (grown overnight) and inoculated into 5 ml of culture medium supplemented with kanamycin (50 µg/ml). The 5 ml starter culture was incubated for 6-7 hours at 37°C and 250 rpm and then used to inoculate 500 ml of fresh LB medium in a 2 l Erlenmeyer flask, which was grown at 37°C and 250 rpm for approximately 2-3 hours until the cells entered mid-log phase (OD at 600 nm ~0.4). At this point, expression was chemically induced by the addition of 0.1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) (Sigma-Aldrich), and the temperature was decreased to 18°C. After 15-20 hours post-induction, the cells were harvested by centrifugation at 16,000 × g for 15 min at 4°C. Cells were either frozen and stored at -80°C or kept cool at 4°C until the downstream purification steps.

Protein purification

To purify the proteins, cell pellets from 500 ml of culture were resuspended in 5 ml of lysis buffer containing 20 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) (Sigma-Aldrich), pH 7.5, 200 mM NaCl (Sigma-Aldrich), 20 mM imidazole (Sigma-Aldrich), 5 mM MgCl2 (Sigma-Aldrich), 0.5 mg ml⁻¹ lysozyme (Sigma-Aldrich), 0.01 mg ml⁻¹ DNase I (Sigma-Aldrich) and 1x SIGMAFAST protease inhibitor cocktail (EDTA). The cells were first incubated with the lysis buffer for 45 min at 4°C on a rotating platform and then sonicated (Qsonica 500) to physically break up the cells, using 20-30% amplitude for 3 minutes with 2-second ON-OFF intervals while keeping the cells on ice. Cell debris was then pelleted by centrifugation at 16,000 × g for 60 min at 4°C and discarded. The soluble fractions were filtered once through 0.20 µm filters and then purified using HisTrap FF immobilized metal affinity chromatography (IMAC) columns connected to an ÄKTA pure fast protein liquid chromatography system operated at 4°C. The binding buffer contained 20 mM imidazole and 200 mM NaCl, pH 7.4, whereas the elution buffer contained 200 mM imidazole and 500 mM NaCl, pH 7.4. Proteins were eluted from the column by gradient elution using a method created in UNICORN 7 software that altered the eluent strength: the concentration of imidazole was increased gradually until all the bound proteins were detached from the column. Buffer exchange was carried out with Econo-Pac 10DG prepacked gravity-flow desalting columns (Bio-Rad) against 50 mM Tris-HCl, pH 7.4. The purified protein solutions were then subjected to three additional washing steps with a final concentrating step using 20 ml sterile Vivaspin 10 kDa protein concentrator spin columns. Protein concentration was measured by absorbance at 280 nm with a DS-11 FX spectrophotometer, and protein size was analyzed using sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) by running the samples on 4-20% gradient gels (Bio-Rad). The SDS-PAGE gels were stained with Coomassie Brilliant Blue and imaged using a Bio-Rad ChemiDoc™ XRS imaging system. Unless otherwise stated, all the samples were kept frozen at -80°C until further analysis.

Circular dichroism (CD)

To collect the spectra, a Chirascan™ CD spectrometer equipped with a temperature-controlled unit (accuracy of ±0.1°C) was used with a QS quartz cuvette with a path length of 1 mm (Hellma). Data acquisition was carried out across the wavelength range from 190 to 250 nm. For the measurements, a bandwidth of 1 nm, a step of 1 nm and an averaging time of 0.5 s were used. Unless otherwise stated, each measurement was repeated 5 times, followed by averaging and smoothing of the corresponding spectra. For the temperature ramp-up/down experiments, 5°C min⁻¹ increments with 1 min equilibration time were used. In all cases, the temperature was ramped from 20°C to 50°C and back to the starting temperature. Temperatures higher than 50°C were not tested due to the reduction of signal.

Prediction of CD signal from MD simulations

PDBMD2CD was used to predict CD spectra for all the variants from their structures derived from MD simulations by averaging the final 100 trajectories from 100 ns simulation using the default settings recommended earlier.

Secondary structure prediction from experimental CD signal

Secondary structure determination and fold recognition for both experimental and theoretical protein CD spectra were carried out using the BeStSel method as a batch job with the default settings described previously.

Attenuated total reflection Fourier-transform infrared spectroscopy (ATR-FTIR)

Spectra were collected using a Perkin Elmer / Labsence Spectrum Two Polymer QA/QC Analysis System FTIR spectrometer equipped with an ATR diamond crystal plate (PIKE Technologies GladiATR). Spectra were collected in absorbance mode using 60 scans with 120 scan times over the range from 400 to 4000 cm⁻¹ at a resolution of 1 cm⁻¹. Unless otherwise stated, to minimize the noise from water, all the specimens were dried completely for a minimum of 12 hours before measurement.

Scanning electron microscopy (SEM)

SEM imaging was performed using a Zeiss FE-SEM field emission microscope with variable pressure, operating between 1 and 2 kV. Before imaging, all specimens were sputter-coated with a 2-3 nm platinum layer. ImageJ Fiji (version 1.47d) software was used for the visualization and analysis of the micrographs.

Turbidity measurement

Phase separation was detected by changes in the turbidity of the protein samples, measured as absorbance at 600 nm. Measurements were performed using a Varioskan Flash™ spectral scanning multimode reader. Measurements were carried out directly on 96-well plates by varying the protein versus salt concentration, with a final working volume of 300 µl. For the temperature ramp-up, an analog dry block heating system for microplates, QBA2 with QDP-H/QDP-F, was used, in which the temperature was increased from 25°C to 90°C in 5°C min⁻¹ increments with 5 min equilibration time. After reaching the desired temperature in each step, the plates were transferred immediately to the Varioskan Flash™, with less than 10 seconds delay before the start of the run. Before every readout, samples were shaken for 2 s. As blanks, the same volume of Milli-Q water, protein sample without salt, and salt sample without protein were used. Backgrounds were subtracted from the measurement readouts.

Thioflavin T (ThT) assay

ThT powder purchased from Sigma-Aldrich was first dissolved in Milli-Q water to make a stock solution with a concentration of 1 mM. To remove undissolved ThT and impurities, the stock solution was first centrifuged at 16,000 × g for 15 min and then filtered through a single-use 0.2 µm Sartorius Minisart™ Plus syringe filter. Before the measurement, the ThT stock solution was diluted with protein solution and Milli-Q water to a working concentration of about 0.01 mM. The mixture was allowed to equilibrate for about 1 hour at 22±1°C before carrying out the fluorescence measurements. Measurements were performed using a Varioskan Flash™ multi-mode microplate reader with an excitation wavelength of 450 nm and an emission wavelength of 492 nm at 22±1°C.

Dynamic light scattering (DLS)

The size of the proteins after phase separation was recorded with a Malvern Zetasizer Nano-ZS90 (Malvern Instruments Limited, Malvern, UK) at a backscattering angle of 173°, equipped with a 633 nm laser (He-Ne) and a temperature-controlled unit (accuracy of ±0.5°C), using the Universal 'Dip' Cell Kit (Product Number ZEN 1002). All the protein samples and buffers were filtered using a 0.2 µm filter before carrying out the measurements. Sizes of the assemblies were calculated from the autocorrelation functions recorded by the instrument, and the size distributions were determined via the cumulant method implemented in the Malvern software version 7.02. Results for the size are given as the mean with standard deviation of fifteen measurements (N=15).

DLS-based microrheology

Changes in the viscoelasticity of the proteins before and after phase separation were recorded by tracking the movement of tracer particles suspended in the protein solution and performing microrheology measurements using a Malvern Zetasizer Nano-ZS90. For that, carboxylate Polybead® microspheres of 1.00 µm (Polysciences) were chosen as tracer particles according to optimal size and surface chemistry with minimal interaction with the specimens. The tracer particles were added to the protein solutions in a mixing ratio giving at least 100-fold dominating scattering intensities. This was realized by adding 10 µl of the tracer particles to 1 ml of the desired sample. The mean square displacement (MSD) of the tracer particles as a function of time was collected, and the acquired data were processed with the generalized Stokes-Einstein relationship to obtain viscoelastic parameters such as the elastic modulus (G'), viscous modulus (G"), and complex viscosity (cP). The MSD, $\langle \Delta r^2(\tau) \rangle$, can be extracted from the electric field autocorrelation function $g_1(\tau)$ as in equation 1, in which $g_1(0)$ is the value of the correlation at time zero (intercept). Theoretically this is 1 for a perfect instrument and experimental conditions; however, it is generally accepted to be below 1 due to optical effects. Here $q$ is the magnitude of the scattering vector, given by $q = 4\pi n \sin(\theta/2)/\lambda$, where $n$ is the refractive index of the solvent, $\lambda$ the wavelength of light, and $\theta$ the scattering angle.

$$g_1(\tau) = g_1(0)\,\exp\!\left(-\frac{q^2 \langle \Delta r^2(\tau) \rangle}{6}\right) \qquad (1)$$

The complex shear modulus, $G(s)$, can then be obtained through a unilateral Laplace transform of the MSD using the generalized Stokes-Einstein relationship (equation 2):

$$G(s) = \frac{k_B T}{\pi a s \langle \Delta \tilde{r}^2(s) \rangle} \qquad (2)$$

where $k_B$ is Boltzmann's constant, $T$ is the temperature, $a$ is the radius of the probe, and $\langle \Delta \tilde{r}^2(s) \rangle$ is the Laplace transform of the MSD. An estimation method developed by Mason is then used to convert $G(s)$ into the complex modulus in the frequency domain, from which G' and G" are extracted using the Euler relationships.
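
To make the first step of this analysis concrete, the sketch below computes the scattering vector q = 4πn·sin(θ/2)/λ and inverts the standard DLS relation g1(τ) = g1(0)·exp(−q²⟨Δr²(τ)⟩/6) to obtain the MSD. The laser wavelength (633 nm) and scattering angle (173°) are the instrument values quoted in the DLS section; the refractive index of 1.33 (water) and the example correlation values are assumptions, and the function names are hypothetical.

```python
import math


def scattering_vector(refractive_index, wavelength_nm, theta_deg):
    """Magnitude of the scattering vector q in nm^-1."""
    half_angle = math.radians(theta_deg) / 2.0
    return 4.0 * math.pi * refractive_index * math.sin(half_angle) / wavelength_nm


def msd_from_g1(g1_tau, g1_zero, q):
    """Invert g1(tau) = g1(0) * exp(-q^2 * MSD / 6) for the MSD (in nm^2 if q is in nm^-1)."""
    return -6.0 * math.log(g1_tau / g1_zero) / q ** 2


# Assumed solvent refractive index (water); wavelength and angle as quoted for the instrument.
q = scattering_vector(1.33, 633.0, 173.0)

# Hypothetical correlation values: intercept 0.9, decayed to 0.45 at some lag time tau.
msd = msd_from_g1(0.45, 0.9, q)
print(q, msd)
```

The subsequent conversion of the MSD into G' and G" requires the Laplace-domain relation of equation 2 and a numerical estimation scheme, which is not reproduced here.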

Fluorescence recovery after photobleaching (FRAP)

A Leica TCS SP5 confocal microscope with FRAP booster (DM5000) and a DD488/561 dichroic beam splitter with a 63x/1.2 water objective was used to perform the FRAP analysis. Oregon Green 488 (carboxylic acid, succinimidyl ester, 6-isomer; Thermo Fisher) was used to label the proteins. A 488 nm laser with a spot diameter of 5 µm was used to bleach the samples while tracing the changes in intensity using Leica AF Lite-TCS MP5; the emission was passed through the DD488/561 dichroic and detected by a standard photomultiplier. Equation (3) was used to fit the data and calculate the diffusion coefficient accordingly.

PER YE: Hef) nr] (3)

Nanomechanical testing

Nanomechanical tests were carried out using an iNano® nanoindenter (KLA Corp., USA) equipped with a Berkovich diamond tip (Synton MDP, Switzerland) and an InForce 50 mN electromagnetic force actuator. Data were collected using a 100 kHz acquisition rate, a 20 µs time constant, an indentation strain rate of 0.200 s⁻¹, a dynamic indentation module (continuous stiffness measurement, CSM), and a target load of 45 mN. The Oliver and Pharr method was used to calibrate the diamond area function (DAF) of the tip as described previously.

Laser lithography and surface micro-patterning

The optical setup was built around a 532 nm green DPSS high-power pumped laser with TTL modulation (CivilLaser) with an output of 1000 mW, an adjustable power supply and a beam diameter of about 3 mm. The beam was expanded approximately 10-fold and collimated using a beam expander in the form of a Galilean telescope consisting of an LC1054-A N-BK7 plano-concave lens, Ø1/2", f = -25.0 mm (AR coating: 350-700 nm), and an AC254-250-A achromatic doublet lens, f = 250 mm, Ø1" (AR coating: 400-700 nm). All the lenses as well as the lens tubes and internal and external threads were ordered from Thorlabs (USA). The desired micropatterns for the lithography were designed and fabricated by jd-photodata (United Kingdom) on a 4" chrome photomask (negative, right reading chrome down) with quartz as the base material (90 mm², 2.3 mm). The printing setup was built in an inverted format to carry out contact printing. Moving from the bottom to the top, the laser was placed upside down, followed by the beam expander, beam aligner, photomask, and glass slide (22 mm², 0.13 mm thickness) on the top. The entire path length of the platform was 250 mm, whereas the distance from the laser source to the exposed area on the sample was between 220 and 230 mm unless otherwise stated. To print the micropatterns, 3 µl of Rose Bengal (100 mg/ml) as the photo-cross-linker was mixed with 27 µl of the desired protein sample (300 mg/ml). The mixture was vortexed, spread on the surface of the glass slide in direct contact with the photomask, and exposed to the laser for 15 minutes at 900-1000 mW. Micropatterns were developed by immersing the glass slide in Milli-Q water, followed by sequential washing steps using 30%, 40%, 50%, 60%, and 70% v/v ethanol. Samples were then completely dried under nitrogen flow and stored in a desiccator until use.

3D lithography

The general setup for printing the 3D microstructures is depicted in Fig. 10e with variations of structure models and exposure parameters. The 3D printing was done using a direct laser writing system (Photonic Professional by Nanoscribe GmbH) with a pulse rate of 80 MHz and a wavelength of 780 nm. A photosensitive aqueous mixture of the desired protein, Rose Bengal (Sigma-Aldrich), and ultrapure water was polymerized in a two-photon absorption (TPA) process based on the designed 3D model of the microstructures. This solution was prepared by mixing protein at 50 mg/ml and Rose Bengal at 20 wt%, both using water as the solvent. In each printing session, 75 µl of the protein mixture was dripped onto a 150 µm thick glass substrate (30 mm, No. 1, VWR), and an oil immersion objective (100x, NA=1.4, Zeiss) was approached from the backside of the substrate. The CAD designs of the microstructures were sliced into layers with 400 nm spacing in the direction normal to the substrate in the printing program, and these layers were then printed by the laser using a bottom-up approach beginning from the substrate-solution interface. The woodpile structures were printed with scan speeds varying from 40 to 60 µm/s and laser powers varying from 3 to 35 mW. The Logo and IWP structures were printed with a fixed scan speed of 40 µm/s and a laser power of 25 mW. After printing, the samples were immediately developed in ultrapure water and then stored under water until further use. It should be noted that the minimal development time used in the experiment was 30 min.

Cell viability assay

The cytotoxicity of some of the de novo-designed proteins was tested by 2D cell culturing. For that, we used two model cell lines: human normal lung fibroblasts (WI-38) and human metastatic breast cancer cells (TNBC, MDA-MB-231). Both cell lines were cultured in DMEM high glucose (11965092, Gibco) with fetal bovine serum (FBS, A3840001, Gibco, 10%) and penicillin/streptomycin (15070063, Gibco, 1%) at 37°C in a humidified incubator with 5% CO2. The CCK-8 (WST-8) colorimetric cell quantification kit (96992, Sigma) was used to detect the viability and proliferation of the MDA-MB-231 breast cancer cell line and the WI-38 fibroblast cell line on the samples for 4 days. The samples were placed on a 6-well plate, and MDA-MB-231 and WI-38 cells were seeded on the samples at densities of 1×10⁵ cells/ml and 2.5x10° cells/ml, respectively. A 10% CCK-8 solution was prepared freshly with complete cell culture medium. For each measurement, the medium was removed from the cells and 1 ml of CCK-8 solution was added to them. After 3 hours of incubation at 37°C in a humidified incubator with 5% CO2, the absorbance was measured at 450 nm for each cell line and each time point. CCK-8 solution alone was used as the blank, and cells plated on culture plates without samples were used as control groups. The average values from triplicate readings were calculated for each day, and the average value for the blank was subtracted.
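
The data reduction described above (averaging triplicate readings and subtracting the average blank) can be sketched as follows. The absorbance values are hypothetical, and expressing the result as a percentage of the control group is an added convention that the text does not state.

```python
from statistics import mean


def blank_corrected(readings, blank_readings):
    """Mean of replicate absorbance readings minus the mean blank reading."""
    return mean(readings) - mean(blank_readings)


def relative_viability(sample, control, blank):
    """Blank-corrected sample signal as a percentage of the blank-corrected control (assumed convention)."""
    return 100.0 * blank_corrected(sample, blank) / blank_corrected(control, blank)


# Hypothetical triplicate absorbance readings at 450 nm for one cell line and one time point.
blank = [0.10, 0.11, 0.09]
control = [1.10, 1.12, 1.08]
sample = [0.90, 0.92, 0.88]

print(round(blank_corrected(sample, blank), 3))
print(round(relative_viability(sample, control, blank), 1))
```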

Differential scanning calorimetry (DSC)

About five milligrams of lyophilized sample were weighed using an analytical scale into platinum alloy pans (TA Instruments, New Castle, DE, USA), hermetically sealed, and analyzed using a DSC 250 instrument (TA Instruments, New Castle, DE, USA). Pans with samples were equilibrated at 150.00°C, held isothermal for 1 min, then equilibrated at 25°C, followed by ramping up to 300°C at 10°C/min, with a final equilibration step at 40.00°C. Nitrogen was purged in the DSC cell at 50 ml/min. Three replicates were analyzed per sample. The heating curves were analyzed with Universal Analysis Software (Version 3.9A).

Thermogravimetric analysis (TGA) & differential thermal analysis (DTA)

The thermal degradation behavior of the cured samples was tested using a TA Q500 (TA Instruments, New Castle, DE, USA). About 5 mg of lyophilized sample was weighed in a platinum alloy cup. The changes in weight were recorded in a nitrogen-rich environment, scanning from ambient temperature to 900°C at a heating rate of 10°C/min.

Aerosol micro- and nanoparticle synthesis

A Collison-type jet atomizer with nitrogen gas feeding was used to synthesize spherical particles with sizes ranging from ~20 nm to ~3 µm. Synthesis was carried out using a 2% w/v protein solution in 50 mM phosphate buffer (pH 7.4). The droplets were suspended in a nitrogen gas flow at a rate of 3 l/min connected to a heated (120°C) laminar flow reactor. The droplets dried into solid spherical particles during flow; these were subsequently cooled downstream of the reactor with a 30 l/min air flow and simultaneously fractionated with a Berner-type low-pressure impactor consisting of 10 stages with nominal cutoff diameters ranging from 30 nm to 8 µm. For drug encapsulation, 5 mg of paracetamol was added to 1 ml of protein solution and allowed to dissolve completely overnight. For drug release studies, 300 mg of the spherical particles were gently compressed into tablet form. This was done to normalize the substantially large variation in particle size and minimize measurement error. The release profile was quantified by equilibration in phosphate buffer (pH 7.0) for 15 days and determining the amount of drug in the supernatant by measuring the absorbance at 312 nm.

Electrospinning

Various formulations were used to fabricate the six diverse morphologies seen in Fig. 9C. Unless otherwise stated, either pure protein or a mixture of protein and polyethylene oxide (PEO) was used at different pH values. The desired protein concentrations were obtained by dissolving lyophilized specimens in hexafluoroisopropanol (HFIP) or 1-butanol, whereas PEO was dissolved in Milli-Q water (MQ). The solutions were pumped through a 1 ml syringe connected to a blunt-end needle (either 18 or 20 gauge) by an automated Fusion 4000 micro-syringe pump (Chemyx). The needle tip and the aluminum foil collector ground were both connected to a high-voltage source of 17 to 21 kV. The high voltage was applied between the needle tip and the collector ground while pumping the solution at various rates (0.2-1 ml/h) and at various distances between the needle tip and the collector ground (given in cm). For the beads (B), the following parameters were used: 1-butanol-ELP 20% w/v, 20 kV, gauge 20, 17.5 cm, 1 ml/h, pH 7.4. Micro-beads on micro-filaments (MBF): HFIP-ELP 20% w/v, 20 kV, gauge 20, 10 cm, 1 ml/h, pH 11. Micro-ellipsoids on micro-filaments (MEF): MQ-0.7% PEO, HFIP-ELP 50%, 17 kV, gauge 18, 10 cm, 0.5 ml/h, pH 7.4. Nano-ellipsoids on nano-filaments (NEF): MQ-0.7% PEO, HFIP-ELP 50%, 20 kV, gauge 18, 10 cm, 0.5 ml/h, pH 7.4. Nanofilaments (NF): MQ-1% PEO, HFIP-ELP 20%, 21 kV, gauge 20, 15 cm, 0.25 ml/h, pH 7.4. Microfilaments (MF): MQ-1% PEO, HFIP-ELP 20%, 20 kV, gauge 20, 10 cm, 1 ml/h, pH 7.4. For drug encapsulation, 5 mg of either paracetamol, acetylsalicylic acid, or levalbuterol was added to 1 ml of spinning dope and allowed to dissolve completely overnight. Release profiles were quantified by equilibrating square-shaped pieces (1 x 1 cm) in phosphate buffer (pH 7.0) for 15 days and determining the amounts of the drugs in the supernatant by measuring the absorbance at 312, 330, and 220 nm, respectively.

Claims

1. A structural protein comprising 2 or more repeating amino acid sequence units consisting of
- a motif (VPGVG)n, wherein n is 2 or more, for example wherein n is 2-100, such as wherein n is 10-20, and
- an alpha helical amino acid sequence selected from SEQ ID NOs: 6-10 and/or an alpha helical amino acid sequence having at least 90% sequence identity with an amino acid sequence selected from SEQ ID NOs: 6-10.

2. The structural protein of claim 1, wherein the alpha helical amino acid sequence is selected from SEQ ID NOs: 6-10 and is the same in each amino acid sequence unit.

3. The structural protein of claim 1 or 2, comprising 2-30, 2-20, 2-10 or 3-5 repeating amino acid sequence units, for example 4 repeating amino acid sequence units.

4. The structural protein of claim 1 or 3, wherein the alpha helical amino acid sequence is selected from SEQ ID NOs: 6-10 and/or an amino acid sequence having at least 95% sequence identity with an amino acid sequence selected from SEQ ID NOs: 6-10.

5. The structural protein of any preceding claims, comprising four repeating amino acid sequence units comprising a motif (VPGVG)15.

6. The structural protein of any preceding claims, comprising an amino acid sequence selected from SEQ ID NOs: 11-22 and 24-28 and/or an amino acid sequence having at least 90% sequence identity, such as at least 95% sequence identity, with an amino acid sequence selected from SEQ ID NOs: 11-22 and 24-28.

7. Particles comprising the structural protein of any of claims 1-6 and having an average diameter in the range of 10 nm - 10 µm, such as 20 nm - 3 µm.

8. An electrospun fibre or filament comprising one or more of the structural proteins of any of claims 1-6, such as wherein the electrospun fibre or filament is in a form of a nonwoven.

9. The particles of claim 7 or the electrospun fibre or filament of claim 8 comprising a pharmaceutical compound.

10. A medical product comprising a pharmaceutical compound and one or more of the structural proteins of any of claims 1-6, such as wherein the medical product is a sustained-release and/or a controlled-release product.

11. The medical product of claim 10, wherein the medical product is an injectable composition or an inhalable composition.

12. The medical product of claim 11, wherein the medical product comprises the particles of claim 7.

13. A protein-based micro-robot comprising magnetic particles and one or more of the structural proteins of any of claims 1-6, the micro-robot preferably comprising a double-helical or a double spiral drill bit shape enabling moving by rotational movement.

14. A surface comprising micropatterning, such as stripes, honeycomb, pillars or cross lines, comprising one or more of the structural proteins of any of claims 1-6.

15. Photonic crystals comprising one or more of the structural proteins of any of claims 1-6.

16. [...] of patterns comprising a grating period in the range of 50-300 nm, the metamaterial comprising one or more of the structural proteins of any of claims 1-6.

17. The protein-based micro-robot of claim 13, the surface comprising the micropatterning of claim 14, the photonic crystals of claim 15, or the metamaterial of claim 16 obtained by additive manufacturing.

18. A thermoresponsive glass comprising a layer comprising one or more of the structural proteins of any of claims 1-6 between two sheets of glass, such as wherein the layer comprises material comprising reversible anisotropic interconnected mesoglobular network of the structural protein arranged to undergo phase separation by the effect of temperature to obtain a change in transmittance.

19. A method for preparing a product, the method comprising
- providing one or more of the structural proteins of any of claims 1-6 in a dispersion or in a solution,
- forming the dispersion or the solution into a product comprising one or more of the structural proteins.

20. The method of claim 19, wherein the forming is carried out with one or more of printing, additive manufacturing, particle forming, electrospinning, and lithography, preferably wherein the product is the product of any of claims 7-18.

Claims

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ST26SequenceListing PUBLIC "-//WIPO//DTD Sequence Listing 1.3//EN" "ST26SequenceListing_V1_3.dtd">
<ST26SequenceListing originalFreeTextLanguageCode="en" dtdVersion="V1_3" fileName="FREFIGRATO RR" softwareName="WIPO Sequence" softwareVersion="8.3.0" productionDate="FRÖRI-0F-11">
  <ApplicantFileReference>BP305270</ApplicantFileReference>
  <ApplicantName languageCode="fi">Teknologian tutkimuskeskus VTT Oy</ApplicantName>
  <InventionTitle languageCode="en">A structural protein, a medical product, an electrospun filament, photonic crystals, metamaterial, thermoresponsive glass and a method for preparing a product</InventionTitle>
  <SequenceTotalQuantity>28</SequenceTotalQuantity>
  <SequenceData sequenceIDNumber="1">
    <INSDSeq>
      <INSDSeq_length>30</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..30</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q2">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>DAAAAAAAAAAAAAYAOKAAAAAAAKDAKK</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
  <SequenceData sequenceIDNumber="2">
    <INSDSeq>
      <INSDSeq_length>28</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..28</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q4">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>DAAAAAAAAAAAAKYHDAAAAAAKDAKK</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
  <SequenceData sequenceIDNumber="3">
    <INSDSeq>
      <INSDSeq_length>26</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..26</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q6">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>DAAAAAAAAAAAAYFHHAAAAKDAKK</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
  <SequenceData sequenceIDNumber="4">
    <INSDSeq>
      <INSDSeq_length>25</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..25</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q8">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>DAAAAAAAAAAADFGDAAAAKDAKK</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
  <SequenceData sequenceIDNumber="5">
    <INSDSeq>
      <INSDSeq_length>21</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..21</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q10">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>DAAAAAAAAAADAAAAADAKK</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
  <SequenceData sequenceIDNumber="6">
    <INSDSeq>
      <INSDSeq_length>21</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>

135 <INSDFesature key>source</iNSDFessture kay»

138 <INSDFsature location>l, .21</INshFeature location»

127 <INSDFesture oguals>

128 <INSoCualifier>

139 CEINSDQualifier name>mol type</1NSDOuslifiesr name>

148 <INSDOualifier valuevprotein</INSDOuslifisr valuas>

147% </INSDQualifisr>

TAS <INSDOualifier i="qlav>

LAS CINEDQualifier _namerorganism</INSDgualifier name>

Ladd <INSDQualifier value>synthetic construct </INSDQualifisr value»

145 <SÄINSDOKAlifier,

138 </INSDFeature auvals>

147 </INSDFeature>

148 </INSDSeg festure-table>

LAU <INEDSeg seuusnce> EEERRREKEREREEERRRKKK</iINSDSsa sequence

LEG </INSDSeg>

  </SequenceData>
  <SequenceData sequenceIDNumber="7">
    <INSDSeq>
      <INSDSeq_length>20</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..20</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q14">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>EEEEEREKEDEEEEEEEKKE</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="8">
    <INSDSeq>
      <INSDSeq_length>23</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..23</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q16">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>EEELLKKEVVLLEELLEELEELL</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="9">
    <INSDSeq>
      <INSDSeq_length>29</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..29</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q18">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>EEELLKREEKLLLELLLLEEELEELEELL</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="10">
    <INSDSeq>
      <INSDSeq_length>36</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..36</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q20">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>EEQEEEEDLQEEEVLEEEEEEEEEQEEEEEEVVVTK</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="11">
    <INSDSeq>
      <INSDSeq_length>105</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..105</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q22">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>VPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAAYAQKAAAAAAAKDAKK</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="12">
    <INSDSeq>
      <INSDSeq_length>420</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..420</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q24">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>VPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAAYAQKAAAAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAAYAQKAAAAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAAYAQKAAAAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAAYAQKAAAAAAAKDAKK</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="13">
    <INSDSeq>
      <INSDSeq_length>464</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..464</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q26">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAAYAQKAAAAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAAYAQKAAAAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAAYAQKAAAAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAAYAQKAAAAAAAKDAKKSSKETAAAKFERQHMDSLEHHHHHH</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="14">
    <INSDSeq>
      <INSDSeq_length>456</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..456</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q28">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAKYHDAAAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAKYHDAAAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAKYHDAAAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAKYHDAAAAAAKDAKKSSKETAAAKFERQHMDSLEHHHHHH</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="15">
    <INSDSeq>
      <INSDSeq_length>448</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..448</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q30">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAYFHHAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAYFHHAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAYFHHAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAAAYFHHAAAAKDAKKSSKETAAAKFERQHMDSLEHHHHHH</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="16">
    <INSDSeq>
      <INSDSeq_length>444</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..444</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q32">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAADFGDAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAADFGDAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAADFGDAAAAKDAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAAADFGDAAAAKDAKKSSKETAAAKFERQHMDSLEHHHHHH</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
  <SequenceData sequenceIDNumber="17">
    <INSDSeq>
      <INSDSeq_length>428</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..428</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q34">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAADAAAAADAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAADAAAAADAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAADAAAAADAKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGDAAAAAAAAAADAAAAADAKKSSKETAAAKFERQHMDSLEHHHHHH</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
  <SequenceData sequenceIDNumber="18">
    <INSDSeq>
      <INSDSeq_length>428</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..428</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q36">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEERRREKEREREEERRRKKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEERRREKEREREEERRRKKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEERRREKEREREEERRRKKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEERRREKEREREEERRRKKKSSKETAAAKFERQHMDSLEHHHHHH</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="19">
    <INSDSeq>
      <INSDSeq_length>424</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..424</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q38">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEEEEREKEDEEEEEEEKKEVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEEEEREKEDEEEEEEEKKEVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEEEEREKEDEEEEEEEKKEVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEEEEREKEDEEEEEEEKKESSKETAAAKFERQHMDSLEHHHHHH</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="20">
    <INSDSeq>
      <INSDSeq_length>436</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..436</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q40">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEELLKKEVVLLEELLEELEELLVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEELLKKEVVLLEELLEELEELLVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEELLKKEVVLLEELLEELEELLVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEELLKKEVVLLEELLEELEELLSSKETAAAKFERQHMDSLEHHHHHH</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="21">
    <INSDSeq>
      <INSDSeq_length>460</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..460</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q42">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEELLKREEKLLLELLLLEEELEELEELLVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEELLKREEKLLLELLLLEEELEELEELLVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEELLKREEKLLLELLLLEEELEELEELLVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEELLKREEKLLLELLLLEEELEELEELLSSKETAAAKFERQHMDSLEHHHHHH</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="22">
    <INSDSeq>
      <INSDSeq_length>488</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..488</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q44">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEQEEEEDLQEEEVLEEEEEEEEEQEEEEEEVVVTKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEQEEEEDLQEEEVLEEEEEEEEEQEEEEEEVVVTKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEQEEEEDLQEEEVLEEEEEEEEEQEEEEEEVVVTKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEQEEEEDLQEEEVLEEEEEEEEEQEEEEEEVVVTKSSKETAAAKFERQHMDSLEHHHHHH</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="23">
    <INSDSeq>
      <INSDSeq_length>432</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..432</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q46">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAAAAAAEAAAAAAAAAAAAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAAAAAAEAAAAAAAAAAAAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAAAAAAEAAAAAAAAAAAAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAAAAAAEAAAAAAAAAAAASSKETAAAKFERQHMDSLEHHHHHH</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="24">
    <INSDSeq>
      <INSDSeq_length>452</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..452</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q48">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEEEEEKEEEEEEEEEEEEEEEEKKKEVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEEEEEKEEEEEEEEEEEEEEEEKKKEVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEEEEEKEEEEEEEEEEEEEEEEKKKEVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGEEEEEEKEEEEEEEEEEEEEEEEKKKESSKETAAAKFERQHMDSLEHHHHHH</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="25">
    <INSDSeq>
      <INSDSeq_length>456</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..456</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q50">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGLEEKKEKEEEKKKHLHILKHELKRKKKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGLEEKKEKEEEKKKHLHILKHELKRKKKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGLEEKKEKEEEKKKHLHILKHELKRKKKKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGLEEKKEKEEEKKKHLHILKHELKRKKKKSSKETAAAKFERQHMDSLEHHHHHH</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
  <SequenceData sequenceIDNumber="26">
    <INSDSeq>
      <INSDSeq_length>436</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..436</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q52">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAAAAAFGGAAAAAAAAAKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAAAAAFGGAAAAAAAAAKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAAAAAFGGAAAAAAAAAKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAAAAAFGGAAAAAAAAAKSSKETAAAKFERQHMDSLEHHHHHH</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
  <SequenceData sequenceIDNumber="27">
    <INSDSeq>
      <INSDSeq_length>424</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..424</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q54">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAGAAAGFAAAAAAAKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAGAAAGFAAAAAAAKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAGAAAGFAAAAAAAKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAGAAAGFAAAAAAAKSSKETAAAKFERQHMDSLEHHHHHH</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>

  <SequenceData sequenceIDNumber="28">
    <INSDSeq>
      <INSDSeq_length>456</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..456</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>protein</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier id="q56">
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>MGKETAAAKFERQHMDSSAVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAIAAAIAAAAAGQSAAAAATAAKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAIAAAIAAAAAGQSAAAAATAAKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAIAAAIAAAAAGQSAAAAATAAKVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGVPGVGAAAAIAIAAAIAAAAAGQSAAAAATAAKSSKETAAAKFERQHMDSLEHHHHHH</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
</ST26SequenceListing>