[go: up one dir, main page]

Academia.eduAcademia.edu
1 Title: New insights on adaptation and population structure of cork oak using genotyping by 2 sequencing 3 Running head: New insights on cork oak adaptation using GBS 4 Authors: Pina-Martins, F.1, Baptista, J.2, Pappas Jr G.3, & Paulo, O. S.1 5 1 Computational Biology and Population Genomics Group, Centre for Ecology, Evolution and 6 Environmental Changes, Departamento de Biologia Animal, Faculdade de Ciências, 7 Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal 8 2 Department of Biology, CESAM, University of Aveiro, Aveiro, Portugal 9 3 Department of Cell Biology, University of Brasilia, Brazil 10 Corresponding author: Francisco Pina-Martins – f.pinamartins@gmail.com 11 Keywords: Genotyping by sequencing, West Mediterranean, local adaptation, risk of non12 adaptedness, association study, natural selection effects, Quercus suber. 13 Paper type: Primary Research Article 14 Accepted for publication in Global Change Biology. The final citation is: 15 Pina‐Martins, F., Baptista, J., Pappas, G., & Paulo, O. S. (2019). New insights into adaptation 16 and population structure of cork oak using genotyping by sequencing. Global Change 17 Biology, 25(1), 337–350. doi: 10.1111/gcb.14497 1 18 1 Abstract 19 Species respond to global climatic changes in a local context. Understanding this process, 20 including its speed and intensity is paramount due to the pace at which such changes are 21 currently occurring. Tree species are particularly interesting to study in this regard due to their 22 long generation times, sedentarism, and ecological and economic importance. Quercus suber 23 L. is an evergreen forest tree species of the Fagaceae family with an essentially Western 24 Mediterranean distribution. Despite frequent assessments of the species’ evolutionary history, 25 large-scale genetic studies have mostly relied on plastidial markers, whereas nuclear markers 26 have been used on studies with locally focused sampling strategies. In this work, “Genotyping 27 by Sequencing” (GBS) is used to derive 1,996 SNP markers to assess the species’ 28 evolutionary history from a nuclear DNA perspective, gain insights on how local adaptation is 29 shaping the species’ genetic background, and to forecast how Q. suber may respond to global 30 climatic changes from a genetic perspective. Results reveal (1) an essentially unstructured 31 species, where (2) a balance between gene flow and local adaptation keeps the species’ gene 32 pool somewhat homogeneous across its distribution, but still allowing (3) variation clines for 33 the individuals to cope with local conditions. “Risk of Non-Adaptedness” (RONA) analyses, 34 suggest that for the considered variables and most sampled locations, (4) the cork oak should 35 not require large shifts in allele frequencies to survive the predicted climatic changes. Future 36 directions include integrating these results with ecological niche modelling perspectives, 37 improving the RONA methodology and expanding its use to other species. With the 38 implementation presented in this work, the RONA can now also be easily assessed for other 39 organisms. 2 40 2 Introduction 41 Understanding how and at which rate species respond to global climatic change in their 42 environmental context is becoming an increasingly important question due to the pace at 43 which these are taking place (Kremer et al., 2012; Primack et al., 2009). To avoid obliteration, 44 species may respond to such changes by either altering their distribution range, or by adapting 45 to the new conditions. The latter can occur “instantly”, due to phenotypic plasticity, or across 46 several generations, by local adaptation (Aitken, Yeaman, Holliday, Wang, & Curtis-McLane, 47 2008). The kind of response species can provide is known to depend on factors like location, 48 distribution range, and/or genetic background (Gienapp, Teplitsky, Alho, Mills, & Merilä, 49 2008; Ohlemuller, Gritti, Sykes, & Thomas, 2006). 50 Tree species are characterized by sedentarism and long lifespan and generation times, allied 51 with generally large distribution ranges and capacity for long distance dispersal through 52 pollen and seeds (Kremer et al., 2012). These traits make them interesting subjects to study 53 regarding their response to global climatic changes (Thuiller et al., 2008). 54 In this work, we address the case of the cork oak (Quercus suber L.). With a distribution 55 ranging most of the West Mediterranean region (Figure 1), this oak species is the most 56 selective evergreen oak of the Mediterranean basin in terms of precipitation and temperature 57 conditions (Vessella, López-Tirado, Simeone, Schirone, & Hidalgo, 2017). European oaks in 58 particular, are known to have endured past climatic alterations, but how they can cope with 59 the current, rapidly occurring changes is not yet fully understood (Kremer, Potts, & Delzon, 60 2014; Kremer et al., 2012). Despite this tree’s ecological and economic importance, there is 61 yet much to learn regarding the consequences of global climatic change on its future (Benito 62 Garzón, Sánchez de Dios, & Sainz Ollero, 2008). 3 63 Some recent works have attempted to answer this very question, but focusing on range 64 expansion and contraction with the assumption of a genetically homogeneous species and 65 niche conservationism (Correia, Bugalho, Franco, & Palmeirim, 2017; Vessella et al., 2017). 66 Both these studies also highlight the need for a genetic study regarding the adaptation 67 potential of Q. suber. Unlike what happen in other oak species (Rellstab et al., 2016), studies 68 integrating genetic information and response to climatic alterations of Q. suber (eg. (Modesto 69 et al., 2014)) are rare and of small scale (Jose Alberto Ramírez-Valiente, Valladares, Huertas, 70 Granados, & Aranda, 2011). Even though this study made the important assessement that 71 some cork oak traits can be associated to genetic variants, its local geographic scope 72 combined with the relatively low number if used markers, limits its utility in a distribution 73 wide perspective. Large scale information regarding Q. suber’s gene flow patterns and local 74 adaptation dynamics is paramount to understanding the species’ potential to endure rapid 75 climatic changes through adaptation (Savolainen, Lascoux, & Merilä, 2013). 76 In general terms, to predict a species’ response to change (Kremer et al., 2012), it is 77 fundamental to know both its genetic architecture of adaptive traits (Alberto et al., 2013) and 78 evolutionary history (Kremer et al., 2014). However, the very nature of genetic and genomic 79 data hampers the distinction of selection signals from other processes (McVean & Spencer, 80 2006), especially demographic events (Bazin, Dawson, & Beaumont, 2010). In order to 81 disentangle population structure (mostly shaped by gene flow, inbreeding, and genetic drift) 82 and selection (Foll, Gaggiotti, Daub, Vatsiou, & Excoffier, 2014), recent methods incorporate 83 population structure information to detect adaptation (Gautier, 2015; Günther & Coop, 2013). 84 Likewise, methods to accurately estimate population structure should be performed without 85 loci known to be under selection (De Kort et al., 2014). 4 86 In non-model organisms like the cork oak, loci of adaptive value can potentially be identified 87 by two kinds of methods – outlier analyses and environmental association analyses. While the 88 former identify loci that depart from the expected allele frequencies as under selection (Foll & 89 Gaggiotti, 2008; Vitalis, Gautier, Dawson, & Beaumont, 2014), they do not indicate what 90 which loci is responding to (Gautier, 2015). The latter, while being able to associate the 91 markers to an external covariate, are limited to detecting linear relations, and cannot assert 92 wether or not the identified correlations are of causative nature (Gautier, 2015). 93 The evolutionary history of Q. suber has been studied in the past using multiple 94 methodologies and in different geographic ranges. The most recent large-scale studies on the 95 subject suggest that cork oak is divided into four strictly defined lineages (Magri et al., 2007; 96 Simeone et al., 2009). Two of these lineages range from the south-east of France, to Morocco, 97 including the Iberian peninsula and the Balearic Islands, a third lineage ranges from the 98 Monaco region to Algeria and Tunisia, including the islands of Corsica and Sardinia. The 99 fourth lineage spans the entire Italic peninsula, including Sicilia. Based only on plastidial 100 markers, these lineages have been shown to hardly share any haplotypes (Magri et al., 2007). 101 Notwithstanding, later works based on nuclear DNA have hinted at a different scenario, where 102 the species is not as strictly divided (Costa et al., 2011; J. A. Ramírez-Valiente, Valladares, & 103 Aranda, 2014). These works are, however, limited in either geographic scope or number of 104 markers to confidently conclude that such segregation is only present in plastidial markers. 105 Genomic resources represent a new way to study the genetic mechanisms responsible for local 106 adaptation (Rellstab, Gugerli, Eckert, Hancock, & Holderegger, 2015) through the use of 107 environmental association analyses, which correlate environmental data with genetic markers, 108 thus highlighting loci putatively involved in the adaptation process (Rellstab et al., 2016). The 109 same methods, can thus, in principle, be used to assess the degree of maladaptation to 5 110 predicted future local conditions (Rellstab et al., 2016). The Risk of Non-Adaptedness 111 (RONA) method was developed with this very goal (Rellstab et al., 2016). In short, for every 112 significant association between a SNP and an environmental variable, the RONA method 113 plots each location’s individuals’ allele frequencies vs. the respective environmental variable. 114 This is done for both the current value and the future prediction. A correlation between allele 115 frequencies and the current variable values is then calculated and the corresponding best fit 116 line is inferred. The distance between the fitted line and the two coordinates is then compared 117 per location and its normalized difference is considered the RONA value for each association 118 and location (which can vary between 0 and 1). In theory, the higher the difference in 119 conditions between the current values and the prediction, the more the studied species should 120 have to shift its allele frequencies to survive in the location under the new conditions. Despite 121 the innovation and importance of the method for the general scientific community, in the 122 original paper, RONA is applied only for the work’s case study (calculating RONA values for 123 several Swiss species of Quercus based on candidate genes), and no public implementation is 124 provided. Applying this kind of methodology to Q. suber would fill the gap mentioned in 125 (Correia et al., 2017; Vessella et al., 2017), that multidisciplinary approaches are required to 126 more accurately provide sound recommendations for the conservation of forests. 127 In the present work, a panel of Single Nucleotide Polymorphism (SNP) markers derived from 128 the Genotyping by Sequencing (GBS) technique (Elshire et al., 2011) was developed to 129 accomplish the following goals: (1) attempt to infer the species’ genetic structure and 130 evolutionary history, (2) detect signatures of natural selection, and (3) investigate the 131 adaptation potential of Q. suber based on the RONA method developed and presented on 132 (Rellstab et al., 2016). 6 133 3 Material & Methods 134 3.1 Sample and environmental data collection 135 In order to provide a comprehensive view of the species genetic background, samples were 136 collected from 17 locations spanning most of Q. suber’s distribution. Fresh leaves were 137 collected from six individuals from, Bulgaria, Corsica, Kenitra, Monchique, Puglia, Sardinia, 138 Sicilia, Tuscany, Tunisia and Var, and from five individuals from Algeria, Catalonia, Haza de 139 Lino, Landes, Sintra, Taza and Toledo for a total of 95 individuals (Table 1, Figure 1). It is 140 worth noting that trees from Bulgaria are not of natural origin, but rather the result of human 141 introduction from Iberian locations (Borelli & Varela, 2000; Petrov & Genov, 2004). 142 Most samples were collected from an international provenance trial (FAIR I CT 95 0202) 143 established at “Monte Fava”, Alentejo, Portugal (38°00’ N; 8°7’ W) (Varela, 2000), except 144 Portuguese and Bulgarian samples, which were collected directly from their native locations. 145 The collected plant material was stored at –80°C until DNA extraction. 146 Altitude, latitude and longitude spatial variables (Varela, 2000) were recorded for each of the 147 native sampling sites. Nineteen Bioclimatic (BIO) variables, BIO1 to BIO19 were collected 148 from the WorldClim database (Hijmans, Cameron, Parra, Jones, & Jarvis, 2005) at 30 arc149 seconds (~ 1 km) resolution for both “Current conditions ~1960-1990” and “Future” 150 predictions for 2070, using two different Representative Concentration Pathways (RCPs), 151 rcp26 and rcp85 for the following “Global Climate Models” (GCMs): BCC-CSM1-1, 152 CCSM4, GFDL-CM3, GISS-E2-R, HadGEM2-ES, IPSL-CM5A-LR, MRI-CGCM3, MPI153 ESM-LR and NorESM1-M (IPCC, 2014) as these are available under permissive licenses and 154 calculated for both rcp26 and rcp85. Instead of using the GCMs directly, an average of the 155 values was obtained for each coordinate, and merged into a single dataset, for both used RCPs 156 (Supporting Table 1 and 2 respectively). Data was extracted from the GeoTiff files using a 7 157 python script, layer_data_extractor.py (https://github.com/StuntsPT/Misc_GIS_scripts) as of 158 commit “bd36320”. 159 Correlations between present Bioclimatic variables were assessed using Pearson's correlation 160 coefficient as implemented in the R script eliminate_correlated_variables.R 161 (https://github.com/JulianBaur/R-scripts) as of commit “43e6553”, which resulted in the 162 exclusion of six variables due to high correlation (r>0.95). Each sampling location was thus 163 characterized by three spatial variables and 13 environmental variables (Supporting Table 3). 164 3.2 Library preparation and sequencing 165 Genomic DNA was extracted from liquid nitrogen grounded leaves of all samples collected 166 for this work using the kit "innuPREP Plant DNA Kit" (Analytik Jena AG), according to the 167 manufacturer's protocol. 168 The total amount of extracted DNA was quantified by spectrophotometry using a Nanodrop 169 1000 (Thermo Scientific) and integrity verified on Agarose gel (0.8 %). DNA samples were 170 then diluted to a concentration of ~100 ng/μl and plated for genotyping. 171 DNA samples were then outsourced to “Genomic Diversity Facility”, at Cornell University” 172 for genotyping using the “Genotyping by sequencing” (GBS) technique as described in 173 (Elshire et al., 2011). Samples were shipped in a single 96 well plate with one “blank” well 174 for negative control. Sequencing was performed according to the standard protocol on a single 175 Illumina HiSeq 2000 flowcell using the low frequency cutter enzyme “EcoT22I”, due to the 176 large size of Q. suber’s genome. 177 3.3 Genomic data analyses 178 The raw GBS data was analysed using the program ipyrad v0.7.24, which is based on pyrad 179 (Eaton, 2014), using an “anaconda” environment containing - MUSCLE v3.8.31 (Edgar, 8 180 2004) and VSEARCH v2.7.0 (Rognes, Flouri, Nichols, Quince, & Mahé, 2016). A denovo 181 sequence assembly was performed, but mtDNA and cpDNA reads were “baited” out by 182 ipyrad’s mode “denovo-reference” using the complete mitochondiral genomes of Populus 183 davidiana (KY216145.1) (Choi et al., 2017) , Pyrus pyrifolia (KY563267.1) (Chung, Lee, 184 Kim, & Kim, 2017) and Rosa chinensis (CM009589.1) (Raymond et al., 2018), and 185 chloroplastidial genomes of Quercus rubra (JX970937.1) (Alexander & Woeste, 2014), 186 Quercus aliena (KU240007.1) and Quercus variabilis (KU240009.1) (Yang et al., 2016). This 187 ensured that mtDNA and cpDNA reads were filtered from downstream analyses. Parameters 188 included GBS as datatype, clustering threshold of 0.85, mindepth of 8 and maximum barcode 189 mismatch of 0. Each sampling site had to be represented by at least three individuals for a 190 SNP to be called, except the locations of Kenitra and Taza, where only one individual was 191 required due to the lower representation of these sampling sites. Full parameters can be found 192 in Supporting Datafile 1. The demultiplexed “fastq” files were submitted to NCBI’s Sequence 193 Read Archive (SRA) as “BioProject” PRJNA413625. 194 Downstream analyses were automated using “GNU Make”. This file, containing every detail 195 of every step of the analyses for easier reproducibility can be found in gitlab 196 (https://gitlab.com/StuntsPT/Qsuber_GBS_data_analyses, tag “v03”). For improved 197 reproducibility, a docker image with all the software, configuration files, parameters and the 198 Makefile, ready to use is also provided 199 (https://hub.docker.com/r/stunts/q.suber_gbs_data_analyses/, tag “v03”). The intent is not to 200 allow the analyses process to be treated as a “black box”, but rather to provide a full 201 environment that can be reproduced, studied and modified by the scientific community. 202 Processed data from ipyrad was then filtered using VCFtools v0.1.14 (Danecek et al., 2011) 203 with the following criteria: each sample has to be represented in at least 40 % of the SNPs, 9 204 and after this each SNP has to be represented in at least 80 % of the individuals. Furthermore, 205 due to the relatively small sample size, the minimum allele frequency (MAF) of each SNP has 206 to be at least 0.03 for it to be retained. 207 In order to minimize the effects of linkage disequilibrium, downstream analyses were 208 performed using only one SNP per locus, by discarding all but the SNP closest to the centre of 209 the sequence in each locus. This sub dataset was obtained using the python script 210 vcf_parser.py (https://github.com/CoBiG2/RAD_Tools/blob/master/vcf_parser.py) as of 211 commit “0893296”. 212 All file format conversions were performed using PGDSpider v2.1.0.0 (Lischer & Excoffier, 213 2012), except for the BayPass and SelEstim formats, where the scripts geste2baypass.py 214 (https://github.com/CoBiG2/RAD_Tools/blob/master/geste2baypass.py) and gest2selestim.sh 215 (https://github.com/Telpidus/omics_tools) as of commit “b99636e” and “f74f66b” 216 respectively were used, since the used version of PGDSpider does not handle either of these 217 formats. 218 Descriptive statistics, such as Hardy-Weinberg Equilibrium (HWE), F ST and FIS were 219 calculated using Genepop v4.6 (Rousset, 2008). The same software was further used to 220 perform Mantel tests to determine an eventual effect of Isolation by Distance (IBD) by 221 correlating “'F/(1-F)'-like with common denominator” with “Ln(distance)” following on 222 1,000,000 permutations. This test was performed excluding individuals sampled from 223 Bulgaria due to their introduced origin. 224 3.4 Outlier detection and environmental associations 225 Outlier detection was performed using two programs: SelEstim v1.1.4 (Vitalis et al., 2014) (50 226 pilot runs of length 1,000 followed by a main run of length 10⁶, with a burnin of 1,000, a 10 227 thinning interval of 20, and a detection threshold of 0.01) and BayeScan v2.1 (Foll & 228 Gaggiotti, 2008) (20 pilot runs of length 5,000 followed by a main run of 500,000 iterations, a 229 burnin of 50,000, a thinning interval of 10, and a detection threshold of 0.05) (full commands 230 and parameters are available in Supporting Datafile 2), since these methods show the lowest 231 rate of false positives (Narum & Hess, 2011; Vitalis et al., 2014). Only SNPs indicated as 232 outliers by both programs were considered outliers for the purpose of this work. This was 233 done to further reduce the chance of false positives, which is a known issue in this type of 234 analyses (Gautier, 2015; Vitalis et al., 2014). 235 The software BayPass v2.1 (Gautier, 2015) wrapped under the script Baypass_workflow.R 236 (https://gitlab.com/StuntsPT/pyRona/blob/master/pyRona/R/Baypass_workflow.R) from 237 pyRona v0.1.3 was used to assess associations of SNPs to environmental variables using the 238 “AUX” model (20 pilot runs of length 1,000, followed by a main run of length 500,000 with a 239 burnin of 5,000 and a thinning interval of 25). Any association with a Bayes Factor (BF) 240 above 15 was considered significant. Association analyses were performed excluding 241 individuals from Bulgaria sampling site for the same reasons as in the Mantel tests. 242 Sequences containing outlier loci or SNPs associated to an environmental variable were 243 queried against the genome of Q. lobata (Sork et al., 2016) v1.0 using BLAST v2.2.28+ 244 (Altschul et al., 1997) with an e-value threshold of 0.00001. 245 3.5 Population Structure 246 Two distinct methods were used for clustering the individuals in order to understand the 247 general pattern of individual or population grouping, namely, Principal Components Analysis 248 (PCA) and MavericK (Verity & Nichols, 2016), which is based on STRUCTURE (Pritchard, 249 Stephens, & Donnelly, 2000). 11 250 The PCA was performed with 251 (https://github.com/CoBiG2/RAD_Tools/blob/master/snp_pca_static.R) snp_pca_static.R as of commit 252 “bb2fc45”. 253 In order to correctly interpret clustering analyses results, it is important to estimate the value 254 of “K”, which represents how many demes the data can be clustered into. The software 255 MavericK is especially interesting for cluster estimation due to its innovative method for 256 estimating “K”, called “Thermodynamic Integration” (TI), which has shown superior 257 performance in this task relative to other methods (Verity & Nichols, 2016). Analysis was 258 divided in two stages: an initial single “pilot” stage which ran for 5,000 iterations, with a 259 burnin of 500 using an admixture model, a free alpha parameter of “1” and “thermodynamic 260 integration” (TI) turned off. This stage was used to infer tuned alpha and alphaPropSD values 261 which were used in the subsequent “tuned” stage as parameters for the admixture model. This 262 stage was comprised of five runs of 10,000 iterations (10 % burnin), with TI turned on and set 263 to 20 rungs of 10,000 samples with 20 % burnin. MavericK was wrapped under 264 Structure_threader v 1.2.2 (Pina-Martins, Silva, Fino, & Paulo, 2016) and was run for values 265 of “K” between 1 and 8. The most suitable value of “K” was estimated using the TI method. 266 Full parameter files are available as Supporting Datafile 2. 267 The same methodology was used on two more datasets derived from the original data. On 268 one, only SNPs considered “neutral” were used, in order to obtain an unbiased population 269 structure (De Kort et al., 2014). On the other one, only SNPs considered “non-neutral” were 270 used, which should not be interpreted as population structure, but rather as an indication of 271 wether local adaptation is responsible for the observed pattern. 12 272 3.6 Risk of non-adaptedness 273 The software pyRona was developed in this work as the first public implementation of the 274 method described in (Rellstab et al., 2016) called “Risk of non-adaptedness” (RONA). This 275 method provides a way to represent the theoretical average change in allele frequency at loci 276 associated with environmental variables required for any given population to cope with 277 changes in that variable. The program source code is hosted on public repositories, under a 278 GPLv3 license, and can be downloaded free of charge at https://gitlab.com/StuntsPT/pyRona. 279 PyRona has a complete user manual, with installation instructions, usage patterns, and a 280 graphical method description. 281 The RONA method as implemented in pyRona, however, is slightly different from the original 282 method description (Supporting Datafile 3). Namely, instead of ranking environmental factors 283 by p-value of the difference test between present and future values like the original 284 description, pyRona will rank the environmental factors by the number of associations. 285 Furthermore, the average RONA value provided by pyRona is weighted by the R² value of 286 each involved correlation, unlike the original, which uses unweighted means. 287 In this work, two alternative climate prediction models were used to calculate a RONA value 288 for each location in pyRona v0.1.3: a low emission scenario (RCP26) and a high emission 289 scenario (RCP85) (IPCC, 2014) in order to account for uncertainties in the models’ 290 assumptions. Any associations flagged by Baypass with a BF above 15 were considered 291 relevant and included in the RONA analysis. The three non-geospatial environmental 292 variables most frequently associated with SNPs, were selected for determining generic RONA 293 values. 13 294 4 Results 295 Genotyping by sequencing (Elshire et al., 2011), a technique based on restriction enzyme 296 genomic complexity reduction followed by short-read sequencing, was employed to discover 297 SNP markers from a total of 95 Q. suber individuals sampled from 17 geographical locations 298 (Table 1). 299 A total of 225,214,094 reads (100 bp) generated by the GBS assay was processed by ipyrad 300 (Eaton, 2014) computational pipeline. The first analytical step consisted in the assembly of 301 raw reads into 4,548 distinct contiguous sequence fragments (genomic loci), from which an 302 initial set of 8,978 SNPs were flagged. Twelve Q. suber samples were discarded due to low 303 sequence representation during the assembly process, resulting in the retention of 83 304 individuals. After filtering according to the criteria presented in the methods section 3.3, 1,996 305 SNPs remained, which were used for all further analyses. This filtering process additionally 306 removed two samples which were not represented for more than 55 % of the markers, and 307 therefore, only 81 samples were used in the analyses (Supporting Table 4). 308 The calculated FIS values for each sampling site are available in Supporting Table 4. These 309 range from -0.0262 (Var) to 0.1145 (Puglia) with an average value of 0.0666. Pairwise F ST 310 values are available in Supporting Table 5. These range from 0.0028 between Sardinia and 311 Tuscany to 0.1216 between Landes and Var (average FST of 0.0541). 312 When looking at HWE results per marker, of the 1,996 SNPs, 172 (~9 %) reveal a 313 heterozygote deficit, whereas 88 (~4 %) reveal a deficit of homozygotes. Individual sampling 314 sites are comprised of two few individuals to achieve biologically meaningful results. The 315 performed Mantel test revealed no evidence of IBD among Q. suber individuals. 14 316 4.1 Outlier detection and environmental association 317 Population differentiation and ecological association approaches (François, Martins, Caye, & 318 Schoville, 2016) were employed aiming at the identification of loci targeted by selection. In 319 the first strategy, highly differentiated loci among populations, measured as outliers in F ST 320 distribution, were detected by the software BayeScan and SelEstim uncovering 29 and 17 321 outlier SNPs respectively (Supporting Table 6). All of the loci considered under outliers by 322 SelEstim were also present in the set of loci flagged as outlier by BayeScan. This set of 17 323 common markers was considered as being putatively under the effect of natural selection. 324 For a functional characterization of these loci, the draft genome sequence of Q. lobata was 325 used as a proxy for similarity searches. None of the 17 sequences revealed significant matches 326 to Q. lobata‘s genome scaffolds. 327 The ecological association approach was carried out using the software BayPass and yielded 328 274 associations between 249 SNPs and 12 of the 16 tested environmental variables (no 329 associations were found with “Altitude”, “Temperature Annual Range”, “Precipitation of 330 Wettest Month” or “Precipitation Seasonality”). These associations can be found in 331 Supporting Table 7. Despite this relatively high number of associations, it is important to note 332 that 70 of these associations were between a SNP and a geospatial variable: 12 associations 333 with “Latitude” and 58 with “Longitude”. Of all environmental variables, the one with most 334 markers associated is “Precipitation of Driest Month” with 71 associations, followed by 335 “Isothermality” with 35 associations, and “Mean Temperature of Driest Quarter” with 29 336 associations. 337 Sequences containing 22 of the 249 markers associated with environmental variables were 338 matched to entries in the Q. lobata genome, however, of these only 10 were annotated (Table 339 2). 15 340 The union of the outlier loci set and the set of loci associated with at least one environmental 341 variable resulted in a dataset of 259 SNPs which were deemed “non-neutral” (7 SNPs were 342 common to both loci sets). The remaining 1737 SNPs were grouped in another sub-dataset, 343 deemed “neutral”. 344 4.2 Population structure 345 Clustering analyses were used to infer the current population structure of Q. suber in the West 346 Mediterranean. The TI method implemented in the software MavericK determined the best 347 “K” value to be “1” on all datasets. Despite this assessment, the presented plots are always 348 with K=2 (Figure 2), but with strong evidence that the data does not support structuring of 349 any kind. Q-plots for values of K above 2 were always either reduced to two clusters, or to 350 every individual being roughly equally divided into fractions of all clusters (Supporting 351 Figure 1). 352 The Q-matrix plot showing the relatedness of each genotype to each considered deme of 353 MavericK‘s results produced using all loci (Figure 2a) can be interpreted as a rough split 354 between western individuals (from locations Sintra, Monchique, Kenitra, Toledo, Landes, 355 Taza, Haza de Lino and Catalonia), which are mostly, but not completely, assigned to cluster 356 “1” and eastern ones (from locations Var, Algeria, Sardinia, Corsica, Tunisia, Tuscany, Sicilia 357 and Puglia), which are mostly assigned to cluster “2”. Individuals from Bulgaria are a notable 358 exception, since individual genotypes are mostly assigned to cluster “1” similar to those of 359 individuals from western locations, likely due to the species’ introduced origin (Varela, 2000). 360 However, this West – East split is somewhat fuzzy, as individuals’ genomes are never 361 completely attributed to a single cluster. In fact, most individuals have a considerable part of 362 their genome attributed to both cluster “1” and “2”. Furthermore, individuals from some 363 eastern locations have their genomes almost completely attributed to cluster ”1” (Var 21, 16 364 Corsica 3, Corsica 11, Corsica 14 and Puglia 5), and all individuals from Tunisia and Algeria 365 are almost equally split between both clusters. 366 The Q-plot obtained using the “neutral” loci subset (Figure 2b) is nearly identical to the one 367 with all the loci, but with individual genomes from eastern locations being slightly more 368 assigned to cluster “1” than in Figure 2a, and can be interpreted in the same way. 369 The Q-plot produced using only the 259 (12.9 %) “non-neutral” loci (Figure 2c), however, 370 does bear a different clustering pattern from the previous ones. In this case, the East – West 371 split is more evident, as eastern individual genomes’ attribution to each cluster is not as 372 evenly split, but rather displays a more pronounced attribution to cluster “2” than in Figure 2a. 373 The opposite is also true for western individuals, but to a lesser extent. 374 The PCA clustering method (largest eigenvector values of 0.0405 and 0.0299) is essentially 375 concordant with the previous methods, revealing two loosely defined groupings (Supporting 376 Figure 2). 377 4.3 Risk of non-adaptedness (RONA) 378 A summary of the RONA analyses for both low (RCP26) and a high (RCP85) emission 379 scenario predictions can be found in Figure 3 and Supporting Table 8. The most represented 380 environmental variables are “Precipitation of Driest Month” (71 SNPs, mean R²=0.1570), 381 “Isothermality” (35 SNPs, mean R²=0.2143) and “Mean Temperature of Driest Quarter” (29 382 SNPs, mean R²=0.1501). The values of RONA per sampling site are always higher for RCP85 383 than for RCP26, except for “Precipitation of Driest Month” in Tunisia where RCP85 has a 384 lower RONA than RCP26, and in Kenitra where they are the same (the “Precipitation of 385 Driest Month” variable in Kenitra is not predicted to change from current conditions of 0 mm² 386 regardless of the model). 17 387 Under the RCP26 predictions, the highest RONA values for “Precipitation of Driest Month” is 388 Landes (0.0369), for “Isothermality” is Puglia (0.0461), and for “Mean Temperature of Driest 389 Quarter” is Catalonia (0.1281). Under the RCP85 predictions Landes presents the highest 390 RONA for “Precipitation of Driest Month” (0.1115) and Catalonia presents the highest values 391 of RONA for “Mean Temperature of Driest Quarter” (0.3888) and “Isothermality” 0.0686). It 392 is important to note that the high RONA values of Catalonia are approximately twice as high 393 as the second highest RONA value on the RCP26 prediction and close to three times as high 394 for RCP85, marking this location as the most likely to become deprived of cork oak 395 individuals in the future. 396 5 Discussion 397 In this study, Quercus suber individuals were sampled across the species’ distribution range to 398 assess population structure, impact of local adaptation and provide an estimate of the RONA 399 value of each sampled location. 400 Due to the relatively large size of Q. suber’s genome (Zoldos, Papes, Brown, Panaud, & 401 Siljak-Yakovlev, 1998) a genome reduction technique, GBS, was used to discover SNPs for 402 this species. There is no “standard” parameter set to call SNPs on GBS datasets, since this 403 will ultimately depend on the organism being studied. The stringent approach used in this 404 study was, however, deemed preferable to alternatives that could result in more SNPs being 405 called at the cost of lowering confidence in the called variants, eventually biasing analyses 406 results. In fact, since no biological replicates were performed for this study, a conservative 407 approach was always preferred as to minimize biases in the results. 408 After stringent quality filtering, a set of 1,996 SNPs was used in this study. This number is 409 lower than that of some studies with similar data (Berthouly-Salazar et al., 2016), which 18 410 obtained ~22k SNPs (albeit using a more frequent cutting enzyme), but still more than (De 411 Kort et al., 2014), which obtained 1630 SNPs, very close to that of (Escudero, Eaton, Hahn, & 412 Hipp, 2014) and (Pais, Whetten, & Xiang, 2017). Even though this number may seem small, 413 in the universe of Q. suber’s genome of ~750 Mbp, this is to date the largest number of 414 molecular markers available for this species and represents a step forward to increase the 415 power of population genetics studies. 416 5.1 Population genetic structure 417 Past studies (Magri et al., 2007) have characterized Q. suber as a highly structured species, 418 with an evolutionary history shaped by large effect events, such as plate tectonics. These 419 were, however, mostly based on plastidial DNA data, which is known to not always provide a 420 comprehensive view on a species’ evolutionary history (Kirk & Freeland, 2011). The nuclear 421 markers developed for this work provide a somewhat different perspective. 422 Hardy & Weinberg Equilibrium analysis revealed that few individual markers deviated from 423 expectations. Only ~9 % reveal a heterozygote deficit, and only ~4 % reveal a deficit of 424 homozygotes. These values do not indicate the presence of assembly bias. 425 The obtained values of FIS are higher than those of unstructured European oaks when analysed 426 with the same type of markers, such as Quercus robur or Quercus petraea (Guichoux et al., 427 2013), but are nonetheless relatively low in general, which is compatible with low levels of 428 population structuring. 429 Similar to what is observed with F IS, FST values are on average (0.0541) higher than on the 430 above mentioned unstructured oak species (0.0125) (Guichoux et al., 2013), but lower than 431 other well structured trees such as eucalypti (0.095) (Cappa et al., 2013). These results 432 corroborate what the clustering analyses reveal: an incomplete segregation of the species in 19 433 two clusters, as seen on Figure 2. Although clustering analyses using all loci do not provide a 434 clear structuring signal (and the “TI” method clearly favours a scenario of a single large 435 panmictic population), the produced Q. suber Q-plots do show some degree of segregation 436 between western and eastern individuals. This can be derived both from Figure 2a and Figure 437 2b, which are, very similar, and can be interpreted in the same way – as incomplete 438 segregation between individuals from eastern and western locations. 439 Figure 2c, where the Q-plot was produced using only loci putatively under selection, should 440 not be used to infer population structure, but can be be compared to the Q-plot obtained using 441 only “neutral” loci to interpret the role of local adaptation in shaping Q. suber’s genetic 442 background. In Figure 2c, the division between western and eastern individuals is clearer than 443 in Figure 2a and B. Furthermore, the generally observed difference pattern is similar to what 444 cen be seen in the locations of “Monchique” and “Sardinia”: individual attributions to the 445 “dominant” cluster in the “neutral” Q-plot, become even more pronounced in the “non446 neutral” Q-plot. This is expected if local adaptation is responsible for these differences 447 (otherwise, the differences between “neutral” and “non-neutral” Q-plots should be more 448 random). This evidence, combined with the relatively low pairwise F ST and FIS values, 449 suggests a balance between local adaptation and gene flow. Whereas the former is responsible 450 for maintaining the species’ standing genetic variation across the species range and the latter 451 for the species’s response to local environmental differences. Intense gene flow would also 452 explain the relatively low proportion of outlier SNPs, which may be counteracting reactions to 453 weak selective pressures. At the same time, this balance may provide the species a relatively 454 large genetic variability to respond to strong selection (De Kort et al., 2014; Kremer et al., 455 2012). 20 456 Data from this work does not seem to support the four lineages hypothesis proposed in (Magri 457 et al., 2007), however, it is also not incompatible with it, if it is assumed that nuDNA and 458 cpDNA can have different evolutionary histories. In fact, it has been argued that for other tree 459 species plastidial lineages exist due to population contractions and expansions from glacial 460 refugia, but high gene flow erases any evidence of their existence in the nuclear genome 461 (Eidesen et al., 2007). 462 Two hypotheses can thus be proposed to explain the currently observed genetic structure: 463 1. Balance between gene flow and local adaptation is responsible for both creating and 464 maintaining the current level of nuclear divergence. Whereas local adaptation tends to 465 cause divergence between contrasting regions, this effect is countered by species wide 466 gene flow. Population contractions in refugia locations during glacial periods explain 467 the occurrence of plastidial lineages, which are absent in the nuclear genome due to 468 very intense gene flow. 469 2. Differential hybridization of Q. suber with Q. cerris in the East (Bagnoli et al., 2016) 470 and with Q. ilex s.l. in the West (Burgarella et al., 2009) is responsible for the observed 471 nuDNA structuring pattern and balance between gene flow and local adaptation is 472 responsible for maintaining it. Combination of these phenomena can thus be 473 considered the cause for the observed levels of East-West differentiation. Since Q. 474 suber always acts as a pollen donor in these hybridization events (Boavida, Silva, & 475 Feijó, 2001). Under this hypothesis, Q. suber would maintain a high nuclear 476 population effective, even during glacial periods, but restrict plastidial lineages’ 477 geographic scope as suggested in (López de Heredia, Carrión, Jiménez, Collada, & 478 Gil, 2007), which is further supported by the different dispersal capabilities of pollen 479 and acorns (Sork, 1984). This scenario would result in large effective population size 21 480 differences between nuDNA and cpDNA, which can be an alternative explanation for 481 cpDNA lineages to simple population contractions to glacial refugia. 482 The proposed hypotheses are supported by the SNP data presented here, but further studies 483 are needed to confirm them. As such, the issue will remain open for investigation. 484 5.2 Outlier detection and environmental association analyses 485 The method used to detect outlier loci flagged ~0.9 % of the total SNPs, which is in line with 486 what was found on other similar studies (Berdan, Mazzoni, Waurick, Roehr, & Mayer, 2015; 487 Chen et al., 2012). Of the 17 outlier markers found, none could be matched to an annotated 488 location in Q. lobata’s genome. This is likely due to a combination of factors, such as the 489 distance between Q. suber and Q. lobata, and the incomplete annotation of Q. lobata’s 490 genome. On the other hand, it emphasizes the need for more genomic resources in this area, 491 which can potentially provide important functional information of these SNPs in Q. suber’s 492 genome, that will at least for now remain unknown. 493 The environmental association analyses (EAA) served two purposes in this work. On one 494 hand, the reported associations work as a proxy for detecting local adaptation, and on the 495 other hand, allow the attribution of a RONA score to each sampling site. Q. suber is known to 496 be very sensitive to precipitation and temperature conditions (Vessella et al., 2017), and as 497 such, it was expected beforehand that some of the markers obtained in this study were to be 498 associated with some of these conditions (Rellstab et al., 2016). In order to understand how 499 important the found associations are for the local adaptation process, it is necessary to 500 understand the putative function of the genomic region where each SNP was found. Querying 501 the available sequences against Q. lobata’s genome annotations, has provided insights 502 regarding some of the markers’ sequences putative function. The proportion of sequences that 22 503 were a match to an annotated region, however, is rather small – only ~4.4 % of the queried 504 sequences could be matched to such regions. 505 Of the 10 SNPs associated with an environmental variable that returned hits to annotated 506 regions of Q. lobata’s genome, two were matched to regions annotated as close to animal 507 genes, and one matched a region annotated as a chloroplastidial region, leaving 7 SNPs as 508 interesting to explore for downstream analyses. While all these associations are potentially 509 interesting to explore, doing so falls outside the grander scope of this work. 510 Of these markers, it is interesting to remark, that SNP 158, associated with the variable “Mean 511 Temperature of Driest Quarter”, for example is located in a region annotated as “Similar to 512 TRE1: Trehalase”, which is known to play a role in drought stress (Houtte et al., 2013). 513 Likewise, SNP 168, associated with the variable “Precipitation of Driest Month”, is located in 514 a region matching the annotation of “Similar to PER47: Peroxidase 47”, which is known to 515 play a role in drought response (Li et al., 2017). 516 Like these two examples, more of the SNPs found have associations to environmental 517 variables which are putatively located in genes involved in functions which are important in 518 responding to the very variables they are associated with. This fact flags these markers as 519 particularly useful to focus on in downstream studies. 520 5.3 Risk of non-adaptedness 521 Although the RONA method is a greatly simplified model (its limitations are described in 522 Rellstab et al., 2016), it provides an initial estimate of how affected Q. suber is likely to be by 523 environmental changes (at least as far as the tested variables are concerned). Furthermore, it is 524 important to remark that due to Baypass being limited to a univariate method, the same 23 525 constraint also applies to the RONA analysis, meaning that multi-loci associations are not 526 considered. 527 The implementation developed for this work, named pyRONA suffers from most of the same 528 limitations as the original application, even though it is based on an arguably superior 529 association detection method (Gautier, 2015), (although the original LFMM (Frichot, 530 Schoville, Bouchard, & François, 2013) method is also available to use in pyRona since 531 version 0.3.0) and introduces a correction to the average values based on the R² of each 532 marker association by using weighted means. The automation achieved by using this new 533 implementation, easily allows two different emission scenarios (RCP26 and RCP85) to be 534 tested and compared. 535 With the exception of Catalonia, which seems to have an exceptionally high highest RONA 536 value under both prediction models, the other locations present relatively low RONA values 537 for the tested variables. The variable “Mean Temperature of Driest Quarter” appears to be the 538 tested variable that requires the greatest changes in allele frequencies to ensure adaptation of 539 the species to the local projected changes. These RONA values, are nevertheless smaller than 540 those presented in (Rellstab et al., 2016). This might be due to various factors, such as the 541 different variables tested, the geographic scope of the study, the species’ respective tolerance 542 to environmental ranges, the differences between species’ standing genetic variation, the 543 association detection method, or, more likely, a combination of several of these factors. 544 Notwithstanding, the obtained results seem to indicate that Q. suber is generally well 545 genetically equipped to handle climatic change in most of its current distribution (with the 546 notable exception of Catalonia). Despite cork oak’s long generation time, it seems reasonable 547 that during the considered time frame current populations are able to shift their allele 548 frequencies (2 % to 12 % on average, depending on the predictive model) due to a relatively 24 549 high standing genetic variation, which according to (Kremer et al., 2012) should really work 550 in the species’ favour in the presence of strong selective pressures. 551 This study, however, is limited to the considered environmental variables. Other factors that 552 were not included in this work may have a larger effect on Q. suber’s RONA. Inferring future 553 adaptive potential of species is not yet commonplace practice (Jordan, Hoffmann, Dillon, & 554 Prober, 2017; Rellstab et al., 2016), however, combining this type of study with ecological 555 niche modelling approaches has the potential to greatly improve the accuracy of both kinds of 556 predictions. 557 5.4 Final remarks 558 In this study, new nuclear markers were developed to shed new light on Q. suber’s 559 evolutionary history, which is important to understand, in order to attempt to predict the 560 species response to future environmental pressures (Kremer et al., 2014). 561 Despite the relatively large geographic distances involved, the nuclear markers used in this 562 work indicate a lesser genetic structuring than previously thought from cpDNA markers, 563 which clearly segregated the species in several well defined demes (Magri et al., 2007). The 564 SNP data from this work can thus be used to propose two new hypotheses to replace the 565 current view of a deep genetic structure as evidenced by cpDNA. The observed genetic 566 structure can be explained either by balance between gene flow and local adaptation, or 567 alternatively, differential hybridization of Q. suber with Q. ilex s.l. in the West and Q. cerris 568 in the East being responsible for geographic differences’ origin, which are then maintained by 569 the mentioned balance between gene flow and local adaptation (albeit more research is 570 required to confirm this second hypothesis). 25 571 Despite the genetic structure homogeneity, outlier and association analyses hint at the 572 existence of local adaptation. The RONA analyses suggest that this balance, between local 573 adaptation and gene flow, may be key in Q. suber’s response to climatic change. It is also 574 worth considering that despite the species’ likely capability to shift its allele frequencies for 575 survival in the short term, the effects of such changes in the long term can be quite 576 unpredictable (Feder, Egan, & Nosil, 2012; Lenormand, 2002), and only very recently have 577 they began to be understood (Aguilée, Raoul, Rousset, & Ronce, 2016). 578 This study starts by providing a new perspective into the population genetics of Q. suber, and, 579 based on this data, suggests an initial conjecture on the species’ future, despite the used 580 technique’s limitations. Even though studies regarding Q. suber’s response to climatic change 581 are not new (Correia et al., 2017; Vessella et al., 2017), this is the first work where this 582 response is investigated from an adaptive perspective. 583 The software, pyRona, was developed and is provided in hopes that the method is adopted by 584 the larger scientific community to estimate the Risk of non-Adaptedness for other species, and 585 eventually, make an impact in determining species conservation strategies. In this regard, the 586 RONA results can be used in informing assisted migration projects (Aitken & Bemmels, 587 2016). In the specific case of the cork oak, European commercial stocks can be expected to 588 benefit from the introduction of trees (and therefore alleles) adapted to more extreme 589 temperature and precipitation conditions. As for which ones, should be further studied, but the 590 genes that were functionally explored in this work, should provide a good starting point. 591 In the near future, it is expected that improvements are made to the RONA method. In 592 particular, using more sophisticated association testing (including the use of multivariate 593 methods) and combining this approach with ecological niche modelling should yield much 594 improved insights into species’ response to climatic change. These changes should be 26 595 supported by expanding the use of the method to other species, which have both genetic and 596 climatic data available. 597 6 Data Archiving Statement 598 Raw GBS data is available on NCBI’s Sequence Read Archive (SRA) as “BioProject” 599 PRJNA413625. 600 A docker image containing the analysis process, software and “assembled” data is available in 601 https://hub.docker.com/r/stunts/q.suber_gbs_data_analyses/. 602 The software pyRona is available in gitlab, and mirrored on github. 603 Acknowledgements 604 We would like to thank R. Nunes, A. S. Rodrigues, C. Ribeiro and I. Modesto, for their help 605 during sample collection. We would further like to thank the two anonymous reviewers for the 606 very through feedback they have provided. 607 Field and laboratory work, and bioinformatics platform were supported by Fundação para a 608 Ciência e Tecnologia (FCT) – Portugal [grant numbers SOBREIRO/0036/2009 (under the 609 framework of the Cork Oak ESTs Consortium) and UID/BIA/00329/2013 (2015-2018)]. F. 610 Pina-Martins was funded by FCT [grant number SFRH/BD/51411/2011 (under the PhD 611 program “Biology and Ecology of Global Changes”, Univ. Aveiro & Univ. Lisbon, Portugal)]. 612 7 Literature cited Aguilée, R., Raoul, G., Rousset, F., & Ronce, O. (2016). Pollen dispersal slows geographical range shift and accelerates ecological niche shift under climate change. Proceedings of the National Academy of Sciences, 113(39), E5741–E5748. https://doi.org/10.1073/pnas.1607612113 27 Aitken, S. N., & Bemmels, J. B. (2016). Time to get moving: assisted gene flow of forest trees. Evolutionary Applications, 9(1), 271–290. https://doi.org/10.1111/eva.12293 Aitken, S. N., Yeaman, S., Holliday, J. A., Wang, T., & Curtis-McLane, S. (2008). Adaptation, migration or extirpation: climate change outcomes for tree populations. Evolutionary Applications, 1(1), 95–111. https://doi.org/10.1111/j.1752-4571.2007.00013.x Alberto, F. J., Aitken, S. N., Alia, R., Gonzalez-Martinez, S. C., Hanninen, H., Kremer, A., … Savolainen, O. (2013). Potential for evolutionary responses to climate change - evidence from tree populations. Global Change Biology, 19(6), 1645–1661. https://doi.org/10.1111/gcb.12181 Alexander, L. W., & Woeste, K. E. (2014). Pyrosequencing of the northern red oak (Quercus rubra L.) chloroplast genome reveals high quality polymorphisms for population management. Tree Genetics & Genomes, 10(4), 803–812. https://doi.org/10.1007/s11295-013-0681-1 Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17), 3389–3402. Bagnoli, F., Tsuda, Y., Fineschi, S., Bruschi, P., Magri, D., Zhelev, P., … Vendramin, G. G. (2016). Combining molecular and fossil data to infer demographic history of Quercus cerris: insights on European eastern glacial refugia. Journal of Biogeography, 43(4), 679–690. https://doi.org/10.1111/jbi.12673 Bazin, E., Dawson, K. J., & Beaumont, M. A. (2010). Likelihood-Free Inference of Population Structure and Local Adaptation in a Bayesian Hierarchical Model. Genetics, 185(2), 587–602. https://doi.org/10.1534/ genetics.109.112391 Benito Garzón, M., Sánchez de Dios, R., & Sainz Ollero, H. (2008). Effects of climate change on the distribution of Iberian tree species. Applied Vegetation Science, 11(2), 169–178. https://doi.org/10.3170/2008-718348 Berdan, E. L., Mazzoni, C. J., Waurick, I., Roehr, J. T., & Mayer, F. (2015). A population genomic scan in Chorthippus grasshoppers unveils previously unknown phenotypic divergence. Molecular Ecology, 24(15), 3918–3930. https://doi.org/10.1111/mec.13276 28 Berthouly-Salazar, C., Mariac, C., Couderc, M., Pouzadoux, J., Floc’h, J.-B., & Vigouroux, Y. (2016). Genotyping-by-Sequencing SNP Identification for Crops without a Reference Genome: Using Transcriptome Based Mapping as an Alternative Strategy. Frontiers in Plant Science, 7, 777. https://doi.org/10.3389/fpls.2016.00777 Boavida, L. C., Silva, J. P., & Feijó, J. A. (2001). Sexual reproduction in the cork oak (<Emphasis Type="Italic">Quercus suber</Emphasis> L). II. Crossing intra- and interspecific barriers. Sexual Plant Reproduction, 14(3), 143–152. https://doi.org/10.1007/s004970100100 Borelli, S., & Varela, M. C. (2000). Mediterranean Oaks Network: Report of the first meeting. In EUFORGEN Mediterranean Oaks Network: First meeting (p. 74). Antalya, Turkey: EUFORGEN. Retrieved from http://www.euforgen.org/publications/publication/mediterranean-oaks-network-report-of-the-firstmeeting/ Burgarella, C., Lorenzo, Z., Jabbour-Zahab, R., Lumaret, R., Guichoux, E., Petit, R. J., … Gil, L. (2009). Detection of hybrids in nature: application to oaks (Quercus suber and Q. ilex). Heredity, 102(5), 442– 452. https://doi.org/10.1038/hdy.2009.8 Cappa, E. P., El-Kassaby, Y. A., Garcia, M. N., Acuña, C., Borralho, N. M. G., Grattapaglia, D., & Marcucci Poltri, S. N. (2013). Impacts of Population Structure and Analytical Models in Genome-Wide Association Studies of Complex Traits in Forest Trees: A Case Study in Eucalyptus globulus. PLoS ONE, 8(11), e81267. https://doi.org/10.1371/journal.pone.0081267 Chen, J., Källman, T., Ma, X., Gyllenstrand, N., Zaina, G., Morgante, M., … Lascoux, M. (2012). Disentangling the Roles of History and Local Selection in Shaping Clinal Variation of Allele Frequencies and Gene Expression in Norway Spruce (Picea abies). Genetics, 191(3), 865–881. https://doi.org/10.1534/genetics.112.140749 Choi, M. N., Han, M., Lee, H., Park, H.-S., Kim, M.-Y., Kim, J.-S., … Park, E.-J. (2017). The complete mitochondrial genome sequence of Populus davidiana Dode. Mitochondrial DNA Part B, 2(1), 113– 114. https://doi.org/10.1080/23802359.2017.1289346 29 Chung, H. Y., Lee, T.-H., Kim, Y.-K., & Kim, J. S. (2017). Complete chloroplast genome sequences of Wonwhang (Pyrus pyrifolia) and its phylogenetic analysis. Mitochondrial DNA Part B, 2(1), 325–326. https://doi.org/10.1080/23802359.2017.1331328 Correia, R. A., Bugalho, M. N., Franco, A. M. A., & Palmeirim, J. M. (2017). Contribution of spatially explicit models to climate change adaptation and mitigation plans for a priority forest habitat. Mitigation and Adaptation Strategies for Global Change, 1–16. https://doi.org/10.1007/s11027-017-9738-z Costa, J., Miguel, C., Almeida, H., Oliveira, M. M., Matos, J. A., Simões, F., … Batista, D. (2011). Genetic divergence in Cork Oak based on cpDNA sequence data. BMC Proceedings, 5(Suppl 7), P13. https://doi.org/10.1186/1753-6561-5-S7-P13 Danecek, P., Auton, A., Abecasis, G., Albers, C. A., Banks, E., DePristo, M. A., … Group, 1000 Genomes Project Analysis. (2011). The variant call format and VCFtools. Bioinformatics, 27(15), 2156–2158. https://doi.org/10.1093/bioinformatics/btr330 De Kort, H., Vandepitte, K., Bruun, H. H., Closset-Kopp, D., Honnay, O., & Mergeay, J. (2014). Landscape genomics and a common garden trial reveal adaptive differentiation to temperature across Europe in the tree species Alnus glutinosa. Molecular Ecology, 23(19), 4709–4721. https://doi.org/10.1111/mec.12813 Eaton, D. A. R. (2014). PyRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics, 30(13), 1844–1849. https://doi.org/10.1093/bioinformatics/btu121 Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–1797. https://doi.org/10.1093/nar/gkh340 Eidesen, P. B., Alsos, I. G., Popp, M., Stensrud, Ø., Suda, J., & Brochmann, C. (2007). Nuclear vs. plastid data: complex Pleistocene history of a circumpolar key species. Molecular Ecology, 16(18), 3902–3925. https://doi.org/10.1111/j.1365-294X.2007.03425.x Elshire, R. J., Glaubitz, J. C., Sun, Q., Poland, J. A., Kawamoto, K., Buckler, E. S., & Mitchell, S. E. (2011). A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species. PLOS ONE, 6(5), e19379. https://doi.org/10.1371/journal.pone.0019379 30 Escudero, M., Eaton, D. A. R., Hahn, M., & Hipp, A. L. (2014). Genotyping-by-sequencing as a tool to infer phylogeny and ancestral hybridization: A case study in Carex (Cyperaceae). Molecular Phylogenetics and Evolution, 79, 359–367. https://doi.org/10.1016/j.ympev.2014.06.026 Feder, J. L., Egan, S. P., & Nosil, P. (2012). The genomics of speciation-with-gene-flow. Trends in Genetics, 28(7), 342–350. https://doi.org/10.1016/j.tig.2012.03.009 Foll, M., & Gaggiotti, O. (2008). A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: a Bayesian perspective. Genetics, 180(2), 977–993. https://doi.org/10.1534/genetics.108.092221 Foll, M., Gaggiotti, O. E., Daub, J. T., Vatsiou, A., & Excoffier, L. (2014). Widespread Signals of Convergent Adaptation to High Altitude in Asia and America. The American Journal of Human Genetics, 0(0). https://doi.org/10.1016/j.ajhg.2014.09.002 François, O., Martins, H., Caye, K., & Schoville, S. D. (2016). Controlling false discoveries in genome scans for selection. Molecular Ecology, 25(2), 454–469. https://doi.org/10.1111/mec.13513 Frichot, E., Schoville, S. D., Bouchard, G., & François, O. (2013). Testing for Associations between Loci and Environmental Gradients Using Latent Factor Mixed Models. Molecular Biology and Evolution, 30(7), 1687–1699. https://doi.org/10.1093/molbev/mst063 Gautier, M. (2015). Genome-Wide Scan for Adaptive Divergence and Association with Population-Specific Covariates. Genetics, genetics.115.181453. https://doi.org/10.1534/genetics.115.181453 Gienapp, P., Teplitsky, C., Alho, J. S., Mills, J. A., & Merilä, J. (2008). Climate change and evolution: disentangling environmental and genetic responses. Molecular Ecology, 17(1), 167–178. https://doi.org/ 10.1111/j.1365-294X.2007.03413.x Guichoux, E., Garnier-Géré, P., Lagache, L., Lang, T., Boury, C., & Petit, R. J. (2013). Outlier loci highlight the direction of introgression in oaks. Molecular Ecology, 22(2), 450–462. https://doi.org/10.1111/mec.12125 Günther, T., & Coop, G. (2013). Robust identification of local adaptation from allele frequencies. Genetics, 195(1), 205–220. https://doi.org/10.1534/genetics.113.152462 31 Hijmans, R. J., Cameron, S. E., Parra, J. L., Jones, P. G., & Jarvis, A. (2005). Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology, 25(15), 1965–1978. https:// doi.org/10.1002/joc.1276 Houtte, H. V., Vandesteene, L., Lopez-Galvis, L., Lemmens, L., Kissel, E., Carpentier, S., … Dijck, P. V. (2013). Over-expression of the trehalase gene AtTRE1 leads to increased drought stress tolerance in Arabidopsis and is involved in ABA-induced stomatal closure. Plant Physiology, pp.112.211391. https://doi.org/10.1104/pp.112.211391 IPCC. (2014). Climate Change 2014: Synthesis Report. Contribution of Working Groups I, II and III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. IPCC AR5 Synthesis Report Website, 151 pp. Jordan, R., Hoffmann, A. A., Dillon, S. K., & Prober, S. M. (2017). Evidence of genomic adaptation to climate in Eucalyptus microcarpa: Implications for adaptive potential to projected climate change. Molecular Ecology, 26(21), 6002–6020. https://doi.org/10.1111/mec.14341 Kirk, H., & Freeland, J. R. (2011). Applications and Implications of Neutral versus Non-neutral Markers in Molecular Ecology. International Journal of Molecular Sciences, 12(6), 3966–3988. https://doi.org/10.3390/ijms12063966 Kremer, A., Potts, B. M., & Delzon, S. (2014). Genetic divergence in forest trees: understanding the consequences of climate change. Functional Ecology, 28(1), 22–36. https://doi.org/10.1111/13652435.12169 Kremer, A., Ronce, O., Robledo-Arnuncio, J. J., Guillaume, F., Bohrer, G., Nathan, R., … Schueler, S. (2012). Long-distance gene flow and adaptation of forest trees to rapid climate change. Ecology Letters, 15(4), 378–392. https://doi.org/10.1111/j.1461-0248.2012.01746.x Lenormand, T. (2002). Gene flow and the limits to natural selection. Trends in Ecology & Evolution, 17(4), 183– 189. https://doi.org/10.1016/S0169-5347(02)02497-7 Li, T., Yang, H., Zhang, W., Xu, D., Dong, Q., Wang, F., … Li, C. (2017). Comparative transcriptome analysis of root hairs proliferation induced by water deficiency in maize. Journal of Plant Biology, 60(1), 26–34. https://doi.org/10.1007/s12374-016-0412-x 32 Lischer, H. E. L., & Excoffier, L. (2012). PGDSpider: an automated data conversion tool for connecting population genetics and genomics programs. Bioinformatics, 28(2), 298–299. https://doi.org/10.1093/bioinformatics/btr642 López de Heredia, U., Carrión, J. S., Jiménez, P., Collada, C., & Gil, L. (2007). Molecular and palaeoecological evidence for multiple glacial refugia for evergreen oaks on the Iberian Peninsula. Journal of Biogeography, 34(9), 1505–1517. https://doi.org/10.1111/j.1365-2699.2007.01715.x Magri, D., Fineschi, S., Bellarosa, R., Buonamici, A., Sebastiani, F., Schirone, B., … Vendramin, G. G. (2007). The distribution of Quercus suber chloroplast haplotypes matches the palaeogeographical history of the western Mediterranean. Molecular Ecology, 16(24), 5259–5266. https://doi.org/10.1111/j.1365294X.2007.03587.x McVean, G., & Spencer, C. C. (2006). Scanning the human genome for signals of selection. Current Opinion in Genetics & Development, 16(6), 624–629. https://doi.org/10.1016/j.gde.2006.09.004 Modesto, I. S., Miguel, C., Pina-Martins, F., Glushkova, M., Veloso, M., Paulo, O. S., & Batista, D. (2014). Identifying signatures of natural selection in cork oak (Quercus suber L.) genes through SNP analysis. Tree Genetics & Genomes, 10(6), 1645–1660. https://doi.org/10.1007/s11295-014-0786-1 Narum, S. R., & Hess, J. E. (2011). Comparison of FST outlier tests for SNP loci under selection. Molecular Ecology Resources, 11, 184–194. https://doi.org/10.1111/j.1755-0998.2011.02987.x Ohlemuller, R., Gritti, E. S., Sykes, M. T., & Thomas, C. D. (2006). Quantifying components of risk for European woody species under climate change. Global Change Biology, 12(9), 1788–1799. https://doi.org/10.1111/j.1365-2486.2006.01231.x Pais, A. L., Whetten, R. W., & Xiang, Q.-Y. (Jenny). (2017). Ecological genomics of local adaptation in Cornus florida L. by genotyping by sequencing. Ecology and Evolution, 7(1), 441–465. https://doi.org/10.1002/ ece3.2623 Petrov, M., & Genov, K. (2004). 50 Years of cork oak (Quercus suber L.) in Bulgaria. Forest Science, 3, 93–101. Pina-Martins, F., Silva, D., Fino, J., & Paulo, O. S. (2016). Structure_threader. Zenodo. https://doi.org/10.5281/zenodo.57262 33 Primack, R. B., Ibáñez, I., Higuchi, H., Lee, S. D., Miller-Rushing, A. J., Wilson, A. M., & Silander, J. A. (2009). Spatial and interspecific variability in phenological responses to warming temperatures. Biological Conservation, 142(11), 2569–2577. https://doi.org/10.1016/j.biocon.2009.06.003 Pritchard, J. K., Stephens, M., & Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics, 155(2), 945–959. Ramírez-Valiente, J. A., Valladares, F., & Aranda, I. (2014). Exploring the impact of neutral evolution on intrapopulation genetic differentiation in functional traits in a long-lived plant. Tree Genetics & Genomes, 10(5), 1181–1190. https://doi.org/10.1007/s11295-014-0752-y Ramírez-Valiente, Jose Alberto, Valladares, F., Huertas, A. D., Granados, S., & Aranda, I. (2011). Factors affecting cork oak growth under dry conditions: local adaptation and contrasting additive genetic variance within populations. Tree Genetics & Genomes, 7(2), 285–295. https://doi.org/10.1007/s11295010-0331-9 Raymond, O., Gouzy, J., Just, J., Badouin, H., Verdenaud, M., Lemainque, A., … Bendahmane, M. (2018). The Rosa genome provides new insights into the domestication of modern roses. Nature Genetics, 50(6), 772–777. https://doi.org/10.1038/s41588-018-0110-3 Rellstab, C., Gugerli, F., Eckert, A. J., Hancock, A. M., & Holderegger, R. (2015). A practical guide to environmental association analysis in landscape genomics. Molecular Ecology, 24(17), 4348–4370. https://doi.org/10.1111/mec.13322 Rellstab, C., Zoller, S., Walthert, L., Lesur, I., Pluess, A. R., Graf, R., … Gugerli, F. (2016). Signatures of local adaptation in candidate genes of oaks ( Quercus spp.) in respect to present and future climatic conditions. Molecular Ecology. https://doi.org/10.1111/mec.13889 Rognes, T., Flouri, T., Nichols, B., Quince, C., & Mahé, F. (2016). VSEARCH: a versatile open source tool for metagenomics. PeerJ, 4, e2584. https://doi.org/10.7717/peerj.2584 Rousset, F. (2008). genepop’007: a complete re-implementation of the genepop software for Windows and Linux. Molecular Ecology Resources, 8(1), 103–106. https://doi.org/10.1111/j.1471-8286.2007.01931.x Savolainen, O., Lascoux, M., & Merilä, J. (2013). Ecological genomics of local adaptation. Nature Reviews Genetics, 14(11), 807–820. https://doi.org/10.1038/nrg3522 34 Simeone, Cosimo, M., Papini, A., Vessella, F., Bellarosa, R., Spada, F., & Schirone, B. (2009). Multiple genome relationships and a complex biogeographic history in the eastern range of Quercus suber L. (Fagaceae) implied by nuclear and chloroplast DNA variation. Caryologia, 62(3), 236–252. Sork, V. L. (1984). Examination of Seed Dispersal and Survival in Red Oak, Quercus Rubra (Fagaceae), Using Metal-Tagged Acorns. Ecology, 65(3), 1020–1022. https://doi.org/10.2307/1938075 Sork, V. L., Fitz-Gibbon, S. T., Puiu, D., Crepeau, M., Gugger, P. F., Sherman, R., … Salzberg, S. L. (2016). First Draft Assembly and Annotation of the Genome of a California Endemic Oak Quercus lobata Née (Fagaceae). G3: Genes, Genomes, Genetics, 6(11), 3485–3495. https://doi.org/10.1534/g3.116.030411 Thuiller, W., Albert, C., Araújo, M. B., Berry, P. M., Cabeza, M., Guisan, A., … Zimmermann, N. E. (2008). Predicting global change impacts on plant species’ distributions: Future challenges. Perspectives in Plant Ecology, Evolution and Systematics, 9(3–4), 137–152. https://doi.org/10.1016/j.ppees.2007.09.004 Varela, M. C. (2000). Evaluation of genetic resources of cork oak for appropriate use in breeding and gene conservation strategies. EC FAIR Programme. Verity, R., & Nichols, R. A. (2016). Estimating the Number of Subpopulations (K) in Structured Populations. Genetics, 203(4), 1827–1839. https://doi.org/10.1534/genetics.115.180992 Vessella, F., López-Tirado, J., Simeone, M. C., Schirone, B., & Hidalgo, P. J. (2017). A tree species range in the face of climate change: cork oak as a study case for the Mediterranean biome. European Journal of Forest Research, 1–15. https://doi.org/10.1007/s10342-017-1055-2 Vitalis, R., Gautier, M., Dawson, K. J., & Beaumont, M. A. (2014). Detecting and Measuring Selection from Gene Frequency Data. Genetics, 196(3), 799–817. https://doi.org/10.1534/genetics.113.152991 Yang, Y., Zhou, T., Duan, D., Yang, J., Feng, L., & Zhao, G. (2016). Comparative Analysis of the Complete Chloroplast Genomes of Five Quercus Species. Frontiers in Plant Science, 7. https://doi.org/10.3389/fpls.2016.00959 Zoldos, V., Papes, D., Brown, S. C., Panaud, O., & Siljak-Yakovlev, S. (1998). Genome size and base composition of seven Quercus species: inter- and intra-population variation. Genome, 41(2), 162–168. https://doi.org/10.1139/g98-006 35 36 613 8 Tables Table 1: Coordinates and number of sampled individuals for every sampling site. Latitude (decimal deg.) Sample site Algeria Bulgaria Longitude (decimal deg.) Number of sampled individuals 36.5400 7.1500 5 41.43 23.17 6 Catalonia 41.8500 2.5333 5 Corsica 41.6167 8.9667 6 Haza de Lino 36.8333 -3.3000 5 Kenitra 34.0833 -6.5833 6 Landes 43.7500 -1.3333 5 Monchique 37.3167 -8.5667 6 Puglia 40.5667 17.6667 6 Sardinia 39.0833 8.8500 6 Sicilia 37.1167 14.5000 6 Sintra 38.7500 -9.4167 5 Taza 34.2000 -4.2500 5 Toledo 39.3667 -5.3500 5 Tunisia 36.9500 8.8500 6 Tuscany 42.4167 11.9500 6 Var 43.1333 6.2500 6 Total: - - 95 37 Table 2: Summary of BLAST hits for loci with SNPs associated to one or more environmental variables. “MTDQ” and “MTWQ” stand for “Mean Temperature of Driest Quarter” and “Mean Temperature of Wettest Quarter” respectively. SNP name Note (Similar to) Associations SNP 158 TRE1: Trehalase (Arabidopsis thaliana) Mean Temperature of Driest Quarter SNP 168 PER47: Peroxidase 47 (Arabidopsis thaliana) Precipitation of Driest Month SNP 233 CPSF160: Cleavage and polyadenylation specificity factor subunit 1 (Arabidopsis thaliana) Annual Mean Temperature SNP 333 Ascc1: Activating signal cointegrator 1 complex subunit 1 (Mus musculus) Mean Temperature of Driest Quarter SNP 455 GLCAT14A: Beta-glucuronosyltransferase GlcAT14A (Arabidopsis thaliana) Precipitation of Driest Month SNP 619 GBP6: Guanylate-binding protein 6 (Pongo abelii) Precipitation of Driest Month SNP 834 NAC098: Protein CUP-SHAPED COTYLEDON 2 (Arabidopsis thaliana) Longitude SNP 880 TPP1: Thylakoidal processing peptidase 1%2C chloroplastic (Arabidopsis thaliana) Mean Temperature of Warmest Quarter SNP 1134 SNP 1589 EMB2654: Pentatricopeptide repeat-containing protein At2g41720 (Arabidopsis thaliana) At1g19525: Pentatricopeptide repeat-containing protein At1g19525 (Arabidopsis thaliana) Mean Temperature of Driest Quarter Temperature Seasonality 38 614 9 Figures Figure 1: A map of cork oak (Quercus suber) distribution. Shaded land areas represent the species' range. White dots represent the sampling locations. Adapted from EUFORGEN 2009 (www.euforgen.org). Figure 2: MavericK clustering plots for K=2. Sampling sites are presented from West to East. “a” is the Q-value plot for the dataset with all loci, “b” is for the dataset with only “neutral” loci, and “c” if for the dataset with only “non-neutral” loci. 39 Figure 3: Risk of Non-Adaptedness plot for the three SNPs with most associations. Bars represent weighted means (by R² value) and lines represent standard error. (a) is the plot for RCP26 and (b) is for RCP85 prediction models. 40 41