Data Descriptor
Open access
Published: 14 June 2023

A draft genome assembly of reef-building octocoral Heliopora coerulea

Scientific Data volume 10, Article number: 381 (2023)

Abstract

Coral reefs are under existential threat from climate change and anthropogenic impacts. Genomic studies have enhanced our knowledge of the resilience and responses of some coral species to environmental stress, but reference genomes are lacking for many coral species. The blue coral Heliopora is the only reef-building octocoral genus and exhibits optimal growth at a temperature close to the bleaching threshold of scleractinian corals. Local and high-latitude expansions of Heliopora coerulea were reported in the last decade, but little is known about the molecular mechanisms underlying its thermal resistance. We generated a draft genome of H. coerulea with an assembled size of 429.9 Mb, scaffold N50 of 1.42 Mb and BUSCO completeness of 94.9%. The genome contains 239.1 Mb of repetitive sequences, 27,108 protein-coding genes, 6,225 lncRNAs, and 79 miRNAs. This reference genome provides a valuable resource for in-depth studies on the adaptive mechanisms of corals under climate change and the evolution of the skeleton in cnidarians.

Background & Summary

Coral reefs are among the most diverse and productive ecosystems, supporting more than one-quarter of marine life while covering less than 2% of the ocean floor¹. In recent decades, reef-building corals have been threatened by the effects of anthropogenic climate change, such as ocean warming and acidification^2,3, as well as by local stressors such as overfishing, pollution, and coastal development^4,5,6. The world has lost almost 50% of its coral coverage since the 1950s⁷. With the projected continued degradation, 90% of coral reefs may disappear in the next few decades^8,9,10.

Blue corals (Heliopora) are the only octocoral genus that forms a massive hard skeleton and, like scleractinian corals, lives in symbiosis with zooxanthellae¹¹ (Fig. 1a). Due to their massive reef structure, blue corals are important reef builders in the Indo-West Pacific^11,12,13,14. H. coerulea, with its characteristic blue skeleton, had long been regarded as the only extant member of the family Helioporidae, until the recent description of H. hiberniana (with a white skeleton) in northwestern Australia¹⁵. Recent studies of blue corals based on RAD-seq and genotyping-by-sequencing revealed that there are also two distinct lineages of H. coerulea in the Kuroshio Current region^16,17. Based on fossil records, the genus Heliopora was once widely distributed throughout the warm shallow oceans in the early Cretaceous^11,18 (<120 million years ago, MYA). To date, H. coerulea is distributed in the shallow warm waters of the Indo-Pacific oceans^11,17.

Heliopora coerulea is known to survive bleaching events better than most scleractinian corals^15,19,20. Recently, this species has been reported to expand from the tropics to the high-latitude Tsukazaki, Japan²¹. A shift of dominant taxa from scleractinian corals to H. coerulea has been reported in reefs of Ishigaki Island, Japan²² and the South China Sea side of the Philippines^14,23. In addition, laboratory experiments showed that H. coerulea had a higher growth rate when exposed to 31 °C – a temperature that would usually trigger the bleaching of scleractinian corals^7,8,9 – than at 26 °C²⁴.

To facilitate molecular studies of thermal resistance in blue corals, we report a draft genome assembly of H. coerulea generated using long-read PacBio HiFi sequencing (Tables 1, 2). The assembled genome size of H. coerulea is 429.9 Mb, consisting of 769 contigs with an N50 of 1.42 Mb, a GC content of 37.4%, and 55.6% repeat elements (Fig. 2). The genome contains a total of 27,108 protein-coding genes, 95.7% of which were functionally annotated by BLASTp searches against published protein databases. In addition, RNA sequencing shows that the H. coerulea genome contains 6,225 lncRNAs and 79 miRNAs.

Table 1 A summary of Heliopora coerulea genome, mRNA, lncRNA, and miRNA sequencing data.

Table 2 Statistics of the assembled genome after filtering by binning, BLAST, and removal of heterozygous contigs.

Methods

Sample collection

The blue coral was collected by SCUBA at 5 m depth from Green Island, Taiwan (22°40′37′′N 121°28′23′′E) in April 2018. Coral fragments were transported in seawater to the Biodiversity Research Center, Academia Sinica, Taipei, where they were kept in a 5 L aerated aquarium. To avoid contamination by bacteria or algae in the water, the coral fragments were rinsed several times in Milli-Q water immediately prior to DNA and RNA sampling. Coral fragments were immediately frozen in liquid nitrogen for DNA extraction and genome sequencing, whilst tissues were fixed in RNAlater (Invitrogen, CA, USA) for RNA sequencing. All samples were stored at −80 °C until extraction.

Genomic sequencing

Genomic DNA was extracted from the coral tissue using the CTAB method²⁵. DNA quality and quantity were measured using agarose gel electrophoresis and a Qubit fluorometer (Thermo Fisher Scientific, MA, USA), respectively. DNA samples were submitted to Novogene (Beijing, China) for library preparation and whole genome sequencing (Table 1). Briefly, 1 µg of DNA was used to construct two libraries with 350-bp and 500-bp insert sizes using the NEBNext DNA Library Prep Kit (New England Biolabs, MA, USA), which were sequenced on an Illumina HiSeq X Ten sequencer to generate 122.4 Gb of paired-end reads with a read length of 150 bp. In addition, 10 µg of DNA was used to construct a HiFi SMRTbell library using the SMRTbell Express Template Prep Kit 2.0, which was sequenced on a PacBio Sequel II sequencer. A total of 31.8 Gb of high-quality HiFi reads were produced using the circular consensus sequencing (CCS) mode on the PacBio long-read platform.

RNA sequencing

Total RNA was extracted from the coral tissue using TRIzol reagent (Thermo Fisher Scientific, MA, USA) following the manufacturer’s protocol. The quality of the RNA samples was determined with agarose gel electrophoresis and the quantity was determined using a Qubit fluorometer (Thermo Fisher Scientific, MA, USA). RNA samples were submitted to Novogene (Beijing, China) for mRNA, long non-coding RNA (lncRNA), and microRNA (miRNA) sequencing (Table 1). The mRNA library was constructed using the NEBNext Ultra RNA Library Prep Kit for Illumina (New England Biolabs, MA, USA) and sequenced using an Illumina HiSeq X Ten sequencer to produce 150-bp paired-end reads. For lncRNA, ribosomal RNA was depleted from total RNA using the Epicentre Ribo-Zero rRNA Removal Kit (Epicentre, WI, USA). The cDNA libraries were prepared using the NEBNext Ultra RNA Library Prep Kit (New England Biolabs, MA, USA), and sequenced on an Illumina NovaSeq platform under the paired-end mode to produce 150-bp reads. In addition, miRNA libraries were prepared using the NEBNext Multiplex Small RNA Library Prep Kit (Illumina, CA, USA) and sequenced on an Illumina NovaSeq platform to produce 50-bp single-end reads.

Estimation of genome size

The genome size of H. coerulea was estimated using GenomeScope v2.0 with Illumina data²⁶. Adaptors and low-quality reads (quality score <30, length <40 bp) of the Illumina data were trimmed with Trimmomatic v0.38²⁷. To eliminate zooxanthellae and prokaryotic reads, the Illumina data were further filtered using bbmap.sh v39.01 (https://sourceforge.net/projects/bbmap/) against the Symbiodiniaceae genomes (Symbiodinium minutum, S. microadriaticum, S. kawagutii, and S. goreaui) from the ReefGenomics database (http://reefgenomics.org/) and NCBI Prokaryotic RefSeq genomes with default settings. A total of 88.7 Gb of Illumina reads were retained after quality filtering, and 77.9 Gb (87.8%) of them were from the coral host. The clean Illumina data were used to generate a 21-mer histogram using jellyfish v2.2.0²⁸, which was then characterized using GenomeScope v2.0. This predicted a genome size of 428.2 Mb and a heterozygosity of 0.73% at a k-mer size of 21 (Fig. 1b).
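The idea behind a k-mer-based size estimate can be illustrated with a simplified calculation: the total number of k-mers divided by the depth of the main (homozygous) peak. The sketch below is not GenomeScope's mixture model; it assumes a two-column "depth count" histogram such as the one `jellyfish histo` writes, assumes that the tallest peak above an error cutoff is the homozygous peak, and uses an invented toy histogram. The function names and the `min_depth` cutoff are illustrative choices.

```python
def parse_histo(text):
    """Parse two-column 'depth count' lines (the layout written by `jellyfish histo`)."""
    hist = {}
    for line in text.strip().splitlines():
        depth, count = line.split()
        hist[int(depth)] = int(count)
    return hist


def estimate_genome_size(hist, min_depth=5):
    """Return (peak_depth, estimated_size) from a k-mer depth histogram.

    Simplification: k-mers below `min_depth` are treated as sequencing errors,
    and the tallest remaining peak is assumed to be the homozygous peak.
    """
    kept = {d: c for d, c in hist.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above the error cutoff")
    peak_depth = max(kept, key=kept.get)
    total_kmers = sum(d * c for d, c in kept.items())
    return peak_depth, total_kmers / peak_depth


# Toy histogram (invented numbers, not the H. coerulea data).
toy = """
1 5000
2 800
19 10
20 40
21 10
39 50
40 200
41 50
"""
peak, size = estimate_genome_size(parse_histo(toy))
print(peak, size)
```

On real data, a model-based tool such as GenomeScope is preferable because it fits heterozygous and repeat peaks jointly rather than relying on a single peak.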

Genome assembly

De novo assembly of the HiFi reads (N50 of 14.0 kb and mean length of 13.5 kb; Table 1) was performed using nextDenovo v2.5.0 (https://github.com/Nextomics/NextDenovo) under default settings. Algal and microbial sequences were removed by binning the genome assembly with MetaBAT2 v2.15²⁹, followed by BLASTn v2.11.0+ searches against the 14 cnidarian genomes in Table 4, four Symbiodiniaceae genomes from the ReefGenomics database (http://reefgenomics.org/), and NCBI Prokaryotic RefSeq genomes with an E-value threshold of 1e-20. The initial assembly generated 1,309.7 Mb of metagenome sequences (Table 2). After binning, a total of 170 bins were identified, and “Bin167”, with 600.2 Mb and >100X coverage of Illumina data, was selected (Table 2 and Table S1). BLASTn analysis filtered out potential symbiont sequences and resulted in a 586.0 Mb genome with 2,248 contigs. Possible alternative heterozygous contigs were further eliminated using Purge Haplotigs v1.1.2³⁰ (Table 2). The completeness of the final genome assembly was assessed with Benchmarking Universal Single-Copy Orthologs (BUSCO) v5.4.5 against the eukaryota_odb10 and metazoa_odb10 databases under the genome mode³¹. QUAST v5.2 was used to assess the assembly statistics³². The final assembly is 429.9 Mb in length with an N50 of 1.42 Mb (Table 3; Fig. 2).
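For readers unfamiliar with the N50 statistic reported for the reads and the assembly, the following minimal sketch shows the usual definition: the length of the contig at which the cumulative sum of lengths, taken from longest to shortest, first reaches half of the total. It is an illustration only, not the QUAST implementation, and the contig lengths are made up.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or read) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("cannot compute N50 of an empty collection")


# Made-up contig lengths: total is 20, so N50 is reached at the second-longest contig.
print(n50([2, 3, 4, 5, 6]))
```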

Table 3 Genome assembly and annotation statistics of Heliopora coerulea.

In addition, the mitogenome of H. coerulea was assembled from the clean Illumina reads using Norgal v1.0 under the default settings³³, and annotated using MITOS2 online³⁴ and a tBLASTn v2.11.0+ search against the published H. coerulea mitogenome (GenBank: OL616236). The H. coerulea mitogenome is 18,957 bp in length with 14 protein-coding genes (Fig. 3), and is 100% identical to OL616236 in GenBank.

mRNA annotation

The protein-coding genes of the H. coerulea genome were predicted using the MAKER v3.0 pipeline³⁵ according to Ip et al.³⁶. In brief, repeat contents in the genome were identified using RepeatMasker v4.1.2-p1 (http://www.repeatmasker.org/; settings: “-e rmblast -s -gff”) with RepBase library version 20181026³⁷ and a species-specific repeat library built with RepeatModeler v2.0.3³⁸ under the “LTRStruct” option and default settings for other parameters. A total of 239.1 Mb (55.6%) of the H. coerulea genome consists of repetitive sequences, including 30.6% transposable elements, 21.8% unclassified repeats, and 3.1% simple repeats and low complexity sequences (Table 3 and Fig. 2).

Raw mRNA reads were trimmed using Trimmomatic v0.38²⁷ (quality score <30, length <40 bp). The clean reads were assembled both de novo and with genome guidance using Trinity v2.5.1³⁹ under the default settings. Cnidarian protein sequences from the UniProt database were used as protein evidence. Augustus v3.4⁴⁰ and SNAP v2006-07-28⁴¹ were used for ab initio gene prediction. All predicted gene models were integrated into a consensus weighted annotation with EVidenceModeler v1.1.1⁴² under the default settings in Maker3. In addition, PASA v2.4.1 was used to improve the Maker result using the de novo transcriptome⁴³. Finally, we obtained 27,108 predicted protein-coding genes with an N50 of 1,754 bp (Table 3).

The BUSCO completeness of the predicted gene models was assessed against the eukaryota_odb10 and metazoa_odb10 datasets³¹ under the protein mode. The predicted genes were functionally annotated using Diamond v2.0.13.151 BLASTp⁴⁴ against the UniProt and Swiss-Prot databases under the “ultra-sensitive” option and an E-value threshold of 1e-5. Gene functional annotation was conducted using eggNOG-mapper v2⁴⁵ for Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and Pfam domains.

lncRNA annotation

The raw lncRNA reads were filtered to remove adapters and low-quality reads (quality score <30, length <40 bp) using Trimmomatic v0.38²⁷. The clean lncRNA reads were mapped to the H. coerulea genome using HISAT2 v2.1.0⁴⁶ under the default settings. The resulting BAM files were then assembled into transcript models using StringTie v1.3.4d⁴⁷ under the default settings. The assembled transcripts were processed through FlExible Extraction of LncRNAs (FEELnc) v0.2.1⁴⁸ for lncRNA identification and classification. Briefly, the script FEELnc_filter.pl was used to remove transcripts that had one exon, were < 200 bp in size, or overlapped with predicted protein-coding regions. The coding potential score of each candidate transcript was calculated using the script FEELnc_codpot.pl under the shuffle mode. Finally, FEELnc_classifier.pl was used to classify potential lncRNAs with respect to the localization and the direction of transcription of nearby protein-coding genes. A total of 6,225 lncRNA genes were predicted in the H. coerulea genome (Tables S2, S3).
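The three filtering criteria described above (single exon, size below 200 bp, overlap with protein-coding regions) can be sketched as a simple predicate. This is an illustration of the logic only and not the FEELnc implementation: the `Transcript` class, the 1-based inclusive coordinates, the assumption that everything lies on one contig, and the decision to ignore strand are all simplifications introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) on a single contig


@dataclass
class Transcript:
    name: str
    exons: List[Interval]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_lncrna_candidate(tx: Transcript, coding_exons: List[Interval], min_len: int = 200) -> bool:
    """Keep multi-exon transcripts of at least `min_len` bp that avoid coding exons."""
    if len(tx.exons) < 2:
        return False  # single-exon transcript
    spliced_length = sum(end - start + 1 for start, end in tx.exons)
    if spliced_length < min_len:
        return False  # too short
    return not any(overlaps(e, c) for e in tx.exons for c in coding_exons)


# Hypothetical coding exon and transcripts, for illustration only.
coding = [(1000, 1200)]
transcripts = [
    Transcript("tx1", [(100, 250), (400, 600)]),     # kept
    Transcript("tx2", [(100, 900)]),                 # one exon
    Transcript("tx3", [(100, 150), (300, 350)]),     # shorter than 200 bp
    Transcript("tx4", [(900, 1050), (1300, 1500)]),  # overlaps a coding exon
]
print([tx.name for tx in transcripts if is_lncrna_candidate(tx, coding)])
```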

miRNA annotation

miRNA analysis was conducted according to Ip et al.³⁶. Briefly, raw miRNA reads were trimmed with fastp v0.20.0⁴⁹ under the settings of length_required = 18, max_length = 35, unqualified_percent_limit = 30, n_base_limit = 0. The clean reads were then combined and mapped to the genome using the mapper.pl script in miRDeep2 v2.0.1.2⁵⁰ with bowtie v1.2.2⁵¹. miRNAs were predicted using the miRDeep2.pl script in miRDeep2 with the Cnidaria mature miRNAs from miRBase v22.1⁵². The predicted miRNAs were filtered with a miRDeep2 score ≥ 4, star (complementary) and mature read count ≥ 5, and a significant Randfold p-value. The target genes of miRNAs were predicted using miRanda v3.3a⁵³ with a miRanda score ≥ 140, a dimer binding free energy < −5 kcal mol⁻¹, and strict 5′ seed pairing. In total, we detected 79 miRNA candidates ranging from 20 to 24 nt in length, and 10,636 mRNAs were predicted as their potential targets (Tables S4, S5).
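The candidate filter described above can be expressed as a short predicate. The sketch below is illustrative: the dictionary keys are hypothetical names rather than miRDeep2 output column names, the records are invented, and the read-count rule is interpreted as requiring the star and mature counts to each reach 5, which is one possible reading of the text.

```python
def passes_mirna_filter(candidate, min_score=4.0, min_reads=5):
    """Apply the score, read-count, and Randfold criteria to one candidate record."""
    return (
        candidate["score"] >= min_score
        and candidate["mature_reads"] >= min_reads
        and candidate["star_reads"] >= min_reads
        and candidate["randfold_significant"]
    )


# Invented candidate records for illustration.
candidates = [
    {"id": "cand_1", "score": 6.2, "mature_reads": 120, "star_reads": 9, "randfold_significant": True},
    {"id": "cand_2", "score": 3.1, "mature_reads": 300, "star_reads": 40, "randfold_significant": True},
    {"id": "cand_3", "score": 5.0, "mature_reads": 50, "star_reads": 2, "randfold_significant": True},
    {"id": "cand_4", "score": 8.4, "mature_reads": 75, "star_reads": 11, "randfold_significant": False},
]
print([c["id"] for c in candidates if passes_mirna_filter(c)])
```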

Phylogeny, divergence, and gene family analyses

Orthologous groups among H. coerulea and 13 anthozoans with the outgroup species Hydra vulgaris (details in Table 4 and Table S6) were identified using OrthoFinder v2.5.4 under the “diamond_ultra_sens” option⁵⁴. A total of 407 single-copy genes were aligned using MUSCLE v3.8.31⁵⁵ and trimmed using TrimAL v1.4⁵⁶. The aligned sequences with 91,426 amino acid positions and 1.1–13.9% gaps were concatenated for phylogenetic analysis using a maximum-likelihood method implemented in IQ-TREE v2.1.3⁵⁷, with the best model of Q.insect+F+I+G4 and 1000 bootstrap replicates. MCMCtree implemented in PAML v4.9h⁵⁸ was used to estimate divergence times with a burn-in, sample frequency and number of samples of 10000000, 1000 and 10000, respectively. The node calibration among cnidarians was based on fossil records (i.e., ~55 MYA for Acropora⁵⁹, ~145 MYA for Helioporacea¹⁸, ~540 MYA for Hexacorallia⁶⁰) and the TIMETREE database⁶¹ (i.e., Edwardsiidae for 280 – 490 MYA, Anthozoa for 520 – 740 MYA). Using the orthology results, we analyzed gene family expansion and contraction for each node using CAFÉ v4.2⁶². These analyses revealed that H. coerulea is sister to the soft coral Dendronephthya gigantea, from which it split during the Triassic (~216 MYA, 95% confidence interval of 157–301 MYA; Fig. 4). This D. gigantea + H. coerulea clade is then sister to the Hexacorallia clade, consistent with a previous phylogenetic analysis of 234 anthozoans⁶³. Gene family analysis detected 167 expanded and 61 contracted gene families in H. coerulea (Fig. 4; Table S7).
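The concatenation step that precedes tree inference can be sketched as follows. This is a minimal illustration under stated assumptions: each trimmed alignment is represented as a dictionary mapping a taxon name to its aligned sequence, a taxon absent from an alignment is padded with gaps, and the two toy alignments are invented. It does not reproduce the exact scripts used in the study.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments into one supermatrix keyed by taxon."""
    taxa = sorted(set().union(*alignments))
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        width = len(next(iter(aln.values())))
        for taxon in taxa:
            seq = aln.get(taxon, "-" * width)  # pad a missing taxon with gaps
            if len(seq) != width:
                raise ValueError(f"unequal sequence lengths for {taxon}")
            parts[taxon].append(seq)
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


def gap_percent(seq):
    """Percentage of gap characters in an aligned sequence."""
    return 100.0 * seq.count("-") / len(seq)


# Two invented single-copy gene alignments for two hypothetical taxa.
gene1 = {"taxonA": "MK-L", "taxonB": "MKAL"}
gene2 = {"taxonA": "GG", "taxonB": "G-"}
matrix = concatenate_alignments([gene1, gene2])
for taxon, seq in matrix.items():
    print(taxon, seq, round(gap_percent(seq), 1))
```

Reporting the per-taxon gap percentage of the concatenated matrix corresponds to the kind of range quoted for the alignment above.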

Table 4 Assembly statistics of 15 cnidarian genomes.

Data Records

The Illumina, PacBio HiFi, and RNA-seq data have been deposited in the NCBI Sequence Read Archive with accession numbers SRR23530023⁶⁴, SRR23530024⁶⁵, SRR23530025⁶⁶, SRR23530026⁶⁷, SRR23530027⁶⁸, SRR23530028⁶⁹, SRR23530029⁷⁰, SRR23530030⁷¹, and SRR23530031⁷², under BioProject accession number PRJNA936655. The genome assembly has been deposited at GenBank with accession number JASJOG000000000⁷³. The genome annotation (“Hco_maker_PASA_Final.gff”), predicted genes (“Hco_v1.transcript.fasta” and “Hco_v1.protein.fasta”), lncRNAs (“Hco_lncRNA.fasta”), and miRNAs (“Hco_miRNA_mature.fasta”) have been deposited in the Figshare database⁷⁴.

Technical Validation

The quality of the H. coerulea genome assembly was assessed by several approaches: (i) comparison with the estimated genome size, which is also ~430 Mb in total length (Figs. 1b, 2); (ii) obtaining the complete mitogenome, which is 100% identical in size and gene order to a published mitogenome of the same species (GenBank: OL616236; Fig. 3); (iii) conducting QUAST analysis, which showed that the assembly statistics of H. coerulea are comparable with those of published cnidarian genomes (Table 4); (iv) conducting BUSCO analysis, which identified 98.4% eukaryotic BUSCOs and 94.4% metazoan BUSCOs in the H. coerulea genome, and 98.4% eukaryotic BUSCOs and 95.3% metazoan BUSCOs in its predicted gene models (Table 4); (v) analyzing genome coverage using SAMtools v1.15.1⁷⁵, which showed 100% genome coverage and a 91.4% mapping rate for PacBio HiFi reads, and 94.8% genome coverage and an 88.4% mapping rate for Illumina short reads (Table 3). These results indicate that the H. coerulea assembly is of high quality.

Code availability

All bioinformatic tools used in this study were executed according to the corresponding manuals and protocols. The versions and parameters of the main bioinformatic tools are described below.

(1) Trimmomatic v0.38, parameters used: “PE -phred33 ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:30 MINLEN:40”.

(2) jellyfish v2.2.0, parameters used: “-C -m 21”.

(3) GenomeScope v2.0, parameters used: ploidy 2 and kmer_length 21.

(4) nextDenovo v2.5.0, parameters used: default.

(5) Purge Haplotigs v1.1.2, parameters used: default.

(6) MetaBAT v2.12.1, parameters used: default.

(7) BLASTn v2.11.0+, parameters used: “-evalue 1e-20 -max_target_seqs 1”.

(8) BUSCO v5.4.5, parameters used: lineage_dataset eukaryota_odb10 (255 BUSCOs) and metazoa_odb10 (954 BUSCOs).

(9) Norgal v1.0, parameters used: default.

(10) MAKER v3.0, parameters used: default.

(11) RepeatMasker v4.1.2-p1, parameters used: “-e rmblast -s -gff”, Database: Dfam v3.1 and RepBaseRepeatMaskerEdition-20181026.

(12) RepeatModeler v2.0.3, parameters used: “-LTRStruct”.

(13) Trinity v2.5.1, parameters used: default.

(14) Augustus v3.4.0, parameters used: species = Database trained with BUSCO, alternatives-from-evidence = true, hintsfile = Output of RepeatMasker.

(15) SNAP v2006-07-28, parameters used: default.

(16) EVidenceModeler v1.1.1, parameters used: default settings in Maker3.

(17) PASA v2.4.1, parameters used: “-C -R -T --ALIGNERS blat”.

(18) Diamond v2.0.13.151 BLASTp, parameters used: “--ultra-sensitive --max-target-seqs 1 --evalue 1e-5”.

(19) HISAT2 v2.1.0, parameters used: default.

(20) StringTie v1.3.4d, parameters used: default.

(21) FEELnc v0.2.1, parameters used: default.

(22) fastp v0.20.0, parameters used: “length_required = 18, max_length = 35, unqualified_percent_limit = 30, n_base_limit = 0”.

(23) miRDeep2 v2.0.1.2, parameters used: default.

(24) miRanda v3.3a, parameters used: “-sc 140 -en -5 -strict”.

(25) OrthoFinder v2.5.4, parameters used: “-S diamond_ultra_sens”.

(26) IQ-TREE v2.1.3, parameters used: “-m TEST -bb 1000”.

(27) MCMCtree implemented in PAML v4.9h, parameters used: Tree topology from IQ-TREE result, fossil records in Fig. 4, burn-in: 10000000, sample frequency: 1000, and number of samples: 10000.

(28) CAFÉ v4.2, parameters used: default.

(29) QUAST v5.2, parameters used: default.

(30) bbmap v39.01, parameters used: bbsplit.sh and mapPacBio.sh with default settings.

(31) SAMtools v1.15.1, parameters used: command = coverage, depth, with default settings.
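As an illustration of how the listed parameters map onto an actual invocation, the sketch below assembles the Trimmomatic paired-end command from item (1) as an argument list. Several details are assumptions rather than information from this article: the `trimmomatic` wrapper name (some installations use `java -jar trimmomatic-0.38.jar` instead), the input file names, and the output naming scheme. The command is only built and printed, not executed.

```python
import shlex


def trimmomatic_pe_command(r1, r2, prefix, adapters="TruSeq3-PE.fa"):
    """Build a Trimmomatic PE argument list using the trimming steps listed in item (1)."""
    # Trimmomatic PE expects four outputs: paired/unpaired for each read file.
    outputs = [
        f"{prefix}_R1.paired.fq.gz",
        f"{prefix}_R1.unpaired.fq.gz",
        f"{prefix}_R2.paired.fq.gz",
        f"{prefix}_R2.unpaired.fq.gz",
    ]
    steps = [
        f"ILLUMINACLIP:{adapters}:2:30:10",
        "LEADING:3",
        "TRAILING:3",
        "SLIDINGWINDOW:4:30",
        "MINLEN:40",
    ]
    # "trimmomatic" is an assumed wrapper script name.
    return ["trimmomatic", "PE", "-phred33", r1, r2, *outputs, *steps]


# Hypothetical input file names.
cmd = trimmomatic_pe_command("sample_R1.fq.gz", "sample_R2.fq.gz", "sample_clean")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it on a system where Trimmomatic is installed under that name.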

References

Knowlton, N. et al. Coral reef biodiversity. in Life in the World’s Oceans: Diversity, Distribution, and Abundance (ed. McIntyre, A.) Ch. 4 (Wiley-Blackwell, 2010).
Hoegh-Guldberg, O., Poloczanska, E. S., Skirving, W. & Dove, S. Coral reef ecosystems under climate change and ocean acidification. Front. Mar. Sci. 4, 158 (2017).
Anthony, K. R. et al. Ocean acidification and warming will lower coral reef resilience. Glob. Chang. Biol. 17, 1798–808 (2011).
Brodie, J. E. et al. Terrestrial pollutant runoff to the great barrier reef: an update of issues, priorities and management responses. Mar. Pollut. Bull. 65, 81–100 (2012).
Baum, G., Januar, H. I., Ferse, S. C. & Kunzmann, A. Local and regional impacts of pollution on coral reefs along the Thousand Islands north of the megacity Jakarta, Indonesia. PLoS One 10, e0138271 (2015).
Magesh, N. S. & Krishnakumar, S. The Gulf of Mannar marine biosphere reserve, southern India. In World seas: an environmental evaluation (ed. Sheppard, C.) Ch. 8 (Cambridge: Academic Press, 2019).
Eddy, T. D. et al. Global decline in capacity of coral reefs to provide ecosystem services. One Earth 4, 1278–1285 (2021).
Hoegh-Guldberg, O. et al. Impacts of 1.5 C global warming on natural and human systems. Global warming of 1.5 °C (IPCC Special Report, 2018).
Hoegh-Guldberg, O., Kennedy, E. V., Beyer, H. L., McClennen, C. & Possingham, H. P. Securing a long-term future for coral reefs. Trends Ecol. Evol. 33, 936–944 (2018).
Hughes, T. P. et al. Spatial and temporal patterns of mass bleaching of corals in the Anthropocene. Science 359, 80–83 (2018).
Zann, L. P. & Bolton, L. The distribution, abundance and ecology of the blue coral Heliopora coerulea (Pallas) in the Pacific. Coral reefs 4, 125–134 (1985).
Abe, M. et al. Report of the Survey of Heliopora coerulea Communities in Oura Bay, Okinawa (in Japanese) (2008).
Takino, T. et al. Discovery of a large population of Heliopora coerulea at Akaishi reef, Ishigaki Island, southwest Japan. Galaxea J. Coral Reef Stud. 12, 85–86 (2010).
Atrigenio, M. P., Conaco, C., Guzman, C., Yap, H. T. & Aliño, P. M. Distribution and abundance of Heliopora coerulea (Cnidaria: Coenothecalia) and notes on its aggressive behavior against scleractinian corals: Temperature mediated? Reg. Stud. Mar. Sci. 40, 101502 (2020).
Richards, Z. T. et al. Integrated evidence reveals a new species in the ancient blue coral genus Heliopora (Octocorallia). Sci. Rep. 8, 15875 (2018).
Iguchi, A. et al. RADseq population genomics confirms divergence across closely related species in blue coral (Heliopora coerulea). BMC Evol. Biol. 19, 1–7 (2019).
Taninaka, H. et al. Phylogeography of blue corals (genus Heliopora) across the Indo-West Pacific. Front. Mar. Sci. 8, 926 (2021).
Eguchi, M. Fossil Helioporidae from Japan and the South Sea Islands. J. Paleontol. 362–364 (1948).
Harii, S., Kayanne, H., Takigawa, H., Hayashibara, T. & Yamamoto, M. Larval survivorship, competency periods and settlement of two brooding corals, Heliopora coerulea and Pocillopora damicornis. Mar. Biol. 141, 39–46 (2002).
Kayanne, H., Harii, S., Ide, Y. & Akimoto, F. Recovery of coral populations after the 1998 bleaching on Shiraho Reef, in the southern Ryukyus, NW Pacific. Mar. Ecol. Prog. Ser. 239, 93–103 (2002).
Nakabayashi, A., Matsumoto, T., Kitano, Y. F., Nagai, S. & Yasuda, N. Discovery of the northernmost habitat of the blue coral Heliopora coerulea: possible range expansion due to climate change? Galaxea J. Coral Reef Stud. 19, 1–2 (2017).
Harii, S., Hongo, C., Ishihara, M., Ide, Y. & Kayanne, H. Impacts of multiple disturbances on coral communities at Ishigaki Island, Okinawa, Japan, during a 15 year survey. Mar. Ecol. Prog. Ser. 509, 171–180 (2014).
Atrigenio, M., Aliño, P. & Conaco, C. Influence of the Blue coral Heliopora coerulea on scleractinian coral larval recruitment. J. Mar. Biol. 2017, 1–5 (2017).
Guzman, C., Atrigenio, M., Shinzato, C., Aliño, P. & Conaco, C. Warm seawater temperature promotes substrate colonization by the blue coral, Heliopora coerulea. PeerJ 7, e7785 (2019).
Porebski, S., Bailey, L. G. & Baum, B. R. Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components. Plant Mol. Biol. Rep. 15, 8–15 (1997).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019).
Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19, 1–10 (2018).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075 (2013).
Al-Nakeeb, K., Petersen, T. N. & Sicheritz-Pontén, T. Norgal: extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data. BMC Bioinformatics 18, 1–7 (2017).
Donath, A. et al. Improved annotation of protein-coding genes boundaries in metazoan mitochondrial genomes. Nucleic Acids Res. 47, 10543–10552 (2019).
Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).
Ip, J. C. H. et al. Host-Endosymbiont Genome Integration in a Deep-Sea Chemosymbiotic Clam. Mol. Biol. Evol. 38, 502–518 (2021).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA. 6, 11 (2015).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA 117, 9451–9457 (2020).
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).
Stanke, M. & Morgenstern, B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res. 33, W465–W467 (2005).
Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Huerta-Cepas, J. et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol. Biol. Evol. 34, 2115–2122 (2017).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357 (2015).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Wucher, V. et al. FEELnc: a tool for long non-coding RNA annotation and its application to the dog transcriptome. Nucleic Acids Res. 45, e57–e57 (2017).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Friedländer, M. R., Mackowiak, S. D., Li, N., Chen, W. & Rajewsky, N. miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res. 40, 37–52 (2011).
Langmead, B. Aligning short sequencing reads with Bowtie. Curr. Protoc. Bioinformatics 32, 11.17. 11–11.17. 14 (2010).
Kozomara, A., Birgaoanu, M. & Griffiths-Jones, S. miRBase: from microRNA sequences to function. Nucleic Acids Res. 47, D155–D162 (2018).
Enright, A. et al. MicroRNA targets in Drosophila. Genome Biol. 4, 1–27 (2003).
Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157 (2015).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2014).
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Medina, M., Collins, A. G., Takaoka, T. L., Kuehl, J. V. & Boore, J. L. Naked corals: skeleton loss in Scleractinia. Proc. Natl. Acad. Sci. USA 103, 9096–100 (2006).
Han, J. et al. Tiny sea anemone from the Lower Cambrian of China. PLoS One 5, e13276 (2010).
Hedges, S. B., Dudley, J. & Kumar, S. TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics 22, 2971–2972 (2006).
Han, M. V., Thomas, G. W., Lugo-Martinez, J. & Hahn, M. W. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol. Biol. Evol. 30, 1987–1997 (2013).
Quattrini, A. M. et al. Palaeoclimate ocean conditions shaped the evolution of corals and their skeletons through deep time. Nat. Ecol. Evol. 4, 1531–1538 (2020).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR23530023 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR23530024 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR23530025 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR23530026 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR23530027 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR23530028 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR23530029 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR23530030 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR23530031 (2023).
NCBI GenBank https://identifiers.org/nucleotide:JASJOG000000000 (2023).
Ip, J. et al. A draft genome assembly of reef-building octocoral Heliopora coerulea. Figshare https://doi.org/10.6084/m9.figshare.22093037 (2023).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Jeon, Y. et al. The draft genome of an octocoral, Dendronephthya gigantea. Genome Biol. Evol. 11, 949–953 (2019).
Stephens, T. G. et al. High-quality genome assemblies from key Hawaiian coral species. GigaScience 11, giac098 (2022).
Shinzato, C. et al. Eighteen coral genomes reveal the evolutionary origin of Acropora strategies to accommodate environmental changes. Mol. Biol. Evol. 38, 16–30 (2021).

Acknowledgements

This work was supported by Hong Kong Baptist University’s Start-up Grant for New Academics (162780), Environmental and Conservation Fund of Hong Kong SAR (122/2022), the Key Special Project for Introduced Talents Team of Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou) (GML2019ZD0404), and the General Research Fund of Hong Kong SAR Government’s University Grants Committee (12102018). B.K.K.C. was supported by a grant for the Senior Investigator Award, Academia Sinica, Taiwan (AS-IA-105-L03).

Author information

Authors and Affiliations

Department of Biology, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Jack Chi-Ho Ip & Jian-Wen Qiu
Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
Ming-Hay Ho & Benny K. K. Chan

Contributions

J.C.H.I. and J.W.Q. designed the research. B.K.K.C. and M.H.H. collected the samples and cultured them in the laboratory. J.C.H.I. conducted the genomic extraction, assembled and annotated the genome, and performed the data analyses. J.C.H.I., J.W.Q. and B.K.K.C. drafted the manuscript. All authors edited the manuscript and approved the submission.

Corresponding authors

Correspondence to Jack Chi-Ho Ip or Jian-Wen Qiu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Ip, J.CH., Ho, MH., Chan, B.K.K. et al. A draft genome assembly of reef-building octocoral Heliopora coerulea. Sci Data 10, 381 (2023). https://doi.org/10.1038/s41597-023-02291-z

Received: 02 March 2023
Accepted: 31 May 2023
Published: 14 June 2023
DOI: https://doi.org/10.1038/s41597-023-02291-z
