
CNVbenchmarkeR2

CNVbenchmarkeR2 is a framework to benchmark germline copy number variant (CNV) calling tools on multiple NGS datasets. The current version supports the DECoN, CoNVaDING, panelcn.MOPS, ExomeDepth, CODEX2, ClinCNV, clearCNV, GATK-gCNV, Atlas-CNV, Cobalt, CNVkit and VisCap tools.

The previous version, CNVbenchmarkeR, is available here.

Prerequisites

The tools to be benchmarked should be properly installed; refer to each tool's documentation for installation instructions.

Also, R/Bioconductor should be installed, including these packages: GenomicRanges, biomaRt, regioneR, vcfR and optparse.
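
For reference, a minimal sketch for installing these packages from an R session (this assumes BiocManager is used to install both the Bioconductor and CRAN packages; adapt to your environment):

# Install BiocManager if missing, then the packages required by CNVbenchmarkeR2
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("GenomicRanges", "biomaRt", "regioneR", "vcfR", "optparse"))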

How to use

  1. Get Code
git clone https://github.com/jpuntomarcos/CNVbenchmarkeR2 
  2. Configure tools.yaml to set which tools will be benchmarked.

  3. Set tool parameter values located at tools/[name_of_the_tool]/[name_of_the_tool]Params.yaml. Please note that tool paths must be set for tools that are not R packages (DECoN, CoNVaDING, ClinCNV, clearCNV, GATK-gCNV, Atlas-CNV, Cobalt, CNVkit and VisCap).

  4. Configure datasets.yaml to define the datasets on which the tools will be executed. Within this file, it is important to provide files in the exact expected format (pay special attention to validated_results_file and bed_file, which are tab-delimited). To check the format, see the examples folder.

  5. Launch CNVbenchmarkeR2:

cd CNVbenchmarkeR2
Rscript runBenchmark.R [-t tools_yaml] [-d datasets_yaml] [-f include_temp_files]
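
For example, to benchmark the tools and datasets defined in the configuration files described above (assuming tools.yaml and datasets.yaml at the repository root; adjust the paths to your setup):

Rscript runBenchmark.R -t tools.yaml -d datasets.yaml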

Output

A summary file and a .csv results file will be generated in the output/summary folder. Stats include sensitivity, specificity, no-call rate, precision (PPV), NPV, F1, MCC and kappa coefficient.
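
These statistics follow their standard definitions. As an orientation, a minimal R sketch computing the main ones from confusion-matrix counts (tp, fp, tn and fn are hypothetical example counts, not values produced by the framework):

# Hypothetical confusion-matrix counts, for illustration only
tp <- 40; fp <- 5; tn <- 950; fn <- 5
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
ppv <- tp / (tp + fp)  # precision
npv <- tn / (tn + fn)
f1  <- 2 * ppv * sensitivity / (ppv + sensitivity)
mcc <- (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))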

Statistics are calculated per ROI, per gene and at the whole-strategy level.

Log files will be generated in the logs folder. Output for each tool and dataset will be generated in the output folder.

Troubleshooting

Two important checks to ensure that metrics are computed correctly:

  • The sample names in the validated_results_file should match the file names of your BAM files (excluding the .bam extension). For example, if the validated_results_file contains sample names like mySample2312, your BAM files should have file names like mySample2312.bam.
  • Use chromosome names with a consistent format, that is, do not mix "chr5" and "5" between your bed_file and validated_results_file, for example (see the check sketch after this list).
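
A minimal R sketch for both checks, assuming bam_dir points to your BAM folder and that sample_names, bed_chroms and results_chroms have already been loaded from your validated_results_file and bed_file (these variable names are hypothetical):

# Compare sample names against BAM file names (without the .bam extension)
bam_samples <- sub("\\.bam$", "", list.files(bam_dir, pattern = "\\.bam$"))
setdiff(sample_names, bam_samples)    # should return an empty vector

# Check that both files use the same chromosome naming convention
table(grepl("^chr", bed_chroms))      # expect all TRUE or all FALSE ...
table(grepl("^chr", results_chroms))  # ... and the same result here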

Extra feature: evaluate parameters

A parameter evaluator is also included in the framework. It evaluates each tool parameter over a broad range of values to assess its impact on tool performance. Up to 15 values are evaluated for each numerical parameter, and all available options for categorical ones. The parameter evaluator facilitates understanding how individual parameters influence the overall performance of CNV calling tools.

Prerequisites

An SGE cluster system has to be available.

Run evaluate parameters

  1. Configure all the steps needed to run the benchmark (explained in the How to use section).

  2. Before running any evaluations, you need to create the subfolders that will contain the YAML files. These files define the setting values for each execution. Execute the setUpFolders script from the CNVbenchmarkeR2 directory:

Rscript evaluate_parameters/setUpFolders.R [-t tools_yaml] [-d datasets_yaml]
  3. Modify jobs.sh to set your SGE parameters and paths.

  4. Execute the runEvaluate script from the CNVbenchmarkeR2 directory. This process evaluates the values of each modified parameter within the selected tools and datasets; for parameters not explicitly modified, the script will apply default values.

Rscript evaluate_parameters/runEvaluate.R [-t tools_file] [-d datasets_file] [-f keepTempFiles]

To save disk space, it is recommended to set the -f parameter to false, which deletes all intermediate files.
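
For example, reusing the same configuration files as in the benchmark and deleting intermediate files (adjust the file names to your setup):

Rscript evaluate_parameters/runEvaluate.R -t tools.yaml -d datasets.yaml -f false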

  5. After the evaluation completes, generate summary CSV files for a comprehensive overview. Again, ensure you are in the CNVbenchmarkeR2 directory and run the summaryEvaluate script:
Rscript evaluate_parameters/summaryEvaluate.R [-t tools_file] [-d datasets_file]

Each CSV file will contain a summary for every parameter in each dataset, stored in the following path: evaluate_parameters/tool/dataset/parameter/results-dataset_tool_param.csv.
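
For example, one of these summaries could be loaded into R as follows (the tool, dataset and parameter names below are purely hypothetical placeholders):

# Hypothetical path; replace myTool, myDataset and myParam with your own names
res <- read.csv("evaluate_parameters/myTool/myDataset/myParam/results-myDataset_myTool_myParam.csv")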
