diff --git a/v1.0/Base.yml b/v1.0/Base.yml new file mode 100644 index 00000000..18f715d0 --- /dev/null +++ b/v1.0/Base.yml @@ -0,0 +1,484 @@ +$base: "https://w3id.org/cwl/cwl#" + +$namespaces: + cwl: "https://w3id.org/cwl/cwl#" + sld: "https://w3id.org/cwl/salad#" + +$graph: + +- name: CWLType + type: enum + extends: "sld:PrimitiveType" + symbols: + - cwl:File + - cwl:Directory + doc: + - "Extends primitive types with the concept of a file and directory as a builtin type." + - "File: A File object" + - "Directory: A Directory object" + +- name: CWLArraySchema + type: record + extends: "sld:ArraySchema" + fields: + items: + type: + - PrimitiveType + - CWLRecordSchema + - EnumSchema + - CWLArraySchema + - string + - type: array + items: + - PrimitiveType + - CWLRecordSchema + - EnumSchema + - CWLArraySchema + - string + jsonldPredicate: + _id: "sld:items" + _type: "@vocab" + refScope: 2 + doc: "Defines the type of the array elements." + +- name: CWLRecordField + type: record + extends: "sld:RecordField" + fields: + - name: type + type: + - PrimitiveType + - CWLRecordSchema + - EnumSchema + - CWLArraySchema + - string + - type: array + items: + - PrimitiveType + - CWLRecordSchema + - EnumSchema + - CWLArraySchema + - string + jsonldPredicate: + _id: sld:type + _type: "@vocab" + typeDSL: true + refScope: 2 + doc: | + The field type + +- name: CWLRecordSchema + type: record + extends: "sld:RecordSchema" + fields: + fields: + type: CWLRecordField[]? + jsonldPredicate: + _id: sld:fields + mapSubject: name + mapPredicate: type + doc: "Defines the fields of the record." + +- name: File + type: record + docParent: "#CWLType" + doc: | + Represents a file (or group of files when `secondaryFiles` is provided) that + will be accessible by tools using standard POSIX file system call API such as + open(2) and read(2). + + Files are represented as objects with `class` of `File`. File objects have + a number of properties that provide metadata about the file. 
+ + The `location` property of a File is a URI that uniquely identifies the + file. Implementations must support the file:// URI scheme and may support + other schemes such as http://. The value of `location` may also be a + relative reference, in which case it must be resolved relative to the URI + of the document it appears in. Alternately to `location`, implementations + must also accept the `path` property on File, which must be a filesystem + path available on the same host as the CWL runner (for inputs) or the + runtime environment of a command line tool execution (for command line tool + outputs). + + If no `location` or `path` is specified, a file object must specify + `contents` with the UTF-8 text content of the file. This is a "file + literal". File literals do not correspond to external resources, but are + created on disk with `contents` with when needed for a executing a tool. + Where appropriate, expressions can return file literals to define new files + on a runtime. The maximum size of `contents` is 64 kilobytes. + + The `basename` property defines the filename on disk where the file is + staged. This may differ from the resource name. If not provided, + `basename` must be computed from the last path part of `location` and made + available to expressions. + + The `secondaryFiles` property is a list of File or Directory objects that + must be staged in the same directory as the primary file. It is an error + for file names to be duplicated in `secondaryFiles`. + + The `size` property is the size in bytes of the File. It must be computed + from the resource and made available to expressions. The `checksum` field + contains a cryptographic hash of the file content for use it verifying file + contents. Implementations may, at user option, enable or disable + computation of the `checksum` field for performance or other reasons. + However, the ability to compute output checksums is required to pass the + CWL conformance test suite. 
+ + When executing a CommandLineTool, the files and secondary files may be + staged to an arbitrary directory, but must use the value of `basename` for + the filename. The `path` property must be file path in the context of the + tool execution runtime (local to the compute node, or within the executing + container). All computed properties should be available to expressions. + File literals also must be staged and `path` must be set. + + When collecting CommandLineTool outputs, `glob` matching returns file paths + (with the `path` property) and the derived properties. This can all be + modified by `outputEval`. Alternately, if the file `cwl.output.json` is + present in the output, `outputBinding` is ignored. + + File objects in the output must provide either a `location` URI or a `path` + property in the context of the tool execution runtime (local to the compute + node, or within the executing container). + + When evaluating an ExpressionTool, file objects must be referenced via + `location` (the expression tool does not have access to files on disk so + `path` is meaningless) or as file literals. It is legal to return a file + object with an existing `location` but a different `basename`. The + `loadContents` field of ExpressionTool inputs behaves the same as on + CommandLineTool inputs, however it is not meaningful on the outputs. + + An ExpressionTool may forward file references from input to output by using + the same value for `location`. + + fields: + - name: class + type: + type: enum + name: File_class + symbols: + - cwl:File + jsonldPredicate: + _id: "@type" + _type: "@vocab" + doc: Must be `File` to indicate this object describes a file. + - name: location + type: string? + doc: | + An IRI that identifies the file resource. This may be a relative + reference, in which case it must be resolved using the base IRI of the + document. The location may refer to a local or remote resource; the + implementation must use the IRI to retrieve file content. 
If an + implementation is unable to retrieve the file content stored at a + remote resource (due to unsupported protocol, access denied, or other + issue) it must signal an error. + + If the `location` field is not provided, the `contents` field must be + provided. The implementation must assign a unique identifier for + the `location` field. + + If the `path` field is provided but the `location` field is not, an + implementation may assign the value of the `path` field to `location`, + then follow the rules above. + jsonldPredicate: + _id: "@id" + _type: "@id" + - name: path + type: string? + doc: | + The local host path where the File is available when a CommandLineTool is + executed. This field must be set by the implementation. The final + path component must match the value of `basename`. This field + must not be used in any other context. The command line tool being + executed must be able to to access the file at `path` using the POSIX + `open(2)` syscall. + + As a special case, if the `path` field is provided but the `location` + field is not, an implementation may assign the value of the `path` + field to `location`, and remove the `path` field. + + If the `path` contains [POSIX shell metacharacters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02) + (`|`,`&`, `;`, `<`, `>`, `(`,`)`, `$`,`` ` ``, `\`, `"`, `'`, + `<space>`, `<tab>`, and `<newline>`) or characters + [not allowed](http://www.iana.org/assignments/idna-tables-6.3.0/idna-tables-6.3.0.xhtml) + for [Internationalized Domain Names for Applications](https://tools.ietf.org/html/rfc6452) + then implementations may terminate the process with a + `permanentFailure`. + jsonldPredicate: + "_id": "cwl:path" + "_type": "@id" + - name: basename + type: string? + doc: | + The base name of the file, that is, the name of the file without any + leading directory path. The base name must not contain a slash `/`. 
+ + If not provided, the implementation must set this field based on the + `location` field by taking the final path component after parsing + `location` as an IRI. If `basename` is provided, it is not required to + match the value from `location`. + + When this file is made available to a CommandLineTool, it must be named + with `basename`, i.e. the final component of the `path` field must match + `basename`. + jsonldPredicate: "cwl:basename" + - name: dirname + type: string? + doc: | + The name of the directory containing file, that is, the path leading up + to the final slash in the path such that `dirname + '/' + basename == + path`. + + The implementation must set this field based on the value of `path` + prior to evaluating parameter references or expressions in a + CommandLineTool document. This field must not be used in any other + context. + - name: nameroot + type: string? + doc: | + The basename root such that `nameroot + nameext == basename`, and + `nameext` is empty or begins with a period and contains at most one + period. For the purposess of path splitting leading periods on the + basename are ignored; a basename of `.cshrc` will have a nameroot of + `.cshrc`. + + The implementation must set this field automatically based on the value + of `basename` prior to evaluating parameter references or expressions. + - name: nameext + type: string? + doc: | + The basename extension such that `nameroot + nameext == basename`, and + `nameext` is empty or begins with a period and contains at most one + period. Leading periods on the basename are ignored; a basename of + `.cshrc` will have an empty `nameext`. + + The implementation must set this field automatically based on the value + of `basename` prior to evaluating parameter references or expressions. + - name: checksum + type: string? + doc: | + Optional hash code for validating file integrity. Currently must be in the form + "sha1$ + hexadecimal string" using the SHA-1 algorithm. 
+ - name: size + type: + - "null" + - int + - long + doc: Optional file size + - name: "secondaryFiles" + type: + - "null" + - type: array + items: [File, Directory] + jsonldPredicate: "cwl:secondaryFiles" + doc: | + A list of additional files or directories that are associated with the + primary file and must be transferred alongside the primary file. + Examples include indexes of the primary file, or external references + which must be included when loading primary document. A file object + listed in `secondaryFiles` may itself include `secondaryFiles` for + which the same rules apply. + - name: format + type: string? + jsonldPredicate: + _id: cwl:format + _type: "@id" + identity: true + noLinkCheck: true + doc: | + The format of the file: this must be an IRI of a concept node that + represents the file format, preferrably defined within an ontology. + If no ontology is available, file formats may be tested by exact match. + + Reasoning about format compatability must be done by checking that an + input file format is the same, `owl:equivalentClass` or + `rdfs:subClassOf` the format required by the input parameter. + `owl:equivalentClass` is transitive with `rdfs:subClassOf`, e.g. if + `<B> owl:equivalentClass <C>` and `<B> owl:subclassOf <A>` then infer + `<C> owl:subclassOf <A>`. + + File format ontologies may be provided in the "$schemas" metadata at the + root of the document. If no ontologies are specified in `$schemas`, the + runtime may perform exact file format matches. + - name: contents + type: string? + doc: | + File contents literal. Maximum of 64 KiB. + + If neither `location` nor `path` is provided, `contents` must be + non-null. The implementation must assign a unique identifier for the + `location` field. When the file is staged as input to CommandLineTool, + the value of `contents` must be written to a file. 
+ + If `loadContents` of `inputBinding` or `outputBinding` is true and + `location` is valid, the implementation must read up to the first 64 + KiB of text from the file and place it in the "contents" field. + + +- name: Directory + type: record + docAfter: "#File" + doc: | + Represents a directory to present to a command line tool. + + Directories are represented as objects with `class` of `Directory`. Directory objects have + a number of properties that provide metadata about the directory. + + The `location` property of a Directory is a URI that uniquely identifies + the directory. Implementations must support the file:// URI scheme and may + support other schemes such as http://. Alternately to `location`, + implementations must also accept the `path` property on Directory, which + must be a filesystem path available on the same host as the CWL runner (for + inputs) or the runtime environment of a command line tool execution (for + command line tool outputs). + + A Directory object may have a `listing` field. This is a list of File and + Directory objects that are contained in the Directory. For each entry in + `listing`, the `basename` property defines the name of the File or + Subdirectory when staged to disk. If `listing` is not provided, the + implementation must have some way of fetching the Directory listing at + runtime based on the `location` field. + + If a Directory does not have `location`, it is a Directory literal. A + Directory literal must provide `listing`. Directory literals must be + created on disk at runtime as needed. + + The resources in a Directory literal do not need to have any implied + relationship in their `location`. For example, a Directory listing may + contain two files located on different hosts. It is the responsibility of + the runtime to ensure that those files are staged to disk appropriately. + Secondary files associated with files in `listing` must also be staged to + the same Directory. 
+ + When executing a CommandLineTool, Directories must be recursively staged + first and have local values of `path` assigend. + + Directory objects in CommandLineTool output must provide either a + `location` URI or a `path` property in the context of the tool execution + runtime (local to the compute node, or within the executing container). + + An ExpressionTool may forward file references from input to output by using + the same value for `location`. + + Name conflicts (the same `basename` appearing multiple times in `listing` + or in any entry in `secondaryFiles` in the listing) is a fatal error. + + fields: + - name: class + type: + type: enum + name: Directory_class + symbols: + - cwl:Directory + jsonldPredicate: + _id: "@type" + _type: "@vocab" + doc: Must be `Directory` to indicate this object describes a Directory. + - name: location + type: string? + doc: | + An IRI that identifies the directory resource. This may be a relative + reference, in which case it must be resolved using the base IRI of the + document. The location may refer to a local or remote resource. If + the `listing` field is not set, the implementation must use the + location IRI to retrieve directory listing. If an implementation is + unable to retrieve the directory listing stored at a remote resource (due to + unsupported protocol, access denied, or other issue) it must signal an + error. + + If the `location` field is not provided, the `listing` field must be + provided. The implementation must assign a unique identifier for + the `location` field. + + If the `path` field is provided but the `location` field is not, an + implementation may assign the value of the `path` field to `location`, + then follow the rules above. + jsonldPredicate: + _id: "@id" + _type: "@id" + - name: path + type: string? + doc: | + The local path where the Directory is made available prior to executing a + CommandLineTool. This must be set by the implementation. 
This field + must not be used in any other context. The command line tool being + executed must be able to to access the directory at `path` using the POSIX + `opendir(2)` syscall. + + If the `path` contains [POSIX shell metacharacters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02) + (`|`,`&`, `;`, `<`, `>`, `(`,`)`, `$`,`` ` ``, `\`, `"`, `'`, + `<space>`, `<tab>`, and `<newline>`) or characters + [not allowed](http://www.iana.org/assignments/idna-tables-6.3.0/idna-tables-6.3.0.xhtml) + for [Internationalized Domain Names for Applications](https://tools.ietf.org/html/rfc6452) + then implementations may terminate the process with a + `permanentFailure`. + jsonldPredicate: + _id: "cwl:path" + _type: "@id" + - name: basename + type: string? + doc: | + The base name of the directory, that is, the name of the file without any + leading directory path. The base name must not contain a slash `/`. + + If not provided, the implementation must set this field based on the + `location` field by taking the final path component after parsing + `location` as an IRI. If `basename` is provided, it is not required to + match the value from `location`. + + When this file is made available to a CommandLineTool, it must be named + with `basename`, i.e. the final component of the `path` field must match + `basename`. + jsonldPredicate: "cwl:basename" + - name: listing + type: + - "null" + - type: array + items: [File, Directory] + doc: | + List of files or subdirectories contained in this directory. The name + of each file or subdirectory is determined by the `basename` field of + each `File` or `Directory` object. It is an error if a `File` shares a + `basename` with any other entry in `listing`. If two or more + `Directory` object share the same `basename`, this must be treated as + equivalent to a single subdirectory with the listings recursively + merged. 
+ jsonldPredicate: + _id: "cwl:listing" + + +- name: CWLObjectType + type: union + names: + - boolean + - int + - long + - float + - double + - string + - File + - Directory + - type: array + items: + - "null" + - CWLObjectType + - type: map + values: + - "null" + - CWLObjectType + doc: | + Generic type representing a valid CWL object. It is used to represent + `default` values passed to CWL `InputParameter` and `WorkflowStepInput` + record fields. + + +- name: CWLInputFile + type: map + values: + - "null" + - CWLObjectType + doc: | + Type representing a valid CWL input file as a `map`. + jsonldPredicate: + _id: "cwl:inputfile" + _container: "@list" + noLinkCheck: true diff --git a/v1.0/CommandLineTool.yml b/v1.0/CommandLineTool.yml index e2aad5bd..5b9eb0f2 100644 --- a/v1.0/CommandLineTool.yml +++ b/v1.0/CommandLineTool.yml @@ -193,12 +193,14 @@ $graph: fields: - name: position type: int? + default: 0 doc: "The sorting key. Default position is 0." - name: prefix type: string? doc: "Command line prefix to add before the value." - name: separate type: boolean? + default: true doc: | If true (default), then the prefix and value must be added as separate command line arguments; if false, prefix and value must be concatenated @@ -231,6 +233,7 @@ $graph: - name: shellQuote type: boolean? + default: true doc: | If `ShellCommandRequirement` is in the requirements for the current command, this controls whether the value is quoted on the command line (default is true). @@ -549,7 +552,11 @@ $graph: jsonldPredicate: "_id": "@type" "_type": "@vocab" - type: string + type: + type: enum + name: CommandLineTool_class + symbols: + - cwl:CommandLineTool - name: baseCommand doc: | Specifies the program to execute. 
If an array, the first element of @@ -674,7 +681,11 @@ $graph: fields: - name: class - type: string + type: + type: enum + name: DockerRequirement_class + symbols: + - cwl:DockerRequirement doc: "Always 'DockerRequirement'" jsonldPredicate: "_id": "@type" @@ -713,7 +724,11 @@ $graph: the defined process. fields: - name: class - type: string + type: + type: enum + name: SoftwareRequirement_class + symbols: + - cwl:SoftwareRequirement doc: "Always 'SoftwareRequirement'" jsonldPredicate: "_id": "@type" @@ -741,6 +756,7 @@ $graph: compatible. - name: specs type: string[]? + jsonldPredicate: {_type: "@id", noLinkCheck: true} doc: | One or more [IRI](https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier)s identifying resources for installing or enabling the software named in @@ -821,6 +837,7 @@ $graph: file. - name: writable type: boolean? + default: false doc: | If true, the file or directory must be writable by the tool. Changes to the file or directory must be isolated and not visible by any other @@ -841,7 +858,11 @@ $graph: command line tool. fields: - name: class - type: string + type: + type: enum + name: InitialWorkDirRequirement_class + symbols: + - cwl:InitialWorkDirRequirement doc: InitialWorkDirRequirement jsonldPredicate: "_id": "@type" @@ -877,7 +898,11 @@ $graph: execution environment of the tool. See `EnvironmentDef` for details. fields: - name: class - type: string + type: + type: enum + name: EnvVarRequirement_class + symbols: + - cwl:EnvVarRequirement doc: "Always 'EnvVarRequirement'" jsonldPredicate: "_id": "@type" @@ -903,7 +928,11 @@ $graph: the use of shell metacharacters such as `|` for pipes. 
fields: - name: class - type: string + type: + type: enum + name: ShellCommandRequirement_class + symbols: + - cwl:ShellCommandRequirement doc: "Always 'ShellCommandRequirement'" jsonldPredicate: "_id": "@type" @@ -937,13 +966,17 @@ $graph: fields: - name: class - type: string + type: + type: enum + name: ResourceRequirement_class + symbols: + - cwl:ResourceRequirement doc: "Always 'ResourceRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: coresMin - type: ["null", long, string, Expression] + type: ["null", int, long, string, Expression] doc: Minimum reserved number of CPU cores - name: coresMax @@ -951,11 +984,11 @@ $graph: doc: Maximum reserved number of CPU cores - name: ramMin - type: ["null", long, string, Expression] + type: ["null", int, long, string, Expression] doc: Minimum reserved RAM in mebibytes (2**20) - name: ramMax - type: ["null", long, string, Expression] + type: ["null", int, long, string, Expression] doc: Maximum reserved RAM in mebibytes (2**20) - name: tmpdirMin @@ -963,13 +996,13 @@ $graph: doc: Minimum reserved filesystem based storage for the designated temporary directory, in mebibytes (2**20) - name: tmpdirMax - type: ["null", long, string, Expression] + type: ["null", int, long, string, Expression] doc: Maximum reserved filesystem based storage for the designated temporary directory, in mebibytes (2**20) - name: outdirMin - type: ["null", long, string, Expression] + type: ["null", int, long, string, Expression] doc: Minimum reserved filesystem based storage for the designated output directory, in mebibytes (2**20) - name: outdirMax - type: ["null", long, string, Expression] + type: ["null", int, long, string, Expression] doc: Maximum reserved filesystem based storage for the designated output directory, in mebibytes (2**20) diff --git a/v1.0/Process.yml b/v1.0/Process.yml index 73f94361..b0a7ed04 100644 --- a/v1.0/Process.yml +++ b/v1.0/Process.yml @@ -13,6 +13,8 @@ $graph: - $import: 
"salad/schema_salad/metaschema/metaschema_base.yml" +- $import: Base.yml + - name: BaseTypesDoc type: documentation doc: | @@ -38,378 +40,6 @@ $graph: - cwl:v1.0.dev4 - cwl:v1.0 -- name: CWLType - type: enum - extends: "sld:PrimitiveType" - symbols: - - cwl:File - - cwl:Directory - doc: - - "Extends primitive types with the concept of a file and directory as a builtin type." - - "File: A File object" - - "Directory: A Directory object" - -- name: File - type: record - docParent: "#CWLType" - doc: | - Represents a file (or group of files when `secondaryFiles` is provided) that - will be accessible by tools using standard POSIX file system call API such as - open(2) and read(2). - - Files are represented as objects with `class` of `File`. File objects have - a number of properties that provide metadata about the file. - - The `location` property of a File is a URI that uniquely identifies the - file. Implementations must support the file:// URI scheme and may support - other schemes such as http://. The value of `location` may also be a - relative reference, in which case it must be resolved relative to the URI - of the document it appears in. Alternately to `location`, implementations - must also accept the `path` property on File, which must be a filesystem - path available on the same host as the CWL runner (for inputs) or the - runtime environment of a command line tool execution (for command line tool - outputs). - - If no `location` or `path` is specified, a file object must specify - `contents` with the UTF-8 text content of the file. This is a "file - literal". File literals do not correspond to external resources, but are - created on disk with `contents` with when needed for a executing a tool. - Where appropriate, expressions can return file literals to define new files - on a runtime. The maximum size of `contents` is 64 kilobytes. - - The `basename` property defines the filename on disk where the file is - staged. This may differ from the resource name. 
If not provided, - `basename` must be computed from the last path part of `location` and made - available to expressions. - - The `secondaryFiles` property is a list of File or Directory objects that - must be staged in the same directory as the primary file. It is an error - for file names to be duplicated in `secondaryFiles`. - - The `size` property is the size in bytes of the File. It must be computed - from the resource and made available to expressions. The `checksum` field - contains a cryptographic hash of the file content for use it verifying file - contents. Implementations may, at user option, enable or disable - computation of the `checksum` field for performance or other reasons. - However, the ability to compute output checksums is required to pass the - CWL conformance test suite. - - When executing a CommandLineTool, the files and secondary files may be - staged to an arbitrary directory, but must use the value of `basename` for - the filename. The `path` property must be file path in the context of the - tool execution runtime (local to the compute node, or within the executing - container). All computed properties should be available to expressions. - File literals also must be staged and `path` must be set. - - When collecting CommandLineTool outputs, `glob` matching returns file paths - (with the `path` property) and the derived properties. This can all be - modified by `outputEval`. Alternately, if the file `cwl.output.json` is - present in the output, `outputBinding` is ignored. - - File objects in the output must provide either a `location` URI or a `path` - property in the context of the tool execution runtime (local to the compute - node, or within the executing container). - - When evaluating an ExpressionTool, file objects must be referenced via - `location` (the expression tool does not have access to files on disk so - `path` is meaningless) or as file literals. 
It is legal to return a file - object with an existing `location` but a different `basename`. The - `loadContents` field of ExpressionTool inputs behaves the same as on - CommandLineTool inputs, however it is not meaningful on the outputs. - - An ExpressionTool may forward file references from input to output by using - the same value for `location`. - - fields: - - name: class - type: - type: enum - name: File_class - symbols: - - cwl:File - jsonldPredicate: - _id: "@type" - _type: "@vocab" - doc: Must be `File` to indicate this object describes a file. - - name: location - type: string? - doc: | - An IRI that identifies the file resource. This may be a relative - reference, in which case it must be resolved using the base IRI of the - document. The location may refer to a local or remote resource; the - implementation must use the IRI to retrieve file content. If an - implementation is unable to retrieve the file content stored at a - remote resource (due to unsupported protocol, access denied, or other - issue) it must signal an error. - - If the `location` field is not provided, the `contents` field must be - provided. The implementation must assign a unique identifier for - the `location` field. - - If the `path` field is provided but the `location` field is not, an - implementation may assign the value of the `path` field to `location`, - then follow the rules above. - jsonldPredicate: - _id: "@id" - _type: "@id" - - name: path - type: string? - doc: | - The local host path where the File is available when a CommandLineTool is - executed. This field must be set by the implementation. The final - path component must match the value of `basename`. This field - must not be used in any other context. The command line tool being - executed must be able to to access the file at `path` using the POSIX - `open(2)` syscall. 
- - As a special case, if the `path` field is provided but the `location` - field is not, an implementation may assign the value of the `path` - field to `location`, and remove the `path` field. - - If the `path` contains [POSIX shell metacharacters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02) - (`|`,`&`, `;`, `<`, `>`, `(`,`)`, `$`,`` ` ``, `\`, `"`, `'`, - `<space>`, `<tab>`, and `<newline>`) or characters - [not allowed](http://www.iana.org/assignments/idna-tables-6.3.0/idna-tables-6.3.0.xhtml) - for [Internationalized Domain Names for Applications](https://tools.ietf.org/html/rfc6452) - then implementations may terminate the process with a - `permanentFailure`. - jsonldPredicate: - "_id": "cwl:path" - "_type": "@id" - - name: basename - type: string? - doc: | - The base name of the file, that is, the name of the file without any - leading directory path. The base name must not contain a slash `/`. - - If not provided, the implementation must set this field based on the - `location` field by taking the final path component after parsing - `location` as an IRI. If `basename` is provided, it is not required to - match the value from `location`. - - When this file is made available to a CommandLineTool, it must be named - with `basename`, i.e. the final component of the `path` field must match - `basename`. - jsonldPredicate: "cwl:basename" - - name: dirname - type: string? - doc: | - The name of the directory containing file, that is, the path leading up - to the final slash in the path such that `dirname + '/' + basename == - path`. - - The implementation must set this field based on the value of `path` - prior to evaluating parameter references or expressions in a - CommandLineTool document. This field must not be used in any other - context. - - name: nameroot - type: string? - doc: | - The basename root such that `nameroot + nameext == basename`, and - `nameext` is empty or begins with a period and contains at most one - period. 
For the purposess of path splitting leading periods on the - basename are ignored; a basename of `.cshrc` will have a nameroot of - `.cshrc`. - - The implementation must set this field automatically based on the value - of `basename` prior to evaluating parameter references or expressions. - - name: nameext - type: string? - doc: | - The basename extension such that `nameroot + nameext == basename`, and - `nameext` is empty or begins with a period and contains at most one - period. Leading periods on the basename are ignored; a basename of - `.cshrc` will have an empty `nameext`. - - The implementation must set this field automatically based on the value - of `basename` prior to evaluating parameter references or expressions. - - name: checksum - type: string? - doc: | - Optional hash code for validating file integrity. Currently must be in the form - "sha1$ + hexadecimal string" using the SHA-1 algorithm. - - name: size - type: long? - doc: Optional file size - - name: "secondaryFiles" - type: - - "null" - - type: array - items: [File, Directory] - jsonldPredicate: "cwl:secondaryFiles" - doc: | - A list of additional files or directories that are associated with the - primary file and must be transferred alongside the primary file. - Examples include indexes of the primary file, or external references - which must be included when loading primary document. A file object - listed in `secondaryFiles` may itself include `secondaryFiles` for - which the same rules apply. - - name: format - type: string? - jsonldPredicate: - _id: cwl:format - _type: "@id" - identity: true - doc: | - The format of the file: this must be an IRI of a concept node that - represents the file format, preferrably defined within an ontology. - If no ontology is available, file formats may be tested by exact match. 
- - Reasoning about format compatibility must be done by checking that an - input file format is the same, `owl:equivalentClass` or - `rdfs:subClassOf` the format required by the input parameter. - `owl:equivalentClass` is transitive with `rdfs:subClassOf`, e.g. if - ` owl:equivalentClass ` and ` owl:subclassOf ` then infer - ` owl:subclassOf `. - - File format ontologies may be provided in the "$schemas" metadata at the - root of the document. If no ontologies are specified in `$schemas`, the - runtime may perform exact file format matches. - - name: contents - type: string? - doc: | - File contents literal. Maximum of 64 KiB. - - If neither `location` nor `path` is provided, `contents` must be - non-null. The implementation must assign a unique identifier for the - `location` field. When the file is staged as input to CommandLineTool, - the value of `contents` must be written to a file. - - If `loadContents` of `inputBinding` or `outputBinding` is true and - `location` is valid, the implementation must read up to the first 64 - KiB of text from the file and place it in the "contents" field. - - -- name: Directory - type: record - docAfter: "#File" - doc: | - Represents a directory to present to a command line tool. - - Directories are represented as objects with `class` of `Directory`. Directory objects have - a number of properties that provide metadata about the directory. - - The `location` property of a Directory is a URI that uniquely identifies - the directory. Implementations must support the file:// URI scheme and may - support other schemes such as http://. Alternately to `location`, - implementations must also accept the `path` property on Directory, which - must be a filesystem path available on the same host as the CWL runner (for - inputs) or the runtime environment of a command line tool execution (for - command line tool outputs). - - A Directory object may have a `listing` field.
This is a list of File and - Directory objects that are contained in the Directory. For each entry in - `listing`, the `basename` property defines the name of the File or - Subdirectory when staged to disk. If `listing` is not provided, the - implementation must have some way of fetching the Directory listing at - runtime based on the `location` field. - - If a Directory does not have `location`, it is a Directory literal. A - Directory literal must provide `listing`. Directory literals must be - created on disk at runtime as needed. - - The resources in a Directory literal do not need to have any implied - relationship in their `location`. For example, a Directory listing may - contain two files located on different hosts. It is the responsibility of - the runtime to ensure that those files are staged to disk appropriately. - Secondary files associated with files in `listing` must also be staged to - the same Directory. - - When executing a CommandLineTool, Directories must be recursively staged - first and have local values of `path` assigned. - - Directory objects in CommandLineTool output must provide either a - `location` URI or a `path` property in the context of the tool execution - runtime (local to the compute node, or within the executing container). - - An ExpressionTool may forward file references from input to output by using - the same value for `location`. - - A name conflict (the same `basename` appearing multiple times in `listing` - or in any entry in `secondaryFiles` in the listing) is a fatal error. - - fields: - - name: class - type: - type: enum - name: Directory_class - symbols: - - cwl:Directory - jsonldPredicate: - _id: "@type" - _type: "@vocab" - doc: Must be `Directory` to indicate this object describes a Directory. - - name: location - type: string? - doc: | - An IRI that identifies the directory resource. This may be a relative - reference, in which case it must be resolved using the base IRI of the - document.
The location may refer to a local or remote resource. If - the `listing` field is not set, the implementation must use the - location IRI to retrieve the directory listing. If an implementation is - unable to retrieve the directory listing stored at a remote resource (due to - unsupported protocol, access denied, or other issue) it must signal an - error. - - If the `location` field is not provided, the `listing` field must be - provided. The implementation must assign a unique identifier for - the `location` field. - - If the `path` field is provided but the `location` field is not, an - implementation may assign the value of the `path` field to `location`, - then follow the rules above. - jsonldPredicate: - _id: "@id" - _type: "@id" - - name: path - type: string? - doc: | - The local path where the Directory is made available prior to executing a - CommandLineTool. This must be set by the implementation. This field - must not be used in any other context. The command line tool being - executed must be able to access the directory at `path` using the POSIX - `opendir(2)` syscall. - - If the `path` contains [POSIX shell metacharacters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02) - (`|`,`&`, `;`, `<`, `>`, `(`,`)`, `$`,`` ` ``, `\`, `"`, `'`, - ``, ``, and ``) or characters - [not allowed](http://www.iana.org/assignments/idna-tables-6.3.0/idna-tables-6.3.0.xhtml) - for [Internationalized Domain Names for Applications](https://tools.ietf.org/html/rfc6452) - then implementations may terminate the process with a - `permanentFailure`. - jsonldPredicate: - _id: "cwl:path" - _type: "@id" - - name: basename - type: string? - doc: | - The base name of the directory, that is, the name of the directory without any - leading directory path. The base name must not contain a slash `/`.
- - If not provided, the implementation must set this field based on the - `location` field by taking the final path component after parsing - `location` as an IRI. If `basename` is provided, it is not required to - match the value from `location`. - - When this file is made available to a CommandLineTool, it must be named - with `basename`, i.e. the final component of the `path` field must match - `basename`. - jsonldPredicate: "cwl:basename" - - name: listing - type: - - "null" - - type: array - items: [File, Directory] - doc: | - List of files or subdirectories contained in this directory. The name - of each file or subdirectory is determined by the `basename` field of - each `File` or `Directory` object. It is an error if a `File` shares a - `basename` with any other entry in `listing`. If two or more - `Directory` object share the same `basename`, this must be treated as - equivalent to a single subdirectory with the listings recursively - merged. - jsonldPredicate: - _id: "cwl:listing" - name: SchemaBase type: record @@ -474,6 +104,7 @@ $graph: - name: streamable type: boolean? + default: false doc: | Only valid when `type: File` or is an array of `items: File`. 
@@ -536,13 +167,13 @@ $graph: - name: InputRecordField type: record - extends: "sld:RecordField" + extends: CWLRecordField specialize: - - specializeFrom: "sld:RecordSchema" + - specializeFrom: CWLRecordSchema specializeTo: InputRecordSchema - specializeFrom: "sld:EnumSchema" specializeTo: InputEnumSchema - - specializeFrom: "sld:ArraySchema" + - specializeFrom: CWLArraySchema specializeTo: InputArraySchema - specializeFrom: "sld:PrimitiveType" specializeTo: CWLType @@ -558,9 +189,9 @@ $graph: - name: InputRecordSchema type: record - extends: ["sld:RecordSchema", InputSchema] + extends: [CWLRecordSchema, InputSchema] specialize: - - specializeFrom: "sld:RecordField" + - specializeFrom: CWLRecordField specializeTo: InputRecordField fields: - name: name @@ -582,13 +213,13 @@ $graph: - name: InputArraySchema type: record - extends: ["sld:ArraySchema", InputSchema] + extends: [CWLArraySchema, InputSchema] specialize: - - specializeFrom: "sld:RecordSchema" + - specializeFrom: CWLRecordSchema specializeTo: InputRecordSchema - specializeFrom: "sld:EnumSchema" specializeTo: InputEnumSchema - - specializeFrom: "sld:ArraySchema" + - specializeFrom: CWLArraySchema specializeTo: InputArraySchema - specializeFrom: "sld:PrimitiveType" specializeTo: CWLType @@ -600,13 +231,13 @@ $graph: - name: OutputRecordField type: record - extends: "sld:RecordField" + extends: CWLRecordField specialize: - - specializeFrom: "sld:RecordSchema" + - specializeFrom: CWLRecordSchema specializeTo: OutputRecordSchema - specializeFrom: "sld:EnumSchema" specializeTo: OutputEnumSchema - - specializeFrom: "sld:ArraySchema" + - specializeFrom: CWLArraySchema specializeTo: OutputArraySchema - specializeFrom: "sld:PrimitiveType" specializeTo: CWLType @@ -618,10 +249,10 @@ $graph: - name: OutputRecordSchema type: record - extends: ["sld:RecordSchema", "#OutputSchema"] + extends: [CWLRecordSchema, "#OutputSchema"] docParent: "#OutputParameter" specialize: - - specializeFrom: "sld:RecordField" + - 
specializeFrom: CWLRecordField specializeTo: OutputRecordField @@ -636,14 +267,14 @@ $graph: - name: OutputArraySchema type: record - extends: ["sld:ArraySchema", OutputSchema] + extends: [CWLArraySchema, OutputSchema] docParent: "#OutputParameter" specialize: - - specializeFrom: "sld:RecordSchema" + - specializeFrom: CWLRecordSchema specializeTo: OutputRecordSchema - specializeFrom: "sld:EnumSchema" specializeTo: OutputEnumSchema - - specializeFrom: "sld:ArraySchema" + - specializeFrom: CWLArraySchema specializeTo: OutputArraySchema - specializeFrom: "sld:PrimitiveType" specializeTo: CWLType @@ -673,6 +304,7 @@ $graph: _id: cwl:format _type: "@id" identity: true + noLinkCheck: true doc: | Only valid when `type: File` or is an array of `items: File`. @@ -690,9 +322,10 @@ $graph: into a concrete form for execution, such as command line parameters. - name: default - type: Any? + type: CWLObjectType? jsonldPredicate: - _id: cwl:default + _id: "cwl:default" + _container: "@list" noLinkCheck: true doc: | The default value to use for this parameter if the parameter is missing @@ -745,6 +378,7 @@ $graph: _id: cwl:format _type: "@id" identity: true + noLinkCheck: true doc: | Only valid when `type: File` or is an array of `items: File`. @@ -824,7 +458,10 @@ $graph: error and the implementation must not attempt to run the process, unless overridden at user option. - name: hints - type: Any[]? + type: + - "null" + - type: array + items: [ProcessRequirement, Any] doc: | Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this process. It is @@ -860,7 +497,11 @@ $graph: interpolatation. fields: - name: class - type: string + type: + type: enum + name: InlineJavascriptRequirement_class + symbols: + - cwl:InlineJavascriptRequirement doc: "Always 'InlineJavascriptRequirement'" jsonldPredicate: "_id": "@type" @@ -886,7 +527,11 @@ $graph: to earlier schema definitions. 
fields: - name: class - type: string + type: + type: enum + name: SchemaDefRequirement_class + symbols: + - cwl:SchemaDefRequirement doc: "Always 'SchemaDefRequirement'" jsonldPredicate: "_id": "@type" diff --git a/v1.0/Workflow.yml b/v1.0/Workflow.yml index 83fa926e..e410de8d 100644 --- a/v1.0/Workflow.yml +++ b/v1.0/Workflow.yml @@ -137,7 +137,11 @@ $graph: jsonldPredicate: "_id": "@type" "_type": "@vocab" - type: string + type: + type: enum + name: ExpressionTool_class + symbols: + - cwl:ExpressionTool - name: expression type: [string, Expression] doc: | @@ -169,7 +173,7 @@ $graph: jsonldPredicate: "_id": "cwl:outputSource" "_type": "@id" - refScope: 0 + refScope: 1 type: - string? - string[]? @@ -277,13 +281,14 @@ $graph: jsonldPredicate: "@id" doc: "A unique identifier for this workflow input parameter." - name: default - type: ["null", Any] + type: CWLObjectType? doc: | The default value for this parameter to use if either there is no `source` field, or the value produced by the `source` is `null`. The default must be applied prior to scattering or evaluating `valueFrom`. jsonldPredicate: _id: "cwl:default" + _container: "@list" noLinkCheck: true - name: valueFrom type: @@ -305,7 +310,7 @@ $graph: 1. `null` if there is no `source` field 2. the value of the parameter(s) specified in the `source` field when this workflow input parameter **is not** specified in this workflow step's `scatter` field. - 3. an element of the parameter specified in the `source` field when this workflow input + 3. an element of the parameter specified in the `source` field when this workflow input parameter **is** specified in this workflow step's `scatter` field. The value of `inputs` in the parameter reference or expression must be @@ -547,7 +552,11 @@ $graph: jsonldPredicate: "_id": "@type" "_type": "@vocab" - type: string + type: + type: enum + name: Workflow_class + symbols: + - cwl:Workflow - name: steps doc: | The individual steps that make up the workflow. 
Each step is executed when all of its @@ -569,7 +578,11 @@ $graph: the `run` field of [WorkflowStep](#WorkflowStep). fields: - name: "class" - type: "string" + type: + type: enum + name: SubworkflowFeatureRequirement_class + symbols: + - cwl:SubworkflowFeatureRequirement doc: "Always 'SubworkflowFeatureRequirement'" jsonldPredicate: "_id": "@type" @@ -583,7 +596,11 @@ $graph: `scatterMethod` fields of [WorkflowStep](#WorkflowStep). fields: - name: "class" - type: "string" + type: + type: enum + name: ScatterFeatureRequirement_class + symbols: + - cwl:ScatterFeatureRequirement doc: "Always 'ScatterFeatureRequirement'" jsonldPredicate: "_id": "@type" @@ -597,7 +614,11 @@ $graph: listed in the `source` field of [WorkflowStepInput](#WorkflowStepInput). fields: - name: "class" - type: "string" + type: + type: enum + name: MultipleInputFeatureRequirement_class + symbols: + - cwl:MultipleInputFeatureRequirement doc: "Always 'MultipleInputFeatureRequirement'" jsonldPredicate: "_id": "@type" @@ -611,7 +632,11 @@ $graph: of [WorkflowStepInput](#WorkflowStepInput). 
fields: - name: "class" - type: "string" + type: + type: enum + name: StepInputExpressionRequirement_class + symbols: + - cwl:StepInputExpressionRequirement doc: "Always 'StepInputExpressionRequirement'" jsonldPredicate: "_id": "@type" diff --git a/v1.0/extensions.yml b/v1.0/extensions.yml new file mode 100644 index 00000000..b855dd9e --- /dev/null +++ b/v1.0/extensions.yml @@ -0,0 +1,250 @@ +$base: http://commonwl.org/cwltool# +$namespaces: + cwl: "https://w3id.org/cwl/cwl#" + cwltool: "http://commonwl.org/cwltool#" +$graph: +- $import: https://github.com/common-workflow-language/common-workflow-language/raw/codegen/v1.0/CommonWorkflowLanguage.yml + +- name: LoadListingRequirement + type: record + extends: cwl:ProcessRequirement + inVocab: false + fields: + class: + type: string + doc: "Always 'LoadListingRequirement'" + jsonldPredicate: + "_id": "@type" + "_type": "@vocab" + loadListing: + type: + - type: enum + name: LoadListingEnum + symbols: [no_listing, shallow_listing, deep_listing] + +- name: InplaceUpdateRequirement + type: record + inVocab: false + extends: cwl:ProcessRequirement + fields: + class: + type: string + doc: "Always 'InplaceUpdateRequirement'" + jsonldPredicate: + "_id": "@type" + "_type": "@vocab" + inplaceUpdate: + type: boolean + +- name: Secrets + type: record + inVocab: false + extends: cwl:ProcessRequirement + fields: + class: + type: string + doc: "Always 'Secrets'" + jsonldPredicate: + "_id": "@type" + "_type": "@vocab" + secrets: + type: string[] + doc: | + List one or more input parameters that are sensitive (such as passwords) + which will be deliberately obscured from logging. + jsonldPredicate: + "_type": "@id" + refScope: 0 + + +- name: TimeLimit + type: record + inVocab: false + extends: cwl:ProcessRequirement + doc: | + Set an upper limit on the execution time of a CommandLineTool or + ExpressionTool. A tool execution which exceeds the time limit may + be preemptively terminated and considered failed. 
May also be + used by batch systems to make scheduling decisions. + fields: + - name: class + type: string + doc: "Always 'TimeLimit'" + jsonldPredicate: + "_id": "@type" + "_type": "@vocab" + - name: timelimit + type: [long, string] + doc: | + The time limit, in seconds. A time limit of zero means no + time limit. Negative time limits are an error. + + +- name: WorkReuse + type: record + inVocab: false + extends: cwl:ProcessRequirement + doc: | + For implementations that support reusing output from past work (on + the assumption that same code and same input produce same + results), control whether to enable or disable the reuse behavior + for a particular tool or step (to accommodate situations where that + assumption is incorrect). A reused step is not executed but + instead returns the same output as the original execution. + + If `enableReuse` is not specified, correct tools should assume it + is enabled by default. + fields: + - name: class + type: string + doc: "Always 'WorkReuse'" + jsonldPredicate: + "_id": "@type" + "_type": "@vocab" + - name: enableReuse + type: [boolean, string] + #default: true + + +- name: NetworkAccess + type: record + inVocab: false + extends: cwl:ProcessRequirement + doc: | + Indicate whether a process requires outgoing IPv4/IPv6 network + access. Choice of IPv4 or IPv6 is implementation and site + specific, correct tools must support both. + + If `networkAccess` is false or not specified, tools must not + assume network access, except for localhost (the loopback device). + + If `networkAccess` is true, the tool must be able to make outgoing + connections to network resources. Resources may be on a private + subnet or the public Internet. However, implementations and sites + may apply their own security policies to restrict what is + accessible by the tool. + + Enabling network access does not imply a publicly routable IP + address or the ability to accept inbound connections. 
+ + fields: + - name: class + type: string + doc: "Always 'NetworkAccess'" + jsonldPredicate: + "_id": "@type" + "_type": "@vocab" + - name: networkAccess + type: [boolean, string] + +- name: ProcessGenerator + type: record + inVocab: true + extends: cwl:Process + documentRoot: true + fields: + - name: class + jsonldPredicate: + "_id": "@type" + "_type": "@vocab" + type: string + - name: run + type: [string, cwl:Process] + jsonldPredicate: + _id: "cwl:run" + _type: "@id" + doc: | + Specifies the process to run. + +- name: MPIRequirement + type: record + inVocab: false + extends: cwl:ProcessRequirement + doc: | + Indicates that a process requires an MPI runtime. + fields: + - name: class + type: string + doc: "Always 'MPIRequirement'" + jsonldPredicate: + "_id": "@type" + "_type": "@vocab" + - name: processes + type: [int, cwl:Expression] + doc: | + The number of MPI processes to start. If you give a string, + this will be evaluated as a CWL Expression and it must + evaluate to an integer. + +- name: CUDARequirement + type: record + extends: cwl:ProcessRequirement + inVocab: false + doc: | + Require support for NVIDIA CUDA (GPU hardware acceleration). + fields: + class: + type: string + doc: 'cwltool:CUDARequirement' + jsonldPredicate: + _id: "@type" + _type: "@vocab" + cudaVersionMin: + type: string + doc: | + Minimum CUDA version to run the software, in X.Y format. This + corresponds to a CUDA SDK release. When running directly on + the host (not in a container) the host must have a compatible + CUDA SDK (matching the exact version, or, starting with CUDA + 11.3, matching major version). When run in a container, the + container image should provide the CUDA runtime, and the host + driver is injected into the container. In this case, because + CUDA drivers are backwards compatible, it is possible to + use an older SDK with a newer driver across major versions. + + See https://docs.nvidia.com/deploy/cuda-compatibility/ for + details.
+ cudaComputeCapability: + type: + - 'string' + - 'string[]' + doc: | + CUDA hardware capability required to run the software, in X.Y + format. + + * If this is a single value, it defines only the minimum + compute capability. GPUs with higher capability are also + accepted. + + * If it is an array value, then only select GPUs with compute + capabilities that explicitly appear in the array. + cudaDeviceCountMin: + type: ['null', int, cwl:Expression] + default: 1 + doc: | + Minimum number of GPU devices to request. If not specified, + same as `cudaDeviceCountMax`. If neither are specified, + default 1. + cudaDeviceCountMax: + type: ['null', int, cwl:Expression] + doc: | + Maximum number of GPU devices to request. If not specified, + same as `cudaDeviceCountMin`. +- name: ShmSize + type: record + extends: cwl:ProcessRequirement + inVocab: false + fields: + class: + type: string + doc: 'cwltool:ShmSize' + jsonldPredicate: + "_id": "@type" + "_type": "@vocab" + shmSize: + type: string + doc: | + Size of /dev/shm. The format is ``. must be greater + than 0. Unit is optional and can be `b` (bytes), `k` (kilobytes), `m` + (megabytes), or `g` (gigabytes). If you omit the unit, the default is + bytes. If you omit the size entirely, the value is `64m`." diff --git a/v1.0/salad/schema_salad/metaschema/metaschema.yml b/v1.0/salad/schema_salad/metaschema/metaschema.yml index 28b9e662..f696e0ae 100644 --- a/v1.0/salad/schema_salad/metaschema/metaschema.yml +++ b/v1.0/salad/schema_salad/metaschema/metaschema.yml @@ -20,6 +20,7 @@ $graph: - $include: import_include.md - $import: map_res.yml - $import: typedsl_res.yml + - $import: sfdsl_res.yml - name: "Link_Validation" type: documentation @@ -35,10 +36,40 @@ $graph: `noLinkCheck` in the `jsonldPredicate` section of the field schema. 
-- name: "Schema_validation" +- name: "Schema_Validation" type: documentation - doc: "" - + doc: | + # Validating a document against a schema + + To validate a document against the schema, first [apply + preprocessing](#Document_preprocessing), then use the following + algorithm. + + 1. The document root must be an object or a list. If the document root is an + object containing the field `$graph` (which must be a list of + objects), then validation applies to each object in the list. + 2. For each object, attempt to validate as one of the record types + flagged with `documentRoot: true`. + 3. To validate a record, go through `fields` and recursively + validate each field of the object. + 4. For fields with a list of types (type union), go through each + type in the list and recursively validate the type. For the + field to be valid, at least one type in the union must be valid. + 5. Missing fields are considered `null`. To validate, the allowed types + for the field must include `null`. + 6. Primitive types are null, boolean, int, long, float, double, + string. To validate, the value in the document must have one + of these types. For numerics, the value appearing in the + document must fit into the specified type. + 7. To validate an array, the value in the document must be a list, + and each item in the list must recursively validate as a type + in `items`. + 8. To validate an enum, the value in the document must be a string, and + the value must be equal to the short name of one of the values + listed in `symbols`. + 9. As a special case, a field with the `Expression` type validates string values + which contain a CWL parameter reference or expression in the form + `$(...)` or `${...}`. # - name: "JSON_LD_Context" # type: documentation @@ -123,7 +154,7 @@ $graph: then finally `#foo`. The first valid URI in the search order shall be used as the fully resolved value of the identifier.
The value of the refScope field is the specified number of levels from the containing - identifer scope before starting the search, so if `refScope: 2` then + identifier scope before starting the search, so if `refScope: 2` then "baz" and "bar" must be stripped to get the base `#foo` and search `#foo/foo` and the `#foo`. The last scope searched must be the top level scope before determining if the identifier cannot be resolved. @@ -131,7 +162,15 @@ $graph: type: boolean? doc: | Field must be expanded based on the Schema Salad type DSL. - + - name: secondaryFilesDSL + type: boolean? + doc: | + Field must be expanded based on the Schema Salad secondary file DSL. + - name: subscope + type: string? + doc: | + Append the subscope to the current scope when performing + identifier resolution to objects under this field. - name: SpecializeDef type: record @@ -164,24 +203,26 @@ $graph: doc: "The identifier for this type" - name: inVocab type: boolean? + default: true doc: | - By default or if "true", include the short name of this type in the - vocabulary (the keys of the JSON-LD context). If false, do not include - the short name in the vocabulary. + If "true" (the default), include the short name of this type + in the vocabulary. The vocabulary is the set of all symbols (field + names and other identifiers, such as classes and enum values) + which can be used in the document without a namespace prefix. + These are the keys of the JSON-LD context. If false, do not + include the short name in the vocabulary. + + This is useful for specifying schema extensions that will be + included in validation without introducing ambiguity by + introducing non-standard terms into the vocabulary. - name: DocType type: record + extends: Documented abstract: true docParent: "#Schema" fields: - - name: doc - type: - - string? - - string[]? - doc: "A documentation string for this type, or an array of strings which should be concatenated."
- jsonldPredicate: "rdfs:comment" - - name: docParent type: string? doc: | @@ -233,6 +274,7 @@ $graph: doc: | If true, indicates that the type is a valid at the document root. At least one type in a schema must be tagged with `documentRoot: true`. + jsonldPredicate: sld:documentRoot - name: SaladRecordField @@ -247,6 +289,13 @@ $graph: doc: | Annotate this type with linked data context. jsonldPredicate: "sld:jsonldPredicate" + - name: default + type: Any? + jsonldPredicate: + _id: sld:default + noLinkCheck: true + doc: | + The default value to use for this field if the field is missing or "null". - name: SaladRecordSchema @@ -261,7 +310,8 @@ $graph: type: boolean? doc: | If true, this record is abstract and may be used as a base for other - records, but is not valid on its own. + records, but is not valid on its own. Inherited fields may be + re-specified to narrow their type. - name: extends type: @@ -273,7 +323,7 @@ $graph: refScope: 1 doc: | Indicates that this record inherits fields from one or more base records. - + Inherited fields may be re-specified to narrow their type. - name: specialize type: - SpecializeDef[]? @@ -290,7 +340,7 @@ $graph: - name: SaladEnumSchema docParent: "#Schema" type: record - extends: [EnumSchema, SchemaDefinedType] + extends: [NamedType, EnumSchema, SchemaDefinedType] documentRoot: true doc: | Define an enumerated type. @@ -307,6 +357,31 @@ $graph: Indicates that this enum inherits symbols from a base enum. +- name: SaladMapSchema + docParent: "#Schema" + type: record + extends: [NamedType, MapSchema, SchemaDefinedType] + documentRoot: true + doc: | + Define a map type. + + +- name: SaladUnionSchema + docParent: "#Schema" + type: record + extends: [NamedType, UnionSchema, DocType] + documentRoot: true + doc: | + Define a union type. + fields: + - name: documentRoot + type: boolean? + doc: | + If true, indicates that the type is a valid at the document root. At + least one type in a schema must be tagged with `documentRoot: true`. 
+ jsonldPredicate: sld:documentRoot + + - name: Documentation type: record docParent: "#Schema" @@ -319,8 +394,8 @@ $graph: - name: type doc: "Must be `documentation`" type: - name: Documentation_symbol type: enum + name: Documentation_name symbols: - "sld:documentation" jsonldPredicate: diff --git a/v1.0/salad/schema_salad/metaschema/metaschema_base.yml b/v1.0/salad/schema_salad/metaschema/metaschema_base.yml index d8bf0a3c..3bdf6390 100644 --- a/v1.0/salad/schema_salad/metaschema/metaschema_base.yml +++ b/v1.0/salad/schema_salad/metaschema/metaschema_base.yml @@ -14,6 +14,19 @@ $graph: doc: | # Schema +- name: Documented + type: record + abstract: true + docParent: "#Schema" + fields: + - name: doc + type: + - string? + - string[]? + doc: "A documentation string for this object, or an array of strings which should be concatenated." + jsonldPredicate: "rdfs:comment" + + - name: PrimitiveType type: enum symbols: @@ -26,8 +39,9 @@ $graph: - "xsd:string" doc: - | - Salad data types are based on Avro schema declarations. Refer to the - [Avro schema declaration documentation](https://avro.apache.org/docs/current/spec.html#schemas) for + Names of salad data types (based on Avro schema declarations). + + Refer to the [Avro schema declaration documentation](https://avro.apache.org/docs/current/spec.html#schemas) for detailed information. - "null: no value" - "boolean: a binary value" @@ -48,6 +62,7 @@ $graph: - name: RecordField type: record + extends: Documented doc: A field of a record. fields: - name: name @@ -56,18 +71,14 @@ $graph: doc: | The name of the field - - name: doc - type: string? 
- doc: | - A documentation string for this field - jsonldPredicate: "rdfs:comment" - - name: type type: - PrimitiveType - RecordSchema - EnumSchema - ArraySchema + - MapSchema + - UnionSchema - string - type: array items: @@ -75,6 +86,8 @@ $graph: - RecordSchema - EnumSchema - ArraySchema + - MapSchema + - UnionSchema - string jsonldPredicate: _id: sld:type @@ -82,7 +95,9 @@ $graph: typeDSL: true refScope: 2 doc: | - The field type + The field type. If it is an array, it indicates + that the field type is a union type of its elements. + Its elements may be duplicated. - name: RecordSchema @@ -91,8 +106,8 @@ $graph: type: doc: "Must be `record`" type: - name: Record_symbol type: enum + name: Record_name symbols: - "sld:record" jsonldPredicate: @@ -117,8 +132,8 @@ $graph: type: doc: "Must be `enum`" type: - name: Enum_symbol type: enum + name: Enum_name symbols: - "sld:enum" jsonldPredicate: @@ -126,6 +141,9 @@ $graph: _type: "@vocab" typeDSL: true refScope: 2 + name: + type: string? + jsonldPredicate: "@id" symbols: type: string[] jsonldPredicate: @@ -141,8 +159,8 @@ $graph: type: doc: "Must be `array`" type: - name: Array_symbol type: enum + name: Array_name symbols: - "sld:array" jsonldPredicate: @@ -156,6 +174,8 @@ $graph: - RecordSchema - EnumSchema - ArraySchema + - MapSchema + - UnionSchema - string - type: array items: @@ -163,9 +183,91 @@ $graph: - RecordSchema - EnumSchema - ArraySchema + - MapSchema + - UnionSchema - string jsonldPredicate: _id: "sld:items" _type: "@vocab" refScope: 2 doc: "Defines the type of the array elements." 
+ + +- name: MapSchema + type: record + fields: + type: + doc: "Must be `map`" + type: + type: enum + name: Map_name + symbols: + - "sld:map" + jsonldPredicate: + _id: "sld:type" + _type: "@vocab" + typeDSL: true + refScope: 2 + values: + type: + - PrimitiveType + - RecordSchema + - EnumSchema + - ArraySchema + - MapSchema + - UnionSchema + - string + - type: array + items: + - PrimitiveType + - RecordSchema + - EnumSchema + - ArraySchema + - MapSchema + - UnionSchema + - string + jsonldPredicate: + _id: "sld:values" + _type: "@vocab" + refScope: 2 + doc: "Defines the type of the map elements." + + +- name: UnionSchema + type: record + fields: + type: + doc: "Must be `union`" + type: + type: enum + name: Union_name + symbols: + - "sld:union" + jsonldPredicate: + _id: "sld:type" + _type: "@vocab" + typeDSL: true + refScope: 2 + names: + type: + - PrimitiveType + - RecordSchema + - EnumSchema + - ArraySchema + - MapSchema + - UnionSchema + - string + - type: array + items: + - PrimitiveType + - RecordSchema + - EnumSchema + - ArraySchema + - MapSchema + - UnionSchema + - string + jsonldPredicate: + _id: "sld:names" + _type: "@vocab" + refScope: 2 + doc: "Defines the type of the union elements." diff --git a/v1.0/salad/schema_salad/metaschema/salad.md b/v1.0/salad/schema_salad/metaschema/salad.md index 5fcd2beb..7c53c997 100644 --- a/v1.0/salad/schema_salad/metaschema/salad.md +++ b/v1.0/salad/schema_salad/metaschema/salad.md @@ -2,13 +2,15 @@ Author: -* Peter Amstutz, Curoverse (now ) +* Peter Amstutz , Curii Corporation Contributors: * The developers of Apache Avro * The developers of JSON-LD * Nebojša Tijanić , Seven Bridges Genomics +* Michael R. Crusoe, ELIXIR-DE +* Iacopo Colonnelli, University of Torino # Abstract @@ -70,17 +72,38 @@ and RDF schema, and production of RDF triples by applying the JSON-LD context. The schema language also provides for robust support of inline documentation. 
-## Introduction to v1.0 +## Introduction to v1.1 -This is the second version of of the Schema Salad specification. It is -developed concurrently with v1.0 of the Common Workflow Language for use in +This is the third version of the Schema Salad specification. It is +developed concurrently with v1.1 of the Common Workflow Language for use in specifying the Common Workflow Language, however Schema Salad is intended to be -useful to a broader audience. Compared to the draft-1 schema salad +useful to a broader audience. Compared to the v1.0 schema salad specification, the following changes have been made: -* Use of [mapSubject and mapPredicate](#Identifier_maps) to transform maps to lists of records. -* Resolution of the [domain Specific Language for types](#Domain_Specific_Language_for_types) -* Consolidation of the formal [schema into section 5](#Schema). +* Support for `default` values on record fields to specify default values +* Add subscoped fields (fields which introduce a new inner scope for identifiers) +* Add the *inVocab* flag (default true) to indicate if a type is added to the vocabulary of well known terms or must be prefixed +* Add *secondaryFilesDSL* micro DSL (domain specific language) to convert text strings to a secondaryFiles record type used in CWL +* The `$mixin` feature has been removed from the specification, as it + is poorly documented, not included in conformance testing, + and not widely supported. + +## Introduction to v1.2 + +This is the fourth version of the Schema Salad specification. It was created to +ease the development of extensions to CWL v1.2. The only change is that +inherited records can narrow the types of fields if those fields are re-specified +with a matching jsonldPredicate. + +## Introduction to v1.3 + +This is the fifth version of the Schema Salad specification. It was created to +enhance code generation by representing CWL data types as specific Python objects +(instead of relying on the generic `Any` type). 
The following changes have been made: + +* Support for the Avro `map` schema +* Add named versions of the `map` and `union` Avro types +* Support for nested named `union` type definitions ## References to Other Specifications @@ -88,7 +111,7 @@ specification, the following changes have been made: **JSON Linked Data (JSON-LD)**: http://json-ld.org -**YAML**: http://yaml.org +**YAML**: https://yaml.org/spec/1.2/spec.html **Avro**: https://avro.apache.org/docs/current/spec.html @@ -109,7 +132,7 @@ the behavior of conforming implementations. The terminology used to describe Salad documents is defined in the Concepts section of the specification. The terms defined in the following list are -used in building those definitions and in describing the actions of an +used in building those definitions and in describing the actions of a Salad implementation: **may**: Conforming Salad documents and Salad implementations are permitted but @@ -157,11 +180,19 @@ by a document schema, where each term maps to absolute URI. ## Syntax -Conforming Salad documents are serialized and loaded using YAML syntax and -UTF-8 text encoding. Salad documents are written using the JSON-compatible -subset of YAML. Features of YAML such as headers and type tags that are -not found in the standard JSON data model must not be used in conforming -Salad documents. It is a fatal error if the document is not valid YAML. +Conforming Salad v1.1 documents are serialized and loaded using a +subset of YAML 1.2 syntax and UTF-8 text encoding. Salad documents +are written using the [JSON-compatible subset of YAML described in +section 10.2](https://yaml.org/spec/1.2/spec.html#id2803231). The +following features of YAML must not be used in conforming Salad +documents: + +* Use of explicit node tags with leading `!` or `!!` +* Use of anchors with leading `&` and aliases with leading `*` +* %YAML directives +* %TAG directives + +It is a fatal error if the document is not valid YAML. 
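The list of disallowed YAML features above can be approximated with a line-based scan. The following is a rough sketch only, not part of the specification and not how any particular Salad implementation works: the function name `find_forbidden_yaml` and the regular expressions are hypothetical, the scan is a heuristic (it can misreport text inside block scalars or unusual plain scalars), and it does not check that the document is valid YAML. A real implementation would more likely inspect the token stream of a YAML parser.

```python
import re
from typing import List, Tuple

# Directives (%YAML, %TAG) start in the first column of a line.
_DIRECTIVE = re.compile(r"%(?:YAML|TAG)\s")
# Quoted scalars are blanked out so characters inside them are not flagged.
_QUOTED = re.compile(r'"(?:\\.|[^"\\])*"|\'(?:\'\'|[^\'])*\'')
# A comment starts at '#' when it begins the line or follows whitespace.
_COMMENT = re.compile(r"(?:^|\s)#.*$")
# Assumption: a node starts at the beginning of a line, after "key: ",
# after a "- " sequence marker, or after a flow indicator ([ { ,).
_NODE_START = r"(?:^\s*|:\s+|-\s+|[\[{,]\s*)"
_NODE_CHECKS = [
    ("tag", re.compile(_NODE_START + r"!\S*")),
    ("anchor", re.compile(_NODE_START + r"&[^\s\[\]{},]+")),
    ("alias", re.compile(_NODE_START + r"\*[^\s\[\]{},]+")),
]


def find_forbidden_yaml(text: str) -> List[Tuple[int, str]]:
    """Return (line number, feature) pairs for constructs that look disallowed."""
    findings: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if _DIRECTIVE.match(line):
            findings.append((lineno, "directive"))
            continue
        # Drop quoted scalars first, then any trailing comment.
        stripped = _COMMENT.sub("", _QUOTED.sub('""', line))
        for label, pattern in _NODE_CHECKS:
            if pattern.search(stripped):
                findings.append((lineno, label))
    return findings
```

An empty result means the heuristic found nothing suspicious; it is not proof that the document conforms.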
A Salad document must consist only of either a single root object or an array of objects. @@ -223,9 +254,9 @@ document schema. A schema may consist of: * Any number of documentation objects which allow in-line documentation of the schema. The schema for defining a salad schema (the metaschema) is described in -detail in "Schema validation". +detail in the [Schema](#Schema) section. -### Record field annotations +## Record field annotations In a document schema, record field definitions may include the field `jsonldPredicate`, which may be either a string or object. Implementations @@ -235,21 +266,57 @@ rules: * If the value of `jsonldPredicate` is `@id`, the field is an identifier field. - * If the value of `jsonldPredicate` is an object, and contains that - object contains the field `_type` with the value `@id`, the field is a - link field. + * If the value of `jsonldPredicate` is an object, and that + object contains the field `_type` with the value `@id`, the + field is a link field. If the field `jsonldPredicate` also + has the field `identity` with the value `true`, the field is + resolved with [identifier resolution](#Identifier_resolution). + Otherwise it is resolved with [link resolution](#Link_resolution). - * If the value of `jsonldPredicate` is an object, and contains that - object contains the field `_type` with the value `@vocab`, the field is a - vocabulary field, which is a subtype of link field. + * If the value of `jsonldPredicate` is an object which contains the + field `_type` with the value `@vocab`, the field value is subject to + [vocabulary resolution](#Vocabulary_resolution). ## Document traversal -To perform document document preprocessing, link validation and schema +To perform document preprocessing, link validation and schema validation, the document must be traversed starting from the fields or array items of the root object or array and recursively visiting each child item which contains an object or arrays. 
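The traversal described above can be sketched as a recursive generator. This is a minimal illustration, assuming the document has already been loaded into plain Python dicts and lists; the name `traverse`, the path tuples, and the depth-first pre-order visiting order are assumptions made for the example, since the text does not prescribe an order.

```python
from typing import Any, Iterator, Tuple, Union

# A path is the sequence of field names and array indexes leading to a node.
Path = Tuple[Union[str, int], ...]


def traverse(node: Any, path: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Yield (path, value) for every object or array reachable from node."""
    if isinstance(node, dict):
        yield path, node
        for key, value in node.items():
            # Recurse into each field of the object.
            yield from traverse(value, path + (key,))
    elif isinstance(node, list):
        yield path, node
        for index, item in enumerate(node):
            # Recurse into each item of the array.
            yield from traverse(item, path + (index,))
    # Scalars are leaves: they are neither yielded nor descended into.
```

A preprocessing or validation pass could iterate over `traverse(document)` and apply its per-object rules to each yielded value, using the path for error messages.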
+## Short names + +The "short name" of a fully qualified identifier is the portion of +the identifier following the final slash `/` of either the fragment +identifier following `#` or the path portion, if there is no fragment. +Some examples: + +* the short name of `http://example.com/foo` is `foo` +* the short name of `http://example.com/#bar` is `bar` +* the short name of `http://example.com/foo/bar` is `bar` +* the short name of `http://example.com/foo#bar` is `bar` +* the short name of `http://example.com/#foo/bar` is `bar` +* the short name of `http://example.com/foo#bar/baz` is `baz` + +## Inheritance and specialization + +A record definition may inherit from one or more record definitions +with the `extends` field. This copies the fields defined in the +parent record(s) as the base for the new record. A record definition +may `specialize` type declarations of the fields inherited from the +base record. For each field inherited from the base record, any +instance of the type in `specializeFrom` is replaced with the type in +`specializeTo`. The type in `specializeTo` should extend from the +type in `specializeFrom`. + +A record definition may be `abstract`. This means the record +definition is not used for validation on its own, but may be extended +by other definitions. If an abstract type appears in a field +definition, it is logically replaced with a union of all concrete +subtypes of the abstract type. In other words, the field value does +not validate as the abstract type, but must validate as some concrete +type that inherits from the abstract type. + # Document preprocessing After processing the explicit context (if any), document preprocessing