Categories:

System functions (Control)

EXECUTE_AI_EVALUATION

Start or get the status of a Cortex Agent evaluation run.

For more information on Cortex Agent evaluations, see Cortex Agent evaluations.

See also:

GET_AI_RECORD_TRACE (SNOWFLAKE.LOCAL), GET_AI_EVALUATION_DATA (SNOWFLAKE.LOCAL), GET_AI_OBSERVABILITY_LOGS (SNOWFLAKE.LOCAL)

Syntax

EXECUTE_AI_EVALUATION( <evaluation_job> , <run_parameters> , <config_file_path> )

Arguments

evaluation_job

One of the following values:

  • 'START': Starts an evaluation run.

  • 'STATUS': Retrieves the status of an evaluation run.

run_parameters

A SQL OBJECT value that contains the following key:

  • run_name: The name of the run to perform the evaluation_job operation on.
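As a small sketch, the OBJECT value can be built with either OBJECT_CONSTRUCT or an OBJECT constant. The run name run-1 is only an illustrative placeholder, and the assumption here is that the function accepts any expression that evaluates to an OBJECT:

```sql
-- Both expressions produce an OBJECT with the single key run_name.
-- 'run-1' is a placeholder run name used for illustration.
SELECT
  OBJECT_CONSTRUCT('run_name', 'run-1') AS via_function,
  {'run_name': 'run-1'}                 AS via_object_constant;
```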

config_file_path

A stage file path pointing to an agent evaluation configuration. This path can’t be a signed URL. For the full configuration YAML specification, see Agent Evaluation YAML specification.
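Before starting a run, it can help to confirm that the configuration file is present on the stage. The following is a minimal sketch using the LIST command; the stage and file name are the illustrative ones used in the Examples section, not required names:

```sql
-- List files on the stage whose path matches the configuration file name.
-- The stage @eval_db.eval_schema.metrics and the file name are placeholders.
LIST @eval_db.eval_schema.metrics PATTERN = '.*agent_evaluation_config[.]yaml';
```

If the command returns no rows, the path passed as config_file_path most likely does not point to an existing file.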

Returns

The return value of this function depends on the evaluation_job:

  • 'START' returns a single string message, indicating whether the SQL execution succeeded or failed.

  • 'STATUS' returns a table containing information on the current state of the evaluation run.

The table returned by the 'STATUS' evaluation job has the following columns:

Name            Type     Description
RUN_NAME        VARCHAR  The name of the evaluation run.
AGENT_NAME      VARCHAR  The (unqualified) name of the agent being evaluated.
AGENT_TYPE      VARCHAR  The type of agent being evaluated.
STATUS          VARCHAR  The current status of the evaluation run.
STATUS_DETAILS  ARRAY    An array of error messages that occurred during this run.

Values in the STATUS column are one of:

Run status

Status                          Description
CREATED                         The run has been created but not started.
INVOCATION_IN_PROGRESS          The run invocation is in the process of generating the output and the traces.
INVOCATION_COMPLETED            The run invocation completed with all outputs and traces created.
INVOCATION_PARTIALLY_COMPLETED  The run invocation is partially completed due to failures in application invocation and trace generation.
COMPUTATION_IN_PROGRESS         The metric computation is in progress.
COMPLETED                       The metric computation is completed with detailed outputs and traces.
PARTIALLY_COMPLETED             The run is partially completed due to failures during the metric computation.
CANCELLED                       The run has been cancelled.

Access control requirements

For the full access control requirements to conduct a Cortex Agent evaluation, see Cortex Agent evaluations – Access control requirements.

Examples

The following example starts a run called run-1 using the agent evaluation configuration from @eval_db.eval_schema.metrics/agent_evaluation_config.yaml:

CALL EXECUTE_AI_EVALUATION(
  'START',
  OBJECT_CONSTRUCT('run_name', 'run-1'),
  '@eval_db.eval_schema.metrics/agent_evaluation_config.yaml'
);

The following example queries the status of the evaluation run run-1 using the agent evaluation configuration from @eval_db.eval_schema.metrics/agent_evaluation_config.yaml:

CALL EXECUTE_AI_EVALUATION(
  'STATUS',
  OBJECT_CONSTRUCT('run_name', 'run-1'),
  '@eval_db.eval_schema.metrics/agent_evaluation_config.yaml'
);
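
Because the 'STATUS' job returns a table, its output can be filtered like any other result set. The following is a minimal sketch that reads the result of the preceding CALL with RESULT_SCAN. It assumes the CALL was the most recent statement in the session and that the column names match those listed under Returns:

```sql
CALL EXECUTE_AI_EVALUATION(
  'STATUS',
  OBJECT_CONSTRUCT('run_name', 'run-1'),
  '@eval_db.eval_schema.metrics/agent_evaluation_config.yaml'
);

-- Read the result set of the CALL above and keep only runs that
-- reported a partial completion, along with their error messages.
SELECT run_name, status, status_details
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE status IN ('INVOCATION_PARTIALLY_COMPLETED', 'PARTIALLY_COMPLETED');
```

LAST_QUERY_ID() refers to the statement executed immediately before the SELECT, so the two statements need to run back to back in the same session.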