Create a safety case

A safety case is a document intended for human readers that compiles and presents the evidence of the safety (or un-safety) of a system for a particular use case and context. Safety cases are modeled by SafetyCase resources.

Safety cases are scoped to a single AI/ML system. A safety case typically includes many related Measurements of the system produced using various input datasets and analysis methods. Safety cases in Dyff are rendered as HTML documents containing text, tables, charts, and other graphics.

Implementation: Jupyter notebook

Safety cases are generated using an analysis workflow that is very similar to the workflow for generating Measurements. The key difference is that Methods that generate safety cases are implemented as Jupyter notebooks. The Dyff Platform runs the Jupyter notebook, renders the output cells as HTML, and serves the generated HTML at a designated endpoint.

Conceptually, a notebook that generates a safety case is very similar to a Python function that generates a measurement. You can think of the notebook as a function that maps input datasets to an HTML document. The key difference is that notebooks don’t accept arguments in the same way that Python functions do, so we need a different mechanism to pass data into the notebook.
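To make the analogy concrete, here is a minimal sketch of the "function view": a plain Python function that takes string arguments and named inputs and returns an HTML document. The function name `render_safety_case` and the input format (lists of dicts) are hypothetical and not part of the Dyff API; in a real notebook these values arrive through the AnalysisContext rather than as function parameters.

```python
import html


def render_safety_case(
    arguments: dict[str, str], inputs: dict[str, list[dict[str, object]]]
) -> str:
    """Hypothetical 'function view' of a safety-case notebook.

    Maps string-valued arguments and named input datasets to an HTML document.
    """
    parts = ["<html><body>", "<h1>Safety case</h1>"]
    # Show the configuration, escaping every value before it becomes HTML
    parts.append("<h2>Arguments</h2><ul>")
    for key, value in sorted(arguments.items()):
        parts.append(f"<li>{html.escape(key)} = {html.escape(value)}</li>")
    parts.append("</ul>")
    # Summarize each named input by its number of rows
    parts.append("<h2>Inputs</h2><ul>")
    for name, rows in sorted(inputs.items()):
        parts.append(f"<li>{html.escape(name)}: {len(rows)} row(s)</li>")
    parts.append("</ul></body></html>")
    return "\n".join(parts)


doc = render_safety_case({"threshold": "1.0"}, {"easy": [{"x": 1}], "hard": []})
print(doc)
```

The notebook version computes the same kind of mapping, but reads its "parameters" from the platform instead of from a call site.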

Warning

Any output generated by your notebook will be visible to anyone who has permission to view the corresponding safety cases. Be careful not to leak sensitive information such as PII or private labels in the output.
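One defensive pattern is to strip sensitive fields from records before anything is displayed. A minimal sketch, assuming records are plain dicts; the field names in `SENSITIVE_FIELDS` are hypothetical examples, so list the fields that are actually sensitive in your own data. This only protects what passes through `redact`: anything printed or plotted by other means is still published.

```python
from typing import Iterable

# Hypothetical field names; replace with the sensitive columns in your data
SENSITIVE_FIELDS = frozenset({"email", "user_id", "label"})


def redact(records: Iterable[dict[str, object]]) -> list[dict[str, object]]:
    """Return copies of the records with all sensitive fields removed."""
    return [
        {key: value for key, value in record.items() if key not in SENSITIVE_FIELDS}
        for record in records
    ]


rows = [{"prompt": "hello", "email": "a@example.com", "label": 1}]
safe_rows = redact(rows)
print(safe_rows)  # only redacted rows should reach notebook output
```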

The AnalysisContext

The first thing you should do in your notebook is to instantiate the dyff.audit.analysis.AnalysisContext class:

from dyff.audit.analysis import AnalysisContext

ctx = AnalysisContext()

The context instance allows you to access the inputs to the notebook:

import pyarrow.dataset

# Access arguments (values arrive as strings, so convert them as needed)
category: str = ctx.get_argument("category")
temperature: float = float(ctx.get_argument("temperature"))

# Access input data
dataset: pyarrow.dataset.Dataset = ctx.open_input_dataset("dataset")

The available arguments and input datasets are defined in the Method specification resource associated with the notebook. When you create a safety case resource that references this method, the Dyff Platform binds the specified values and data inputs to the specified names.
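Because argument values are passed as strings (note the explicit `float(...)` conversion above, and the string value `"1.0"` used for `threshold` in the analysis request on this page), it can help to parse them in one place and fail with a clear message. A minimal sketch; the helper `parse_float_argument` is hypothetical, and the plain dict stands in for the values you would obtain with `ctx.get_argument`.

```python
def parse_float_argument(arguments: dict[str, str], keyword: str) -> float:
    """Parse a string-valued argument as a float, with a descriptive error."""
    if keyword not in arguments:
        raise KeyError(f"missing required argument {keyword!r}")
    raw = arguments[keyword]
    try:
        return float(raw)
    except ValueError:
        raise ValueError(f"argument {keyword!r} must be a number, got {raw!r}") from None


# Stand-in for the values ctx.get_argument(...) would return in a real notebook
arguments = {"threshold": "1.0", "category": "toxicity"}
threshold = parse_float_argument(arguments, "threshold")
print(threshold)
```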

You can now proceed with all of the usual Jupyter notebook activities — manipulating data, embedding charts, creating formatted text, etc.
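For example, since the platform renders output cells as HTML, a cell can emit an HTML table directly. A minimal sketch using only the standard library to build the table; in Jupyter the string is passed to `IPython.display.HTML`, and the fallback to `print` exists only so the snippet also runs outside a notebook. The column names are hypothetical.

```python
import html


def to_html_table(rows: list[dict[str, object]]) -> str:
    """Render a list of dicts with identical keys as an HTML table, escaping all values."""
    if not rows:
        return "<table></table>"
    columns = list(rows[0].keys())
    header = "".join(f"<th>{html.escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(row[c]))}</td>" for c in columns) + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"


table = to_html_table([{"dataset": "easy", "mean_word_length": 4.2}])

try:
    # In a notebook, display the table as rich HTML output
    from IPython.display import HTML, display

    display(HTML(table))
except ImportError:
    # Outside Jupyter, fall back to printing the raw HTML
    print(table)
```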

Deploying and running the notebook

The process of deploying a notebook is just like the process of deploying an analysis implemented as a Python function. You need to create three resources:

  1. A Module containing the notebook code (the .ipynb file).

  2. A Method that describes the method and its inputs and outputs, and references the Module from step (1).

  3. A SafetyCase that references the Method from step (2) and specifies the IDs of specific resources to pass as inputs.

Create a Module

Assuming you’ve implemented your notebook in a file called my-notebook.ipynb in the directory /home/me/dyff/my-notebook, you would create and upload the package like this:

# SPDX-FileCopyrightText: 2024 UL Research Institutes
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations

from pathlib import Path

from dyff.audit.local import DyffLocalPlatform
from dyff.schema.platform import *
from dyff.schema.requests import *

ACCOUNT: str = ...
ROOT_DIR: Path = Path("/home/me/dyff")

# Develop using the local platform
dyffapi = DyffLocalPlatform(
    storage_root=ROOT_DIR / ".dyff-local",
)
# When you're ready, switch to the remote platform:
# dyffapi = Client(...)

module_root = str(ROOT_DIR / "my-notebook")
module = dyffapi.modules.create_package(
    module_root,
    account=ACCOUNT,
    name="my-notebook",
)
dyffapi.modules.upload_package(module, module_root)
print(module.json(indent=2))
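Because the Method's `notebookPath` is interpreted relative to the module root directory, a quick local check before uploading can catch a wrong path or a corrupt file early. A minimal sketch using only the standard library; `check_notebook` is a hypothetical helper that only confirms the file exists, stays inside the module root, and parses as JSON with a top-level `"cells"` list, as `.ipynb` files do.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def check_notebook(module_root: str | Path, notebook_path: str) -> Path:
    """Hypothetical pre-upload check for a notebook inside a module directory."""
    root = Path(module_root).resolve()
    path = (root / notebook_path).resolve()
    # notebookPath is relative to the module root, so it must not point outside it
    if root not in path.parents:
        raise ValueError(f"{notebook_path!r} is not inside {root}")
    if not path.is_file():
        raise FileNotFoundError(f"no notebook at {path}")
    # .ipynb files are JSON documents with a top-level "cells" list
    with path.open(encoding="utf-8") as f:
        notebook = json.load(f)
    if not isinstance(notebook, dict) or not isinstance(notebook.get("cells"), list):
        raise ValueError(f"{path} does not look like a Jupyter notebook")
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Write a minimal, empty notebook so the check has something to validate
    (Path(tmp) / "my-notebook.ipynb").write_text(
        json.dumps({"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}),
        encoding="utf-8",
    )
    checked = check_notebook(tmp, "my-notebook.ipynb")
    print("ok:", checked.name)
```

With the directory layout used on this page, the call would be `check_notebook("/home/me/dyff/my-notebook", "my-notebook.ipynb")`, made before `dyffapi.modules.upload_package`.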

Create a Method

The Method resource specifies the inputs and outputs:

method_description = """
# Summary

Visualizes the relationship between mean word length in prompts and system
completions. The description uses [Markdown](https://www.markdownguide.org) syntax.
"""
method_request = MethodCreateRequest(
    name="mean-word-length-notebook",
    # The notebook analyzes multiple measurements of the same system
    scope=MethodScope.InferenceService,
    description=method_description,
    # The method is implemented as a Jupyter notebook
    implementation=MethodImplementation(
        kind=MethodImplementationKind.JupyterNotebook,
        jupyterNotebook=MethodImplementationJupyterNotebook(
            notebookModule=module.id,
            # The path to the notebook file, relative to the module root directory
            notebookPath="my-notebook.ipynb",
        ),
    ),
    # The method accepts one argument called 'threshold'
    parameters=[
        MethodParameter(keyword="threshold", description="(float) A numeric threshold"),
    ],
    # The method accepts two PyArrow datasets as inputs:
    # - The one called 'easy' is a Measurement on the "easy" dataset
    # - The one called 'hard' is a Measurement on the "hard" dataset
    inputs=[
        MethodInput(kind=MethodInputKind.Measurement, keyword="easy"),
        MethodInput(kind=MethodInputKind.Measurement, keyword="hard"),
    ],
    # The method produces a SafetyCase
    output=MethodOutput(
        kind=MethodOutputKind.SafetyCase,
        safetyCase=SafetyCaseSpec(
            name="mean-word-length-safetycase",
            description="This is also **Markdown**.",
        ),
    ),
    # The Module containing the Method code
    modules=[module.id],
    account=ACCOUNT,
)
method = dyffapi.methods.create(method_request)
print(method.json(indent=2))

For this notebook, both inputs are Measurements; they could also be Datasets, Evaluations, or Reports. Here, we want to analyze two Measurements computed using the same Method but applied to two different input datasets, called easy and hard.
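To make the example concrete, the core computation of such a notebook might look like the sketch below. It is entirely hypothetical: the field names `prompt` and `completion` and the comparison rule are assumptions for illustration, not the actual contents of my-notebook.ipynb. In a real notebook, the rows could come from something like `ctx.open_input_dataset("easy").to_table().to_pylist()`, and `threshold` from `ctx.get_argument("threshold")`.

```python
def mean_word_length(text: str) -> float:
    """Average number of characters per whitespace-separated word (0.0 if empty)."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def fraction_above_threshold(rows: list[dict[str, str]], threshold: float) -> float:
    """Fraction of rows whose completion's mean word length exceeds the prompt's by more than threshold."""
    if not rows:
        return 0.0
    flagged = sum(
        1
        for row in rows
        if mean_word_length(row["completion"]) - mean_word_length(row["prompt"]) > threshold
    )
    return flagged / len(rows)


# Hypothetical rows standing in for the 'easy' and 'hard' inputs
easy = [{"prompt": "hi there", "completion": "hey you"}]
hard = [{"prompt": "a b", "completion": "extraordinary circumstances"}]
for name, rows in [("easy", easy), ("hard", hard)]:
    print(name, fraction_above_threshold(rows, threshold=1.0))
```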

Create a SafetyCase

The SafetyCase resource represents the computational work needed to run your notebook on specific inputs. You use the same AnalysisCreateRequest class that is used when creating Measurements:

easy_measurement_id: str = ...
hard_measurement_id: str = ...
analysis_request = AnalysisCreateRequest(
    account=ACCOUNT,
    method=method.id,
    arguments=[
        AnalysisArgument(keyword="threshold", value="1.0"),
    ],
    inputs=[
        AnalysisInput(keyword="easy", entity=easy_measurement_id),
        AnalysisInput(keyword="hard", entity=hard_measurement_id),
    ],
)
safetycase = dyffapi.safetycases.create(analysis_request)
print(safetycase.json(indent=2))
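Binding works by keyword, so the keywords in the analysis request need to line up with those declared in the Method; a misspelled keyword means the notebook will not find a value under the name it expects. The sketch below is a hypothetical client-side check that compares plain sets of keywords instead of the Dyff schema classes, and it treats every declared parameter and input as required, which is an assumption.

```python
from __future__ import annotations


def check_bindings(
    parameters: set[str],
    method_inputs: set[str],
    arguments: set[str],
    inputs: set[str],
) -> list[str]:
    """Compare keywords declared by a Method with those supplied in an analysis request.

    Returns a list of problems; an empty list means the keywords line up.
    """
    problems: list[str] = []
    for kw in sorted(parameters - arguments):
        problems.append(f"missing argument for parameter {kw!r}")
    for kw in sorted(arguments - parameters):
        problems.append(f"argument {kw!r} is not a declared parameter")
    for kw in sorted(method_inputs - inputs):
        problems.append(f"missing input {kw!r}")
    for kw in sorted(inputs - method_inputs):
        problems.append(f"input {kw!r} is not declared by the method")
    return problems


# Keywords used by the Method and AnalysisCreateRequest on this page
ok = check_bindings({"threshold"}, {"easy", "hard"}, {"threshold"}, {"easy", "hard"})
# A misspelled input keyword shows up as both missing and undeclared
typo = check_bindings({"threshold"}, {"easy", "hard"}, {"threshold"}, {"easy", "hrad"})
print(ok)
print(typo)
```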

Full Example

# SPDX-FileCopyrightText: 2024 UL Research Institutes
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations

from pathlib import Path

from dyff.audit.local import DyffLocalPlatform
from dyff.schema.platform import *
from dyff.schema.requests import *

ACCOUNT: str = ...
ROOT_DIR: Path = Path("/home/me/dyff")

# Develop using the local platform
dyffapi = DyffLocalPlatform(
    storage_root=ROOT_DIR / ".dyff-local",
)
# When you're ready, switch to the remote platform:
# dyffapi = Client(...)

module_root = str(ROOT_DIR / "my-notebook")
module = dyffapi.modules.create_package(
    module_root,
    account=ACCOUNT,
    name="my-notebook",
)
dyffapi.modules.upload_package(module, module_root)
print(module.json(indent=2))

method_description = """
# Summary

Visualizes the relationship between mean word length in prompts and system
completions. The description uses [Markdown](https://www.markdownguide.org) syntax.
"""
method_request = MethodCreateRequest(
    name="mean-word-length-notebook",
    # The notebook analyzes multiple measurements of the same system
    scope=MethodScope.InferenceService,
    description=method_description,
    # The method is implemented as a Jupyter notebook
    implementation=MethodImplementation(
        kind=MethodImplementationKind.JupyterNotebook,
        jupyterNotebook=MethodImplementationJupyterNotebook(
            notebookModule=module.id,
            # The path to the notebook file, relative to the module root directory
            notebookPath="my-notebook.ipynb",
        ),
    ),
    # The method accepts one argument called 'threshold'
    parameters=[
        MethodParameter(keyword="threshold", description="(float) A numeric threshold"),
    ],
    # The method accepts two PyArrow datasets as inputs:
    # - The one called 'easy' is a Measurement on the "easy" dataset
    # - The one called 'hard' is a Measurement on the "hard" dataset
    inputs=[
        MethodInput(kind=MethodInputKind.Measurement, keyword="easy"),
        MethodInput(kind=MethodInputKind.Measurement, keyword="hard"),
    ],
    # The method produces a SafetyCase
    output=MethodOutput(
        kind=MethodOutputKind.SafetyCase,
        safetyCase=SafetyCaseSpec(
            name="mean-word-length-safetycase",
            description="This is also **Markdown**.",
        ),
    ),
    # The Module containing the Method code
    modules=[module.id],
    account=ACCOUNT,
)
method = dyffapi.methods.create(method_request)
print(method.json(indent=2))

easy_measurement_id: str = ...
hard_measurement_id: str = ...
analysis_request = AnalysisCreateRequest(
    account=ACCOUNT,
    method=method.id,
    arguments=[
        AnalysisArgument(keyword="threshold", value="1.0"),
    ],
    inputs=[
        AnalysisInput(keyword="easy", entity=easy_measurement_id),
        AnalysisInput(keyword="hard", entity=hard_measurement_id),
    ],
)
safetycase = dyffapi.safetycases.create(analysis_request)
print(safetycase.json(indent=2))