Create a safety case

A safety case is a document intended for human readers that compiles and presents the evidence of the safety (or un-safety) of a single AI system for a particular use case and context. Safety cases are modeled by SafetyCase resources. They are rendered as HTML documents containing text, tables, charts, and other graphics.

Implementation: Jupyter notebook

Safety cases are generated using an analysis workflow that is very similar to the workflow for generating Measurements. The key difference is that Methods that generate safety cases are implemented as Jupyter notebooks. The Dyff Platform runs the Jupyter notebook, renders the output cells as HTML, and serves the generated HTML at a specific route.

Conceptually, a notebook that generates a safety case is very similar to a Python function that generates a measurement. You can think of the notebook as a function that maps input datasets to an HTML document. Unlike a Python function, however, a notebook can’t accept arguments directly, so we need a different mechanism to pass data into the notebook.
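To make the "notebook as a function" idea concrete, here is a toy sketch that uses only the standard library. It is not how the Dyff Platform works internally. ToyContext, run_toy_notebook, and the arguments.json file are hypothetical names invented for illustration. The only point is that a runner can bind named values somewhere the notebook can find them, and the notebook body then reads them back by name.

```python
import json
import tempfile
from pathlib import Path


class ToyContext:
    """Hypothetical stand-in for a context object; NOT the Dyff AnalysisContext."""

    def __init__(self, config_path: Path) -> None:
        # The runner wrote the bound arguments to this file before
        # executing the notebook; we load them once at start-up.
        self._arguments = json.loads(config_path.read_text())

    def get_argument(self, keyword: str) -> str:
        return self._arguments[keyword]


def run_toy_notebook(arguments: dict) -> str:
    """Play the role of the platform: bind arguments, then 'run' the notebook."""
    with tempfile.TemporaryDirectory() as tmp:
        config_path = Path(tmp) / "arguments.json"
        config_path.write_text(json.dumps(arguments))

        # Everything below stands in for the notebook body.
        ctx = ToyContext(config_path)
        category = ctx.get_argument("category")
        return f"<h1>Report for category: {category}</h1>"


html = run_toy_notebook({"category": "cookies"})
print(html)
```

Seen this way, the inputs are still the function's arguments and the rendered HTML is still its return value. Only the calling convention differs.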

Warning

Any output generated by your notebook will be visible to anyone who has permission to view the corresponding safety cases. Be careful not to leak sensitive information such as PII or private labels in the output.

The AnalysisContext

The first thing you must do in your notebook is to instantiate the dyff.audit.analysis.AnalysisContext class:

from dyff.audit.analysis import AnalysisContext

ctx = AnalysisContext()

The context instance allows you to access the inputs to the notebook:

import pyarrow.dataset

# Access arguments
category: str = ctx.get_argument("category")
temperature: float = float(ctx.get_argument("temperature"))

# Access input data
dataset: pyarrow.dataset.Dataset = ctx.open_input_dataset("dataset")

The available arguments and input datasets are defined in the Method specification resource associated with the notebook. When you create a safety case resource that references this method, the Dyff Platform binds the specified values and data inputs to the specified names.
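Because the example above converts the temperature argument with float(), it can help to centralize that conversion so that a badly formed value fails with a readable message near the top of the notebook. The helper below is a sketch, not part of Dyff. parse_float_argument and its minimum and maximum bounds are hypothetical, and it assumes the raw value arrives as a string.

```python
from typing import Optional


def parse_float_argument(
    keyword: str,
    raw: str,
    *,
    minimum: Optional[float] = None,
    maximum: Optional[float] = None,
) -> float:
    """Convert a raw argument value to float and check optional bounds."""
    try:
        value = float(raw)
    except (TypeError, ValueError) as ex:
        raise ValueError(
            f"argument '{keyword}' must be a number, got {raw!r}"
        ) from ex
    if minimum is not None and value < minimum:
        raise ValueError(f"argument '{keyword}' must be >= {minimum}, got {value}")
    if maximum is not None and value > maximum:
        raise ValueError(f"argument '{keyword}' must be <= {maximum}, got {value}")
    return value


# In a notebook, the raw value would come from ctx.get_argument("temperature").
temperature = parse_float_argument("temperature", "0.7", minimum=0.0)
```

Failing early keeps a conversion error from surfacing halfway through the rendered report.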

You can now proceed with all of the usual Jupyter notebook activities — manipulating data, embedding charts, creating formatted text, etc.

Styling the notebook

Dyff provides a few basic display “widgets” to help you create reports that communicate your main conclusions effectively. These are implemented as methods on the AnalysisContext object that you can call in your notebooks:

ctx = AnalysisContext()
# Use this at the top of the notebook to generate a "title" section
ctx.TitleCard(
    headline="System gives inaccurate responses about cookies",
    author="Flancrest Enterprises",
    summary_phrase="Often inaccurate",
    summary_text="When answering multiple-choice questions about cookie ingredients.",
)
# Conclusions call out specific "take-away" messages
ctx.Conclusion(text="People with allergies should be careful", indicator="Hazard")
# Scores call out numeric quantities
ctx.Score(text="Error rate", quantity=20, unit="%")

The information you provide will be rendered in the notebook using display templates.

The Score widget can also be used to integrate scores into the Dyff App.
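The quantity passed to the Score widget has to be computed somewhere in the notebook. As a minimal sketch, the function below derives an error rate in percent from a list of per-item correctness flags. error_rate_percent and the sample outcomes are hypothetical; real data would come from the notebook's input datasets.

```python
def error_rate_percent(is_correct: list) -> float:
    """Return the share of incorrect items as a percentage, rounded to 0.1."""
    if not is_correct:
        raise ValueError("cannot compute an error rate from zero items")
    errors = sum(1 for correct in is_correct if not correct)
    return round(100.0 * errors / len(is_correct), 1)


# Hypothetical outcomes: one wrong answer out of five.
outcomes = [True, True, False, True, True]
quantity = error_rate_percent(outcomes)
print(quantity)
```

The result could then be displayed with ctx.Score(text="Error rate", quantity=quantity, unit="%"), following the call shown earlier.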

Deploying and running the notebook

The process of deploying a notebook is just like the process of deploying an analysis implemented as a Python function. You need to create three resources:

  1. A Module containing the notebook code (the .ipynb file).

  2. A Method that describes the method and its inputs and outputs, and references the Module from step (1).

  3. A SafetyCase that references the Method from step (2) and specifies the IDs of specific resources to pass as inputs.

Create a Module

Assuming you’ve implemented your notebook in a file called my-notebook.ipynb in the directory /home/me/dyff/my-notebook, you would create and upload the package like this:

# SPDX-FileCopyrightText: 2024 UL Research Institutes
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations

from pathlib import Path

from dyff.audit.local import DyffLocalPlatform
from dyff.schema.platform import *
from dyff.schema.requests import *

ACCOUNT: str = ...
ROOT_DIR: Path = Path("/home/me/dyff")

# Develop using the local platform
dyffapi = DyffLocalPlatform(
    storage_root=ROOT_DIR / ".dyff-local",
)
# When you're ready, switch to the remote platform:
# dyffapi = Client(...)

module_root = str(ROOT_DIR / "my-notebook")
module = dyffapi.modules.create_package(
    module_root,
    account=ACCOUNT,
    name="my-notebook",
)
dyffapi.modules.upload_package(module, module_root)
print(module.json(indent=2))

Create a Method

The Method resource specifies the inputs and outputs:

method_request = MethodCreateRequest(
    name="mean-word-length-notebook",
    # The notebook analyzes multiple measurements of the same system
    scope=MethodScope.InferenceService,
    description="Visualizes the mean word length in prompts and system completions.",
    # The method is implemented as a Jupyter notebook
    implementation=MethodImplementation(
        kind=MethodImplementationKind.JupyterNotebook,
        jupyterNotebook=MethodImplementationJupyterNotebook(
            notebookModule=module.id,
            # The path to the notebook file, relative to the module root directory
            notebookPath="my-notebook.ipynb",
        ),
    ),
    # The method accepts one argument called 'threshold'
    parameters=[
        MethodParameter(keyword="threshold", description="(float) A numeric threshold"),
    ],
    # The method accepts two PyArrow datasets as inputs:
    # - The one called 'easy' is a Measurement on the "easy" dataset
    # - The one called 'hard' is a Measurement on the "hard" dataset
    inputs=[
        MethodInput(kind=MethodInputKind.Measurement, keyword="easy"),
        MethodInput(kind=MethodInputKind.Measurement, keyword="hard"),
    ],
    # The method produces a SafetyCase
    output=MethodOutput(
        kind=MethodOutputKind.SafetyCase,
        safetyCase=SafetyCaseSpec(
            name="mean-word-length-safetycase",
            description="Visualizes the mean word length in prompts and system completions.",
        ),
    ),
    # The Module containing the notebook code
    modules=[module.id],
    account=ACCOUNT,
)
method = dyffapi.methods.create(method_request)
print(method.json(indent=2))

For this notebook, both of the inputs are Measurements; they could also be Datasets or Evaluations. Here, we want to compare two (hypothetical) Measurements that were computed using the same Method but applied to two different input datasets, called easy and hard.
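The notebook body itself is not shown in this guide. As a rough sketch of the kind of computation a "mean word length" notebook might perform, the function below averages word lengths over a list of texts. The layout of the Measurement data is not specified here, so plain lists of strings stand in for the easy and hard inputs. mean_word_length and the sample texts are hypothetical.

```python
def mean_word_length(texts: list) -> float:
    """Average length of whitespace-separated words across all texts."""
    words = [word for text in texts for word in text.split()]
    if not words:
        # No words at all: report 0.0 rather than dividing by zero.
        return 0.0
    return sum(len(word) for word in words) / len(words)


# Hypothetical stand-ins for text drawn from the two input Measurements.
easy_texts = ["the cat sat", "a dog ran"]
hard_texts = ["photosynthesis converts sunlight"]

easy_mean = mean_word_length(easy_texts)
hard_mean = mean_word_length(hard_texts)
print(f"easy: {easy_mean:.2f}, hard: {hard_mean:.2f}")
```

In a real notebook, the two lists would be read from the inputs bound to the easy and hard keywords, and the results would feed a chart or a Score.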

Create a SafetyCase

The SafetyCase resource represents the computational work needed to run your notebook on specific inputs. You use the same AnalysisCreateRequest class that is used when creating Measurements:

easy_measurement_id: str = ...
hard_measurement_id: str = ...
analysis_request = AnalysisCreateRequest(
    account=ACCOUNT,
    method=method.id,
    arguments=[
        AnalysisArgument(keyword="threshold", value="1.0"),
    ],
    inputs=[
        AnalysisInput(keyword="easy", entity=easy_measurement_id),
        AnalysisInput(keyword="hard", entity=hard_measurement_id),
    ],
)
safetycase = dyffapi.safetycases.create(analysis_request)
print(safetycase.json(indent=2))
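The keywords in the request have to line up with the keywords declared in the Method. Before submitting a request, a small local check can catch typos. The sketch below is not part of Dyff. check_bindings is a hypothetical helper that works on plain lists and dicts, and it assumes every declared keyword must be bound exactly once, which this guide does not state.

```python
def check_bindings(
    declared_parameters: list,
    declared_inputs: list,
    arguments: dict,
    inputs: dict,
) -> list:
    """Return human-readable problems; an empty list means the keywords match."""
    problems = []
    checks = (
        ("argument", set(declared_parameters), set(arguments)),
        ("input", set(declared_inputs), set(inputs)),
    )
    for kind, declared, supplied in checks:
        # Declared in the Method but not bound in the request.
        for keyword in sorted(declared - supplied):
            problems.append(f"missing {kind}: '{keyword}'")
        # Bound in the request but never declared in the Method.
        for keyword in sorted(supplied - declared):
            problems.append(f"unknown {kind}: '{keyword}'")
    return problems


problems = check_bindings(
    declared_parameters=["threshold"],
    declared_inputs=["easy", "hard"],
    arguments={"threshold": "1.0"},
    inputs={"easy": "easy-id", "hrad": "hard-id"},  # deliberate typo
)
print(problems)
```

With the deliberate typo, the check reports that hard is missing and hrad is unknown, which is easier to act on than a failed run.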

Full Example

# SPDX-FileCopyrightText: 2024 UL Research Institutes
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations

from pathlib import Path

from dyff.audit.local import DyffLocalPlatform
from dyff.schema.platform import *
from dyff.schema.requests import *

ACCOUNT: str = ...
ROOT_DIR: Path = Path("/home/me/dyff")

# Develop using the local platform
dyffapi = DyffLocalPlatform(
    storage_root=ROOT_DIR / ".dyff-local",
)
# When you're ready, switch to the remote platform:
# dyffapi = Client(...)

module_root = str(ROOT_DIR / "my-notebook")
module = dyffapi.modules.create_package(
    module_root,
    account=ACCOUNT,
    name="my-notebook",
)
dyffapi.modules.upload_package(module, module_root)
print(module.json(indent=2))

method_request = MethodCreateRequest(
    name="mean-word-length-notebook",
    # The notebook analyzes multiple measurements of the same system
    scope=MethodScope.InferenceService,
    description="Visualizes the mean word length in prompts and system completions.",
    # The method is implemented as a Jupyter notebook
    implementation=MethodImplementation(
        kind=MethodImplementationKind.JupyterNotebook,
        jupyterNotebook=MethodImplementationJupyterNotebook(
            notebookModule=module.id,
            # The path to the notebook file, relative to the module root directory
            notebookPath="my-notebook.ipynb",
        ),
    ),
    # The method accepts one argument called 'threshold'
    parameters=[
        MethodParameter(keyword="threshold", description="(float) A numeric threshold"),
    ],
    # The method accepts two PyArrow datasets as inputs:
    # - The one called 'easy' is a Measurement on the "easy" dataset
    # - The one called 'hard' is a Measurement on the "hard" dataset
    inputs=[
        MethodInput(kind=MethodInputKind.Measurement, keyword="easy"),
        MethodInput(kind=MethodInputKind.Measurement, keyword="hard"),
    ],
    # The method produces a SafetyCase
    output=MethodOutput(
        kind=MethodOutputKind.SafetyCase,
        safetyCase=SafetyCaseSpec(
            name="mean-word-length-safetycase",
            description="Visualizes the mean word length in prompts and system completions.",
        ),
    ),
    # The Module containing the notebook code
    modules=[module.id],
    account=ACCOUNT,
)
method = dyffapi.methods.create(method_request)
print(method.json(indent=2))

easy_measurement_id: str = ...
hard_measurement_id: str = ...
analysis_request = AnalysisCreateRequest(
    account=ACCOUNT,
    method=method.id,
    arguments=[
        AnalysisArgument(keyword="threshold", value="1.0"),
    ],
    inputs=[
        AnalysisInput(keyword="easy", entity=easy_measurement_id),
        AnalysisInput(keyword="hard", entity=hard_measurement_id),
    ],
)
safetycase = dyffapi.safetycases.create(analysis_request)
print(safetycase.json(indent=2))