Create a safety case
A safety case is a document intended for human readers that compiles and presents the evidence of the safety (or lack of safety) of a system for a particular use case and context. Safety cases are modeled by SafetyCase resources.
Safety cases are scoped to a single AI/ML system. A safety case typically includes many related Measurements of the system, produced using various input datasets and analysis methods. Safety cases in Dyff are rendered as HTML documents containing text, tables, charts, and other graphics.
Implementation: Jupyter notebook
Safety cases are generated using an analysis workflow that is very similar to the workflow for generating Measurements. The key difference is that Methods that generate safety cases are implemented as Jupyter notebooks. The Dyff Platform runs the notebook, renders the output cells as HTML, and serves the generated HTML at a designated endpoint.
Conceptually, a notebook that generates a safety case plays the same role as a Python function that generates a measurement: you can think of the notebook as a function that maps input datasets to an HTML document. However, notebooks don’t accept arguments the way Python functions do, so a different mechanism is needed to pass data into the notebook.
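To make the analogy concrete, the sketch below treats a safety case as an ordinary function from input rows to an HTML string. It is only an illustration: render_safety_case is a hypothetical name, the input is a plain list of dicts rather than the PyArrow data a notebook actually receives, and only the standard library is used. In Dyff itself, the inputs come from the platform and the HTML comes from rendering the notebook's output cells.

```python
import html


def render_safety_case(title: str, rows: list[dict[str, object]]) -> str:
    """Hypothetical illustration: map tabular input data to an HTML document."""
    if not rows:
        return f"<h1>{html.escape(title)}</h1><p>No data.</p>"
    # Use the keys of the first row as table columns
    columns = list(rows[0].keys())
    header = "".join(f"<th>{html.escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(row.get(c, '')))}</td>" for c in columns)
        + "</tr>"
        for row in rows
    )
    return (
        f"<h1>{html.escape(title)}</h1>"
        f"<table><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"
    )


doc = render_safety_case("Example", [{"dataset": "easy", "score": 0.9}])
print(doc)
```

Escaping every value with html.escape matters because the output is served as HTML: text from a dataset should never be interpreted as markup.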
Warning
Any output generated by your notebook will be visible to anyone who has permission to view the corresponding safety cases. Be careful not to leak sensitive information such as PII or private labels in the output.
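One way to reduce that risk is to drop known-sensitive fields before anything is displayed. The sketch below is a hypothetical helper, not part of the Dyff API, and the field names in SENSITIVE_FIELDS are placeholders for whatever is sensitive in your own data.

```python
from collections.abc import Iterable

# Hypothetical placeholders: fields that must never appear in notebook output
SENSITIVE_FIELDS = frozenset({"user_email", "private_label"})


def redact_rows(
    rows: Iterable[dict[str, object]], sensitive: frozenset[str] = SENSITIVE_FIELDS
) -> list[dict[str, object]]:
    """Return copies of the rows with every sensitive field removed."""
    return [{k: v for k, v in row.items() if k not in sensitive} for row in rows]


rows = [{"prompt": "hi", "user_email": "a@example.com", "private_label": 1}]
print(redact_rows(rows))  # [{'prompt': 'hi'}]
```

With a pyarrow.Table, the select method achieves the same effect by keeping only an explicit list of columns, which is the safer default: new sensitive columns are excluded unless you opt in to them.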
The AnalysisContext
The first thing you should do in your notebook is instantiate the dyff.audit.analysis.AnalysisContext class:
from dyff.audit.analysis import AnalysisContext
ctx = AnalysisContext()
The context instance allows you to access the inputs to the notebook:
import pyarrow.dataset

# Access arguments (values are strings, so convert them as needed)
category: str = ctx.get_argument("category")
temperature: float = float(ctx.get_argument("temperature"))
# Access input data
dataset: pyarrow.dataset.Dataset = ctx.open_input_dataset("dataset")
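Because argument values reach the notebook as strings (hence the float(...) conversion above, and the value="1.0" passed when creating a SafetyCase), it can help to validate them as soon as they are read. The helper below is a hypothetical sketch, not part of dyff.audit; in a notebook you would pass it the result of ctx.get_argument(...).

```python
import math


def parse_float_argument(name: str, raw: str) -> float:
    """Convert a string-valued argument to a finite float, with a clear error message."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        raise ValueError(f"argument {name!r} must be a number, got {raw!r}") from None
    # Reject 'nan' and 'inf', which float() accepts
    if not math.isfinite(value):
        raise ValueError(f"argument {name!r} must be finite, got {raw!r}")
    return value


# In a notebook: threshold = parse_float_argument("threshold", ctx.get_argument("threshold"))
threshold = parse_float_argument("threshold", "1.0")
print(threshold)  # 1.0
```

Failing early with the argument's name in the message makes a misconfigured SafetyCase much easier to diagnose than an error deep inside the analysis.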
The available arguments and input datasets are declared in the Method resource associated with the notebook. When you create a SafetyCase that references this Method, the Dyff Platform binds the argument values and input data you supply to the names declared in the Method.
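The binding works by keyword, so the names the notebook reads, the names the Method declares, and the names the SafetyCase request supplies all have to line up. The sketch below models that rule with plain Python sets; check_bindings is a hypothetical helper for catching mismatches before submitting a request, and it assumes every declared keyword is required.

```python
def check_bindings(declared: set[str], supplied: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the keywords line up.

    Assumes every declared keyword is required and no extra keywords are allowed.
    """
    problems = [f"missing value for {k!r}" for k in sorted(declared - supplied)]
    problems += [f"unexpected keyword {k!r}" for k in sorted(supplied - declared)]
    return problems


# Keywords declared by the Method (parameters and inputs) vs. supplied by a request
declared = {"threshold", "easy", "hard"}
print(check_bindings(declared, {"threshold", "easy", "hard"}))  # []
print(check_bindings(declared, {"threshold", "easy", "medium"}))
# ["missing value for 'hard'", "unexpected keyword 'medium'"]
```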
You can now proceed with all of the usual Jupyter notebook activities — manipulating data, embedding charts, creating formatted text, etc.
Deploying and running the notebook
The process of deploying a notebook is just like the process of deploying an analysis implemented as a Python function. You need to create three resources:
1. A Module containing the notebook code (the .ipynb file).
2. A Method that describes the method and its inputs and outputs, and references the Module from step (1).
3. A SafetyCase that references the Method from step (2) and specifies the IDs of specific resources to pass as inputs.
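Each resource in this list refers to the one before it, which fixes the order in which they must be created. Purely as an illustration, the standard-library graphlib module can derive that order from the references; the string labels stand in for the real resources and are not Dyff API objects.

```python
from graphlib import TopologicalSorter

# Each key depends on the resources in its set (labels taken from the list above)
dependencies = {
    "Module": set(),
    "Method": {"Module"},
    "SafetyCase": {"Method"},
}

creation_order = list(TopologicalSorter(dependencies).static_order())
print(creation_order)  # ['Module', 'Method', 'SafetyCase']
```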
Create a Module
Assuming you’ve implemented your notebook in a file called my-notebook.ipynb in the directory /home/me/dyff/my-notebook, you would create and upload the Module package like this:
# SPDX-FileCopyrightText: 2024 UL Research Institutes
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations

from pathlib import Path

from dyff.audit.local import DyffLocalPlatform
from dyff.schema.platform import *
from dyff.schema.requests import *

ACCOUNT: str = ...
ROOT_DIR: Path = Path("/home/me/dyff")

# Develop using the local platform
dyffapi = DyffLocalPlatform(
    storage_root=ROOT_DIR / ".dyff-local",
)
# When you're ready, switch to the remote platform:
# dyffapi = Client(...)

module_root = str(ROOT_DIR / "my-notebook")
module = dyffapi.modules.create_package(
    module_root,
    account=ACCOUNT,
    name="my-notebook",
)
dyffapi.modules.upload_package(module, module_root)
print(module.json(indent=2))
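Before uploading, it can be worth checking that the notebook is where the Method will look for it, since the Method's notebookPath is relative to the module root. The sketch below is a hypothetical pre-flight check using only the standard library; it relies on the fact that an .ipynb file is JSON with top-level nbformat and cells keys.

```python
from __future__ import annotations

import json
from pathlib import Path


def notebook_relative_path(module_root: str | Path, notebook: str | Path) -> str:
    """Hypothetical check: confirm the notebook is a readable .ipynb inside
    module_root and return its path relative to the root, in POSIX form."""
    root = Path(module_root).resolve()
    path = (root / notebook).resolve()
    try:
        relative = path.relative_to(root)
    except ValueError:
        raise ValueError(f"{path} is not inside module root {root}") from None
    if path.suffix != ".ipynb" or not path.is_file():
        raise FileNotFoundError(f"no notebook file at {path}")
    with path.open(encoding="utf-8") as f:
        content = json.load(f)
    if "nbformat" not in content or "cells" not in content:
        raise ValueError(f"{path} does not look like a Jupyter notebook")
    return relative.as_posix()


# Usage: notebook_relative_path("/home/me/dyff/my-notebook", "my-notebook.ipynb")
# returns "my-notebook.ipynb", the value to use for notebookPath.
```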
Create a Method
The Method resource specifies the notebook's inputs and outputs:
method_description = """
# Summary

Visualizes the relationship between mean word length in prompts and system
completions. The description uses [Markdown](https://www.markdownguide.org) syntax.
"""
method_request = MethodCreateRequest(
    name="mean-word-length-notebook",
    # The notebook analyzes multiple measurements of the same system
    scope=MethodScope.InferenceService,
    description=method_description,
    # The method is implemented as a Jupyter notebook
    implementation=MethodImplementation(
        kind=MethodImplementationKind.JupyterNotebook,
        jupyterNotebook=MethodImplementationJupyterNotebook(
            notebookModule=module.id,
            # The path to the notebook file, relative to the module root directory
            notebookPath="my-notebook.ipynb",
        ),
    ),
    # The method accepts one argument called 'threshold'
    parameters=[
        MethodParameter(keyword="threshold", description="(float) A numeric threshold"),
    ],
    # The method accepts two PyArrow datasets as inputs:
    # - The one called 'easy' is a Measurement on the "easy" dataset
    # - The one called 'hard' is a Measurement on the "hard" dataset
    inputs=[
        MethodInput(kind=MethodInputKind.Measurement, keyword="easy"),
        MethodInput(kind=MethodInputKind.Measurement, keyword="hard"),
    ],
    # The method produces a SafetyCase
    output=MethodOutput(
        kind=MethodOutputKind.SafetyCase,
        safetyCase=SafetyCaseSpec(
            name="mean-word-length-safetycase",
            description="This is also **Markdown**.",
        ),
    ),
    # The Module containing the Method code
    modules=[module.id],
    account=ACCOUNT,
)
method = dyffapi.methods.create(method_request)
print(method.json(indent=2))
For this notebook, both inputs are Measurements; they could also be Datasets, Evaluations, or Reports. Here, we want to analyze two Measurements computed using the same Method but applied to two different input datasets, called easy and hard.
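Inside the notebook, the two inputs would presumably be opened by their keywords, for example ctx.open_input_dataset("easy"), and then compared. The core computation can be sketched without Dyff at all. The column names prompt and completion, and the use of plain Python rows instead of the PyArrow data the notebook actually receives, are assumptions for illustration only.

```python
from statistics import fmean


def mean_word_length(text: str) -> float:
    """Average number of characters per whitespace-separated word (0.0 for empty text)."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def summarize(rows: list[dict[str, str]]) -> dict[str, float]:
    """Mean word length of prompts and completions in one input (hypothetical columns)."""
    return {
        "prompt": fmean(mean_word_length(r["prompt"]) for r in rows),
        "completion": fmean(mean_word_length(r["completion"]) for r in rows),
    }


# Hypothetical stand-ins for the 'easy' and 'hard' inputs
easy = [{"prompt": "add two numbers", "completion": "four"}]
hard = [{"prompt": "characterize asymptotic complexity", "completion": "logarithmic"}]
for name, rows in [("easy", easy), ("hard", hard)]:
    print(name, summarize(rows))
```

Keeping the computation in plain functions like these makes it testable outside the notebook; the notebook cells then only load the inputs, call the functions, and display tables or charts.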
Create a SafetyCase
The SafetyCase resource represents the computational work needed to run your notebook on specific inputs. To create one, you use the same AnalysisCreateRequest class that is used when creating Measurements:
easy_measurement_id: str = ...
hard_measurement_id: str = ...
analysis_request = AnalysisCreateRequest(
    account=ACCOUNT,
    method=method.id,
    arguments=[
        AnalysisArgument(keyword="threshold", value="1.0"),
    ],
    inputs=[
        AnalysisInput(keyword="easy", entity=easy_measurement_id),
        AnalysisInput(keyword="hard", entity=hard_measurement_id),
    ],
)
safetycase = dyffapi.safetycases.create(analysis_request)
print(safetycase.json(indent=2))
Full Example
# SPDX-FileCopyrightText: 2024 UL Research Institutes
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations

from pathlib import Path

from dyff.audit.local import DyffLocalPlatform
from dyff.schema.platform import *
from dyff.schema.requests import *

ACCOUNT: str = ...
ROOT_DIR: Path = Path("/home/me/dyff")

# Develop using the local platform
dyffapi = DyffLocalPlatform(
    storage_root=ROOT_DIR / ".dyff-local",
)
# When you're ready, switch to the remote platform:
# dyffapi = Client(...)

module_root = str(ROOT_DIR / "my-notebook")
module = dyffapi.modules.create_package(
    module_root,
    account=ACCOUNT,
    name="my-notebook",
)
dyffapi.modules.upload_package(module, module_root)
print(module.json(indent=2))

method_description = """
# Summary

Visualizes the relationship between mean word length in prompts and system
completions. The description uses [Markdown](https://www.markdownguide.org) syntax.
"""
method_request = MethodCreateRequest(
    name="mean-word-length-notebook",
    # The notebook analyzes multiple measurements of the same system
    scope=MethodScope.InferenceService,
    description=method_description,
    # The method is implemented as a Jupyter notebook
    implementation=MethodImplementation(
        kind=MethodImplementationKind.JupyterNotebook,
        jupyterNotebook=MethodImplementationJupyterNotebook(
            notebookModule=module.id,
            # The path to the notebook file, relative to the module root directory
            notebookPath="my-notebook.ipynb",
        ),
    ),
    # The method accepts one argument called 'threshold'
    parameters=[
        MethodParameter(keyword="threshold", description="(float) A numeric threshold"),
    ],
    # The method accepts two PyArrow datasets as inputs:
    # - The one called 'easy' is a Measurement on the "easy" dataset
    # - The one called 'hard' is a Measurement on the "hard" dataset
    inputs=[
        MethodInput(kind=MethodInputKind.Measurement, keyword="easy"),
        MethodInput(kind=MethodInputKind.Measurement, keyword="hard"),
    ],
    # The method produces a SafetyCase
    output=MethodOutput(
        kind=MethodOutputKind.SafetyCase,
        safetyCase=SafetyCaseSpec(
            name="mean-word-length-safetycase",
            description="This is also **Markdown**.",
        ),
    ),
    # The Module containing the Method code
    modules=[module.id],
    account=ACCOUNT,
)
method = dyffapi.methods.create(method_request)
print(method.json(indent=2))

easy_measurement_id: str = ...
hard_measurement_id: str = ...
analysis_request = AnalysisCreateRequest(
    account=ACCOUNT,
    method=method.id,
    arguments=[
        AnalysisArgument(keyword="threshold", value="1.0"),
    ],
    inputs=[
        AnalysisInput(keyword="easy", entity=easy_measurement_id),
        AnalysisInput(keyword="hard", entity=hard_measurement_id),
    ],
)
safetycase = dyffapi.safetycases.create(analysis_request)
print(safetycase.json(indent=2))