Create a safety case

A safety case is a document intended for human readers that compiles and presents the evidence of the safety (or un-safety) of a system for a particular use case and context. Safety cases are modeled by SafetyCase resources.

Safety cases are scoped to a single AI/ML system. A safety case typically includes many related Measurements of the system, produced using various input datasets and analysis methods. Safety cases in Dyff are rendered as HTML documents containing text, tables, charts, and other graphics.

Implementation: Jupyter notebook

Safety cases are generated using an analysis workflow that is very similar to the workflow for generating Measurements. The key difference is that Methods that generate safety cases are implemented as Jupyter notebooks. The Dyff Platform runs the Jupyter notebook, renders the output cells as HTML, and serves the generated HTML at a designated endpoint.

Conceptually, a notebook that generates a safety case is very similar to a Python function that generates a measurement. You can think of the notebook as a function that maps input datasets to an HTML document. However, notebooks don’t accept arguments in the same way that Python functions do, so we need a different mechanism to pass data into the notebook.

Warning

Any output generated by your notebook will be visible to anyone who has permission to view the corresponding safety cases. Be careful not to leak sensitive information such as PII or private labels in the output.
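One way to reduce the risk of leaks is to display only aggregates rather than raw rows. The following is a minimal sketch using plain Python data; the record fields (`user_email`, `label`, `category`, `score`) and the helper `summarize_scores` are hypothetical and are not part of Dyff.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-example records. "user_email" and "label" stand in for
# sensitive fields that should never reach the rendered output.
records = [
    {"user_email": "a@example.com", "label": "toxic", "category": "easy", "score": 0.9},
    {"user_email": "b@example.com", "label": "benign", "category": "easy", "score": 0.7},
    {"user_email": "c@example.com", "label": "toxic", "category": "hard", "score": 0.4},
]


def summarize_scores(rows):
    """Return the mean score per category, dropping every other field."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["category"]].append(row["score"])
    return {category: mean(scores) for category, scores in sorted(grouped.items())}


# Display the summary in the notebook instead of the raw records.
summary = summarize_scores(records)
print(summary)
```

Only the per-category means reach the output cell; the sensitive fields stay inside the notebook's working data.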

The `AnalysisContext`

The first thing you should do in your notebook is to instantiate the dyff.audit.analysis.AnalysisContext class:

from dyff.audit.analysis import AnalysisContext

ctx = AnalysisContext()

The context instance allows you to access the inputs to the notebook:

import pyarrow.dataset

# Access arguments by keyword; 'temperature' is converted to float explicitly
category: str = ctx.get_argument("category")
temperature: float = float(ctx.get_argument("temperature"))

# Access input data by keyword
dataset: pyarrow.dataset.Dataset = ctx.open_input_dataset("dataset")

The available arguments and input datasets are defined in the Method specification resource associated with the notebook. When you create a safety case resource that references this method, the Dyff Platform binds the specified values and data inputs to the specified names.
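To make the binding of names to values concrete, here is a minimal sketch of a stand-in object that mimics the two calls used above. `StubContext` is hypothetical and is not part of Dyff. It also assumes that argument values arrive as strings, which the `float(...)` conversion in the example above suggests but which this page does not state outright.

```python
class StubContext:
    """Hypothetical stand-in for AnalysisContext, for illustration only."""

    def __init__(self, arguments, inputs):
        # Maps keyword -> bound value, as declared by the Method specification
        self._arguments = dict(arguments)
        self._inputs = dict(inputs)

    def get_argument(self, keyword):
        # Assumption: values are strings, so callers convert them as needed
        return self._arguments[keyword]

    def open_input_dataset(self, keyword):
        # The real class returns a PyArrow dataset; this stub returns
        # whatever object was bound to the keyword.
        return self._inputs[keyword]


ctx = StubContext(
    arguments={"category": "easy", "temperature": "0.7"},
    inputs={"dataset": [{"text": "hello"}]},
)

category = ctx.get_argument("category")
temperature = float(ctx.get_argument("temperature"))
dataset = ctx.open_input_dataset("dataset")
```

A stub like this can be handy for exercising notebook logic in isolation, but it is no substitute for running the notebook against the real `AnalysisContext`.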

You can now proceed with all of the usual Jupyter notebook activities: manipulating data, embedding charts, creating formatted text, and so on.
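Because the output cells become the HTML of the safety case, one simple way to present results is to build an HTML table and display it. The sketch below uses only the Python standard library; the helper `rows_to_html_table` and the numbers in `rows` are hypothetical placeholders, not real results.

```python
from html import escape


def rows_to_html_table(rows, columns):
    """Render a list of dicts as an HTML table, escaping every cell."""
    header = "".join(f"<th>{escape(column)}</th>" for column in columns)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{escape(str(row[column]))}</td>" for column in columns)
        + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"


# Placeholder values for illustration only
rows = [
    {"dataset": "easy", "mean_word_length": 4.2},
    {"dataset": "hard", "mean_word_length": 5.8},
]
table_html = rows_to_html_table(rows, ["dataset", "mean_word_length"])
print(table_html)
```

In a notebook cell, passing the string to `IPython.display.HTML(table_html)` as the last expression renders it as a table rather than as raw markup. Escaping each cell keeps unexpected characters in the data from being interpreted as HTML.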

Deploying and running the notebook

The process of deploying a notebook is just like the process of deploying an analysis implemented as a Python function. You need to create three resources:

1. A Module containing the notebook code (the .ipynb file).

2. A Method that describes the method and its inputs and outputs, and references the Module from step (1).

3. A SafetyCase that references the Method from step (2) and specifies the IDs of specific resources to pass as inputs.

Create a Module

Assuming you’ve implemented your notebook in a file called my-notebook.ipynb in the directory /home/me/dyff/my-notebook, you would create and upload the package like this:

# SPDX-FileCopyrightText: 2024 UL Research Institutes
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations

from pathlib import Path

from dyff.audit.local import DyffLocalPlatform
from dyff.schema.platform import *
from dyff.schema.requests import *

ACCOUNT: str = ...
ROOT_DIR: Path = Path("/home/me/dyff")

# Develop using the local platform
dyffapi = DyffLocalPlatform(
    storage_root=ROOT_DIR / ".dyff-local",
)
# When you're ready, switch to the remote platform:
# dyffapi = Client(...)

module_root = str(ROOT_DIR / "my-notebook")
module = dyffapi.modules.create_package(
    module_root,
    account=ACCOUNT,
    name="my-notebook",
)
dyffapi.modules.upload_package(module, module_root)
print(module.json(indent=2))

Create a Method

The Method resource specifies the inputs and outputs:

method_description = """
# Summary

Visualizes the relationship between mean word length in prompts and system
completions. The description uses [Markdown](https://www.markdownguide.org) syntax.
"""
method_request = MethodCreateRequest(
    name="mean-word-length-notebook",
    # The notebook analyzes multiple measurements of the same system
    scope=MethodScope.InferenceService,
    description=method_description,
    # The method is implemented as a Jupyter notebook
    implementation=MethodImplementation(
        kind=MethodImplementationKind.JupyterNotebook,
        jupyterNotebook=MethodImplementationJupyterNotebook(
            notebookModule=module.id,
            # The path to the notebook file, relative to the module root directory
            notebookPath="my-notebook.ipynb",
        ),
    ),
    # The method accepts one argument called 'threshold'
    parameters=[
        MethodParameter(keyword="threshold", description="(float) A numeric threshold"),
    ],
    # The method accepts two PyArrow datasets as inputs:
    # - The one called 'easy' is a Measurement on the "easy" dataset
    # - The one called 'hard' is a Measurement on the "hard" dataset
    inputs=[
        MethodInput(kind=MethodInputKind.Measurement, keyword="easy"),
        MethodInput(kind=MethodInputKind.Measurement, keyword="hard"),
    ],
    # The method produces a SafetyCase
    output=MethodOutput(
        kind=MethodOutputKind.SafetyCase,
        safetyCase=SafetyCaseSpec(
            name="mean-word-length-safetycase",
            description="This is also **Markdown**.",
        ),
    ),
    # The Module containing the Method code
    modules=[module.id],
    account=ACCOUNT,
)
method = dyffapi.methods.create(method_request)
print(method.json(indent=2))

For this notebook, both of the inputs are Measurements; they could also be Datasets, Evaluations, or Reports. Here, we want to compare two Measurements that were computed using the same Method but applied to two different input datasets, called easy and hard.
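As a rough illustration of what the notebook body might compute from these two inputs, the sketch below calculates mean word lengths for prompts and completions and compares the completion value against the threshold argument. The row layout (`prompt` and `completion` fields), the sample rows, and the meaning given to `threshold` are all assumptions made for this example. In a real notebook, the rows would come from `ctx.open_input_dataset("easy")` and `ctx.open_input_dataset("hard")`.

```python
def mean_word_length(text):
    """Mean length of the whitespace-separated words in text (0.0 if empty)."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


def summarize(rows, threshold):
    """Average the per-row mean word lengths and compare against threshold."""
    prompt_mean = sum(mean_word_length(r["prompt"]) for r in rows) / len(rows)
    completion_mean = sum(mean_word_length(r["completion"]) for r in rows) / len(rows)
    return {
        "prompt": prompt_mean,
        "completion": completion_mean,
        # Assumed interpretation: flag completions with long words on average
        "completion_above_threshold": completion_mean > threshold,
    }


# Hypothetical stand-ins for the 'easy' and 'hard' Measurement inputs
easy_rows = [{"prompt": "a cat sat", "completion": "it sat down"}]
hard_rows = [
    {
        "prompt": "explain thermodynamic equilibrium",
        "completion": "equilibrium means balance",
    }
]

threshold = 5.0  # would come from float(ctx.get_argument("threshold"))
easy_summary = summarize(easy_rows, threshold)
hard_summary = summarize(hard_rows, threshold)
print(easy_summary)
print(hard_summary)
```

Keeping the computation in small functions like these makes it easier to check the logic separately from the notebook's presentation cells.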