Inference – Models, Services, and Sessions

As illustrated in the overview, the process of making inferences on data involves three kinds of entities:

Models

A Model is the “raw” form of an ML system, such as neural network weights. These are often obtained from sites such as Hugging Face. Models generally are not runnable directly; they require supporting code and configuration to set up the system and move data in and out.

Inference Services

An InferenceService, in contrast, is a packaged ML system that is ready to run. In Dyff, an inference service is a Docker image that runs a web server exposing a standard JSON API. Often, this Docker image is just a “wrapper” that loads a Model, but Dyff can serve any inference service that provides the appropriate JSON API.
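
For intuition, talking to such a service directly might look like the sketch below. The endpoint path and payload here are illustrative assumptions only; the actual route and request schema are determined by the service’s interface definition:

import requests

# Hypothetical endpoint and payload, for illustration only; the real route
# and request schema come from the service's interface definition.
response = requests.post(
    "http://localhost:8080/generate",
    json={"text": "Open the pod bay doors, Hal!"},
)
print(response.json())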

Inference Sessions

An InferenceSession is a running instance of an inference service. If a service is a Docker image, a session is a Docker container. A session can be backed by multiple replicas of the service to increase capacity.
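
For example, a higher-capacity session might be requested as sketched below, using names from the examples later on this page. The replicas field is an assumption about the request schema; check InferenceSessionCreateRequest in dyff.schema.requests before relying on it:

from dyff.schema.requests import InferenceSessionCreateRequest

# Assumption: the request schema exposes a 'replicas' field.
session_request = InferenceSessionCreateRequest(
    account=ACCOUNT,              # your account ID
    inferenceService=service.id,  # the service to replicate
    replicas=3,                   # back the session with three replicas
)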

As an auditor, you generally don’t need to worry about creating these resources; you’ll simply reference an existing inference service as the “system under test” when you create resources like Evaluations and SafetyCases as part of your audit pipeline. This section shows you how to set up a “mock” inference service that you can use when developing your audits.

Local testing using mock services

When developing locally, you may not want to run a real ML system, which can take up gigabytes of space and require special hardware like GPUs. Instead, you can create a “mock-up” service that implements the same interface as a real service and responds with simulated inferences.

Create the service

First, create a service using one of the types defined in the dyff.audit.local.mocks module:

from dyff.audit.local import DyffLocalPlatform, mocks

dyffapi = DyffLocalPlatform(storage_root="/some/dir")
ACCOUNT = ...

service = dyffapi.inferenceservices.create_mock(mocks.TextCompletion, account=ACCOUNT)
print(service.json(indent=2))

Start a session

Next, create an inference session backed by the mock service:

from dyff.schema.requests import InferenceSessionCreateRequest

session_request = InferenceSessionCreateRequest(
    account=ACCOUNT, inferenceService=service.id
)
session_and_token = dyffapi.inferencesessions.create(session_request)
session = session_and_token.inferencesession
token = session_and_token.session
print(session.json(indent=2))

Note that when creating an inference session, the response is an InferenceSessionAndToken object containing both the inference session resource and an auth token for making inference calls.

Make an inference

Finally, create an inference client for the service and use it to make an inference. When creating the client, you specify an interface that tells the client how to translate inputs and outputs to and from the format expected by the service. Usually, you’ll use the interface specified by the service itself, as shown in the example. One reason to use a different interface is to communicate with the service using its “native” data format rather than the Dyff standard one; see the sketch after the example below.

Note

The client() function requires a token argument for consistency with the Dyff API client, but the token is ignored when using DyffLocalPlatform.

inference_client = dyffapi.inferencesessions.client(
    session.id, token, interface=session.inferenceService.interface
)

completion = inference_client.infer({"text": "Open the pod bay doors, Hal!"})
print(completion)
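
If you do want the service’s native format, pass a different interface when constructing the client. The value below is a placeholder; how you obtain or construct a native InferenceInterface is service-specific:

# 'native_interface' is a placeholder for an InferenceInterface describing
# the service's own request/response format; the request body passed to
# infer() must then match that native format.
native_client = dyffapi.inferencesessions.client(
    session.id, token, interface=native_interface
)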

Inference on the Dyff platform

Running AI systems on the Dyff platform is very similar to running a mock-up system locally. The main differences are that you will normally reference an inference service that already exists rather than create a new one, and that real inference sessions can take a while to start, so you need to account for the possibility that inference calls will fail during the startup period:

import time
from dyff.client import Client
from dyff.schema.platform import is_status_terminal
from dyff.schema.requests import InferenceSessionCreateRequest

dyffapi = Client()
ACCOUNT = ...

service = dyffapi.inferenceservices.get("service-id")

session_request = InferenceSessionCreateRequest(
    account=ACCOUNT, inferenceService=service.id
)
session_and_token = dyffapi.inferencesessions.create(session_request)
session = session_and_token.inferencesession
token = session_and_token.session

inference_client = dyffapi.inferencesessions.client(
    session.id, token, interface=session.inferenceService.interface
)

# Sessions usually take some time to start. In general, the only way to tell
# that the session is ready is to attempt to make an inference.
while True:
    try:
        completion = inference_client.infer({"text": "Open the pod bay doors, Hal!"})
        print(completion)
        break
    except Exception:
        # If the session has terminated, inference will never succeed
        if is_status_terminal(dyffapi.inferencesessions.get(session.id).status):
            print("session terminated")
            break
        print("not ready")
        time.sleep(30)
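
If you make such calls in several places, the wait-and-retry pattern can be factored into a small helper. This is just a convenience sketch built from the calls already shown above:

def infer_when_ready(dyffapi, inference_client, session_id, body, poll_seconds=30):
    # Retry inference until the session is ready or has terminated.
    while True:
        try:
            return inference_client.infer(body)
        except Exception:
            status = dyffapi.inferencesessions.get(session_id).status
            if is_status_terminal(status):
                raise RuntimeError(f"session {session_id} terminated")
            time.sleep(poll_seconds)

completion = infer_when_ready(
    dyffapi, inference_client, session.id, {"text": "Open the pod bay doors, Hal!"}
)
print(completion)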