Inference – Models, Services, and Sessions¶
As illustrated in the overview, the process of making inferences on data involves three kinds of entities:
- Models: A Model is the "raw" form of an ML system, such as neural network weights. These are often obtained from sites such as Hugging Face. Models generally are not runnable directly; they require supporting code and configuration to set up the system and move data in and out.
- Inference Services: An InferenceService, in contrast, is a packaged ML system that is ready to run. In Dyff, an inference service is a Docker image that runs a Web server providing a standard JSON API. Often, this Docker image is just a "wrapper" that loads a Model, but Dyff can serve any inference service that provides the appropriate JSON API.
- Inference Sessions: An InferenceSession is a running instance of an inference service. If a service is a Docker image, a session is a Docker container. A session can be backed by multiple replicas of the service to increase capacity.
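To make the "standard JSON API" concrete, the sketch below shows the kind of HTTP exchange such a service handles. The endpoint path and payload shape are illustrative assumptions for a text-completion service, not the documented Dyff wire format; the actual endpoint and schemas are declared by each service's interface:

import requests

# Hypothetical exchange with a locally running inference service. The
# "/generate" path and the request/response payloads are assumptions for
# illustration only; real services declare their endpoint and schemas in
# their interface definition.
response = requests.post(
    "http://localhost:8000/generate",
    json={"text": "Open the pod bay doors, Hal!"},
)
print(response.json())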
As an auditor, you generally don’t need to worry about creating these resources;
you’ll simply reference an existing inference service as the “system under test”
when you create resources like Evaluations
and SafetyCases
as part of your audit pipeline. This section
shows you how to set up a “mock” inference service that you can use when
developing your audits.
Local testing using mock services¶
When developing locally, you might not want to deal with running a real ML system, which can take up gigabytes of space and require special hardware like GPUs. Instead, you can create a "mock-up" service that implements the same interface as a real service and responds with simulated inferences.
Create the service¶
First, create a service using one of the types defined in the dyff.audit.local.mocks module:
from dyff.audit.local import DyffLocalPlatform, mocks

dyffapi = DyffLocalPlatform(storage_root="/some/dir")
ACCOUNT = ...

service = dyffapi.inferenceservices.create_mock(mocks.TextCompletion, account=ACCOUNT)
print(service.json(indent=2))
Start a session¶
Next, create an inference session backed by the mock service:
from dyff.schema.requests import InferenceSessionCreateRequest

session_request = InferenceSessionCreateRequest(
    account=ACCOUNT, inferenceService=service.id
)
session_and_token = dyffapi.inferencesessions.create(session_request)
session = session_and_token.inferencesession
token = session_and_token.token
print(session.json(indent=2))
Note that when creating an inference session, the response is an InferenceSessionAndToken object containing both the inference session resource and an auth token for making inference calls.
Make an inference¶
Finally, create an inference client for the service, and use it to make an
inference. When creating the client, you specify an interface
that tells the
client how to translate inputs and outputs to and from the format expected by
the service. Usually, you’ll use the interface specified by the service, as
shown in the example. One reason to use a different interface is if you want to
communicate with the service using its “native” data format rather than the Dyff
standard one.
Note

The client() function requires a token argument for consistency with the Dyff API client, but the token is ignored when using DyffLocalPlatform.
inference_client = dyffapi.inferencesessions.client(
    session.id, token, interface=session.inferenceService.interface
)

completion = inference_client.infer({"text": "Open the pod bay doors, Hal!"})
print(completion)
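If you want to see exactly what translation the client will perform, you can inspect the interface object itself. A minimal sketch, assuming the interface serializes to JSON the same way as the service and session objects printed above:

# Print the interface declared by the service, including its endpoint and
# input/output schemas. The .json() method is assumed to behave as it does
# for the other Dyff schema objects in this guide.
print(session.inferenceService.interface.json(indent=2))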
Inference on the Dyff platform¶
Running AI systems on the Dyff platform is very similar to running a mock-up system locally, except that you will normally run an inference service that already exists rather than create a new one. The other main difference is that real inference sessions can take a while to start, so you need to account for the possibility that inference calls will fail during the startup period:
import time

from dyff.client import Client
from dyff.schema.platform import is_status_terminal
from dyff.schema.requests import InferenceSessionCreateRequest

dyffapi = Client()
ACCOUNT = ...

service = dyffapi.inferenceservices.get("service-id")

session_request = InferenceSessionCreateRequest(
    account=ACCOUNT, inferenceService=service.id
)
session_and_token = dyffapi.inferencesessions.create(session_request)
session = session_and_token.inferencesession
token = session_and_token.token

inference_client = dyffapi.inferencesessions.client(
    session.id, token, interface=session.inferenceService.interface
)

# Sessions usually take some time to start. In general, the only way to tell
# that the session is ready is to attempt to make an inference.
while True:
    try:
        completion = inference_client.infer({"text": "Open the pod bay doors, Hal!"})
        print(completion)
        break
    except Exception:
        # If the session has terminated, inference will never succeed
        if is_status_terminal(dyffapi.inferencesessions.get(session.id).status):
            print("session terminated")
            break
        print("not ready")
        time.sleep(30)
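If your pipeline makes this kind of call in more than one place, it can help to wrap the retry loop in a small helper. A minimal sketch, reusing only the calls shown above; the helper name and its poll_seconds parameter are illustrative, not part of the Dyff API:

import time

from dyff.schema.platform import is_status_terminal

def infer_when_ready(dyffapi, inference_client, session_id, request, poll_seconds=30):
    """Retry an inference until the session is ready, or raise if it terminates."""
    while True:
        try:
            return inference_client.infer(request)
        except Exception:
            # A terminated session will never become ready
            status = dyffapi.inferencesessions.get(session_id).status
            if is_status_terminal(status):
                raise RuntimeError("inference session terminated")
            time.sleep(poll_seconds)

completion = infer_when_ready(
    dyffapi, inference_client, session.id, {"text": "Open the pod bay doors, Hal!"}
)
print(completion)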