Inference – Models, Services, and Sessions¶
As illustrated in the overview, the process of making inferences on data involves three kinds of entities:
- Models: A Model is the “raw” form of an ML system, such as neural network weights. These are often obtained from sites such as Hugging Face. Models generally are not runnable directly; they require supporting code and configuration to set up the system and move data in and out.
- Inference Services: An InferenceService, in contrast, is a packaged ML system that is ready to run. In Dyff, an inference service is a Docker image that runs a Web server providing a standard JSON API (see the sketch after this list). Often, this Docker image is just a “wrapper” that loads a Model, but Dyff can serve any inference service that provides the appropriate JSON API. This is a key differentiating capability of Dyff: it allows complex, production-ready ML systems with arbitrary dependencies to be audited through a standard interface.
- Inference Sessions: An InferenceSession is a running instance of an inference service. If a service is a Docker image, a session is a Docker container. A session can be backed by multiple replicas of the service to increase capacity.
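To make the “standard JSON API” concrete, here is a minimal sketch of a raw request to a running inference service. This is illustrative only: the host, port, endpoint path, and JSON shapes are assumptions, not the actual Dyff specification; each service declares its real endpoint and schemas in its interface, as described later in this section.

import requests

# Hypothetical request; the endpoint path ("generate") and the input/output
# JSON shapes are assumptions for illustration, not the Dyff specification.
response = requests.post(
    "http://localhost:8080/generate",
    json={"text": "Open the pod bay doors, Hal!"},
)
print(response.json())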
As an Auditor, you generally don’t need to worry about creating these resources; you’ll simply reference an existing inference service as the “system under test” when you create resources like Evaluations and Measurements as part of your audit pipeline. This section shows you how to set up a “mock” inference service that you can use when developing your audits.
Local testing using mock services¶
When developing locally, you might not want to deal with running a real ML system, which can take up gigabytes of space and require special hardware like GPUs. Instead, you can create a “mock-up” service that implements the same interface as a real service and responds with simulated inferences.
Create the service¶
First, create a service using one of the types defined in the dyff.audit.local.mocks module:
from dyff.audit.local import DyffLocalPlatform, mocks

dyffapi = DyffLocalPlatform(storage_root="/some/dir")
ACCOUNT = ...

service = dyffapi.inferenceservices.create_mock(mocks.TextCompletion, account=ACCOUNT)
print(service.json(indent=2))
Start a session¶
Next, create an inference session backed by the mock service:
from dyff.schema.requests import InferenceSessionCreateRequest

session_request = InferenceSessionCreateRequest(
    account=ACCOUNT, inferenceService=service.id
)
session = dyffapi.inferencesessions.create(session_request)
print(session.json(indent=2))
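As noted earlier, a session can be backed by multiple replicas to increase capacity. The following sketch continues the example above and assumes that InferenceSessionCreateRequest accepts a replicas field; confirm this against the dyff.schema.requests reference before using it.

# Speculative sketch: the "replicas" field is an assumption; verify that it
# exists on InferenceSessionCreateRequest in your version of dyff.
scaled_request = InferenceSessionCreateRequest(
    account=ACCOUNT,
    inferenceService=service.id,
    replicas=2,  # back the session with two replicas of the service
)
scaled_session = dyffapi.inferencesessions.create(scaled_request)
print(scaled_session.json(indent=2))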
Make an inference¶
Finally, create an inference client for the service, and use it to make an inference. When creating the client, you specify an interface that tells the client how to translate inputs and outputs to and from the format expected by the service. Usually, you’ll use the interface specified by the service itself, as shown in the example. One reason to use a different interface is to communicate with the service in its “native” data format rather than the Dyff standard one; a sketch of that variation follows the example below.
Note
The client() function requires a token argument for consistency with the Dyff API client, but the token is ignored when using DyffLocalPlatform.
inference_client = dyffapi.inferencesessions.client(
    session.id, "dummy-token", interface=session.inferenceService.interface
)

completion = inference_client.infer({"text": "Open the pod bay doors, Hal!"})
print(completion)
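If you want the client to use the service’s native format instead, you could construct your own interface. The sketch below is speculative: it reuses the mock service’s endpoint and output schema but omits the input and output pipelines, assuming that without pipeline adapters the client passes payloads through untranslated. Check the InferenceInterface definition in dyff.schema.platform before relying on this.

from dyff.schema.platform import InferenceInterface

# Speculative sketch: omitting inputPipeline/outputPipeline is assumed to
# mean payloads are sent and received in the service's native JSON format.
native_interface = InferenceInterface(
    endpoint=session.inferenceService.interface.endpoint,
    outputSchema=session.inferenceService.interface.outputSchema,
)

native_client = dyffapi.inferencesessions.client(
    session.id, "dummy-token", interface=native_interface
)
# With a native-format interface, you would send whatever JSON the service
# itself expects rather than the Dyff-standard {"text": ...} input.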