Inference – Models, Services, and Sessions

As illustrated in the overview, the process of making inferences on data involves three kinds of entities:

Models

A Model is the “raw” form of an ML system, such as a set of neural network weights. Models are often obtained from hubs such as Hugging Face. They are generally not directly runnable; they require supporting code and configuration to set up the system and to move data in and out.
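For example, downloading raw weights from Hugging Face yields only the model artifacts, not a runnable service. A minimal sketch using the huggingface_hub library (the gpt2 repository is used purely as an illustration; this is not part of Dyff itself):

from huggingface_hub import snapshot_download

# Download the repository contents (weights, config, tokenizer files)
# to the local cache. The result is a directory of raw artifacts that
# still needs serving code before it can answer inference requests.
local_dir = snapshot_download(repo_id="gpt2")
print(local_dir)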

Inference Services

An InferenceService, in contrast, is a packaged ML system that is ready to run. In Dyff, an inference service is a Docker image that runs a Web server that provides a standard JSON API. Often, this Docker image is just a “wrapper” that loads a Model, but Dyff can serve any inference service that provides the appropriate JSON API. This is a key differentiating capability of Dyff: it allows for complex, production-ready ML systems with arbitrary dependencies to be audited through a standard interface.
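Because every inference service exposes the same HTTP + JSON interface, a client can talk to any of them in the same way. The sketch below is illustrative only; the endpoint path and payload shape are assumptions for the example, not the documented Dyff wire format:

import requests

# Hypothetical request against a locally running inference service.
# The "/inference" path and the payload shape are placeholders.
response = requests.post(
    "http://localhost:8000/inference",
    json={"text": "Open the pod bay doors, Hal!"},
)
print(response.json())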

Inference Sessions

An InferenceSession is a running instance of an inference service. If a service is a Docker image, a session is a Docker container. A session can be backed by multiple replicas of the service to increase capacity.
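For instance, a session with extra capacity might be requested as follows. This is a sketch that reuses the ACCOUNT and service names from the examples later in this section, and it assumes the request schema accepts a replicas count:

from dyff.schema.requests import InferenceSessionCreateRequest

# Hypothetical: back the session with two replicas of the service
# to increase capacity. The 'replicas' field is an assumption here.
session_request = InferenceSessionCreateRequest(
    account=ACCOUNT,
    inferenceService=service.id,
    replicas=2,
)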

As an Auditor, you generally don’t need to worry about creating these resources; you’ll simply reference an existing inference service as the “system under test” when you create resources like Evaluations and Measurements as part of your audit pipeline. This section shows you how to set up a “mock” inference service that you can use when developing your audits.

Local testing using mock services

When developing locally, you may not want to run a real ML system, which can consume gigabytes of disk space and require specialized hardware such as GPUs. Instead, you can create a “mock” service that implements the same interface as a real service and responds with simulated inferences.

Create the service

First, create a mock service using one of the types defined in the dyff.audit.local.mocks module:

from dyff.audit.local import DyffLocalPlatform, mocks

dyffapi = DyffLocalPlatform(storage_root="/some/dir")
ACCOUNT = ...

service = dyffapi.inferenceservices.create_mock(mocks.TextCompletion, account=ACCOUNT)
print(service.json(indent=2))

Start a session

Next, create an inference session backed by the mock service. This snippet continues from the previous one:

from dyff.schema.requests import InferenceSessionCreateRequest

session_request = InferenceSessionCreateRequest(
    account=ACCOUNT, inferenceService=service.id
)
session = dyffapi.inferencesessions.create(session_request)
print(session.json(indent=2))

Make an inference

Finally, create an inference client for the session and use it to make an inference. When creating the client, you specify an interface that tells the client how to translate inputs and outputs to and from the format expected by the service. Usually, you’ll use the interface specified by the service itself, as shown in the example. One reason to choose a different interface is to communicate with the service in its “native” data format rather than the Dyff standard one.

Note

The client() function requires a token argument for consistency with the Dyff API client, but the token is ignored when using DyffLocalPlatform.

inference_client = dyffapi.inferencesessions.client(
    session.id, "dummy-token", interface=session.inferenceService.interface
)

completion = inference_client.infer({"text": "Open the pod bay doors, Hal!"})
print(completion)
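To see how the client will translate inputs and outputs, you can inspect the interface object itself. A quick sketch, assuming the interface serializes like the other Dyff schema objects used above (via .json()):

# Print the interface specification that the client uses to translate
# inputs and outputs to and from the service's wire format.
print(session.inferenceService.interface.json(indent=2))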