Data schemas

Dyff requires formal schemas to be defined for all data artifacts involved in the audit process. This includes the inputs to the model and any additional covariates associated with those inputs, the raw inference outputs from the model, and the post-processed and “scored” model outputs that are ready for public consumption. Formal schemas ensure forward-compatibility of existing resources in the platform with new datasets, models, and analysis methods.

The input data schemas are associated with Dataset resources. Model output schemas are associated with InferenceService resources and, by extension, resources such as InferenceSession and Evaluation that consume inference services. Schemas for the post-processed outputs are associated with Method resources and their products (Measurement and SafetyCase resources).

How schemas are specified

Dyff uses Pydantic models to formalize all of its data schemas. From these Pydantic schemas, we generate Arrow schemas for data in persistent storage, and JSON schemas for the specification of remote procedure call (RPC) interfaces. These three schema types are inter-convertible if we avoid a few specific features that don’t exist in all three.

When data schemas are required in Dyff APIs, they are specified with the DataSchema type, which contains fields for all three kinds of schema. Only the .arrowSchema is required, but you should populate the other kinds when possible. You can populate all three kinds of schema from just a Pydantic model type.

The best way to create your own schema is to define a new Pydantic model type. You can combine multiple schemas by creating a model type that inherits from all of them, or by using the product_schema() function to combine the component schemas. Both approaches create a product type, which simply means a schema containing all of the fields of all of its components. The top-level field names of the components must be unique, so that the product schema does not have name collisions.
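
For example, here is a minimal sketch of both approaches, using the pre-defined text and classification schemas that also appear in the later example (the combined type names here are just illustrative):

from dyff.schema import product_schema
from dyff.schema.dataset import classification, text

# Option 1: combine schemas through inheritance
class TextWithLabel(text.Text, classification.Label):
    """Items that have both a text field and a classification label."""

# Option 2: the equivalent product type built with product_schema()
TextWithLabelProduct = product_schema([text.Text, classification.Label])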

Required fields

Field names that begin and end with an underscore (e.g., _index_) and field names prefixed with dyff.io or a sub-domain thereof (e.g., subdomain.dyff.io/fieldname) are reserved for use by Dyff.

When you specify input and output schemas for resources like InferenceService, the following special fields are mandatory in the input and/or output schema, as noted:

_index_ : int64 [Required in input and output]

The _index_ field uniquely identifies a single input item within its containing dataset. Every input item must have an _index_ field that is unique within its dataset. Every output item has an _index_ field that matches it to the corresponding input item.

_replication_ : string [Required in output]

The _replication_ field identifies which replication of an evaluation the output item belongs to. It is a UUIDv5 identifier, where the “namespace” is the ID of the evaluation resource and the “name” is the sequential integer index of the replication (i.e., 0, 1, …).

responses : list(struct(response type)) [Required in output]

The responses field contains the list of responses from the inference service for a corresponding input. It is always a list, even if the service only returns a single response.

The elements of the responses list must contain the following fields:

_response_index_ : int64 [Required in output]

The _response_index_ field identifies which response to the given input item an element belongs to. The responses list may contain more than one element with the same _response_index_ value, because a single response may be represented by multiple elements. For example, text span tagging tasks like Named Entity Recognition may output zero or more predicted entities for a single input text.

Putting this all together, we can see that each combination of (_index_, _replication_, _response_index_) identifies one set of system inferences for the single input item identified by _index_.
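
As a concrete illustration (all of the values here are made up), the following sketch shows how the _replication_ identifier is constructed and what a single output item might look like; passing the replication index as its decimal string is an assumption about the encoding:

import uuid

# A hypothetical evaluation ID; real IDs are assigned by the platform.
evaluation_id = uuid.UUID("00000000-0000-0000-0000-000000000000")

# _replication_ for the first replication of that evaluation. The "name"
# is assumed to be the decimal string form of the sequential index.
replication_id = str(uuid.uuid5(evaluation_id, "0"))

# One output item: the responses to the input item with _index_ == 42.
output_item = {
    "_index_": 42,
    "_replication_": replication_id,
    "responses": [
        {"_response_index_": 0, "text": "response A"},
        {"_response_index_": 1, "text": "response B"},
    ],
}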

You are responsible for ensuring that all of the required fields are present in the schemas you specify. This is a deliberate design choice to keep data records self-describing. To make this easier, you can use DataSchema.make_input_schema() and DataSchema.make_output_schema(), which add the required fields to a schema that you provide and then populate a full DataSchema instance from the result. For example, if your service outputs both a piece of text and a classification label, you can create the spec for its output schema like this:

from dyff.schema import product_schema
from dyff.schema.dataset import arrow, classification, text
from dyff.schema.platform import DataSchema

# Combine the pre-defined Text and Label schemas into one item schema
item_schema = product_schema([text.Text, classification.Label])
# Add the required output fields and build the full DataSchema
output_schema = DataSchema.make_output_schema(item_schema)
print(arrow.decode_schema(output_schema.arrowSchema))
_index_: int64
  -- field metadata --
  __doc__: 'The index of the item in the dataset'
_replication_: string
  -- field metadata --
  __doc__: 'ID of the replication the item belongs to.'
responses: list<item: struct<_response_index_: int64, text: string, label: string>>
  child 0, item: struct<_response_index_: int64, text: string, label: string>
      child 0, _response_index_: int64
      -- field metadata --
      __doc__: 'The index of the response among responses to the correspo' + 13
      child 1, text: string
      -- field metadata --
      __doc__: 'Text data'
      child 2, label: string
      -- field metadata --
      __doc__: 'The discrete label of the item'
  -- field metadata --
  __doc__: 'Inference responses'

Notice how the generated Arrow schema includes the required fields _index_, _replication_, and responses, and the items in responses include a _response_index_ field. Note that you can provide any type that derives from DyffSchemaBaseModel as the argument to make_input_schema() and make_output_schema(), which is useful if you need to include data that doesn’t fit any of the pre-defined schemas.
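
For instance, here is a minimal sketch of a custom input schema; the import path of DyffSchemaBaseModel is assumed to be dyff.schema.base, so adjust it to match your installed version of the client library:

import pydantic

from dyff.schema.base import DyffSchemaBaseModel  # import path assumed
from dyff.schema.platform import DataSchema

class TriviaQuestion(DyffSchemaBaseModel):
    """A custom input item that doesn't fit any of the pre-defined schemas."""
    question: str = pydantic.Field(description="The question to ask the system")
    category: str = pydantic.Field(description="The trivia category")

# Adds the required _index_ field and populates the full DataSchema
input_schema = DataSchema.make_input_schema(TriviaQuestion)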

Native data formats and Schema Adapters

Dyff is designed to evaluate AI systems in their “deployable” form, meaning that the code being evaluated is the same as the code that will actually be used in a product, or as close to it as possible. Dyff therefore allows model inputs and outputs to have essentially any JSON-like structure. Because there is no accepted standard for AI system APIs, models created by different organizations tend to have incompatible interfaces.

To bridge this gap, data frequently must be converted from one schema to another, often multiple times over the course of the full audit workflow. The code that performs these transformations is part of the test specification: we are not evaluating the model on dataset D, we are evaluating it on dataset D with transformation T applied. The necessary adapters must therefore be specified as part of the corresponding resource, so that the whole combination of components and adapters has a unique ID and the entire end-to-end pipeline can be reproduced.

Usually, you do not need to alter your existing datasets and models to conform to Dyff’s schemas. Instead, you will specify schema adapters to transform the inputs and outputs, and the Dyff Platform will store those adapter specifications in the uniquely-identified specification of the inference service. All of these adapters take their configuration as a JSON object, so that schema adapter pipelines can be specified easily as part of a resource specification.

In many cases these transformations are fairly simple. For example, the model might expect the input field to be called "prompt" instead of "text", so the input adapter just has to rename that field. The TransformJSON adapter is useful for this purpose. It can also add literal fields, which is useful when the model takes additional arguments that modify its behavior (such as sampling parameters). This way, the inference service spec also records exactly which non-default model parameters that particular instantiation of the service uses, making every use of the inference service fully reproducible.
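
Conceptually, an input adapter configured this way applies a transformation like the following plain-Python sketch; this illustrates only the behavior, not the TransformJSON configuration syntax, which is documented with the adapter itself, and the parameter name and value are made up:

def adapt_input(item: dict) -> dict:
    # Rename the "text" field to the "prompt" field the model expects,
    # and attach a literal, non-default sampling parameter.
    return {
        "prompt": item["text"],
        "temperature": 0.7,  # illustrative parameter name and value
    }

assert adapt_input({"text": "Hello"}) == {"prompt": "Hello", "temperature": 0.7}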

For the output schemas, it is often necessary to transform a “column-oriented” schema to a “row-oriented” schema. For example, the service might return responses like:

{"text": ["response A", "response B"]}

that need to be transformed to:

[{"text": "response A"}, {"text": "response B"}]

The ExplodeCollections adapter is useful for this purpose. This transformation can convert a list to a set of rows while optionally also adding one or more index fields that can be used to sort the responses to an input in different ways. The FlattenHierarchy adapter can be used to flatten nested structures into top-level fields.
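
Conceptually, the explode transformation described above behaves like the following plain-Python sketch (again, an illustration of the behavior rather than the ExplodeCollections configuration syntax), including an optional index field of the kind mentioned above:

def explode_responses(column_oriented: dict) -> list[dict]:
    # Convert {"text": [...]} into one row per response, adding an
    # index field that records each response's position in the list.
    return [
        {"_response_index_": i, "text": value}
        for i, value in enumerate(column_oriented["text"])
    ]

assert explode_responses({"text": ["response A", "response B"]}) == [
    {"_response_index_": 0, "text": "response A"},
    {"_response_index_": 1, "text": "response B"},
]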