Data schemas¶
Dyff requires that formal schemas are defined for all data artifacts involved in the audit process. This includes inputs to the model and additional covariates associated with the inputs, the raw inference outputs from the model, and the post-processed and “scored” model outputs that are ready for public consumption. Formal schemas ensure forward-compatibility of existing resources in the platform with new datasets, models, and analysis methods.
The input data schemas are associated with Dataset resources. Model output schemas are associated with InferenceService resources and, by extension, with resources such as InferenceSession and Evaluation that consume inference services. Schemas for the post-processed outputs are associated with Method resources and their products (Measurement and SafetyCase resources).
How schemas are specified¶
Dyff uses Pydantic models to formalize all of its data schemas. From these Pydantic schemas, we generate Arrow schemas for data in persistent storage, and JSON schemas for the specification of remote procedure call (RPC) interfaces. These three schema types are inter-convertible if we avoid a few specific features that don’t exist in all three.
When data schemas are required in Dyff APIs, they are specified with the DataSchema type, which contains fields for all three kinds of schema. Only the .arrowSchema is required, but you should populate the other kinds when possible. You can populate all three kinds of schema from just a Pydantic model type.
The best way to create your own schema is to define a new Pydantic model type. You can combine multiple schemas by creating a model type that inherits from all of them, or by using the product_schema() function to combine component schemas. Both approaches create a product type, which simply means a schema that contains all the fields of all of its components. The top-level field names of the components must be unique so that the product schema does not contain name collisions.
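For example, here is a minimal sketch of both approaches. The Metadata and TextWithMetadata names are illustrative; the import path dyff.schema.base for DyffSchemaBaseModel is an assumption, and DataSchema.make_input_schema is assumed to be a classmethod analogous to DataSchema.make_output_schema used later in this section. Only product_schema(), the text.Text component schema, and DataSchema are taken directly from the example below.

from dyff.schema import product_schema
from dyff.schema.base import DyffSchemaBaseModel  # assumed import path
from dyff.schema.dataset import text
from dyff.schema.platform import DataSchema

# Hypothetical component schema holding an extra covariate
class Metadata(DyffSchemaBaseModel):
    source: str

# Option 1: combine component schemas by inheritance
class TextWithMetadata(text.Text, Metadata):
    pass

# Option 2: combine component schemas with product_schema()
TextWithMetadataProduct = product_schema([text.Text, Metadata])

# Either product type can then be used to populate a full DataSchema
# (make_input_schema() is described in the next section)
input_schema = DataSchema.make_input_schema(TextWithMetadata)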
Required fields¶
Field names that begin and end with an underscore (e.g., _index_) and field names prefixed with dyff.io or a sub-domain thereof (e.g., subdomain.dyff.io/fieldname) are reserved for use by Dyff.
When you specify input and output schemas for resources like InferenceService, the following special fields are mandatory in the input and/or output schema, as noted:
_index_ : int64 [Required in input and output]
The _index_ field uniquely identifies a single input item within its containing dataset. Every input item must have an _index_ field that is unique within its dataset. Every output item has an _index_ field that matches it to the corresponding input item.

_replication_ : string [Required in output]
The _replication_ field identifies which replication of an evaluation the output item belongs to. It is a UUIDv5 identifier, where the “namespace” is the ID of the evaluation resource and the “name” is the sequential integer index of the replication (i.e., 0, 1, …).

responses : list(struct(response type)) [Required in output]
The responses field contains the list of responses from the inference service for a corresponding input. It is always a list, even if the service only returns a single response. The elements of the responses list must contain the following fields:

_response_index_ : int64 [Required in output]
The _response_index_ field uniquely identifies each response to a given input item. The responses list may contain more than one item with the same _response_index_ value. For example, text span tagging tasks like Named Entity Recognition may output zero or more predicted entities for a single input text.
Putting this all together, we can see that each combination of (_index_, _replication_, _response_index_) identifies one set of system inferences for the single input item identified by _index_.
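As an illustration of how these pieces fit together, the following sketch builds one hypothetical output item by hand. The evaluation ID, index values, and response content are all made up, and the sketch assumes the evaluation ID parses as a UUID; in practice the platform produces these records for you.

import uuid

# Made-up evaluation ID; it serves as the "namespace" of the replication UUID
evaluation_id = "0f0eeafe37b944c8a33a951de1f0e7ab"

# _replication_ is a UUIDv5 whose namespace is the evaluation ID and whose
# name is the sequential index of the replication ("0", "1", ...)
replication = str(uuid.uuid5(uuid.UUID(evaluation_id), "0"))

# One output item: input item 42, replication 0, a single response
output_item = {
    "_index_": 42,
    "_replication_": replication,
    "responses": [
        {"_response_index_": 0, "text": "an example response"},
    ],
}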
You are responsible for ensuring all required fields are in the schemas you specify. This is a design choice that we have made to ensure that data records are self-describing. To make this easier, you can use make_input_schema() and make_output_schema(), which add the required fields to another schema that you provide and then populate a full DataSchema instance using the result. For example, if your service outputs both a piece of text and a classification label, you can create the spec for its output schema like this:
from dyff.schema import product_schema
from dyff.schema.dataset import arrow, classification, text
from dyff.schema.platform import DataSchema
# Combine the Text and Label component schemas into a single product schema
item_schema = product_schema([text.Text, classification.Label])
# Wrap the item schema with the required output fields (_index_, _replication_, responses)
output_schema = DataSchema.make_output_schema(item_schema)
# Print the generated Arrow schema
print(arrow.decode_schema(output_schema.arrowSchema))
_index_: int64
  -- field metadata --
  __doc__: 'The index of the item in the dataset'
_replication_: string
  -- field metadata --
  __doc__: 'ID of the replication the item belongs to.'
responses: list<item: struct<_response_index_: int64, text: string, label: string>>
  child 0, item: struct<_response_index_: int64, text: string, label: string>
      child 0, _response_index_: int64
      -- field metadata --
      __doc__: 'The index of the response among responses to the correspo' + 13
      child 1, text: string
      -- field metadata --
      __doc__: 'Text data'
      child 2, label: string
      -- field metadata --
      __doc__: 'The discrete label of the item'
  -- field metadata --
  __doc__: 'Inference responses'
Notice how the generated Arrow schema includes the required fields _index_, _replication_, and responses, and how the items in responses include a _response_index_ field. Note that you can provide any type that derives from DyffSchemaBaseModel as the argument to make_input_schema() and make_output_schema(), which is useful if you need to include data that doesn’t fit any of the pre-defined schemas.
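For instance, a custom output type might look like the following sketch. The FactCheckOutput name and its fields are hypothetical, and the import path dyff.schema.base for DyffSchemaBaseModel is again an assumption.

import pydantic

from dyff.schema.base import DyffSchemaBaseModel  # assumed import path
from dyff.schema.platform import DataSchema

class FactCheckOutput(DyffSchemaBaseModel):
    """Hypothetical output that doesn't match any pre-defined schema."""

    verdict: str = pydantic.Field(description="e.g., Supported / Refuted")
    confidence: float = pydantic.Field(description="Model confidence in [0, 1]")

# Add the required output fields around the custom item type
output_schema = DataSchema.make_output_schema(FactCheckOutput)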
Native data formats and Schema Adapters¶
Dyff is designed to evaluate AI systems in their “deployable” form, meaning that the code being evaluated is the same as the code that will actually be used in a product, or at least as close as possible. So, Dyff allows model inputs and outputs to have essentially any JSON-like structure. Because there is no accepted standard for AI system APIs, models created by different organizations tend to have incompatible interfaces.
To bridge this gap, it is often necessary to convert data from one schema to another, sometimes multiple times in the course of running the full audit workflow. The code that performs these transformations is part of the test specification: we are not evaluating the model on dataset D, we are evaluating it on dataset D with transformation T applied. The necessary adapters must therefore be specified as part of the corresponding resource, so that the whole combination of components and adapters has a unique ID and the entire end-to-end pipeline can be reproduced.
Usually, you do not need to alter your existing datasets and models to
conform to Dyff’s schemas. Instead, you will specify schema adapters
to transform the inputs and outputs, and the Dyff
Platform will store those adapter specifications in the uniquely-identified
specification of the inference service. All of these adapters take their
configuration as a JSON object, so that schema adapter pipelines can be
specified easily as part of a resource specification.
In many cases these transformations are fairly simple. For example, the model might expect the input field to be called "prompt" instead of "text", so the input adapter just has to re-name that field. The TransformJSON adapter is useful for this purpose.
This adapter can also be used to add literal fields, which is useful when the model takes additional arguments that modify its behavior (such as sampling parameters). This way, the inference service spec also records exactly which non-default model parameters that particular instantiation of the service uses, which makes every use of the inference service fully reproducible.
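As a rough illustration only, the renaming and literal-field behavior described above amounts to something like the plain-Python sketch below. This is not the actual TransformJSON configuration format, and the field and parameter names are made up.

def adapt_input(item: dict) -> dict:
    """Illustrative only: rename 'text' -> 'prompt' and attach literal
    sampling parameters expected by a hypothetical inference service."""
    adapted = dict(item)
    adapted["prompt"] = adapted.pop("text")
    # Literal fields; because they are part of the adapter configuration,
    # the exact sampling settings are recorded in the service spec
    adapted["temperature"] = 0.7
    adapted["max_tokens"] = 256
    return adapted

adapt_input({"_index_": 0, "text": "What is the capital of France?"})
# -> {"_index_": 0, "prompt": "What is the capital of France?",
#     "temperature": 0.7, "max_tokens": 256}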
For the output schemas, it is often necessary to transform a “column-oriented” schema to a “row-oriented” schema. For example, the service might return responses like:
{"text": ["response A", "response B"]}
that need to be transformed to:
[{"text": "response A"}, {"text": "response B"}]
The ExplodeCollections adapter is useful for this purpose. This transformation can convert a list to a set of rows while optionally also adding one or more index fields that can be used to sort the responses to an input in different ways. The FlattenHierarchy adapter can be used to flatten nested structures into top-level fields.
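As with TransformJSON, the sketch below only illustrates the shape of the transformation that ExplodeCollections performs, not its actual configuration format; the index field name "order" is made up.

def explode(item: dict, collection: str, index_field: str) -> list[dict]:
    """Illustrative only: turn one column-oriented record into row-oriented
    records, adding an index field that preserves the original ordering."""
    rows = []
    for i, value in enumerate(item[collection]):
        row = {k: v for k, v in item.items() if k != collection}
        row[collection] = value
        row[index_field] = i
        rows.append(row)
    return rows

explode({"text": ["response A", "response B"]}, "text", "order")
# -> [{"text": "response A", "order": 0}, {"text": "response B", "order": 1}]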