Data schemas¶
Dyff requires that formal schemas be defined for all data involved in the audit process. This includes inputs to the model and additional covariates associated with the inputs, the raw inference outputs from the model, and the post-processed and “scored” model outputs that are ready for public consumption. Formal schemas ensure forward-compatibility of existing resources in the platform with new datasets, models, and analysis methods.
The input data schemas are associated with Dataset resources. Model output schemas are associated with InferenceService resources and, by extension, with resources such as InferenceSession and Evaluation that consume inference services. Schemas for the post-processed outputs are associated with Method resources and their products (Measurement and SafetyCase resources).
Native data formats and Schema Adapters¶
Dyff is designed to evaluate AI systems in their “deployable” form, meaning that the code being evaluated is the same as the code that will actually be used in a product, or at least as close as possible. So, Dyff allows model inputs and outputs to have essentially any JSON-like structure. Because there is no accepted standard for AI system APIs, models created by different organizations tend to have incompatible interfaces.
To bridge this gap, it is often necessary to convert data from one schema to another, often multiple times in the course of running the full audit workflow. The code that performs these transformations is part of the test specification; we are not evaluating the model on dataset D, we are evaluating it on dataset D with transformation T applied. So, the necessary adapters must be specified as part of the corresponding resource, so that the whole combination of components and adapters has a unique ID and we can reproduce the entire end-to-end pipeline.
Usually, you do not need to alter your existing datasets and models to
conform to Dyff’s schemas. Instead, you will specify schema adapters
to transform the inputs and outputs, and the Dyff
Platform will store those adapter specifications in the uniquely-identified
specification of the inference service. All of these adapters take their
configuration as a JSON object, so that schema adapter pipelines can be
specified easily as part of a resource specification.
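For orientation, a pipeline of two adapters inside a resource specification might look roughly like the sketch below. The adapter kinds are the ones discussed later on this page, but the field names (kind, configuration) and the configuration contents shown here are illustrative assumptions; consult the InferenceService reference for the exact format.
# Hypothetical sketch of an adapter pipeline inside a resource specification.
# Each adapter is identified by a kind and carries its configuration as a
# plain JSON object; the exact keys below are assumptions, not a reference.
input_pipeline = [
    {"kind": "TransformJSON", "configuration": {"prompt": "$.text"}},
]
output_pipeline = [
    {"kind": "ExplodeCollections", "configuration": {"collections": ["text"]}},
]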
Schema conventions¶
Our guiding principles for data schema design are:
Schemas should be composable
Flat is better than nested
Schemas should describe semantics as well as type
This results in data that is easy to store and manipulate with common tools like Arrow and Pandas, and that is easy to reuse because it is clear what the data represents.
Semantic field names¶
To achieve composability, we define standard names for top-level fields with associated task semantics. For example, for the task of “text generation”, we expect that both the inputs and outputs of the model contain a field called "text", which, by convention, contains text. If the task is “text classification”, the output would instead have a field called "label".
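For example, a text classification output record following this convention might look like the following (the label value is illustrative):
{"label": "positive"}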
For a limited number of ubiquitous kinds of fields with very general semantics, we use non-namespaced “reserved” names like “text” and “label”. To allow extensibility, we define more task-specific fields within namespaces. For example, the output for a text tagging task might use the field text.dyff.io/taggedspan. Field names prefixed with dyff.io/ or subdomain.dyff.io/ are reserved for Dyff.
Input and output schemas¶
Inference services take a single object as input and return a list of objects. Making the output a list accommodates the common practice of returning multiple possible answers, often in descending order of preference.
If the input schema of an inference service is specified by name as text.Text, then the service must accept JSON requests that look like:
{"text": "It was the best of times, "}
If the output schema is also specified by name as text.Text, then the service must return JSON responses that look like:
[{"text": "it was the worst of times"}, {"text": "it was the blurst of times"}]
In both cases, we express the expectation by specifying that the data must conform to the named schema text.Text.
How schemas are specified¶
Dyff uses pydantic models to formalize all of its data schemas. From these pydantic schemas, we generate Arrow schemas for data in persistent storage, and JSON schemas for the specification of remote procedure call (RPC) interfaces. These three schema types are inter-convertible if we avoid a few specific features that don’t exist in all three.
When schemas are required in Dyff APIs, they are specified with the DataSchema type, which contains fields for all three kinds of schema. The full DataSchema can be populated from just a pydantic model type.
You can specify your own schema with a mix of named schemas defined by the platform and new pydantic model types that you define yourself. The top-level schema is the product of a list of component schemas. Here, product type just means a schema that contains all the fields in all of its components. This is why top-level field names must be unique, so that creating a product schema doesn’t result in name collisions.
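As an informal illustration of the product construction, here is what it looks like in plain pydantic (this is only a sketch of the idea, not how Dyff composes its named schemas internally):
import pydantic

class Text(pydantic.BaseModel):
    # Component schema with a single top-level field
    text: str

class Label(pydantic.BaseModel):
    # Another component schema
    label: str

class TextAndLabel(Text, Label):
    # The "product" of the two components: it contains all of their fields.
    # If both components defined a field with the same name, the composition
    # would silently collide, which is why top-level names must be unique.
    pass

print(sorted(TextAndLabel.__fields__))  # ['label', 'text']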
Required fields¶
Field names that begin and end with an underscore (e.g., _index_) and field names prefixed with dyff.io or a sub-domain thereof (e.g., subdomain.dyff.io/fieldname) are reserved for use by Dyff.
When you specify input and output schemas for resources like InferenceService, the following special fields are mandatory in the input and/or output schema, as noted:
_index_ : int64
[Required in input and output] The _index_ field uniquely identifies a single input item within its containing dataset. Every input item must have an _index_ field that is unique within its dataset. Every output item has an _index_ field that matches it to the corresponding input item.

_replication_ : string
[Required in output] The _replication_ field identifies which replication of an evaluation the output item belongs to. It is a UUIDv5 identifier, where the “namespace” is the ID of the evaluation resource and the “name” is the sequential integer index of the replication (i.e., 0, 1, …).

responses : list(struct(response type))
[Required in output] The responses field contains the list of responses from the inference service for the corresponding input. It is always a list, even if the service only returns a single response. The elements of the responses list must contain the following fields:

_response_index_ : int64
[Required in output] The _response_index_ field identifies each distinct response to a given input item. A single response may span multiple items, so the responses list may contain more than one item with the same _response_index_ value. For example, text span tagging tasks like Named Entity Recognition may output zero or more predicted entities for a single input text, each stored as a separate item belonging to the same response.
Putting this all together, we can see that each combination of (_index_, _replication_, _response_index_) identifies one set of system inferences for the single input item identified by _index_.
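As an illustration, an output record from one replication might look like the following (the values are made up and the _replication_ UUID is truncated):
{"_index_": 42, "_replication_": "f3f4c371-…", "responses": [{"_response_index_": 0, "text": "response A"}, {"_response_index_": 1, "text": "response B"}]}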
You are responsible for ensuring all required fields are in the schemas you specify. This is a design choice that we have made to ensure that data records are self-describing. To make this easier, you can use make_input_schema() and make_output_schema(), which add the required fields to another schema that you provide and then populate a full DataSchema instance using the result. For example, if your service outputs both a piece of text and a classification label, you can create the spec for its output schema like this:
from dyff.schema.dataset import arrow
from dyff.schema.platform import DataSchema, DyffDataSchema

# Compose an output schema from two named component schemas
dyff_schema = DyffDataSchema(components=["classification.Label", "text.Text"])
# Add the required output fields and populate the full DataSchema
full_schema = DataSchema.make_output_schema(dyff_schema)
# Inspect the generated Arrow schema
print(arrow.decode_schema(full_schema.arrowSchema))
_index_: int64
-- field metadata --
__doc__: 'The index of the item in the dataset'
_replication_: string
-- field metadata --
__doc__: 'ID of the replication the item belongs to.'
responses: list<item: struct<_response_index_: int64, text: string, label: string>>
child 0, item: struct<_response_index_: int64, text: string, label: string>
child 0, _response_index_: int64
-- field metadata --
__doc__: 'The index of the response among responses to the correspo' + 13
child 1, text: string
-- field metadata --
__doc__: 'Text data'
child 2, label: string
-- field metadata --
__doc__: 'The discrete label of the item'
-- field metadata --
__doc__: 'Inference responses'
Notice how the generated Arrow schema includes the required fields _index_, _replication_, and responses, and the items in responses include a _response_index_ field. You can also provide a type that derives from DyffSchemaBaseModel as the argument to make_input_schema() and make_output_schema(), which is useful if you need to include data that doesn’t fit any of the named schemas.
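For example, a custom input schema might be defined like the sketch below. The import path for DyffSchemaBaseModel is our assumption here; check the API reference for the exact module.
from dyff.schema.base import DyffSchemaBaseModel
from dyff.schema.platform import DataSchema

class NewsArticle(DyffSchemaBaseModel):
    # Hypothetical task-specific fields that don't fit any named schema
    text: str
    headline: str

# Add the required input fields and populate the full DataSchema
input_schema = DataSchema.make_input_schema(NewsArticle)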
InferenceService schema adapters¶
An inference service is the “runnable” form of a model that exposes an HTTP
API for making inference requests. An inference service is a “view” of a
model that allows it to perform a single task; there may be any number of
inference services backed by the same model. When creating an InferenceService resource, you must specify how to convert the input data from the well-known task input schema to whatever format the underlying model requires, and how to convert the model’s output to the well-known task output schema.
In many cases these transformations are fairly simple. For example, the model might expect the input field to be called "prompt" instead of "text", so the input adapter just has to rename that field. The TransformJSON adapter is useful for this purpose. This adapter can also be used to add literal fields, which is useful when the model takes additional arguments that modify its behavior (such as sampling parameters). This way, the inference service spec also fully specifies which non-default model parameter settings that particular instantiation of the service uses, making every use of the inference service fully reproducible.
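Conceptually, such an input adapter behaves like the following sketch; the prompt field name and the sampling parameters are illustrative, and the real adapter is configured declaratively with a JSON object rather than written as code:
def adapt_input(item: dict) -> dict:
    # Rename the well-known "text" field to the model's expected "prompt" field
    # and add literal fields for the sampling parameters this service always uses.
    return {
        "prompt": item["text"],
        "temperature": 0.7,  # illustrative literal field
        "max_tokens": 128,   # illustrative literal field
    }

print(adapt_input({"text": "It was the best of times, "}))
# {'prompt': 'It was the best of times, ', 'temperature': 0.7, 'max_tokens': 128}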
For the output schemas, it is often necessary to transform a “column-oriented” schema to a “row-oriented” schema. For example, the service might return responses like:
{"text": ["response A", "response B"]}
that need to be transformed to:
[{"text": "response A"}, {"text": "response B"}]
The ExplodeCollections adapter is useful for this purpose. This transformation converts a list into a set of rows while optionally adding one or more index fields that can be used to sort the responses to an input in different ways. The FlattenHierarchy adapter can be used to flatten nested structures into top-level fields.
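The effect of these two adapters can be illustrated with plain Python (a conceptual sketch, not the adapters' implementation; the index field and the flattened field names are illustrative):
def explode_collection(response: dict) -> list[dict]:
    # Convert a "column-oriented" response into one row per list element,
    # adding an index field that can be used to order the rows later.
    return [{"text": text, "index": i} for i, text in enumerate(response["text"])]

def flatten_hierarchy(item: dict, prefix: str = "") -> dict:
    # Flatten nested objects into top-level fields, e.g. {"a": {"b": 1}} -> {"a.b": 1}
    flat: dict = {}
    for key, value in item.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_hierarchy(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

print(explode_collection({"text": ["response A", "response B"]}))
# [{'text': 'response A', 'index': 0}, {'text': 'response B', 'index': 1}]
print(flatten_hierarchy({"scores": {"toxicity": 0.1, "relevance": 0.9}}))
# {'scores.toxicity': 0.1, 'scores.relevance': 0.9}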