Schema adapters¶

class dyff.schema.adapters.Adapter(*args, **kwargs)¶

Bases: Protocol

Transforms streams of JSON structures.

class dyff.schema.adapters.Drop(configuration: dict)¶

Bases: object

Drop named top-level fields.

The configuration is a dictionary:

{
    "fields": list[str]
}

class dyff.schema.adapters.ExplodeCollections(configuration: dict)¶

Bases: object

Explodes one or more top-level lists of the same length into multiple records, where each record contains the corresponding value from each list. This is useful for turning nested-list representations into “relational” representations where the lists are converted to multiple rows with a unique index.

The configuration argument is a dictionary:

{
    "collections": list[str],
    "index": dict[str, str | None]
}

For example, if the input data is:

[
    {"numbers": [1, 2, 3], "squares": [1, 4, 9], "scalar": "foo"},
    {"numbers": [4, 5], "squares": [16, 25], "scalar": bar"}
]

Then ExplodeCollections({"collections": ["numbers", "squares"]}) will yield this output data:

[
    {"numbers": 1, "squares": 1, "scalar": "foo"},
    {"numbers": 2, "squares": 4, "scalar": "foo"},
    {"numbers": 3, "squares": 9, "scalar": "foo"},
    {"numbers": 4, "squares": 16, "scalar": "bar"},
    {"numbers": 5, "squares": 25, "scalar": "bar"},
]

You can also create indexes for the exploded records. Given the following configuration:

{
    "collections": ["choices"],
    "index": {
        "collection/index": None,
        "collection/rank": "$.choices[*].meta.rank"
    }
}

then for the input:

   [
       {
           "choices": [
               {"label": "foo", "meta": {"rank": 1}},
               {"label": "bar", "meta": {"rank": 0}}
           ]
       },
       ...
   ]

the output will be::

   [
       {
           "choices": {"label": "foo", "meta": {"rank": 1}},
           "collection/index": 0,
           "collection/rank": 1
       },
       {
           "choices": {"label": "bar", "meta": {"rank": 0}},
           "collection/index": 1,
           "collection/rank": 0
       },
       ...
   ]

The None value for the "collection/index" index key means that the adapter should assign indices from 0...n-1 automatically. If the value is not None, it must be a JSONPath query to execute against the pre-transformation data that returns a list. Notice how the example uses $.choices[*] to get the list of choices.

class dyff.schema.adapters.FlattenHierarchy(configuration=None)¶

Bases: object

Flatten a JSON object – or the JSON sub-objects in named fields – by creating a new object with a key for each “leaf” value in the input.

The configuration options are:

{
    "fields": list[str],
    "depth": int | None,
    "addPrefix": bool
}

If fields is missing or empty, the flattening is applied to the root object. The depth option is the maximum recursion depth. If addPrefix is True (the default), then the resultint fields will be named like "path.to.leaf" to avoid name conflicts.

For example, if the configuration is:

{
    "fields": ["choices"],
    "depth": 1,
    "addPrefix": True
}

and the input is:

{
    "choices": {"label": "foo", "metadata": {"value": 42}},
    "scores": {"top1": 0.9}
}

then the output will be:

{
    "choices.label": "foo",
    "choices.metadata": {"value": 42},
    "scores": {"top1": 0.9}
}

Note that nested lists are considered “leaf” values, even if they contain objects.

class dyff.schema.adapters.HTTPData(content_type, data)¶

Bases: NamedTuple

content_type: str¶: Alias for field number 0

data: Any¶: Alias for field number 1

class dyff.schema.adapters.Map(configuration: dict)¶

Bases: object

For each input item, map another Adapter over the elements of each of the named nested collections within that item.

The configuration is a dictionary:

{
    "collections": list[str],
    "adapter": {
        "kind": <AdapterType>
        "configuration": <AdapterConfigurationDictionary>
    }
}

class dyff.schema.adapters.Pipeline(adapters: list[Adapter])¶

Bases: object

Apply multiple adapters in sequence.

class dyff.schema.adapters.Rename(configuration: dict)¶

Bases: object

Rename top-level fields in each JSON object.

The input is a dictionary {old_name: new_name}.

class dyff.schema.adapters.Select(configuration: dict)¶

Bases: object

Select named top-level fields and drop the others.

The configuration is a dictionary:

{
    "fields": list[str]
}

class dyff.schema.adapters.TransformJSON(configuration: dict)¶

Bases: object

Create a new JSON structure where the “leaf” values are populated by the results of transformation functions applied to the input.

The “value” for each leaf can be:

A JSON literal value, or
The result of a jsonpath query on the input structure, or
The result of a computation pipeline starting from (1) or (2).

To distinguish the specifications of leaf values from the specification of the output structure, we apply the following rules:

1. Composite values (``list`` and ``dict``) specify the structure of
the output.
2. Scalar values are output as-is, unless they are strings containing
JSONPath queries.
3. JSONPath queries are strings beginning with '$'. They are replaced
by the result of the query.
4. A ``dict`` containing the special key ``"$compute"`` introduces a
"compute context", which computes a leaf value from the input data.
Descendents of this key have "compute context semantics", which are
different from the "normal" semantics.

For example, if the configuration is:

{
    "id": "$.object.id",
    "name": "literal",
    "children": {"left": "$.list[0]", "right": "$.list[1]"}
    "characters": {
        "letters": {
            "$compute": [
                {"$scalar": "$.object.id"},
                {
                    "$func": "sub",
                    "pattern": "[A-Za-z]",
                    "repl": "",
                },
                {"$func": "list"}
            ]
        }
    }
}

and the data is:

{
    "object": {"id": "abc123", "name": "spam"},
    "list": [1, 2]
}

Then applying the transformation to the data will result in the new structure:

{
    "id": "abc123",
    "name": "literal",
    "children: {"left": 1, "right": 2},
    "characters": {
        "letters": ["a", "b", "c"]
    }
}

The .characters.letters field was derived by:

Extracting the value of the ``.object.id`` field in the input
Applying ``re.sub(r"[A-Za-z]", "", _)`` to the result of (1)
Applying ``list(_)`` to the result of (2)

Notice that descendents of the $compute key no longer describe the structure of the output, but instead describe steps of the computation. The value of "$compute" can be either an object or a list of objects. A list is interpreted as a “pipeline” where each step is applied to the output of the previous step.

Implicit queries¶

Outside of the $compute context, string values that start with a $ character are interpreted as jsonpath queries. Queries in this context must return exactly one value, otherwise a ValueError will be raised. This is because when multiple values are returned, there’s no way to distinguish a scalar-valued query that found 1 scalar from a list-valued query that found a list with 1 element. In the $compute context, you can specify which semantics you want.

If you need a literal string that starts with the ‘$’ character, escape it with a second ‘$’, e.g., “$$PATH” will appear as the literal string “$PATH” in the output. This works for both keys and values, e.g., {"$$key": "$$value"} outputs {"$key": "$value"}. All keys that begin with $ are reserved, and you must always escape them.

The $compute context¶

A $compute context is introduced by a dict that contains the key {"$compute": ...}. Semantics in the $compute context are different from semantics in the “normal” context.

$literal vs. $scalar vs. $list¶

Inside a $compute context, we distinguish explicitly between literal values, jsonpath queries that return scalars, and jsonpath queries that return lists. You specify which semantics you intend by using {"$literal": [1, 2]}, {"$scalar": "$.foo"}, or {"$list": $.foo[*]}. Items with $literal semantics are never interpreted as jsonpath queries, even if they start with $. In the $literal context, you should not escape the leading $ character.

A $scalar query has the same semantiics as a jsonpath query outside of the $compute context, i.e., it must return exactly 1 item. A $list query will return a list, which can be empty. Scalar-valued queries in a $list context will return a list with 1 element.

$func¶

You use blocks with a $func key to specify computation steps. The available functions are: findall, join, list, reduce, search, split, sub. These behave the same way as the corresponding functions from the Python standard library:

* ``findall``, ``search``, ``split``, and ``sub`` are from the
``re`` module.
* ``reduce`` uses the ``+`` operator with no starting value; it will
raise an error if called on an empty list.

All of these functions take named parameters with the same names and semantics as their parameters in Python.

dyff.schema.adapters.create_adapter(adapter_spec: SchemaAdapter | dict) → Adapter¶

dyff.schema.adapters.create_pipeline(adapter_specs: Iterable[SchemaAdapter | dict]) → Pipeline¶

dyff.schema.adapters.flatten_object(obj: dict, *, max_depth: int | None = None, add_prefix: bool = True) → dict¶

Flatten a JSON object the by creating a new object with a key for each “leaf” value in the input. If add_prefix is True, the key will be equal to the “path” string of the leaf, i.e., “obj.field.subfield”; otherwise, it will be just “subfield”.

Nested lists are considered “leaf” values, even if they contain objects.

dyff.schema.adapters.known_adapters() → dict[str, Type[Adapter]]¶

dyff.schema.adapters.map_structure(fn, data)¶: Given a JSON data structure data, create a new data structure instance with the same shape as data by applying fn to each “leaf” value in the nested data structure.