Schema adapters¶
- class dyff.schema.adapters.Adapter(*args, **kwargs)¶
Bases: Protocol
Transforms streams of JSON structures.
- class dyff.schema.adapters.Drop(configuration: dict)¶
Bases: object
Drop named top-level fields.
The configuration is a dictionary:
{ "fields": list[str] }
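As a rough illustration of the behaviour described above, here is a plain-Python sketch. It is not the library's implementation: `drop_fields` is a hypothetical helper, and it assumes each record in the stream is a `dict`.

```python
from typing import Iterable, Iterator


def drop_fields(records: Iterable[dict], fields: list) -> Iterator[dict]:
    """Hypothetical stand-in for Drop: omit the named top-level fields."""
    unwanted = set(fields)
    for record in records:
        # Only top-level keys are examined; nested objects are left untouched.
        yield {key: value for key, value in record.items() if key not in unwanted}


stream = [
    {"id": 1, "text": "foo", "debug": True},
    {"id": 2, "text": "bar", "debug": False},
]
dropped = list(drop_fields(stream, ["debug"]))
```

Here `dropped` holds the same two records without their `"debug"` field.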
- class dyff.schema.adapters.ExplodeCollections(configuration: dict)¶
Bases: object
Explodes one or more top-level lists of the same length into multiple records, where each record contains the corresponding value from each list. This is useful for turning nested-list representations into “relational” representations where the lists are converted to multiple rows with a unique index.
The configuration argument is a dictionary:
{ "collections": list[str], "index": dict[str, str | None] }
For example, if the input data is:
[ {"numbers": [1, 2, 3], "squares": [1, 4, 9], "scalar": "foo"}, {"numbers": [4, 5], "squares": [16, 25], "scalar": "bar"} ]
Then ExplodeCollections({"collections": ["numbers", "squares"]}) will yield this output data:
[ {"numbers": 1, "squares": 1, "scalar": "foo"}, {"numbers": 2, "squares": 4, "scalar": "foo"}, {"numbers": 3, "squares": 9, "scalar": "foo"}, {"numbers": 4, "squares": 16, "scalar": "bar"}, {"numbers": 5, "squares": 25, "scalar": "bar"} ]
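The explode step in this example can be approximated in a few lines of plain Python. This is a sketch of the documented behaviour only, not the adapter's actual code; `explode` is a hypothetical name, the `"index"` option is left out, and raising `ValueError` on mismatched lengths is an assumption.

```python
from typing import Iterable, Iterator


def explode(records: Iterable[dict], collections: list) -> Iterator[dict]:
    """Hypothetical stand-in for ExplodeCollections (no "index" support)."""
    for record in records:
        lists = [record[name] for name in collections]
        # The documentation says the exploded lists have the same length.
        if len({len(values) for values in lists}) > 1:
            raise ValueError("collections must have the same length")
        for row in zip(*lists):
            exploded = dict(record)  # carries over fields such as "scalar"
            exploded.update(zip(collections, row))
            yield exploded


data = [
    {"numbers": [1, 2, 3], "squares": [1, 4, 9], "scalar": "foo"},
    {"numbers": [4, 5], "squares": [16, 25], "scalar": "bar"},
]
rows = list(explode(data, ["numbers", "squares"]))
```

Each input record produces one output row per list position, so `rows` has five entries.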
You can also create indexes for the exploded records. Given the following configuration:
{ "collections": ["choices"], "index": { "collection/index": None, "collection/rank": "$.choices[*].meta.rank" } }
then for the input:
[
    {
        "choices": [
            {"label": "foo", "meta": {"rank": 1}},
            {"label": "bar", "meta": {"rank": 0}}
        ]
    },
    ...
]
the output will be:
[
    {
        "choices": {"label": "foo", "meta": {"rank": 1}},
        "collection/index": 0,
        "collection/rank": 1
    },
    {
        "choices": {"label": "bar", "meta": {"rank": 0}},
        "collection/index": 1,
        "collection/rank": 0
    },
    ...
]
The None value for the "collection/index" index key means that the adapter should assign indices from 0...n-1 automatically. If the value is not None, it must be a JSONPath query to execute against the pre-transformation data that returns a list. Notice how the example uses $.choices[*] to get the list of choices.
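The two kinds of index can be sketched as follows. This is an illustration under stated assumptions rather than the adapter's implementation: `explode_with_index` is a hypothetical helper, and to stay dependency-free the JSONPath query is replaced by a plain list comprehension evaluated on the record before it is exploded.

```python
def explode_with_index(record: dict, collection: str, index: dict) -> list:
    """Hypothetical sketch: explode one list and attach index fields.

    `index` maps each index key to None (assign 0...n-1) or to a list of
    values standing in for the result of a JSONPath query.
    """
    rows = []
    for position, item in enumerate(record[collection]):
        row = dict(record)
        row[collection] = item
        for key, values in index.items():
            row[key] = position if values is None else values[position]
        rows.append(row)
    return rows


record = {
    "choices": [
        {"label": "foo", "meta": {"rank": 1}},
        {"label": "bar", "meta": {"rank": 0}},
    ]
}
# Stand-in for "$.choices[*].meta.rank", computed on the pre-transformation data.
ranks = [choice["meta"]["rank"] for choice in record["choices"]]
indexed = explode_with_index(
    record, "choices", {"collection/index": None, "collection/rank": ranks}
)
```

The query result has to line up element-for-element with the exploded list, which is why the example query iterates over `choices[*]`.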
- class dyff.schema.adapters.FlattenHierarchy(configuration=None)¶
Bases: object
Flatten a JSON object – or the JSON sub-objects in named fields – by creating a new object with a key for each “leaf” value in the input.
The configuration options are:
{ "fields": list[str], "depth": int | None, "addPrefix": bool }
If fields is missing or empty, the flattening is applied to the root object. The depth option is the maximum recursion depth. If addPrefix is True (the default), then the resulting fields will be named like "path.to.leaf" to avoid name conflicts.
For example, if the configuration is:
{ "fields": ["choices"], "depth": 1, "addPrefix": True }
and the input is:
{ "choices": {"label": "foo", "metadata": {"value": 42}}, "scores": {"top1": 0.9} }
then the output will be:
{ "choices.label": "foo", "choices.metadata": {"value": 42}, "scores": {"top1": 0.9} }
Note that nested lists are considered “leaf” values, even if they contain objects.
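The example above can be reproduced with a small recursive function. This is a sketch, not the library's code: `flatten` and `flatten_fields` are hypothetical names, and the reading of `depth` as "number of nesting levels to collapse, counted from the named field" is an assumption chosen to match the documented output.

```python
def flatten(obj: dict, max_depth=None, add_prefix=True, _prefix="", _depth=0) -> dict:
    """Hypothetical sketch: collapse nested dicts into dotted keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{_prefix}{key}" if add_prefix else key
        can_descend = max_depth is None or _depth < max_depth
        # Only dicts are descended into; lists count as leaf values.
        if isinstance(value, dict) and can_descend:
            flat.update(flatten(value, max_depth, add_prefix, f"{name}.", _depth + 1))
        else:
            flat[name] = value
    return flat


def flatten_fields(record: dict, fields: list, depth=None, add_prefix=True) -> dict:
    """Hypothetical sketch: flatten only the named fields, or the root if none."""
    if not fields:
        return flatten(record, depth, add_prefix)
    result = {}
    for key, value in record.items():
        if key in fields and isinstance(value, dict):
            result.update(flatten({key: value}, depth, add_prefix))
        else:
            result[key] = value
    return result


record = {
    "choices": {"label": "foo", "metadata": {"value": 42}},
    "scores": {"top1": 0.9},
}
flattened = flatten_fields(record, ["choices"], depth=1)
list_leaf = flatten({"a": {"b": [{"c": 1}]}})
```

With `depth=1` only the first level under `"choices"` is collapsed, so `"choices.metadata"` stays an object, and the list under `"a.b"` is kept whole.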
- class dyff.schema.adapters.HTTPData(content_type, data)¶
Bases: NamedTuple
- content_type: str¶
Alias for field number 0
- data: Any¶
Alias for field number 1
- class dyff.schema.adapters.Map(configuration: dict)¶
Bases: object
For each input item, map another Adapter over the elements of each of the named nested collections within that item.
The configuration is a dictionary:
{ "collections": list[str], "adapter": { "kind": <AdapterType>, "configuration": <AdapterConfigurationDictionary> } }
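One way to picture this is the plain-Python sketch below. It is illustrative only: `map_collections` and `label_to_text` are hypothetical, and treating each nested collection as its own small stream that is passed to the inner adapter is an assumption about how the mapping is applied.

```python
from typing import Callable, Iterable, Iterator


def map_collections(
    records: Iterable[dict], collections: list, adapter: Callable
) -> Iterator[dict]:
    """Hypothetical stand-in for Map: run `adapter` over each nested collection."""
    for record in records:
        mapped = dict(record)
        for name in collections:
            # Assumption: the nested list is treated as a stream of its own.
            mapped[name] = list(adapter(record[name]))
        yield mapped


def label_to_text(items: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical inner adapter: rename "label" to "text" in each element."""
    for item in items:
        renamed = dict(item)
        renamed["text"] = renamed.pop("label")
        yield renamed


records = [{"id": 1, "choices": [{"label": "foo"}, {"label": "bar"}]}]
mapped_records = list(map_collections(records, ["choices"], label_to_text))
```

The outer record keeps its shape; only the elements inside `"choices"` are transformed.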
- class dyff.schema.adapters.Pipeline(adapters: list[Adapter])¶
Bases: object
Apply multiple adapters in sequence.
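Sequential application can be sketched as function composition over streams. The code below is not the `Pipeline` class itself: it assumes each adapter can be modelled as a callable that takes an iterable of records and returns one, and all names in it are hypothetical.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator


def pipeline(*stages: Callable) -> Callable:
    """Hypothetical sketch: feed each stage the output stream of the previous one."""
    def run(stream: Iterable[dict]) -> Iterable[dict]:
        return reduce(lambda current, stage: stage(current), stages, stream)
    return run


def keep_id_and_text(stream: Iterable[dict]) -> Iterator[dict]:
    for record in stream:
        yield {key: record[key] for key in ("id", "text")}


def text_to_prompt(stream: Iterable[dict]) -> Iterator[dict]:
    for record in stream:
        renamed = dict(record)
        renamed["prompt"] = renamed.pop("text")
        yield renamed


run = pipeline(keep_id_and_text, text_to_prompt)
piped = list(run([{"id": 1, "text": "foo", "debug": True}]))
```

Because the stages are generators, records flow through one at a time and the order of the stages matters: swapping the two above would fail, since `"text"` would already be renamed.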
- class dyff.schema.adapters.Rename(configuration: dict)¶
Bases: object
Rename top-level fields in each JSON object.
The configuration is a dictionary {old_name: new_name}.
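A minimal sketch of the renaming, assuming each record is a `dict` and that fields not named in the mapping pass through unchanged (`rename_fields` is a hypothetical helper, not part of the library):

```python
from typing import Iterable, Iterator


def rename_fields(records: Iterable[dict], mapping: dict) -> Iterator[dict]:
    """Hypothetical stand-in for Rename: apply {old_name: new_name} to top-level keys."""
    for record in records:
        yield {mapping.get(key, key): value for key, value in record.items()}


renamed_records = list(rename_fields([{"text": "foo", "id": 1}], {"text": "prompt"}))
```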
- class dyff.schema.adapters.Select(configuration: dict)¶
Bases: object
Select named top-level fields and drop the others.
The configuration is a dictionary:
{ "fields": list[str] }
- class dyff.schema.adapters.TransformJSON(configuration: dict)¶
Bases: object
Create a new JSON structure where the “leaf” values are populated by the results of transformation functions applied to the input.
The “value” for each leaf can be:
1. A JSON literal value, or
2. The result of a JSONPath query on the input structure, or
3. The result of a computation pipeline starting from (1) or (2).
To distinguish the specifications of leaf values from the specification of the output structure, we apply the following rules:
1. Composite values (list and dict) specify the structure of the output.
2. Scalar values are output as-is, unless they are strings containing JSONPath queries.
3. JSONPath queries are strings beginning with '$'. They are replaced by the result of the query.
4. A dict containing the special key "$compute" introduces a "compute context", which computes a leaf value from the input data. Descendants of this key have "compute context semantics", which are different from the "normal" semantics.
For example, if the configuration is:
{
    "id": "$.object.id",
    "name": "literal",
    "children": {"left": "$.list[0]", "right": "$.list[1]"},
    "characters": {
        "letters": {
            "$compute": [
                {"$scalar": "$.object.id"},
                {
                    "$func": "sub",
                    "pattern": "[A-Za-z]",
                    "repl": "",
                },
                {"$func": "list"}
            ]
        }
    }
}
and the data is:
{ "object": {"id": "abc123", "name": "spam"}, "list": [1, 2] }
Then applying the transformation to the data will result in the new structure:
{
    "id": "abc123",
    "name": "literal",
    "children": {"left": 1, "right": 2},
    "characters": {
        "letters": ["1", "2", "3"]
    }
}
The .characters.letters field was derived by:
1. Extracting the value of the .object.id field in the input
2. Applying re.sub(r"[A-Za-z]", "", _) to the result of (1)
3. Applying list(_) to the result of (2)
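The three steps listed above can be replayed by hand with the standard library. This is only a walk-through of the arithmetic, with a direct dictionary lookup standing in for the JSONPath query; it does not call the adapter.

```python
import re

data = {"object": {"id": "abc123", "name": "spam"}, "list": [1, 2]}

# Step 1: stand-in for {"$scalar": "$.object.id"}.
value = data["object"]["id"]

# Step 2: {"$func": "sub", "pattern": "[A-Za-z]", "repl": ""}.
value = re.sub(r"[A-Za-z]", "", value)

# Step 3: {"$func": "list"} splits the string into single characters.
result = list(value)
```

With the pattern as written, `re.sub` removes the letters from `"abc123"`, so the characters that remain in `result` are the digits.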
Notice that descendants of the $compute key no longer describe the structure of the output, but instead describe steps of the computation. The value of "$compute" can be either an object or a list of objects. A list is interpreted as a “pipeline” where each step is applied to the output of the previous step.
Implicit queries¶
Outside of the $compute context, string values that start with a $ character are interpreted as JSONPath queries. Queries in this context must return exactly one value, otherwise a ValueError will be raised. This is because when multiple values are returned, there’s no way to distinguish a scalar-valued query that found 1 scalar from a list-valued query that found a list with 1 element. In the $compute context, you can specify which semantics you want.
If you need a literal string that starts with the ‘$’ character, escape it with a second ‘$’, e.g., “$$PATH” will appear as the literal string “$PATH” in the output. This works for both keys and values, e.g., {"$$key": "$$value"} outputs {"$key": "$value"}. All keys that begin with $ are reserved, and you must always escape them.
The $compute context¶
A $compute context is introduced by a dict that contains the key "$compute", as in {"$compute": ...}. Semantics in the $compute context are different from semantics in the “normal” context.
$literal vs. $scalar vs. $list¶
Inside a $compute context, we distinguish explicitly between literal values, JSONPath queries that return scalars, and JSONPath queries that return lists. You specify which semantics you intend by using {"$literal": [1, 2]}, {"$scalar": "$.foo"}, or {"$list": "$.foo[*]"}. Items with $literal semantics are never interpreted as JSONPath queries, even if they start with $. In the $literal context, you should not escape the leading $ character.
A $scalar query has the same semantics as a JSONPath query outside of the $compute context, i.e., it must return exactly 1 item. A $list query will return a list, which can be empty. Scalar-valued queries in a $list context will return a list with 1 element.
$func¶
You use blocks with a $func key to specify computation steps. The available functions are: findall, join, list, reduce, search, split, sub. These behave the same way as the corresponding functions from the Python standard library:
* findall, search, split, and sub are from the re module.
* reduce uses the + operator with no starting value; it will raise an error if called on an empty list.
All of these functions take named parameters with the same names and semantics as their parameters in Python.
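To make the correspondence with the standard library concrete, here is a hedged sketch of a step runner. It is not how the adapter is implemented: `FUNCTIONS` and `run_steps` are hypothetical, `join` is omitted, and passing the previous step's output as the `string` argument of the `re` functions (or as the iterable for `list` and `reduce`) is an assumption.

```python
import operator
import re
from functools import reduce

# Hypothetical dispatch table: each entry receives the previous step's output
# plus the step's named parameters.
FUNCTIONS = {
    "findall": lambda value, **kw: re.findall(string=value, **kw),
    "search": lambda value, **kw: re.search(string=value, **kw),
    "split": lambda value, **kw: re.split(string=value, **kw),
    "sub": lambda value, **kw: re.sub(string=value, **kw),
    "list": lambda value: list(value),
    # No starting value, so an empty input raises TypeError.
    "reduce": lambda value: reduce(operator.add, value),
}


def run_steps(value, steps):
    """Apply each {"$func": ...} step to the output of the previous one."""
    for step in steps:
        params = {key: arg for key, arg in step.items() if key != "$func"}
        value = FUNCTIONS[step["$func"]](value, **params)
    return value


digits = run_steps("a1b22c333", [{"$func": "findall", "pattern": r"\d+"}])
joined = run_steps(
    "a1b22c333",
    [{"$func": "findall", "pattern": r"\d+"}, {"$func": "reduce"}],
)
```

Because `reduce` uses `+`, applying it to a list of strings concatenates them, while applying it to a list of numbers sums them.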
- dyff.schema.adapters.create_adapter(adapter_spec: SchemaAdapter | dict) → Adapter ¶
- dyff.schema.adapters.create_pipeline(adapter_specs: Iterable[SchemaAdapter | dict]) → Pipeline ¶
- dyff.schema.adapters.flatten_object(obj: dict, *, max_depth: int | None = None, add_prefix: bool = True) → dict ¶
Flatten a JSON object by creating a new object with a key for each “leaf” value in the input. If add_prefix is True, the key will be equal to the “path” string of the leaf, i.e., “obj.field.subfield”; otherwise, it will be just “subfield”.
Nested lists are considered “leaf” values, even if they contain objects.
- dyff.schema.adapters.map_structure(fn, data)¶
Given a JSON data structure data, create a new data structure instance with the same shape as data by applying fn to each “leaf” value in the nested data structure.
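The idea can be sketched with a short recursive function. This is an illustration, not the library's code: `map_leaves` is a hypothetical name, and treating both objects and lists as structure (so that only scalars are leaves) is an assumption.

```python
def map_leaves(fn, data):
    """Hypothetical sketch: rebuild `data` with `fn` applied to every scalar leaf."""
    if isinstance(data, dict):
        return {key: map_leaves(fn, value) for key, value in data.items()}
    if isinstance(data, list):
        return [map_leaves(fn, item) for item in data]
    return fn(data)


original = {"a": 1, "b": [2, {"c": 3}]}
doubled = map_leaves(lambda leaf: leaf * 2, original)
```

The result has the same nesting as the input, and the input itself is left unmodified.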