Schema adapters
- class dyff.schema.adapters.Adapter(*args, **kwargs)
Bases: Protocol
Transforms streams of JSON structures.
- class dyff.schema.adapters.Drop(configuration: dict)
Bases: object
Drop named top-level fields.
The configuration is a dictionary:
{ "fields": list[str] }
- class dyff.schema.adapters.ExplodeCollections(configuration: dict)
Bases: object
Explodes one or more top-level lists of the same length into multiple records, where each record contains the corresponding value from each list. This is useful for turning nested-list representations into “relational” representations where the lists are converted to multiple rows with a unique index.
The configuration argument is a dictionary:
    { "collections": list[str], "index": dict[str, str | None] }
For example, if the input data is:
    [
        {"numbers": [1, 2, 3], "squares": [1, 4, 9], "scalar": "foo"},
        {"numbers": [4, 5], "squares": [16, 25], "scalar": "bar"}
    ]
Then ExplodeCollections({"collections": ["numbers", "squares"]}) will yield this output data:
    [
        {"numbers": 1, "squares": 1, "scalar": "foo"},
        {"numbers": 2, "squares": 4, "scalar": "foo"},
        {"numbers": 3, "squares": 9, "scalar": "foo"},
        {"numbers": 4, "squares": 16, "scalar": "bar"},
        {"numbers": 5, "squares": 25, "scalar": "bar"}
    ]
You can also create indexes for the exploded records. Given the following configuration:
    {
        "collections": ["choices"],
        "index": {
            "collection/index": None,
            "collection/rank": "$.choices[*].meta.rank"
        }
    }
then for the input:
    [
        {
            "choices": [
                {"label": "foo", "meta": {"rank": 1}},
                {"label": "bar", "meta": {"rank": 0}}
            ]
        },
        ...
    ]
the output will be:
    [
        {
            "choices": {"label": "foo", "meta": {"rank": 1}},
            "collection/index": 0,
            "collection/rank": 1
        },
        {
            "choices": {"label": "bar", "meta": {"rank": 0}},
            "collection/index": 1,
            "collection/rank": 0
        },
        ...
    ]
The None value for the "collection/index" index key means that the adapter should assign indices from 0...n-1 automatically. If the value is not None, it must be a JSONPath query to execute against the pre-transformation data that returns a list. Notice how the example uses $.choices[*] to get the list of choices.
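The explode-with-automatic-index behaviour described above can be sketched in plain Python. This is a minimal illustration, not the dyff implementation: the helper name explode_collections and its index_key parameter are hypothetical, and JSONPath-based indexes are left out.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional


def explode_collections(
    records: Iterable[Dict[str, Any]],
    collections: List[str],
    index_key: Optional[str] = None,
) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield one record per position in the named lists."""
    for record in records:
        lengths = {len(record[name]) for name in collections}
        if len(lengths) != 1:
            raise ValueError("collections must all have the same length")
        for i in range(lengths.pop()):
            row = dict(record)  # other top-level fields are copied unchanged
            for name in collections:
                row[name] = record[name][i]
            if index_key is not None:
                row[index_key] = i  # automatic 0...n-1 index
            yield row


data = [
    {"numbers": [1, 2, 3], "squares": [1, 4, 9], "scalar": "foo"},
    {"numbers": [4, 5], "squares": [16, 25], "scalar": "bar"},
]
rows = list(explode_collections(data, ["numbers", "squares"], "collection/index"))
```

Each input record contributes as many rows as its lists have elements, and the index restarts at 0 for every input record.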
- class dyff.schema.adapters.FlattenHierarchy(configuration=None)
Bases: object
Flatten a JSON object, or the JSON sub-objects in named fields, by creating a new object with a key for each “leaf” value in the input.
The configuration options are:
    { "fields": list[str], "depth": int | None, "addPrefix": bool }
If fields is missing or empty, the flattening is applied to the root object. The depth option is the maximum recursion depth. If addPrefix is True (the default), then the resulting fields will be named like "path.to.leaf" to avoid name conflicts.
For example, if the configuration is:
    { "fields": ["choices"], "depth": 1, "addPrefix": True }
and the input is:
    { "choices": {"label": "foo", "metadata": {"value": 42}}, "scores": {"top1": 0.9} }
then the output will be:
    { "choices.label": "foo", "choices.metadata": {"value": 42}, "scores": {"top1": 0.9} }
Note that nested lists are considered “leaf” values, even if they contain objects.
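The flattening rules can be sketched as a small recursive function. This is a rough illustration under stated assumptions, not the library's code: the names flatten and flatten_fields are hypothetical, and the way depth is counted (number of nesting levels descended, starting at the named field) is inferred from the example above.

```python
from typing import Any, Dict, List, Optional


def flatten(obj: Dict[str, Any], max_depth: Optional[int] = None,
            add_prefix: bool = True, prefix: str = "") -> Dict[str, Any]:
    """Hypothetical helper: flatten nested dicts; lists are treated as leaves."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if (add_prefix and prefix) else key
        can_descend = max_depth is None or max_depth > 0
        if isinstance(value, dict) and can_descend:
            next_depth = None if max_depth is None else max_depth - 1
            flat.update(flatten(value, next_depth, add_prefix, name))
        else:
            flat[name] = value  # scalars, lists, and too-deep dicts are leaves
    return flat


def flatten_fields(record: Dict[str, Any], fields: List[str],
                   max_depth: Optional[int] = None,
                   add_prefix: bool = True) -> Dict[str, Any]:
    """Hypothetical helper: flatten only the named top-level fields."""
    out = {k: v for k, v in record.items() if k not in fields}
    for field in fields:
        out.update(flatten({field: record[field]}, max_depth, add_prefix))
    return out


record = {"choices": {"label": "foo", "metadata": {"value": 42}},
          "scores": {"top1": 0.9}}
by_field = flatten_fields(record, ["choices"], max_depth=1)
whole = flatten(record)
```

With max_depth=1 the sketch reproduces the documented output; without a depth limit, flattening the root object also expands "choices.metadata.value" and "scores.top1".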
- class dyff.schema.adapters.HTTPData(content_type, data)
Bases: NamedTuple
- content_type: str
Alias for field number 0
- data: Any
Alias for field number 1
- class dyff.schema.adapters.Map(configuration: dict)
Bases: object
For each input item, map another Adapter over the elements of each of the named nested collections within that item.
The configuration is a dictionary:
    {
        "collections": list[str],
        "adapter": {
            "kind": <AdapterType>,
            "configuration": <AdapterConfigurationDictionary>
        }
    }
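One way to picture this mapping is to treat each nested collection as its own small stream and run the inner transformation over it. The sketch below assumes an adapter can be modeled as a callable from a stream to a stream; that modeling, and the names map_collections and keep_label, are hypothetical and not taken from dyff.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

# Assumption: an adapter is modeled as a function from a stream to a stream.
StreamFn = Callable[[Iterable[Any]], Iterable[Any]]


def map_collections(items: Iterable[Dict[str, Any]], collections: List[str],
                    inner: StreamFn) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: apply `inner` to each named nested collection."""
    for item in items:
        out = dict(item)
        for name in collections:
            out[name] = list(inner(item[name]))
        yield out


def keep_label(stream: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Toy inner transformation: keep only the "label" field of each element."""
    for element in stream:
        yield {"label": element["label"]}


items = [{"id": 1, "choices": [{"label": "foo", "rank": 1},
                               {"label": "bar", "rank": 0}]}]
mapped = list(map_collections(items, ["choices"], keep_label))
```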
- class dyff.schema.adapters.Pipeline(adapters: list[Adapter])
Bases: object
Apply multiple adapters in sequence.
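Sequential application can be sketched as function composition over streams. As an assumption for illustration only, each stage is modeled as a generator function that takes a stream and yields transformed items; the names pipeline, drop_debug, and rename_text are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

# Assumption: each stage maps a stream of items to a stream of items.
StreamFn = Callable[[Iterable[Any]], Iterable[Any]]


def pipeline(*stages: StreamFn) -> StreamFn:
    """Hypothetical helper: feed the output of each stage into the next."""
    def run(stream: Iterable[Any]) -> Iterable[Any]:
        for stage in stages:
            stream = stage(stream)
        return stream
    return run


def drop_debug(stream: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    for item in stream:
        yield {k: v for k, v in item.items() if k != "debug"}


def rename_text(stream: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    for item in stream:
        yield {("prompt" if k == "text" else k): v for k, v in item.items()}


run = pipeline(drop_debug, rename_text)
result = list(run([{"text": "hi", "debug": True}]))
```

Because the stages are generators, items flow through the whole chain one at a time rather than being materialized between stages.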
- class dyff.schema.adapters.Rename(configuration: dict)
Bases: object
Rename top-level fields in each JSON object.
The configuration is a dictionary {old_name: new_name}.
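The effect of an {old_name: new_name} mapping on a single record can be sketched with a dict comprehension. The helper rename_fields is hypothetical, and what the library does when a new name collides with an existing field is not stated in the source.

```python
from typing import Any, Dict


def rename_fields(record: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: rename top-level keys, leaving unmapped keys as-is."""
    return {mapping.get(key, key): value for key, value in record.items()}


renamed = rename_fields({"text": "hi", "id": 7}, {"text": "prompt"})
```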
- class dyff.schema.adapters.Select(configuration: dict)
Bases: object
Select named top-level fields and drop the others.
The configuration is a dictionary:
{ "fields": list[str] }
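Select is the complement of Drop: both take a "fields" list, but one keeps the named fields and the other removes them. A minimal sketch of the two behaviours on a single record, using hypothetical helper names rather than the dyff classes:

```python
from typing import Any, Dict, List


def select_fields(record: Dict[str, Any], fields: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: keep only the named top-level fields."""
    return {k: v for k, v in record.items() if k in fields}


def drop_fields(record: Dict[str, Any], fields: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: remove the named top-level fields."""
    return {k: v for k, v in record.items() if k not in fields}


record = {"id": 1, "text": "hi", "debug": True}
selected = select_fields(record, ["id", "text"])
dropped = drop_fields(record, ["debug"])
```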
- class dyff.schema.adapters.TransformJSON(configuration: dict)
Bases: object
Create a new JSON structure where the “leaf” values are populated by the results of transformation functions applied to the input.
The “value” for each leaf can be:
1. A JSON literal value, or
2. The result of a JSONPath query on the input structure, or
3. The result of a computation pipeline starting from (1) or (2).
To distinguish the specifications of leaf values from the specification of the output structure, we apply the following rules:
1. Composite values (``list`` and ``dict``) specify the structure of the output.
2. Scalar values are output as-is, unless they are strings containing JSONPath queries.
3. JSONPath queries are strings beginning with '$'. They are replaced by the result of the query.
4. A ``dict`` containing the special key ``"$compute"`` introduces a "compute context", which computes a leaf value from the input data. Descendants of this key have "compute context semantics", which are different from the "normal" semantics.
For example, if the configuration is:
    {
        "id": "$.object.id",
        "name": "literal",
        "children": {"left": "$.list[0]", "right": "$.list[1]"},
        "characters": {
            "letters": {
                "$compute": [
                    {"$scalar": "$.object.id"},
                    {"$func": "sub", "pattern": "[^A-Za-z]", "repl": ""},
                    {"$func": "list"}
                ]
            }
        }
    }
and the data is:
    { "object": {"id": "abc123", "name": "spam"}, "list": [1, 2] }
Then applying the transformation to the data will result in the new structure:
    {
        "id": "abc123",
        "name": "literal",
        "children": {"left": 1, "right": 2},
        "characters": { "letters": ["a", "b", "c"] }
    }
The .characters.letters field was derived by:
1. Extracting the value of the ``.object.id`` field in the input
2. Applying ``re.sub(r"[^A-Za-z]", "", _)`` to the result of (1), which removes every non-letter character
3. Applying ``list(_)`` to the result of (2)
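The three derivation steps can be reproduced by hand with the Python standard library. This sketch does not call dyff; it replaces the JSONPath lookup with ordinary dict indexing and uses the pattern [^A-Za-z], which deletes every character that is not a letter, so that only the letters of the id remain.

```python
import re

data = {"object": {"id": "abc123", "name": "spam"}, "list": [1, 2]}

# Step 1: the scalar query "$.object.id", written as plain dict indexing.
value = data["object"]["id"]

# Step 2: the "sub" step, i.e. re.sub(pattern, repl, string).
value = re.sub(r"[^A-Za-z]", "", value)

# Step 3: the "list" step splits the string into single characters.
letters = list(value)
```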
Notice that descendants of the $compute key no longer describe the structure of the output, but instead describe steps of the computation. The value of "$compute" can be either an object or a list of objects. A list is interpreted as a “pipeline” where each step is applied to the output of the previous step.
Implicit queries
Outside of the $compute context, string values that start with a $ character are interpreted as JSONPath queries. Queries in this context must return exactly one value, otherwise a ValueError will be raised. This is because when multiple values are returned, there’s no way to distinguish a scalar-valued query that found 1 scalar from a list-valued query that found a list with 1 element. In the $compute context, you can specify which semantics you want.
If you need a literal string that starts with the ‘$’ character, escape it with a second ‘$’, e.g., “$$PATH” will appear as the literal string “$PATH” in the output. This works for both keys and values, e.g., {"$$key": "$$value"} outputs {"$key": "$value"}. All keys that begin with $ are reserved, and you must always escape them.
The $compute context
A $compute context is introduced by a dict that contains the key "$compute", i.e., {"$compute": ...}. Semantics in the $compute context are different from semantics in the “normal” context.
$literal vs. $scalar vs. $list
Inside a $compute context, we distinguish explicitly between literal values, JSONPath queries that return scalars, and JSONPath queries that return lists. You specify which semantics you intend by using {"$literal": [1, 2]}, {"$scalar": "$.foo"}, or {"$list": "$.foo[*]"}. Items with $literal semantics are never interpreted as JSONPath queries, even if they start with $. In the $literal context, you should not escape the leading $ character.
A $scalar query has the same semantics as a JSONPath query outside of the $compute context, i.e., it must return exactly 1 item. A $list query will return a list, which can be empty. Scalar-valued queries in a $list context will return a list with 1 element.
$func
You use blocks with a $func key to specify computation steps. The available functions are: findall, join, list, reduce, search, split, sub. These behave the same way as the corresponding functions from the Python standard library:
* ``findall``, ``search``, ``split``, and ``sub`` are from the ``re`` module.
* ``reduce`` uses the ``+`` operator with no starting value; it will raise an error if called on an empty list.
All of these functions take named parameters with the same names and semantics as their parameters in Python.
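For reference, the standard-library behaviour that these steps are described as mirroring looks like this. The sketch only exercises Python's own re and functools modules; how a $func block passes its parameters is not shown, and modeling reduce as functools.reduce with operator.add is an assumption based on the description above.

```python
import operator
import re
from functools import reduce

text = "a1,b2,c3"

found = re.findall(r"[a-z]", text)           # all non-overlapping matches
parts = re.split(r",", text)                 # split on the pattern
digits_removed = re.sub(r"[0-9]", "", text)  # replace matches with repl
match = re.search(r"[0-9]+", text)           # first match, or None
joined = reduce(operator.add, parts)         # "+" applied left to right

# With no starting value, reducing an empty list raises TypeError.
try:
    reduce(operator.add, [])
    empty_error = None
except TypeError as exc:
    empty_error = exc
```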
- dyff.schema.adapters.create_adapter(adapter_spec: SchemaAdapter | dict) -> Adapter
- dyff.schema.adapters.create_pipeline(adapter_specs: Iterable[SchemaAdapter | dict]) -> Pipeline
- dyff.schema.adapters.flatten_object(obj: dict, *, max_depth: int | None = None, add_prefix: bool = True) -> dict
Flatten a JSON object by creating a new object with a key for each “leaf” value in the input. If add_prefix is True, the key will be equal to the “path” string of the leaf, i.e., “obj.field.subfield”; otherwise, it will be just “subfield”.
Nested lists are considered “leaf” values, even if they contain objects.
- dyff.schema.adapters.map_structure(fn, data)
Given a JSON data structure data, create a new data structure instance with the same shape as data by applying fn to each “leaf” value in the nested data structure.
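The idea of a shape-preserving map over leaves can be sketched recursively. This is a stand-in written for illustration, not the library function: the name map_leaves is hypothetical, and treating both dicts and lists as containers to recurse into is an assumption.

```python
from typing import Any, Callable


def map_leaves(fn: Callable[[Any], Any], data: Any) -> Any:
    """Hypothetical helper: apply fn to every non-container value."""
    if isinstance(data, dict):
        return {key: map_leaves(fn, value) for key, value in data.items()}
    if isinstance(data, list):
        return [map_leaves(fn, value) for value in data]
    return fn(data)  # anything else is a leaf


doubled = map_leaves(lambda x: x * 2, {"a": 1, "b": [2, {"c": 3}]})
```

The result has the same keys and nesting as the input; only the leaf values change.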