Datasets
Create and Upload a Dataset
Using an Arrow dataset as a new Dyff Dataset requires two steps.
First, you create a Dataset record. Dyff assigns a unique ID to the dataset
and returns a full Dataset object.
Second, you upload the actual data, passing the Dataset object so that
Dyff knows where to store the data. The hashes of the uploaded dataset files
must match the hashes calculated in the create step.
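The hash requirement means the files must not change between the create step and the upload step. The sketch below illustrates the idea only: it records a digest for every file under a directory so that two snapshots can be compared. The helper name `file_digests` and the choice of SHA-256 are assumptions made for illustration; this page does not say which hash Dyff computes, and this is not Dyff's implementation.

```python
import hashlib
from pathlib import Path


def file_digests(root: str) -> dict[str, str]:
    """Map each file under `root` (by relative path) to its SHA-256 hex digest.

    Hypothetical helper: SHA-256 is an assumption, not necessarily the
    hash that Dyff computes for dataset files.
    """
    digests = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        hasher = hashlib.sha256()
        with path.open("rb") as f:
            # Read in 1 MiB chunks so large data files are not loaded whole.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                hasher.update(chunk)
        digests[path.relative_to(root).as_posix()] = hasher.hexdigest()
    return digests
```

Calling `file_digests(ARROW_DATASET_ROOT_DIRECTORY)` before the create step and again before the upload, then comparing the two dictionaries, is one way to check locally that no file was modified in between.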
Setup
Create an API client as described in the Python client guide:
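# Option 1: client for a remote Dyff instance
from dyff.client import Client
client = Client(api_key="XXXXXX")

# Option 2: use the local platform instead of a remote instance
from dyff.audit.local import DyffLocalPlatform
client = DyffLocalPlatform()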
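# Step 1: create the Dataset record; Dyff assigns the ID and computes file hashes
dataset = client.datasets.create_arrow_dataset(
    ARROW_DATASET_ROOT_DIRECTORY, account=ACCOUNT, name=DATASET_NAME
)
print(f"created dataset:\n{dataset}")
# Step 2: upload the data files for the record created above
client.datasets.upload_arrow_dataset(dataset, ARROW_DATASET_ROOT_DIRECTORY)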
If you created the dataset but couldn’t complete the upload, you can
fetch the dataset record and re-try the upload:
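dataset = client.datasets.get("<dataset.id>")  # substitute the ID of the existing record
client.datasets.upload_arrow_dataset(dataset, ARROW_DATASET_ROOT_DIRECTORY)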
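The fetch-and-retry pattern can be wrapped in a small helper. This is a minimal sketch, not part of the Dyff client: `retry_upload`, `attempts`, and `delay_seconds` are hypothetical names, and catching a bare `Exception` is an assumption because this page does not say which errors a failed upload raises. It uses only the `client.datasets.get` and `client.datasets.upload_arrow_dataset` calls shown above.

```python
import time


def retry_upload(client, dataset_id, root_directory, attempts=3, delay_seconds=1.0):
    """Fetch an existing Dataset record and retry the upload a few times.

    Hypothetical helper; the exception type raised by a failed upload is
    not documented here, so every Exception is treated as retryable.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # Re-fetch the record so the upload uses the stored ID and hashes.
            dataset = client.datasets.get(dataset_id)
            client.datasets.upload_arrow_dataset(dataset, root_directory)
            return dataset
        except Exception as exc:  # assumption: any failure may be transient
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(f"upload failed after {attempts} attempts") from last_error
```

A call might look like `retry_upload(client, "<dataset.id>", ARROW_DATASET_ROOT_DIRECTORY)`. If the files on disk have changed since the create step, retrying is not expected to help, because the hashes would no longer match.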