Datasets

Create and Upload a Dataset

Using an Arrow dataset as a new Dyff Dataset requires two steps.

First, you create a Dataset record. Dyff assigns a unique ID to the dataset and returns a full Dataset object.

Second, you upload the actual data, providing the Dataset object so that Dyff knows where to store the data. The hashes of the dataset files must match the hashes calculated in the create step.
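Because the upload is checked against hashes calculated in the create step, the files must not change between the two steps. The following is a minimal sketch of how you might check this yourself before uploading. It is not Dyff's implementation: the hashing scheme Dyff uses is not described here, SHA-256 is an assumption chosen for illustration, and the function names file_digests and changed_files are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digests(root: str) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    root_path = Path(root)
    digests = {}
    for path in sorted(root_path.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            # Read in 1 MiB chunks so large data files are not loaded whole
            with path.open("rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root_path).as_posix()] = h.hexdigest()
    return digests


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return relative paths that were added, removed, or modified."""
    return sorted(
        key for key in before.keys() | after.keys()
        if before.get(key) != after.get(key)
    )
```

Taking a snapshot with file_digests just before the create step and comparing it with a second snapshot just before the upload shows which files, if any, were touched in between.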

Setup

Create an API client as described in the Python client guide:

from dyff.client import Client

client = Client(api_key="XXXXXX")

Alternatively, use a local platform instance in place of the remote client:

from dyff.audit.local import DyffLocalPlatform

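client = DyffLocalPlatform()

Create the Dataset record from the root directory of the Arrow dataset:

dataset = client.datasets.create_arrow_dataset(
    ARROW_DATASET_ROOT_DIRECTORY, account=ACCOUNT, name=DATASET_NAME
)
print(f"created dataset:\n{dataset}")

Then upload the data, passing the Dataset object returned by the create step:

client.datasets.upload_arrow_dataset(dataset, ARROW_DATASET_ROOT_DIRECTORY)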

If you created the dataset but couldn’t complete the upload, you can fetch the dataset record and retry the upload:

dataset = client.datasets.get(<dataset.id>)
client.datasets.upload_arrow_dataset(dataset, ARROW_DATASET_ROOT_DIRECTORY)
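The recovery path can be wrapped in a small helper. This is a sketch only: the name retry_upload is hypothetical, and it assumes that client.datasets.get accepts the dataset ID as its first positional argument, as the call in this guide suggests. It uses no client calls other than the two that appear in this guide.

```python
def retry_upload(client, dataset_id: str, root_directory: str):
    """Fetch an existing Dataset record by ID and upload its data again."""
    # Look up the record created earlier instead of creating a new one
    dataset = client.datasets.get(dataset_id)
    # Re-run the upload step against the same local Arrow directory
    client.datasets.upload_arrow_dataset(dataset, root_directory)
    return dataset
```

Reusing the existing record keeps the dataset ID that Dyff assigned in the create step. The files in the directory must still match the hashes recorded at that time.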