Datasets Reference
This section provides a detailed API reference for all modules related to built‑in datasets in the datarec library.
Dataset entry points live in datarec.datasets and connect to registry metadata and versions.
Dataset Entry Points
list_datasets()
list_dataset_versions(name)
latest_dataset_version(name)
load_dataset(name, version='latest', **kwargs)
Instantiate a dataset by registry name, or load a dataset from a remote registry YAML.
If name is a URL, this delegates to load_dataset_from_url.
Source code in datarec/datasets/__init__.py
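The dispatch described for load_dataset can be pictured with a small stand-alone sketch. It does not import datarec: `REGISTRY`, `looks_like_url`, `resolve_version`, and `describe_load` are hypothetical names, and the registry contents are invented for illustration. The only behavior taken from this page is that a URL `name` is routed to load_dataset_from_url and that `version` defaults to `'latest'`. How datarec actually detects URLs or orders versions is an assumption here.

```python
from urllib.parse import urlparse

# Hypothetical in-memory registry: dataset name -> versions, oldest first.
REGISTRY = {
    "toy_movies": ["v1", "v2"],
    "toy_books": ["2020"],
}


def looks_like_url(name: str) -> bool:
    """Assumed URL check: an http(s) scheme plus a network location."""
    parsed = urlparse(name)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def resolve_version(name: str, version: str = "latest") -> str:
    """Map 'latest' to the last listed version; validate explicit versions."""
    versions = REGISTRY[name]  # raises KeyError for unknown dataset names
    if version == "latest":
        return versions[-1]
    if version not in versions:
        raise ValueError(f"unknown version {version!r} for dataset {name!r}")
    return version


def describe_load(name: str, version: str = "latest") -> tuple:
    """Report which path a load_dataset-style call would take."""
    if looks_like_url(name):
        # Mirrors the documented delegation to load_dataset_from_url.
        return ("from_url", name)
    return ("from_registry", name, resolve_version(name, version))
```

In this sketch, `describe_load("toy_movies")` resolves to the last listed version, while a name such as `"https://example.org/registry/v1.yml"` skips the registry lookup entirely.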
load_dataset_from_url(url, *, folder=None, prepare_and_load=True)
Load a dataset from a remote registry YAML.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | str | URL to a registry version YAML. | required |
| folder | str \| None | Optional output folder override. | None |
| prepare_and_load | bool | When True, returns a loaded DataRec; otherwise returns RegisteredDataset. | True |
Returns:
| Type | Description |
|---|---|
| RegisteredDataset \| DataRec | The dataset entrypoint or loaded dataset. |
Source code in datarec/datasets/__init__.py
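The effect of the `prepare_and_load` flag on the return type can be modeled with a stand-alone sketch. `EntryPointSketch` and `LoadedSketch` are hypothetical stand-ins for RegisteredDataset and DataRec, and the `prepare_and_load()` method on the entry point is an assumption made for illustration. Only the keyword-only parameters, their defaults, and the two possible return kinds come from this page.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class LoadedSketch:
    """Hypothetical stand-in for a loaded DataRec object."""
    source: str
    folder: Optional[str]


@dataclass
class EntryPointSketch:
    """Hypothetical stand-in for a RegisteredDataset entry point."""
    url: str
    folder: Optional[str] = None

    def prepare_and_load(self) -> LoadedSketch:
        # A real entry point would download and parse data here (assumed).
        return LoadedSketch(source=self.url, folder=self.folder)


def load_from_url_sketch(
    url: str,
    *,
    folder: Optional[str] = None,
    prepare_and_load: bool = True,
) -> Union[EntryPointSketch, LoadedSketch]:
    """Return the loaded object by default, or the bare entry point."""
    entry = EntryPointSketch(url=url, folder=folder)
    return entry.prepare_and_load() if prepare_and_load else entry
```

Passing `prepare_and_load=False` in this model defers the loading step, which may be useful when a caller wants to inspect the entry point before loading.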
Registry Utilities
available_datasets()
Return a list of available built-in datasets.
Returns:
| Type | Description |
|---|---|
| List[str] | List of built-in datasets. |
print_available_datasets()
Print the list of available built-in datasets. Returns None.
compute_dataset_characteristics(dataset_name, version, *, output_dir=REGISTRY_METRICS_FOLDER, use_cache=True, overwrite=False)
Compute and persist characteristics for a specific dataset/version.
Returns:
| Type | Description |
|---|---|
| str | Path to the written YAML file. |
Source code in datarec/registry/utils.py
get_metrics_filepath(dataset_name, version)
Return the expected registry metrics filepath for a dataset/version.
compute_all_characteristics(output_dir=REGISTRY_METRICS_FOLDER, use_cache=True, overwrite=False)
Compute characteristics for every dataset/version and write YAML files.
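The compute-and-persist pattern behind these utilities can be sketched without datarec. Everything below is hypothetical: the `<output_dir>/<dataset_name>/<version>.yml` layout, the function names, and the reading of `overwrite` as "recompute even if the file already exists" are assumptions, since this page does not specify them. The `use_cache` parameter is left out because its meaning is not described here. The YAML is written by hand as flat `key: value` lines to keep the sketch dependency-free.

```python
from pathlib import Path
from typing import Callable, Dict, List


def metrics_path_sketch(output_dir: str, dataset_name: str, version: str) -> Path:
    """Hypothetical layout: <output_dir>/<dataset_name>/<version>.yml."""
    return Path(output_dir) / dataset_name / f"{version}.yml"


def compute_and_write_sketch(
    output_dir: str,
    dataset_name: str,
    version: str,
    compute: Callable[[str, str], Dict[str, float]],
    overwrite: bool = False,
) -> str:
    """Write flat characteristics as YAML and return the file path."""
    path = metrics_path_sketch(output_dir, dataset_name, version)
    if path.exists() and not overwrite:
        # Assumed semantics: keep the existing file and skip recomputation.
        return str(path)
    characteristics = compute(dataset_name, version)
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{key}: {value}" for key, value in sorted(characteristics.items())]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return str(path)


def compute_all_sketch(
    output_dir: str,
    registry: Dict[str, List[str]],
    compute: Callable[[str, str], Dict[str, float]],
    overwrite: bool = False,
) -> List[str]:
    """Apply compute_and_write_sketch to every dataset/version pair."""
    return [
        compute_and_write_sketch(output_dir, name, version, compute, overwrite)
        for name, versions in registry.items()
        for version in versions
    ]
```

Checking for the file before calling `compute` keeps repeated runs cheap in this model. Whether datarec applies the same rule is not stated on this page.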