Datasets

DataRec includes several commonly used recommendation datasets to facilitate reproducibility and standardization. These datasets have been carefully curated, with traceable sources and versioning information maintained whenever possible. For each dataset, DataRec provides metadata such as the number of users, items, and interactions, as well as data characteristics known to impact recommendation performance (e.g., sparsity and user/item distribution shifts). The collection is continuously updated to include recent and widely used datasets from the recommender systems literature. When the original data source is unavailable, DataRec includes the most recent and widely used version to ensure backward compatibility.
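
To make the metadata above concrete, the following is a minimal sketch of how user, item, and interaction counts and sparsity could be derived from a list of (user, item) pairs. It is plain Python and does not use DataRec's API; the function name `dataset_characteristics` and the toy data are hypothetical, chosen only for illustration. Sparsity is taken here as one minus the fraction of observed user-item pairs, which is a common definition but an assumption about how DataRec computes it.

```python
def dataset_characteristics(interactions):
    """Compute basic statistics from an iterable of (user, item) pairs.

    Hypothetical helper for illustration; not part of DataRec.
    """
    pairs = set(interactions)  # count each user-item pair once
    users = {user for user, _ in pairs}
    items = {item for _, item in pairs}
    n_users, n_items, n_inter = len(users), len(items), len(pairs)

    # Density: share of the user-item matrix that is observed.
    density = n_inter / (n_users * n_items) if n_users and n_items else 0.0

    return {
        "users": n_users,
        "items": n_items,
        "interactions": n_inter,
        "density": density,
        "sparsity": 1.0 - density,
    }


# Toy data: 3 users, 3 items, 4 distinct interactions.
toy = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u3", "i3")]
stats = dataset_characteristics(toy)
print(stats)
```

On the toy data, 4 of the 9 possible user-item pairs are observed, so the sparsity is 5/9. Real recommendation datasets are typically far sparser than this.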

The following datasets are currently included in DataRec:

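| Dataset | Version | Dataset Page |
|---|---|---|
| Alibaba iFashion | v1 | page |
| Amazon Baby | 2023 | page |
| Amazon Beauty | 2023 | page |
| Amazon Books | 2023 | page |
| Amazon Clothing | 2023 | page |
| Amazon Music | 2023 | page |
| Amazon Office | 2023 | page |
| Amazon Sports and Outdoors | 2023 | page |
| Amazon Toys and Games | 2023 | page |
| Amazon Video Games | 2023 | page |
| Ambar | 2024 | page |
| CiaoDVD | v1 | page |
| CiteULike | a | page |
| CiteULike | t | page |
| Epinions | v1 | page |
| Gowalla | checkins | page |
| Gowalla | friendships | page |
| LastFM | 2011 | page |
| Mind | large | page |
| Mind | small | page |
| Movielens | 100k | page |
| Movielens | 1m | page |
| Movielens | 20m | page |
| Tmall | v1 | page |
| Yelp | v1 | page |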