Mind

Overview

Dataset name: Mind
Latest version: large
Available versions: small, large
Source: https://msnews.github.io/

MIcrosoft News Dataset (MIND) is a large-scale dataset for news recommendation research.


Citation

@inproceedings{DBLP:conf/acl/WuQCWQLLXGWZ20,
  author       = {Fangzhao Wu and
                  Ying Qiao and
                  Jiun{-}Hung Chen and
                  Chuhan Wu and
                  Tao Qi and
                  Jianxun Lian and
                  Danyang Liu and
                  Xing Xie and
                  Jianfeng Gao and
                  Winnie Wu and
                  Ming Zhou},
  title        = {{MIND:} {A} Large-scale Dataset for News Recommendation},
  booktitle    = {{ACL}},
  pages        = {3597--3606},
  publisher    = {Association for Computational Linguistics},
  year         = {2020}
}

Version: small

Data Sources

Name                Source type   Archive  URL  Checksum
train_archive       ManualSource  zip      -    md5:bd6ae77fa15949653f39829e946d327c
validation_archive  ManualSource  zip      -    md5:1c9e798fe440c1999547211cd5245e3e
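
Because both sources are of type ManualSource, the archives are presumably downloaded by hand, so it can be worth checking them against the MD5 values listed above before use. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_checksum` are hypothetical and not part of any documented loader API.

```python
import hashlib
from pathlib import Path

# Checksums copied from the Data Sources table for version "small".
EXPECTED_MD5 = {
    "train_archive": "bd6ae77fa15949653f39829e946d327c",
    "validation_archive": "1c9e798fe440c1999547211cd5245e3e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, source_name):
    """Compare a downloaded archive with the table entry for `source_name`."""
    return md5_of_file(path) == EXPECTED_MD5[source_name]
```

A call such as `matches_checksum(local_zip_path, "train_archive")` returns `True` only when the file at `local_zip_path` (a path of your choosing) has the listed digest.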

Resources

train

  • Type: interactions
  • Format: sequence_tabular_inline
  • Required: yes
  • Source: train_archive
  • Filename: behaviors.tsv

Schema

user_col: user
sequence_col: sequence
timestamp_col: time
cols:
- impression_id
- user
- time
- sequence
- impressions
col_sep: "\t"
sequence_sep: ' '
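
The schema above can be read as: each line of behaviors.tsv holds five tab-separated fields, and the `sequence` field is itself a space-separated list of item IDs. The sketch below parses rows that way with the standard `csv` module. It is an illustration under those assumptions, not the loader the catalog itself uses; `read_behaviors` is a hypothetical name, and the sample row is made up.

```python
import csv
import io

# Values taken from the schema: column order, column separator, sequence separator.
COLS = ["impression_id", "user", "time", "sequence", "impressions"]
COL_SEP = "\t"
SEQUENCE_SEP = " "


def read_behaviors(fh):
    """Yield one dict per row, with `sequence` split into a list of item IDs."""
    reader = csv.reader(fh, delimiter=COL_SEP, quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = dict(zip(COLS, row))
        seq = record.get("sequence", "")
        # An empty history becomes an empty list rather than [""].
        record["sequence"] = seq.split(SEQUENCE_SEP) if seq else []
        yield record


# Illustrative row only; the IDs and the timestamp are invented.
sample = "1\tU1\t11/11/2019 9:05:58 AM\tN10 N20 N30\tN40-1 N50-0\n"
rows = list(read_behaviors(io.StringIO(sample)))
```

For a real file, pass a handle opened with `open(path, newline="", encoding="utf-8")`. The `time` and `impressions` fields are left as raw strings here, since the schema does not declare a timestamp format or a separator for `impressions`.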

validation

  • Type: interactions
  • Format: sequence_tabular_inline
  • Source: validation_archive
  • Filename: behaviors.tsv

Schema

user_col: user
sequence_col: sequence
timestamp_col: time
cols:
- impression_id
- user
- time
- sequence
- impressions
col_sep: "\t"
sequence_sep: ' '

Dataset Characteristics

Computed at: 2025-12-16

Metric Value
n_users 49108
n_items 33195
n_interactions 5107630
space_size 40.37499300309537
space_size_log 1.6061124600768104
shape 1.4793794246121403
shape_log 0.17007957418774253
density 0.00313324610892637
density_log -2.5040054909941873
gini_item 0.8403493747706466
gini_user 0.7792197590761297
ratings_per_user 104.00810458581087
ratings_per_item 153.8674499171562
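
Several of the values above appear to follow directly from the three counts. The page does not state the formulas, so the definitions in this sketch are inferred: they are the ones that reproduce the reported numbers (for example, `space_size` as the square root of `n_users * n_items`, divided by 1000, and the `*_log` variants as base-10 logarithms). The function name `derived_metrics` is hypothetical. The Gini coefficients are not included because they depend on per-user and per-item interaction counts, which the table does not provide.

```python
import math


def derived_metrics(n_users, n_items, n_interactions):
    """Recompute count-based characteristics (formulas inferred, not documented)."""
    space_size = math.sqrt(n_users * n_items) / 1000
    shape = n_users / n_items
    density = n_interactions / (n_users * n_items)
    return {
        "space_size": space_size,
        "space_size_log": math.log10(space_size),
        "shape": shape,
        "shape_log": math.log10(shape),
        "density": density,
        "density_log": math.log10(density),
        "ratings_per_user": n_interactions / n_users,
        "ratings_per_item": n_interactions / n_items,
    }


# Counts taken from the table for version "small".
metrics = derived_metrics(n_users=49108, n_items=33195, n_interactions=5107630)
```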

License & Usage

For licensing and usage restrictions, refer to the official dataset page: https://msnews.github.io/