


Journal of Data-centric Machine Learning Research, Volume 1
Volume 1, 2024
- Will Orr, Kate Crawford:

Building Better Datasets: Seven Recommendations for Responsible Design from Dataset Creators. (1):1-21

- Jielin Qiu, Yi Zhu, Xingjian Shi, Florian Wenzel, Zhiqiang Tang, Ding Zhao, Bo Li, Mu Li:

Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift. (2):1-56

- Yoshitomo Matsubara, Naoya Chiba, Ryo Igarashi, Yoshitaka Ushiku:

Rethinking Symbolic Regression Datasets and Benchmarks for Scientific Discovery. (3):1-38

- Sasha Luccioni, Kate Crawford:

The Nine Lives of ImageNet: A Sociotechnical Retrospective of a Foundation Dataset and the Limits of Automated Essentialism. (4):1-18

- Luis Oala, Manil Maskey, Lilith Bat-Leah, Alicia Parrish, Nezihe Merve Gürel, Tzu-Sheng Kuo, Yang Liu, Rotem Dror, Danilo Brajovic, Xiaozhe Yao, Max Bartolo, William Gaviria Rojas, Ryan Hileman, Rainier Aliment, Michael W. Mahoney, Meg Risdal, Matthew Lease, Wojciech Samek, Debojyoti Dutta, Curtis G. Northcutt, Cody Coleman, Braden Hancock, Bernard Koch, Girmaw Abebe Tadesse, Bojan Karlas, Ahmed Alaa, Adji Bousso Dieng, Natasha F. Noy, Vijay Janapa Reddi, James Zou, Praveen K. Paritosh, Mihaela van der Schaar, Kurt Bollacker, Lora Aroyo, Ce Zhang, Joaquin Vanschoren, Isabelle Guyon, Peter Mattson:

DMLR: Data-centric Machine Learning Research - Past, Present and Future. (5):1-27

- Hang Zhou, Jonas Mueller, Mayank Kumar, Jane-Ling Wang, Jing Lei:

Detecting Errors in a Numerical Response via any Regression Model. (6):1-25

- Jifan Zhang, Yifang Chen, Gregory Canal, Arnav Mohanty Das, Gantavya Bhatt, Stephen Mussmann, Yinglun Zhu, Jeff A. Bilmes, Simon Shaolei Du, Kevin Jamieson, Robert D. Nowak:

LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning. (7):1-43

- Saeid Shamsaliei, Odd Erik Gundersen, Knut Tore Alfredsen, Jo Halvard Halleraker, Anders Foldvik:

Highlighting Challenges of State-of-the-Art Semantic Segmentation with HAIR - A Dataset of Historical Aerial Images. (8):1-31

- John Park, Riccardo de Lutio, Brendan Rappazzo, Barbara Ambrose, Fabian Michelangeli, Kimberly Watson, Serge J. Belongie, Damon Little:

NAFlora-1M: Continental-Scale High-Resolution Fine-Grained Plant Classification Dataset. (9):1-21

- Nabeel Seedat, Nicolas Huynh, Fergus Imrie, Mihaela van der Schaar:

You can't handle the (dirty) truth: Data-centric Insights Improve Pseudo-Labeling. (10):1-21

- Zizhang Chen, Ryan Paul Badman, Bethany Lachele Foley, Robert J Woods, Pengyu Hong:

GlycoNMR: Dataset and Benchmark of Carbohydrate-Specific NMR Chemical Shift for Machine Learning Research. (11):1-37

- Zuxin Liu, Zijian Guo, Haohong Lin, Yihang Yao, Jiacheng Zhu, Zhepeng Cen, Hanjiang Hu, Wenhao Yu, Tingnan Zhang, Jie Tan, Ding Zhao:

Datasets and Benchmarks for Offline Safe Reinforcement Learning. (12):1-29

- Soumadeep Saha, Saptarshi Saha, Utpal Garain:

VALUED - Vision and Logical Understanding Evaluation Dataset. (13):1-18

- Paolo Climaco, Jochen Garcke:

On Minimizing the Training Set Fill Distance in Machine Learning Regression. (14):1-36

- Samiul Alam, Tuo Zhang, Tiantian Feng, Hui Shen, Zhichao Cao, Dong Zhao, Jeonggil Ko, Kiran K. Somasundaram, Shrikanth Narayanan, Salman Avestimehr, Mi Zhang:

FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things. (15):1-23

- Yvenn Amara-Ouali, Yannig Goude, Nathan Doumèche, Pascal Veyret, Alexis Thomas, Daniel Hebenstreit, Thomas Wedenig, Arthur Satouf, Aymeric Jan, Yannick Deleuze, Paul Berhaut, Sébastien Treguer:

Forecasting Electric Vehicle Charging Station Occupancy: Smarter Mobility Data Challenge. (16):1-27

- Andrea Pugnana, Lorenzo Perini, Jesse Davis, Salvatore Ruggieri:

Deep Neural Network Benchmarks for Selective Classification. (17):1-58

- Stefan Schoepf, Jack Foster, Alexandra Brintrup:

Potion: Towards Poison Unlearning. (18):1-31

- Aiden Grossman, Ludger Paehler, Konstantinos Parasyris, Tal Ben-Nun, Jacob Hegna, William S. Moses, Jose Manuel Monsalve Diaz, Mircea Trofin, Johannes Doerfert:

ComPile: A Large IR Dataset from Production Sources. (19):1-33

- Muberra Ozmen, Florence Regol, Thomas Markovich:

Benchmarking Edge Regression on Temporal Networks. (20):1-28

- Hao Chen, Bhiksha Raj, Xing Xie, Jindong Wang:

On Catastrophic Inheritance of Large Foundation Models. (21):1-33

- Hao Sun, Alex James Chan, Nabeel Seedat, Alihan Hüyük, Mihaela van der Schaar:

When is Off-Policy Evaluation (Reward Modeling) Useful in Contextual Bandits? A Data-Centric Perspective. (22):1-36















