


Journal of Data-centric Machine Learning Research, Volume 2
Volume 2, 2024/2025
- Mariana Pinto, André V. Carreiro, Pedro Madeira, Alberto Lopez, Hugo Gamboa: The Matrix Reloaded: Towards Counterfactual Group Fairness in Machine Learning. (1):1-55
- Jung Youn Lee, Joonhyuk Yang: Properties of Alternative Data for Fairer Credit Risk Predictions. (2):1-27
- Jingyang Zhang, Jingkang Yang, Pengyun Wang, Haoqi Wang, Yueqian Lin, Haoran Zhang, Yiyou Sun, Xuefeng Du, Yixuan Li, Ziwei Liu, Yiran Chen, Hai Li: OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection. (3):1-32
- Jielin Qiu, William Han, Xuandong Zhao, Shangbang Long, Christos Faloutsos, Lei Li: Evaluating Durability: Benchmark Insights into Image and Text Watermarking. (4):1-44
- Juan Pablo Zuluaga Gomez, Karel Veselý, Igor Szöke, Alexander Blatt, Petr Motlícek, Martin Kocour, Khalid Choukri, Iuliia Nigmatulina, Claudia Cevenini, Allan Tart, Jan Cernocký, Dietrich Klakow: ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications. (5):1-45
- Hannah Schulz-Kümpel, Sebastian Fischer, Roman Hornung, Anne-Laure Boulesteix, Thomas Nagler, Bernd Bischl: Constructing Confidence Intervals for “the” Generalization Error – a Comprehensive Benchmark Study. (6):1-73
- David Rousseau, Antoine Marot, Zhen Xu: Towards impactful challenges: post-challenge paper, benchmarks and other dissemination actions. (7):1-20
- Pu Ren, N. Benjamin Erichson, Junyi Guo, Shashank Subramanian, Omer San, Zarija Lukic, Michael W. Mahoney: SuperBench: A Super-Resolution Benchmark Dataset for Scientific Machine Learning. (8):1-45
- Lukas Helff, Wolfgang Stammer, Hikaru Shindo, Devendra Singh Dhami, Kristian Kersting: V-LoL: A Diagnostic Dataset for Visual Logical Learning. (9):1-41
- Hugo Jair Escalante, Isabelle Guyon, Addison Howard, Walter Reade, Sébastien Treguer: Challenge design roadmap. (10):1-42
- Lingjiao Chen, Bilge Acun, Newsha Ardalani, Yifan Sun, Feiyang Kang, Hanrui Lyu, Yongchan Kwon, Ruoxi Jia, Carole-Jean Wu, Matei Zaharia, James Zou: Data Acquisition: A New Frontier in Data-centric AI. (11):1-19
- Ziwei Yang, Xuxi Chen, Biqing Zhu, Tianlong Chen, Zhangyang Wang: Deep Learning for Accurate Diagnosis of Viral Infections through scRNA-seq Analysis: A Comprehensive Benchmark Study. (12):1-19
- Vasu Sharma, Karthik Padthe, Newsha Ardalani, Kushal Tirumala, Russell Howes, Hu Xu, Po-Yao Huang, Daniel Li Chen, Armen Aghajanyan, Gargi Ghosh, Luke Zettlemoyer: Text Quality-Based Pruning for Efficient Training of Language Models. (13):1-13
- Ronak Tali, Ali Rabeh, Cheng-Hau Yang, Mehdi Shadkhah, Samundra Karki, Abhisek Upadhyaya, Suriya Dhakshinamoorthy, Marjan Saadati, Soumik Sarkar, Adarsh Krishnamurthy, Chinmay Hegde, Aditya Balu, Baskar Ganapathysubramanian: FlowBench: A Large Scale Benchmark for Flow Simulation over Complex Geometries. (14):1-35
- Angus Dempster, Navid Mohammadi Foumani, Chang Wei Tan, Lynn Miller, Amish Mishra, Mahsa Salehi, Charlotte Pelletier, Daniel F. Schmidt, Geoffrey I. Webb: MONSTER: Monash Scalable Time Series Evaluation Repository. (15):1-47
- Christian Schultze, Niklas Kerkfeld, Kara Kuebart, Princilia Weber, Moritz Wolter, Felix Selgert: Chronicling Germany: An Annotated Historical Newspaper Dataset. (16):1-29
- Helen Jin, Shreya Havaldar, Chaehyeon Kim, Anton Xue, Weiqiu You, Helen Qu, Marco Gatti, Daniel A. Hashimoto, Bhuvnesh Jain, Amin Madani, Masao Sako, Lyle H. Ungar, Eric Wong: The FIX Benchmark: Extracting Features Interpretable to eXperts. (17):1-43
- Jost Arndt, Utku Isil, Michael Detzel, Wojciech Samek, Jackie Ma: Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs. (18):1-36
- Siddhartha Laghuvarapu, Namkyeong Lee, Chufan Gao, Jimeng Sun: MolTextQA: A Question-Answering Dataset and Benchmark for Evaluating Multimodal Architectures and LLMs on Molecular Structure-Text Understanding. (19):1-37
- Lev Telyatnikov, Guillermo Bernárdez, Marco Montagna, Mustafa Hajij, Martin Carrasco, Pavlo Vasylenko, Mathilde Papillon, Ghada Zamzmi, Michael T. Schaub, Jonas Verhellen, Pavel Snopov, Bertran Miquel-Oliver, Manel Gil-Sorribes, Alexis Molina, Victor Guallar, Theodore Long, Julian Suk, Patryk Rygiel, Alexander V. Nikitin, Giordan Escalona, Michael Banf, Dominik Filipiak, Liliya Imasheva, Max Schattauer, Alvaro L. Martinez, Halley Fritze, Marissa Masden, Valentina Sánchez, Manuel Lecha, Andrea Cavallo, Claudio Battiloro, Matthew Piekenbrock, Mauricio Tec, George Dasoulas, Nina Miolane, Simone Scardapane, Theodore Papamarkou: TopoBench: A Framework for Benchmarking Topological Deep Learning. (20):1-39
- Mehdi Shadkhah, Ronak Tali, Ali Rabeh, Cheng-Hau Yang, Ethan Herron, Abhisek Upadhyaya, Adarsh Krishnamurthy, Chinmay Hegde, Aditya Balu, Baskar Ganapathysubramanian: MPFBench: A Large Scale Dataset for SciML of Multi-Phase-Flows: Droplet and Bubble Dynamics. (21):1-35
- Surbhi Mittal, Rishi Dey Chowdhury, Mayank Vatsa, Richa Singh: DecordFace: A Framework for Degraded and Corrupted Face Recognition. (22):1-43
- Hang Chen, Xinyu Yang, Keqing Du: Towards Causal Relationship in indefinite data: New Datasets and Baseline Model. (23):1-40
- Yushun Dong, William Shiao, Yozen Liu, Jundong Li, Neil Shah, Tong Zhao: SEESAW: Do Graph Neural Networks Improve Node Representation Learning for All? (24):1-42
- Konstantin Schürholt, Léo Meynent, Yefan Zhou, Haiquan Lu, Yaoqing Yang, Damian Borth: A Model Zoo on Phase Transitions in Neural Networks. (25):1-34
- Siddharth Joshi, Besmira Nushi, Vidhisha Balachandran, Varun Chandrasekaran, Vibhav Vineet, Neel Joshi, Baharan Mirzasoleiman: MM-GEN: Principled and Generalizable Data Curation for Enhancing Task Performance in VLMs. (26):1-28
- Gustavo Stolovitzky, Julio Saez-Rodriguez, Julie Bletz, Jake Albrecht, Gaia Andreoletti, James C. Costello, Paul C. Boutros: The life cycle of challenges and benchmarks. (27):1-16
- Evgeny Saveliev, Jiashuo Liu, Nabeel Seedat, Anders Boyd, Mihaela van der Schaar: Towards Human-Guided, Data-Centric LLM Co-Pilots. (28):1-74
