Make Data Count - Finding Data References
Overview
This hackathon challenges you to identify and classify references to scientific data in research papers, with the goal of improving how data reuse is valued. You'll build a model that finds data citations and determines whether each one is primary (data newly generated for the study) or secondary (data reused from existing sources). Such a model helps automate the linking of papers to the data they use, making scientific data more discoverable and impactful.
Requirements
You can participate in teams of any size. September 2, 2025 is the deadline both for accepting the competition rules and for joining or merging teams. Submissions must be made using Kaggle Notebooks: your notebook must finish within 9 hours on CPU or GPU, with internet access disabled. You can use freely available external data and pre-trained models. The submission file must be a CSV named 'submission.csv' with the columns 'row_id', 'article_id', 'dataset_id', and 'type' (Primary or Secondary), each row identifying a data reference within an article.
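The submission format described above can be sketched in a few lines of Python. This is a minimal illustration, not an official template: the helper name write_submission is hypothetical, the column spellings row_id and article_id and the zero-based row numbering are assumptions that should be checked against the competition's sample submission, and dropping duplicate rows is a choice made here rather than a stated rule. The identifiers in the example are made up.

```python
import csv

# Assumed column spellings; confirm against the competition's sample submission.
COLUMNS = ["row_id", "article_id", "dataset_id", "type"]
VALID_TYPES = {"Primary", "Secondary"}


def write_submission(predictions, path="submission.csv"):
    """Write (article_id, dataset_id, type) tuples to a submission CSV.

    Returns the number of data rows written.
    """
    # Drop exact duplicates while keeping first-seen order (an assumption,
    # not a rule stated in the requirements).
    unique = list(dict.fromkeys(predictions))

    # Validate every label before touching the output file.
    for article_id, dataset_id, ref_type in unique:
        if ref_type not in VALID_TYPES:
            raise ValueError(
                f"type must be 'Primary' or 'Secondary', got {ref_type!r}"
            )

    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        # row_id is assumed to be a simple running index starting at 0.
        for row_id, (article_id, dataset_id, ref_type) in enumerate(unique):
            writer.writerow([row_id, article_id, dataset_id, ref_type])
    return len(unique)


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    example = [
        ("10.1234/example.article", "https://doi.org/10.5678/example.data", "Primary"),
        ("10.1234/example.article", "EXAMPLE_ACCESSION_1", "Secondary"),
    ]
    write_submission(example)
```

Validating the labels before opening the file means a bad prediction cannot leave a half-written 'submission.csv' behind, which matters when a notebook run takes hours.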
Prizes
This competition offers a prize pool of $100,000, shared among the top five teams: 1st place receives $40,000, 2nd place $20,000, 3rd place $17,000, 4th place $13,000, and 5th place $10,000.