MIT News, August 30, 2024. To improve data transparency and understanding when training language models on vast, diverse, and inconsistently documented datasets, an international team of researchers (USA: MIT, Harvard, UC Irvine, industry, University of Colorado, Olin College of Engineering, Carnegie Mellon University; also France and Canada) convened a multidisciplinary effort between legal and machine learning experts to systematically audit and trace more than 1,800 text datasets. They developed tools and standards to trace the lineage of these datasets, including their sources, creators, licenses, and subsequent use. They found sharp divides in the composition and focus of data licensed for […]
Category Archives: Modeling
How to figure out what you don’t know
TechXplore, October 26, 2020. Machine learning optimizes flexible models to predict data. In scientific applications, there is rising interest in interpreting these flexible models to derive hypotheses from data. Researchers from Cold Spring Harbor Laboratory tested this connection using a flexible yet intrinsically interpretable framework for modelling neural dynamics. Many models discovered during optimization predict the data equally well, yet they fail to match the correct hypothesis. The researchers developed an alternative approach that identifies models with the correct interpretation by comparing model features across data samples to separate true features from noise. Their results reveal that good predictions cannot substitute for […]