Phys.org June 27, 2018
One of the biggest causes of bias in facial analysis is the lack of diverse data on which to train systems. IBM is releasing a facial attribute and identity training dataset of over 1 million images to improve facial analysis. The dataset is annotated with attributes and identity; it uses geo-tags from Flickr images to balance data from multiple countries and active learning tools to reduce sample selection bias. Unlike existing datasets, the IBM dataset supports both matching attributes (hair color, facial hair, etc.) and identifying multiple images of the same person in a single resource. A dataset of 36,000 facial images, equally distributed across all ethnicities, genders, and ages, will also be released to give people a more diverse dataset for evaluating their technologies. IBM will hold a technical workshop in Sept. 2018 on identifying and reducing bias in facial analysis.