Ensuring AI works with the right dose of curiosity

MIT News  November 10, 2022
To address the challenge of exploration in reinforcement learning, incentivizing an agent to visit novel states with an exploration bonus can yield excellent results on hard-exploration tasks, but it can also suffer from intrinsic-reward bias and underperform an agent trained on task rewards alone. An international team of researchers (USA – MIT; Finland) has proposed a principled constrained policy optimization procedure that automatically tunes the importance of the intrinsic reward: it suppresses the intrinsic reward when exploration is unnecessary and increases it when exploration is required. According to the researchers, this yields superior exploration without manual tuning to balance the intrinsic reward against the task reward. To validate the claim, the team demonstrated performance gains across sixty-one Atari games and released their code.
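The idea of automatically tuning the intrinsic-reward weight can be sketched as a dual update on a single coefficient: compare the task performance of the exploration-augmented policy against a task-reward-only baseline, and shrink the exploration weight when the bonus is hurting task return, grow it when exploration comes for free. This is a minimal illustrative sketch, not the authors' algorithm; the function names, the learning rate, and the simple return-gap update rule are all assumptions.

```python
def combined_reward(r_task, r_intrinsic, lam):
    # Total reward the agent optimizes: task reward plus a
    # weighted exploration bonus (lam is the tuned coefficient).
    return r_task + lam * r_intrinsic

def update_lambda(lam, mixed_return, task_only_return, lr=0.1):
    # Hypothetical dual update: if the policy trained with the
    # exploration bonus lags the task-only baseline on task return,
    # shrink the intrinsic weight; if it matches or beats the
    # baseline, exploration is not hurting, so allow more of it.
    gap = task_only_return - mixed_return
    lam = lam - lr * gap
    return max(lam, 0.0)  # keep the weight non-negative

# Example: the bonus-driven policy trails the baseline (return 5 vs 10),
# so the intrinsic weight is suppressed from 1.0 toward 0.
lam = update_lambda(1.0, mixed_return=5.0, task_only_return=10.0)
```

In the researchers' formulation this balancing is posed as a constrained optimization problem rather than the heuristic gap rule above, but the qualitative behavior is the same: the exploration bonus is dialed down exactly when it conflicts with the task reward.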

Credit: MIT

