Artificial Intelligence (AI) has become a powerful research catalyst in science. At the core of modern AI is the ability to automatically extract knowledge from data and build accurate predictive models. To maximize this impact, it is critical to have access to enough good-quality data for machine learning algorithms to extract relevant knowledge and produce useful models. One of the main challenges in AI is therefore compiling such pivotal datasets, which is particularly difficult in drug discovery because the primary information, the chemical structure, is confidential. Even though public data are available, the most valuable knowledge remains locked in private silos, despite the willingness of industry to share non-competitive information.
This poster presents an approach for generating shared knowledge whilst maintaining the confidentiality of private data.