A new paper, “Data-driven federated learning in drug discovery with knowledge distillation”, was recently published in Nature Machine Intelligence, a leading journal for machine learning and AI. It explores how federated learning can advance drug discovery through secure, collaborative research, and represents a major step forward in privacy-preserving machine learning.
What is Federated Learning?
Federated Learning enables machine learning across multiple data sources while protecting private information. By allowing collaboration without directly sharing sensitive data, it paves the way for a new generation of improved models.
The paper introduces FLuID (Federated Learning Using Information Distillation), a novel approach designed to facilitate knowledge sharing across organisations without compromising the confidentiality of proprietary data. Developed through collaboration between Lhasa Limited and eight of its key members, this research will serve as a reference point for future studies and potential products aimed at enhancing federated learning in the pharmaceutical industry.
Why the research matters
In scientific research, the most valuable data often remains locked within private data silos, with organisations hesitant to share due to confidentiality concerns. Federated learning offers a promising solution by allowing collaboration without the need to share sensitive data directly. However, it comes with its own set of challenges, particularly around aligning models with partner-specific domains.
What makes FLuID unique is its data-centric approach, leveraging knowledge distillation to federate information effectively across multiple organisations. This method ensures original private labels remain anonymous and untraceable, adhering to data protection and governance requirements. It’s a ‘model-agnostic’ solution, meaning it supports various machine learning techniques and provides persistent, reusable knowledge that can benefit future AI advancements.
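As a rough illustration of the data-centric idea described above, the following Python sketch shows one generic way knowledge distillation can move information between organisations without moving private labels. It is not the FLuID implementation: the partner names, the toy one-dimensional descriptors, the nearest-centroid teachers, the averaging step and the nearest-neighbour student are all assumptions made purely for this example.

```python
from statistics import mean

# Hypothetical toy setup: each partner holds private (descriptor, label) pairs
# that never leave its own environment. A single number stands in for a
# molecular descriptor; labels are 1 (active) or 0 (inactive).
PRIVATE_DATA = {
    "partner_a": [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)],
    "partner_b": [(0.3, 0), (0.4, 0), (0.6, 1), (0.7, 1)],
    "partner_c": [(0.0, 0), (0.35, 0), (0.65, 1), (1.0, 1)],
}

# Shared, unlabelled transfer set that every partner is allowed to see.
TRANSFER_SET = [0.05, 0.25, 0.45, 0.55, 0.75, 0.95]


def train_teacher(data):
    """Fit a nearest-centroid teacher on one partner's private data."""
    centroid_0 = mean(x for x, y in data if y == 0)
    centroid_1 = mean(x for x, y in data if y == 1)

    def predict_score(x):
        # Soft score in [0, 1]: closer to the active centroid gives a higher value.
        d0, d1 = abs(x - centroid_0), abs(x - centroid_1)
        return d0 / (d0 + d1) if (d0 + d1) > 0 else 0.5

    return predict_score


def annotate(teacher, transfer_set):
    """Only these teacher predictions are shared, never the private labels."""
    return [teacher(x) for x in transfer_set]


def federate(annotations):
    """Average the partners' soft labels for each transfer-set entry."""
    return [mean(scores) for scores in zip(*annotations)]


def train_student(transfer_set, soft_labels):
    """Fit a student on the federated data (here a 1-nearest-neighbour lookup)."""
    table = list(zip(transfer_set, soft_labels))

    def predict(x):
        _, score = min(table, key=lambda row: abs(row[0] - x))
        return 1 if score >= 0.5 else 0

    return predict


teachers = [train_teacher(data) for data in PRIVATE_DATA.values()]
annotations = [annotate(teacher, TRANSFER_SET) for teacher in teachers]
federated_labels = federate(annotations)
student = train_student(TRANSFER_SET, federated_labels)
```

Because only the teachers' predictions on the shared transfer set are pooled, the resulting labelled set can be reused to train any kind of student model, which is the sense in which an approach like this can be called model-agnostic.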
Thierry Hanser, Head of Molecular Informatics and AI at Lhasa and co-author of the paper, shared his thoughts on the impact of this research:
“FLuID’s innovative data-driven approach offers a multitude of advantages over classical model-driven federated learning. The resulting federated data is an extremely valuable and flexible asset to address the model alignment challenge.”
Real-world impact and potential of federated learning
The research was validated through two experiments: one that used public data to simulate a virtual consortium, and another that involved collaboration with eight pharmaceutical companies.
Key findings:
- FLuID significantly improves model performance and applicability domains when knowledge is shared among consortium partners.
- This collective intelligence enables pharmaceutical organisations to leverage a broader, more diverse dataset, improving predictions for biological activity.
Next steps:
- The first stage, which focused on knowledge sharing, was successful.
- The next phase will explore how partners can apply this shared knowledge within their own chemical space and use cases.
Thierry also emphasised the importance of collaboration in advancing the research:
“We are delighted to collaborate with our members to explore the potential of this approach and its application in their own space; this second research phase is fascinating and tackles very important challenges in QSAR.”
We expect the paper to serve as a reference for future research and, potentially, new products in federated learning for the pharmaceutical industry. The insights from this work are also expected to engage our members in the FLuID research initiative, paving the way for a follow-up paper in 2025.
The bigger picture
As a not-for-profit organisation and educational charity, Lhasa is dedicated to advancing scientific knowledge and promoting collaborative learning. This publication not only reinforces our commitment to developing innovative solutions but also highlights our leadership in federated learning for drug discovery.
We are excited to see how this research evolves and the positive impact it will have on collaborative AI models in the pharmaceutical industry.
Explore the full publication for more insights.
Last Updated on March 28, 2025 by lhasalimited