
What are foundation models? What powers them? And why should data behind models be governed?

  • Writer: jessicajolly3
  • May 19, 2024
  • 4 min read

Updated: May 25, 2024


Foundation models: what are they? What are their applications in AI-led drug discovery and development? What powers them? And what defensive data strategies should be considered to mitigate risks?


A foundation model (FM) is a type of AI model that is pre-trained on a large amount of data and can be fine-tuned for specific tasks. These models are capable of generalizing from the broad data they are trained on, thereby providing a foundation upon which more specialized models can be built.
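The pre-train-then-fine-tune pattern can be illustrated with a deliberately tiny sketch: a toy linear model is first fit on a broad synthetic dataset, then adapted with a few gradient steps on a small task-specific one. All data, hyperparameters, and the model itself are illustrative stand-ins for a real FM:

```python
import random

def fit_linear(data, w=0.0, b=0.0, lr=0.01, epochs=200):
    """Fit y = w*x + b by gradient descent; starting weights may be pretrained."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

# "Pre-training": a large, broad synthetic dataset drawn from y = 2x + 1
random.seed(0)
broad = [(x, 2 * x + 1) for x in [random.uniform(-1, 1) for _ in range(500)]]
w, b = fit_linear(broad)  # general-purpose starting point, near (2, 1)

# "Fine-tuning": a tiny task-specific dataset (y = 2x + 3) adapts the
# pretrained weights instead of starting from scratch
task = [(0.0, 3.0), (0.5, 4.0), (1.0, 5.0)]
w_ft, b_ft = fit_linear(task, w=w, b=b, lr=0.1, epochs=200)  # near (2, 3)
```

The point of the sketch is the workflow, not the model: fine-tuning starts from weights that already encode the broad data, so only a small task dataset is needed to specialize.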


In drug discovery, an FM can also make predictions or generate outputs once trained: after a model has been trained on datasets, it can be used to “infer,” or predict, results on new data. With such powerful training and emerging inference capabilities, FMs are a game-changer in medicine development, with the potential to massively accelerate drug discovery in many ways, including:


  • High-throughput Screening: the model can process large volumes of data at a speed beyond human capability. FMs can screen millions of chemical compounds quickly to identify potential drug candidates, significantly speeding up the early stages of drug discovery

  • Predictive Modeling: FMs can predict the properties of a compound, its potential effectiveness against certain diseases, and potential side effects. This predictive ability helps prioritize certain compounds for further testing and eliminate others, saving time and resources


  • Drug Repurposing: FMs can help identify new uses for existing drugs. This process, known as drug repurposing, is faster and less risky than developing an entirely new drug


  • Personalized Medicine: FMs can analyze patient data to guide the selection of treatments tailored to individual patients. This not only increases the likelihood of a positive response, but also reduces the time spent on trial-and-error treatment approaches and can inform healthcare professionals in their decision making during patient care


  • Speeding up Clinical Trials: By identifying the most promising drug candidates and predicting patient responses, FMs can help design more efficient clinical trials


  • Biomarker Discovery: AI can be used to identify biomarkers that indicate the presence of disease, or how a disease may respond to treatment, helping to speed up the process of drug development
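The screening and predictive-modeling ideas above can be sketched in a few lines: score a compound library with a trained predictor and keep the top-ranked candidates for follow-up testing. The `predict_activity` function, the descriptor names, and the compound IDs below are toy stand-ins, not a real model or dataset:

```python
# Minimal sketch of model-driven virtual screening: rank a compound
# library by a predicted score and keep the top candidates for assay.

def predict_activity(compound):
    """Toy stand-in for FM inference: higher score = more promising.
    A real model would be learned; this heuristic rewards a potency-like
    descriptor and penalizes a toxicity-like one (both hypothetical)."""
    return compound["binding_score"] - 0.5 * compound["tox_score"]

library = [
    {"id": "CMPD-001", "binding_score": 0.9, "tox_score": 0.2},
    {"id": "CMPD-002", "binding_score": 0.7, "tox_score": 0.9},
    {"id": "CMPD-003", "binding_score": 0.8, "tox_score": 0.1},
    {"id": "CMPD-004", "binding_score": 0.4, "tox_score": 0.3},
]

# Score every compound, sort best-first, and shortlist the top two
ranked = sorted(library, key=predict_activity, reverse=True)
top_hits = [c["id"] for c in ranked[:2]]
print(top_hits)  # ['CMPD-001', 'CMPD-003']
```

At production scale the library would hold millions of entries and the scoring call would be batched model inference, but the prioritize-then-test loop is the same.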


So, what powers FMs?


FMs depend on computational power; it is a fundamental aspect of their development and operation. Because these models involve processing vast amounts of data with complex algorithms, they require significant computational resources. Some of the ways compute relates to foundation models:


  • Training: Building FMs involves training them on large datasets. This process requires substantial computational power. For example, models like GPT-3 were trained on hundreds of gigabytes of text data, which requires high-performance computing infrastructure


  • Performance: The performance of FMs is directly tied to available compute. More computational power enables faster processing and analysis of data, which can improve model performance


  • Scale: The size of the models, and of the data they can handle, is largely dependent on compute. Larger models that process more data typically require more computational resources


  • Innovation: Advances in compute often drive advances in FMs. Enhancements in processing power and storage capabilities enable the development of much larger and more powerful AI models


  • Cost: The cost of deploying and running FMs is strongly tied to compute, as the use of computational resources incurs expenses


  • Accessibility: Which organizations can build and use FMs is shaped by the compute they can access, since training competitive models is out of reach without substantial computational resources


More compute allows for the training of larger models with more parameters. These larger models can often capture more complex patterns and relationships in the data, leading to better performance. So, computational power is a key enabler (and limitation) in the development and deployment of FMs.
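One way to put numbers on this relationship is the widely used rule of thumb from the scaling-law literature that training takes roughly 6 FLOPs per parameter per token. The accelerator throughput and utilization figures below are illustrative assumptions, not vendor specifications:

```python
def training_flops(n_params, n_tokens):
    """Rule-of-thumb training compute: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

def gpu_days(flops, gpu_flops_per_sec=1e14, utilization=0.4):
    """Rough wall-clock estimate on one accelerator. The assumed peak
    throughput (100 TFLOP/s) and 40% utilization are illustrative."""
    seconds = flops / (gpu_flops_per_sec * utilization)
    return seconds / 86400

# Example: a 1B-parameter model trained on 100B tokens
c = training_flops(1e9, 100e9)  # about 6e20 FLOPs
print(f"{c:.1e} FLOPs, roughly {gpu_days(c):.0f} GPU-days on one device")
```

Even this modest configuration implies months on a single accelerator, which is why FM training is spread across large clusters and why compute budget effectively caps model and dataset size.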


What are the risks to mitigate when working with FMs and training them in high-compute environments, such as the supercomputers now appearing across TechBio and Biopharma?


Models are trained on data, and lots of it, for fine-tuning and improvement.


Before training FMs on an organization’s proprietary data, it is crucial to ensure risk-proportionate data governance across dimensions including data quality, privacy, security, ethics, and intellectual property. Training FMs on public data should also be thought through intentionally, as public datasets are not always free to use. Their terms and conditions may carry legal implications, including regulatory ones (GDPR, CCPA, etc.) such as the legal basis to process sensitive human data, as well as considerations around data access, use, sharing, and ownership of outputs derived from connected datasets, analyses, or transformations. It is prudent to understand these terms and dimensions up front, first to ensure the data terms are respected, and second to ensure that organizational technology and intellectual property are protected and not inadvertently put at risk.


Building out an autonomous data governance capability across an organization’s data ecosystem, including its supercomputers, ensures responsible democratization of data and makes data easily searchable, findable, and usable as an asset for model training, inference, and fine-tuning efforts. The ultimate goal of FMs in Biopharma and TechBio is to generate insights that create value and future monetization paths, including capabilities, products, and potential medicines for waiting patients with unmet needs.



 
 
 


©2024 by Jessica Jolly
