Applications of Artificial Intelligence in Biotechnology and Medicine
Artificial intelligence (AI), and in particular machine learning (ML), combines computer science with statistics to enable computers to learn from data and make decisions or predictions without being explicitly programmed to do so. AI has transformed numerous sectors within biotechnology. Here I will highlight a few notable applications in genomics and medicine.
All living organisms store their genetic instructions in deoxyribonucleic acid (DNA), built from the same four building blocks (nucleotides) arranged in different sequential combinations. DNA contains the biological instructions, while proteins carry out the functions encoded within it. DNA sequencing determines the order of the nucleotides in a DNA molecule. With rapid advances in sequencing technology, the cost of sequencing has plummeted to less than 1,000 US dollars per human genome. Consequently, more and more genomes are being sequenced, and the volume of publicly available DNA sequencing data has expanded remarkably. A new scientific discipline emerged to extract valuable insights from this vast amount of data: bioinformatics. It integrates principles from both biology and information technology to enhance our understanding of biological processes.
Why is AI required in genomics?
Genomics data is vast and complex, encompassing multi-omics data from the genome, transcriptome, proteome, and epigenome. These datasets offer complementary insights into biological systems, and the relationships among them are intricate, hierarchical, and often nonlinear. Moreover, the precise mechanisms governing gene regulation are still not fully understood.
The declining cost of sequencing has made it possible to measure many data points per patient across large patient populations. AI algorithms have emerged as powerful tools for processing and analyzing such large-scale datasets, extracting meaningful patterns, and revealing correlations that may elude traditional statistical approaches. They can predict outcomes, classify sequences, and identify patterns within genomic data. This capability is essential for tasks such as identifying disease markers, predicting patient responses to treatments, and classifying genetic mutations. AI techniques also enable the integration and analysis of diverse omics datasets to gain a comprehensive understanding of biological systems and diseases.
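Before a model can classify a DNA sequence, the sequence has to be turned into numbers. The sketch below shows one common and simple way to do that, counting overlapping k-mers (short subsequences of length k). It is only an illustration: the function names and the example sequence are made up for this article, and real pipelines typically use richer encodings.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence: str, k: int = 2) -> dict[str, int]:
    """Count overlapping k-mers, skipping any that contain non-ACGT symbols."""
    sequence = sequence.upper()
    counts = Counter(
        sequence[i:i + k]
        for i in range(len(sequence) - k + 1)
        if set(sequence[i:i + k]) <= set("ACGT")
    )
    return dict(counts)


def kmer_vector(sequence: str, k: int = 2) -> list[int]:
    """Return the counts in a fixed order over all 4**k possible k-mers."""
    counts = kmer_counts(sequence, k)
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts.get(kmer, 0) for kmer in vocabulary]


# Hypothetical example sequence; the result is a 16-element feature vector.
features = kmer_vector("ACGTACGT", k=2)
```

A fixed-length vector like `features` can be passed to almost any standard classifier, which is why k-mer counts are a frequent first baseline for sequence classification.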
Fueled by the explosion of biological data, major biotech and pharma companies are embracing bioinformatics and AI to accelerate and optimize their research efforts. Moreover, major global IT companies such as Google, IBM, Microsoft, and Amazon have dedicated teams working on AI-based technologies for genomics and medicine.
Applications of AI in genomics and medicine
One of the most prominent applications of AI in genomics is genome sequence analysis. Sequencing technologies generate massive amounts of data, which require sophisticated computational approaches to analyze and interpret. AI methods such as deep learning can identify patterns within these large datasets. For example, Oxford Nanopore’s Guppy [1] uses a deep learning model to translate the raw electrical signal generated by its DNA sequencer into bases. Google’s DeepVariant [2] uses a deep learning model to call genetic variants from high-throughput DNA sequencing data with higher accuracy than traditional methods. By accurately identifying variations in genomic sequences, researchers can better understand genetic predispositions to diseases and develop more personalized medical treatments.
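To make "calling a variant" concrete, here is a deliberately naive sketch of the task: at one genome position, compare the bases seen in the sequencing reads with the reference base and report an alternative base if enough reads support it. This is not how DeepVariant works; it is a hand-written threshold rule of the kind that learned models aim to improve on, and the function name, thresholds, and example reads are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional


def call_variant(
    reference_base: str,
    read_bases: list[str],
    min_fraction: float = 0.2,  # assumed threshold, chosen only for illustration
    min_depth: int = 4,         # assumed minimum number of reads
) -> Optional[str]:
    """Return the best-supported non-reference base, or None if no call is made."""
    depth = len(read_bases)
    if depth < min_depth:
        return None  # too few reads to say anything
    alt_counts = Counter(b for b in read_bases if b != reference_base)
    if not alt_counts:
        return None  # every read agrees with the reference
    alt_base, alt_count = alt_counts.most_common(1)[0]
    if alt_count / depth >= min_fraction:
        return alt_base
    return None  # the mismatches look more like sequencing errors


# Hypothetical reads covering one position where the reference base is "A".
call = call_variant("A", list("AAAGGGAA"))
```

Fixed thresholds like these struggle to separate true variants from sequencing errors in difficult regions, which is the gap that model-based callers are designed to close.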
Another impactful application of AI in genomics lies in disease prediction. By analyzing large datasets of patients’ genetic information and medical records, AI algorithms can identify patterns that associate specific genetic variants with an increased risk of developing certain diseases. For example, in oncology, AI models are being trained to predict a patient’s risk of developing specific cancers based on their genetic profile [3]. Using DNA obtained from blood samples, AI algorithms can also predict whether a patient has cancer [4]. This information can then be used for early detection and intervention, potentially saving lives.
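As a rough illustration of how genetic variants can be combined into a risk estimate, the sketch below uses a logistic model: each variant contributes a weight multiplied by the number of copies the patient carries, and the sum is mapped to a probability. The variant names, weights, and intercept are invented for this example and carry no biological meaning; in practice such parameters are learned from large patient cohorts, and the cited studies use considerably more elaborate models.

```python
import math


def risk_probability(
    genotype: dict[str, int],
    weights: dict[str, float],
    intercept: float,
) -> float:
    """Logistic risk model: sigmoid(intercept + sum(weight * allele count))."""
    score = intercept + sum(
        weight * genotype.get(variant, 0) for variant, weight in weights.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights; real values would be estimated from training data.
weights = {"variant_A": 0.8, "variant_B": -0.3}
baseline = risk_probability({}, weights, intercept=-2.0)
carrier = risk_probability({"variant_A": 2}, weights, intercept=-2.0)
```

Here `carrier` is higher than `baseline` because the hypothetical `variant_A` has a positive weight and the patient carries two copies of it.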
AI is also transforming the drug discovery and development process. AI techniques are used at various stages, from target identification and lead optimization to clinical trial design and patient stratification. Analyzing vast libraries of potential drug molecules and their interactions with biological targets is a daunting task. However, AI algorithms can efficiently sift through this data, identifying promising candidates with the desired properties. This not only accelerates the drug discovery process but also increases the likelihood of finding effective treatments. Using multi-omics profiles from patients, AI models can also predict whether a patient will respond to a drug treatment [5,6]. These models can potentially help clinicians select appropriate treatments and thereby advance precision medicine.
Antibody-based therapies have gained prominence in the treatment of various diseases, including cancer and autoimmune disorders. AI algorithms can analyze the structural and sequence data of antibodies to predict their binding affinity and selectivity, facilitating the design of antibodies with enhanced therapeutic properties [7]. Many biotech and pharma companies have turned to AI to engineer antibodies for potential therapeutics or vaccines.
AI has emerged as a powerful tool in protein structure prediction, greatly improving our ability to predict the three-dimensional structures of proteins from their amino acid sequences. This capability is crucial for understanding protein function, for drug discovery, and for the design of novel therapeutics. AI algorithms have been applied across various stages of protein structure prediction, from tertiary structure prediction to protein-protein interaction modeling, offering significant gains in accuracy and efficiency. For instance, DeepMind’s AlphaFold [8] predicts the 3D structure of a protein from its amino acid sequence alone. AlphaFold has generated structural predictions for millions of proteins, many of which have not been resolved by experimental or other computational means. This is perhaps the most significant contribution AI has made to biology so far, with profound implications for medicine, especially for drug development.
Recent advances in generative AI have also had a considerable impact on biology and medicine. Generative AI refers to a class of AI algorithms designed to generate new content or data that is similar, but not identical, to existing data. These algorithms are trained on large datasets, from which they learn the fundamental patterns, connections, and structures inherent in the data. Essentially, they master the “language” or core patterns that define the underlying data. Using generative AI, scientists can propose the molecular structure of a novel compound that potentially inhibits the function of a protein of interest and could be used for disease treatment [9]. Scientists can also use generative AI to engineer a new protein with a desired function [10]. Furthermore, generative AI has shown promise in personalized medicine by analyzing large-scale genomic and clinical datasets to tailor treatments to individual patients. By leveraging generative models, researchers can simulate the effects of different treatments on patient outcomes, guiding healthcare professionals in making more informed decisions. For example, large language models (LLMs) such as OpenAI’s ChatGPT and Google’s PaLM 2 are being adapted to and evaluated on medical text, including clinical records, which may help researchers gain deeper insights into health, disease, and treatment modalities.
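The idea of learning patterns from existing data and then sampling new, similar data can be shown with a very small toy model. The sketch below trains a first-order Markov chain on a handful of amino acid strings (it records which letter tends to follow which) and then samples a new string. It is far simpler than the protein language models cited above, and the training sequences are invented for this example rather than taken from real proteins.

```python
import random
from collections import defaultdict


def train_markov(sequences: list[str]) -> dict[str, list[str]]:
    """Record, for every residue, the residues observed immediately after it."""
    transitions = defaultdict(list)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            transitions[current].append(following)
    return transitions


def generate(transitions, start: str, length: int, rng: random.Random) -> str:
    """Sample a sequence by repeatedly drawing a successor of the last residue."""
    out = [start]
    while len(out) < length:
        choices = transitions.get(out[-1])
        if not choices:
            break  # this residue was only ever seen at the end of a sequence
        out.append(rng.choice(choices))
    return "".join(out)


# Made-up training strings, used only to illustrate the mechanics.
training = ["MKTAYIAKQR", "MKTLYIAKQL", "MKSAYLAKQR"]
model = train_markov(training)
sample = generate(model, "M", 10, random.Random(0))
```

Every adjacent pair of letters in `sample` occurs somewhere in the training strings, yet the sample as a whole need not match any of them, which is the "similar but not identical" behavior described above in miniature.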
In summary, the applications of AI in biotechnology and medicine are diverse and far-reaching. The convergence of AI and genomics has ignited a revolution in healthcare, ushering in an era of personalized medicine and accelerated drug discovery. From deciphering complex genetic data to predicting disease risk and designing targeted therapies, AI is unlocking the immense potential of genomics for the betterment of human health. As research in this field continues to evolve at an unprecedented pace, we can anticipate even more transformative applications in the years to come.
References
[1] Wick, R. R., Judd, L. M. & Holt, K. E. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol 20, 129 (2019).
[2] Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 36, 983 (2018).
[3] Moon, I. et al. Machine learning for genetics-based classification and treatment response prediction in cancer of unknown primary. Nat Med 29, 2057–2067 (2023).
[4] Cristiano, S. et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature 570, 385–389 (2019).
[5] Sharifi-Noghabi, H., Zolotareva, O., Collins, C. C. & Ester, M. MOLI: Multi-omics late integration with deep neural networks for drug response prediction. Bioinformatics 35, i501–i509 (2019).
[6] Sharifi-Noghabi, H., Peng, S., Zolotareva, O., Collins, C. C. & Ester, M. AITL: Adversarial inductive transfer learning with input and output space adaptation for pharmacogenomics. Bioinformatics 36, i380–i388 (2020).
[7] Graves, J. et al. A Review of Deep Learning Methods for Antibodies. Antibodies (Basel) 9 (2020).
[8] Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
[9] Jiang, P. et al. Big data in basic and translational cancer research. Nat Rev Cancer 22, 625–639 (2022).
[10] Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat Biotechnol 41, 1099–1106 (2023).