Google DeepMind has introduced AlphaGenome, a new model designed to predict gene regulation directly from raw DNA sequences. Although the name might seem a bit lackluster (we understand, it’s part of the “AlphaEverything” trend), the potential impact is significant. This model aims to assist scientists in understanding how genetic variants influence gene expression, chromatin states, and more through computational means.
In theory, this could be a researcher’s dream. Imagine being able to test hypotheses, predict the effects of mutations, or even design experiments without picking up a pipette. Sounds exciting, right? However, is AlphaGenome the next breakthrough on par with AlphaFold? Not quite, or at least not yet. While it is currently the best tool for predicting gene regulation from raw DNA sequences, it’s still early in development. This isn’t a paradigm shift, but it could be a significant stepping stone. Below we explore what AlphaGenome can do now, where it still has limitations, and what it would take to change the research landscape.
Starting with AlphaFold
Let’s take a moment to focus on AlphaFold, the standout achievement that made DeepMind well-known in the biology community. If AlphaGenome is the newcomer, AlphaFold is the seasoned sibling who has already won science fairs, delivered TED Talks, and earned its creators a share of the 2024 Nobel Prize in Chemistry, for good reason.
What makes AlphaFold exceptional? It can predict a protein’s 3D structure solely from its amino acid sequence. While this may seem like a simple trick, it has fundamentally transformed drug discovery. If you need a small molecule to fit into a protein’s binding pocket, a predicted structure gives you a pocket to design against. If you want to understand how your drug interacts with its target, AlphaFold provides a structure to start from. It also speeds up the prediction of off-target effects and the engineering of new proteins.
AlphaFold’s impact extends beyond small molecules. In theory, it could lead to the design of improved biologics, such as enzyme-based drugs or antibodies engineered through in silico evolution to perform specific functions. With a track record like that, expectations for AlphaGenome are exceptionally high.
Bringing the Genome into the Fold
AlphaFold changed the game in protein structure prediction, but AlphaGenome is tackling an entirely different challenge.
While AlphaFold focused on a well-defined target (the 3D shape of a protein), AlphaGenome deals with a much more complex problem: predicting how raw DNA sequences translate into actual gene regulation. This process is significantly messier and less straightforward.
Genomics is not as tidy as protein structure. Gene expression can’t be determined solely from the genetic sequence. Anyone who has conducted a multi-omics experiment knows that just because the chromatin is open doesn’t mean a gene is expressed. Even if a gene is expressed and RNA levels are high, that doesn’t mean the protein is functioning as expected. Many layers of regulation are involved, including splicing, chromatin architecture, transcription factor binding, DNA accessibility, methylation, histone modifications, and RNA stability, among others. Phew!
AlphaGenome is ambitiously trying to model everything: where gene expression starts and stops, how transcripts are spliced, how much RNA is produced, which regions of the genome are open or closed, who’s binding where, and what the 3D structure of the genome looks like while all of that is happening.
That’s a heck of a lot!
While it’s impressive on paper, there’s a catch: unlike AlphaFold, many of these outputs don’t have clear benchmarks. AlphaFold’s predictions can be checked against experimentally solved protein structures, but AlphaGenome’s predictions often have no equally tidy ground truth. It’s like trying to score a game when the goalposts are blurry and half the field is invisible. That doesn’t make the work unimportant; it just means interpreting the results requires extra caution (and, ideally, more experimental data).
What Can AlphaGenome Do Right Now?
AlphaGenome still has some significant gaps that could be a TAD problematic (yes, pun fully intended). TADs—short for topologically associating domains—are large (~1 Mb) neighborhoods of DNA that tend to act as a unit. When one region within a TAD is open and active, its neighbors usually are too; when it’s closed, everything tends to shut down in sync.
We often picture DNA as a tidy, linear string—like spaghetti, dry and straight, or al dente and twisted on a fork. But functionally, the genome behaves more like a tangled subway map than a plate of pasta. DNA regions far apart linearly can be right next to each other in 3D space, interacting frequently. Imagine Chicago: the Loop is a dense, always-active hub (a highly open TAD), while lines from quieter neighborhoods connect to it, bringing bursts of traffic at the connection points but less activity farther out. It’s not just distance that matters, but who’s connected to whom, and how often.
AlphaGenome’s current model, which looks at 1 million base pairs at a time, captures some of this but misses a lot. Those long-range regulatory interactions? Still out of reach. And don’t even get us started on histones and chromatin states—the architectural scaffolding and bouncers of the genome. They play a huge role in what genes get expressed and when, but AlphaGenome isn’t fully modeling them yet.
In short, the genome is more like urban planning than a street map, and we’re still developing the tools to read its blueprints properly.
Why That’s Still Useful
Sure, it’s not perfect. It’s a prototype. But even in its current state, AlphaGenome could be a fantastic tool for innovation:
Got a weird mutation you can’t explain? Throw it at AlphaGenome and get a first-pass read on whether it’s messing with gene regulation.
Need a next step for that GWAS hit that lives in a regulatory desert? Here’s a way to test hypotheses before you start spending on wet-lab work.
Dealing with a Variant of Unknown Significance (VUS)? This might help narrow down what’s worth chasing.
Want to skip the painful round of reporter assays? In silico modeling could speed up that early phase of your project dramatically.
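To make the "throw a variant at it" idea concrete, here is a minimal sketch of in silico variant scoring: predict on the reference sequence, predict on the mutated sequence, and compare. This is not the AlphaGenome API. The predictor `toy_regulatory_score`, the motif, and the example sequence are all hypothetical stand-ins, chosen only so the example runs on its own.

```python
# Toy illustration of in silico variant scoring.
# NOT the AlphaGenome API: `toy_regulatory_score` is a hypothetical stand-in
# for a real sequence-to-function model.

MOTIF = "TATAAA"  # hypothetical "activating" motif, used only by the toy model


def toy_regulatory_score(sequence: str) -> int:
    """Stand-in predictor: count occurrences of MOTIF (overlaps allowed)."""
    width = len(MOTIF)
    return sum(
        1
        for i in range(len(sequence) - width + 1)
        if sequence[i : i + width] == MOTIF
    )


def apply_snv(sequence: str, position: int, ref: str, alt: str) -> str:
    """Return `sequence` with a single-base change at a 0-based position."""
    if sequence[position] != ref:
        raise ValueError(
            f"expected {ref!r} at position {position}, found {sequence[position]!r}"
        )
    return sequence[:position] + alt + sequence[position + 1 :]


def variant_effect(sequence, position, ref, alt, predict=toy_regulatory_score):
    """Score a variant as predict(mutated sequence) - predict(reference)."""
    mutated = apply_snv(sequence, position, ref, alt)
    return predict(mutated) - predict(sequence)


reference = "GGCCTATAAAGGCC"  # made-up sequence containing one motif copy

# A change inside the motif removes it, so the toy score drops.
print(variant_effect(reference, 6, "T", "G"))
# A change outside the motif leaves the toy score unchanged.
print(variant_effect(reference, 0, "G", "A"))
```

With a real model, `predict` would return whole tracks (expression, accessibility, splicing) rather than one number, and you would summarize the difference per track and per tissue. The reference-versus-alternate comparison stays the same.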
Looking Down the Line
So what could AlphaGenome do if it gets where it’s headed?
In many ways, we’re looking at a first-gen version, similar to where AlphaFold was back in 2018. It’s early, imperfect, but undeniably promising. Right now, AlphaGenome is working within a relatively small window: 1 million base pairs at a time. That sounds like a lot, but when it comes to genomic regulation, it’s like trying to understand a city by only looking at a single block.
Some enhancers, the genomic control knobs that regulate gene expression, can sit 1 Mb or more away from the gene they influence. Some of the most interesting action happens across long distances, between regions separated by both space and functional context. These are the distal enhancers, the tissue-specific regulators, the subtle and complex levers of gene activity. You need the whole train line to make sense of it.
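The window arithmetic is easy to check for yourself. The sketch below only does the bookkeeping: given a 1 Mb window, can a gene and a candidate enhancer be seen together? The coordinates are invented for illustration, and centering the window on the gene is an assumption made for the example, not a statement about how AlphaGenome places its inputs.

```python
WINDOW_BP = 1_000_000  # the 1 million base pair window described in the text


def fits_in_window(pos_a: int, pos_b: int, window_bp: int = WINDOW_BP) -> bool:
    """True if two positions on one chromosome can share a single window.

    Two positions span abs(a - b) + 1 bases, so they fit when the
    difference is strictly less than the window size.
    """
    return abs(pos_a - pos_b) < window_bp


def window_centered_on(center: int, window_bp: int = WINDOW_BP):
    """Half-open (start, end) interval of a window centered on `center`."""
    start = max(0, center - window_bp // 2)
    return start, start + window_bp


def in_window(position: int, window) -> bool:
    """True if `position` falls inside the half-open interval `window`."""
    start, end = window
    return start <= position < end


# Hypothetical coordinates, for illustration only.
gene_tss = 5_000_000
near_enhancer = 5_400_000  # 400 kb from the gene
far_enhancer = 6_200_000   # 1.2 Mb from the gene

window = window_centered_on(gene_tss)
print(in_window(near_enhancer, window))        # inside the centered window
print(in_window(far_enhancer, window))         # outside it
print(fits_in_window(gene_tss, far_enhancer))  # no single 1 Mb window holds both
```

Note that a window centered on the gene reaches only about 500 kb in each direction, so an enhancer can be missed well before it is a full megabase away.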
Once AlphaGenome can account for that scale—and ideally layer in multi-omics data like chromatin accessibility, histone modifications, and transcription factor binding—then, and only then, will we really start seeing its power. Imagine a Google Maps-style genomics platform that also pulls in real-time traffic, construction detours, parades, weather alerts, and holiday patterns. This could become a comprehensive platform for navigating gene regulation across conditions, cell types, and diseases.
But here’s the catch: we’ll need a lot more data—and a lot more types of data—to get there. Multi-omics inputs are messy, expensive, and highly context-specific. And building models that can learn across all those modalities isn’t trivial. Right now, AlphaGenome is a solid hypothesis generator and a clever exploration tool. But it’s not ready to run the whole show just yet.
Still, if AlphaGenome gets there, it could reshape how we study disease, design therapies, and understand the genome at a systems level. It’s the best we’ve seen so far, but for now, it’s still a TAD overconfident. The ambition is enormous, the potential is real, but the data inputs, biological complexity, and modeling challenges ahead are equally massive. AlphaGenome might be speaking the right language, but it’s still learning the grammar, slang, and subtext.