An artificial intelligence system built by Google has predicted the structure of almost every known protein: around 200 million molecules that are essential for understanding the biology of every living thing on the planet and the mechanisms of some of the most widespread diseases, from malaria to Alzheimer’s to cancer.
“This work heralds a new era in digital biology,” said Demis Hassabis, the 45-year-old programming and neuroscience expert and lead creator of AlphaFold, the neural network system that has almost completely solved one of the biggest problems in biology.
Hassabis, a Briton who was a chess and video game prodigy in his youth, founded DeepMind in 2010, a company focused on developing an artificial intelligence that can learn as humans do. In 2013, its system proved it could outplay humans at Atari video games. The following year, Google bought the company for around 500 million euros. In 2017, its AlphaGo program defeated the top champions of Go, the highly complex Asian board game often compared to chess. Since then, Hassabis has turned his efforts to a much bigger challenge: predicting the three-dimensional shape a protein will take just by reading its genetic sequence, written in two dimensions with the letters of DNA.
Knowing the three-dimensional structure of these molecules from their genetic sequence is essential to understanding their function, but it is a problem of immense difficulty. It’s like completing a jigsaw puzzle with tens of thousands of pieces without knowing which picture it represents.
Before this system emerged, it could take 13.7 billion years, the age of the universe, to work out by calculation the shape of a single protein made up of 100 basic units, called amino acids. Using X-ray microscopy or giant particle accelerators such as the European synchrotron in Grenoble, France, scientists needed years in the best of cases. By contrast, Google’s algorithm predicts the structure of each protein in a matter of seconds.
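The scale behind that age-of-the-universe comparison can be illustrated with a rough back-of-envelope calculation in the spirit of what biologists call Levinthal's paradox. The sketch below is only an illustration: the number of shapes each amino acid can adopt and the rate at which shapes are tested are assumed values chosen for the example, not figures from DeepMind or EMBL, and the function name is hypothetical.

```python
# Rough estimate of how long it would take to try every possible shape
# of a protein chain one by one. The constants marked "assumed" are
# illustrative guesses, not figures from the article.

STATES_PER_RESIDUE = 3           # assumed: shapes each amino acid can adopt
RESIDUES = 100                   # chain length used in the article's example
SAMPLES_PER_SECOND = 1e13        # assumed: shapes tested per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.7e9   # figure quoted in the article


def brute_force_years(residues=RESIDUES,
                      states=STATES_PER_RESIDUE,
                      rate=SAMPLES_PER_SECOND):
    """Years needed to enumerate every conformation at the given rate."""
    conformations = states ** residues   # grows exponentially with length
    return conformations / rate / SECONDS_PER_YEAR


years = brute_force_years()
ratio = years / AGE_OF_UNIVERSE_YEARS
print(f"{years:.2e} years, or {ratio:.2e} times the age of the universe")
```

Under these assumptions an exhaustive search would outlast the universe many times over, which is why a method that recognizes patterns in known structures, rather than testing every possibility, makes such a difference.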
“This protein universe” is “a gift to humanity,” Hassabis said on Tuesday as he presented the new database at a press conference alongside scientists from the European Molecular Biology Laboratory (EMBL), a public institution that helped build it using AlphaFold.
Before the advent of this technology, the structure of about 200,000 proteins had been determined, a task that took 60 years and the involvement of millions of scientists. That collection was the learning material for Google’s artificial intelligence, which searched it for valid patterns that could predict the shape of proteins for which only the two-dimensional sequence is known. By 2021, the system had already solved the structure of a million proteins, including all human ones. This year’s new release extends the tally to 200 million: virtually every known protein of every living thing on the planet.
Access to this new database is open and free, and the computer code of the artificial intelligence is open and downloadable. This “Google of life” shows the two-dimensional sequence of any protein alongside a three-dimensional model that indicates the reliability of the prediction, which has an error rate similar to or even lower than that of traditional methods.
It is important to note that AlphaFold does not determine reality; it predicts it. It reads the genetic sequence and estimates the most likely configuration of the amino acids. The prediction is highly reliable, which saves scientists a great deal of time and money: they can carry out theoretical work without resorting to expensive equipment to determine the actual structure of a protein until it is absolutely necessary.
The applications of this new tool are nearly endless, as microscopic proteins are involved in every imaginable biological process, from the mass die-off of bees to the heat resistance of crops, by way of countless diseases.
Matt Higgins’ team at the University of Oxford (UK) used AlphaFold as part of their project to develop an antibody, a type of protein, capable of neutralizing one of the proteins the malaria pathogen needs in order to multiply. Within a few years, this type of research could produce the first highly protective vaccine against the disease, as it would prevent the parasite from being transmitted from one person to another through mosquito bites.
Another milestone already achieved is the most detailed structure yet of the nuclear pore, a donut-shaped protein complex that serves as the entrance and exit door to the nucleus of human cells and is implicated in myriad diseases, including cancer and cardiovascular diseases. This new tool offers an unprecedented way to understand “how the recipe of life [written in the genome] is used when it is translated into proteins,” Jan Kosinski, a researcher at EMBL who co-authored the discovery, told the newspaper.
Hassabis and the other leaders of DeepMind and EMBL have given assurances that the possible risks of publishing this database and making it available to everyone have been analyzed. “The benefits clearly outweigh the dangers,” stressed the creator of the system, adding that in the future, as this technology evolves, it must be up to the international community to decide whether its use should be restricted.
One of the most tangible applications is the design of tailored molecules that can block harmful proteins or, better still, modulate their activity, a far more desirable effect when designing new drugs, explains Carlos Fernández, a CSIC scientist and leader of the Structural Biology group at the Spanish Society for Molecular Biology. His team used AlphaFold to elucidate part of the structure of a complex of several proteins that is essential for the proliferation of the trypanosome that causes sleeping sickness, a disease found in sub-Saharan Africa.
“We now have years of work ahead of us to confirm whether the predictions are correct,” explains biologist José Márquez, an expert in protein structure at the Grenoble synchrotron. “The next frontier will be for AlphaFold to contribute to the design of protein-blocking or protein-activating drugs, a problem they are already addressing,” he says. Another stumbling block: the system does not say why a protein takes on its final shape, which can be essential when researching diseases such as Alzheimer’s or Parkinson’s, which are linked to defective protein folding.
Alfonso Valencia, Director of Life Sciences at the Barcelona Supercomputing Center, points to the system’s shortcomings. “Not everything is solved, because AlphaFold can only predict things that fall within the range of what is already known. For example, it cannot predict the structure of a type of protein that protects against frost, because such proteins are rare and there are not many examples in the databases. Nor can it predict the consequences of mutations, which is a serious drawback for medicine,” he points out.
He also acknowledges one of its strengths: the system’s code is open, meaning other scientists can improve or modify it at will, even if Google decides to take the system offline. “It’s obvious that the people at DeepMind are trying to win the Nobel Prize by acting in this transparent way,” says Valencia. “On the one hand, they gain a great image and an advantage over competitors such as Facebook. On the other, they have already indicated that they are reserving certain data for private use in health and drug development,” he adds.