In silico evolution of nucleic acid-binding proteins from a nonfunctional scaffold

Raven, Samuel A.; Payne, Blake; Bruce, Mitchell; Filipovska, Aleksandra; Rackham, Oliver

doi:10.1038/s41589-022-00967-y

Article
Published: 24 February 2022

Nature Chemical Biology volume 18, pages 403–411 (2022)

Abstract

Directed evolution emulates the process of natural selection to produce proteins with improved or altered functions. These approaches have proven very powerful but are technically challenging and particularly time- and resource-intensive. To bypass these limitations, we constructed a system to perform the entire process of directed evolution in silico. We employed iterative computational cycles of mutation and evaluation to predict mutations that confer high-affinity DNA- and RNA-binding activities on an initial de novo designed protein with no inherent function. Beneficial mutations revealed modes of nucleic acid recognition not previously observed in natural proteins, highlighting the ability of computational directed evolution to access new molecular functions. Furthermore, the process by which new functions were obtained closely resembles natural evolution and can provide insights into the contributions of mutation rate, population size and selective pressure to the functionalization of macromolecules in nature.
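The iterative cycle of mutation and evaluation described in the abstract can be illustrated with a minimal, generic sketch. It is not Proseeker's implementation: the scoring function `toy_score` (fraction of Lys/Arg residues) is a hypothetical stand-in for a real nucleic acid-binding predictor, and the greedy "keep the best progeny if it is no worse than the parent" selection rule and the default population size are assumptions chosen for illustration. The default of two substitutions per generation follows the figure given in Extended Data Fig. 6. Mutation rate, population size and selection stringency appear as explicit parameters, mirroring the variables the abstract highlights.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq: str, n_mutations: int, rng: random.Random) -> str:
    """Return a copy of seq with n_mutations substitutions at distinct random positions."""
    residues = list(seq)
    for pos in rng.sample(range(len(residues)), n_mutations):
        # Exclude the current residue so every substitution is a real change.
        residues[pos] = rng.choice([aa for aa in AMINO_ACIDS if aa != residues[pos]])
    return "".join(residues)


def toy_score(seq: str) -> float:
    """HYPOTHETICAL fitness: fraction of Lys/Arg residues.

    A stand-in for a real nucleic acid-binding predictor; not Proseeker's scoring.
    """
    return sum(aa in "KR" for aa in seq) / len(seq)


def evolve(progenitor: str, generations: int = 20, population_size: int = 50,
           n_mutations: int = 2, seed: int = 0, score=toy_score):
    """Greedy mutate-evaluate-select loop; returns the final sequence and its score history."""
    rng = random.Random(seed)
    parent = progenitor
    history = [score(parent)]
    for _ in range(generations):
        # Mutation: build a population of progeny from the current parent.
        progeny = [mutate(parent, n_mutations, rng) for _ in range(population_size)]
        # Evaluation and selection: keep the best progeny unless it scores below the parent.
        best = max(progeny, key=score)
        if score(best) >= score(parent):
            parent = best
        history.append(score(parent))
    return parent, history


if __name__ == "__main__":
    final, scores = evolve("A" * 40)
    print(final, [round(s, 3) for s in scores])
```

Because a progeny replaces its parent only when it scores at least as well, the score history never decreases; lowering `population_size` or `n_mutations` in this toy model slows the climb, which is the kind of variable the abstract says the in silico process lets one probe.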

**Fig. 1: Comparison between directed evolution and in silico evolution.**

**Fig. 2: The process of in silico evolution.**

**Fig. 3: Computational evolution of new DNA- and RNA-binding proteins.**

**Fig. 4: Evolved proteins bind nucleic acids in vitro and in vivo.**

**Fig. 5: Features of artificial evolution that produce de novo DNA/RNA-binding proteins.**

**Fig. 6: Effects of different variables on the in silico evolutionary process.**

Data availability

The authors declare that the data supporting the findings of this study are available within the paper and its supplementary information files. The structural coordinates for DHR8 are available at the Protein Data Bank (PDB) under accession 5CWF. DP-Bind is available at http://lcg.rit.albany.edu/dp-bind/. Source data are provided with this paper.

Code availability

Proseeker is available for download via GitHub (https://github.com/EvolveWithProseeker/Proseeker). The software package was initially written in MATLAB (v.r2017b) and then ported to other languages; a final, optimized Python 3 version is available.

Acknowledgements

We thank the anonymous reviewers for their insightful suggestions and K. Young for assistance with circular dichroism experiments. Work in our laboratories is supported by fellowships from the National Health and Medical Research Council (APP1154646, to A.F., and APP1154932, to O.R.) and an Australian Research Council Centre of Excellence (CE200100029, to A.F. and O.R.). S.A.R. and B.P. are supported by Australian Postgraduate Awards. This work was supported by resources provided by the Pawsey Supercomputing Centre with funding from the Australian Government and the Government of Western Australia.

Author information

Authors and Affiliations

Harry Perkins Institute of Medical Research, Nedlands, Western Australia, Australia
Samuel A. Raven, Blake Payne, Aleksandra Filipovska & Oliver Rackham
University of Western Australia Centre for Medical Research, Nedlands, Western Australia, Australia
Samuel A. Raven, Blake Payne & Aleksandra Filipovska
Curtin Medical School, Curtin University, Bentley, Western Australia, Australia
Mitchell Bruce & Oliver Rackham
School of Molecular Sciences, The University of Western Australia, Crawley, Western Australia, Australia
Aleksandra Filipovska
Telethon Kids Institute, Northern Entrance, Perth Children’s Hospital, Nedlands, Western Australia, Australia
Aleksandra Filipovska & Oliver Rackham
Curtin Health Innovation Research Institute, Curtin University, Bentley, Western Australia, Australia
Oliver Rackham

Contributions

S.A.R., A.F. and O.R. designed the research. S.A.R. wrote the computer code and carried out the computational and biological experiments. B.P. and M.B. carried out biological experiments. S.A.R., A.F. and O.R. analyzed the experiments. A.F. and O.R. supervised the research. S.A.R. and O.R. wrote the manuscript, with contributions from all authors, and all authors edited the manuscript.

Corresponding author

Correspondence to Oliver Rackham.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Chemical Biology thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Sequences of synthetically evolved proteins (SEPROs).

a, SEPRO1-9 and their progenitor protein DHR8. SEPRO1-5 were chosen from the productive phase of the synthetic evolution cycle; SEPRO6-9 were chosen from earlier generations. SEPRO1-3 were found to bind nucleic acids. b, Comparison of nucleic acid-binding SEPROs and PPRs (designated by their UniProt IDs). c, The variant masking scheme used in the experiment shown in Fig. 6b.

Extended Data Fig. 2 Electrophoretic mobility shift assay of SEPRO5.

Incubation of SEPRO5 with a selection of RNA and DNA homopolymer probes.

Source data

Extended Data Fig. 3 Characterization of SEPRO2 by protein titration EMSAs.

a, Binding of SEPRO2 to poly(G) ssRNA, b, poly(G) ssDNA and c, poly(C/G) dsDNA was assessed. Low apparent binding of C/G dsDNA by SEPRO2 may be due to the presence of a small amount of free poly(G) ssDNA probe in these reactions or to an ability of this protein to invade dsDNA duplexes (Extended Data Fig. 3c). d, Quantitation of SEPRO2 binding to poly(G) ssRNA and ssDNA revealed Kd values of 9.0 × 10⁻⁸ M and 1.8 × 10⁻⁷ M, respectively.

Source data
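To make the two reported dissociation constants concrete, the sketch below computes the fraction of probe bound at a few SEPRO2 concentrations. It assumes a simple single-site binding model with protein in large excess over probe (so free protein is approximated by total protein), and it interprets the panel d Kd values as molar concentrations; the caption does not state which binding model was actually fitted, so this is an illustration of the quantities, not a reproduction of the analysis.

```python
def fraction_bound(protein_conc_m: float, kd_m: float) -> float:
    """Fraction of probe bound under a simple single-site model.

    Assumes protein is in large excess over probe, so free protein ~ total protein:
    theta = [P] / (Kd + [P]).
    """
    if protein_conc_m < 0 or kd_m <= 0:
        raise ValueError("concentrations must be non-negative and Kd positive")
    return protein_conc_m / (kd_m + protein_conc_m)


# Kd values reported in panel d, interpreted as molar concentrations.
KD_POLY_G_SSRNA = 9.0e-8
KD_POLY_G_SSDNA = 1.8e-7

for conc in (1e-8, 1e-7, 1e-6):
    rna = fraction_bound(conc, KD_POLY_G_SSRNA)
    dna = fraction_bound(conc, KD_POLY_G_SSDNA)
    print(f"[SEPRO2] = {conc:.0e} M: ssRNA {rna:.2f} bound, ssDNA {dna:.2f} bound")
```

Under this model half of the probe is bound when the protein concentration equals Kd, so the twofold lower Kd for poly(G) ssRNA means a larger bound fraction of ssRNA than ssDNA at any given SEPRO2 concentration.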

Extended Data Fig. 4 SEPROs from early stages of evolution have not yet attained functionality.

a, Protein sequences were selected from generations 4 and 10 of the directed evolution process (designated SEPRO6-9). The mean, range and standard deviation (SD) for each run are shown. b-e, Electrophoretic mobility shift assays testing RNA, ssDNA and dsDNA-binding of SEPRO6-9.

Source data

Extended Data Fig. 5 Bacterial three-hybrid system expression cassettes.

Fusion proteins are expressed from a pET24(+) backbone, allowing IPTG-inducible protein expression in T7 RNA polymerase-expressing Escherichia coli strains. Hybrid RNAs are constitutively expressed from an rrnB promoter within a p15A plasmid backbone.

Extended Data Fig. 6 Neutral mutations in computational evolution.

Instances in which previously neutral mutations eventually contributed to the development of new binding sites are identified as those where the Levenshtein distance between a progeny sequence and its progenitor is greater than 2 (the number of new mutations introduced in each generation). Windows of relevant protein sequence are shown as examples for specific cases.
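The Levenshtein criterion in this caption can be expressed as a short sketch. The function below is the standard dynamic-programming edit distance (substitutions, insertions and deletions each cost 1); the helper `carries_inherited_mutations` and the constant `MUTATIONS_PER_GENERATION` are hypothetical names introduced here, with the threshold of 2 taken from the caption. A distance above 2 implies that the progeny differs from its progenitor by more than one generation's worth of new mutations, so some of the differences must have been carried over from earlier generations.

```python
MUTATIONS_PER_GENERATION = 2  # from the caption: new mutations introduced per generation


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two sequences using a rolling DP row."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def carries_inherited_mutations(progeny: str, progenitor: str,
                                per_generation: int = MUTATIONS_PER_GENERATION) -> bool:
    """HYPOTHETICAL helper: True if progeny differs by more than one generation of mutations."""
    return levenshtein(progeny, progenitor) > per_generation
```

For example, a progeny with three substitutions relative to its progenitor would be flagged, whereas one with two would not.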

Extended Data Fig. 7 Dynamic residue similarity.

a, Dynamic residue similarity (DRS) map showing the number of residues to which each residue is considered similar at each similarity level, with unique, dominant and effective outliers shown (the data from this panel are provided in tabular form in Supplementary Table 1). b, Similarity pathing of methionine and cysteine from the two highest similarity levels. c, Optimized PPR library residue frequency heatmap. d, Less diverse PPR library residue frequency heatmap.
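The residue frequency heatmaps in panels c and d summarize, for each position of a sequence library, how often each amino acid occurs. A minimal sketch of that per-position tabulation is shown below, assuming an aligned library of equal-length sequences; the function name and the toy library in the usage line are hypothetical, and the caption does not describe how the plotted libraries were actually constructed or normalized.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(library: list[str]) -> list[dict[str, float]]:
    """Per-position amino acid frequencies for an aligned, equal-length library.

    Each row of the result corresponds to one position and sums to 1
    when every residue in the library is one of the 20 standard amino acids.
    """
    if not library:
        raise ValueError("library must contain at least one sequence")
    length = len(library[0])
    if any(len(seq) != length for seq in library):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in library)
        matrix.append({aa: counts.get(aa, 0) / len(library) for aa in AMINO_ACIDS})
    return matrix


freqs = residue_frequencies(["AK", "AR", "GK", "AK"])
print(freqs[0]["A"], freqs[1]["K"])
```

Each row of the returned matrix is one column of a heatmap like those in panels c and d; a less diverse library concentrates each row's frequency on fewer residues.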

Extended Data Fig. 8 Protein characterization.

a, Purified proteins were assessed by SDS-PAGE followed by Coomassie blue staining. b, Circular dichroism spectrum for SEPRO2. Experimentally determined secondary structure proportions are compared with those predicted using PSIPRED. c, Proteins do not bind fluorescein dye alone, as determined using electrophoretic mobility shift assays of SEPRO2 and SEPRO6.

Source data

Supplementary information

Supplementary Information

Supplementary Notes 1–3, Tables 1–3 and description of datasets.

Reporting Summary

Supplementary Dataset 1

Amino acid descriptors used to assess protein elements in Proseeker.

Supplementary Dataset 2

Scoring of an example assessment window in Proseeker. Related to Supplementary Note 2.

Source data

Source Data Fig. 4

Unprocessed EMSAs for Fig. 4.

Source Data Extended Data Fig. 2

Unprocessed EMSAs for Extended Data Fig. 2.

Source Data Extended Data Fig. 3

Unprocessed EMSAs for Extended Data Fig. 3.

Source Data Extended Data Fig. 4

Unprocessed EMSAs for Extended Data Fig. 4.

Source Data Extended Data Fig. 8

Unprocessed Coomassie-stained protein gel and EMSA for Extended Data Fig. 8.

About this article

Cite this article

Raven, S.A., Payne, B., Bruce, M. et al. In silico evolution of nucleic acid-binding proteins from a nonfunctional scaffold. Nat Chem Biol 18, 403–411 (2022). https://doi.org/10.1038/s41589-022-00967-y

Received: 30 March 2021
Accepted: 04 January 2022
Published: 24 February 2022
Issue Date: April 2022
DOI: https://doi.org/10.1038/s41589-022-00967-y
