In silico evolution of nucleic acid-binding proteins from a nonfunctional scaffold

Abstract

Directed evolution emulates the process of natural selection to produce proteins with improved or altered functions. These approaches have proven to be very powerful but are technically challenging and particularly time and resource intensive. To bypass these limitations, we constructed a system to perform the entire process of directed evolution in silico. We employed iterative computational cycles of mutation and evaluation to predict mutations that confer high-affinity binding activities for DNA and RNA to an initial de novo designed protein with no inherent function. Beneficial mutations revealed modes of nucleic acid recognition not previously observed in natural proteins, highlighting the ability of computational directed evolution to access new molecular functions. Furthermore, the process by which new functions were obtained closely resembles natural evolution and can provide insights into the contributions of mutation rate, population size and selective pressure on functionalization of macromolecules in nature.
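The iterative cycle of mutation and evaluation described above can be illustrated with a minimal sketch. This is not the Proseeker implementation: the scoring function `toy_score`, the starting sequence and all parameter values are hypothetical placeholders chosen only to make the loop runnable. The parameters `n_mutations`, `population` and `survivors` stand in for mutation rate, population size and selective pressure, respectively.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq, n_mutations, rng):
    """Return a copy of seq with n_mutations random point substitutions."""
    residues = list(seq)
    for pos in rng.sample(range(len(residues)), n_mutations):
        # Always substitute with a residue different from the current one.
        residues[pos] = rng.choice([a for a in AMINO_ACIDS if a != residues[pos]])
    return "".join(residues)


def toy_score(seq):
    """Hypothetical fitness: fraction of K/R residues (NOT Proseeker's scoring)."""
    return sum(seq.count(a) for a in "KR") / len(seq)


def evolve(start, generations=20, population=50, survivors=5, n_mutations=2, seed=0):
    """Run mutate/evaluate/select cycles; return the best sequence and score history."""
    rng = random.Random(seed)
    parents = [start]
    history = []
    for _ in range(generations):
        # Mutation: each progeny is derived from a randomly chosen parent.
        progeny = [mutate(rng.choice(parents), n_mutations, rng) for _ in range(population)]
        # Evaluation and selection: keep the top-scoring unique sequences.
        # Parents stay in the pool, so the best score never decreases.
        pool = set(parents + progeny)
        ranked = sorted(pool, key=lambda s: (-toy_score(s), s))
        parents = ranked[:survivors]
        history.append(toy_score(parents[0]))
    return parents[0], history


if __name__ == "__main__":
    start = "MSEELLELAGGSDEALLEAAEL"  # hypothetical sequence, not DHR8
    best, history = evolve(start)
    print(best, history[-1])
```

Lowering `survivors` relative to `population` tightens selection in this toy model, while raising `n_mutations` increases the step size of each generation.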

Fig. 1: Comparison between directed evolution and in silico evolution.
Fig. 2: The process of in silico evolution.
Fig. 3: Computational evolution of new DNA- and RNA-binding proteins.
Fig. 4: Evolved proteins bind nucleic acids in vitro and in vivo.
Fig. 5: Features of artificial evolution that produce de novo DNA/RNA-binding proteins.
Fig. 6: Effects of different variables on the in silico evolutionary process.

Data availability

The authors declare that the data supporting the findings of this study are available within the paper and its supplementary information files. The structural coordinates for DHR8 are available at the Protein Data Bank (PDB) under accession 5CWF. DP-Bind is available at http://lcg.rit.albany.edu/dp-bind/. Source data are provided with this paper.

Code availability

Proseeker is available for download via GitHub (https://github.com/EvolveWithProseeker/Proseeker). The software package was initially produced in MATLAB (v.r2017b) and then ported to other languages; the final optimized Python 3 version is the one made available.

Acknowledgements

We thank the anonymous reviewers for their insightful suggestions and K. Young for assistance with circular dichroism experiments. Work in our laboratories is supported by fellowships from the National Health and Medical Research Council (APP1154646, to A.F., and APP1154932, to O.R.) and an Australian Research Council Centre of Excellence (CE200100029, to A.F. and O.R.). S.A.R. and B.P. are supported by Australian Postgraduate Awards. This work was supported by resources provided by the Pawsey Supercomputing Centre with funding from the Australian Government and the Government of Western Australia.

Author information

Contributions

S.A.R., A.F. and O.R. designed the research. S.A.R. wrote the computer code and carried out the computational and biological experiments. B.P. and M.B. carried out biological experiments. S.A.R., A.F. and O.R. analyzed the experiments. A.F. and O.R. supervised the research. S.A.R. and O.R. wrote the manuscript, with contributions from all authors, and all authors edited the manuscript.

Corresponding author

Correspondence to Oliver Rackham.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Chemical Biology thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Sequences of synthetically evolved proteins (SEPROs).

a, SEPRO1-9 and their progenitor protein DHR8. SEPRO1-5 were chosen from the productive phase of the synthetic evolution cycle; SEPRO6-9 were chosen from earlier generations. SEPRO1-3 were found to bind nucleic acids. b, Comparison of nucleic acid-binding SEPROs and PPRs (designated by their UniProt IDs). c, The variant masking scheme used in the experiment shown in Fig. 6b.

Extended Data Fig. 2 Electrophoretic mobility shift assay of SEPRO5.

Incubation of SEPRO5 with a selection of RNA and DNA homopolymer probes.

Extended Data Fig. 3 Characterization of SEPRO2 by protein titration EMSAs.

a, Binding of SEPRO2 to poly(G) ssRNA, b, ssDNA and c, poly(C/G) dsDNA was assessed. Low apparent binding of C/G dsDNA by SEPRO2 may be due to the presence of a small amount of free poly(G) ssDNA probe in these reactions, or an ability of these proteins to invade dsDNA duplexes (Extended Data Fig. 3c). d, Quantitation of SEPRO2 binding to poly(G) ssRNA and ssDNA revealed Kd values of 9.0 × 10⁻⁸ and 1.8 × 10⁻⁷, respectively.

Extended Data Fig. 4 SEPROs from early stages of evolution have not yet attained functionality.

a, Protein sequences were selected from generations 4 and 10 of the directed evolution process (designated SEPRO6-9). The mean, range and standard deviation (SD) for each run are shown. b-e, Electrophoretic mobility shift assays testing RNA, ssDNA and dsDNA-binding of SEPRO6-9.

Extended Data Fig. 5 Bacterial three-hybrid system expression cassettes.

Fusion proteins are expressed from a pET24(+) backbone, allowing IPTG inducible protein expression in T7 RNA polymerase expressing Escherichia coli strains. Hybrid RNAs are constitutively expressed from an rrnB promoter within a p15A plasmid backbone.

Extended Data Fig. 6 Neutral mutations in computational evolution.

Previously neutral mutations that eventually contributed to the development of new binding sites are identified as cases where the Levenshtein distance between a progeny sequence and its progenitor is greater than 2 (the number of new mutations introduced in each generation). Windows of relevant protein sequence are shown as examples in specific cases.
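The Levenshtein criterion in this caption can be sketched as follows. This is an illustrative reading rather than the authors' code: the helper name `carries_inherited_changes`, the constant `MUTATIONS_PER_GENERATION` and the example sequences are assumptions introduced here. Only the threshold of 2 comes from the caption.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# From the caption: two new mutations are introduced in each generation.
MUTATIONS_PER_GENERATION = 2


def carries_inherited_changes(progenitor, progeny, per_generation=MUTATIONS_PER_GENERATION):
    """True when progeny differs from progenitor by more than one generation's
    worth of new mutations, i.e. some differences were acquired earlier.
    (Hypothetical helper; the name and interface are not from the paper.)"""
    return levenshtein(progenitor, progeny) > per_generation


if __name__ == "__main__":
    # Made-up sequence windows for illustration only.
    print(carries_inherited_changes("AAAAAA", "AAKRAA"))  # distance 2
    print(carries_inherited_changes("AAAAAA", "KAKRAA"))  # distance 3
```

Because every sequence in a fixed-length scaffold has the same length, a Hamming distance would give the same answer for pure substitutions; the Levenshtein form is used here only because it is the measure the caption names.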

Extended Data Fig. 7 Dynamic residue similarity.

a, Dynamic residue similarity (DRS) map showing the number of residues to which each residue is considered similar at each similarity level, with unique, dominant and effective outliers shown (the data from this panel are provided in tabular form in Supplementary Table 1). b, Similarity pathing of methionine and cystine from the highest two similarity levels. c, Optimized PPR library residue frequency heatmap. d, Less diverse PPR library residue frequency heatmap.

Extended Data Fig. 8 Protein characterization.

a, Purified proteins were assessed by SDS-PAGE followed by Coomassie blue staining. b, Circular dichroism spectrum for SEPRO2. Experimentally determined secondary structure proportions are compared with those predicted using PSIPRED. c, Proteins do not bind fluorescein dye alone, as determined using electrophoretic mobility shift assays of SEPRO2 and SEPRO6.

Supplementary information

Supplementary Information

Supplementary Notes 1–3, Tables 1–3 and description of datasets.

Reporting Summary

Supplementary Dataset 1

Amino acid descriptors used to assess protein elements in Proseeker.

Supplementary Dataset 2

Scoring of an example assessment window in Proseeker. Related to Supplementary Note 2.

Source data

Source Data Fig. 4

Unprocessed EMSAs for Fig. 4.

Source Data Extended Data Fig. 2

Unprocessed EMSAs for Extended Data Fig. 2.

Source Data Extended Data Fig. 3

Unprocessed EMSAs for Extended Data Fig. 3.

Source Data Extended Data Fig. 4

Unprocessed EMSAs for Extended Data Fig. 4.

Source Data Extended Data Fig. 8

Unprocessed Coomassie-stained protein gel and EMSA for Extended Data Fig. 8.

About this article

Cite this article

Raven, S.A., Payne, B., Bruce, M. et al. In silico evolution of nucleic acid-binding proteins from a nonfunctional scaffold. Nat Chem Biol 18, 403–411 (2022). https://doi.org/10.1038/s41589-022-00967-y

  • DOI: https://doi.org/10.1038/s41589-022-00967-y
