README.md
3 additions & 2 deletions
@@ -10,7 +10,7 @@ It provides the `progres` Python package that lets you search structures against
Searching typically takes 1-2 s and is much faster for multiple queries.
For the AlphaFold database, initial data loading takes around a minute but subsequent searching takes a tenth of a second per query.
-Currently [SCOPe](https://scop.berkeley.edu), [CATH](http://cathdb.info), [ECOD](http://prodata.swmed.edu/ecod), the whole [PDB](https://www.rcsb.org), the [AlphaFold structures for 21 model organisms](https://doi.org/10.1093/nar/gkab1061) and the [AlphaFold database TED domains](https://www.biorxiv.org/content/10.1101/2024.03.18.585509) are provided for searching against.
+Currently [SCOPe](https://scop.berkeley.edu), [CATH](http://cathdb.info), [ECOD](http://prodata.swmed.edu/ecod), the whole [PDB](https://www.rcsb.org), [BFVD](https://bfvd.foldseek.com), the [AlphaFold structures for 21 model organisms](https://doi.org/10.1093/nar/gkab1061) and the [AlphaFold database TED domains](https://www.biorxiv.org/content/10.1101/2024.03.18.585509) are provided for searching against.
Searching is done by domain but [Chainsaw](https://github.com/JudeWells/chainsaw) can be used to automatically split query structures into domains.
A [web server](https://progres.mrc-lmb.cam.ac.uk) is available to run Progres.
3. Run `pip install progres`, which will also install [Biopython](https://biopython.org), [mmtf-python](https://github.com/rcsb/mmtf-python), [einops](https://github.com/arogozhnikov/einops) and [pydantic](https://github.com/pydantic/pydantic) if they are not already present.
-4. The first time you search with the software the trained model and pre-embedded databases (~660 MB) will be downloaded to the package directory from [Zenodo](https://zenodo.org/record/7782088), which requires an internet connection. This can take a few minutes. You can set the environmental variable `PROGRES_DATA_DIR` to change where this data is stored, for example if you cannot write to the package directory. Remember to keep it set the next time you run Progres.
+4. The first time you search with the software the trained model and pre-embedded databases (~850 MB) will be downloaded to the package directory from [Zenodo](https://zenodo.org/record/7782088), which requires an internet connection. This can take a few minutes. You can set the environmental variable `PROGRES_DATA_DIR` to change where this data is stored, for example if you cannot write to the package directory. Remember to keep it set the next time you run Progres.
5. The first time you search against the AlphaFold database TED domains the pre-embedded database (~33 GB) will be downloaded similarly. This can take a while. Make sure you have enough disk space!
Alternatively, a Docker file is available in the `docker` directory.
@@ -93,6 +93,7 @@ The available pre-embedded databases are:
|`cath40`| S40 non-redundant domains from [CATH](http://cathdb.info) 23/11/22 | 31,884 | 1.38 s | 2.79 s |
|`ecod70`| F70 representative domains from [ECOD](http://prodata.swmed.edu/ecod) develop287 | 71,635 | 1.46 s | 3.82 s |
|`pdb100`| All [PDB](https://www.rcsb.org) protein chains as of 02/08/24 split into domains with Chainsaw | 1,177,152 | 2.90 s | 27.3 s |
+|`bfvd`|[Big Fantastic Virus Database (BFVD)](https://bfvd.foldseek.com) structures split into domains with Chainsaw | 446,655 | 2.66 s | 13.3 s |
|`af21org`|[AlphaFold](https://alphafold.ebi.ac.uk) structures for 21 model organisms split into domains by [CATH-Assign](https://doi.org/10.1038/s42003-023-04488-9)| 338,258 | 2.21 s | 11.0 s |
|`afted`|[AlphaFold database](https://alphafold.ebi.ac.uk) structures split into domains by [TED](https://www.biorxiv.org/content/10.1101/2024.03.18.585509) and clustered at 50% sequence identity | 53,344,209 | 67.7 s | 73.1 s |
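
Installation steps 4 and 5 in the diff describe a first-run download (~850 MB after this change, plus ~33 GB for the AlphaFold TED domains) and the `PROGRES_DATA_DIR` environment variable that redirects where the data is stored. A pre-flight check along the following lines may help avoid a failed download. This is only a sketch using the Python standard library: it does not call Progres, the helper names `resolve_data_dir` and `has_room_for` are hypothetical, the fallback directory is a placeholder rather than the real package directory, and treating an empty variable as unset is an assumption about how Progres reads it.

```python
import os
import shutil
from pathlib import Path

# Approximate download sizes quoted in the README diff, in bytes.
DEFAULT_DOWNLOAD_BYTES = 850 * 10**6  # trained model + pre-embedded databases (~850 MB)
AFTED_DOWNLOAD_BYTES = 33 * 10**9     # AlphaFold database TED domains (~33 GB)


def resolve_data_dir(fallback):
    """Return PROGRES_DATA_DIR if it is set and non-empty, else the fallback.

    Assumption: an empty variable is treated the same as an unset one.
    """
    return Path(os.environ.get("PROGRES_DATA_DIR") or fallback)


def has_room_for(data_dir, needed_bytes):
    """Report whether the filesystem that would hold data_dir has enough free space.

    The directory may not exist yet, so walk up to the nearest existing parent.
    """
    probe = Path(data_dir).resolve()
    while not probe.exists():
        probe = probe.parent
    return shutil.disk_usage(probe).free >= needed_bytes


if __name__ == "__main__":
    # Hypothetical fallback; the real default is the Progres package directory.
    data_dir = resolve_data_dir(fallback=Path.home() / "progres_data")
    for label, size in [("default data", DEFAULT_DOWNLOAD_BYTES),
                        ("afted database", AFTED_DOWNLOAD_BYTES)]:
        status = "enough space" if has_room_for(data_dir, size) else "NOT enough space"
        print(f"{label}: {status} in {data_dir}")
```

Running the script with `PROGRES_DATA_DIR` exported in the same shell that will later run Progres checks the location the variable points to, which matches the README's reminder to keep the variable set on subsequent runs.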