Validation of cryo-EM based atomic models



1. Csh script: fsc_based_validation

Running the script without any parameters, or with the keyword "help", displays the available options.

The updated version of the script checks the map origin and, if necessary, translates both the map and the corresponding atomic model (in PDB format), so that the resulting map has its origin at grid point [0,0,0].

The current version of the script does not support the mmCIF file format. Please use the PHENIX conversion tool to convert mmCIF to PDB format prior to validation, e.g.:

phenix.cif_as_pdb  model_file.cif

This produces the file model_file.pdb, which you should use for validation.


Run the script with at least three parameters on the command line, preferably in the directory containing the cryo-EM map and the corresponding atomic model. The order of the parameters is not important:

./fsc_based_validation.csh    model:init_3j6e.pdb     resolution:4.5    map:emd_5895.map

Existing files will not be overwritten. Use --overwrite to change this behaviour.
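The keyword:value convention described above can be illustrated with a short Python sketch. This is not the script's own code (the script is written in csh and its internals are not shown here); the function name parse_arguments and the error messages are hypothetical, and only the keywords model, resolution, map and the --overwrite flag are taken from the text.

```python
# Hypothetical illustration of the "keyword:value" command-line convention.
# The real script is a csh script; this only mirrors the documented behaviour.
REQUIRED = ("model", "resolution", "map")


def parse_arguments(args):
    """Collect keyword:value arguments into a dict; the order does not matter."""
    options = {"overwrite": False}
    for arg in args:
        if arg == "--overwrite":
            options["overwrite"] = True
            continue
        keyword, separator, value = arg.partition(":")
        if not separator or not value:
            raise ValueError(f"expected keyword:value, got {arg!r}")
        options[keyword] = value
    missing = [key for key in REQUIRED if key not in options]
    if missing:
        raise ValueError("missing required keyword(s): " + ", ".join(missing))
    options["resolution"] = float(options["resolution"])
    return options


if __name__ == "__main__":
    opts = parse_arguments(["resolution:4.5", "map:emd_5895.map", "model:init_3j6e.pdb"])
    print(opts["model"], opts["resolution"], opts["map"])
```

The same three arguments given in any other order yield the same dictionary, which is the property the text refers to as "order is not important".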

  • The script calculates a smoothed, model-based mask, abbreviated as mask3 (by default using a resolution-dependent radius around atomic positions). Smoothing consists of removing small void volumes and applying a "soft edge" with the chimera program, e.g.:

    mask3smooth3v3_rad4_0_init_3j6e.mrc - smooth3 => 3 cycles of smoothing using chimera, v3 => void volumes (features) with a radius smaller than 3 Å inside the mask have been incorporated into the mask, rad4_0 => radius of 4.0 Å around atomic positions, init_3j6e => name of the model used for mask creation. This is the mask that is applied to the map specified with the map: keyword.

    mask3smooth5v3_rad4_0_init_3j6e.mrc - smooth5 => 5 cycles of smoothing using chimera, v3 => void volumes (features) with a radius smaller than 3 Å inside the mask have been incorporated into the mask, rad4_0 => radius of 4.0 Å around atomic positions, init_3j6e => name of the model used for mask creation.

    Additional user-defined options:
    smooth:5 - the mask will be created using 5 cycles of smoothing (chimera) instead of the default 3
    radius:3.3 - use a user-specified radius around atomic positions
    voidsize:4 - must be an integer; with a larger value, larger void volumes will be incorporated into the mask

    ./fsc_based_validation.csh  model:init_3j6e.pdb resolution:4.5 map:emd_5895.map smooth:5 radius:3.3 voidsize:4

    The user can also supply their own mask (in mrc format; the mask must be compatible with the cryo-EM map):
    mask:user_defined_mask.mrc

    ./fsc_based_validation.csh  model:init_3j6e.pdb resolution:4.5 map:emd_5895.map mask:user_defined_mask.mrc

  • The calculated or user-defined mask is then applied to the cryo-EM map. The resulting masked cryo-EM map is converted to complex structure factors stored in an mtz file.

    map2mtz_init_3j6e_sm3v3rad4_0res4_5.mrc - the masked cryo-EM map
    map2mtz_init_3j6e_sm3v3rad4_0res4_5.mtz - the masked cryo-EM map converted to mtz file

  • The user can also supply an mtz file containing model structure factors using the keyword modelmtz, e.g.:  modelmtz:my_model.mtz
    These model structure factors are used to calculate the FSC curve between the model map and the masked cryo-EM map.

    ./fsc_based_validation.csh  model:init_3j6e.pdb resolution:4.5 map:emd_5895.map mask:user_defined_mask.mrc modelmtz:my_model.mtz

    If the modelmtz keyword is not used, the script calculates model structure factors using phenix.fmodel, based on the specified model (model: keyword).
    The script adjusts the model's ADPs if they are unrealistically low (e.g. 0).

    If columns named F-model,PHIF-model or FMODEL,PHIFMODEL are present, they are selected automatically.
    If these columns cannot be identified, please add an option specifying the respective column names, i.e. the structure factor amplitude and the corresponding phase (comma separated), e.g.:

    modelmtzcolumn:Fmodel,PHIFmodel

    ./fsc_based_validation.csh  model:init_3j6e.pdb resolution:4.5 map:emd_5895.map mask:user_defined_mask.mrc modelmtz:my_model.mtz modelmtzcolumn:Fmodel,PHIFmodel


    The FSC curve can be calculated using at most 100 resolution shells (the default is shells:50). The corresponding FSCaverage between the cryo-EM map and the model map is calculated as well. A description of the files and a short summary of the results are stored in a RESULTS log file:

    RESULTS_map2mtz_init_3j6e_sm3v3rad4_0res4_5.log


  • The script requires CCP4, PHENIX and chimera to be installed. Running it without any arguments displays the available options. Preferably use the provided version of sftools (sftools_big2M; download it only if you are a registered user of the CCP4 suite), which supports a larger number of shells and up to 20,000,000 structure factors (the version from the CCP4 suite supports 10,000,000 structure factors). The script should run on any Red Hat based Linux system (CentOS, Scientific Linux) and on openSUSE, as well as on Ubuntu/Debian (Ubuntu requires installing tcsh and setting csh to point to tcsh using update-alternatives). Download the script and make it executable (chmod a+rx file). Any comments and suggestions are welcome.
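The output file names shown in this section follow a recognisable pattern, which the Python sketch below reproduces for the documented examples. It is inferred from those examples only and is not the script's own code: the function names are hypothetical, the one-decimal formatting of radius and resolution is an assumption, and the default voidsize of 3 is assumed from the "v3" in the example names.

```python
# Sketch of the file-naming pattern, inferred from the example names in the text.
# Function names are hypothetical; the real csh script may format values differently.
from pathlib import Path


def tag(value):
    """Format a number as in the examples: 4.0 -> '4_0' (one decimal is an assumption)."""
    return f"{float(value):.1f}".replace(".", "_")


def mask_name(model, radius, smooth=3, voidsize=3):
    """Name of the smoothed model-based mask (smooth=3 is the documented default)."""
    stem = Path(model).stem
    return f"mask3smooth{smooth}v{voidsize}_rad{tag(radius)}_{stem}.mrc"


def masked_map_prefix(model, radius, resolution, smooth=3, voidsize=3):
    """Common prefix of the masked map (.mrc), its structure factors (.mtz) and the log."""
    stem = Path(model).stem
    return f"map2mtz_{stem}_sm{smooth}v{voidsize}rad{tag(radius)}res{tag(resolution)}"


if __name__ == "__main__":
    prefix = masked_map_prefix("init_3j6e.pdb", radius=4.0, resolution=4.5)
    print(mask_name("init_3j6e.pdb", radius=4.0))
    print(prefix + ".mrc")
    print(prefix + ".mtz")
    print("RESULTS_" + prefix + ".log")
```

Such a helper can be handy for locating the results of several runs, but the names printed by the script itself remain the authoritative ones.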

2. Csh script: rscc_based_validation

Running the script without any parameters, or with the keyword "help", displays the available options.


The atomic model should be in PDB format, and the map should have its origin at grid point [0,0,0].


Run the script with at least three parameters on the command line, preferably in the directory containing the cryo-EM map and the corresponding atomic model. The order of the parameters is not important:

./rscc_based_validation.csh    model:init_3j6e.pdb     resolution:4.5    map:emd_5895.map

Existing files will not be overwritten. Use --overwrite to change this behaviour.

  • The script calculates the Real Space Correlation Coefficient (RSCC) between each residue of the atomic model and the map.

  • The default radius around atomic positions (used to select the map fragment corresponding to each residue) depends on the specified resolution limit, so the radius: keyword does not need to be given.

    radius 2.0 Å for resolutions higher than 4.0 Å
    radius 2.5 Å for resolutions lower than 4.0 Å

    The keyword "radius:" lets the user specify the radius explicitly, e.g.: radius:1.7

    ./rscc_based_validation.csh model:init_3j6e.pdb resolution:4.5 map:emd_5895.map radius:1.7

    It is also possible to use an automatically determined radius (chosen by the phenix.map_model_cc program); however, this radius may change depending on the PHENIX version used. Please check the Python source code for details, as the way the automatic radius is estimated is usually not well documented. To use the automatic (phenix.map_model_cc based) radius, use one of the following keywords:
    radius:auto
    radius:none
    radius:phenix
    radius:Auto
    radius:None
    radius:Phenix

    ./rscc_based_validation.csh model:init_3j6e.pdb resolution:4.5 map:emd_5895.map radius:auto

  • You can also specify the experimental map in the form of map coefficients using the mapmtz keyword:  mapmtz:map2mtz_init_3j6e_sm3v3rad4_0res4_5.mtz

    ./rscc_based_validation.csh model:init_3j6e.pdb resolution:4.5 mapmtz:map2mtz_init_3j6e_sm3v3rad4_0res4_5.mtz radius:2.0


    Using the mapmtz keyword usually requires selecting the columns to be used for calculating the experimental map.
    You should therefore specify the Fmap column and the corresponding phase column (in that order, separated by a comma).
    If the columns are missing, the script prints the columns present in the specified mtz file for your convenience.

    Use the mapmtzcolumn: keyword, e.g.: mapmtzcolumn:F,PHIF

    ./rscc_based_validation.csh model:init_3j6e.pdb resolution:4.5 mapmtz:map2mtz_init_3j6e_sm3v3rad4_0res4_5.mtz mapmtzcolumn:F,PHIF


    Combining the mapmtz: and resolution: keywords allows you to calculate the RSCC against the map at a lower resolution:


    ./rscc_based_validation.csh model:init_3j6e.pdb resolution:4.5 mapmtz:map2mtz_init_3j6e_sm3v3rad4_0res4_5.mtz mapmtzcolumn:F,PHIF
    ./rscc_based_validation.csh model:init_3j6e.pdb resolution:5.0 mapmtz:map2mtz_init_3j6e_sm3v3rad4_0res4_5.mtz mapmtzcolumn:F,PHIF
    ./rscc_based_validation.csh model:init_3j6e.pdb resolution:5.5 mapmtz:map2mtz_init_3j6e_sm3v3rad4_0res4_5.mtz mapmtzcolumn:F,PHIF
    ./rscc_based_validation.csh model:init_3j6e.pdb resolution:6.0 mapmtz:map2mtz_init_3j6e_sm3v3rad4_0res4_5.mtz mapmtzcolumn:F,PHIF

    The file names of the individual histograms and SUMMARY log files include the resolution (4_5, 5_0 and so on).


  • The per-residue list of RSCCs starts with the "Per residue:" keyword (used by the phenix.map_model_cc program). The script uses this keyword to select the relevant fragment of the log file, from which it calculates the RSCC histogram and cumulative RSCC scores.

  • The script calculates the histogram using an awk script, histogram.awk, which is downloaded automatically from this web page.
    If histogram.awk is not available (e.g. no Internet connection), the histogram is calculated in a less elegant way.
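Two pieces of logic described in this section, the choice of the radius and the RSCC histogram, can be sketched in Python as follows. This is an illustration only and not the code of the csh script or of histogram.awk: the function names are hypothetical, the behaviour at exactly 4.0 Å is an assumption (the text does not specify it), and the bin layout (ten bins between 0 and 1, with values outside that range placed in the outermost bins) and the running count used as the "cumulative" score are assumptions as well.

```python
# Illustrative sketch; not the script's own implementation.
import math

# The six spellings listed in the text that request the phenix.map_model_cc radius.
AUTO_KEYWORDS = {"auto", "none", "phenix", "Auto", "None", "Phenix"}


def choose_radius(resolution, radius=None):
    """Return the radius in Å, or None if phenix.map_model_cc should determine it."""
    if radius is None:
        # Documented defaults; treating exactly 4.0 Å as "lower" is an assumption.
        return 2.0 if resolution < 4.0 else 2.5
    if radius in AUTO_KEYWORDS:
        return None
    return float(radius)


def rscc_histogram(values, n_bins=10):
    """Return (lower bin edge, count, cumulative count) rows for per-residue RSCCs."""
    counts = [0] * n_bins
    for value in values:
        index = math.floor(round(value * n_bins, 9))
        index = min(n_bins - 1, max(0, index))  # clamp negative values and 1.0
        counts[index] += 1
    rows, running = [], 0
    for i, count in enumerate(counts):
        running += count
        rows.append((i / n_bins, count, running))
    return rows


if __name__ == "__main__":
    print(choose_radius(4.5), choose_radius(4.5, "1.7"), choose_radius(4.5, "auto"))
    for edge, count, cumulative in rscc_histogram([0.95, 0.85, 0.82, 0.45, -0.1]):
        print(f"{edge:.1f} {count} {cumulative}")
```

The per-residue values themselves would come from the part of the phenix.map_model_cc log that follows the "Per residue:" keyword; the format of that log is not reproduced here.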




3. The FSC percentile ranking plot as a Qti file: QtiPlot_FSC_ranking_plot



4. Csh script: compare_2mtzs


The script calculates various comparisons of two maps (stored as complex structure factors, F and PHASE) using the sftools program from the CCP4 suite. The log file contains the correlation of both maps (MAP CORRELATION = FSC), the correlation of Fs and phases, and the calculated phase differences. For larger maps it may be useful to use a recompiled version of sftools that supports larger mtz files (sftools_big2M; download it only if you are a registered user of the CCP4 suite).
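To make the quantities in the log file more concrete, the following Python sketch computes the Fourier shell correlation and the mean phase difference for one resolution shell from two lists of complex structure factors, using the standard FSC definition (real part of the summed product of one set with the complex conjugate of the other, divided by the square root of the product of the summed squared amplitudes). It is not a reimplementation of sftools, whose internal procedure is not described here; the function names are hypothetical, and weighting the average FSC by the number of structure factors per shell is an assumption.

```python
# Illustrative sketch of per-shell map comparison; not the sftools implementation.
import cmath
import math


def shell_statistics(f1, f2):
    """Return (FSC, mean absolute phase difference in degrees) for one shell.

    f1 and f2 are equally long lists of complex structure factors that refer
    to the same reflections.
    """
    cross = sum((a * b.conjugate()).real for a, b in zip(f1, f2))
    norm = math.sqrt(sum(abs(a) ** 2 for a in f1) * sum(abs(b) ** 2 for b in f2))
    fsc = cross / norm
    # The phase of a * conj(b) is the phase difference, wrapped to [-180, 180].
    differences = [abs(math.degrees(cmath.phase(a * b.conjugate()))) for a, b in zip(f1, f2)]
    return fsc, sum(differences) / len(differences)


def fsc_average(shells):
    """Average FSC over shells, weighted by the number of structure factors (assumption)."""
    total = sum(len(f1) for f1, _ in shells)
    return sum(len(f1) * shell_statistics(f1, f2)[0] for f1, f2 in shells) / total


if __name__ == "__main__":
    shell_a = [1 + 0j, 0 + 2j, 3 + 1j]
    shell_b = [2 + 2j]
    print(shell_statistics(shell_a, shell_a))
    print(fsc_average([(shell_a, shell_a), (shell_b, [-x for x in shell_b])]))
```

Identical inputs give an FSC of 1 and a phase difference of 0, while sign-inverted inputs give an FSC of -1 and a phase difference of 180 degrees, which is a quick sanity check when reading the log file.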


5. Download sftools_big2M

If the links above do not work (there seem to be errors in the database of the Content Management System hosting this web page), please download the sftools_big2M program from the following URL:

sftools_big2M


