Casanovo CLI
After installing the Casanovo package and all of its dependencies, you can use the Casanovo CLI to access Casanovo functionality from the command line. For installation instructions, see the Getting Started guide.
casanovo
Casanovo is a state-of-the-art deep learning tool designed for de novo peptide sequencing. Powered by a transformer neural network, Casanovo “translates” peaks in MS/MS spectra into amino acid sequences.
Links:
Documentation: https://casanovo.readthedocs.io
Official code repository: Noble-Lab/casanovo
If you use Casanovo in your work, please cite:
- Yilmaz, M., Fondrie, W. E., Bittremieux, W., Oh, S. & Noble, W. S. De novo mass spectrometry peptide sequencing with a transformer model. Proceedings of the 39th International Conference on Machine Learning - ICML ’22 (2022). https://proceedings.mlr.press/v162/yilmaz22a.html
For more information on how to cite different versions of Casanovo, please see https://casanovo.readthedocs.io/en/latest/cite.html.
Usage
casanovo [OPTIONS] COMMAND [ARGS]...
configure
Generate a Casanovo configuration file to customize.
The Casanovo configuration file is in the YAML format.
Usage
casanovo configure [OPTIONS]
Options
- -d, --output_dir <output_dir>
The destination directory for output files.
- -o, --output_root <output_root>
The root name for all output files.
- -f, --force_overwrite
Whether to overwrite output files.
- Default:
False
- -v, --verbosity <verbosity>
Set the verbosity of console logging messages. Log files are always set to ‘debug’.
- Options:
debug | info | warning | error
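When `casanovo configure` is called from a script, it can help to assemble the arguments as a list first. The following is a minimal sketch, assuming the `casanovo` executable is on the PATH; the helper name `configure_command` and the example directory and root names are hypothetical, and only options listed above are used.

```python
import shlex

# Verbosity levels listed in the options above.
VERBOSITY_LEVELS = ("debug", "info", "warning", "error")


def configure_command(output_dir=None, output_root=None,
                      force_overwrite=False, verbosity=None):
    """Build the argument list for `casanovo configure` (hypothetical helper)."""
    cmd = ["casanovo", "configure"]
    if output_dir is not None:
        cmd += ["--output_dir", str(output_dir)]
    if output_root is not None:
        cmd += ["--output_root", str(output_root)]
    if force_overwrite:
        cmd.append("--force_overwrite")
    if verbosity is not None:
        if verbosity not in VERBOSITY_LEVELS:
            raise ValueError(f"verbosity must be one of {VERBOSITY_LEVELS}")
        cmd += ["--verbosity", verbosity]
    return cmd


# "configs" and "my_run" are placeholder names, not Casanovo defaults.
cmd = configure_command(output_dir="configs", output_root="my_run", verbosity="info")
print(shlex.join(cmd))
# prints: casanovo configure --output_dir configs --output_root my_run --verbosity info
```

The list can then be passed to `subprocess.run(cmd, check=True)` to actually run the command.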
db-search
Perform a database search on MS/MS data using Casanovo-DB.
PEAK_PATH must be one or more mzML, mzXML, or MGF files. FASTA_PATH must be one FASTA file.
Usage
casanovo db-search [OPTIONS] PEAK_PATH... FASTA_PATH
Options
- --export
Dump the digested peptides for debugging. The output contains each peptide's mass, its sequence, and the proteins it is associated with.
- -d, --output_dir <output_dir>
The destination directory for output files.
- -o, --output_root <output_root>
The root name for all output files.
- -f, --force_overwrite
Whether to overwrite output files.
- Default:
False
- -v, --verbosity <verbosity>
Set the verbosity of console logging messages. Log files are always set to ‘debug’.
- Options:
debug | info | warning | error
- -m, --model <model>
Either the model weights (.ckpt file) or a URL pointing to the model weights file. If not provided, Casanovo will try to download the latest release automatically.
- -c, --config <config>
The YAML configuration file overriding the default options.
Arguments
- PEAK_PATH
Required argument(s)
- FASTA_PATH
Required argument
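The argument order in the usage line (one or more peak files, then a single FASTA file) can be sketched as a small command builder. This is an illustration only: the helper name `db_search_command` and the file names are hypothetical, and checking the file type by extension is an assumption of this sketch, not something Casanovo is documented to do.

```python
import shlex
from pathlib import Path

# Peak file types named above; matching by extension is an assumption of this sketch.
PEAK_SUFFIXES = {".mzml", ".mzxml", ".mgf"}


def db_search_command(peak_paths, fasta_path, model=None, config=None, export=False):
    """Build the argument list for `casanovo db-search` (hypothetical helper)."""
    peak_paths = [Path(p) for p in peak_paths]
    if not peak_paths:
        raise ValueError("at least one PEAK_PATH is required")
    for path in peak_paths:
        if path.suffix.lower() not in PEAK_SUFFIXES:
            raise ValueError(f"unsupported peak file type: {path}")

    cmd = ["casanovo", "db-search"]
    if export:
        cmd.append("--export")
    if model is not None:
        # May be a local .ckpt file or a URL, so it is passed through as text.
        cmd += ["--model", str(model)]
    if config is not None:
        cmd += ["--config", str(config)]
    # Positional arguments: all peak files first, the single FASTA file last.
    cmd += [str(p) for p in peak_paths]
    cmd.append(str(fasta_path))
    return cmd


# Placeholder file names for illustration.
cmd = db_search_command(["run1.mzML", "run2.mgf"], "proteome.fasta", config="casanovo.yaml")
print(shlex.join(cmd))
# prints: casanovo db-search --config casanovo.yaml run1.mzML run2.mgf proteome.fasta
```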
sequence
De novo sequence peptides from tandem mass spectra.
PEAK_PATH must be one or more mzML, mzXML, or MGF files from which to sequence peptides. If --evaluate is set, PEAK_PATH must be one or more annotated MGF files.
Usage
casanovo sequence [OPTIONS] PEAK_PATH...
Options
- -e, --evaluate
Run in evaluation mode. When this flag is set, the peptide and amino acid precision are calculated and logged at the end of the sequencing run. All input files must be annotated MGF files when running in evaluation mode.
- -d, --output_dir <output_dir>
The destination directory for output files.
- -o, --output_root <output_root>
The root name for all output files.
- -f, --force_overwrite
Whether to overwrite output files.
- Default:
False
- -v, --verbosity <verbosity>
Set the verbosity of console logging messages. Log files are always set to ‘debug’.
- Options:
debug | info | warning | error
- -m, --model <model>
Either the model weights (.ckpt file) or a URL pointing to the model weights file. If not provided, Casanovo will try to download the latest release automatically.
- -c, --config <config>
The YAML configuration file overriding the default options.
Arguments
- PEAK_PATH
Required argument(s)
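The evaluation-mode constraint can be checked before a run is launched. The sketch below uses a hypothetical helper, `sequence_command`, with placeholder file names; it only verifies the `.mgf` extension, since whether an MGF file is annotated cannot be determined from its name.

```python
import shlex
from pathlib import Path


def sequence_command(peak_paths, evaluate=False, output_dir=None, output_root=None):
    """Build the argument list for `casanovo sequence` (hypothetical helper)."""
    peak_paths = [Path(p) for p in peak_paths]
    if not peak_paths:
        raise ValueError("at least one PEAK_PATH is required")
    if evaluate:
        # Evaluation mode needs annotated MGF input. Only the extension is
        # checked here; annotation itself is not inspected.
        not_mgf = [str(p) for p in peak_paths if p.suffix.lower() != ".mgf"]
        if not_mgf:
            raise ValueError(f"evaluation mode requires MGF input, got: {not_mgf}")

    cmd = ["casanovo", "sequence"]
    if evaluate:
        cmd.append("--evaluate")
    if output_dir is not None:
        cmd += ["--output_dir", str(output_dir)]
    if output_root is not None:
        cmd += ["--output_root", str(output_root)]
    cmd += [str(p) for p in peak_paths]
    return cmd


# Placeholder names for illustration.
cmd = sequence_command(["annotated.mgf"], evaluate=True, output_dir="results")
print(shlex.join(cmd))
# prints: casanovo sequence --evaluate --output_dir results annotated.mgf
```

Without `evaluate=True`, the same helper accepts any of the listed input formats and leaves validation to Casanovo.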
train
Train a Casanovo model on your own data.
TRAIN_PEAK_PATH must be one or more annotated MGF files, such as those provided by MassIVE-KB, from which to train a new Casanovo model.
Usage
casanovo train [OPTIONS] TRAIN_PEAK_PATH...
Options
- -p, --validation_peak_path <validation_peak_path>
An annotated MGF file for validation, such as those from MassIVE-KB. Use this option multiple times to specify multiple files. Loss from these files contributes to the aggregate valid_CELoss used for checkpoint selection.
- -t, --tracking_peak_path <tracking_peak_path>
An annotated MGF file used to monitor validation loss during training without influencing checkpoint selection (useful for detecting catastrophic forgetting). Use this option multiple times to specify multiple files.
- --load_all_states
Whether to load all states when restarting training, rather than only the weights. Defaults to False.
- -d, --output_dir <output_dir>
The destination directory for output files.
- -o, --output_root <output_root>
The root name for all output files.
- -f, --force_overwrite
Whether to overwrite output files.
- Default:
False
- -v, --verbosity <verbosity>
Set the verbosity of console logging messages. Log files are always set to ‘debug’.
- Options:
debug | info | warning | error
- -m, --model <model>
Either the model weights (.ckpt file) or a URL pointing to the model weights file. If not provided, Casanovo will try to download the latest release automatically.
- -c, --config <config>
The YAML configuration file overriding the default options.
Arguments
- TRAIN_PEAK_PATH
Required argument(s)
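Because the validation and tracking options are repeated once per file, a training command with several such files can get long. A minimal sketch of how it might be assembled follows; the helper name `train_command` and all file names are hypothetical, and only options documented above are used.

```python
import shlex


def train_command(train_paths, validation_paths=(), tracking_paths=(),
                  load_all_states=False, model=None, config=None):
    """Build the argument list for `casanovo train` (hypothetical helper)."""
    train_paths = [str(p) for p in train_paths]
    if not train_paths:
        raise ValueError("at least one TRAIN_PEAK_PATH is required")

    cmd = ["casanovo", "train"]
    # Each validation file gets its own occurrence of the option.
    for path in validation_paths:
        cmd += ["--validation_peak_path", str(path)]
    # Tracking files are repeated the same way.
    for path in tracking_paths:
        cmd += ["--tracking_peak_path", str(path)]
    if load_all_states:
        cmd.append("--load_all_states")
    if model is not None:
        cmd += ["--model", str(model)]
    if config is not None:
        cmd += ["--config", str(config)]
    cmd += train_paths
    return cmd


# Placeholder file names for illustration.
cmd = train_command(
    ["train.mgf"],
    validation_paths=["val_a.mgf", "val_b.mgf"],
    tracking_paths=["old_domain.mgf"],
    config="casanovo.yaml",
)
print(shlex.join(cmd))
```

In this example, `val_a.mgf` and `val_b.mgf` would both contribute to the loss used for checkpoint selection, while `old_domain.mgf` would only be monitored.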
version
Get the Casanovo version information.
Usage
casanovo version [OPTIONS]