Command-line scripts

pygenbank-extract-CDS

pygenbank-extract-CDS is a tool to extract CDS summaries from GenBank records and, if needed, to produce a FASTA file of unique amino-acid sequences (e.g. to prepare a clustering analysis).

Type pygenbank-extract-CDS --help for detailed usage.
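The "unique amino-acid sequences" step can be illustrated in plain Python. This is a minimal sketch, not the tool's actual implementation. It assumes the CDS records have already been parsed into (identifier, amino-acid sequence) pairs. The function names unique_sequences and to_fasta, the header scheme (first identifier seen for each sequence) and the toy input are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def unique_sequences(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group CDS identifiers by identical amino-acid sequence.

    Returns a dict mapping each distinct sequence to the IDs that carry it,
    in first-seen order (dicts keep insertion order in Python 3.7+).
    """
    groups: Dict[str, List[str]] = {}
    for cds_id, seq in records:
        groups.setdefault(seq, []).append(cds_id)
    return groups


def to_fasta(groups: Dict[str, List[str]], width: int = 60) -> str:
    """Write one FASTA entry per distinct sequence.

    Hypothetical header scheme: the first ID seen for that sequence.
    Sequence lines are wrapped at `width` characters.
    """
    lines: List[str] = []
    for seq, ids in groups.items():
        lines.append(f">{ids[0]}")
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "".join(line + "\n" for line in lines)


# Toy input: ABC1 and ABC3 share the same sequence, so only one entry is kept.
cds = [("ABC1", "MKVL"), ("ABC2", "MKVLA"), ("ABC3", "MKVL")]
groups = unique_sequences(cds)
fasta = to_fasta(groups)
print(fasta, end="")
```

Keeping the full list of identifiers per sequence (rather than a plain set of sequences) means the mapping back from a cluster representative to all CDS that share it is not lost.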

Examples

  • Get CDS summaries for all GenBank files in the current directory:

    pygenbank-extract-CDS *.gb > mySummaries
    

How to profile command-line script execution

cprofilev is a convenient tool to visualize the results of a profiling run of a Python script:

sudo pip install cprofilev

To profile the execution of a script with cprofilev, run:

python -m cprofilev myScript.py [args]

The output is then served at http://localhost:4000 and can be viewed with a web browser.
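cprofilev is a front end to the standard library's cProfile module, and the same statistics can also be collected and printed directly from Python code. The sketch below is a minimal illustration of that standard API, not part of pygenbank. The function slow_sum is a made-up toy workload.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    # Toy workload standing in for the code being profiled.
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Render the ten most expensive calls, sorted by cumulative time,
# into a string instead of printing straight to stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(10)
report = buffer.getvalue()
print(report)
```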

Using cprofilev with the command-line scripts

pygenbank-search and pygenbank-extract-CDS are entry points defined in the genbank.py module. They cannot be run directly by the Python interpreter, so they cannot be combined with the cprofilev module either (at least, I have not found a way to do so yet).

To work around this, a short block of code at the end of the genbank.py module makes it callable from the Python interpreter. The module can then be called with:

python genbank.py search [args]
python genbank.py extract-CDS [args]

where [args] are passed to the corresponding _main_... functions. For example:

python genbank.py search -q "hemocyanin" > summaries
python genbank.py extract-CDS --help
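The extra code at the end of genbank.py is not reproduced here. A minimal sketch of what such a sub-command dispatcher could look like is shown below. The names main_search, main_extract_cds, COMMANDS and dispatch are hypothetical stand-ins for the module's _main_... functions, and the sketch assumes those functions accept the remaining arguments as a list.

```python
import sys
from typing import Callable, Dict, List, Optional, Tuple

USAGE = "usage: python genbank.py {search,extract-CDS} [args]"


def main_search(args: List[str]) -> Tuple[str, List[str]]:
    # Hypothetical stand-in for the search entry point;
    # a real one would parse `args` and run the search.
    return ("search", args)


def main_extract_cds(args: List[str]) -> Tuple[str, List[str]]:
    # Hypothetical stand-in for the extract-CDS entry point.
    return ("extract-CDS", args)


# Map each sub-command name to the function that implements it.
COMMANDS: Dict[str, Callable[[List[str]], Tuple[str, List[str]]]] = {
    "search": main_search,
    "extract-CDS": main_extract_cds,
}


def dispatch(argv: List[str]) -> Optional[Tuple[str, List[str]]]:
    """Call the entry point named by argv[0] with the remaining arguments.

    Prints a usage message to stderr and returns None when the
    sub-command is missing or unknown.
    """
    if not argv or argv[0] not in COMMANDS:
        print(USAGE, file=sys.stderr)
        return None
    return COMMANDS[argv[0]](argv[1:])


if __name__ == "__main__":
    # e.g. python genbank.py search -q "hemocyanin"
    dispatch(sys.argv[1:])
```

Keeping the mapping in a dictionary means that exposing another sub-command only takes one extra entry. If the real entry points read sys.argv themselves (as argparse's parse_args() does by default), a dispatcher would instead need to rewrite sys.argv before calling them.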

Note that the path to genbank.py must be given explicitly (the examples here assume the profiling run is started from within the module folder).

To perform a profiling run:

python -m cprofilev genbank.py extract-CDS -u toto.fasta *.gb > summaries

and then visit http://localhost:4000 with a web browser.
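If a browser is not convenient, the standard library's cProfile module can profile the same command and save the statistics to a file with its -o option, for example python -m cProfile -o extract.prof genbank.py extract-CDS -u toto.fasta *.gb > summaries (extract.prof is an arbitrary file name). The sketch below reads such a file with pstats and keeps only the entries whose file:line(function) label matches a regular expression, such as r"genbank\.py" for a real run. The helper filtered_report is hypothetical. To stay self-contained, the example profiles a toy function and writes its own profile file instead of using a real genbank.py run.

```python
import cProfile
import io
import os
import pstats
import tempfile


def filtered_report(prof_path: str, pattern: str, limit: int = 10) -> str:
    """Return a text report of the `limit` most expensive calls (by
    cumulative time) whose "file:line(function)" label matches `pattern`."""
    buffer = io.StringIO()
    stats = pstats.Stats(prof_path, stream=buffer)
    # Restrictions are applied in order: regex filter first, then the count.
    stats.strip_dirs().sort_stats("cumulative").print_stats(pattern, limit)
    return buffer.getvalue()


def busy(n: int) -> list:
    # Toy workload standing in for a genbank.py run.
    return sorted(str(i) for i in range(n))


# Stand-in for `python -m cProfile -o extract.prof ...`:
# profile the toy function and dump the stats to a temporary file.
profiler = cProfile.Profile()
profiler.runcall(busy, 50_000)
fd, prof_path = tempfile.mkstemp(suffix=".prof")
os.close(fd)
profiler.dump_stats(prof_path)

# For a real run the pattern would be r"genbank\.py"; here the toy
# function lives in this file, so match on its name instead.
text = filtered_report(prof_path, r"\(busy\)")
os.remove(prof_path)
print(text)
```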