Command-line scripts ==================== pygenbank-search ---------------- ``pygenbank-search`` is a tool to perform searches on GenBank and to retrieve GenBank records, either as document summaries or as full records. The user has to provide an email address for use of the Entrez resource. Type ``pygenbank-search --help`` for detailed usage. See http://www.ncbi.nlm.nih.gov/books/NBK49540/ for more details on GenBank search queries. Examples ******** Please use your own email address as the ``--email`` argument. * Search GenBank and retrieve document summaries:: pygenbank-search --query "hemoglobin AND mammal" --retmax 10000 --email "name@address" > mySearch less -S mysearch * Search GenBank and retrieve full records:: mkdir gbResults # Records will be saved here pygenbank-search -q "myoglobin AND sperm whale" -e "name@address" -d -o gbResults * Specify a length range in the GenBank query:: pygenbank-search -q "carcinus maenas" -e "name@address" -r 10000 > mySearch pygenbank-search -q "carcinus maenas AND 1000:100000[SLEN]" -e "name@address" -r 10000 > mySearchLength * Specify a taxon in the GenBank query:: MY_QUERY="complete genome AND staphylococcus aureus [PORGN]" pygenbank-search -q "$MY_QUERY" -e "name@address" -r 10000 > mySearch * More complex query to get all complete *Staphylococcus aureus* genomes (up to 10000):: MY_QUERY="complete genome AND staphylococcus aureus [PORGN] AND 1000000:10000000 [SLEN]" echo $MY_QUERY pygenbank-search -q "$MY_QUERY" -e "name@address" -r 10000 > mySearch pygenbank-extract-CDS --------------------- ``pygenbank-extract-CDS`` is a tool to extract CDS summaries from GenBank records and to produce fasta file with unique amino-acid sequences if needed (e.g. to prepare a clustering analysis). Type ``pygenbank-extract-CDS --help`` for detailed usage. Examples ******** * Get CDS summaries for all GenBank files in the current directory:: pygenbank-extract-CDS *.gb > mySummaries How to profile command-line script execution -------------------------------------------- `cprofilev` is a convenient tool to visualize the results of a profiling run of a Python script:: sudo pip install cprofilev `cprofilev` can be used to profile the execution of a script this way:: python -m cprofilev myScript.py [args] The ouput is visible at the address ``http://localhost:4000``. Using `cprofilev` with the command-line scripts *********************************************** `pygenbank-search` and `pygenbank-extract-CDS` use entry points in the `genbank.py` module, and cannot be called directly with the Python interpreter to use the `cprofilev` module at the same time (at least I didn't find a way to do it for now). To solve this problem, there is a bit of code added at the end of the `genbank.py` module to make it callable from the python interpreter. The module can then be called with:: python genbank.py search [args] python genbank.py extract-CDS [args] where `[args]` are passed to the corresponding `_main_...` functions. For example:: python genbank.py search -q "hemocyanin" > summaries python genbank.py extract-CDS --help Note that the full path to `genbank.py` must be provided (so here we assume we are running the profiling run from within the module folder). To perform a profiling run:: python -m cprofilev genbank.py extract-CDS -u toto.fasta *.gb > summaries and then visit ``http://localhost:4000`` with a web browser.