genbank module
==============

Description
-----------

``genbank`` provides some light wrappers around Biopython functions to download
records from the GenBank server.

The module can be used from within Python, or through the command line tool
``pygenbank``.

Tutorial
--------

This is a simple tutorial to learn how to use the :mod:`genbank` module.

Set up the environment
**********************

::

  import genbank as gb
  # Set up your email address for Entrez
  gb.Entrez.email = "yourname@youraddress"

Search GenBank and retrieve record ids
**************************************

Performing a GenBank search is as simple as::

  # Perform a GenBank search
  mySearch = gb.search(term = "hemocyanin")

The returned value is a ``Bio.Entrez.Parser.DictionaryElement`` which contains
information about the returned results::

  mySearch.keys() # Available information
  mySearch["Count"] # Number of entries found
  mySearch["QueryTranslation"] # How the query was understood by GenBank
  mySearch["IdList"] # List of GenBank identifiers returned

Any query string that you would use on the GenBank web page can be used as the
term::

  mySearch = gb.search(term = "hemocyanin AND lito* [ORGN]")
  mySearch["Count"]
  mySearch = gb.search(term = "citrate synthase AND mus m* [ORGN]")

To get more details about the returned entries, you can fetch the record
summaries using the previous search result::

  summaries = gb.getDocSum(mySearch)

The summaries can then be used to apply some simple filtering on the record
ids before proceeding to the actual record download. For example, to keep only
the records whose ``Length`` field is below 10000::

  # Get the summaries from the results
  summaries = gb.getDocSum(mySearch)
  # Extract the ids of interest ("Gi" field)
  myId = [x["Gi"] for x in summaries if int(x["Length"]) < 10000]
  len(myId)

Download the GenBank records
****************************

::

  # Download the GenBank records
  gb.downloadRecords(idList = myId, destDir = ".", batchSize = 20)

Functions
---------

.. graphviz::

  digraph G {
    rankdir=LR;
    subgraph cluster_1 {
      node[shape=box,style=filled,fillcolor="#dfaf8f"];
      _processOutfmtArg;
      _fileLinesToList;
      _makeSummaryForCDS;
      _checkEmailOption;
      _getRecordBatch;
      _recordIsWGS;
      _downloadBatch;
      _processArgsToLogic_extract_CDS;
      _parseDocSumXML;
      _getProteinHashFromCDS;
      _getDocSumXML;
      _makeParser_search;
      _checkRetmax;
      _summarizeRecord;
      _downloadWGS;
      _makeWGSurl;
      _makeParser_extract_CDS;
      _processArgsToLogic_search;
      node[shape=box,style=filled,fillcolor="#7cb8bb"];
      getDocSumFromId;
      downloadRecords;
      getDocSum;
      search;
      downloadWGS;
      writeDocSums;
      node[shape=box,style=filled,fillcolor="#9fc59f"];
      _main_search;
      _main_extract_CDS;
      getDocSum -> _getDocSumXML;
      getDocSum -> _parseDocSumXML;
      getDocSumFromId -> _getDocSumXML;
      getDocSumFromId -> _parseDocSumXML;
      downloadRecords -> _downloadBatch;
      _processArgsToLogic_extract_CDS -> _processOutfmtArg;
      downloadWGS -> _downloadWGS;
      downloadWGS -> _makeWGSurl;
      downloadWGS -> _recordIsWGS;
      _downloadBatch -> downloadWGS;
      _downloadBatch -> _getRecordBatch;
      _downloadBatch -> _recordIsWGS;
      _main_search -> search;
      _main_search -> getDocSum;
      _main_search -> _makeParser_search;
      _main_search -> downloadRecords;
      _main_search -> getDocSumFromId;
      _main_search -> _fileLinesToList;
      _main_search -> _processArgsToLogic_search;
      _main_search -> writeDocSums;
      _processArgsToLogic_search -> _checkRetmax;
      _processArgsToLogic_search -> _checkEmailOption;
      _main_extract_CDS -> _processArgsToLogic_extract_CDS;
      _main_extract_CDS -> _makeParser_extract_CDS;
      _main_extract_CDS -> _summarizeRecord;
      _summarizeRecord -> _getProteinHashFromCDS;
      _summarizeRecord -> _makeSummaryForCDS;
    }
  }

.. automodule:: genbank
   :members:
   :private-members:
   :undoc-members:
   :show-inheritance:
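
As a complement to the tutorial, the sketch below illustrates the filtering step and the idea behind the ``batchSize`` argument without contacting the GenBank server. It is only an illustration under stated assumptions: the summaries are made-up stand-ins for what ``gb.getDocSum`` returns, ``filter_ids`` and ``chunk_ids`` are hypothetical helpers that are not part of the :mod:`genbank` module, and the way ``downloadRecords`` actually groups ids internally is not described here and may differ.

```python
# Mock record summaries standing in for the output of gb.getDocSum(mySearch).
# The ids and lengths are made up for illustration only. Values are strings,
# as suggested by the int(x["Length"]) conversion used in the tutorial.
summaries = [
    {"Gi": "1001", "Length": "850"},
    {"Gi": "1002", "Length": "25000"},
    {"Gi": "1003", "Length": "9999"},
    {"Gi": "1004", "Length": "10000"},
    {"Gi": "1005", "Length": "1200"},
]


def filter_ids(summaries, max_length=10000):
    """Return the "Gi" of each summary whose "Length" is below max_length.

    Hypothetical helper wrapping the list comprehension from the tutorial.
    """
    return [s["Gi"] for s in summaries if int(s["Length"]) < max_length]


def chunk_ids(id_list, batch_size):
    """Yield successive batches of at most batch_size ids.

    Hypothetical helper: one plausible way to split an id list into batches.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(id_list), batch_size):
        yield id_list[start:start + batch_size]


my_ids = filter_ids(summaries)
batches = list(chunk_ids(my_ids, batch_size=2))
print(my_ids)
print(batches)
```

With real data, ``my_ids`` would play the role of ``myId`` in the tutorial and could be passed as ``idList`` to ``gb.downloadRecords``. A small batch size is used here only so that the split is visible on five mock records.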