Querying the NCBI database in R

  1. Follow the instructions at https://vlab.amrita.edu/index.php?sub=3&brch=311&sim=1835&cnt=2 to install R on your personal computer.

  2. Install the SeqinR package, for example by running install.packages("seqinr") in R.

  3. Load the SeqinR package by entering library("seqinr") at the R prompt.

  4. To retrieve a specific sequence by its NCBI accession, define and use the following R function, getncbiseq().

    getncbiseq <- function(accession) {
      require("seqinr")  # this function requires the SeqinR R package
      # first find which ACNUC database the accession is stored in:
      dbs <- c("genbank", "refseq", "refseqViruses", "bacterial")
      numdbs <- length(dbs)
      for (i in 1:numdbs) {
        db <- dbs[i]
        choosebank(db)  # open a connection to ACNUC database 'db'
        # check if the sequence is in ACNUC database 'db':
        resquery <- try(query(".tmpquery", paste("AC=", accession)), silent = TRUE)
        if (!(inherits(resquery, "try-error"))) {
          queryname <- "query2"
          thequery <- paste("AC=", accession, sep = "")
          query2 <- query(queryname, thequery)
          # see if a sequence was retrieved:
          seq <- getSequence(query2$req[[1]])
          closebank()  # close the connection before returning
          return(seq)
        }
        closebank()  # not found here, so close and try the next database
      }
      print(paste("ERROR: accession", accession, "was not found"))
    }
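
The availability check inside getncbiseq() relies on base R's try() and inherits(). The sketch below illustrates only that pattern. It uses lookup(), a hypothetical stand-in for the network query, so that it runs offline; it is not part of SeqinR.

```r
# lookup() is a hypothetical stand-in for query(); it fails for unknown keys.
lookup <- function(key) {
  known <- c(NC_001477 = "refseqViruses")
  if (!key %in% names(known)) stop("unknown accession: ", key)
  known[[key]]
}

# try() captures an error instead of stopping the script;
# a failed call returns an object of class "try-error".
found    <- try(lookup("NC_001477"), silent = TRUE)
notfound <- try(lookup("XX_000000"), silent = TRUE)

inherits(found, "try-error")     # FALSE: the lookup succeeded
inherits(notfound, "try-error")  # TRUE: the lookup failed
```

Because a failed query yields a "try-error" object rather than halting, the loop in getncbiseq() can move on to the next database.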


EXAMPLE

• After entering the getncbiseq() function into R, you can retrieve a sequence from the NCBI Nucleotide database, for example the sequence with accession NC_001477.

• Enter the following code:

    > dengueseq <- getncbiseq("NC_001477")

• The dengueseq variable is a vector containing the nucleotide sequence, one nucleotide per element.
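
Since the result is an ordinary character vector, it can be inspected with base R functions. The sketch below uses a short made-up vector in place of the real download so that it runs offline; the values are illustrative only and the variable names n, counts and as_string are assumptions.

```r
# Made-up stand-in for the vector returned by getncbiseq();
# a real result holds one element per nucleotide of the genome.
dengueseq <- c("a", "g", "t", "t", "g", "t", "t", "a", "g", "t")

n <- length(dengueseq)                       # number of nucleotides
first_five <- dengueseq[1:5]                 # first five nucleotides
counts <- table(dengueseq)                   # occurrences of each base
as_string <- paste(dengueseq, collapse = "") # sequence as a single string

print(n)
print(counts)
```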


Procedure for using the simulator

  1. Enter the following code in the command window:

          choosebank("genbank") # choose GenBank as the sequence database
          choosebank("refseqViruses") # choose the specific database to search (RefSeq viruses)
          test_query <- query("Dengue1", "AC=NC_001477") # run the query: set the list name and pass the accession number
          attributes(test_query) # view the attributes of the query result
          dengueseq <- getSequence(test_query$req[[1]]) # get the DNA sequence
          seqname <- getName(test_query) # get the sequence name
          #write.fasta(sequences = dengueseq, names = seqname, file.out="denguevirus.fasta") # optional: save the sequence to a FASTA file


Fig. 3. R console for querying the NCBI database in R


  2. Click the Execute button to view the output.
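
The commented-out write.fasta() call in the procedure above saves the retrieved sequence to a FASTA file. The sketch below is a minimal offline version of that step, assuming the seqinr package is installed. The sequence and name are made-up stand-ins, and a temporary file replaces "denguevirus.fasta".

```r
library(seqinr)

# Made-up stand-ins for the values produced by getSequence() and getName().
dengueseq <- c("a", "g", "t", "t", "g", "t", "t", "a", "g", "t")
seqname   <- "NC_001477"

# Write to a temporary file so nothing in the working directory is overwritten.
outfile <- tempfile(fileext = ".fasta")
write.fasta(sequences = dengueseq, names = seqname, file.out = outfile)

# Read the file back to confirm the round trip.
saved <- read.fasta(file = outfile, seqtype = "DNA")
saved_name <- names(saved)                        # sequence names in the file
saved_seq  <- as.character(getSequence(saved[[1]])) # nucleotides as a plain vector

print(saved_name)
print(saved_seq)
```

In a real session, replace the stand-in values with the dengueseq and seqname objects created in step 1 and pass the desired file name as file.out.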



This experiment retrieves sequence data directly from the NCBI database using R. The choosebank() function in the seqinr package connects R to a sequence database such as GenBank. The user can then query that database by accession number, retrieve the species-specific sequence data, and save it in a variable.