Computing scoring matrices for amino acids and long pairwise alignment in R

Procedure to run the simulator  

  1. Enter the sequences to be compared in the “Sequence 1” and “Sequence 2” boxes.
  2. Retrieve the sequence data in FASTA format from NCBI. Consider the following examples:

Sequence 1 - keratin [Homo sapiens]

    >AAB59562.1 keratin [Homo sapiens]
    MTTCSRQFTSSSSMKGSCGIGGGIGAGSSRISSVLAGGSCRAPNTYGGGLSVSSSRFSSGGAYGLGGGYG
    GGFSSSSSSFGSGFGGGYGGGLGAGLGGGFGGGFAGGDGLLVGSEKVTMQNLNDRLASYLDKVRALEEAN
    ADLEVKIRDWYQRQRPAEIKDYSPYFKTIEDLRNKILTATVDNANVLLQIDNARLAADDFRTKYETELNL
    RMSVEADINGLRRVLDELTLARADLEMQIESLKEELAYLKKNHEEEMNALRGQVGGDVNVEMDAAPGVDL
    SRILNEMRDQYEKMAEKNRKDAEEWFFTKTEELNREVATNSELVQSGKSEISELRRTMQNLEIELQSQLS
    MKASLENSLEETKGRYCMQLAQIQEMIGSVEEQLAQLRCEMEQQNQEYKILLDVKTRLEQEIATYRRLLE
    GEDAHLSSSQFSSGSQSSRDVTSSSRQIRTKVMDVHDGKVVSTHEQVLRTKN

Sequence 2 - keratin [Rattus norvegicus]  

    >AAA41473.1 keratin [Rattus norvegicus]
    MDFSRRSFHRSLSSSSQGPALSTSGSLYRKGTMQRLGLHSVYGGWRHGTRISVSKTTMSYGNHLSNGGDL
    FGGNEKLAMQNLNDRLASYLEKVRSLEQSNSKLEAQIKQWYETNAPSTIRDYSSYYAQIKELQDQIKDAQ
    IENARCVLQIDNAKLAAEDFRLKFETERGMRITVEADLQGLSKVYDDLTLQKTDLEIQIEELNKDLALLK
    KEHQEEVEVLRRQLGNNVNVEVDAAPGLNLGEIMNEMRQKYEILAQKNLQEAKEQFERQTQTLEKQVTVN
    IEELRGTEVQVTELRRSYQTLEIELQSQLSMKESLERTLEETKARYASQLAAIQEMLSSLEAQLMQIRSD
    TERQNQEYNILLDIKTRLEQEIATYRRLLEGEDIKTTEYQLNTLEAKDIKKTRKIKTVVEEVVDGKVVSS
    EVKEIEENI

 
  3. Click the Align Seq tab to run the simulator.

  4. The simulator outputs the alignment score for the given amino acid sequences. The score is built from a scoring matrix, whose entries give the relative score obtained by matching two characters in a sequence alignment.

  5. Alternatively, download the sample file provided in the GUI and repeat the steps above to get the output.
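To illustrate how an alignment score is accumulated from per-character scores, the following is a minimal base R sketch of a global (Needleman-Wunsch) alignment score. It is not the simulator's implementation: the function name `global_score` is hypothetical, and the match, mismatch, and gap values are illustrative assumptions that stand in for a real scoring matrix such as BLOSUM50.

```r
# Global (Needleman-Wunsch) alignment score with a linear gap penalty.
# ASSUMPTION: match/mismatch/gap values are toy numbers, not the
# simulator's scoring scheme.
global_score <- function(a, b, match = 2, mismatch = -1, gap = -2) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)

  # dp[i + 1, j + 1] is the best score for aligning x[1..i] with y[1..j]
  dp <- matrix(0, nrow = n + 1, ncol = m + 1)
  dp[, 1] <- gap * (0:n)  # x prefix aligned against gaps only
  dp[1, ] <- gap * (0:m)  # y prefix aligned against gaps only

  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      pair <- if (x[i] == y[j]) match else mismatch
      dp[i + 1, j + 1] <- max(dp[i, j] + pair,     # align x[i] with y[j]
                              dp[i, j + 1] + gap,  # x[i] against a gap
                              dp[i + 1, j] + gap)  # y[j] against a gap
    }
  }
  dp[n + 1, m + 1]
}

# Short fragments of the two keratin sequences above (they differ at one residue)
global_score("MQNLNDRLASYLDKVR", "MQNLNDRLASYLEKVR")  # 15 matches + 1 mismatch = 29
```

Replacing the fixed match and mismatch values with a lookup in a substitution matrix gives the kind of score the simulator reports.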

     

DIY    

  1. Follow the instructions at https://vlab.amrita.edu/index.php?sub=3&brch=311&sim=1835&cnt=2 to install R on a personal computer.

  2. Install the SeqinR package.

  3. To load the “SeqinR” R package, run:

    library("seqinr")

The overall workflow in R is:

  1. Import the “seqinr” library into the R workspace.
  2. Connect to the ACNUC database.
  3. Query the sequences by accession number and assign each to a variable.
  4. Import the “Biostrings” library into the R workspace.
  5. Load the BLOSUM50 scoring matrix.
  6. Convert the amino acid sequences into strings.
  7. Convert the strings to uppercase.
  8. Pairwise align the sequences with the chosen gap penalties.
  9. Assign the alignment data to a matrix.
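The workflow above can be sketched roughly as follows. This is a sketch under stated assumptions, not the lab's own script: the helper `fetch_sequence` is hypothetical and is defined but not called (it needs network access, and the ACNUC bank name and accession must be supplied by the user); short fragments of the two keratin sequences stand in for the fetched data; the gap penalties of 10 and 4 are illustrative values. On recent Bioconductor releases the alignment function and scoring matrices are provided by the pwalign package, which the sketch attaches if it is installed.

```r
library("seqinr")      # choosebank(), query(), getSequence(), c2s(), s2c()
library("Biostrings")  # AAString(); pairwiseAlignment() on older releases
# ASSUMPTION: newer Bioconductor releases ship the aligner in pwalign
if (requireNamespace("pwalign", quietly = TRUE)) library("pwalign")

# Hypothetical helper (not from the lab): fetch one sequence from an ACNUC
# bank by accession number. Defined for illustration only, not called below.
fetch_sequence <- function(bank, accession) {
  choosebank(bank)                                 # connect to the ACNUC database
  on.exit(closebank())                             # disconnect on exit
  hits <- query("hits", paste0("AC=", accession))  # query by accession number
  getSequence(hits$req[[1]])                       # vector of single characters
}

# Stand-ins for the fetched sequences: fragments of the two keratins,
# held as lowercase character vectors so the conversion steps are visible
seq1_chars <- s2c("ekvtmqnlndrlasyldkvraleean")  # Homo sapiens fragment
seq2_chars <- s2c("eklamqnlndrlasylekvrsleqsn")  # Rattus norvegicus fragment

data(BLOSUM50)  # load the BLOSUM50 scoring matrix

# Convert the character vectors to strings, then to uppercase
seq1 <- toupper(c2s(seq1_chars))
seq2 <- toupper(c2s(seq2_chars))

# Global pairwise alignment; gap penalties are illustrative assumptions
aln <- pairwiseAlignment(AAString(seq1), AAString(seq2),
                         substitutionMatrix = BLOSUM50,
                         gapOpening = 10, gapExtension = 4,
                         type = "global")

aln_score <- score(aln)

# Two-row matrix: aligned sequence 1 on top, aligned sequence 2 below
aln_matrix <- rbind(s2c(as.character(pattern(aln))),
                    s2c(as.character(subject(aln))))
```

Printing `aln` shows the aligned sequences and the score, and `aln_matrix` can be indexed by column to compare the two sequences position by position.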