Parsing Emboss Water output with Ruby
By kenglish
First, you will need to install the EMBOSS suite on your computer:
sudo apt-get install emboss emboss-lib
If you don’t already have BioRuby installed, you will need that too:
sudo gem install bio --no-ri --no-rdoc
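Before going further, it can help to confirm that the `water` binary actually ended up on your PATH. Here is a minimal sketch in plain Ruby. `find_executable_in_path` is a hypothetical helper written for illustration, not part of EMBOSS or BioRuby, and it assumes a Unix-like system (as the apt-get step does) where executables have no `.exe` suffix:

```ruby
# Hypothetical helper (not from EMBOSS or BioRuby): return the full path of
# the first executable file called `name` found in the PATH directories,
# or nil if there is none. Assumes Unix-style names without ".exe".
def find_executable_in_path(name)
  ENV.fetch('PATH', '').split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

location = find_executable_in_path('water')
if location
  puts "water found at #{location}"
else
  warn 'water not found on PATH; is the emboss package installed?'
end
```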
Here is a first Ruby script that calls EMBOSS water:
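require 'rubygems'
require 'bio'

# The two FASTA files to align come from the command line
test_filename = ARGV.shift
target_filename = ARGV.shift

# Run water through BioRuby; result holds water's text output as a String
result = Bio::EMBOSS.run('water', '-asequence', test_filename, '-bsequence', target_filename)
puts result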
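`Bio::EMBOSS.run` hands back water's text output. If you would rather not depend on BioRuby for this step, a sketch using Ruby's standard `Open3` library could look like the following. It assumes `water` is on your PATH; `-auto` tells EMBOSS to accept default values instead of prompting, and `-outfile stdout` relies on EMBOSS treating `stdout` as a special output file name. `water_command` and `run_water` are hypothetical names for this sketch:

```ruby
require 'open3'

# Hypothetical helper: the argument list for an EMBOSS water run.
#   -auto    : accept default values instead of prompting
#   -outfile : "stdout" is EMBOSS's special name for standard output
def water_command(a_file, b_file)
  ['water', '-asequence', a_file, '-bsequence', b_file,
   '-outfile', 'stdout', '-auto']
end

# Hypothetical helper: run water and return its text report as a String.
# Raises if water exits with a non-zero status (Open3 itself raises
# Errno::ENOENT when the water binary cannot be found).
def run_water(a_file, b_file)
  out, err, status = Open3.capture3(*water_command(a_file, b_file))
  raise "water failed: #{err}" unless status.success?
  out
end

# Only run water when invoked as a script with two FASTA file arguments.
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  puts run_water(ARGV[0], ARGV[1])
end
```

Passing the arguments as an array (rather than one interpolated string) keeps file names with spaces intact and avoids going through a shell.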
Unfortunately, BioRuby does not have a report class for EMBOSS water output, so you will have to parse the text yourself. Here’s an example script that extracts the percent similarity for each alignment:
require 'rubygems'
require 'bio'

test_filename = ARGV.shift
target_filename = ARGV.shift

result = Bio::EMBOSS.run('water', '-asequence', test_filename, '-bsequence', target_filename)
# result now has the text output of water...

# Here's an example of looping through each line of the result to get the similarity:
test_seq = ""
target_seq = ""
similarity = ''
result.split("\n").each do |line|
  # "# Aligned_sequences" starts a new alignment header, so print the
  # previous alignment's result (if there is one) and reset
  if line =~ /^# Aligned_sequences/
    puts "Seq '#{test_seq}' has similarity to Seq '#{target_seq}' of #{similarity}" unless (test_seq == "") && (target_seq == "")
    test_seq = ""
    target_seq = ""
  end

  # Get sequence names from the "# 1: <name>" and "# 2: <name>" lines
  if line =~ /^# (\d+): (\S+)/
    test_seq = $2 if $1 == '1'
    target_seq = $2 if $1 == '2'
  end

  # Parse similarity from "# Similarity: <matches>/<length> (<percent>%)"
  if line =~ /^# Similarity:.*\((.*)%\)/
    similarity = $1
  end
end

# Print the last alignment, which has no following header to trigger it
puts "Seq '#{test_seq}' has similarity to Seq '#{target_seq}' of #{similarity}" unless (test_seq == "") && (target_seq == "")
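The loop above prints as it goes. If you want the values for further processing, a variant that collects one hash per alignment might look like this sketch. `parse_water` is a hypothetical name, and it assumes water's default pair-style header, where each alignment block has `# 1:`, `# 2:`, `# Identity:`, `# Similarity:` and `# Score:` lines:

```ruby
# Hypothetical helper: parse EMBOSS water text output into an array with
# one Hash per alignment, keyed :a, :b, :identity, :similarity, :score.
def parse_water(text)
  alignments = []
  current = nil
  text.each_line do |line|
    case line
    when /^# Aligned_sequences:/
      # A new alignment header begins: start a fresh record
      current = {}
      alignments << current
    when /^# 1:\s*(\S+)/
      current[:a] = $1 if current
    when /^# 2:\s*(\S+)/
      current[:b] = $1 if current
    when /^# Identity:.*\(\s*([\d.]+)%\)/
      current[:identity] = $1.to_f if current
    when /^# Similarity:.*\(\s*([\d.]+)%\)/
      current[:similarity] = $1.to_f if current
    when /^# Score:\s*([\d.]+)/
      current[:score] = $1.to_f if current
    end
  end
  alignments
end
```

With `result` from the script above, `parse_water(result).each { |h| puts "#{h[:a]} vs #{h[:b]}: #{h[:similarity]}%" }` prints one line per alignment. Starting a new hash at each `# Aligned_sequences` line removes the print-then-reset step and the extra `puts` after the loop.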
Save this in a file called water.rb and run it with frags1.fasta and frags.fasta; it prints the following:
$ ruby water.rb fastas/frags1.fasta frags.fasta
Seq '1' has similarity to Seq '1' of 100.0
Seq '1' has similarity to Seq '2' of 96.6
Seq '1' has similarity to Seq '3' of 64.3
Seq '1' has similarity to Seq '4' of 97.9
Seq '1' has similarity to Seq '5' of 96.9
Seq '1' has similarity to Seq '6' of 94.1
Seq '1' has similarity to Seq '7' of 62.5
Seq '1' has similarity to Seq '8' of 61.1
Seq '1' has similarity to Seq '9' of 62.5
Seq '1' has similarity to Seq '10' of 57.1
Seq '1' has similarity to Seq '11' of 57.4
Seq '1' has similarity to Seq '12' of 97.8
Seq '1' has similarity to Seq '13' of 50.0
Seq '1' has similarity to Seq '14' of 62.5
Seq '1' has similarity to Seq '15' of 97.9
Seq '1' has similarity to Seq '16' of 62.5
Seq '1' has similarity to Seq '17' of 59.1
Seq '1' has similarity to Seq '18' of 55.9
Seq '1' has similarity to Seq '19' of 61.9
Seq '1' has similarity to Seq '20' of 60.0
Seq '1' has similarity to Seq '21' of 56.4
Seq '1' has similarity to Seq '22' of 56.2
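With many targets you will often only care about the close matches. The sketch below is a hypothetical post-processing helper, `close_hits`, that reads lines in the format printed by water.rb and keeps the targets at or above a similarity threshold, best first. The 90.0 cut-off used below is an arbitrary example, not a recommended value:

```ruby
# Matches lines like: Seq '1' has similarity to Seq '4' of 97.9
HIT_LINE = /^Seq '([^']*)' has similarity to Seq '([^']*)' of ([\d.]+)/

# Hypothetical helper: return [target, similarity] pairs whose similarity
# is at least min_similarity, sorted from most to least similar.
# Lines that do not match HIT_LINE are ignored.
def close_hits(lines, min_similarity)
  hits = lines.map do |line|
    m = HIT_LINE.match(line)
    next nil unless m
    similarity = m[3].to_f
    similarity >= min_similarity ? [m[2], similarity] : nil
  end
  hits.compact.sort_by { |_target, similarity| -similarity }
end
```

If you redirect the output of water.rb to a file (the name `water_output.txt` is just an example), `close_hits(File.readlines('water_output.txt'), 90.0)` would keep seven of the 22 targets shown above (1, 2, 4, 5, 6, 12 and 15).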
Water is the worst name for a program, EVER, because it is impossible to Google…



November 20th, 2009