Remove Invalid XML Characters with an SSIS Visual Basic Script

By kenglish

My boss is forcing us to use Microsoft SQL Server Integration Services (SSIS) for our ETL process. I Googled around for a bit and could not find a good example of how to do this simple task: open an XML file, read the text, replace any invalid characters and write it back out to the same file. My VB is very rusty but this works pretty well. The ReadVariable portion is specific to SSIS but the rest should be generic. Hopefully, the next poor soul who needs to do this will be able to find this blog entry!

 
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.Text.RegularExpressions
Imports System.Object
 
 
Public Class ScriptMain
 
    Public Sub Main()
        Dts.TaskResult = Dts.Results.Success
        Dim strPath, strXML As String
        strPath = CStr(ReadVariable("User::strFileName"))
        strXML = FileIO.FileSystem.ReadAllText(strPath)
 
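        ' Match the control characters that XML 1.0 does not allow (tab, LF and CR are fine)
        Dim rgx As Regex = New Regex("[\x00-\x08\x0B-\x0C\x0E-\x1F]", RegexOptions.None)
        ' Replace returns a new string, so the result has to be assigned back
        strXML = rgx.Replace(strXML, " ")
        FileIO.FileSystem.WriteAllText(strPath, strXML, False)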
    End Sub
 
    Private Function ReadVariable(ByVal varName As String) As Object
        Dim result As Object
 
        Try
            Dim vars As Variables
            Dts.VariableDispenser.LockForRead(varName)
            Dts.VariableDispenser.GetVariables(vars)
            Try
                result = vars(varName).Value
            Catch ex As Exception
                Throw ex
            Finally
                vars.Unlock()
            End Try
        Catch ex As Exception
            Throw ex
        End Try
 
        Return result
    End Function
 
End Class
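
For anyone who needs the same cleanup outside of SSIS, here is a minimal Ruby sketch of the idea. It uses the same character class as the VB script; the helper names strip_invalid_xml_chars and clean_xml_file are made up for this example, and it assumes the file is small enough to read into memory in one go.

```ruby
# Control characters that XML 1.0 forbids (tab, LF and CR are allowed).
INVALID_XML_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

# Returns a copy of text with each invalid character replaced by a space.
# (Hypothetical helper name, not part of any library.)
def strip_invalid_xml_chars(text)
  text.gsub(INVALID_XML_CHARS, " ")
end

# Reads the file at path, cleans it and writes it back to the same path.
def clean_xml_file(path)
  cleaned = strip_invalid_xml_chars(File.read(path))
  File.write(path, cleaned)
end
```

Note that gsub returns a new string, just like Regex.Replace in .NET, so the result has to be kept.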

Category: Programming | No Comments | November 24th, 2009

Install NetBeans jVi plugin

By kenglish

NetBeans is the only IDE with a great VI key binding plugin. This was the sole reason I switched to NetBeans as my Ruby/Rails IDE of choice this year. There are 2 vi plugins for Eclipse: one you have to pay for and the other relies on gVIM. The NetBeans Vi plugin is called jVi and can be found at http://jvi.sourceforge.net. There are a few caveats and extra configuration options that you need to set.

  1. Download the latest jVi release. Unzip the file in your home directory. This will create the directory nbvi-1.2.6.
  2. In the NetBeans menu bar, select Tools | Plugins. Click on the Downloaded tab.
  3. Press the “Add Plugin” button. Browse to the nbvi-1.2.6 directory and select the two files: org-netbeans-modules-jvi.nbm and com-raelity-jvi.nbm. You should now have 2 plugins available in the downloaded list: jVi Key Bindings and jVi Core.
  4. Click Install and then click through the installation process. NetBeans will need to restart.
  5. After NetBeans has restarted, from the menu bar, select Tools | Options. Click on the last tab which should be jVi Config.
  6. Select the “Buffer Modifications” panel in the jVi Config screen.
  7. Make sure the ‘expandtab’ value is checked.
  8. Change the value of ‘shiftwidth’ to 2.
  9. Change the value of ‘tabstop’ to 2.

The last three options (expandtab, shiftwidth and tabstop) are pretty important. If you don’t set them, jVi will insert tabs into your Ruby files. This could cause your co-workers to complain about you ruining the formatting in the project.

Congratulations, jVi is now set up in NetBeans. Enjoy.

This blog entry was written for NetBeans 6.7 and with jVi version 1.2.6.

Category: Programming | 3 Comments | November 22nd, 2009

Parsing Emboss Water output with Ruby

By kenglish

First, you will need to install the emboss suite on your computer:

sudo apt-get install emboss emboss-lib

If you don’t already have BioRuby installed, you will need that too:

sudo gem install bio --no-ri --no-rdoc

Your first Ruby script calling Emboss Water:

require 'rubygems'
require 'bio'
test_filename = ARGV.shift
target_filename = ARGV.shift
result = Bio::EMBOSS.run('water', '-asequence', test_filename, '-bsequence', target_filename)

Unfortunately, there is no nice report class in BioRuby for Emboss Water results, so you will have to parse the output yourself. Here’s an example script that finds the percent similarity:

require 'rubygems'
require 'bio'
test_filename = ARGV.shift
target_filename = ARGV.shift
result = Bio::EMBOSS.run('water', '-asequence', test_filename, '-bsequence', target_filename)
# result now has the text output of water...
# Here's an example of looping through each line of the result to get the similarity:
 
test_seq = ""
target_seq = ""
similarity = ''
 
 
result.split("\n").each do |line|
  # This line marks the start of a new alignment, so print the previous one (if any) and reset
  if line =~ /^# Aligned_sequences/
    puts "Seq '#{test_seq}' has similarity to Seq '#{target_seq}' of #{similarity}"  unless (test_seq == "" ) && (target_seq == "")
    test_seq = ""
    target_seq = ""
  end
  # Get sequence numbers
  if line =~ /^# (\d+): (\d+)/
    test_seq = $2 if $1 == '1'
    target_seq = $2 if $1 == '2'
  end
  # parse similarity
  if line =~ /^# Similarity:.*\((.*)%\)/
    similarity  = $1
  end
end
 
puts "Seq '#{test_seq}' has similarity to Seq '#{target_seq}' of #{similarity}"

Place this in a file called water.rb and run it against frags1.fasta and frags.fasta. It will print something like this:

$ ruby water.rb fastas/frags1.fasta frags.fasta 
Seq '1' has similarity to Seq '1' of 100.0
Seq '1' has similarity to Seq '2' of 96.6
Seq '1' has similarity to Seq '3' of 64.3
Seq '1' has similarity to Seq '4' of 97.9
Seq '1' has similarity to Seq '5' of 96.9
Seq '1' has similarity to Seq '6' of 94.1
Seq '1' has similarity to Seq '7' of 62.5
Seq '1' has similarity to Seq '8' of 61.1
Seq '1' has similarity to Seq '9' of 62.5
Seq '1' has similarity to Seq '10' of 57.1
Seq '1' has similarity to Seq '11' of 57.4
Seq '1' has similarity to Seq '12' of 97.8
Seq '1' has similarity to Seq '13' of 50.0
Seq '1' has similarity to Seq '14' of 62.5
Seq '1' has similarity to Seq '15' of 97.9
Seq '1' has similarity to Seq '16' of 62.5
Seq '1' has similarity to Seq '17' of 59.1
Seq '1' has similarity to Seq '18' of 55.9
Seq '1' has similarity to Seq '19' of 61.9
Seq '1' has similarity to Seq '20' of 60.0
Seq '1' has similarity to Seq '21' of 56.4
Seq '1' has similarity to Seq '22' of 56.2
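
If you would rather collect the results than print them as you go, the same line matching can fill an array of hashes. This is only a sketch: parse_water is a made-up helper name, and SAMPLE is a hand-written snippet in the shape of the header lines the script above matches, not real water output, so you can try it without Emboss installed.

```ruby
# Hand-written sample shaped like water's report header lines (an assumption,
# not captured from a real run).
SAMPLE = <<~REPORT
  # Aligned_sequences: 2
  # 1: 1
  # 2: 1
  # Similarity:    29/29 (100.0%)
  # Aligned_sequences: 2
  # 1: 1
  # 2: 3
  # Similarity:    18/28 (64.3%)
REPORT

# Returns one hash per alignment with the keys :test, :target and :similarity.
def parse_water(report)
  alignments = []
  report.each_line do |line|
    case line
    when /^# Aligned_sequences/
      # A new alignment block starts here.
      alignments << {}
    when /^# ([12]): (\S+)/
      next if alignments.empty?
      key = $1 == "1" ? :test : :target
      alignments.last[key] = $2
    when /^# Similarity:.*\(\s*([\d.]+)%\)/
      alignments.last[:similarity] = $1.to_f unless alignments.empty?
    end
  end
  alignments
end

parse_water(SAMPLE).each do |a|
  puts "Seq '#{a[:test]}' has similarity to Seq '#{a[:target]}' of #{a[:similarity]}"
end
```

With real output you would pass in the result string returned by Bio::EMBOSS.run instead of SAMPLE.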

Water is the worst name for a program, EVER. Because it is impossible to Google…

Category: Bioinformatics, Programming | No Comments | November 20th, 2009

Defining Class methods in a Module

By kenglish

The code should speak for itself. Make sense?

module Loveable
  module ClassMethods
    def give_hug
    end
  end
  def self.included(base)
    base.extend(ClassMethods)
  end
end
 
class Person
  include Loveable
 
  give_hug
 
end

I’m fuzzy as to why a certain Rails genius would suggest it is better to do it this way
(see line 7, the base.send call):

module Loveable
  module ClassMethods
    def give_hug
    end
  end
  def self.included(base)
    base.send :extend, ClassMethods
  end
end
 
class Person
  include Loveable
 
  give_hug
 
end
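
As far as I can tell, the two versions above behave the same. extend is a public method, so calling it directly works; send is only needed to reach private methods (include used to be private on modules in older Rubies, which may be where the habit comes from). Here is a small sketch to check that claim. Huggable and Robot are made-up names for the comparison, and give_hug returns a string here only so there is something to compare.

```ruby
module Loveable
  module ClassMethods
    def give_hug
      "#{name} gives a hug"
    end
  end

  # Direct call: extend is public, so nothing special is needed.
  def self.included(base)
    base.extend(ClassMethods)
  end
end

# Hypothetical twin of Loveable that goes through send instead.
module Huggable
  def self.included(base)
    base.send :extend, Loveable::ClassMethods
  end
end

class Person
  include Loveable
end

class Robot
  include Huggable
end

puts Person.give_hug
puts Robot.give_hug
# Both classes end up with ClassMethods mixed into their singleton class.
puts Person.singleton_class.include?(Loveable::ClassMethods)
puts Robot.singleton_class.include?(Loveable::ClassMethods)
```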

Feel free to comment…

Category: Programming | No Comments | November 20th, 2009

Install thoughtbot shoulda and rcov (the right way)

By kenglish

Install rcov & ruby-prof (rcov-0.9.6 & ruby-prof-0.7.3 at the time of this writing).

sudo gem install ruby-prof rcov --no-ri --no-rdoc

Update your test/test_helper.rb, add:

require 'shoulda/rails'

Install Thoughtbot’s Shoulda gem (shoulda-2.10.2 at the time of this writing). Make sure you have added GemCutter as one of your Ruby gem sources.

sudo gem install shoulda --no-ri --no-rdoc

Edit your application’s main Rakefile and add:

require(File.join(File.dirname(__FILE__), 'config', 'boot'))
require 'rake'
require 'rake/testtask'
require 'rake/rdoctask'
require 'tasks/rails'
require 'shoulda/tasks'
 
def run_coverage(files)
  # coverage is a directory, so it needs rm_rf; rm_f would silently leave it in place
  rm_rf "coverage"
  rm_f "coverage.data"
 
  # turn the files we want to run into a  string
  if files.length == 0
    puts "No files were specified for testing"
    return
  end
 
  files = files.join(" ")
 
  if RUBY_PLATFORM =~ /darwin/
    exclude = '--exclude "gems/*"'
  else
    exclude = '--exclude "rubygems/*"'
  end
 
  rcov = "rcov --rails -Ilib:test --sort coverage --text-report #{exclude}  --aggregate coverage.data"
  cmd = "#{rcov} #{files}"
  puts cmd
  sh cmd
end
namespace :test do
 
  desc "Measures unit, functional, and integration test coverage"
  task :coverage do
    run_coverage Dir["test/**/*.rb"]
  end
 
  namespace :coverage do
    desc "Runs coverage on unit tests"
    task :units do
      run_coverage Dir["test/unit/**/*.rb"]
    end
    desc "Runs coverage on functional tests"
    task :functionals do
      run_coverage Dir["test/functional/**/*.rb"]
    end
    desc "Runs coverage on integration tests"
    task :integration do
      run_coverage Dir["test/integration/**/*.rb"]
    end
  end
end

Check out your new coverage rake tasks:

rake -T | grep cov

Should show you:

rake test:coverage                        # Measures unit, functional, and integration test coverage
rake test:coverage:functionals            # Runs coverage on functional tests
rake test:coverage:integration            # Runs coverage on integration tests
rake test:coverage:units                  # Runs coverage on unit tests

Category: Programming | 1 Comment | November 19th, 2009

Already some haters of Google Closure

By kenglish

I saw this article on SitePoint about Google Closure: How not to write JavaScript. The author claims that Closure is just Java programmers trying to make JavaScript like Java. Having spent a lot of time doing ExtJS over the past few months, I’ve grown rather fond of JavaScript. I would say the worst part about it is the scoping problems.

Category: Programming | No Comments | November 15th, 2009

Bash script to copy files in order to my Coby mp305

By kenglish

I’m one of those people who refuse to get an iPod. I think they are too expensive and they don’t play nice with Linux.

Last Christmas, I bought myself the 4GB Coby mp305 because it has more capacity than the Sandisk Sansa m200. The interface is crap compared to the Sansa m200. It doesn’t read the mp3 ID3 tags at all. The navigation tree is simply the directory structure.

The major flaw is that it does not always sort the files in a directory in the correct order. I finally figured out that it lists files in the order that they were put on the device. However, for some reason, if you do “cp -R” in Linux, it doesn’t put them on in the proper order.

Here’s my script to put files on the device. It’s called coby_copy.sh:

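#!/bin/bash
 
# Note the space after "!" and the quotes, so paths with spaces work
if [ ! -d "$1" ]; then
   echo "Source directory does not exist"
   exit 1
fi
 
if [ ! -d "$2" ]; then
   echo "Target directory does not exist"
   exit 1
fi
 
echo "arg1 = $1 arg2 = $2"
 
# Split the find output on newlines only, so file names with spaces survive
IFS=`echo -en "\n\b"`
 
# Copy one file at a time in sorted order; the player lists files in the order they were written
for FILENAME in `find "$1" -type f -iname "*mp3" -print | sort | sed 's/^\.\///'` 
do
  DIR=`dirname "$FILENAME"`
  mkdir -p "$2/$DIR"
  echo "$FILENAME"
  cp "$FILENAME" "$2/$DIR"
done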

To run it:

coby_copy.sh "Harry Potter and Leopard-Walk-Up-to-Dragon"  "/mnt/disk/Audio Books"
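
Since most of this blog is Ruby, here is a rough Ruby sketch of the same idea: collect the mp3 files, sort the paths, then copy them one at a time so they land on the device in that order. copy_in_order is a made-up name, and the `base:` option of Dir.glob assumes a reasonably recent Ruby (2.5 or later, as far as I know). Like the bash script, it recreates the source directory under the target.

```ruby
require "fileutils"

# Copies every file ending in "mp3" (any case) under source into
# target/<name of source>/..., one file at a time in sorted path order.
# Returns the relative paths in the order they were copied.
def copy_in_order(source, target)
  files = Dir.glob("**/*", base: source)
             .select { |f| File.file?(File.join(source, f)) }
             .select { |f| f.downcase.end_with?("mp3") }
             .sort

  files.each do |relative|
    destination = File.join(target, File.basename(source), relative)
    FileUtils.mkdir_p(File.dirname(destination))
    FileUtils.cp(File.join(source, relative), destination)
  end
  files
end
```

Calling copy_in_order("Harry Potter and Leopard-Walk-Up-to-Dragon", "/mnt/disk/Audio Books") would mirror the bash example above.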

Category: Linux, Programming | 1 Comment | August 18th, 2009

MsSql: Select table column names

By kenglish

Sometimes I need to match table column names in Microsoft SQL Server. This seems to be the best way to do it:

SELECT COLUMN_NAME, TABLE_NAME 
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE '%COST%'

Category: Programming | No Comments | August 18th, 2009

Install ruby gem libxml-ruby on Ubuntu 9.04 (Jaunty)

By kenglish

Quick note on how to install the libxml-ruby gem on Ubuntu:

sudo apt-get install libxml2 libxml2-dev
sudo gem install libxml-ruby

Category: Linux, Programming | No Comments | August 18th, 2009

What makes a really good Ruby IDE?

By kenglish

Chad Woolley over at Pivotal has a blog entry about The Great Ruby IDE Smackdown of ’09. He compares the IDEs by doing a task that no Rails developer will ever need to do. What do we really do all day? Model, View, Controller, Test, routes, etc. The IDE should provide an easy way to switch between these. NetBeans does this. Aptana does this. RubyMine does this. They are all functional and, when used properly, very efficient. What really matters to me? Vi integration. NetBeans has this with jVi. I love it.

I got a kick out of this:

“To me, the benefits of a memory- and processor-sucking IDE with tons of unnecessary, unconfigurable, resource-eating tiny-ass-fonts and chrome did not justify giving up the speed and responsiveness of a great text editor.”

Memory- and processor-sucking IDE? Is he running a 486dx? Are Macs really that slow? Dude, switch to Linux! Or, here’s 10 Reasons You Should Not Switch To Linux.

Here’s another nice feature of NetBeans that your text editor won’t give you. Notice on line 127, I have a misspelling of the word worksheet. NetBeans bolds the misspelled variable to tell me that this name has never been used before. This is very helpful.

[Screenshot: NetBeans-coolness]

Category: Programming | No Comments | July 16th, 2009