| Class | Bio::GCG::Seq |
| In: | lib/bio/appl/gcg/seq.rb |
| Parent: | Object |
Parser class for the GCG sequence file format (.seq or .pep).
www.accelrys.com/products/gcg_wisconsin_package
www.hgmp.mrc.ac.uk/Software/EMBOSS/Themes/SequenceFormats.html
| DELIMITER | = | RS = nil | delimiter used by Bio::FlatFile |
| checksum | [R] | "Check:" field, which indicates checksum of current sequence. |
| date | [R] | Date field of this entry. |
| definition | [R] | Description field. |
| entry_id | [R] | ID field. |
| heading | [R] | Heading line (for example '!!NA_SEQUENCE 1.0'). |
| length | [R] | "Length:" field. Note that this may differ from the actual sequence length. |
| seq_type | [R] | "Type:" field, which indicates sequence type. "N" means nucleic acid sequence, "P" means protein sequence. |
Calculates the checksum of the given string.
    # File lib/bio/appl/gcg/seq.rb, line 141
    def self.calc_checksum(str)
      # Reference: Bio::SeqIO::gcg of BioPerl-1.2.3
      idx = 0
      sum = 0
      str.upcase.tr('^A-Z.~', '').each_byte do |c|
        idx += 1
        sum += idx * c
        idx = 0 if idx >= 57
      end
      (sum % 10000)
    end
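As a worked illustration, the checksum can be reproduced outside the library. The sketch below restates the listing above as a standalone method so that it runs without the bio gem; the name `gcg_checksum` is hypothetical and not part of BioRuby. Each retained character's byte value is multiplied by a position counter that cycles from 1 to 57, and the total is taken modulo 10000.

```ruby
# Standalone restatement of the checksum listing above.
# gcg_checksum is a hypothetical name used only for this sketch.
def gcg_checksum(str)
  idx = 0
  sum = 0
  # After upcasing, keep only A-Z, '.' and '~'; everything else is dropped.
  str.upcase.tr('^A-Z.~', '').each_byte do |byte|
    idx += 1
    sum += idx * byte
    idx = 0 if idx >= 57 # the position weight cycles through 1..57
  end
  sum % 10000
end

puts gcg_checksum('ACGT')    # 1*65 + 2*67 + 3*71 + 4*84 = 748
puts gcg_checksum("ac gt\n") # same residues, so the same checksum
```

Because case, whitespace, and digits are ignored, the formatted sequence body and the raw sequence string should yield the same value.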
Creates a new instance of this class. str must be a string in GCG seq format.
    # File lib/bio/appl/gcg/seq.rb, line 38
    def initialize(str)
      @heading = str[/.*/] # '!!NA_SEQUENCE 1.0' or like this
      str = str.sub(/.*/, '')
      str.sub!(/.*\.\.$/m, '')
      @definition = $&.to_s.sub(/^.*\.\.$/, '').to_s
      desc = $&.to_s
      if m = /(.+)\s+Length\:\s+(\d+)\s+(.+)\s+Type\:\s+(\w)\s+Check\:\s+(\d+)/.match(desc) then
        @entry_id = m[1].to_s.strip
        @length = (m[2] ? m[2].to_i : nil)
        @date = m[3].to_s.strip
        @seq_type = m[4]
        @checksum = (m[5] ? m[5].to_i : nil)
      end
      @data = str
      @seq = nil
      @definition.strip!
    end
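To make the field extraction concrete, the sketch below applies the same regular expression to a single descriptor line (the line ending in ".."). It is a minimal standalone illustration, not the library's parser: `parse_gcg_descriptor` and `GCG_DESCRIPTOR` are hypothetical names, and the sample line is made up.

```ruby
# Regular expression copied from the initialize listing above.
GCG_DESCRIPTOR = /(.+)\s+Length\:\s+(\d+)\s+(.+)\s+Type\:\s+(\w)\s+Check\:\s+(\d+)/

# parse_gcg_descriptor is a hypothetical helper, not a BioRuby method.
# It returns a hash of the five fields, or nil if the line does not match.
def parse_gcg_descriptor(line)
  m = GCG_DESCRIPTOR.match(line)
  return nil unless m

  {
    entry_id: m[1].strip,
    length:   m[2].to_i,
    date:     m[3].strip,
    seq_type: m[4],
    checksum: m[5].to_i
  }
end

# Made-up descriptor line, for illustration only.
line = 'gi-1234567  Length: 20  January 01, 2000 12:00  Type: N  Check: 1234  ..'
p parse_gcg_descriptor(line)
```

The greedy `(.+)` groups can capture trailing whitespace, which is why the entry ID and date are stripped after matching.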
Creates new text in GCG sequence format from the given hash. Parameters can be omitted.
Example:
    Bio::GCG::Seq.to_gcg(:definition=>'H.sapiens DNA',
                         :seq_type=>'N', :entry_id=>'gi-1234567',
                         :seq=>seq, :date=>date)
    # File lib/bio/appl/gcg/seq.rb, line 161
    def self.to_gcg(hash)
      seq = hash[:seq]
      if seq.is_a?(Bio::Sequence::NA) then
        seq_type = 'N'
      elsif seq.is_a?(Bio::Sequence::AA) then
        seq_type = 'P'
      else
        seq_type = (hash[:seq_type] or 'P')
      end
      if seq_type == 'N' then
        head = '!!NA_SEQUENCE 1.0'
      else
        head = '!!AA_SEQUENCE 1.0'
      end
      date = (hash[:date] or Time.now.strftime('%B %d, %Y %H:%M'))
      entry_id = hash[:entry_id].to_s.strip
      len = seq.length
      checksum = self.calc_checksum(seq)
      definition = hash[:definition].to_s.strip
      seq = seq.upcase.gsub(/.{1,50}/, "\\0\n")
      seq.gsub!(/.{10}/, "\\0 ")
      w = len.to_s.size + 1
      i = 1
      seq.gsub!(/^/) { |x| s = sprintf("\n%*d ", w, i); i += 50; s }

      [ head, "\n", definition, "\n\n",
        "#{entry_id} Length: #{len} #{date} " \
        "Type: #{seq_type} Check: #{checksum} ..\n",
        seq, "\n" ].join('')
    end
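The body layout produced by the listing above (50 residues per line, in blocks of 10, each line prefixed with the 1-based position of its first residue) can be sketched without regular expressions. This is a simplified illustration under stated assumptions, not the library code: `gcg_body_lines` is a hypothetical helper, and unlike to_gcg it returns an array of lines and emits neither the trailing space after each block nor the blank lines between rows.

```ruby
# gcg_body_lines is a hypothetical helper, not a BioRuby method.
# It returns the numbered sequence lines as an array of strings.
def gcg_body_lines(seq, per_line = 50, block = 10)
  # Same column width as in to_gcg: digits of the length, plus one.
  width = seq.length.to_s.size + 1
  lines = []
  seq.upcase.chars.each_slice(per_line).with_index do |chunk, n|
    # Group the residues of this line into space-separated blocks.
    blocks = chunk.each_slice(block).map(&:join).join(' ')
    # Right-align the position of the first residue on the line.
    lines << format('%*d %s', width, n * per_line + 1, blocks)
  end
  lines
end

puts gcg_body_lines('acgt' * 15) # 60 residues, so two lines starting at 1 and 51
```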
Use this method if you know the sequence is an amino acid (AA) sequence. Returns a Bio::Sequence::AA object.
A RuntimeError is raised if naseq is called on a protein sequence, or aaseq on a nucleic acid sequence.
    # File lib/bio/appl/gcg/seq.rb, line 108
    def aaseq
      if seq.is_a?(Bio::Sequence::AA) then
        @seq
      else
        raise 'seq_type != \'P\''
      end
    end
Use this method if you know the sequence is a nucleic acid (NA) sequence. Returns a Bio::Sequence::NA object.
A RuntimeError is raised if naseq is called on a protein sequence, or aaseq on a nucleic acid sequence.
    # File lib/bio/appl/gcg/seq.rb, line 121
    def naseq
      if seq.is_a?(Bio::Sequence::NA) then
        @seq
      else
        raise 'seq_type != \'N\''
      end
    end
Sequence data. The class of the sequence is Bio::Sequence::NA, Bio::Sequence::AA or Bio::Sequence::Generic, according to the sequence type.
    # File lib/bio/appl/gcg/seq.rb, line 88
    def seq
      unless @seq then
        case @seq_type
        when 'N', 'n'
          k = Bio::Sequence::NA
        when 'P', 'p'
          k = Bio::Sequence::AA
        else
          k = Bio::Sequence
        end
        @seq = k.new(@data.tr('^-a-zA-Z.~', ''))
      end
      @seq
    end
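The two steps in the listing above, cleaning the raw body and choosing a class from the "Type:" field, can be tried in isolation. The sketch below is a standalone illustration and does not load BioRuby: both helper names are hypothetical, and symbols stand in for the sequence classes.

```ruby
# Both helpers are hypothetical; they mirror steps of the seq listing above.

# Remove everything except letters, '-', '.' and '~', which drops
# the position numbers and whitespace of the formatted body.
def clean_gcg_body(data)
  data.tr('^-a-zA-Z.~', '')
end

# Map the "Type:" field to a symbol standing in for the class selection.
def sequence_kind(seq_type)
  case seq_type
  when 'N', 'n' then :nucleic_acid
  when 'P', 'p' then :protein
  else :generic
  end
end

body = "\n  1 acgtacgtac gt..~~\n"
puts clean_gcg_body(body)
p sequence_kind('N')
p sequence_kind(nil) # missing or unknown type falls through to the generic case
```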