=head1 NAME

iPE::SequenceReader - Base class for sequence reader objects.

=head1 DESCRIPTION

This is the base class for all sequence reader objects. All sequences should be read in with a sequence reader that supports the functions defined below. FASTA.pm is the canonical example; deriving readers from this class lets other sequence file formats be read through the same interface.

=head1 FUNCTIONS

=over 8

=cut

package iPE::SequenceReader;
use strict;

=item new(memberHash)

The constructor takes a hash reference of member values. The file to read is named by the filename key. If the caller has already opened the file, its filehandle can be passed in the fh key as a typeglob reference, for example \*STDIN. The constructor dies on failure.

The base class handles opening the file. A subclass may call this constructor to open the file, set the filehandle itself, or override the constructor completely.

The following keys are required to instantiate iPE::SequenceReader:

=over 8

=item filename

The name of the file to parse. This key is required even when fh is supplied.

=back

The following are optional keys:

=over 8

=item fh

Filehandle for the file, if it has already been opened. When fh is not given, the constructor opens filename for reading.

=item split_string

If the split_string key is defined, each sequence is split on that string into an array, which can be accessed via arrRef ().

This key is optional because most formats do not need it. Quality sequences in standard FASTA files are one example of where it is used.

=back
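
The following minimal sketch, which is not taken from FASTA.pm, shows how a subclass might build the array behind arrRef () when split_string is given. The helper name split_sequence is hypothetical, and treating split_string as a literal string rather than a regular expression is an assumption.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: split the sequence behind $seqRef on a literal
    # $split_string and return a reference to the resulting array.
    # Returns undef when no split_string was supplied.
    sub split_sequence {
        my ($seqRef, $split_string) = @_;
        return unless defined $split_string;
        # \Q...\E quotes regex metacharacters so the string is matched
        # literally; grep drops the empty fields that runs of the
        # separator would otherwise produce.
        my @items = grep { length } split /\Q$split_string\E/, $$seqRef;
        return \@items;
    }

    # Example: a quality sequence with space-separated scores.
    my $qual = "40 40  38 12";
    my $arr  = split_sequence(\$qual, ' ');
    print join(',', @$arr), "\n";    # 40,40,38,12

```

Passing the sequence by reference keeps the helper consistent with the rule that large sequences should not be copied.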

=item def (), seqRef (), arrRef ()

def () returns a reference to the definition line of the current sequence.
seqRef () returns a reference to the string of the current sequence.
If split_string was supplied to new (), arrRef () returns a reference to the array of items in the sequence, split on the supplied string.

The base class versions of these methods simply return undef; subclasses must override them if they are to be used.

ALL OF THESE SHOULD RETURN REFERENCES. Duplicating the data is costly and dangerous, since genomic sequences can be very large.
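
A minimal sketch of a subclass that follows these rules for a single-record FASTA file. Demo::BaseReader is a hypothetical stand-in that mirrors only the parts of this constructor the sketch needs, so the example runs without the iPE distribution installed; with the real module the subclass would inherit from iPE::SequenceReader instead. Demo::TinyFasta is likewise hypothetical and is not how FASTA.pm is implemented.

```perl

    use strict;
    use warnings;

    # Hypothetical stand-in for iPE::SequenceReader (constructor only), so the
    # sketch is self-contained.  A real subclass would instead write
    #   use parent 'iPE::SequenceReader';
    package Demo::BaseReader;
    sub new {
        my ($class, $m) = @_;
        die "Incomplete instantiation of $class.\n" unless defined $m->{filename};
        my $fh = $m->{fh};
        if (!defined $fh) {
            open $fh, '<', $m->{filename}
                or die "could not open $m->{filename}: $!\n";
        }
        return bless { filename_ => $m->{filename}, fh_ => $fh,
                       cur_def_ => "", cur_seq_ => "" }, $class;
    }

    # Hypothetical single-record FASTA reader that loads the sequence at
    # construction time and hands out references from its accessors.
    package Demo::TinyFasta;
    use parent -norequire, 'Demo::BaseReader';

    sub new {
        my ($class, $m) = @_;
        my $this = $class->SUPER::new($m);
        my $fh   = $this->{fh_};
        my $def;
        while (my $line = <$fh>) {
            chomp $line;
            if ($line =~ /^>\s*(.*)$/) {
                last if defined $def;         # sketch reads the first record only
                $def = $1;
            }
            elsif (defined $def) {
                $this->{cur_seq_} .= $line;   # append in place, no copies
            }
        }
        $this->{cur_def_} = $def if defined $def;
        return $this;
    }

    # Accessors return references so large sequences are never duplicated.
    sub def    { my $this = shift; return \$this->{cur_def_}; }
    sub seqRef { my $this = shift; return \$this->{cur_seq_}; }
    sub type   { "load" }

    package main;

    my $fasta = ">chr1 test\nACGT\nTTGA\n";
    open my $in, '<', \$fasta or die "cannot open in-memory file: $!";
    my $reader = Demo::TinyFasta->new({ filename => 'in-memory', fh => $in });
    print ${ $reader->def },    "\n";    # chr1 test
    print ${ $reader->seqRef }, "\n";    # ACGTTTGA

```

Callers dereference the returned values with ${ ... }, so no copy of the sequence is made unless the caller explicitly asks for one.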

=cut
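sub new
{
    my ($class, $m) = @_;
    my $this = bless {}, $class;

    die "Incomplete instantiation of $class.\n"
        if( !defined $m->{filename} );

    # Open the file ourselves unless the caller supplied a filehandle.
    # Three-argument open avoids surprises with unusual filenames.
    if( !defined $m->{fh} )  {
        open my $fh, '<', $m->{filename} or die __PACKAGE__.
            ": could not open file $m->{filename} for reading: $!\n";
        $m->{fh} = $fh;
    }

    $this->{filename_}     = $m->{filename};
    $this->{fh_}           = $m->{fh};
    $this->{split_string_} = $m->{split_string};
    $this->{cur_seq_}      = "";
    $this->{cur_arr_}      = [];
    $this->{cur_def_}      = "";
    #$this->{next_def_}    = "";

    return $this;
}

# Close the filehandle when the reader is destroyed.  The handle may be
# undefined if new () died before storing it.  Note that this also closes
# a handle the caller passed in via fh.
sub DESTROY {
    my ($this) = @_;
    my $fh = $this->{fh_};
    CORE::close($fh) if defined $fh;
}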

sub _undefed_subroutine {
    my ($this, $name) = @_;
    die __PACKAGE__." does not define subroutine $name.\n".
       "Override in ".ref($this)."\n";
}

sub filename    { shift->{filename_}    }
sub fh          { shift->{fh_}                      }
sub def         { undef } #shift->_undefed_subroutine("def") }
sub seqRef      { undef } #shift->_undefed_subroutine("seq") }
sub arrRef      { undef } #shift->_undefed_subroutine("arr") }

=item type () 

This returns the type of sequence reader, which is either loading or non-loading. Subclasses should return "load" or "noload". The base class returns the string "undef" (not the undefined value); a reader gets this value only if it inherits type () from iPE::SequenceReader without overriding it, which should not be done.

=cut
sub type { "undef" }

=item numSeqs ()

This function should return the number of sequences in the file currently being read. In most cases this is 1, which the base class returns by default; readers for files that can hold several sequences should override it.
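
For a file that can hold several records, an override might count definition lines. The sketch below is hypothetical: it assumes a FASTA-style format in which each record starts with ">", that the reader keeps its handle in the fh_ slot as this base class does, and that the handle is seekable so the read position can be restored after counting (a pipe such as STDIN is not).

```perl

    use strict;
    use warnings;

    # Hypothetical numSeqs() override for a multi-record FASTA reader.
    # Counts lines beginning with '>' and then restores the read position,
    # so it can be called at any point without disturbing the reader.
    sub numSeqs {
        my ($this) = @_;
        my $fh    = $this->{fh_};
        my $start = tell $fh;
        seek $fh, 0, 0 or die "cannot rewind $this->{filename_}: $!\n";
        my $count = 0;
        while (my $line = <$fh>) {
            $count++ if $line =~ /^>/;
        }
        seek $fh, $start, 0 or die "cannot restore position: $!\n";
        return $count;
    }

    # Example with an in-memory file holding three records.
    my $fasta = ">a\nAC\n>b\nGT\n>c\nTT\n";
    open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";
    my $n = numSeqs({ fh_ => $fh, filename_ => 'in-memory' });
    print "$n\n";    # 3

```

Counting up front costs one extra pass over the file; a loading reader that already parses every record could instead return the number of records it stored.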

=cut
sub numSeqs { 1 }

=item close ()

Close the sequence file. The file is opened when the reader is instantiated (unless an already-open filehandle was supplied) and remains open until close () is called or the reader is destroyed.

=cut
sub close { CORE::close(shift->{fh_}) }

=back

=head1 SEE ALSO

L<iPE::Sequence>

=head1 AUTHOR

Bob Zimmermann (rpz@cs.wustl.edu).

=cut

1;
