rna_library

Package Contents

Classes

BasePair

Enumerated type for canonical and wobble basepairs.

Nucleotide

Enumerated type for all nucleotide types.

MotifType

Enumerated type for all motif types.

Motif

Abstract base class that Hairpin(), Helix(), Junction() and SingleStrand() all inherit from.

Helix

Represents a helix or stack in an RNA structure. Inherits from Motif().

Hairpin

Represents a hairpin loop in an RNA structure. Inherits from Motif().

Junction

Represents a junction of any size in an RNA structure including bulges and multi-loops.

SingleStrand

Represents a single stranded region in an RNA structure. Does not include unpaired regions that are part of a Junction() or Helix().

SecStruct

Represents a

JunctionData

Composite class that represents a collection of JunctionEntry objects in an experiment.

JunctionEntry

Represents a single junction entry from an RNA construct

Functions

parse_to_motifs(structure, sequence)

Method takes a structure sequence pair and returns a root Motif() with a complete associated graph.

highest_id(m, best = 0)

Figures out the highest id number in a given Motif() graph.

build_react_df(**kwargs)

Builds the reactivity dataframe from the supplied arguments. Here each row represents a construct.

build_motif_df(df)

Function that creates a motif dataframe from a reactivity dataframe. Here each row represents a Motif.

normalize_hairpin(df, seq, ss, **kwargs)

Normalizes a reactivity pattern to a normalization hairpin. Creates fully normalized values

normalize_coeff_fit(reactivity_df)

build_barcodes(secstruct, start = None, distance = 3)

Attributes

__author__

__email__

__version__

ALLOWED_PAIRS

A set() containing all 6 canonical and wobble basepairings.

BPS

A tuple() of allowed canonical and wobble basepairings.

LEGAL_BPS

A set() of all 4 allowed nucleotide types.

NTS

A tuple() of all 4 canonical nucleotide types. Ordered

BP_VALS

A list() that contains the integer values for all of the

BASEPAIR_MAPPER

A dict() object that maps a canonical basepair to its

NT_VALS

A list() that contains the integer values for all of the

NUCLEOTIDE_MAPPER

A dict() object that maps a canonical nucleotide to its

TYPE_MAPPER

A dict() object that maps a MotifType to its value

rna_library.__author__ = Chris Jurich
rna_library.__email__ = cjurich2@huskers.unl.edu
rna_library.__version__ = 0.1.0
rna_library.ALLOWED_PAIRS

A set() containing all 6 canonical and wobble basepairings.

rna_library.BPS = ['GU', 'UG', 'AU', 'UA', 'GC', 'CG']

A tuple() of allowed canonical and wobble basepairings. Ordered for easy conversion by the BasePair() class.

rna_library.LEGAL_BPS

A set() of all 4 allowed nucleotide types.

rna_library.NTS = ['A', 'C', 'G', 'U']

A tuple() of all 4 canonical nucleotide types. Ordered for easy conversion by the Nucleotide() class.

class rna_library.BasePair

Bases: enum.IntEnum

Enumerated type for canonical and wobble basepairs.

GU = 0
UG = 1
AU = 2
UA = 3
GC = 4
CG = 5
is_GU(self)
Returns

If the instance is a UG or GU pair.

Return type

bool()

is_AU(self)
Returns

If the instance is a UA or AU pair.

Return type

bool()

is_GC(self)
Returns

If the instance is a CG or GC pair.

Return type

bool()

is_canoncial(self)
Returns

If the instance is a canonical Watson-Crick basepair.

Return type

bool()

to_str(self)
Returns

The BasePair() instance in text form.

Return type

str()
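
The enumeration above can be illustrated with a small stand-in. This is a minimal sketch, not the library's source: BasePairSketch is a hypothetical name, and the assumption that to_str() returns the member name is not confirmed by this reference.

```python
from enum import IntEnum


class BasePairSketch(IntEnum):
    """Hypothetical stand-in that mirrors the documented BasePair values."""

    GU = 0
    UG = 1
    AU = 2
    UA = 3
    GC = 4
    CG = 5

    def is_GU(self) -> bool:
        # True for either orientation of the wobble pair.
        return self in (BasePairSketch.GU, BasePairSketch.UG)

    def is_AU(self) -> bool:
        return self in (BasePairSketch.AU, BasePairSketch.UA)

    def is_GC(self) -> bool:
        return self in (BasePairSketch.GC, BasePairSketch.CG)

    def to_str(self) -> str:
        # Assumption: the text form is simply the member name.
        return self.name


pair = BasePairSketch["UG"]  # look up a member by its two-letter name
print(pair.is_GU(), pair.to_str(), int(pair))
```

Because the type derives from IntEnum, members compare equal to their integer values, which is what makes list- or tuple-based lookups such as BPS practical.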

rna_library.BP_VALS

A list() that contains the integer values for all of the BasePair() enumerations.

rna_library.BASEPAIR_MAPPER

A dict() object that maps a canonical basepair to its BasePair() value.

class rna_library.Nucleotide

Bases: enum.IntEnum

Enumerated type for all nucleotide types.

A = 0
C = 1
G = 2
U = 3
to_str(self)
Returns

The Nucleotide() instance in text form.

Return type

str()

rna_library.NT_VALS

A list() that contains the integer values for all of the Nucleotide() enumerations.

rna_library.NUCLEOTIDE_MAPPER

A dict() object that maps a canonical nucleotide to its Nucleotide() value.
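
To show how the three nucleotide attributes relate to each other, here is a minimal sketch that rebuilds them locally. The names ending in _SKETCH are hypothetical stand-ins, and building the mapper with a dict comprehension is an assumption about construction, not the library's code.

```python
from enum import IntEnum


class NucleotideSketch(IntEnum):
    """Hypothetical stand-in mirroring the documented Nucleotide values."""

    A = 0
    C = 1
    G = 2
    U = 3


# Ordered so that the tuple index equals the enum value.
NTS_SKETCH = ("A", "C", "G", "U")

# Integer values of every enumeration member, in definition order.
NT_VALS_SKETCH = [int(nt) for nt in NucleotideSketch]

# Letter -> enum member lookup.
NUCLEOTIDE_MAPPER_SKETCH = {letter: NucleotideSketch[letter] for letter in NTS_SKETCH}

# Encode a short sequence as integers.
encoded = [int(NUCLEOTIDE_MAPPER_SKETCH[nt]) for nt in "GACU"]
print(encoded)
```

The ordering of the tuple matters: indexing NTS_SKETCH with an enum value recovers the letter, so conversion works in both directions.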

class rna_library.MotifType

Bases: enum.IntEnum

Enumerated type for all motif types.

UNASSIGNED = 0
SINGLESTRAND = 1
HELIX = 2
HAIRPIN = 3
JUNCTION = 4
rna_library.TYPE_MAPPER

A dict() object that maps a MotifType to its value as a str().

class rna_library.Motif(**kwargs)

Bases: abc.ABC

Abstract base class that Hairpin(), Helix(), Junction() and SingleStrand() all inherit from.

Method used to link a Motif() object to its children and vice versa. Should only be called once by the root Motif().

Parameters

depth (int) – depth of the current Motif() object. defaults to 0

Return type

None
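
The linking step described above can be sketched on a toy tree. This is an illustration under stated assumptions: NodeSketch and link_sketch are hypothetical names, and the recursion shown is one plausible way to set parents and depths, not the library's implementation.

```python
class NodeSketch:
    """Hypothetical minimal node with the fields the linking step touches."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.parent = None
        self.depth = 0


def link_sketch(node, depth=0):
    # Record this node's depth, then point each child back at it and recurse.
    node.depth = depth
    for child in node.children:
        child.parent = node
        link_sketch(child, depth + 1)


leaf = NodeSketch("hairpin")
mid = NodeSketch("helix", [leaf])
root = NodeSketch("singlestrand", [mid])

link_sketch(root)  # called once, on the root only
print(leaf.depth, leaf.parent.name)
```

Calling the function on a non-root node would reset depths relative to that node, which is why the reference says it should only be called once by the root.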

str(self)

Creates a recursive string representation of the current Motif() object.

Return type

str()

__eq__(self, other)

Overloaded == operator for Motif(). Requires that type of motif, sequence and token are identical.

Parameters

other (Motif) – Another Motif() to be compared against.

Return type

bool

__str__(self)

String representation of just the motif at hand.

Returns

The str() representation of the Motif().

Return type

str()

is_helix(self)

If the motif is a helix or not. Overridden by child Helix() class.

Returns

If the motif is of type Helix()

Return type

bool()

is_singlestrand(self)

If the motif is a singlestrand or not. Overridden by child SingleStrand() class.

Returns

If the motif is of type SingleStrand()

Return type

bool()

is_hairpin(self)

If the motif is a hairpin or not. Overridden by child Hairpin() class.

Returns

If the motif is of type Hairpin()

Return type

bool

is_junction(self)

If the motif is a junction or not. Overridden by child Junction() class.

Returns

If the motif is of type Junction()

Return type

bool()

type(self)

Returns the MotifType() type for the given motif.

Returns

The MotifType() enum value for the given motif.

Return type

MotifType()

children(self)

Getter for the Motif()’s child motifs. Returned as a list for iteration. Only returns direct children, or an empty list if the motif has no children.

Returns

A list() of Motif() if the current Motif() has any.

Return type

list[Motif]

add_child(self, other)

Appends a new Motif() to the internal list of children for the current Motif().

Warning

Should NOT be called directly. Other function calls must occur to ensure that the internal graph is accurate.

Parameters

other (Motif) – Another Motif() to be appended to the internal children list.

Return type

None

set_children(self, other)

Sets the entire list of Motif() to the internal list of children for the current Motif().

Warning

Should NOT be called directly. Other function calls must occur to ensure that the internal graph is accurate.

Parameters

other (List[Motif]) – The list of Motif() objects that replaces the internal children list.

Return type

None

parent(self, other)

Sets the Motif()’s parent to the supplied Motif().

Parameters

other (Motif) – The new parent for the current Motif().

Returns

None

Return type

NoneType

parent(self)

Gets the parent Motif() of the current Motif().

Returns

the parent motif

Return type

Motif()

token(self, tk)

Sets the Motif()’s identifying token to an inputted string. Input is NOT validated.

Parameters

tk (str) – the new token for the Motif().

Returns

None

Return type

NoneType

token(self)

Gets the identifying token for the Motif().

Returns

token

Return type

str

structure(self, secstruct)

Sets the Motif()’s structure to an inputted string. Input is NOT validated.

Parameters

secstruct (str) – the new structure for the Motif().

Returns

None

Return type

NoneType

structure(self)

Gets the secondary structure for the Motif().

Returns

structure

Return type

str

strands(self)

Returns a list of lists of int() where each sublist contains a contiguous set of nucleotide indices that “belong” to the Motif(). Output varies by motif type.

Returns

strands

Return type

List[List[int]]

sequence(self)

Gets the sequence for the Motif(). Because the nucleotides owned by the Motif() may not be contiguous, breaks will be separated by an ampersand ‘&’.

Returns

sequence

Return type

str
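
The relationship between strands() and the ampersand-separated sequence can be sketched as follows. This is a minimal illustration, assuming 0-based nucleotide indices; motif_sequence_sketch is a hypothetical helper, not part of the library.

```python
def motif_sequence_sketch(full_sequence, strands):
    """Join each contiguous strand's letters, separating strands with '&'."""
    return "&".join(
        "".join(full_sequence[i] for i in strand) for strand in strands
    )


full_sequence = "GGGAAACCC"           # pairs with the structure (((...)))
helix_strands = [[0, 1, 2], [6, 7, 8]]  # two sides of the helix
loop_strands = [[3, 4, 5]]              # single contiguous run

print(motif_sequence_sketch(full_sequence, helix_strands))
print(motif_sequence_sketch(full_sequence, loop_strands))
```

A motif with a single contiguous run produces no ampersand at all, so the separator only appears where the owned nucleotides are interrupted.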

sequence(self, seq)

Sets the sequence for the Motif() to the supplied string. Warning: the input is NOT validated.

Parameters

seq (str) – the new sequence for the Motif().

id(self)

Gets the id int value for the given Motif().

Returns

id

Return type

int

id(self, new_id)

Sets the id for the Motif(). Warning: It is NOT currently validated.

Parameters

new_id (int) – the new id for the Motif()

Returns

None

Return type

NoneType

depth(self)

The depth of the Motif(), which describes how deep it is in the internal graph.

Returns

depth

Return type

int

depth(self, value)

Sets the depth of the current Motif().

Parameters

value (int) – the new depth value for the current Motif().

abstract buffer(self)

Buffer refers to the size of the closest adjacent Helix(). Varies by type of motif as seen below:

  • Helix() => size of the helix itself

  • Hairpin() => size of its parent helix

  • SingleStrand() => -1, meaningless in this context

  • Junction() => a list() of the branching helices’ lengths, with the parent helix first and the rest in the direction of increasing nucleotide index.

Returns

buffer

Return type

int

has_children(self)

Returns whether the Motif() has any children.

Returns

has_children

Return type

bool

has_parent(self)

Returns whether the Motif() has a parent.

Returns

has_parent

Return type

bool

abstract recursive_sequence(self)

Builds and returns the contiguous sequence of the structure, viewing the current Motif() as the root of the structure. The returned sequence will be part of the main sequence.

Returns

sequence

Return type

str

abstract recursive_structure(self)

Builds and returns the contiguous structure of the structure, viewing the current Motif() as the root of the structure. The returned structure will be part of the main structure.

Returns

structure

Return type

str

abstract has_non_canonical(self)

Checks if the Motif() has any non-canonical (i.e. not AU/UA, UG/GU or GC/CG) pairings.

Returns

has_nc

Return type

bool

same_pattern(self, sequence)

Checks if a template sequence is compatible with an inputted sequence. Specifically if the length and placement of ‘&’ are the same.

Parameters

sequence (str) – template string to compare against.

Returns

is_same

Return type

bool
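
The compatibility check described above comes down to two conditions: equal length and ampersands at the same positions. Here is a minimal sketch of that rule; same_pattern_sketch is a hypothetical free function and may differ from the method's actual implementation.

```python
def same_pattern_sketch(template, sequence):
    """Return True when both strings have equal length and matching '&' positions."""
    if len(template) != len(sequence):
        return False
    # At every position, either both characters are '&' or neither is.
    return all((a == "&") == (b == "&") for a, b in zip(template, sequence))


print(same_pattern_sketch("GGG&CCC", "AAA&UUU"))  # same layout
print(same_pattern_sketch("GGG&CCC", "AAAA&UU"))  # '&' shifted by one
```

Only the layout is compared; the nucleotide letters themselves are free to differ, which is what lets a barcode sequence be swapped into an existing motif.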

start_pos(self)

Starting (or lowest) nucleotide index owned by the Motif().

Returns

start_pos

Return type

int

end_pos(self)

Ending (or highest) nucleotide index owned by the Motif().

Returns

end_pos

Return type

int

contains(self, pos)

Indicates if a nucleotide index is contained or belongs to the current Motif().

Parameters

pos (list[int]) – the querying index

Returns

is_contained

Return type

bool
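
The three positional queries above (start_pos, end_pos and contains) can all be derived from the strands. The sketch below assumes 0-based indices and takes a single int for pos, even though the reference lists the parameter type as list[int]; the function names are hypothetical.

```python
def start_pos_sketch(strands):
    """Lowest nucleotide index owned by the motif."""
    return min(min(strand) for strand in strands)


def end_pos_sketch(strands):
    """Highest nucleotide index owned by the motif."""
    return max(max(strand) for strand in strands)


def contains_sketch(strands, pos):
    """True when pos appears in any of the motif's strands."""
    return any(pos in strand for strand in strands)


# A helix owning both sides of (((...))) but not the loop in between.
helix_strands = [[0, 1, 2], [6, 7, 8]]

print(start_pos_sketch(helix_strands), end_pos_sketch(helix_strands))
print(contains_sketch(helix_strands, 7), contains_sketch(helix_strands, 4))
```

Index 4 lies between the start and end positions yet is not contained, so a range check on start_pos and end_pos alone would not be a substitute for contains.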

sequences(self, seqs)

Used to set the internal list of barcode temp sequences.

Parameters

seqs (List[str]) – the new barcode sequences to be applied to the current Motif().

Return type

None

number_sequences(self)

Gives the number of barcode sequences that the Motif() currently has.

Returns

num_sequence

Return type

int

set_sequence(self, idx)

Sets the current sequence to the sequence of the existing index from the internal barcodes list. Note that the Motif.number_sequences() method should be queried prior so that the index call will be known to be valid.

Parameters

idx (int) – The index to be used.

Return type

None

abstract generate_sequences(self)

Builds out all possible barcode sequences that fit the known constraints.

is_barcode(self)

Returns whether the current Motif() serves as a barcode.

Returns

is_barcode

Return type

bool

class rna_library.Helix(**kwargs)

Bases: rna_library.structure.motif.Motif

Represents a helix or stack in an RNA structure. Inherits from Motif().

size(self)

Returns the size of the Helix() which is just the number of pairs in the stack.

Returns

size

Return type

int

size(self, val)

Sets the current size for the Helix().

Parameters

val (int) – the new size of the helix.

buffer(self)

Returns the buffer of the Helix() which is just the number of pairs in the stack.

Returns

buffer

Return type

int

pairs(self)

Returns the basepairs in the stack as a list of strings of length 2. Pairs are returned in order of lowest 3 prime starting index.

Returns

pairs

Return type

List[str]

is_helix(self)

Indicates that the Motif() is of type Helix().

Returns

is_helix

Return type

bool

recursive_structure(self)

Builds and returns the contiguous structure of the structure, viewing the current Motif() as the root of the structure. The returned structure will be part of the main structure.

Returns

structure

Return type

str

recursive_sequence(self)

Builds and returns the contiguous sequence of the structure, viewing the current Motif() as the root of the structure. The returned sequence will be part of the main sequence.

Returns

sequence

Return type

str

has_non_canonical(self)

Checks if any of the basepairs are non-canonical (i.e. non- AU/UA, GU/UG, GC/CG).

Returns

has_non_canonical

Return type

bool
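
The check above can be sketched as a membership test against the six allowed pairings. This is a minimal illustration: ALLOWED_PAIRS_SKETCH and has_non_canonical_sketch are hypothetical names, and the set simply restates the pairings listed for BPS.

```python
# The six canonical and wobble pairings listed in this reference.
ALLOWED_PAIRS_SKETCH = {"GU", "UG", "AU", "UA", "GC", "CG"}


def has_non_canonical_sketch(pairs):
    """Return True when at least one two-letter pair is outside the allowed set."""
    return any(pair not in ALLOWED_PAIRS_SKETCH for pair in pairs)


print(has_non_canonical_sketch(["GC", "AU", "GU"]))  # all allowed
print(has_non_canonical_sketch(["GC", "AA"]))        # AA is not an allowed pair
```

Note that this reference counts wobble GU/UG pairs as allowed here, so a helix containing only GU pairs would not be flagged.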

generate_sequences(self)

Generates all possible sequences for the Helix() that are compatible with the constraints for the motif.

class rna_library.Hairpin(**kwargs)

Bases: rna_library.structure.motif.Motif

Represents a hairpin loop in an RNA structure. Inherits from Motif().

buffer(self)

For the Hairpin() type, this is simply the size of the closing helix meaning the number of closing pairs.

Returns

buffer

Return type

int

is_hairpin(self)

Indicates that the Motif() is of type Hairpin().

Returns

is_hairpin

Return type

bool

recursive_structure(self)

Returns the owned portion of the structure. In this coding of structure it is just the loop portion and does not include the closing pair.

Returns

recursive_structure

Return type

str

recursive_sequence(self)

Returns the owned portion of the sequence. In this coding of sequence it is just the loop portion and does not include the closing pair.

Returns

recursive_sequence

Return type

str

has_non_canonical(self)

Returns whether or not the closing pair is non-canonical (i.e. not AU/UA, CG/GC, GU/UG).

Returns

has_non_canonical

Return type

bool

generate_sequences(self)

Generates all possible sequences for the Hairpin() that are compatible with the constraints for the motif.

class rna_library.Junction(**kwargs)

Bases: rna_library.structure.motif.Motif

Represents a junction of any size in an RNA structure including bulges and multi-loops.

dms_active_idxs(self)
buffer(self)

For the Junction() type, this is a list() of int() values where the first is the size of the parent Helix() and the rest are arranged in 3’ to 5’ order. The list has one entry per branch in the Junction().

Returns

buffers

Return type

List[int]

gaps(self)

Returns a list() of int() gap sizes in 3’ to 5’ order. The list has one entry per branch in the Junction().

Returns

gaps

Return type

List[int]

is_junction(self)

Indicates that the Motif() is of type Junction().

Returns

is_junction

Return type

bool

recursive_structure(self)

Returns the owned portion of the structure. In this coding of structure it is the closing pairs as well as the child Helix()’s and their children.

Returns

recursive_structure

Return type

str

recursive_sequence(self)

Returns the owned portion of the sequence. In this coding of sequence it is the closing pairs as well as the child Helix()’s and their children.

Returns

recursive_sequence

Return type

str

closing_pairs(self)

Returns a list() of str()’s that correspond to the closing pairs in the Junction() Motif.

Returns

closing_pairs

Return type

List[str]

has_non_canonical(self)

Returns whether or not any of the closing pairs are non-canonical (i.e. not AU/UA, CG/GC, GU/UG).

Returns

has_non_canonical

Return type

bool

number_branches(self)

Returns the number of branches in the current Junction().

Returns

number_branches

Return type

int

symmetric(self)

Indicates if the current Junction() is symmetric, that is the sizes of all of the gaps are the same.

Returns

is_symmetric

Return type

bool
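
The symmetry rule stated above (all gap sizes equal) can be sketched directly on the list returned by gaps(). is_symmetric_sketch is a hypothetical helper, and treating an empty list as symmetric is an assumption.

```python
def is_symmetric_sketch(gaps):
    """Return True when every gap in the junction has the same size."""
    # A set collapses equal sizes, so at most one distinct value may remain.
    return len(set(gaps)) <= 1


print(is_symmetric_sketch([2, 2]))     # 2x2 internal loop
print(is_symmetric_sketch([0, 3]))     # bulge: one side has no unpaired bases
print(is_symmetric_sketch([1, 1, 1]))  # three-way junction with equal gaps
```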

generate_sequences(self)

Would generate all possible sequences for the Junction() that are compatible with the constraints for the motif. Not currently implemented.

Raises

TypeError

class rna_library.SingleStrand(**kwargs)

Bases: rna_library.structure.motif.Motif

Represents a single stranded region in an RNA structure. Does not include unpaired regions that are part of a Junction() or Helix().

buffer(self)

For the SingleStrand() type, this does not have any meaning and is always the value -1.

Returns

buffer

Return type

int

is_singlestrand(self)

Indicates that the Motif() is of type SingleStrand().

Returns

is_singlestrand

Return type

bool

recursive_structure(self)

Returns the owned portion of the structure. In this coding of structure it is just the nucleotides in the single strand plus its child if it exists.

Returns

recursive_structure

Return type

str

recursive_sequence(self)

Returns the owned portion of the sequence. In this coding of sequence it is just the nucleotides in the single strand plus its child if it exists.

Returns

recursive_sequence

Return type

str

has_non_canonical(self)

Because there are no pairs “owned” by SingleStrand()’s, it always returns False.

Returns

has_non_canonical

Return type

bool

generate_sequences(self)

Generates all possible sequences for the SingleStrand() that are compatible with the constraints for the motif.

class rna_library.SecStruct(secstruct, sequence)

Represents a

set_ids_(self, m)
Parameters

m (Motif) –

display(self)
property sequence(self)
property structure(self)
helix_replace_(self, id, secstruct, sequence)
motif_replace_(self, id, new_secstruct, new_sequence)
change_motif(self, id, new_secstruct, new_sequence)
get_sequence_structure(self)
_get_ids_internal(self, m, ids, mtype)
get_ids(self, motif_type)
get_motif(self, id)
Return type

Motif

get_substructure(self, id1, id2=None)
get(self, id)
__iter__(self)
itermotifs(self)
hairpins(self, **kwargs)
helix(self)
junctions(self)
singlestrands(self)
set_barcode(self, m_id, bc_seq)
__add__(self, other)
rna_library.parse_to_motifs(structure, sequence)

Method takes a structure sequence pair and returns a root Motif() with a complete associated graph.

Parameters
  • structure (str) – a valid dot-bracket structure

  • sequence (str) – the corresponding sequence composed of the alphabet [ACGUTNB]

Returns

motif

Return type

Motif
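
Any parser of dot-bracket notation first needs to know which positions pair with each other. The sketch below shows that preliminary step with a stack. It is not the algorithm used by parse_to_motifs(), whose internals are not documented here, and pair_table_sketch is a hypothetical name.

```python
def pair_table_sketch(structure):
    """Map each index to its pairing partner, or -1 when unpaired."""
    table = [-1] * len(structure)
    open_positions = []  # stack of indices of unmatched '('
    for i, char in enumerate(structure):
        if char == "(":
            open_positions.append(i)
        elif char == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at index {i}")
            j = open_positions.pop()
            table[i], table[j] = j, i
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at index {i}")
    if open_positions:
        raise ValueError(f"unmatched '(' at index {open_positions[-1]}")
    return table


print(pair_table_sketch("((...))"))
```

With the pairing table in hand, runs of consecutive stacked pairs correspond to helices, and the unpaired runs between them to hairpins, junctions or single strands.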

rna_library.highest_id(m, best=0)

Figures out the highest id number in a given Motif() graph.

Parameters
  • m (Motif) – motif to start the query on

  • best (int) – current highest or “best” motif id at that recursion level.

Returns

highest_id

Return type

int
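
The role of the best parameter is easiest to see in a recursive sketch over a toy graph. This is an illustration only: IdNodeSketch and highest_id_sketch are hypothetical names, and the traversal shown is one straightforward way to carry the running maximum, not necessarily the library's.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IdNodeSketch:
    """Hypothetical minimal node: an id plus a list of child nodes."""

    id: int
    children: List["IdNodeSketch"] = field(default_factory=list)


def highest_id_sketch(m, best=0):
    # Carry the running maximum down and back up through the recursion.
    best = max(best, m.id)
    for child in m.children:
        best = highest_id_sketch(child, best)
    return best


root = IdNodeSketch(0, [
    IdNodeSketch(1, [IdNodeSketch(2), IdNodeSketch(3)]),
    IdNodeSketch(4),
])

print(highest_id_sketch(root))
```

Callers would normally leave best at its default; it exists so each recursion level can pass its current maximum to the next.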

class rna_library.JunctionData(**kwargs)

Composite class that represents a collection of JunctionEntry objects in an experiment.

get_active_data(self)
rebuild_data(self)

Method that rebuilds the internal data representation from the JunctionEntry objects.

Return type

NoneType

is_symmetrical(self)

Getter that tells if the current JunctionData object models a symmetrical junction.

Return type

bool

plot(self, plot_dir, overwrite=False)

Method that saves a plot of the JunctionData’s data points to the supplied directory.

Parameters

plot_dir (str) – The directory where the plot will be saved. Does not have to exist.

Return type

NoneType

show(self)

Method that brings up a plot of the JunctionData’s data points

Return type

NoneType

bind(self, ax)

Method that binds the JunctionData points to a supplied matplotlib Axes object.

Parameters

ax (matplotlib.axes.Axes) – the Axes object which the plot will be bound to

Return type

NoneType

measure_variance(self)
Return type

Dict[str, float]

class rna_library.JunctionEntry(**kwargs)

Represents a single junction entry from an RNA construct

validate_arguments_(self)

Helper method that validates arguments in the constructor.

key(self)

Getter that accesses the (sequence, structure) key for the JunctionEntry

Return type

Tuple[str,str]

is_symmetrical(self)

Getter that checks if the JunctionEntry is for a symmetrical junction.

Return type

bool

__getitem__(self, idx)
Parameters

idx (int) –

Return type

float

rna_library.build_react_df(**kwargs)
Builds the reactivity dataframe from the supplied arguments. Here each row represents a construct.

Note that all arguments are supplied as kwargs.

Parameters
  • out_dir (str) – base output directory where rna_library.process_histos was called

  • start_seq (str) – common start sequence for the RNA constructs

  • end_seq (str) – common end sequence for the RNA constructs

  • fasta_file (str) – path to the FASTA file for the construct

  • histos_file (str) – path to histogram file from DREEM analysis

Return type

pandas.DataFrame

rna_library.build_motif_df(df)

Function that creates a motif dataframe from a reactivity dataframe. Here each row represents a Motif.

Parameters

df (pandas.DataFrame) – reactivity dataframe generated by build_react_df()

Return type

pandas.DataFrame

rna_library.normalize_hairpin(df, seq, ss, **kwargs)

Normalizes a reactivity pattern to a normalization hairpin. Creates fully normalized values for an entire pd.DataFrame.

Parameters
  • df (pandas.DataFrame) – reactivity dataframe created by rna_library.build_react_df

  • seq (str) – reference hairpin sequence

  • ss (str) – reference hairpin structure

  • factor (float) – factor to set the hairpin values to; passed as a keyword argument

  • nts (str) – string of nucleotides to be used for the calculation; passed as a keyword argument

Return type

List[List[float]]

rna_library.normalize_coeff_fit(reactivity_df)
rna_library.build_barcodes(secstruct, start=None, distance=3)
Parameters
  • secstruct (str) –

  • start (Union[str, None]) –

  • distance (int) –