rna_library¶
Package Contents¶
Classes¶
Enumerated type for canonical and wobble basepairs. |
|
Enumerated type for all nucleotide types. |
|
Enumerated type for all motif types |
|
Abstract base class that Hairpin(), Helix(), Junction() and SingleStrand() all inherit from. |
|
Represents a helix or stack in an RNA structure. Inherits from Motif(). |
|
Represents a hairpin loop in an RNA structure. Inherits from Motif(). |
|
Represents a junction of any size in an RNA structure including bulges and multi-loops. |
|
Represents a single stranded region in an RNA structure. Does not include unpaired regions that are part of a Junction() or Helix(). |
|
Represents a |
|
Composite class that represents a collection of JunctionEntry objects in an experiment. |
|
Represents a single junction entry from an RNA construct |
Functions¶
|
Method takes a structure sequence pair and returns a root Motif() with a complete associated graph. |
|
Figures out the highest id number in a given Motif() graph. |
|
Builds the reactivity dataframe from the supplied arguments. Here each row represents a construct. |
|
Function that creates a motif dataframe from a reactivity dataframe. Here each row represents a Motif. |
|
Normalizes a reactivity pattern to a normalization hairpin. Creates fully normalized values for an entire pd.DataFrame. |
|
|
|
Attributes¶
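A set() containing all 6 canonical and wobble basepairings. |
|
A tuple() of allowed canonical and wobble basepairings. |
|
A set() of all 4 allowed nucleotide types. |
|
A tuple() of all 4 canonical nucleotide types. |
|
A list() that contains the integer values for all of the BasePair() enumerations. |
|
A dict() object that maps a canonical basepair to its BasePair() value. |
|
A list() that contains the integer values for all of the Nucleotide() enumerations. |
|
A dict() object that maps a canonical nucleotide to its Nucleotide() value. |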
|
A dict() object that maps a MotifType() to its value as a str().
-
rna_library.__email__= cjurich2@huskers.unl.edu¶
-
rna_library.__version__= 0.1.0¶
-
rna_library.ALLOWED_PAIRS¶ A set() containing all 6 canonical and wobble basepairings.
-
rna_library.BPS= ['GU', 'UG', 'AU', 'UA', 'GC', 'CG']¶ A tuple() of allowed canonical and wobble basepairings. Ordered for easy conversion by the BasePair() class.
-
rna_library.LEGAL_BPS¶ A set() of all 4 allowed nucleotide types.
-
rna_library.NTS= ['A', 'C', 'G', 'U']¶ A tuple() of all 4 canonical nucleotide types. Ordered for easy conversion by the Nucleotide() class.
-
class
rna_library.BasePair¶ Bases: enum.IntEnum
Enumerated type for canonical and wobble basepairs.
-
GU= 0¶
-
UG= 1¶
-
AU= 2¶
-
UA= 3¶
-
GC= 4¶
-
CG= 5¶
-
is_GU(self)¶ - Returns
If the instance is a UG or GU pair.
- Return type
bool()
-
is_AU(self)¶ - Returns
If the instance is a UA or AU pair.
- Return type
bool()
-
is_GC(self)¶ - Returns
If the instance is a CG or GC pair.
- Return type
bool()
-
is_canoncial(self)¶ - Returns
If the instance is a canonical Watson-Crick basepair.
- Return type
bool()
-
to_str(self)¶ - Returns
The BasePair() instance in text form.
- Return type
str()
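As a rough illustration of how an enumeration like this behaves, the sketch below re-declares a stand-in enum with the same member values and helper names as documented above. It is not the library's source: the class name `BasePairSketch` is hypothetical and the method bodies are assumptions inferred from the descriptions.

```python
from enum import IntEnum


class BasePairSketch(IntEnum):
    """Stand-in for the documented BasePair enum (method bodies are assumed)."""

    GU = 0
    UG = 1
    AU = 2
    UA = 3
    GC = 4
    CG = 5

    def is_GU(self) -> bool:
        # True for either orientation of the G-U wobble pair.
        return self in (BasePairSketch.GU, BasePairSketch.UG)

    def is_AU(self) -> bool:
        return self in (BasePairSketch.AU, BasePairSketch.UA)

    def is_GC(self) -> bool:
        return self in (BasePairSketch.GC, BasePairSketch.CG)

    def to_str(self) -> str:
        # Assumption: the text form is simply the member name.
        return self.name


pair = BasePairSketch.UG
print(pair.is_GU(), pair.to_str(), int(pair))  # True UG 1
```

Because the enum derives from `IntEnum`, members compare equal to their integer values, which is what makes the ordered `BPS` collection usable as a lookup table.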
-
-
rna_library.BP_VALS¶ A list() that contains the integer values for all of the BasePair() enumerations.
-
rna_library.BASEPAIR_MAPPER¶ A dict() object that maps a canonical basepair to its BasePair() value.
-
class
rna_library.Nucleotide¶ Bases: enum.IntEnum
Enumerated type for all nucleotide types.
-
A= 0¶
-
C= 1¶
-
G= 2¶
-
U= 3¶
-
to_str(self)¶ - Returns
The Nucleotide() instance in text form.
- Return type
str()
-
-
rna_library.NT_VALS¶ A list() that contains the integer values for all of the Nucleotide() enumerations.
-
rna_library.NUCLEOTIDE_MAPPER¶ A dict() object that maps a canonical nucleotide to its Nucleotide() value.
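The relationship between the ordered nucleotide collection, the enum values and the mapper can be sketched as follows. This is a standalone approximation: `NucleotideSketch`, `nt_vals` and `nucleotide_mapper` are hypothetical stand-ins for the documented `Nucleotide`, `NT_VALS` and `NUCLEOTIDE_MAPPER`, and how the library actually builds them is an assumption.

```python
from enum import IntEnum

# Same ordering as the documented rna_library.NTS.
NTS = ("A", "C", "G", "U")


class NucleotideSketch(IntEnum):
    """Stand-in for the documented Nucleotide enum."""

    A = 0
    C = 1
    G = 2
    U = 3


# Hypothetical equivalents of NT_VALS and NUCLEOTIDE_MAPPER.
nt_vals = [int(nt) for nt in NucleotideSketch]
nucleotide_mapper = {letter: NucleotideSketch(idx) for idx, letter in enumerate(NTS)}

# Round trip: letters -> enum members -> letters.
encoded = [nucleotide_mapper[nt] for nt in "GAUC"]
decoded = "".join(NTS[value] for value in encoded)
```

The ordering matters here: because position `i` in `NTS` holds the letter whose enum value is `i`, a member can index straight back into the collection.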
-
class
rna_library.MotifType¶ Bases: enum.IntEnum
Enumerated type for all motif types.
-
UNASSIGNED= 0¶
-
SINGLESTRAND= 1¶
-
HELIX= 2¶
-
HAIRPIN= 3¶
-
JUNCTION= 4¶
-
-
rna_library.TYPE_MAPPER¶ A dict() object that maps a MotifType() to its value as a str().
-
class
rna_library.Motif(**kwargs)¶ Bases: abc.ABC
Abstract base class that Hairpin(), Helix(), Junction() and SingleStrand() all inherit from.
-
link_children(self, depth=0)¶ Method used to link a Motif() object to its children and vice versa. Should only be called once by the root Motif().
- Parameters
depth (int) – depth of the current Motif() object. Defaults to 0.
- Return type
None
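The linking step can be pictured as a single recursive pass from the root. The sketch below uses a hypothetical `Node` class that models only the graph fields (children, parent, depth); it is an assumption about the general shape of such a pass, not the library's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a Motif; only the graph fields are modelled."""

    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None
    depth: int = 0


def link_children(node: Node, depth: int = 0) -> None:
    """Record the depth of `node`, then point each child back at it."""
    node.depth = depth
    for child in node.children:
        child.parent = node
        link_children(child, depth + 1)


hairpin = Node("hairpin")
helix = Node("helix", children=[hairpin])
root = Node("singlestrand", children=[helix])
link_children(root)  # called once, on the root only
```

Calling the function on a non-root node would restart the depth count from 0 for that subtree, which is one reason a single call from the root is the safe pattern.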
-
__eq__(self, other)¶ Overloaded == operator for Motif(). Requires that the motif type, sequence and token are identical.
-
__str__(self)¶ String representation of just the motif at hand.
-
is_helix(self)¶ If the motif is a helix or not. Overridden by child Helix() class.
- Returns
If the motif is of type Helix()
- Return type
bool()
-
is_singlestrand(self)¶ If the motif is a singlestrand or not. Overridden by child SingleStrand() class.
- Returns
If the motif is of type SingleStrand()
- Return type
bool()
-
is_hairpin(self)¶ If the motif is a hairpin or not. Overridden by child Hairpin() class.
- Returns
If the motif is of type Hairpin()
- Return type
bool()
-
is_junction(self)¶ If the motif is a junction or not. Overridden by child Junction() class.
- Returns
If the motif is of type Junction()
- Return type
bool()
-
type(self)¶ Returns the MotifType() type for the given motif.
- Returns
The MotifType() enum value for the given motif.
- Return type
-
children(self)¶ Getter for the Motif()’s child motifs. Returned as a list for iteration. Only returns direct children, or an empty list if the motif has no children.
-
add_child(self, other)¶ Appends a new Motif() to the internal list of children for the current Motif().
Warning
Should NOT be called directly. Other function calls must occur to ensure that the internal graph is accurate.
-
set_children(self, other)¶ Replaces the internal list of children for the current Motif() with the supplied list of Motif() objects.
Warning
Should NOT be called directly. Other function calls must occur to ensure that the internal graph is accurate.
-
parent(self, other)¶
-
parent(self)¶ Gets the parent Motif() of the current Motif().
- Returns
the parent motif
- Return type
-
token(self, tk)¶ Sets the Motif()’s identifying token to an inputted string. Input is NOT validated.
- Parameters
tk (str) – the new token for the Motif().
- Returns
None
- Return type
NoneType
-
structure(self, secstruct)¶ Sets the Motif()’s structure to an inputted string. Input is NOT validated.
- Parameters
secstruct (str) – the new structure for the Motif().
- Returns
None
- Return type
NoneType
-
strands(self)¶ Returns a list of lists of int()’s where each sub-list contains a contiguous set of nucleotides that “belong” to the Motif(). The number of sub-lists varies by motif type, and the expected values are below:
Hairpin() => 1
Helix() => 2
SingleStrand() => 1
Junction() => number of branches in Junction()
- Returns
strands
- Return type
List[List[int]]
-
sequence(self)¶ Gets the sequence for the Motif(). Because the nucleotides owned by the Motif() may not be contiguous, breaks will be separated by an ampersand ‘&’.
- Returns
sequence
- Return type
str
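To make the ‘&’ convention concrete, the sketch below derives such a string from a full sequence and a list of strands. The function name `motif_sequence` and the use of 0-based indices are assumptions made for illustration; the library may number nucleotides differently.

```python
from typing import List


def motif_sequence(full_sequence: str, strands: List[List[int]]) -> str:
    """Join the nucleotides of each strand, separating strands with '&'."""
    return "&".join(
        "".join(full_sequence[idx] for idx in strand) for strand in strands
    )


full_sequence = "GGGAAACCC"
two_strands = [[0, 1, 2], [6, 7, 8]]  # two strands, as expected for a helix
one_strand = [[3, 4, 5]]              # a single contiguous strand

print(motif_sequence(full_sequence, two_strands))  # GGG&CCC
print(motif_sequence(full_sequence, one_strand))   # AAA
```

A motif with a single strand produces no ampersand at all, so the number of ‘&’ characters is always one less than the number of strands.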
-
sequence(self, seq)¶ Sets the sequence for the Motif() to the supplied string. Warning: the input is NOT validated.
- Parameters
seq (str) – the new sequence for the Motif().
-
id(self, new_id)¶ Sets the id for the Motif(). Warning: It is NOT currently validated.
- Parameters
new_id (int) – the new id for the Motif()
- Returns
None
- Return type
NoneType
-
depth(self)¶ The depth of the Motif(), which describes how deep it is in the internal graph.
- Returns
depth
- Return type
int
-
depth(self, value)¶ Sets the depth of the current Motif().
- Parameters
value (int) – the new depth value for the current Motif().
-
abstract
buffer(self)¶ Buffer refers to the size of the closest adjacent Helix(). Varies by type of motif as seen below:
Helix() => size of the helix itself
Hairpin() => size of its parent helix
SingleStrand() => -1, meaningless in this context
Junction() => a list() of the branching helices’ lengths, with the parent helix first and the rest in the direction of increasing nucleotide index.
- Returns
buffer
- Return type
int
-
has_children(self)¶ Returns whether the Motif() has any children.
- Returns
has_children
- Return type
bool
-
abstract
recursive_sequence(self)¶ Builds and returns the contiguous sequence of the structure, viewing the current Motif() as the root of the structure. The returned sequence will be part of the main sequence.
- Returns
sequence
- Return type
str
-
abstract
recursive_structure(self)¶ Builds and returns the contiguous structure, viewing the current Motif() as the root of the structure. The returned structure will be part of the main structure.
- Returns
structure
- Return type
str
-
abstract
has_non_canonical(self)¶ Checks if the Motif() has any non-canonical (i.e. not AU/UA, UG/GU or GC/CG) pairings.
- Returns
has_nc
- Return type
bool
-
same_pattern(self, sequence)¶ Checks if a template sequence is compatible with an inputted sequence. Specifically if the length and placement of ‘&’ are the same.
- Parameters
sequence (str) – template string to compare against.
- Returns
is_same
- Return type
bool
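The compatibility rule stated for same_pattern (equal length, ‘&’ in the same places) can be sketched as a free function. This is an illustrative reimplementation under that stated rule; the two-argument signature is an assumption, since the documented method compares against the motif's own sequence.

```python
def same_pattern(template: str, sequence: str) -> bool:
    """Return True when both strings have equal length and matching '&' positions."""
    if len(template) != len(sequence):
        return False
    # At every position, either both characters are '&' or neither is.
    return all((t == "&") == (s == "&") for t, s in zip(template, sequence))


print(same_pattern("GGG&CCC", "AUA&UAU"))  # True
print(same_pattern("GGG&CCC", "AU&AUAU"))  # False
```

The nucleotide letters themselves are never compared, only the strand layout.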
-
start_pos(self)¶ Starting (or lowest) nucleotide index owned by the Motif().
- Returns
start_pos
- Return type
int
-
end_pos(self)¶ Ending (or highest) nucleotide index owned by the Motif().
- Returns
end_pos
- Return type
int
-
contains(self, pos)¶ Indicates if a nucleotide index is contained in or belongs to the current Motif().
- Parameters
pos (list[int]) – the index to query
- Returns
is_contained
- Return type
bool
-
sequences(self, seqs)¶ Used to set the internal list of barcode temp sequences.
- Parameters
seqs (List[str]) – the new barcode sequences to be applied to the current Motif().
- Return type
None
-
number_sequences(self)¶ Gives the number of barcode sequences that the Motif() currently has.
- Returns
num_sequence
- Return type
int
-
set_sequence(self, idx)¶ Sets the current sequence to the sequence at the given index in the internal barcodes list. Query Motif.number_sequences() beforehand so that the index is known to be valid.
- Parameters
idx (int) – The index to be used.
- Return type
None
-
abstract
generate_sequences(self)¶ Builds out all possible barcode sequences that fit the known constraints.
-
-
class
rna_library.Helix(**kwargs)¶ Bases: rna_library.structure.motif.Motif
Represents a helix or stack in an RNA structure. Inherits from Motif().
-
size(self)¶ Returns the size of the Helix(), which is just the number of pairs in the stack.
- Returns
size
- Return type
int
-
size(self, val)¶ Sets the current size for the Helix().
- Parameters
val (int) – the new size of the helix.
-
buffer(self)¶ Returns the buffer of the Helix(), which is just the number of pairs in the stack.
- Returns
buffer
- Return type
int
-
pairs(self)¶ Returns the basepairs in the stack as a list of strings of length 2. Pairs are returned in order of lowest 3 prime starting index.
- Returns
pairs
- Return type
List[str]
-
recursive_structure(self)¶ Builds and returns the contiguous structure, viewing the current Motif() as the root of the structure. The returned structure will be part of the main structure.
- Returns
structure
- Return type
str
-
recursive_sequence(self)¶ Builds and returns the contiguous sequence of the structure, viewing the current Motif() as the root of the structure. The returned sequence will be part of the main sequence.
- Returns
sequence
- Return type
str
-
has_non_canonical(self)¶ Checks if any of the basepairs are non-canonical (i.e. non- AU/UA, GU/UG, GC/CG).
- Returns
has_non_canonical
- Return type
bool
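The check can be sketched in isolation. The helper names below (`pairs_from_strands`, `has_non_canonical`) and the ‘&’-separated input are illustrative assumptions rather than the library's implementation; the allowed set mirrors the six pairings listed under rna_library.BPS.

```python
from typing import List

# The six pairings listed under rna_library.BPS.
ALLOWED = {"GU", "UG", "AU", "UA", "GC", "CG"}


def pairs_from_strands(helix_sequence: str) -> List[str]:
    """Pair the first strand with the reversed second strand of 'XXX&YYY'."""
    first, second = helix_sequence.split("&")
    return [a + b for a, b in zip(first, reversed(second))]


def has_non_canonical(pairs: List[str]) -> bool:
    """True when at least one pair falls outside the allowed set."""
    return any(pair not in ALLOWED for pair in pairs)


pairs = pairs_from_strands("GGA&UCC")
print(pairs)                     # ['GC', 'GC', 'AU']
print(has_non_canonical(pairs))  # False
```

The second strand is reversed because the two strands of a helix run antiparallel, so the first nucleotide of one strand pairs with the last nucleotide of the other.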
-
-
class
rna_library.Hairpin(**kwargs)¶ Bases: rna_library.structure.motif.Motif
Represents a hairpin loop in an RNA structure. Inherits from Motif().
-
buffer(self)¶ For the Hairpin() type, this is simply the size of the closing helix, meaning the number of closing pairs.
- Returns
buffer
- Return type
int
-
is_hairpin(self)¶ Indicates that the Motif() is of type Hairpin().
- Returns
is_hairpin
- Return type
bool
-
recursive_structure(self)¶ Returns the owned portion of the structure. In this coding of structure it is just the loop portion and does not include the closing pair.
- Returns
recursive_structure
- Return type
str
-
recursive_sequence(self)¶ Returns the owned portion of the sequence. In this coding of sequence it is just the loop portion and does not include the closing pair.
- Returns
recursive_sequence
- Return type
str
-
has_non_canonical(self)¶ Returns whether or not the closing pair is non-canonical (i.e. is not AU/UA, CG/GC, GU/UG).
- Returns
has_non_canonical
- Return type
bool
-
-
class
rna_library.Junction(**kwargs)¶ Bases: rna_library.structure.motif.Motif
Represents a junction of any size in an RNA structure including bulges and multi-loops.
-
dms_active_idxs(self)¶
-
buffer(self)¶ For the Junction() type this is a list() of int()’s where the first is the size of the parent Helix() and then they are arranged in 3’ to 5’ order. Will have the same size as the number of branches in the Junction().
- Returns
buffers
- Return type
List[int]
-
gaps(self)¶ Returns a list() of int()’s of gap sizes in 3’ to 5’ order. Will have the same size as the number of branches in the Junction().
- Returns
gaps
- Return type
List[int]
-
is_junction(self)¶ Indicates that the Motif() is of type Junction().
- Returns
is_junction
- Return type
bool
-
recursive_structure(self)¶ Returns the owned portion of the structure. In this coding of structure it is the closing pairs as well as the child Helix()’s and their children.
- Returns
recursive_structure
- Return type
str
-
recursive_sequence(self)¶ Returns the owned portion of the sequence. In this coding of sequence it is the closing pairs as well as the child Helix()’s and their children.
- Returns
recursive_sequence
- Return type
str
-
closing_pairs(self)¶ Returns a list() of str()’s that correspond to the closing pairs in the Junction() Motif.
- Returns
closing_pairs
- Return type
List[str]
-
has_non_canonical(self)¶ Returns whether or not any of the closing pairs are non-canonical (i.e. not AU/UA, CG/GC, GU/UG).
- Returns
has_non_canonical
- Return type
bool
-
number_branches(self)¶ Returns the number of branches in the current Junction().
- Returns
number_branches
- Return type
int
-
symmetric(self)¶ Indicates if the current Junction() is symmetric, that is, the sizes of all of the gaps are the same.
- Returns
is_symmetric
- Return type
bool
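Under the definition given above, symmetry reduces to checking that every gap size is equal. A minimal sketch, assuming the gap sizes are already available as a list of integers like the one gaps() is documented to return (the function name `is_symmetric` is hypothetical):

```python
from typing import List


def is_symmetric(gaps: List[int]) -> bool:
    """True when all gap sizes are identical (an empty list counts as symmetric)."""
    return len(set(gaps)) <= 1


print(is_symmetric([2, 2]))     # True
print(is_symmetric([0, 3]))     # False, e.g. a bulge-like layout
print(is_symmetric([1, 1, 1]))  # True
```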
-
generate_sequences(self)¶ Would generate all possible sequences for the Junction() that are compatible with the constraints for the motif. Not currently implemented.
- Raises
TypeError
-
-
class
rna_library.SingleStrand(**kwargs)¶ Bases: rna_library.structure.motif.Motif
Represents a single stranded region in an RNA structure. Does not include unpaired regions that are part of a Junction() or Helix().
-
buffer(self)¶ For the SingleStrand() type, this does not have any meaning and is always the value -1.
- Returns
buffer
- Return type
int
-
is_singlestrand(self)¶ Indicates that the Motif() is of type SingleStrand().
- Returns
is_singlestrand
- Return type
bool
-
recursive_structure(self)¶ Returns the owned portion of the structure. In this coding of structure it is just the nucleotides in the single strand plus its child if it exists.
- Returns
recursive_structure
- Return type
str
-
recursive_sequence(self)¶ Returns the owned portion of the sequence. In this coding of sequence it is just the nucleotides in the single strand plus its child if it exists.
- Returns
recursive_sequence
- Return type
str
-
has_non_canonical(self)¶ Because there are no pairs “owned” by SingleStrand()’s, it always returns False.
- Returns
has_non_canonical
- Return type
bool
-
generate_sequences(self)¶ Generates all possible sequences for the SingleStrand() that are compatible with the constraints for the motif.
-
-
class
rna_library.SecStruct(secstruct, sequence)¶ Represents a
-
display(self)¶
-
property
sequence(self)¶
-
property
structure(self)¶
-
helix_replace_(self, id, secstruct, sequence)¶
-
motif_replace_(self, id, new_secstruct, new_sequence)¶
-
change_motif(self, id, new_secstruct, new_sequence)¶
-
get_sequence_structure(self)¶
-
_get_ids_internal(self, m, ids, mtype)¶
-
get_ids(self, motif_type)¶
-
get_substructure(self, id1, id2=None)¶
-
get(self, id)¶
-
__iter__(self)¶
-
itermotifs(self)¶
-
hairpins(self, **kwargs)¶
-
helix(self)¶
-
junctions(self)¶
-
singlestrands(self)¶
-
set_barcode(self, m_id, bc_seq)¶
-
__add__(self, other)¶
-
-
rna_library.parse_to_motifs(structure, sequence)¶ Method takes a structure sequence pair and returns a root Motif() with a complete associated graph.
- Parameters
structure (str) – a valid dot-bracket structure
sequence (str) – the corresponding sequence composed of the alphabet [ACGUTNB]
- Returns
motif
- Return type
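Since the structure argument must be a valid dot-bracket string, it can help to see how such a string is typically checked and converted into a pairing table. The sketch below is a generic stack-based pass and is not taken from rna_library; the function name `pair_table` and the use of -1 for unpaired positions are assumptions.

```python
from typing import List


def pair_table(structure: str) -> List[int]:
    """Return, for each position, the index of its partner or -1 if unpaired."""
    table = [-1] * len(structure)
    stack: List[int] = []  # positions of '(' still waiting for a partner
    for idx, char in enumerate(structure):
        if char == "(":
            stack.append(idx)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {idx}")
            partner = stack.pop()
            table[idx], table[partner] = partner, idx
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {idx}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return table


print(pair_table("((...))"))  # [6, 5, -1, -1, -1, 1, 0]
```

Running a check like this before parsing gives a clear error for unbalanced input instead of a malformed motif graph.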
-
rna_library.highest_id(m, best=0)¶ Figures out the highest id number in a given Motif() graph.
- Parameters
m (Motif) – motif to start the query on
best (int) – current highest or “best” motif id at that recursion level.
- Returns
highest_id
- Return type
int
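The `best` parameter suggests a recursive search that threads the running maximum through each call. A minimal sketch of that pattern follows, using a hypothetical `Node` class with `id` and `children` attributes in place of a real Motif(); the actual traversal used by the library is an assumption.

```python
from typing import List, Optional


class Node:
    """Hypothetical stand-in for a Motif with an id and child motifs."""

    def __init__(self, id: int, children: Optional[List["Node"]] = None) -> None:
        self.id = id
        self.children = children if children is not None else []


def highest_id(m: Node, best: int = 0) -> int:
    """Depth-first search carrying the largest id seen so far."""
    best = max(best, m.id)
    for child in m.children:
        best = highest_id(child, best)
    return best


root = Node(0, [Node(1, [Node(4), Node(2)]), Node(3)])
print(highest_id(root))  # 4
```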
-
class
rna_library.JunctionData(**kwargs)¶ Composite class that represents a collection of JunctionEntry objects in an experiment.
-
get_active_data(self)¶
-
rebuild_data(self)¶ Method that rebuilds the internal data representation from the JunctionEntry objects.
- Return type
NoneType
-
is_symmetrical(self)¶ Getter that tells if the current JunctionData object models a symmetrical junction.
- Return type
bool
-
plot(self, plot_dir, overwrite=False)¶ Method that saves a plot of the JunctionData’s data points to the supplied directory.
- Parameters
plot_dir (str) – The directory where the plot will be saved. Does not have to exist.
- Return type
NoneType
-
show(self)¶ Method that brings up a plot of the JunctionData’s data points
- Return type
NoneType
-
bind(self, ax)¶ Method that binds the JunctionData points to a supplied matplotlib Axes object.
- Parameters
ax (matplotlib.axes.Axes) – the Axes object which the plot will be bound to
- Return type
NoneType
-
measure_variance(self)¶ - Return type
Dict[str, float]
-
-
class
rna_library.JunctionEntry(**kwargs)¶ Represents a single junction entry from an RNA construct
-
validate_arguments_(self)¶ Helper method that validates arguments in the constructor.
-
key(self)¶ Getter that accesses the (sequence, structure) key for the JunctionEntry
- Return type
Tuple[str,str]
-
is_symmetrical(self)¶ Getter that checks if the JunctionEntry is for a symmetrical junction.
- Return type
bool
-
__getitem__(self, idx)¶ - Parameters
idx (int) –
- Return type
float
-
-
rna_library.build_react_df(**kwargs)¶ Builds the reactivity dataframe from the supplied arguments. Here each row represents a construct. Note that all arguments are supplied as kwargs.
- Parameters
out_dir (str) – base output directory where rna_library.process_histos was called
start_seq (str) – common start sequence for the RNA constructs
end_seq (str) – common end sequence for the RNA constructs
fasta_file (str) – path to the FASTA file for the construct
histos_file (str) – path to histogram file from DREEM analysis
- Return type
pandas.DataFrame
-
rna_library.build_motif_df(df)¶ Function that creates a motif dataframe from a reactivity dataframe. Here each row represents a Motif.
- Parameters
df (pandas.DataFrame) – reactivity dataframe which is generated from build_react_df()
- Return type
pandas.DataFrame
-
rna_library.normalize_hairpin(df, seq, ss, **kwargs)¶ Normalizes a reactivity pattern to a normalization hairpin. Creates fully normalized values for an entire pd.DataFrame.
- Parameters
df (pandas.DataFrame) – reactivity_df created from rna_library.build_react_df
seq (str) – reference hairpin sequence
ss (str) – reference hairpin structure
factor (float) – factor to set the hairpin values to; supplied as a keyword argument
nts (str) – string of nucleotides to be used for the calculation; supplied as a keyword argument
- Return type
List[List[float]]
-
rna_library.normalize_coeff_fit(reactivity_df)¶
-
rna_library.build_barcodes(secstruct, start=None, distance=3)¶ - Parameters
secstruct (str) –
start (Union[str, None]) –
distance (int) –