rna_library.core

Core module containing basic data types for RNA analysis.

Package Contents

Classes

BasePair

Enumerated type for canonical and wobble basepairs.

Nucleotide

Enumerated type for all nucleotide types.

MotifType

Enumerated type for all motif types.

Functions

valid_db(structure)

Checks if a structure is a valid dot-bracket structure containing only '(', '.' or ')' characters.

connectivity_list(structure)

Generates a connectivity list or pairmap from a dot-bracket secondary structure.

is_circular(start, connections)

Checks if a starting point in a pairmap is in a circular portion.

load_fasta(fname)

Reads in sequences from a .fasta file and returns a dictionary with construct names as keys and RNA sequences as values.

is_symmetrical(token)

Checks if a sequence or secondary structure is well-formed and symmetrical.

safe_mkdir(dirname)

Creates a directory if it does not already exist.

safe_rm(fname)

Removes a file only if the file already exists.

dsci(sequence, target, dms)

Calculates the DSCI score as developed by the Rouskin Group at MIT. The generated score lies in the range [0, 1], and 0.95 is a common quality cutoff.

fold_cache(sequence, params = _DEFAULT_PARAMS)

Uses RNAfold to predict the minimum free energy (MFE) structure of a sequence, using a global cache of results to save time.

save_cache()

Saves fold results to the cache and updates the _LAST_SIZE value to _CURRENT_SIZE.

folding_params()

Returns the default folding parameters being used with Vienna.

Attributes

ALLOWED_PAIRS

A set() containing all 6 canonical and wobble basepairings.

BPS

A tuple() of allowed canonical and wobble basepairings.

LEGAL_BPS

A set() of all 4 allowed nucleotide types.

NTS

A tuple() of all 4 canonical nucleotide types. Ordered for easy conversion by the Nucleotide() class.

BP_VALS

A list() that contains the integer values for all of the BasePair() enumerations.

BASEPAIR_MAPPER

A dict() object that maps a canonical basepair to its BasePair() value.

NT_VALS

A list() that contains the integer values for all of the Nucleotide() enumerations.

NUCLEOTIDE_MAPPER

A dict() object that maps a canonical nucleotide to its Nucleotide() value.

TYPE_MAPPER

A dict() object that maps a MotifType to its value as a str().

FoldResult

namedtuple that holds the sequence, structure, ensemble defect and folding parameters for an RNAfold prediction.

rna_library.core.ALLOWED_PAIRS

A set() containing all 6 canonical and wobble basepairings.

rna_library.core.BPS = ['GU', 'UG', 'AU', 'UA', 'GC', 'CG']

A tuple() of allowed canonical and wobble basepairings. Ordered for easy conversion by the BasePair() class.

rna_library.core.LEGAL_BPS

A set() of all 4 allowed nucleotide types.

rna_library.core.NTS = ['A', 'C', 'G', 'U']

A tuple() of all 4 canonical nucleotide types. Ordered for easy conversion by the Nucleotide() class.

class rna_library.core.BasePair

Bases: enum.IntEnum

Enumerated type for canonical and wobble basepairs.

GU = 0
UG = 1
AU = 2
UA = 3
GC = 4
CG = 5

is_GU(self)

Returns

True if the instance is a UG or GU pair.

Return type

bool

is_AU(self)

Returns

True if the instance is a UA or AU pair.

Return type

bool

is_GC(self)

Returns

True if the instance is a CG or GC pair.

Return type

bool

is_canoncial(self)

Returns

True if the instance is a canonical Watson-Crick basepair.

Return type

bool

to_str(self)

Returns

The BasePair() instance in text form.

Return type

str

rna_library.core.BP_VALS

A list() that contains the integer values for all of the BasePair() enumerations.

rna_library.core.BASEPAIR_MAPPER

A dict() object that maps a canonical basepair to its BasePair() value.
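
The relationship between BPS, BasePair(), BP_VALS and BASEPAIR_MAPPER can be illustrated with a minimal stand-in. This is a sketch only: the names BPS_SKETCH, BasePairSketch, BP_VALS_SKETCH and MAPPER_SKETCH are hypothetical local re-creations, not imports from rna_library, and the library's actual implementation may differ. It shows why keeping the text list in the same order as the enumeration values makes conversion a simple index lookup.

```python
from enum import IntEnum

# Hypothetical local copy of the ordered basepair strings (mirrors BPS above).
BPS_SKETCH = ("GU", "UG", "AU", "UA", "GC", "CG")


class BasePairSketch(IntEnum):
    """Stand-in enum whose values are indices into BPS_SKETCH."""

    GU = 0
    UG = 1
    AU = 2
    UA = 3
    GC = 4
    CG = 5

    def to_str(self) -> str:
        # Because the tuple order matches the enum values, the value is the index.
        return BPS_SKETCH[self.value]


# Integer values of every member, in definition order.
BP_VALS_SKETCH = [int(bp) for bp in BasePairSketch]

# Text form -> enum member, built from the same ordering.
MAPPER_SKETCH = {text: BasePairSketch(i) for i, text in enumerate(BPS_SKETCH)}
```

With these definitions, `MAPPER_SKETCH["UA"]` yields `BasePairSketch.UA`, and calling `to_str()` on that member returns the original string.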

class rna_library.core.Nucleotide

Bases: enum.IntEnum

Enumerated type for all nucleotide types.

A = 0
C = 1
G = 2
U = 3

to_str(self)

Returns

The Nucleotide() instance in text form.

Return type

str

rna_library.core.NT_VALS

A list() that contains the integer values for all of the Nucleotide() enumerations.

rna_library.core.NUCLEOTIDE_MAPPER

A dict() object that maps a canonical nucleotide to its Nucleotide() value.

class rna_library.core.MotifType

Bases: enum.IntEnum

Enumerated type for all motif types.

UNASSIGNED = 0
SINGLESTRAND = 1
HELIX = 2
HAIRPIN = 3
JUNCTION = 4

rna_library.core.TYPE_MAPPER

A dict() object that maps a MotifType to its value as a str().

exception rna_library.core.InvalidDotBracket

Bases: Exception

Exception for a malformed dot-bracket secondary structure.

exception rna_library.core.MissingDependency

Bases: Exception

Exception for when a dependency is missing from the system.

exception rna_library.core.InvalidArgument

Bases: Exception

Exception for a bad argument being supplied.

rna_library.core.valid_db(structure)

Checks if a structure is a valid dot-bracket structure containing only '(', '.' or ')' characters.

Parameters

structure (str) – dot-bracket structure

Return type

bool
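
As an illustration of the kind of check described, here is a minimal sketch. The function name valid_db_sketch is hypothetical, and the sketch assumes that "valid" also means the parentheses are balanced; the documentation above only states the allowed character set, so the library's valid_db() may be more or less strict.

```python
def valid_db_sketch(structure: str) -> bool:
    """Return True if `structure` uses only '(', '.', ')' and is balanced.

    The balance requirement is an assumption of this sketch.
    """
    depth = 0  # number of currently open, unmatched '('
    for char in structure:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                # A ')' appeared before its matching '('.
                return False
        elif char != ".":
            # Any other character is not dot-bracket notation.
            return False
    # Every '(' must have been closed.
    return depth == 0
```

Under these assumptions, `valid_db_sketch("((..))")` is true, while `"(()"` and `"((.x))"` are both rejected.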

rna_library.core.connectivity_list(structure)

Generates a connectivity list or pairmap from a dot-bracket secondary structure. The list holds -1 at unpaired positions and, at paired positions, the index of the pairing partner.

Parameters

structure (str) – a dot-bracket structure

Return type

list[int]

Raises

TypeError – if the number of left parentheses exceeds the number of right parentheses
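
A pairmap of this kind is commonly built with a stack of open-parenthesis positions. The sketch below shows that approach under stated assumptions: connectivity_list_sketch is a hypothetical name, not the library function; it raises TypeError for unclosed left parentheses to mirror the documented behavior, while the ValueError for an unmatched right parenthesis is an assumption of the sketch.

```python
from typing import List


def connectivity_list_sketch(structure: str) -> List[int]:
    """Build a pairmap: -1 for unpaired positions, partner index otherwise."""
    pairmap = [-1] * len(structure)
    open_positions: List[int] = []  # stack of indices of unmatched '('

    for index, char in enumerate(structure):
        if char == "(":
            open_positions.append(index)
        elif char == ")":
            if not open_positions:
                # Assumption: the library's handling of this case is not documented.
                raise ValueError(f"unmatched ')' at position {index}")
            partner = open_positions.pop()
            # Record the pairing in both directions.
            pairmap[index] = partner
            pairmap[partner] = index

    if open_positions:
        # Mirrors the documented TypeError when '(' outnumber ')'.
        raise TypeError("more left parentheses than right parentheses")
    return pairmap
```

For the hairpin `"((..))"`, the outer pair links positions 0 and 5 and the inner pair links 1 and 4, so the result is `[5, 4, -1, -1, 1, 0]`.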

rna_library.core.is_circular(start, connections)

Checks if a starting point in a pairmap is in a circular portion. This can include the closing pairs of both hairpins and junctions.

Parameters
  • start (int) – starting index in the pairmap

  • connections (list[int]) – pairmap generated from util.connectivity_list()

Return type

bool

rna_library.core.load_fasta(fname)

Reads in sequences from a .fasta file and returns a dictionary with construct names as keys and RNA sequences as values.

Parameters

fname (str) – name of the .fasta file to load

Return type

dict
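
The name-to-sequence mapping described above can be sketched as follows. The names parse_fasta_lines and load_fasta_sketch are hypothetical; the sketch assumes that the construct name is the full header text after '>' and that multi-line sequences are concatenated, neither of which is confirmed for the library's load_fasta().

```python
from typing import Dict, Iterable


def parse_fasta_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Map each '>' header (without the '>') to its concatenated sequence."""
    sequences: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            sequences[name] = ""
        elif name is not None:
            # Sequence data may span several lines; join them.
            sequences[name] += line
    return sequences


def load_fasta_sketch(fname: str) -> Dict[str, str]:
    """Read a .fasta file from disk and parse it with parse_fasta_lines()."""
    with open(fname) as handle:
        return parse_fasta_lines(handle)
```

Splitting the parsing from the file access keeps the parser usable on any iterable of lines, which also makes it easy to check without touching the filesystem.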

rna_library.core.is_symmetrical(token)

Checks if a sequence or secondary structure is well-formed and symmetrical.

Parameters

token (str) – sequence or secondary structure to test

Return type

bool

rna_library.core.safe_mkdir(dirname)

Creates a directory if it does not already exist.

Parameters

dirname (str) – name of the directory to create

Return type

None

rna_library.core.safe_rm(fname)

Removes a file only if the file already exists.

Parameters

fname (str) – name of the file to be removed

Return type

None
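
Both safe_mkdir() and safe_rm() describe idempotent filesystem helpers: calling them twice should not raise. A minimal sketch using only the standard library is shown below. The names safe_mkdir_sketch and safe_rm_sketch are hypothetical, and whether the library creates intermediate directories (as os.makedirs does here) is an assumption.

```python
import os


def safe_mkdir_sketch(dirname: str) -> None:
    """Create `dirname` if it does not already exist.

    Assumption: intermediate directories are created as needed.
    """
    os.makedirs(dirname, exist_ok=True)


def safe_rm_sketch(fname: str) -> None:
    """Remove `fname` only if it currently exists as a regular file."""
    if os.path.isfile(fname):
        os.remove(fname)
```

Because each helper checks the current state first, repeated calls on the same path leave the filesystem unchanged and do not raise.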

rna_library.core.dsci(sequence, target, dms)

Calculates the DSCI score as developed by the Rouskin Group at MIT. The generated score lies in the range [0, 1], and 0.95 is a common quality cutoff.

Parameters
  • sequence (str) – the RNA sequence to be analyzed

  • target (str) – the target secondary structure

  • dms (List[float]) – the DMS reactivities for the construct

Return type

Tuple[float]

rna_library.core.fold_cache(sequence, params=_DEFAULT_PARAMS)

Uses RNAfold to predict the minimum free energy (MFE) structure of a sequence, using a global cache of results to save time.

Parameters
  • sequence (str) – RNA sequence to be folded

  • params (Tuple[str]) – folding parameters for RNAfold, defaults to ('-p', '--noLP', '-d2')

Return type

FoldResult
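
The caching idea can be sketched independently of RNAfold. Everything named below is hypothetical: FoldResultSketch and its field names, the module-level _CACHE_SKETCH dictionary, the (sequence, params) cache key, and the fold_fn argument, which stands in for the actual RNAfold call so the example runs without ViennaRNA installed. The library's own cache layout is not documented here and may differ.

```python
from collections import namedtuple
from typing import Callable, Dict, List, Tuple

# Assumed field names; the real FoldResult fields are not listed in this document.
FoldResultSketch = namedtuple(
    "FoldResultSketch", ["sequence", "structure", "ensemble_defect", "params"]
)

DEFAULT_PARAMS_SKETCH: Tuple[str, ...] = ("-p", "--noLP", "-d2")

# Global cache keyed by (sequence, params), an assumption of this sketch.
_CACHE_SKETCH: Dict[Tuple[str, Tuple[str, ...]], FoldResultSketch] = {}


def fold_cache_sketch(
    sequence: str,
    fold_fn: Callable[[str, Tuple[str, ...]], FoldResultSketch],
    params: Tuple[str, ...] = DEFAULT_PARAMS_SKETCH,
) -> FoldResultSketch:
    """Return a cached result if present, otherwise fold and store it."""
    key = (sequence, tuple(params))
    if key not in _CACHE_SKETCH:
        _CACHE_SKETCH[key] = fold_fn(sequence, key[1])
    return _CACHE_SKETCH[key]


# Stand-in folder that records how often it is actually invoked.
calls: List[str] = []


def fake_fold(sequence: str, params: Tuple[str, ...]) -> FoldResultSketch:
    calls.append(sequence)
    # Placeholder prediction: fully unpaired, zero defect. Not a real fold.
    return FoldResultSketch(sequence, "." * len(sequence), 0.0, params)


first = fold_cache_sketch("GGGAAACCC", fake_fold)
second = fold_cache_sketch("GGGAAACCC", fake_fold)
```

The second call is served from the dictionary, so `fake_fold` runs only once for the repeated sequence. Including the parameters in the key keeps results folded under different settings from being confused with one another.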

rna_library.core.FoldResult

namedtuple that holds the sequence, structure, ensemble defect and folding parameters for an RNAfold prediction.

rna_library.core.save_cache()

Saves fold results to the cache and updates the _LAST_SIZE value to _CURRENT_SIZE.

Return type

None

rna_library.core.folding_params()

Returns the default folding parameters being used with Vienna.

Return type

Tuple[str]