rna_library.core.util

Utility functions for the rna_library module

Module Contents

Functions

satisfies_constraints(sequence, template)

Confirms whether a sequence matches a template.

pool_with_distance(sequences, min_dist)

Creates a pool of sequences where each sequence has at least the specified Levenshtein distance between it and all other sequences.

bp_codes_to_sequence(bp_code)

Converts a list of BasePair objects into a sequence string.

nt_codes_to_sequences(codes)

Converts a list of Nucleotide objects into a sequence string.

get_pair_list(secstruct)

Creates a list of pairs of indices from a dot-bracket secstruct string.

connectivity_list(structure)

Generates a connectivity list or pairmap from a dot-bracket secondary structure.

is_circular(start, connections)

Checks if a starting point in a pairmap is in a circular portion.

is_symmetrical(token)

Checks if a sequence or secondary structure is well-formed and symmetrical.

safe_rm(fname)

Removes a file only if the file already exists.

safe_mkdir(dirname)

Creates a directory if it does not already exist.

valid_db(structure)

Checks if a structure is a valid dot-bracket structure containing only '(', '.' or ')' characters.

load_fasta(fname)

Reads in sequences from a .fasta file and returns a dictionary with construct names as keys and RNA sequences as values.

dsci(sequence, target, dms)

Calculates the DSCI score as developed by the Rouskin Group at MIT.

rna_library.core.util.satisfies_constraints(sequence, template)

Confirms whether a sequence matches a template. Each template position holds one of 6 possible characters: one of the normal A/C/G/U, N for "any", or B to indicate that the position is meant to be part of a barcode. Returns False if the sequence and template are not the same length.

Parameters
  • sequence (str) – sequence in question

  • template (str) – templated sequence

Return type

bool
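
The matching rule described above can be sketched in a few lines. This is an illustrative stand-in, not the library's implementation: the name matches_template is hypothetical, and it assumes that both N and B accept any nucleotide at their position, which the description implies for N but does not state for B.

```python
def matches_template(sequence: str, template: str) -> bool:
    """Return True if `sequence` is consistent with `template` (sketch)."""
    # A length mismatch is an automatic failure, as the description states.
    if len(sequence) != len(template):
        return False
    for nt, code in zip(sequence, template):
        # Assumption: 'N' (any) and 'B' (barcode) place no restriction
        # on the nucleotide found at this position.
        if code in "NB":
            continue
        # A/C/G/U template positions must match exactly.
        if nt != code:
            return False
    return True


print(matches_template("GGAUCC", "GGNNCC"))
print(matches_template("GGAUCC", "GGNNC"))
```

Under these assumptions the first call accepts the sequence and the second rejects it because the lengths differ.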

rna_library.core.util.pool_with_distance(sequences, min_dist)

Creates a pool of sequences where each sequence has at least the specified Levenshtein distance between it and all other sequences. The method sorts the sequences internally, so input order does not affect the final pool.

Warning

This function runs in polynomial time, so large pools WILL take a significant amount of time to run. For reference, pools on the order of hundreds of thousands of sequences took multiple hours to run on an i7 in 2021.

Parameters
  • sequences (list[str]) – a list of starting RNA sequences

  • min_dist (int) – minimum edit distance between each sequence in the pool. Must be >= 0.

Return type

list[str]
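
One way to build such a pool is a greedy pass over the sorted input, shown below as a rough sketch. The names levenshtein and greedy_pool are hypothetical, and the greedy selection strategy is an assumption; the source only states that the sequences are sorted and that the result respects the minimum distance.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def greedy_pool(sequences: List[str], min_dist: int) -> List[str]:
    """Keep a sequence only if it is far enough from everything kept so far."""
    pool: List[str] = []
    # Sorting first makes the result independent of input order.
    for seq in sorted(sequences):
        if all(levenshtein(seq, kept) >= min_dist for kept in pool):
            pool.append(seq)
    return pool


print(greedy_pool(["UUUU", "AAAU", "AAAA"], 2))
```

Every candidate is compared against every sequence already kept, which is why the cost grows quickly with pool size.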

rna_library.core.util.bp_codes_to_sequence(bp_code)

Converts a list of BasePair objects into a sequence string.

Parameters

bp_code (list[BasePair]) – a list of basepairs to be converted. Basepairs are in order of nesting.

Return type

str

rna_library.core.util.nt_codes_to_sequences(codes)

Converts a list of Nucleotide objects into a sequence string.

Parameters

codes (list[Nucleotide]) – a list of nucleotides to be converted.

Return type

str

rna_library.core.util.get_pair_list(secstruct)

Creates a list of pairs of indices from a dot-bracket secstruct string. Note that the function assumes the incoming structure is valid.

Parameters

secstruct (str) – a dot-bracket structure which is assumed to be valid

Return type

list[tuple[int, int]]

Raises

TypeError – if the number of left parentheses exceeds the number of right parentheses
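
A stack is the usual way to pair up parentheses in a dot-bracket string. The sketch below uses the hypothetical name pair_list and assumes 0-based indices with pairs ordered by their opening position; neither detail is confirmed by the source, and the sketch does not reproduce the library's error handling.

```python
from typing import List, Tuple


def pair_list(secstruct: str) -> List[Tuple[int, int]]:
    """Return (open, close) index pairs for a valid dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for idx, ch in enumerate(secstruct):
        if ch == "(":
            # Remember where this pair opens.
            stack.append(idx)
        elif ch == ")":
            # The most recent unmatched '(' closes here.
            pairs.append((stack.pop(), idx))
    # Assumption: report pairs ordered by opening index.
    return sorted(pairs)


print(pair_list("((..))"))
```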

rna_library.core.util.connectivity_list(structure)

Generates a connectivity list or pairmap from a dot-bracket secondary structure. The list holds -1 at unpaired positions and, at paired positions, the index of the position's complement.

Parameters

structure (str) – a dot-bracket structure

Return type

list[int]

Raises

TypeError – if the number of left parentheses exceeds the number of right parentheses
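
The pairmap layout described above can be illustrated with a small stand-alone sketch. The name pairmap is hypothetical, indices are assumed to be 0-based, and the input is assumed to be a balanced structure; this is not the library's code.

```python
from typing import List


def pairmap(structure: str) -> List[int]:
    """Map each position to its partner index, or -1 if unpaired."""
    connections = [-1] * len(structure)
    stack: List[int] = []
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            partner = stack.pop()
            # Record the pairing in both directions.
            connections[partner] = idx
            connections[idx] = partner
    return connections


print(pairmap("((..))"))
```

For "((..))" this gives 5 and 4 at the two opening positions, -1 at the two loop positions, and 1 and 0 at the two closing positions.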

rna_library.core.util.is_circular(start, connections)

Checks if a starting point in a pairmap is in a circular portion. This can include the closing pairs of both hairpins and junctions.

Parameters
  • start (int) – starting index in the pairmap

  • connections (list[int]) – pairmap generated from util.connectivity_list()

Return type

bool

rna_library.core.util.is_symmetrical(token)

Checks if a sequence or secondary structure is well-formed and symmetrical.

Parameters

token (str) – sequence or secondary structure to test

Return type

bool

rna_library.core.util.safe_rm(fname)

Removes a file only if the file already exists.

Parameters

fname (str) – name of file to be removed

Return type

None

rna_library.core.util.safe_mkdir(dirname)

Creates a directory if it does not already exist.

Parameters

dirname (str) – name of the directory to create

Return type

None
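
The "only if it exists" and "only if it is missing" behaviour of the two helpers above can be approximated with the standard library. The names remove_if_exists and make_dir_if_missing are hypothetical. The use of os.makedirs, which also creates missing parent directories, is an assumption that may differ from what the library does.

```python
import os


def remove_if_exists(fname: str) -> None:
    """Delete `fname` if it is an existing file; otherwise do nothing."""
    if os.path.isfile(fname):
        os.remove(fname)


def make_dir_if_missing(dirname: str) -> None:
    """Create `dirname` unless it already exists."""
    # exist_ok=True makes a repeated call a no-op instead of an error.
    os.makedirs(dirname, exist_ok=True)
```

Both functions can be called repeatedly with the same argument without raising.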

rna_library.core.util.valid_db(structure)

Checks if a structure is a valid dot-bracket structure containing only '(', '.' or ')' characters.

Parameters

structure (str) – dot-bracket structure

Return type

bool
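
A character-level check like the one described can be written as a set comparison. The sketch below uses the hypothetical names only_db_chars and is_balanced. The balance check is an extra that the description does not mention, included only to show how the two kinds of validity differ.

```python
def only_db_chars(structure: str) -> bool:
    """True if every character is '(', '.' or ')'."""
    return set(structure) <= set("(.)")


def is_balanced(structure: str) -> bool:
    """True if parentheses close in order and none are left open."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ')' appeared before its matching '('.
                return False
    return depth == 0


print(only_db_chars("((..))"), is_balanced("((..))"))
print(only_db_chars("(([]))"), is_balanced("))(("))
```

Note that a string such as "))((" passes the character check but is not balanced.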

rna_library.core.util.load_fasta(fname)

Reads in sequences from a .fasta file and returns a dictionary with construct names as keys and RNA sequences as values.

Parameters

fname (str) – name of the .fasta file to load

Return type

dict
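
For orientation, a minimal FASTA reader that produces the same kind of name-to-sequence dictionary might look like the following. The name read_fasta is hypothetical. Joining multi-line records and using the full header text after ">" as the key are assumptions, not documented behaviour of load_fasta.

```python
from typing import Dict, Optional


def read_fasta(fname: str) -> Dict[str, str]:
    """Parse a FASTA file into {construct name: sequence} (sketch)."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    with open(fname) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Header line: start a new record.
                name = line[1:].strip()
                records[name] = ""
            elif name is not None:
                # Sequence line: append to the current record.
                records[name] += line
    return records
```

Sequence lines that appear before the first header are ignored in this sketch.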

rna_library.core.util.dsci(sequence, target, dms)

Calculates the DSCI score as developed by the Rouskin Group at MIT. The generated score is in the range [0, 1], and 0.95 is a common quality cutoff.

Parameters
  • sequence (str) – the RNA sequence to be analyzed

  • target (str) – the target secondary structure

  • dms (List[float]) – the DMS reactivities for the construct

Return type

Tuple[float]