rna_library.core.util
Utility functions for the rna_library module.
Module Contents
Functions
satisfies_constraints(sequence, template) – Confirms whether a sequence satisfies a template.
pool_with_distance(sequences, min_dist) – Creates a pool of sequences where each sequence has at least the specified Levenshtein distance between it and all other sequences.
bp_codes_to_sequence(bp_code) – Converts a list of BasePair() objects into a sequence string.
nt_codes_to_sequences(codes) – Converts a list of Nucleotide() objects into a sequence string.
get_pair_list(secstruct) – Creates a list of pairs of indices from a dot-bracket secstruct string.
connectivity_list(structure) – Generates a connectivity list or pairmap from a dot-bracket secondary structure.
is_circular(start, connections) – Checks if a starting point in a pairmap is in a circular portion.
is_symmetrical(token) – Checks if a sequence or secondary structure is well-formed and symmetrical.
safe_rm(fname) – Removes a file only if the file already exists.
safe_mkdir(dirname) – Creates a directory if it does not already exist.
valid_db(structure) – Checks if a structure is a valid dot-bracket structure containing only ‘(’, ‘.’ or ‘)’ characters.
load_fasta(fname) – Reads in sequences from a .fasta file and returns a dictionary with construct names as keys and RNA sequences as values.
dsci(sequence, target, dms) – Calculates the DSCI score as developed by the Rouskin Group at MIT.
-
rna_library.core.util.satisfies_constraints(sequence, template)
Confirms whether a sequence satisfies a template. A template has one of 6 possible characters at each position: one of the normal A/C/G/U, “N” for “any”, or “B” to indicate the position is meant to be part of a barcode. Returns False if sequence and template are not the same length.
- Parameters
sequence (str) – sequence in question
template (str) – templated sequence
- Return type
bool
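The matching rule described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the name matches_template is hypothetical, and it assumes that both “N” and “B” accept any nucleotide while A/C/G/U must match exactly.

```python
def matches_template(sequence: str, template: str) -> bool:
    """Hypothetical sketch of the documented template rule."""
    # Sequences of a different length never satisfy the template.
    if len(sequence) != len(template):
        return False
    for nt, code in zip(sequence, template):
        # Assumption: "N" (any) and "B" (barcode) accept any nucleotide.
        if code in "NB":
            continue
        # A/C/G/U positions must match exactly.
        if nt != code:
            return False
    return True


print(matches_template("GGAUCC", "GGNNCC"))  # True
print(matches_template("GGAUCC", "GGNNC"))   # False (length mismatch)
```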
-
rna_library.core.util.pool_with_distance(sequences, min_dist)
Creates a pool of sequences where each sequence has at least the specified Levenshtein distance between it and all other sequences. The method sorts sequences internally, so input order does not affect the final pool.
Warning
This function runs in polynomial time, so large pools WILL take a significant amount of time to run. For reference, pools on the order of hundreds of thousands of sequences took multiple hours to run on an i7 in 2021.
- Parameters
sequences (List[str]) – A list of starting RNA sequences.
min_dist (int) – Minimum edit distance between each sequence in the pool. Must be >= 0.
- Return type
list[str]
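One way to build such a pool is a greedy pass over the sorted input, keeping a sequence only if it is far enough from everything already kept. The sketch below assumes that strategy; the page does not state how the library selects sequences, and the names levenshtein and greedy_pool are hypothetical.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (row by row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def greedy_pool(sequences: List[str], min_dist: int) -> List[str]:
    """Hypothetical greedy selection; sorting makes input order irrelevant."""
    pool: List[str] = []
    for seq in sorted(sequences):
        # Keep seq only if it is at least min_dist from every kept sequence.
        if all(levenshtein(seq, kept) >= min_dist for kept in pool):
            pool.append(seq)
    return pool


print(greedy_pool(["GGGG", "AAAU", "AAAA", "GGCC"], 2))
# ['AAAA', 'GGCC', 'GGGG']
```

Every candidate is compared against every kept sequence, which is where the polynomial running time noted in the warning comes from.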
-
rna_library.core.util.bp_codes_to_sequence(bp_code)
Converts a list of BasePair() objects into a sequence string.
- Parameters
bp_code (list[BasePair]) – a list of basepairs to be converted. Basepairs are in order of nesting.
- Return type
str
-
rna_library.core.util.nt_codes_to_sequences(codes)
Converts a list of Nucleotide() objects into a sequence string.
- Parameters
codes (list[Nucleotide]) – a list of nucleotides to be converted.
- Return type
str
-
rna_library.core.util.get_pair_list(secstruct)
Creates a list of pairs of indices from a dot-bracket secstruct string. Note that the function assumes the incoming structure is valid.
- Parameters
secstruct (str) – a dot-bracket structure which is assumed to be valid
- Return type
list[tuple[int, int]]
- Raises
TypeError – if the number of left parentheses exceeds the number of right parentheses
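Pair extraction from a dot-bracket string is typically done with a stack of open-parenthesis positions. The following is a minimal sketch under assumptions not confirmed by this page: indices are zero-based, pairs are returned sorted by opening index, and the name pair_list_sketch is hypothetical.

```python
from typing import List, Tuple


def pair_list_sketch(secstruct: str) -> List[Tuple[int, int]]:
    """Hypothetical stack-based pairing of '(' and ')' positions."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for idx, char in enumerate(secstruct):
        if char == "(":
            stack.append(idx)
        elif char == ")":
            # The most recent unmatched '(' pairs with this ')'.
            pairs.append((stack.pop(), idx))
    if stack:
        # Mirrors the documented TypeError for surplus left parentheses.
        raise TypeError("more left than right parentheses")
    return sorted(pairs)


print(pair_list_sketch("((..))"))  # [(0, 5), (1, 4)]
```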
-
rna_library.core.util.connectivity_list(structure)
Generates a connectivity list or pairmap from a dot-bracket secondary structure. The list has a value of -1 for unpaired positions; otherwise it holds the index of that position's complement.
- Parameters
structure (str) – a dot-bracket structure
- Return type
list[int]
- Raises
TypeError – if the number of left parentheses exceeds the number of right parentheses
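To make the pairmap layout concrete, here is a small sketch that builds one with a stack. It assumes zero-based indices, and the name pairmap_sketch is hypothetical; it is not the library's code.

```python
from typing import List


def pairmap_sketch(structure: str) -> List[int]:
    """Hypothetical pairmap builder: -1 if unpaired, else partner index."""
    pairmap = [-1] * len(structure)
    stack: List[int] = []
    for idx, char in enumerate(structure):
        if char == "(":
            stack.append(idx)
        elif char == ")":
            partner = stack.pop()
            # Record the pairing in both directions.
            pairmap[partner] = idx
            pairmap[idx] = partner
    if stack:
        # Mirrors the documented TypeError for surplus left parentheses.
        raise TypeError("more left than right parentheses")
    return pairmap


print(pairmap_sketch("(.(..).)"))  # [7, -1, 5, -1, -1, 2, -1, 0]
```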
-
rna_library.core.util.is_circular(start, connections)
Checks if a starting point in a pairmap is in a circular portion. This can include the closing pairs of both hairpins and junctions.
- Parameters
start (int) – starting index in the pairmap
connections (list[int]) – pairmap generated from util.connectivity_list()
- Return type
bool
-
rna_library.core.util.is_symmetrical(token)
Checks if a sequence or secondary structure is well-formed and symmetrical.
- Parameters
token (str) – sequence or secondary structure to test
- Return type
bool
-
rna_library.core.util.safe_rm(fname)
Removes a file only if the file already exists.
- Parameters
fname (str) – name of file to be removed
- Return type
None
-
rna_library.core.util.safe_mkdir(dirname)
Creates a directory if it does not already exist.
- Parameters
dirname (str) – name of the directory to create
- Return type
None
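The behavior of safe_rm and safe_mkdir can be approximated with the standard library, as sketched below. The names safe_rm_sketch and safe_mkdir_sketch are hypothetical, and using os.makedirs (which also creates missing parent directories) is an assumption; the library may behave differently for nested paths.

```python
import os
import tempfile


def safe_rm_sketch(fname: str) -> None:
    """Remove fname only if it exists as a regular file."""
    if os.path.isfile(fname):
        os.remove(fname)


def safe_mkdir_sketch(dirname: str) -> None:
    """Create dirname if missing; do nothing if it already exists."""
    # Assumption: parents are created too, which os.mkdir would not do.
    os.makedirs(dirname, exist_ok=True)


# Both helpers are safe to call repeatedly.
with tempfile.TemporaryDirectory() as tmp:
    workdir = os.path.join(tmp, "results")
    safe_mkdir_sketch(workdir)
    safe_mkdir_sketch(workdir)  # second call is a no-op
    scratch = os.path.join(workdir, "scratch.txt")
    open(scratch, "w").close()
    safe_rm_sketch(scratch)
    safe_rm_sketch(scratch)     # file is already gone; no error
```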
-
rna_library.core.util.valid_db(structure)
Checks if a structure is a valid dot-bracket structure containing only ‘(’, ‘.’ or ‘)’ characters.
- Parameters
structure (str) – dot-bracket structure
- Return type
bool
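A validity check along these lines might look like the sketch below. The character test follows the description above; the additional requirement that parentheses are balanced is an assumption about what “valid” means here, and the name valid_db_sketch is hypothetical.

```python
def valid_db_sketch(structure: str) -> bool:
    """Hypothetical dot-bracket check: allowed characters plus balance."""
    # Only '(', '.' and ')' are allowed.
    if any(char not in "(.)" for char in structure):
        return False
    # Assumption: a valid structure must also be balanced.
    depth = 0
    for char in structure:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its matching '('
                return False
    return depth == 0


print(valid_db_sketch("((..))"))  # True
print(valid_db_sketch("((..)"))   # False
print(valid_db_sketch("([..])"))  # False
```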
-
rna_library.core.util.load_fasta(fname)
Reads in sequences from a .fasta file and returns a dictionary with construct names as keys and RNA sequences as values.
- Parameters
fname (str) – name of the .fasta file to load
- Return type
dict
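The name-to-sequence mapping can be illustrated with a small parser. To stay self-contained, this sketch parses FASTA text rather than reading a file; the name parse_fasta_text is hypothetical, and joining multi-line sequences into one string is an assumption about how the library treats wrapped records.

```python
from typing import Dict, Optional


def parse_fasta_text(text: str) -> Dict[str, str]:
    """Hypothetical parser: header lines start with '>', the rest is sequence."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The construct name is everything after '>'.
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            # Assumption: wrapped sequence lines are concatenated.
            records[name] += line
    return records


example = """>construct_1
GGAAACC
>construct_2
GGGAAA
CCC
"""
print(parse_fasta_text(example))
# {'construct_1': 'GGAAACC', 'construct_2': 'GGGAAACCC'}
```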
-
rna_library.core.util.dsci(sequence, target, dms)
Calculates the DSCI score as developed by the Rouskin Group at MIT. The generated score is in the range [0, 1], and 0.95 is a common quality cutoff.
- Parameters
sequence (str) – the RNA sequence to be analyzed
target (str) – the target secondary structure
dms (List[float]) – the DMS reactivities for the construct
- Return type
Tuple[float]
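This page does not define how the score is computed. The sketch below assumes DSCI is the probability that a randomly chosen unpaired base has a higher DMS reactivity than a randomly chosen paired base (a normalized Mann-Whitney U statistic), restricted to A and C positions because DMS primarily modifies those bases. Both points are assumptions, the name dsci_sketch is hypothetical, and unlike the documented function it returns a single float rather than a tuple.

```python
from typing import List


def dsci_sketch(sequence: str, target: str, dms: List[float]) -> float:
    """Hypothetical DSCI: P(unpaired reactivity > paired reactivity)."""
    paired: List[float] = []
    unpaired: List[float] = []
    for nt, db, value in zip(sequence, target, dms):
        # Assumption: only A and C positions carry DMS signal.
        if nt not in "AC":
            continue
        (unpaired if db == "." else paired).append(value)
    if not paired or not unpaired:
        raise ValueError("need at least one paired and one unpaired A/C")
    wins = 0.0
    for u in unpaired:
        for p in paired:
            if u > p:
                wins += 1.0
            elif u == p:
                wins += 0.5  # ties count as half, as in Mann-Whitney U
    return wins / (len(paired) * len(unpaired))


seq = "GCAAAAGC"
db = "((....))"
reactivities = [0.0, 0.05, 0.8, 0.9, 0.7, 0.6, 0.0, 0.1]
print(dsci_sketch(seq, db, reactivities))  # 1.0
```

Under this definition a score of 1.0 means every unpaired A/C is more reactive than every paired A/C, which is the separation a well-folded construct is expected to show.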