import seaborn as sns
import gpplot
import matplotlib.pyplot as plt

Log-normalizing Data

To demonstrate the functionality of this module we'll use read counts from Sanson et al. 2018.

supp_reads = 'https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-018-07901-8/MediaObjects/41467_2018_7901_MOESM4_ESM.xlsx'
read_counts = pd.read_excel(supp_reads,
                            sheet_name = 'A375_orig_tracr raw reads', 
                            header = None,
                            skiprows = 3, 
                            names = ['sgRNA Sequence', 'pDNA', 'A375_RepA', 'A375_RepB'], 
                            engine='openpyxl')

read_counts

Note that the read counts have a long right tail

sns.kdeplot(read_counts['pDNA'])

<AxesSubplot:xlabel='pDNA', ylabel='Density'>

To mitigate this tail and account for sequencing biases we log-normalize read counts by taking $$\log_2 \bigg ( \frac{\text{reads}}{\text{total reads in condition}} * 10^6 + 1 \bigg )$$

lognorms = lognorm_columns(reads_df=read_counts, columns=['pDNA', 'A375_RepA', 'A375_RepB'])
sns.kdeplot(lognorms['pDNA'])

<AxesSubplot:xlabel='pDNA', ylabel='Density'>

Filter pDNA

Note that there's a small number of sgRNAs with low pDNA abundance. We'll filter out any sgRNAs with lognormed pDNA less than the mean minus 3 standard deviations.

filtered_lognorms = filter_pdna(lognorm_df=lognorms, pdna_cols=['pDNA'])
lognorms.shape[0] - filtered_lognorms.shape[0]

576

sns.kdeplot(filtered_lognorms['pDNA'])

<AxesSubplot:xlabel='pDNA', ylabel='Density'>

assert ((filtered_lognorms['pDNA'] - lognorms['pDNA'].mean())/lognorms['pDNA'].std()).min() > -3

Calculate log-fold changes

We'll calculate log-fold changes from the pDNA reference. Since we only have one reference condition we can supply the calculate_lfcs function with the ref_col and target_cols arguments.

lfc_df = calculate_lfcs(lognorm_df=filtered_lognorms, ref_col='pDNA', target_cols=['A375_RepA', 'A375_RepB'])

Alternatively we can supply a ref_map, which is useful when we have more than one reference condition

lfc_df2 = calculate_lfcs(filtered_lognorms, ref_map = {'A375_RepA': 'pDNA', 'A375_RepB': 'pDNA'})
assert lfc_df.equals(lfc_df2)

We can use seaborn's clustermap function to plot a clustermap of replicate correlations

sns.clustermap(lfc_df.corr(), vmin= 0, cmap= 'viridis', annot= True, figsize= (4,4))

<seaborn.matrix.ClusterGrid at 0x7ffc9ba0a780>

Since we only have two conditions it's also easy to visualize replicates as a point densityplot using gpplot

plt.subplots(figsize=(4,4))
gpplot.point_densityplot(data=lfc_df, x='A375_RepA', y='A375_RepB')
gpplot.add_correlation(data=lfc_df, x='A375_RepA', y='A375_RepB')
sns.despine()

Average log-fold changes across replicates

Since we see a strong correlation, we'll average the log-fold change of each sgRNA across replicates

avg_replicate_lfc_df = average_replicate_lfcs(lfcs=lfc_df, guide_col='sgRNA Sequence', condition_indices=[0], 
                                              sep='_')
assert np.abs(avg_replicate_lfc_df.shape[0] - read_counts.shape[0]) < 1000

We can alternatively combine replicates by supplying a condition_map

avg_replicate_lfc_df_condition_map = average_replicate_lfcs(lfcs=lfc_df, guide_col='sgRNA Sequence',
                                                            condition_map={'A375_RepA': 'A375',
                                                                           'A375_RepB': 'A375'})
pd.testing.assert_frame_equal(avg_replicate_lfc_df_condition_map, avg_replicate_lfc_df)

If we don't want to join replicates, we don't have to supply these arguments

distinct_replicate_lfc_df = average_replicate_lfcs(lfcs=lfc_df, guide_col='sgRNA Sequence')

assert distinct_replicate_lfc_df.shape[0] == avg_replicate_lfc_df.shape[0]*2

Group control genes

Before combining sgRNAs at the gene level, it's sometimes helpful to group controls into pseudo-genes so they're easier to compare with target genes

guide_annotations = pd.read_excel(supp_reads,
                                  sheet_name='sgRNA annotations', 
                                  engine='openpyxl')
guide_annotations

remapped_annotations = group_pseudogenes(annotations=guide_annotations, pseudogene_size=4, 
                                         gene_col='Annotated Gene Symbol', 
                                         control_regex=['NO_CURRENT'])
control_genes = remapped_annotations.loc[remapped_annotations['Annotated Gene Symbol'].str.contains('_'), 'Annotated Gene Symbol']
assert control_genes.value_counts().max() == 4
assert control_genes.value_counts().min() == 4
assert len(control_genes) == 1000
remapped_annotations

Scale guide-level scores

We provide two methods for scaling log-fold change values to controls:

Z-score from a set of negative controls
Scale scores between a set of negative and positive controls

For both scoring methods, you can input either a regex or a list of genes to define control sets

nonessential_genes = (pd.read_table('https://raw.githubusercontent.com/gpp-rnd/genesets/master/human/non-essential-genes-Hart2014.txt', 
                                    names=['gene'])
                      .gene)
essential_genes = (pd.read_table('https://raw.githubusercontent.com/gpp-rnd/genesets/master/human/essential-genes-Hart2015.txt', 
                                 names=['gene'])
                   .gene)

annotated_lfcs = get_annotated_lfcs(avg_replicate_lfc_df, remapped_annotations, merge_on='sgRNA Sequence')
assert annotated_lfcs.shape[0] == avg_replicate_lfc_df.shape[0]

no_current_controls = get_control_lfcs(annotated_lfcs, 'NO_CURRENT', 'Annotated Gene Symbol')
assert abs(no_current_controls.shape[0] - 1000) < 20
nonessential_controls = get_control_lfcs(annotated_lfcs, nonessential_genes, 'Annotated Gene Symbol')
assert abs(nonessential_controls.shape[0] - len(nonessential_genes) * 4) < 200

z_scored_lfcs = get_neg_ctl_z_score(annotated_lfcs, nonessential_genes, 'Annotated Gene Symbol', 'condition', 'avg_lfc')
control_z_scores = z_scored_lfcs.loc[z_scored_lfcs['Annotated Gene Symbol'].isin(nonessential_genes), 'z_scored_avg_lfc']
assert np.abs(control_z_scores.mean()) < 0.001
assert np.abs(control_z_scores.std() - 1) < 0.001

scaled_lfcs = scale_lfcs(annotated_lfcs, nonessential_genes, essential_genes, 'Annotated Gene Symbol', 'condition', 'avg_lfc', -1)
pos_scaled_lfcs = scaled_lfcs[scaled_lfcs['Annotated Gene Symbol'].isin(essential_genes)]
assert pos_scaled_lfcs['control_scaled_avg_lfc'].median() == -1
neg_scaled_lfcs = scaled_lfcs[scaled_lfcs['Annotated Gene Symbol'].isin(nonessential_genes)]
assert neg_scaled_lfcs['control_scaled_avg_lfc'].median() == 0
assert pos_scaled_lfcs['probability_positive_control'].mean() > neg_scaled_lfcs['probability_positive_control'].mean()

All of these functions are wrapped together in annotate_guide_lfcs

annot_guide_lfcs = annotate_guide_lfcs(avg_replicate_lfc_df, remapped_annotations, 'Annotated Gene Symbol', 
                                                     merge_on='sgRNA Sequence',
                                                     z_score_neg_ctls=True, z_score_neg_ctl_genes=nonessential_genes, 
                                                     scale_ctls=True, scale_neg_ctl_genes=nonessential_genes, 
                                                     scale_pos_ctl_genes=essential_genes, pos_control_direction=-1)
annot_guide_lfcs.sort_values('z_scored_avg_lfc')

plot_lfcs = annot_guide_lfcs.copy()
plot_lfcs['neg_control'] = plot_lfcs['Annotated Gene Symbol'].isin(nonessential_genes)
plot_lfcs['nlog10_pos_ctrl'] = -np.log10(plot_lfcs['probability_positive_control'])
plot_lfcs = plot_lfcs.sort_values('neg_control')
plt.subplots(figsize=(4,4))
sns.scatterplot(data=plot_lfcs, x='z_scored_avg_lfc', 
                y='control_scaled_avg_lfc', hue='neg_control', alpha=0.3)
sns.despine()

Aggreagate log-fold changes at the gene level

To aggregate scores at the gene level, we specify columns to average, and columns to z-score

gene_lfcs = aggregate_gene_lfcs(annot_guide_lfcs, 'Annotated Gene Symbol', 
                                average_cols=['avg_lfc', 'control_scaled_avg_lfc', 'probability_positive_control'], 
                                zscore_cols=['z_scored_avg_lfc'])
assert np.abs(gene_lfcs.shape[0] - 20000) < 1000
# Do we score about the expect number of essential genes?
assert np.abs((gene_lfcs['probability_positive_control'] > 0.5).sum()/20000 - 0.2) < 0.1
assert gene_lfcs.sort_values('probability_positive_control', ascending=False).head(5)['Annotated Gene Symbol'].isin(essential_genes).sum() > 2
assert gene_lfcs[gene_lfcs.z_scored_avg_lfc < 0].sort_values('z_scored_avg_lfc_fdr', ascending=True).head(5)['Annotated Gene Symbol'].isin(essential_genes).sum() > 2
# Do the two scoring mechanisms agree
gene_lfcs_copy = gene_lfcs.copy()
gene_lfcs_copy['z_essential'] = (gene_lfcs_copy['z_scored_avg_lfc_fdr'] < 0.1) & (gene_lfcs_copy['z_scored_avg_lfc'] < 0)
gene_lfcs_copy['p_essential'] = (gene_lfcs_copy['probability_positive_control'] > 0.5)
scoring_overlap = (gene_lfcs_copy[['z_essential', 'p_essential']].value_counts()
                   .reset_index()
                   .rename({0: 'n'}, axis=1))
agree = scoring_overlap.loc[(scoring_overlap['z_essential'] == scoring_overlap['p_essential']), 'n'].sum()
disagree = scoring_overlap.loc[(scoring_overlap['z_essential'] != scoring_overlap['p_essential']), 'n'].sum()
accuracy = agree/(agree + disagree)
assert accuracy > 0.95
# Are controls centered around 0
control_z_scores = gene_lfcs.loc[gene_lfcs['Annotated Gene Symbol'].isin(nonessential_genes), 'z_scored_avg_lfc']
assert np.abs(control_z_scores.mean()) < 0.1
assert np.abs(control_z_scores.std() - 1) < 0.3
gene_lfcs.sort_values('avg_lfc')

We see that our negative control population, which are the nonessential genes in this case, are grouped around 0

plot_lfcs = gene_lfcs.copy()
plot_lfcs['nlog10_fdr'] = -np.log10(plot_lfcs['z_scored_avg_lfc_fdr'])
plot_lfcs['neg_control'] = plot_lfcs['Annotated Gene Symbol'].isin(nonessential_genes)
plot_lfcs = plot_lfcs.sort_values('neg_control')
plt.subplots(figsize=(4,4))
sns.scatterplot(data=plot_lfcs, x='z_scored_avg_lfc', 
                y='nlog10_fdr', hue='neg_control', alpha=0.3)
sns.despine()

We can also calculate the gene level p-values using the hypergeometric test as follows:

gene_lfcs_hyp = aggregate_gene_lfcs_hypergeometric(annot_guide_lfcs, 'Annotated Gene Symbol',
                                average_cols=['avg_lfc'])
gene_lfcs_hyp

plot_lfcs_hyp = gene_lfcs_hyp.copy()
plot_lfcs_hyp = plot_lfcs_hyp[(plot_lfcs_hyp.n_guides>=3)&(plot_lfcs_hyp.n_guides<=5)]
hyp_ctrls = plot_lfcs_hyp[plot_lfcs_hyp['Annotated Gene Symbol'].isin(nonessential_genes)]
plt.subplots(figsize=(4,4))
sns.scatterplot(data=plot_lfcs_hyp, x='avg_lfc',
                y='avg_lfc_hypergeometric_test_avg_pval')
sns.scatterplot(data=hyp_ctrls, x='avg_lfc',
                y='avg_lfc_hypergeometric_test_avg_pval')
sns.despine()

Calculate ROC-AUC between essential and non-essential genes

Finally, to evaluate the quality each screen, we'll calculate the ROC-AUC between essential and nonessential genes for each condition

We can calculate ROC-AUCs three different ways:

1) We can specify a condition_col and score_col

roc_aucs, roc_df = get_roc_aucs(lfcs=gene_lfcs, tp_genes=essential_genes, fp_genes=nonessential_genes, gene_col='Annotated Gene Symbol', 
                                condition_col='condition', score_col='avg_lfc')
assert roc_aucs['ROC-AUC'][0] > 0.9
roc_aucs

plt.subplots(figsize=(4,4))
sns.lineplot(data=roc_df, x='fpr', y='tpr', hue='condition', ci=None)
sns.despine()

We can also specify tp_genes and fp_genes using regular expressions. This is useful for distinguishing between different control types defined by a regex.

roc_aucs, _ = get_roc_aucs(lfcs=gene_lfcs, tp_genes='RPS', fp_genes='CD', gene_col='Annotated Gene Symbol',
                                condition_col='condition', score_col='avg_lfc')
assert roc_aucs['ROC-AUC'][0] > 0.8

2) Since we only have one condition we can also just supply the score_col

roc_aucs, roc_df = get_roc_aucs(lfcs=gene_lfcs, tp_genes=essential_genes, fp_genes=nonessential_genes, gene_col='Annotated Gene Symbol', 
                        score_col='avg_lfc')
roc_aucs

0.976174694978823

plt.subplots(figsize=(4,4))
sns.lineplot(data=roc_df, x='fpr', y='tpr', ci=None)
sns.despine()

Finally, we can specify condition_list as a list of columns we want to calculate AUCs for

roc_aucs, roc_df = get_roc_aucs(lfcs=gene_lfcs, tp_genes=essential_genes, fp_genes=nonessential_genes, gene_col='Annotated Gene Symbol',
                                condition_list=['avg_lfc'])
roc_aucs

This is a useful approach when we want to quantify ROC-AUC at the guide level say

annotated_guide_lfcs = lfc_df.merge(guide_annotations, how='inner', on='sgRNA Sequence')
roc_aucs, roc_df = get_roc_aucs(lfcs=annotated_guide_lfcs, tp_genes=essential_genes, fp_genes=nonessential_genes, gene_col='Annotated Gene Symbol',
                                condition_list=['A375_RepA', 'A375_RepB'])
roc_aucs

plt.subplots(figsize=(4,4))
sns.lineplot(data=roc_df, x='fpr', y='tpr', hue='condition', ci=None)
sns.despine()

We can see this visually as well that essentials deplete the most followed by nonessentials and control genes

plt.subplots(figsize=(4,4))
sns.kdeplot(gene_lfcs.loc[gene_lfcs['Annotated Gene Symbol'].isin(essential_genes), 'z_scored_avg_lfc'], label='essential')
sns.kdeplot(gene_lfcs.loc[gene_lfcs['Annotated Gene Symbol'].isin(nonessential_genes), 'z_scored_avg_lfc'], label='non-essential')
sns.kdeplot(gene_lfcs.loc[gene_lfcs['Annotated Gene Symbol'].str.contains('_'), 'z_scored_avg_lfc'], label='control')
sns.despine()
plt.legend()

<matplotlib.legend.Legend at 0x7ffc9da5c9b0>

	sgRNA Sequence	pDNA	A375_RepA	A375_RepB
0	AAAAAAAATCCGGACAATGG	522	729	774
1	AAAAAAAGGATGGTGATCAA	511	1484	1393
2	AAAAAAATGACATTACTGCA	467	375	603
3	AAAAAAATGTCAGTCGAGTG	200	737	506
4	AAAAAACACAAGCAAGACCG	286	672	352
...	...	...	...	...
77436	TTTGTTTGGATTGCTTCAAG	514	2878	303
77437	TTTTACCTTGTTCACATGGA	828	3896	3367
77438	TTTTGACTCTAATCACCGGT	588	2725	2498
77439	TTTTTAATACAAGGTAATCT	503	2209	1785
77440	TTTTTCTCACCCGATGAATC	661	3198	3189

	sgRNA Sequence	Annotated Gene Symbol	Annotated Gene ID
0	AAAAAAAATCCGGACAATGG	SLC25A24	29957
1	AAAAAAAGGATGGTGATCAA	FASTKD3	79072
2	AAAAAAATGACATTACTGCA	BCAS2	10286
3	AAAAAAATGTCAGTCGAGTG	GPR18	2841
4	AAAAAACACAAGCAAGACCG	ZNF470	388566
...	...	...	...
77436	TTTGTTTGGATTGCTTCAAG	NEPRO	25871
77437	TTTTACCTTGTTCACATGGA	NO_CURRENT_997	NO_CURRENT_997
77438	TTTTGACTCTAATCACCGGT	NO_CURRENT_998	NO_CURRENT_998
77439	TTTTTAATACAAGGTAATCT	NO_CURRENT_999	NO_CURRENT_999
77440	TTTTTCTCACCCGATGAATC	NO_CURRENT_1000	NO_CURRENT_1000

	sgRNA Sequence	Annotated Gene Symbol	Annotated Gene ID
0	AAAAAAAATCCGGACAATGG	SLC25A24	29957
1	AAAAAAAGGATGGTGATCAA	FASTKD3	79072
2	AAAAAAATGACATTACTGCA	BCAS2	10286
3	AAAAAAATGTCAGTCGAGTG	GPR18	2841
4	AAAAAACACAAGCAAGACCG	ZNF470	388566
...	...	...	...
77436	TTTGTTTGGATTGCTTCAAG	NEPRO	25871
77437	TTTTACCTTGTTCACATGGA	NO_CURRENT_181	NO_CURRENT_997
77438	TTTTGACTCTAATCACCGGT	NO_CURRENT_63	NO_CURRENT_998
77439	TTTTTAATACAAGGTAATCT	NO_CURRENT_154	NO_CURRENT_999
77440	TTTTTCTCACCCGATGAATC	NO_CURRENT_112	NO_CURRENT_1000

	sgRNA Sequence	condition	avg_lfc	n_obs	Annotated Gene Symbol	Annotated Gene ID	z_scored_avg_lfc	control_scaled_avg_lfc	probability_positive_control
6837	ACCAGATCCACATCAAACCG	A375	-4.094542	2	FBL	2091	-8.416034	-2.472776	0.999965
49866	GGAATCTCACTCTGTCACCC	A375	-3.903209	2	GVQW2	100507462	-8.029763	-2.361037	0.999940
31346	CGATAGACGCGTGCACCACG	A375	-3.897174	2	EIF6	3692	-8.017578	-2.357512	0.999939
5407	ACACAGCATCTCCACTACCC	A375	-3.818617	2	SPDYE6	729597	-7.858985	-2.311635	0.999925
5566	ACACGGAGTAGCTCTCCTTG	A375	-3.815685	2	HIST1H2BN	8341	-7.853065	-2.309923	0.999924
...	...	...	...	...	...	...	...	...	...
13820	AGGCTGTTGGTAAGTATACA	A375	1.879858	2	ZDHHC13	54503	3.645351	1.016289	0.002021
17810	ATGATGGCAAAGTAGATGTG	A375	2.068461	2	TAOK1	57551	4.026111	1.126433	0.001203
7842	ACGGCCACCTCACAAAGTAA	A375	2.093036	2	MAP4K4	9448	4.075724	1.140785	0.001125
65173	TCCTGGTCCGAAGCCGAGTG	A375	2.214092	2	MED19	219541	4.320117	1.211482	0.000806
51228	GGATATTCTCTGATGCTCAG	A375	2.463255	2	SUPT20H	55578	4.823139	1.356994	0.000406

	condition	Annotated Gene Symbol	n_guides	avg_lfc	control_scaled_avg_lfc	probability_positive_control	z_scored_avg_lfc	z_scored_avg_lfc_p_value	z_scored_avg_lfc_fdr
13317	A375	PSMG3	4	-3.165679	-1.930318	0.999157	-13.081609	4.194682e-39	8.122163e-35
7375	A375	HSPA5	4	-3.021085	-1.845874	0.999113	-12.497781	7.676386e-36	7.431893e-32
4927	A375	EIF6	4	-3.009939	-1.839365	0.998650	-12.452779	1.350592e-35	8.717173e-32
14219	A375	RPL19	4	-2.993390	-1.829701	0.998627	-12.385961	3.113338e-35	1.507089e-31
12774	A375	POLR2L	4	-2.970395	-1.816271	0.998484	-12.293110	9.863629e-35	3.819789e-31
...	...	...	...	...	...	...	...	...	...
16365	A375	TAF5L	4	1.259290	0.653875	0.014899	4.785041	1.709528e-06	1.784453e-05
16386	A375	TAOK1	4	1.286787	0.669934	0.047783	4.896065	9.777464e-07	1.043091e-05
16367	A375	TAF6L	4	1.322374	0.690717	0.010756	5.039754	4.661311e-07	5.096384e-06
16219	A375	SUPT20H	4	1.417919	0.746515	0.031551	5.425536	5.778076e-08	6.838684e-07
9313	A375	MAP4K4	4	1.527641	0.810593	0.013910	5.868560	4.395965e-09	5.850108e-08

core

Log-normalizing Data

`lognorm`[source]

`lognorm_columns`[source]

Filter pDNA

`filter_pdna`[source]

Calculate log-fold changes

`calculate_lfcs`[source]

Average log-fold changes across replicates

`get_condition`[source]

`average_replicate_lfcs`[source]

Group control genes

`group_pseudogenes`[source]

Scale guide-level scores

`get_annotated_lfcs`[source]

`get_control_lfcs`[source]

`get_neg_ctl_z_score`[source]

`scale_lfcs`[source]

`annotate_guide_lfcs`[source]

Aggreagate log-fold changes at the gene level

`aggregate_gene_lfcs`[source]

`get_hypergeometric_pval`[source]

`apply_parallel`[source]

`aggregate_gene_lfcs_hypergeometric`[source]

Calculate ROC-AUC between essential and non-essential genes

`get_roc_aucs`[source]

	condition	Annotated Gene Symbol	n_guides	avg_lfc	avg_lfc_ascending_construct_ranks	avg_lfc_descending_construct_ranks	avg_lfc_hypergeometric_test_avg_pval
0	A375	A1BG	3	-0.084621	23319;35845;46354	30512;41021;53547	0.490678
1	A375	A1CF	4	0.500314	51763;62213;71912;74278	2588;4954;14653;25103	1.542492
2	A375	A2M	4	0.288629	42669;47983;54255;73458	3408;22611;28883;34197	0.920590
3	A375	A2ML1	4	-0.481786	10645;17323;22150;24754	52112;54716;59543;66221	1.080913
4	A375	A3GALT2	4	0.059362	12609;55370;57476;61651	15215;19390;21496;64257	0.642799
...	...	...	...	...	...	...	...
19358	A375	ZYG11A	4	-0.280137	18076;24707;24879;32720	44146;51987;52159;58790	0.860380
19359	A375	ZYG11B	4	0.215013	25652;40027;61085;73812	3054;15781;36839;51214	0.748303
19360	A375	ZYX	4	-0.182864	14031;29692;40133;41865	35001;36733;47174;62835	0.623025
19361	A375	ZZEF1	4	0.104225	25206;36473;53298;67325	9541;23568;40393;51660	0.573432
19362	A375	ZZZ3	4	-0.743157	3628;11128;19396;47666	29200;57470;65738;73238	0.992930

core

Log-normalizing Data

lognorm[source]

lognorm_columns[source]

Filter pDNA

filter_pdna[source]

Calculate log-fold changes

calculate_lfcs[source]

Average log-fold changes across replicates

get_condition[source]

average_replicate_lfcs[source]

Group control genes

group_pseudogenes[source]

Scale guide-level scores

get_annotated_lfcs[source]

get_control_lfcs[source]

get_neg_ctl_z_score[source]

scale_lfcs[source]

annotate_guide_lfcs[source]

Aggreagate log-fold changes at the gene level

aggregate_gene_lfcs[source]

get_hypergeometric_pval[source]

apply_parallel[source]

aggregate_gene_lfcs_hypergeometric[source]

Calculate ROC-AUC between essential and non-essential genes

get_roc_aucs[source]

`lognorm`[source]

`lognorm_columns`[source]

`filter_pdna`[source]

`calculate_lfcs`[source]

`get_condition`[source]

`average_replicate_lfcs`[source]

`group_pseudogenes`[source]

`get_annotated_lfcs`[source]

`get_control_lfcs`[source]

`get_neg_ctl_z_score`[source]

`scale_lfcs`[source]

`annotate_guide_lfcs`[source]

`aggregate_gene_lfcs`[source]

`get_hypergeometric_pval`[source]

`apply_parallel`[source]

`aggregate_gene_lfcs_hypergeometric`[source]

`get_roc_aucs`[source]