cyclum package

Submodules

cyclum.evaluation module

cyclum.evaluation.parzen_estimate(x, lim, half_granularity=100, window=<function <lambda>>, scale=0.5)[source]

Calculate a Parzen window estimate (a non-parametric density estimation method).
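
As a rough illustration of the technique named above, the following standard-library sketch evaluates a Parzen window estimate on an evenly spaced grid. It is not the cyclum implementation: the function names, the Gaussian window, the (low, high) form of the domain limit, and the n_points argument are all assumptions made for this example.

```python
import math


def gaussian_window(u):
    """Standard normal kernel (an illustrative choice, not necessarily cyclum's default window)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def parzen_sketch(samples, low, high, n_points=201, window=gaussian_window, scale=0.5):
    """Evaluate f(t) = 1 / (n * scale) * sum_i window((t - x_i) / scale) on a grid.

    `low`, `high` and `n_points` are hypothetical arguments for this sketch.
    """
    step = (high - low) / (n_points - 1)
    grid = [low + k * step for k in range(n_points)]
    n = len(samples)
    density = [
        sum(window((t - x) / scale) for x in samples) / (n * scale)
        for t in grid
    ]
    return grid, density


grid, density = parzen_sketch([-1.0, 0.0, 0.2, 1.5], low=-5.0, high=5.0)

# Riemann-sum check: over a wide enough grid the estimate integrates to roughly 1.
area = sum(density) * (grid[1] - grid[0])
```

A larger scale gives a smoother but less detailed estimate; a smaller one follows the individual instances more closely.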

Parameters
  • x – instances

  • lim – limit of domain

  • half_granularity

  • window

  • scale

Returns

cyclum.evaluation.periodic_parzen_estimate(x, period=3.14, half_granularity=100, window=<function <lambda>>, scale=0.5)[source]

Calculate a Parzen window estimate specifically for a periodic domain.
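
One common way to adapt a Parzen estimate to a periodic domain is to add the contributions of a few shifted copies ("images") of each instance, so that mass near one end of the period wraps around to the other. The sketch below shows that idea using only the standard library. Whether cyclum wraps the domain this way is an assumption, as are the function names, the Gaussian window, and the n_points and n_images arguments.

```python
import math


def gaussian_window(u):
    """Standard normal kernel (illustrative choice only)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def periodic_parzen_sketch(samples, period=3.14, n_points=200,
                           window=gaussian_window, scale=0.5, n_images=3):
    """Parzen estimate on [0, period), summing each instance over shifted images.

    `n_points` and `n_images` are hypothetical arguments for this sketch.
    """
    step = period / n_points
    grid = [k * step for k in range(n_points)]  # right endpoint excluded
    n = len(samples)
    density = []
    for t in grid:
        total = 0.0
        for x in samples:
            # Copies of x shifted by whole periods make the estimate wrap around.
            for k in range(-n_images, n_images + 1):
                total += window((t - x + k * period) / scale)
        density.append(total / (n * scale))
    return grid, density


# Instances clustered near both ends of the period, which are neighbours on a circle.
grid, density = periodic_parzen_sketch([0.1, 0.2, 3.0, 3.1])

# The estimate integrates to roughly 1 over a single period.
area = sum(density) * (grid[1] - grid[0])
```

With this wrapping, the density just below the period stays close to the density at zero instead of dropping off at the boundary.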

Parameters
  • x

  • period

  • half_granularity

  • window

  • scale

Returns

cyclum.evaluation.precision_estimate(distr_vector_list, label_vector, possible_label_list)[source]

Estimate precision

Parameters
  • distr_vector_list

  • label_vector

  • possible_label_list

Returns

cyclum.hdfrw module

Read and write HDF files.

cyclum.hdfrw.hdf2mat(filepath, dtype=<class 'float'>)[source]

Read an HDF file generated by the hdfrw.R mat2hdf function into a data frame. Note that, because Python and R handle data differently, colnames are used for the index and rownames are used for the columns, and the matrix is also tacitly transposed.

Parameters
  • filepath (str) – path of hdf file

  • dtype (Type) – type of data; default is float

Return type

DataFrame

Returns

a pandas data frame

cyclum.hdfrw.mat2hdf(data, filepath)[source]

Write a data frame to an HDF file that can be read by the hdfrw.R hdf2mat function.

Parameters
  • data (Union[DataFrame, <built-in function array>, List[str]]) – data frame or numpy array to be written

  • filepath (str) – path of hdf file to be written

Return type

None

Returns

None

cyclum.illustration module

class cyclum.illustration.FigureWriter(pdf_name)[source]

Bases: object

Keep figures and write them into a PDF file.

add_figure(figure, title=None)[source]

Add a figure, but do not write it to the file.

Parameters
  • figure

  • title

Returns

add_figure_and_write(figure, title=None)[source]

write()[source]

cyclum.illustration.plot_cell_sparsity(linear_data, use_ratio=True)[source]

Return a figure of #{cell : non_zero_genes(cell) > x}, i.e. the number of cells that have more than x non-zero genes.

Parameters
  • linear_data – data

  • use_ratio – whether to plot as a ratio

Returns

a figure
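
The quantity plotted here can be computed without any plotting library. The sketch below counts, for each threshold x, the cells with more than x non-zero genes. It assumes rows are cells and columns are genes, and the function name is hypothetical; it only illustrates the set notation above and is not taken from cyclum.

```python
def cells_above_threshold(matrix, use_ratio=True):
    """For x = 0, 1, ..., n_genes, count the cells with more than x non-zero genes.

    Assumes rows are cells and columns are genes (an assumption of this sketch).
    """
    n_cells = len(matrix)
    if n_cells == 0:
        return []
    n_genes = len(matrix[0])
    nonzero_per_cell = [sum(1 for v in row if v != 0) for row in matrix]
    counts = [sum(1 for c in nonzero_per_cell if c > x) for x in range(n_genes + 1)]
    if use_ratio:
        # Express each count as a fraction of all cells.
        return [c / n_cells for c in counts]
    return counts


data = [
    [0, 1, 2, 0],  # 2 non-zero genes
    [3, 0, 0, 0],  # 1 non-zero gene
    [1, 1, 1, 1],  # 4 non-zero genes
]
counts = cells_above_threshold(data, use_ratio=False)  # [3, 2, 1, 1, 0]
```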

cyclum.illustration.plot_gene_sparsity(linear_data, use_ratio=True)[source]

Return a figure of #{cell : non_zero_genes(cell) > x}.

Parameters
  • linear_data – data

  • use_ratio – whether to plot as a ratio

Returns

a figure

cyclum.illustration.plot_multi_distr(xs, ys, colors, labels)[source]

cyclum.illustration.plot_pair_color(a, b, color)[source]

Either plot an embedding two dimensions at a time, or compare two embeddings.

Parameters
  • a

  • b

  • color

Returns

cyclum.illustration.plot_round_color(flat, color)[source]

cyclum.illustration.plot_round_distr_color(flat, label, color_dict)[source]

cyclum.illustration.plot_round_distr_color2(flat, label1, label2, color_dict1, color_dict2)[source]

cyclum.tuning module

Auto tuning.

class cyclum.tuning.CyclumAutoTune(data, max_linear_dims=3, epochs=500, verbose=100, rate=0.0005, early_stop=False, encoder_depth=2, encoder_width=50, dropout_rate=0.1, nonlinear_reg=0.0001, linear_reg=0.0001)[source]

Bases: cyclum.models.ae.AutoEncoder

Circular autoencoder with automatically decided number of linear components

We first perform PCA on the data and record the MSE obtained with the first 1, 2, …, max_linear_dims + 1 components. We then train circular autoencoders with 0, 1, …, max_linear_dims linear components. The circular autoencoder with i linear components is compared with PCA with (i + 1) components, for i = 0, 1, … We record the first i at which the difference in loss relative to PCA is greater than at both (i - 1) and (i + 1), or than at just (i + 1) if i == 0.

At the end, this class will be an UNTRAINED model with the optimal number of linear components. You can then train it with all your data, more epochs, and a better learning rate.
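
The selection rule described above can be sketched as a search for the first local maximum of the loss gap. The sketch is not the class's actual code: the function name is hypothetical, the sign convention (PCA loss minus autoencoder loss) is an assumption, and so is returning None when no such index exists.

```python
def pick_linear_dims(pca_losses, ae_losses):
    """Return the first i whose loss gap exceeds that of its neighbours.

    pca_losses[i]: loss of PCA with (i + 1) components.
    ae_losses[i]:  loss of the circular autoencoder with i linear components.
    The gap is taken as PCA loss minus autoencoder loss (assumed sign convention).
    """
    if len(pca_losses) != len(ae_losses):
        raise ValueError("both loss lists must have the same length")
    gap = [p - a for p, a in zip(pca_losses, ae_losses)]
    for i in range(len(gap) - 1):
        beats_right = gap[i] > gap[i + 1]
        # For i == 0 there is no left neighbour, so only (i + 1) is checked.
        beats_left = i == 0 or gap[i] > gap[i - 1]
        if beats_left and beats_right:
            return i
    return None  # no such index; the fallback used by the library is not documented here


# Made-up losses, for illustration only.
pca = [1.00, 0.80, 0.65, 0.60]
ae = [0.90, 0.55, 0.50, 0.58]
best = pick_linear_dims(pca, ae)  # gaps are about 0.10, 0.25, 0.15, 0.02, so best == 1
```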

Parameters
  • data – The data used to decide number of linear components. For a large dataset, you may use a representative portion of it.

  • max_linear_dims – maximum number of linear dimensions.

  • epochs – number of epochs for each test

  • verbose – the number of epochs between reports of the loss, time consumption, etc.

  • rate – training rate

  • early_stop – whether to stop checking more linear components once the result is decided. This ONLY affects the elbow plot and has NO influence on the result.

  • encoder_depth – depth of encoder, i.e., number of hidden layers

  • encoder_width

    width of encoder, one of the following:

    • An integer, which gives the number of nodes per layer. All hidden layers will have the same number of nodes.

    • A list of integers, whose length is equal to encoder_depth, giving the number of nodes in each layer.

  • dropout_rate – rate for dropout.

  • nonlinear_reg – strength of regularization on the nonlinear encoder.

  • linear_reg – strength of regularization on the linear encoder.
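
The two accepted forms of encoder_width described above can be reduced to one list of per-layer widths. The helper below is a hypothetical sketch of that normalization, written only to illustrate the parameter description; it is not part of cyclum.

```python
def layer_widths(encoder_width, encoder_depth):
    """Expand encoder_width into one width per hidden layer (hypothetical helper)."""
    if isinstance(encoder_width, int):
        # A single integer: every hidden layer gets the same number of nodes.
        return [encoder_width] * encoder_depth
    widths = list(encoder_width)
    if len(widths) != encoder_depth:
        raise ValueError("encoder_width must contain encoder_depth entries")
    return widths


same_everywhere = layer_widths(50, 2)   # [50, 50]
per_layer = layer_widths([64, 32], 2)   # [64, 32]
```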

Examples:
>>> from cyclum.hdfrw import hdf2mat, mat2hdf
>>> from cyclum.tuning import CyclumAutoTune
>>> df = hdf2mat('path_to_hdf.h5')
>>> m = CyclumAutoTune(df.values, max_linear_dims=5)
>>> m.train(df.values)
>>> pseudotime = m.predict_pseudotime(df.values)
>>> mat2hdf(pseudotime, 'path_to_pseudotime.h5')

show_bar(root=False)[source]

Show a bar plot of what percentage of the additional loss is handled by the circular component.

Returns

figure object

show_elbow()[source]

Show an elbow plot of both PCA and the autoencoder. You will observe the point at which the autoencoder begins to have a higher loss than PCA. The point just before it is considered the best model.

Returns

figure object

cyclum.writer module

Writer gives a very fast way of saving and loading float-valued matrices. It saves matrices in binary, in a very rigid format, which avoids the overhead of CSV reading functions. An R counterpart is also available.
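
To make the idea of a rigid binary layout concrete, here is a minimal round-trip sketch using only the standard library. The layout it uses (two little-endian int32 values for the row and column counts, followed by row-major float64 values) is an assumption chosen for illustration. The actual file format of this module is not described here and may differ, and the function names are hypothetical.

```python
import os
import struct
import tempfile


def write_matrix_sketch(path, rows):
    """Write a list of equal-length rows using the assumed layout described above."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if n_rows else 0
    with open(path, "wb") as f:
        f.write(struct.pack("<ii", n_rows, n_cols))  # header: shape as two int32
        for row in rows:
            f.write(struct.pack("<%dd" % n_cols, *[float(v) for v in row]))


def read_matrix_sketch(path):
    """Read back a matrix written by write_matrix_sketch."""
    with open(path, "rb") as f:
        n_rows, n_cols = struct.unpack("<ii", f.read(8))
        return [
            list(struct.unpack("<%dd" % n_cols, f.read(8 * n_cols)))
            for _ in range(n_rows)
        ]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")
    write_matrix_sketch(path, [[1, 2.5], [3, 4]])
    restored = read_matrix_sketch(path)  # [[1.0, 2.5], [3.0, 4.0]]
```

Because the shape is stored up front and every value has a fixed width, the reader never has to parse text or guess delimiters, which is where CSV readers spend much of their time.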

cyclum.writer.int32_to_bytes(x)[source]

Convert a 32-bit integer to little-endian 4-byte binary format. This helps when writing an integer to a binary file.

Parameters

x (int32) – number to be converted

Returns

4 byte binary
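
For reference, the same kind of conversion can be done with Python's standard struct module. This is a sketch of the technique rather than the function's actual body; the function name is hypothetical, and treating the value as signed is an assumption.

```python
import struct


def int32_to_le_bytes(x):
    """Pack a signed 32-bit integer as 4 little-endian bytes (sketch, assumes signed)."""
    return struct.pack("<i", x)


encoded = int32_to_le_bytes(1)             # b'\x01\x00\x00\x00'
decoded = struct.unpack("<i", encoded)[0]  # 1
```

The built-in (1).to_bytes(4, "little", signed=True) produces the same bytes.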

cyclum.writer.read_df_from_binary(file_name_mask)[source]

Read a data frame from a binary file defined by this module

Parameters

file_name_mask

Returns

the data frame

cyclum.writer.read_matrix_from_binary(file_name)[source]

Read a matrix from a binary file defined by this module.

Parameters

file_name (str) – the file to read

Returns

the matrix

cyclum.writer.write_df_to_binary(file_name_mask, df)[source]

Write a data frame to a file. Compared with a matrix, it has column and row names. Besides the row names and column names, the data frame must contain only float values.

Two files will be saved. For example, a call write_df_to_binary("example", df) will output "example-value.bin" and "example-name.txt". They store the matrix and the column and row names separately.

Parameters
  • file_name_mask (str) – the stem of the file name

  • df – the data frame to write

Returns

None

cyclum.writer.write_matrix_to_binary(file_name, val)[source]

Write an (unnamed) matrix to a file. The matrix should contain only float values, or values that are at least convertible to float.

Parameters
  • file_name (str) – name of file

  • val – The matrix to write

Returns

None

Module contents

Top-level package for cyclum.