API reference

This page gives an overview of all public bioScience objects, functions and methods. All classes and functions exposed in the bioscience.* namespace are public.

The following subpackages are public.

bioscience.base: Contains the functions for I/O and file management, together with the core objects and methods used by the rest of the library’s subpackages.
bioscience.preprocess: Includes the functions for each preprocessing method implemented in bioScience.
bioscience.dataMining: Contains the source code for the data mining techniques available in bioScience.

Each subpackage may also contain private functions and methods, typically intermediate operations of the implemented methods.

bioscience.base

bioscience.base.files.load(path, separator='\t', skipr=0, naFilter=False, index_gene=-1, index_lengths=-1, head=None) → DataFrame

Load any data from a txt or csv file. (Reuse function)

Parameters

path (str) – The path where the file is stored.
separator (str, optional) – The character that separates the columns of the file, defaults to ‘\t’.
skipr (int, optional) – Number of rows the user wishes to omit from the file, defaults to 0.
naFilter (boolean, optional) – Boolean to detect NA values in a file. NA values shall be replaced by 0’s, defaults to False.
index_gene (int, optional) – Column position where the gene names are stored in the dataset, defaults to -1 (deactivated).
index_lengths (int, optional) – Column position where the gene lengths are stored in the dataset, defaults to -1 (deactivated).
head (int, optional) – Row number(s) containing column labels and marking the start of the data (zero-indexed), defaults to None.

Returns

A dataset object built from the contents of the file.

Return type

bioscience.base.models.Dataset
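
To illustrate how the separator, skipr, naFilter and index_gene options might interact, the following sketch parses tab-separated text with the standard library only. The helper name load_sketch and the NA spellings it recognises are assumptions made for illustration. It is not the library's implementation, and it ignores index_lengths and head.

```python
import csv
import io

NA_TOKENS = {"", "NA", "NaN", "nan"}  # assumed spellings of missing values


def load_sketch(text, separator="\t", skipr=0, naFilter=False, index_gene=-1):
    """Hypothetical stand-in mimicking the documented options of load()."""
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))[skipr:]
    gene_names, values = [], []
    for i, row in enumerate(rows):
        if index_gene >= 0:
            # Pull the gene name out of the row; the rest is numeric data.
            gene_names.append(row[index_gene])
            row = row[:index_gene] + row[index_gene + 1:]
        else:
            gene_names.append(str(i))  # sequential placeholder names
        parsed = []
        for cell in row:
            if naFilter and cell in NA_TOKENS:
                parsed.append(0.0)  # NA values are replaced by 0
            else:
                parsed.append(float(cell))
        values.append(parsed)
    return gene_names, values


raw = "gene\ts1\ts2\nBRCA1\t5\tNA\nTP53\t2\t7\n"
names, matrix = load_sketch(raw, skipr=1, naFilter=True, index_gene=0)
print(names)   # ['BRCA1', 'TP53']
print(matrix)  # [[5.0, 0.0], [2.0, 7.0]]
```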

bioscience.base.files.saveBinaryDatasets(path, datasets)

If the dataset has been binarised, this function allows storing the binary dataset.

Parameters

path (str) – The path where the file will be stored.
datasets (bioscience.base.models.Dataset) – The dataset object which stores the binary dataset.

bioscience.base.files.saveGenes(path, models, data)

Save the gene names from the results of applying a data mining technique.

Parameters

path (str) – The path where the file will be stored.
models (bioscience.base.models.BiclusteringModel) – The results of the data mining technique.
data (bioscience.base.models.Dataset) – The dataset object which stores the original dataset.

bioscience.base.files.saveResults(path, models, data)

Save the results of applying a data mining technique.

Parameters

path (str) – The path where the file will be stored.
models (bioscience.base.models.BiclusteringModel) – The results of the data mining technique.
data (bioscience.base.models.Dataset) – The dataset object which stores the original dataset.

bioscience.base.files.saveResultsIndex(path, models)

Save the row and column indices (relative to the dataset) of the results obtained by applying a data mining technique.

Parameters

path (str) – The path where the file will be stored.
models (bioscience.base.models.BiclusteringModel) – The results of the data mining technique.

class bioscience.base.models.Bicluster(rows, cols=None, data=None, validations=None)

Bases: object

This is a conceptual class representing a bicluster after applying a Biclustering technique.

Parameters

rows (np.array) – Rows of the bicluster.
cols (np.array, optional) – Columns of the bicluster.
data (np.array, optional) – Bicluster values according to the original dataset.
validations (np.array, optional) – A set of instances from bioscience.base.models.Validation.

property cols: Getter and setter methods of the cols property.

property data: Getter and setter methods of the data property.

property rows: Getter and setter methods of the rows property.

sizeBicluster(): Number of total elements in the bicluster.

sort(): Sort the column and row array by theirs indices.

property validations: Getter and setter methods of the validations property.
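
The behaviour of sizeBicluster() and sort() can be pictured with a small stand-in class. The name BiclusterSketch is hypothetical, and reading "number of total elements" as rows times columns is an assumption, not something the reference confirms.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BiclusterSketch:
    """Hypothetical stand-in for the documented Bicluster container."""
    rows: List[int]
    cols: Optional[List[int]] = None

    def size(self) -> int:
        # Assumed meaning of sizeBicluster(): number of rows x number of columns.
        n_cols = len(self.cols) if self.cols is not None else 0
        return len(self.rows) * n_cols

    def sort(self) -> None:
        # Order both index lists in place.
        self.rows.sort()
        if self.cols is not None:
            self.cols.sort()


bic = BiclusterSketch(rows=[4, 0, 2], cols=[3, 1])
bic.sort()
print(bic.rows, bic.cols)  # [0, 2, 4] [1, 3]
print(bic.size())          # 6
```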

class bioscience.base.models.BiclusteringModel(results=None)

Bases: object

This is a conceptual class representing a set of biclusters generated after applying a Biclustering technique.

Parameters

results (set(bioscience.base.models.Bicluster), optional) – Data structure (set) that stores all biclusters after running a Biclustering algorithm.
executionTime (float, optional) – Time taken to execute the Biclustering method.

property executionTime: Getter and setter methods of the executionTime property.

property results: Getter and setter methods of the results property.

class bioscience.base.models.Dataset(data, geneNames=None, columnsNames=None, lengths=None, annotations=None, cut=None)

Bases: object

This is a conceptual class representing a dataset.

Parameters

original (np.array) – Here the original dataset is stored in a NumPy array.
data (np.array) – This attribute is used to store the dataset once it has undergone the transformations desired by the user.
geneNames (np.array, optional) – Array with the names of the genes in the dataset. If the dataset does not provide gene names, sequential numbers are used instead.
columnsNames (np.array, optional) – Array with the names of the columns in the dataset. If the dataset does not provide column names, sequential numbers are used instead.
lengths (np.array, optional) – Array with the length of each gene (RNA-Seq).
annotations (np.array, optional) – Array that stores data from an annotation file for subsequent validation phases.
cut (float, optional) – Cut-off parameter used in level binarisation.

property annotations: Getter and setter methods of the annotations property.

property columnsNames: Getter and setter methods of the columnsNames property.

property cut: Getter and setter methods of the cut property.

property data: Getter and setter methods of the data property.

property geneNames: Getter and setter methods of the geneNames property.

property lengths: Getter and setter methods of the lengths property.

property original: Getter and setter methods of the original property.

class bioscience.base.models.Validation(measure, value)

Bases: object

This is a conceptual class representing a validation model after applying a data mining technique.

Parameters

measure (str) – Name of the validation measure used.
value (float) – Value of the validation measure used.

property measure: Getter and setter methods of the measure property.

property value: Getter and setter methods of the value property.

bioscience.preprocess

bioscience.preprocess.Standard.discretize(dataset, n_bins=2, strategy='kmeans')

Discretizes data into N bins.

Parameters

dataset (bioscience.base.models.Dataset) – The dataset object to be discretized.
n_bins (int, optional) – Number of bins to produce, defaults to 2.
strategy (str, optional) – Strategy used to define the widths of the bins. Options: kmeans, quantile, uniform. Defaults to ‘kmeans’.
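
As a minimal sketch of what equal-width binning means, the snippet below assigns values to n_bins bins of identical width. It covers only that one strategy, the function name discretize_uniform is an assumption, and the library itself may rely on a different implementation.

```python
def discretize_uniform(values, n_bins=2):
    """Assign each value to one of n_bins equal-width bins (0-based labels)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]  # constant column: a single bin
    width = (hi - lo) / n_bins
    # The maximum falls on the last edge, so clamp it into the top bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


expression = [0.0, 1.0, 2.5, 4.0, 9.0, 10.0]
print(discretize_uniform(expression, n_bins=2))  # [0, 0, 0, 0, 1, 1]
print(discretize_uniform(expression, n_bins=4))  # [0, 0, 1, 1, 3, 3]
```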

bioscience.preprocess.Standard.normalDistributionQuantile(dataset, quantiles=1000)

Use Normal Distribution Quantile to preprocess a dataset.

Parameters

dataset (bioscience.base.models.Dataset) – The dataset object to be transformed.
quantiles (int, optional) – Number of quantiles to be computed, defaults to 1000.

bioscience.preprocess.Standard.outliers(dataset, view=True, mode=1, replace=3)

Detects and modifies outliers in a dataset.

Parameters

dataset (bioscience.base.models.Dataset) – The dataset object to be checked.
view (boolean, optional) – Graphical visualisation through BoxPlot to identify outliers in the columns of the dataset.
mode (int, optional) – Type of outlier to be detected. If mild outliers are to be detected the value is 1. For extreme outliers the value is 2. Defaults to 1.
replace (int, optional) – Treatment of outliers. If the value is 1, rows containing outliers are deleted. If the value is 2, outliers above the maximum threshold are replaced by the maximum value and outliers below the minimum threshold by the minimum value. If the value is 3, outliers are replaced by the median. Defaults to 3.
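
The combination of mode and replace=3 can be sketched as follows. The fences used here (1.5 x IQR for mild outliers, 3 x IQR for extreme ones) follow the usual box-plot convention and are an assumption about what the library does. The function name is hypothetical.

```python
import statistics


def replace_outliers_with_median(column, mode=1):
    """Replace values outside the IQR fences with the column median."""
    # Assumed convention: 1.5 x IQR for mild outliers, 3 x IQR for extreme ones.
    factor = 1.5 if mode == 1 else 3.0
    q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    median = statistics.median(column)
    return [median if (v < lower or v > upper) else v for v in column]


column = [10.0, 12.0, 11.0, 13.0, 12.0, 60.0]
cleaned = replace_outliers_with_median(column, mode=1)
print(cleaned)  # [10.0, 12.0, 11.0, 13.0, 12.0, 12.0]
```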

bioscience.preprocess.Standard.scale(dataset)

Scale a dataset.

Parameters: dataset (bioscience.base.models.Dataset) – The dataset object to be scaled.

bioscience.preprocess.Standard.standardize(dataset)

Standardize a dataset.

Parameters: dataset (bioscience.base.models.Dataset) – The dataset object to be standardized.
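
Standardization is commonly understood as rescaling each column to zero mean and unit variance. The sketch below shows that computation for a single column. Using the population standard deviation is an assumption, and the function name is hypothetical.

```python
import statistics


def standardize_column(column):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]  # constant column: nothing to rescale
    return [(v - mean) / std for v in column]


z = standardize_column([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z[0], z[-1])  # -1.5 2.0
```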

bioscience.preprocess.RnaSeq.cpm(dataset)

Apply the CPM (counts per million) preprocessing method to a dataset.

Parameters: dataset (bioscience.base.models.Dataset) – The dataset object to be preprocessed.
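
For reference, counts per million divides each count by the total count of its sample and multiplies by one million. The sketch below assumes a genes-by-samples layout (rows are genes, columns are samples) and a hypothetical function name. It is not the library's code.

```python
def cpm_sketch(counts):
    """counts[g][s] is the raw read count of gene g in sample s."""
    n_samples = len(counts[0])
    library_sizes = [sum(row[s] for row in counts) for s in range(n_samples)]
    return [
        [row[s] * 1_000_000 / library_sizes[s] for s in range(n_samples)]
        for row in counts
    ]


raw_counts = [[10, 0], [30, 50], [60, 150]]
normalized = cpm_sketch(raw_counts)
print(normalized[0])  # [100000.0, 0.0]
```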

bioscience.preprocess.RnaSeq.deseq2Norm(dataset)

Apply the DESeq2 normalization preprocessing method to a dataset.

Parameters: dataset (bioscience.base.models.Dataset) – The dataset object to be preprocessed.
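
DESeq2 normalization is generally described as the median-of-ratios method: each sample receives a size factor equal to the median ratio of its counts to the per-gene geometric mean. The sketch below follows that description under the assumption of a genes-by-samples layout. The function names are hypothetical and the library's own code may differ in details such as zero handling.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors; counts[g][s] is gene g in sample s."""
    n_samples = len(counts[0])
    # Only genes without zero counts have a finite log geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for s in range(n_samples):
        log_ratios = [math.log(row[s]) - m for row, m in zip(usable, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


def normalize_counts(counts):
    """Divide every count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


raw_counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
factors = size_factors(raw_counts)
print([round(f, 4) for f in factors])  # [0.7071, 1.4142]
```

In this toy input the second sample is sequenced twice as deeply as the first, so after normalization the usable genes have equal values in both samples.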

bioscience.preprocess.RnaSeq.fpkm(dataset)

Apply the FPKM (fragments per kilobase of transcript per million mapped reads) preprocessing method to a dataset.

Parameters: dataset (bioscience.base.models.Dataset) – The dataset object to be preprocessed.

bioscience.preprocess.RnaSeq.tpm(dataset)

Apply the TPM (transcripts per million) preprocessing method to a dataset.

Parameters: dataset (bioscience.base.models.Dataset) – The dataset object to be preprocessed.
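
FPKM and TPM both correct for gene length, but in a different order: FPKM scales by sequencing depth and length together, whereas TPM first divides by length and then rescales each sample to one million. The sketch below uses the textbook formulas with hypothetical function names, a genes-by-samples layout and gene lengths in base pairs. These are assumptions, not the library's implementation.

```python
def fpkm_sketch(counts, lengths):
    """Fragments per kilobase of transcript per million mapped reads."""
    n_samples = len(counts[0])
    totals = [sum(row[s] for row in counts) for s in range(n_samples)]
    return [
        [row[s] * 1e9 / (length * totals[s]) for s in range(n_samples)]
        for row, length in zip(counts, lengths)
    ]


def tpm_sketch(counts, lengths):
    """Transcripts per million: length-normalise first, then rescale to 1e6."""
    n_samples = len(counts[0])
    rates = [[c / length for c in row] for row, length in zip(counts, lengths)]
    totals = [sum(row[s] for row in rates) for s in range(n_samples)]
    return [[row[s] * 1e6 / totals[s] for s in range(n_samples)] for row in rates]


raw_counts = [[100, 200], [300, 200]]
gene_lengths = [1000, 3000]  # base pairs, one entry per gene

fpkm_values = fpkm_sketch(raw_counts, gene_lengths)
tpm_values = tpm_sketch(raw_counts, gene_lengths)
print(fpkm_values[0])
print(tpm_values[0])
```

A useful property to check is that the TPM values of every sample sum to one million, which is not true of FPKM in general.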

bioscience.preprocess.Binarization.binarize(dataset, threshold=0.0, soc=None)

Apply binarisation to a dataset.

Parameters

dataset (bioscience.base.models.Dataset) – The dataset object to be binarized.
threshold (float, optional) – Feature values below or equal to this are replaced by 0, above it by 1. Threshold may not be less than 0 for operations on sparse matrices, defaults to 0.0
soc (int, optional) – Threshold representing the number of ones each row of the dataset should have as a minimum. If a row does not exceed this threshold, it shall be removed from the dataset, defaults to None.
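
The effect of threshold and soc can be sketched in a few lines. The function name binarize_sketch is hypothetical, and keeping rows that contain at least soc ones is one reading of the description above, stated here as an assumption.

```python
def binarize_sketch(matrix, threshold=0.0, soc=None):
    """Map values > threshold to 1 and the rest to 0, then filter sparse rows."""
    binary = [[1 if v > threshold else 0 for v in row] for row in matrix]
    if soc is not None:
        # Assumption: a row is kept when it holds at least `soc` ones.
        binary = [row for row in binary if sum(row) >= soc]
    return binary


data = [[0.2, 0.9, 0.7], [0.1, 0.0, 0.6], [0.8, 0.8, 0.9]]
print(binarize_sketch(data, threshold=0.5))         # [[0, 1, 1], [0, 0, 1], [1, 1, 1]]
print(binarize_sketch(data, threshold=0.5, soc=2))  # [[0, 1, 1], [1, 1, 1]]
```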

bioscience.preprocess.Binarization.binarizeLevels(dataset, inactiveLevel=None, activeLevel=None, cut=0.5, step=0.1, soc=None)

Generate multiple binary datasets by applying fuzzy logic.

Parameters

dataset (bioscience.base.models.Dataset) – The dataset object to be binarized.
inactiveLevel (float, optional) – If an element in the dataset is below this threshold it is considered an inactive gene (value equal to 0 in a binarised dataset).
activeLevel (float, optional) – If an item in the dataset is above this threshold it is considered an active gene (value equal to 1 in a binarised dataset).
cut (float, optional) – Value used in fuzzy logic to determine up to which value an active gene is to be considered. This value will assist in the creation of multiple binarised datasets.
step (float, optional) – Value used in fuzzy logic to determine how much the value in fuzzy logic should be lowered for each binary dataset generated. This value will assist in the creation of multiple binarised datasets.
soc (int, optional) – Threshold representing the number of ones each row of the dataset should have as a minimum. If a row does not exceed this threshold, it shall be removed from the dataset, defaults to None.

bioscience.dataMining

Biclustering

bioscience.dataMining.biclustering.Biclustering.bcca(dataset, correlationThreshold=0.7, minCols=3, deviceCount=1, mode=1, debug=False)

Main function processing the BCCA Biclustering algorithm.

Parameters

dataset (bioscience.base.models.Dataset) – The dataset object that stores the data of the input file.
correlationThreshold (float, optional) – Correlation threshold applied when building biclusters, defaults to 0.7.
minCols (int, optional) – Minimum number of columns to build a valid bicluster, defaults to 3.
deviceCount (int, optional) – Number of GPU devices to execute, defaults to 1.
mode (int, optional) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.
debug (boolean, optional) – Attribute used to run the algorithm in debug mode, defaults to False.

Returns

A set of BiclusteringModel objects that stores all biclusters generated by the BCCA algorithm.

Return type

set(bioscience.base.models.BiclusteringModel)

bioscience.dataMining.biclustering.Biclustering.bibit(dataset, cMnr=2, cMnc=2, deviceCount=1, mode=1, debug=False)

Main function processing the BiBit Biclustering algorithm.

Parameters

dataset (bioscience.base.models.Dataset) – The dataset object that stores the data of the input file.
cMnr (int, optional) – Minimum number of rows to build a valid bicluster, defaults to 2.
cMnc (int, optional) – Minimum number of columns to build a valid bicluster, defaults to 2.
deviceCount (int, optional) – Number of GPU devices to execute, defaults to 1.
mode (int, optional) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.
debug (boolean, optional) – Attribute used to run the algorithm in debug mode, defaults to False.

Returns

A set of BiclusteringModel objects that stores all biclusters generated by the BiBit algorithm.

Return type

set(bioscience.base.models.BiclusteringModel)
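
To make the roles of cMnr and cMnc concrete, here is a sequential, pure-Python sketch of a BiBit-style search on a small binary matrix. It follows the general idea of the published algorithm (pairwise AND patterns, then collection of all rows containing the pattern). The function name and the bitmask encoding are assumptions, and it does not reflect the library's CPU-parallel or multi-GPU code.

```python
from itertools import combinations


def bibit_sketch(binary_rows, cMnr=2, cMnc=2):
    """Sequential BiBit-style search over a binary matrix (list of 0/1 rows)."""
    n_cols = len(binary_rows[0])
    # Encode every row as an integer bitmask so that AND is a single operation.
    masks = [int("".join(map(str, row)), 2) for row in binary_rows]
    seen, biclusters = set(), []
    for i, j in combinations(range(len(masks)), 2):
        pattern = masks[i] & masks[j]
        # Skip patterns already processed or with fewer than cMnc columns.
        if pattern in seen or bin(pattern).count("1") < cMnc:
            continue
        seen.add(pattern)
        # A row belongs to the bicluster when it contains the whole pattern.
        rows = [r for r, m in enumerate(masks) if m & pattern == pattern]
        if len(rows) >= cMnr:
            cols = [c for c in range(n_cols) if pattern >> (n_cols - 1 - c) & 1]
            biclusters.append((rows, cols))
    return biclusters


matrix = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]
found = bibit_sketch(matrix)
for rows, cols in found:
    print(rows, cols)
# [0, 1, 3] [0, 1]
# [0, 2, 3] [1, 3]
# [0, 3] [0, 1, 3]
```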

bioscience.dataMining.biclustering.BiBit.processBiBit(dataset, cMnr, cMnc, deviceCount, mode, debug)

Sub-function processing the BiBit Biclustering algorithm.

Parameters

dataset (bioscience.base.models.Dataset) – The dataset object that stores the data of the input file.
cMnr (int) – Minimum number of rows to build a valid bicluster.
cMnc (int) – Minimum number of columns to build a valid bicluster.
deviceCount (int) – Number of GPU devices to execute.
mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture.
debug (boolean) – Attribute used to run the algorithm in debug mode.

Returns

A BiclusteringModel object that stores all biclusters generated by the BiBit algorithm.

Return type

bioscience.base.models.BiclusteringModel

bioscience.dataMining.biclustering.BiBit.threadsPerDevice_64(resultsQueue, i, s, chunks, bicsPerGpuPrevious, patternsPerRun, mInputData, m, cMnr, cMnc, debug): Function used for the creation of a multi-GPU architecture.