API reference

This page gives an overview of all public bioScience objects, functions and methods. All classes and functions exposed in the bioscience.* namespace are public.

The following subpackages are public.

  • bioscience.base: Contains the functions for managing I/O operations and files, together with the objects and methods that form the core of the library and are used by the rest of its subpackages.

  • bioscience.preprocess: This subpackage includes the functions associated with each of the preprocessing methods implemented in bioScience.

  • bioscience.stats: This subpackage includes the functions associated with each of the statistical methods implemented in bioScience.

  • bioscience.dataMining: This subpackage contains all the source code for those data mining techniques that can be used in bioScience.

Each subpackage may also contain private functions and methods, which are usually intermediate operations of the implemented methods.

bioscience.base

bioscience.base.files.load(db, apiKey=None, separator='\t', skipr=0, naFilter=False, index_gene=-1, index_lengths=-1, head=None) → DataFrame

Load any data from a txt or csv file. (Reuse function)

Parameters
  • db (str) – The path where the file is stored, or the ID of a database.

  • apiKey (str, optional) – NCBI API key, defaults to None.

  • separator (str, optional) – An attribute indicating how the columns of the file are separated, defaults to '\t'.

  • skipr (int, optional) – Number of rows the user wishes to omit from the file, defaults to 0.

  • naFilter (boolean, optional) – Boolean to detect NA values in a file. NA values shall be replaced by 0’s, defaults to False.

  • index_gene (int, optional) – Column position where the gene names are stored in the dataset, defaults to -1 (deactivated).

  • index_lengths (int, optional) – Column position where the gene lengths are stored in the dataset, defaults to -1 (deactivated).

  • head (int, optional) – Row number(s) containing column labels and marking the start of the data (zero-indexed), defaults to None.

Returns

A dataset object from the reading of a file

Return type

bioscience.base.models.Dataset
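The way skipr, head, index_gene and naFilter interact can be easier to follow with a small example. The following is a minimal sketch in plain Python, not the bioScience implementation: parse_table is a hypothetical helper, and the order of operations and the set of strings treated as NA are assumptions made for illustration.

```python
import csv
import io


def parse_table(text, separator="\t", skipr=0, na_filter=False, index_gene=-1, head=None):
    """Hypothetical helper mirroring the documented parameters (not the library code)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))[skipr:]
    columns = None
    if head is not None:
        columns = rows[head]          # header row, zero-indexed after skipping
        rows = rows[head + 1:]        # data starts right after the header
    genes, values = [], []
    for n, row in enumerate(rows):
        if index_gene >= 0:
            genes.append(row[index_gene])
            row = row[:index_gene] + row[index_gene + 1:]
        else:
            genes.append(str(n))      # sequential names when there is no gene column
        parsed = []
        for cell in row:
            if cell.strip() in ("", "NA", "NaN"):
                # Assumption: with na_filter enabled, missing values become 0
                parsed.append(0.0 if na_filter else float("nan"))
            else:
                parsed.append(float(cell))
        values.append(parsed)
    if columns is not None and index_gene >= 0:
        columns = columns[:index_gene] + columns[index_gene + 1:]
    return genes, columns, values


# Made-up tab-separated content with a header row and one missing value
sample = "gene\ts1\ts2\nTP53\t1.5\tNA\nBRCA1\t2\t3\n"
genes, columns, values = parse_table(sample, na_filter=True, index_gene=0, head=0)
```

Here the gene column is split from the numeric values and the NA cell is replaced by 0, as described for naFilter.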

bioscience.base.files.loadNetwork(db, index_nodeA, index_nodeB, index_weight, separator='\t', skipr=0, head=None)

Load a network dataset from a file.

Parameters
  • db (str) – The path where the file is stored

  • index_nodeA (int) – Column position where the node A gene names are stored in the dataset.

  • index_nodeB (int) – Column position where the node B gene names are stored in the dataset.

  • index_weight (int) – Column position where the edge weight is stored in the dataset.

  • separator (str, optional) – An attribute indicating how the columns of the file are separated, defaults to '\t'.

  • skipr (int, optional) – Number of rows the user wishes to omit from the file, defaults to 0.

  • head (int, optional) – Row number(s) containing column labels and marking the start of the data (zero-indexed), defaults to None.

Returns

A network dataset object from the reading of a file

Return type

bioscience.base.models.NetworkDataset

bioscience.base.files.saveBinaryDatasets(path, datasets)

If the dataset has been binarised, this function allows storing the binary dataset.

Parameters
  • path (str) – The path where the file will be stored.

  • datasets (bioscience.base.models.Dataset) – The dataset object which stores the binary dataset.

bioscience.base.files.saveGenes(path, models, data)

Save the gene names from the results of applying a data mining technique.

Parameters
  • path (str) – The path where the file will be stored.

  • models (bioscience.dataMining.biclustering.BiclusteringModel) – The results of the data mining technique.

  • data (bioscience.base.models.Dataset) – The dataset object which stores the original dataset.

bioscience.base.files.saveResults(path, models, data)

Save the results of applying a data mining technique.

Parameters
  • path (str) – The path where the file will be stored.

  • models (bioscience.dataMining.biclustering.BiclusteringModel) – The results of the data mining technique.

  • data (bioscience.base.models.Dataset) – The dataset object which stores the original dataset.

bioscience.base.files.saveResultsIndex(path, models)

Save the results index (rows and columns index of the dataset) of applying a data mining technique.

Parameters
  • path (str) – The path where the file will be stored.

  • models (bioscience.dataMining.biclustering.BiclusteringModel) – The results of the data mining technique.

class bioscience.base.models.Bicluster(rows, cols=None, data=None, validations=None)

Bases: object

This is a conceptual class representing a bicluster after applying a Biclustering technique.

Parameters
  • rows (np.array) – Rows of the bicluster.

  • cols (np.array, optional) – Columns of the bicluster.

  • data (np.array, optional) – Bicluster values according to the original dataset.

  • validations (np.array, optional) – A set of instances from bioscience.base.models.Validation.

property cols

Getter and setter methods of the cols property.

property data

Getter and setter methods of the data property.

property rows

Getter and setter methods of the rows property.

sizeBicluster()

Number of total elements in the bicluster.

sort()

Sort the row and column arrays by their indices.

property validations

Getter and setter methods of the validations property.
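As an illustration of what sizeBicluster() and sort() describe, here is a minimal sketch with a hypothetical stand-in class. BiclusterSketch is not part of bioScience, and it assumes that the number of elements is the number of rows times the number of columns.

```python
from dataclasses import dataclass, field


@dataclass
class BiclusterSketch:
    """Hypothetical stand-in for a bicluster: row and column indices only."""
    rows: list
    cols: list = field(default_factory=list)

    def size(self):
        # Assumption: total elements = selected rows x selected columns
        return len(self.rows) * len(self.cols)

    def sort(self):
        # Order both index arrays in ascending order
        self.rows.sort()
        self.cols.sort()


b = BiclusterSketch(rows=[7, 2, 5], cols=[3, 0])
b.sort()
```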

class bioscience.base.models.BiclusteringModel(results=None)

Bases: object

This is a conceptual class representing a set of biclusters generated after applying a Biclustering technique.

Parameters
  • results (set(bioscience.base.models.Bicluster), optional) – Data structure (set) that stores all biclusters after running a Biclustering algorithm.

  • executionTime (float, optional) – Time taken to execute the Biclustering method.

property executionTime

Getter and setter methods of the executionTime property.

property results

Getter and setter methods of the results property.

class bioscience.base.models.CorrelationModel(name, results, rows, executionTime=None)

Bases: object

It is a conceptual class that represents the results generated by a correlation method.

Parameters
  • results – NumPy vector that stores the results of the row pairs of a dataset.

  • executionTime (float, optional) – Time taken to execute the Correlation method.

property executionTime

Getter and setter methods of the executionTime property.

property geneInteractionsIndex

Getter and setter methods of the geneInteractionsIndex property.

property name

Getter and setter methods of the name property.

property results

Getter and setter methods of the results property.

class bioscience.base.models.Dataset(data, geneNames=None, columnsNames=None, lengths=None, annotations=None, cut=None)

Bases: object

This is a conceptual class representing a dataset.

Parameters
  • original (np.array) – Here the original dataset is stored in a NumPy array.

  • data (np.array) – This attribute is used to store the dataset once it has undergone the transformations desired by the user.

  • geneNames (np.array, optional) – Array with the name of the genes involved in the dataset. If the dataset does not have the name of the genes, it shall be replaced by a set of sequential numbers.

  • columnsNames (np.array, optional) – Array with the name of the columns involved in the dataset. If the dataset does not have the name of the columns, it shall be replaced by a set of sequential numbers.

  • lengths (np.array, optional) – Array with gene length value (RNA-Seq)

  • annotations (np.array, optional) – Array that stores data from an annotation file for subsequent validation phases.

  • cut (float, optional) – Cut-off parameter used in level binarisation.

property annotations

Getter and setter methods of the annotations property.

property columnsNames

Getter and setter methods of the columnsNames property.

property cut

Getter and setter methods of the cut property.

property data

Getter and setter methods of the data property.

property geneNames

Getter and setter methods of the geneNames property.

property lengths

Getter and setter methods of the lengths property.

property original

Getter and setter methods of the original property.

class bioscience.base.models.Edge(nodeA, nodeB, weight, weightRelatedValues, info)

Bases: object

This is a conceptual class representing an edge of a genetic network.

Parameters
  • nodeA (bioscience.base.models.Node) – Beginning node of the edge

  • nodeB (bioscience.base.models.Node) – End node of the edge

  • weight (float) – Weight of the edge.

  • weightRelatedValues (np.array) – Values related to the weight, such as Spearman or Kendall values.

  • info (np.array) – Additional information of the edge.

property info

Getter and setter methods of the info property.

property nodeA

Getter and setter methods of the nodeA property.

property nodeB

Getter and setter methods of the nodeB property.

property weight

Getter and setter methods of the weight property.

property weightRelatedValues

Getter and setter methods of the weightRelatedValues property.

class bioscience.base.models.NCBIClient(idDB, apiKey=None)

Bases: object

property apiKey

Getter and setter methods of the apiKey property.

property baseUrl

Getter and setter methods of the baseUrl property.

getIdsByGeo()

getSummaryById(id)

property idDB

Getter and setter methods of the idDB property.

class bioscience.base.models.NCBIDataset(accessionNumber, title=None, summary=None, gpl=None, gse=None, taxonomy=None, gdstype=None, suppfile=None, nSamples=None, link=None, bioProject=None, samples=None)

Bases: object

This is a conceptual class representing the information of an NCBI dataset.

Parameters
  • accessionNumber (str) – Accession number of the dataset.

  • title (optional) – Title of the dataset.

  • summary (optional) – Summary of the dataset.

  • gpl (optional) – GPL identifier(s) associated with the dataset.

  • gse (optional) – GSE identifier(s) associated with the dataset.

  • taxonomy (optional) – Taxonomy of the dataset.

  • gdstype (optional) – GDS type of the dataset.

  • suppfile (optional) – Supplementary file information of the dataset.

  • nSamples (optional) – Number of samples in the dataset.

  • link (optional) – Link associated with the dataset.

  • bioProject (optional) – BioProject associated with the dataset.

  • samples (optional) – Samples of the dataset.

property accessionNumber

Getter and setter methods of the accessionNumber property.

property bioProject

Getter and setter methods of the bioProject property.

fullInfo()

property gdstype

Getter and setter methods of the gdstype property.

property gpl

Getter and setter methods of the gpl property.

property gse

Getter and setter methods of the gse property.

property link

Getter and setter methods of the link property.

property nSamples

Getter and setter methods of the nSamples property.

property samples

Getter and setter methods of the samples property.

property summary

Getter and setter methods of the summary property.

property suppfile

Getter and setter methods of the suppfile property.

property taxonomy

Getter and setter methods of the taxonomy property.

property title

Getter and setter methods of the title property.

class bioscience.base.models.Network(node, edge, validations=None, directed=None)

Bases: object

This is a conceptual class representing a genetic network after applying a Genetic Network technique.

Parameters
  • node (np.array) – List of the nodes of the network.

  • edge (np.array) – List of the edges of the network.

  • validations (np.array, optional) – A set of instances from bioscience.base.models.Validation.

  • directed (boolean, optional) – Represents if a network is a directed graph or not.

property directed

property edges

Getter and setter methods of the edges property

property nodes

Getter and setter methods of the nodes property.

shared_edges_count(gene)

Function to count the number of edges shared by a given gene (node).

Parameters

gene (str or int) – The gene (node) to check for shared edges.

Returns

Number of edges shared by the given gene.

Return type

int

property sizeEdges

property sizeNodes

property validations

Getter and setter methods of the validations property.
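To make the idea behind shared_edges_count(gene) concrete, the sketch below counts the edges that touch a given node in a plain list of tuples. It is an illustration only: the tuple layout (nodeA, nodeB, weight) and the gene names are assumptions, and the function is a standalone stand-in rather than the Network method itself.

```python
def shared_edges_count(edges, gene):
    """Count edges that have `gene` at either end (illustrative only)."""
    return sum(1 for node_a, node_b, _weight in edges if gene in (node_a, node_b))


# Edges as (nodeA, nodeB, weight) tuples; gene names are made up for the example
edges = [("G1", "G2", 0.9), ("G2", "G3", 0.4), ("G1", "G3", 0.7), ("G4", "G2", 0.2)]
g2_degree = shared_edges_count(edges, "G2")
```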

class bioscience.base.models.NetworkDataset(data, geneNamesNodeA=None, geneNamesNodeB=None, columnsNames=None, extraInfo=None, importantColumnsName=None)

Bases: Dataset

This is a conceptual class representing a network dataset.

Parameters
  • data (np.array) – This attribute is used to store the dataset once it has undergone the transformations desired by the user.

  • geneNamesNodeA (np.array, optional) – Array with the names of the node A genes involved in the dataset. If the dataset does not have the names of the genes, they shall be replaced by a set of sequential numbers.

  • geneNamesNodeB (np.array, optional) – Array with the names of the node B genes involved in the dataset. If the dataset does not have the names of the genes, they shall be replaced by a set of sequential numbers.

  • columnsNames (np.array, optional) – Array with the name of the columns involved in the dataset. If the dataset does not have the name of the columns, it shall be replaced by a set of sequential numbers.

  • extraInfo (np.array, optional) – Array that stores extra information about the network dataset.

  • importantColumnsName (np.array, optional) – Name of the column that contains the most relevant information of the dataset.

property extraInfo

Getter and setter methods of the extraInfo property.

property geneNamesNodeA

Getter and setter methods of the geneNamesNodeA property.

property geneNamesNodeB

Getter and setter methods of the geneNamesNodeB property.

property importantColumnsName

Getter and setter methods of the importantColumnsName property.

class bioscience.base.models.NetworkModel(results=None)

Bases: object

This is a conceptual class representing a set of networks generated after applying a Network technique.

Parameters
  • results (set(bioscience.base.models.Network), optional) – Data structure (set) that stores all networks after running a Network algorithm.

  • executionTime (float, optional) – Time taken to execute the Network method.

property executionTime

Getter and setter methods of the executionTime property.

property results

Getter and setter methods of the results property.

sort()

class bioscience.base.models.Node(info, id=None, name=None)

Bases: object

This is a conceptual class representing a node of a genetic network.

Parameters
  • info (np.array) – Additional information of the node.

  • id (int, optional) – Id of the gene.

  • name (str, optional) – Name of the gene.

property extraCharacteristics

Getter and setter methods of the extraCharacteristics property.

property id

Getter and setter methods of the id property.

property info

Getter and setter methods of the info property.

property name

Getter and setter methods of the name property.

class bioscience.base.models.Validation(measure, value)

Bases: object

This is a conceptual class representing a validation model after applying a data mining technique.

Parameters
  • measure (str) – Name of the validation measure used.

  • value (float) – Value of the validation measure used.

property measure

Getter and setter methods of the measure property.

property value

Getter and setter methods of the value property.

bioscience.preprocess

bioscience.preprocess.Standard.discretize(dataset, n_bins=2, strategy='kmeans')

Discretizes data into N bins.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to be discretized.

  • n_bins (int, optional) – Number of bins to produce, defaults to 2.

  • strategy (str, optional) – Strategy used to define the widths of the bins. Options: kmeans, quantile, uniform. Defaults to ‘kmeans’.
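For intuition about what a binning strategy does, the sketch below implements equal-width binning, the simplest of the listed strategies, in plain Python. It is not the library code: discretize_uniform is a hypothetical helper that works on a single list of values, and the kmeans and quantile strategies are not covered.

```python
def discretize_uniform(values, n_bins=2):
    """Assign each value to one of n_bins equal-width bins (labels 0..n_bins-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)      # constant input: everything falls in bin 0
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


labels = discretize_uniform([0.0, 1.0, 2.0, 3.0, 4.0, 10.0], n_bins=2)
```

With two bins the single large value ends up alone in the upper bin, which is the typical weakness of equal-width binning on skewed data.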

bioscience.preprocess.Standard.normalDistributionQuantile(dataset, quantiles=1000)

Use Normal Distribution Quantile to preprocess a dataset.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to be transformed with the normal distribution quantile method.

  • quantiles (int, optional) – Number of quantiles to be computed, defaults to 1000.

bioscience.preprocess.Standard.outliers(dataset, view=True, mode=1, replace=3)

Detects and modifies outliers in a dataset.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to be checked.

  • view (boolean, optional) – Graphical visualisation through BoxPlot to identify outliers in the columns of the dataset.

  • mode (int, optional) – Type of outlier to be detected. If mild outliers are to be detected the value is 1. For extreme outliers the value is 2. Defaults to 1.

  • replace (int, optional) – Treatment of outliers. If the value is 1, rows containing outliers are deleted. If the value is 2, outliers above the maximum threshold are replaced by the maximum value and outliers below the minimum threshold are replaced by the minimum value. If the value is 3, outliers are replaced by the median. Defaults to 3.
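The mild and extreme outlier modes can be illustrated with Tukey's fences. The sketch below is an assumption about how such a rule is commonly defined (1.5 times the interquartile range for mild outliers, 3 times for extreme ones), not a description of the bioScience internals; replace_outliers is a hypothetical helper that only covers the median replacement case on one column.

```python
import statistics


def replace_outliers(values, mode=1):
    """Replace values outside Tukey's fences by the median (illustrative sketch)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Assumption: mild outliers use 1.5 x IQR, extreme outliers use 3 x IQR
    k = 1.5 if mode == 1 else 3.0
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    median = statistics.median(values)
    return [median if (v < low or v > high) else v for v in values]


cleaned = replace_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100])
```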

bioscience.preprocess.Standard.scale(dataset)

Scale a dataset

Parameters

dataset (bioscience.base.models.Dataset) – The dataset object to be scaled.

bioscience.preprocess.Standard.standardize(dataset)

Standardize a dataset.

Parameters

dataset (bioscience.base.models.Dataset) – The dataset object to be standardized.
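The reference does not state which formulas scale and standardize use. A common convention, assumed here purely for illustration, is min-max rescaling to the range [0, 1] for scaling and the z-score for standardization. The helper names below are hypothetical and operate on a single column.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre values on zero mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


column = [2.0, 4.0, 6.0, 8.0]
scaled = min_max_scale(column)
standardized = z_score(column)
```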

bioscience.preprocess.RnaSeq.cpm(dataset)

Apply the CPM (counts per million) preprocessing method to a dataset.

Parameters

dataset (bioscience.base.models.Dataset) – The dataset object to be preprocessed.
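As a reminder of what CPM computes, the sketch below applies the standard definition (each count divided by the library size of its sample, times one million) to one sample. The function name and the example counts are made up, and the code is independent of bioScience.

```python
def cpm(counts):
    """Counts per million for one sample: count / library size * 1e6."""
    library_size = sum(counts)
    return [c / library_size * 1_000_000 for c in counts]


sample_counts = [10, 30, 60]   # raw read counts of three genes in one sample
normalized = cpm(sample_counts)
```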

bioscience.preprocess.RnaSeq.deseq2Norm(dataset)

Apply the DESeq2 normalization preprocessing method to a dataset.

Parameters

dataset (bioscience.base.models.Dataset) – The dataset object to be preprocessed.

bioscience.preprocess.RnaSeq.fpkm(dataset)

Apply the FPKM (fragments per kilobase of transcript per million mapped reads) preprocessing method to a dataset.

Parameters

dataset (bioscience.base.models.Dataset) – The dataset object to be preprocessed.
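FPKM additionally corrects for gene length, which is why a dataset needs gene lengths for this method. A minimal sketch of the standard formula for one sample follows; the function name and the numbers are illustrative and do not come from bioScience.

```python
def fpkm(counts, lengths_bp):
    """FPKM for one sample: count * 1e9 / (gene length in bp * total counts)."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


counts = [10, 30, 60]            # raw counts of three genes in one sample
lengths = [1000, 2000, 3000]     # gene lengths in base pairs
values = fpkm(counts, lengths)
```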

bioscience.preprocess.RnaSeq.tpm(dataset)

Apply the TPM (transcripts per million) preprocessing method to a dataset.

Parameters

dataset (bioscience.base.models.Dataset) – The dataset object to be preprocessed.
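TPM first divides each count by the gene length and then rescales the resulting rates so that they sum to one million within a sample. The sketch below shows that standard definition for a single sample; names and numbers are made up for the example.

```python
def tpm(counts, lengths_bp):
    """TPM for one sample: length-normalised rates rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1_000_000 for r in rates]


counts = [10, 30, 60]            # raw counts of three genes in one sample
lengths = [1000, 2000, 3000]     # gene lengths in base pairs
values = tpm(counts, lengths)
```

Unlike FPKM, the values of every sample add up to the same total, which makes proportions comparable across samples.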

bioscience.preprocess.Binarization.binarize(dataset, threshold=0.0, soc=None)

Apply binarisation to a dataset.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to be binarized.

  • threshold (float, optional) – Feature values below or equal to this are replaced by 0, above it by 1. Threshold may not be less than 0 for operations on sparse matrices, defaults to 0.0

  • soc (int, optional) – Threshold representing the number of ones each row of the dataset should have as a minimum. If a row does not exceed this threshold, it shall be removed from the dataset, defaults to None.
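The combined effect of threshold and soc can be sketched in a few lines of plain Python. This is an illustration, not the library code: the helper works on a list of rows, and it assumes that a row is kept when it contains at least soc ones.

```python
def binarize(matrix, threshold=0.0, soc=None):
    """Threshold a matrix to 0/1 and optionally drop rows with too few ones."""
    # Values below or equal to the threshold become 0, values above it become 1
    binary = [[1 if v > threshold else 0 for v in row] for row in matrix]
    if soc is not None:
        # Assumption: rows need at least `soc` ones to be kept
        binary = [row for row in binary if sum(row) >= soc]
    return binary


expression = [[0.2, 0.0, 1.5], [0.0, 0.0, 0.1], [3.0, 2.0, 1.0]]
result = binarize(expression, threshold=0.1, soc=2)
```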

bioscience.preprocess.Binarization.binarizeLevels(dataset, inactiveLevel=None, activeLevel=None, cut=0.5, step=0.1, soc=None)

Generate multiple binary datasets by applying fuzzy logic.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to be binarized.

  • inactiveLevel (float, optional) – If an element in the dataset is below this threshold it is considered an inactive gene (value equal to 0 in a binarised dataset).

  • activeLevel (float, optional) – If an item in the dataset is above this threshold it is considered an active gene (value equal to 1 in a binarised dataset).

  • cut (float, optional) – Value used in fuzzy logic to determine up to which value an active gene is to be considered. This value will assist in the creation of multiple binarised datasets.

  • step (float, optional) – Value used in fuzzy logic to determine how much the value in fuzzy logic should be lowered for each binary dataset generated. This value will assist in the creation of multiple binarised datasets.

  • soc (int, optional) – Threshold representing the number of ones each row of the dataset should have as a minimum. If a row does not exceed this threshold, it shall be removed from the dataset, defaults to None.

bioscience.stats

Correlation methods

bioscience.stats.correlation.continuous.DistCorr.distcorr(dataset, deviceCount=0, mode=1, debug=False)

Application of the Distance Correlation (distcorr) method

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel

bioscience.stats.correlation.continuous.Median.median(dataset, deviceCount=0, mode=1, debug=False)

Application of the median metric.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel

bioscience.stats.correlation.continuous.Pearson.pearson(dataset, deviceCount=0, mode=1, debug=False)

Application of the Pearson correlation method.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel
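The correlation methods produce one value per pair of rows (genes). The sketch below shows that idea for the Pearson coefficient in plain Python. It is not the bioScience implementation; in particular, the order in which pairs are stored (i < j, row by row) is an assumption made for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pairwise(rows, metric):
    """Apply `metric` to every unordered pair of rows (i < j), in order."""
    return [metric(rows[i], rows[j]) for i, j in combinations(range(len(rows)), 2)]


genes = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
results = pairwise(genes, pearson)   # pairs (0,1), (0,2), (1,2)
```

For n rows this yields n(n-1)/2 values, which is why a flat vector is a natural container for the results.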

bioscience.stats.correlation.continuous.Quadrant.q(dataset, deviceCount=0, mode=1, debug=False)

Application of the quadrant (Q) metric.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel

bioscience.stats.correlation.dichotomic.Log_odds.log_odds(dataset, deviceCount=0, mode=1, debug=False)

Application of the Log-odds ratio.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel

bioscience.stats.correlation.dichotomic.MCC.mcc(dataset, deviceCount=0, mode=1, debug=False)

Application of the Matthews Correlation Coefficient.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel
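For reference, the Matthews Correlation Coefficient of two binary vectors can be computed from their 2x2 contingency table. The sketch below follows the textbook formula; returning 0 when the denominator vanishes is a convention chosen for this example, not something the reference specifies.

```python
import math


def mcc(x, y):
    """Matthews Correlation Coefficient of two 0/1 vectors."""
    tp = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    tn = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    fp = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention for this sketch: a degenerate table gives 0
    return (tp * tn - fp * fn) / denom if denom else 0.0


same = mcc([1, 0, 1, 0], [1, 0, 1, 0])
opposite = mcc([1, 0, 1, 0], [0, 1, 0, 1])
```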

bioscience.stats.correlation.dichotomic.PBC.pbc(dataset, deviceCount=0, mode=1, debug=False)

Application of the Point-biserial correlation (PBC) method.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel

bioscience.stats.correlation.mixed.ARI.ari(dataset, deviceCount=0, mode=1, debug=False)

Application of the Adjusted Rand Index (ARI) method.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel

bioscience.stats.correlation.mixed.CC.cc(dataset, deviceCount=0, mode=1, debug=False)

Application of the Contingency Coefficient (CC) method.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel

bioscience.stats.correlation.mixed.MI.mi(dataset, deviceCount=0, mode=1, debug=False)

Application of the Mutual Information (MI) correlation method.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel
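Mutual information measures how much knowing one discrete vector reduces uncertainty about another. The sketch below computes it from empirical frequencies using the natural logarithm. It assumes already discretized inputs and is independent of how bioScience estimates the quantity.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete vectors."""
    n = len(x)
    joint = Counter(zip(x, y))        # counts of each (x, y) combination
    px, py = Counter(x), Counter(y)   # marginal counts
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Identical balanced binary vectors share log(2) nats of information, while independent ones share none.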

bioscience.stats.correlation.mixed.NMI.nmi(dataset, deviceCount=0, mode=1, debug=False)

Application of the Normalized Mutual Information (NMI) correlation method.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel

bioscience.stats.correlation.ordinal.HoeffdingsD.hoeffdingsD(dataset, deviceCount=0, mode=1, debug=False)

Application of the Hoeffding’s D correlation method. A Hoeffding’s D value of 0 indicates clear independence between the variables, whereas a value of 1 indicates perfect dependence. A negative value close to -1 may indicate a very strong dependence between the variables, in the opposite direction to independence.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel

bioscience.stats.correlation.ordinal.Kendall.kendall(dataset, deviceCount=0, mode=1, debug=False)

Application of the Kendall correlation method.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel
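Kendall's coefficient compares the ordering of every pair of observations. The sketch below implements the simplest variant, tau-a, which ignores ties in the denominator. Which variant bioScience computes is not stated in this reference, so treat the choice as an assumption.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1       # both vectors order the pair the same way
        elif sign < 0:
            discordant += 1       # the vectors disagree on the order
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```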

bioscience.stats.correlation.ordinal.Spearman.spearman(dataset, deviceCount=0, mode=1, debug=False)

Application of the Spearman correlation method.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel
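Spearman's coefficient is the Pearson correlation of the ranks of the two vectors. The following sketch makes that explicit, using average ranks for ties. The helper names are hypothetical and the code does not reflect how bioScience parallelises the computation.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1          # average rank of the tie group
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


rho = spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])   # monotonic but non-linear
```

Because only the ordering matters, a monotonic non-linear relationship still reaches the maximum value.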

Distance methods

bioscience.stats.distance.Cosine.cos(dataset, deviceCount=0, mode=1, debug=False)

Application of the Cosine Similarity.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel
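Cosine similarity is the dot product of two vectors divided by the product of their norms. A minimal plain-Python sketch, independent of the library, is shown below; it assumes neither vector is all zeros.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm


parallel = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```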

bioscience.stats.distance.Euclidean.euclidean(dataset, deviceCount=0, mode=1, debug=False)

Application of the Euclidean distance.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel

bioscience.stats.distance.Jaccard.jaccard(dataset, deviceCount=0, mode=1, debug=False)

Application of the Jaccard index.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel
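For two binary vectors, the Jaccard index is the number of positions where both are 1 divided by the number of positions where at least one is 1. The sketch below follows that definition; returning 0 for two all-zero vectors is a convention chosen for the example.

```python
def jaccard(x, y):
    """Jaccard index of two 0/1 vectors."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Convention for this sketch: two empty sets have index 0
    return both / either if either else 0.0


index = jaccard([1, 1, 0, 0], [1, 0, 1, 0])
```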

bioscience.stats.distance.Manhattan.manhattan(dataset, deviceCount=0, mode=1, debug=False)

Application of the Manhattan distance.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object to which the method is applied.

  • deviceCount (int) – Number of GPU devices to execute, defaults to 0.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel
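The Manhattan distance sums the absolute coordinate differences between two vectors. The sketch below shows the formula in plain Python. The name manhattan_distance is hypothetical and the code makes no claim about the library's internal implementation.

```python
def manhattan_distance(a, b):
    """Sum of absolute coordinate differences between two vectors.

    Hypothetical helper for illustration; not part of bioScience.
    """
    return sum(abs(x - y) for x, y in zip(a, b))


a = [1, 2, 3]
b = [4, 0, 3]
distance = manhattan_distance(a, b)
print(distance)
```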

bioscience.stats.distance.WeightedJaccard.weightedJaccard(dataset, deviceCount=0, mode=1, debug=False)

Application of the Weighted Jaccard index.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object on which the Weighted Jaccard index is computed.

  • deviceCount (int, optional) – Number of GPU devices used for execution, defaults to 0.

  • mode (int, optional) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

Returns

A CorrelationModel object that stores values generated by a correlation method.

Return type

bioscience.base.models.CorrelationModel
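The weighted Jaccard index generalizes the Jaccard index to non-negative real vectors: the sum of element-wise minima divided by the sum of element-wise maxima. A minimal sketch in plain Python follows. The function name weighted_jaccard and the handling of two all-zero vectors are assumptions, and the code does not call bioScience.

```python
def weighted_jaccard(a, b):
    """Weighted Jaccard index of two non-negative vectors.

    Hypothetical helper for illustration; not part of bioScience.
    """
    minima = sum(min(x, y) for x, y in zip(a, b))
    maxima = sum(max(x, y) for x, y in zip(a, b))
    # Assumed convention: two all-zero vectors count as identical.
    return minima / maxima if maxima else 1.0


a = [1, 2, 0]
b = [2, 1, 3]
score = weighted_jaccard(a, b)
print(score)
```

For binary inputs this reduces to the ordinary Jaccard index.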

bioscience.dataMining

Biclustering

bioscience.dataMining.biclustering.Biclustering.bcca(dataset, correlationThreshold=0.7, minCols=3, deviceCount=1, mode=1, debug=False)

Main function processing the BCCA biclustering algorithm.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object that stores the data of the input file.

  • correlationThreshold (float, optional) – Correlation threshold applied by the algorithm, defaults to 0.7.

  • minCols (int, optional) – Minimum number of columns to build a valid bicluster, defaults to 3.

  • deviceCount (int, optional) – Number of GPU devices used for execution, defaults to 1.

  • mode (int, optional) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

  • debug (boolean, optional) – Attribute used to run the algorithm in debug mode, defaults to False.

Returns

A set of BiclusteringModel objects that stores all biclusters generated by the BCCA algorithm.

Return type

set(bioscience.base.models.BiclusteringModel)

bioscience.dataMining.biclustering.Biclustering.bibit(dataset, cMnr=2, cMnc=2, deviceCount=1, mode=1, debug=False)

Main function processing the BiBit Biclustering algorithm.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object that stores the data of the input file.

  • cMnr (int, optional) – Minimum number of rows to build a valid bicluster, defaults to 2.

  • cMnc (int, optional) – Minimum number of columns to build a valid bicluster, defaults to 2.

  • deviceCount (int, optional) – Number of GPU devices used for execution, defaults to 1.

  • mode (int, optional) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture. Defaults to 1.

  • debug (boolean, optional) – Attribute used to run the algorithm in debug mode, defaults to False.

Returns

A set of BiclusteringModel objects that stores all biclusters generated by the BiBit algorithm.

Return type

set(bioscience.base.models.BiclusteringModel)
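To make the roles of the minimum-row and minimum-column thresholds concrete, here is a simplified, sequential sketch of the general BiBit idea on a small binary matrix: each pair of rows is combined with a bitwise AND to form a column pattern, and every row containing that pattern joins the bicluster. It is an illustration under stated assumptions, not the library's implementation. The function bibit_sketch and its arguments min_rows and min_cols (mirroring cMnr and cMnc) are hypothetical, and the input is assumed to be already binarized.

```python
from itertools import combinations


def bibit_sketch(binary_rows, min_rows=2, min_cols=2):
    """Return a list of (row_indices, column_indices) biclusters.

    Hypothetical, simplified illustration; not part of bioScience.
    """
    n_cols = len(binary_rows[0])
    # Encode each binary row as an integer so patterns combine with bitwise AND.
    encoded = [int("".join(str(v) for v in row), 2) for row in binary_rows]
    seen = set()
    biclusters = []
    for i, j in combinations(range(len(encoded)), 2):
        pattern = encoded[i] & encoded[j]
        # Skip patterns already processed or with too few columns set.
        if pattern in seen or bin(pattern).count("1") < min_cols:
            continue
        seen.add(pattern)
        # A row belongs to the bicluster if it contains the whole pattern.
        rows = [k for k, bits in enumerate(encoded) if bits & pattern == pattern]
        if len(rows) >= min_rows:
            cols = [c for c in range(n_cols) if (pattern >> (n_cols - 1 - c)) & 1]
            biclusters.append((rows, cols))
    return biclusters


data = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]
result = bibit_sketch(data)
for rows, cols in result:
    print("rows", rows, "columns", cols)
```

Raising min_rows or min_cols discards smaller biclusters, which is the filtering role the cMnr and cMnc parameters are documented to play.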

bioscience.dataMining.biclustering.BiBit.processBiBit(dataset, cMnr, cMnc, deviceCount, mode, debug)

Sub-function processing the BiBit Biclustering algorithm.

Parameters
  • dataset (bioscience.base.models.Dataset) – The dataset object that stores the data of the input file.

  • cMnr (int) – Minimum number of rows to build a valid bicluster.

  • cMnc (int) – Minimum number of columns to build a valid bicluster.

  • deviceCount (int) – Number of GPU devices used for execution.

  • mode (int) – Type of execution of the algorithm: mode=1 for sequential execution, mode=2 for parallel execution on CPUs and mode=3 for execution on a multi-GPU architecture.

  • debug (boolean) – Attribute used to run the algorithm in debug mode.

Returns

A BiclusteringModel object that stores all biclusters generated by the BiBit algorithm.

Return type

bioscience.base.models.BiclusteringModel

bioscience.dataMining.biclustering.BiBit.threadsPerDevice_64(resultsQueue, i, s, chunks, bicsPerGpuPrevious, patternsPerRun, mInputData, m, cMnr, cMnc, debug)

Function used for the creation of a multi-GPU architecture.