Pre-processing

This section shows examples of the pre-processing methods available in bioScience.

The examples assume that an object called dataset already holds your loaded data. To learn how to load data, refer to the Load data section of this user guide.

After pre-processing, the transformed data is stored in the dataset.data attribute, while the data as originally loaded is kept in the dataset.original attribute. This lets you display or reuse the original data later, for example for validation or visualisation.
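
The reason for keeping two attributes can be illustrated with a small container. This is a hypothetical class, not bioScience's dataset object: the loaded values are copied once into `original`, and pre-processing then works on a separate copy in `data`, so in-place changes never reach the original values.

```python
import numpy as np


class KeepOriginal:
    """Hypothetical container, not bioScience's dataset class."""

    def __init__(self, values):
        # np.array copies its input, so `original` owns its own buffer.
        self.original = np.array(values, dtype=float)
        # Separate working copy that pre-processing steps may overwrite.
        self.data = self.original.copy()


demo = KeepOriginal([[1.0, 2.0], [3.0, 4.0]])
demo.data *= 10  # stand-in for any in-place pre-processing step
print(demo.data)
print(demo.original)  # still the values that were loaded
```

Without the `.copy()`, `data` and `original` would share memory and the in-place multiplication would change both.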

Generic pre-processing methods

The following example shows several basic pre-processing options: discretisation, standardisation, scaling and quantile-based transformation to a normal distribution. Outlier treatment is also available.

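import bioscience as bs

# `dataset` must already be loaded (see the Load data section).
# Each line below illustrates a different pre-processing option.
bs.discretize(dataset, n_bins=2)        # discretisation into 2 bins
bs.standardize(dataset)                 # standardisation
bs.scale(dataset)                       # scaling
bs.normalDistributionQuantile(dataset)  # quantile-based normal distribution
bs.outliers(dataset)                    # outlier treatment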
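
The discretisation, standardisation and scaling options correspond to standard transformations. The sketch below shows textbook versions in plain NumPy, assuming a samples-by-features matrix with one feature per column. The function names are hypothetical, and bioScience's implementations may differ in details; in particular, scaling is treated here as min-max scaling to [0, 1], which is an assumption about what bs.scale does.

```python
import numpy as np


def minmax_columns(x):
    """Rescale each column linearly to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    # Constant columns would divide by zero, so give them a span of 1.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (x - lo) / span


def zscore_columns(x):
    """Standardise each column to mean 0 and (population) standard deviation 1."""
    x = np.asarray(x, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns centred but unscaled
    return (x - x.mean(axis=0)) / std


def equal_width_bins(x, n_bins=2):
    """Discretise each column into n_bins equal-width bins labelled 0..n_bins-1."""
    scaled = minmax_columns(x)
    # The column maximum would land in bin n_bins, so clip it back to the last bin.
    return np.minimum((scaled * n_bins).astype(int), n_bins - 1)


expr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(equal_width_bins(expr, n_bins=2))
print(zscore_columns(expr))
print(minmax_columns(expr))
```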

For the meaning of each parameter, refer to the API reference.
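
The quantile-based normal transformation and the outlier treatment can be sketched in the same way. Below, each column is mapped to standard normal scores through its ranks (a rank-based inverse normal transform), and outliers are clipped with the common 1.5 * IQR rule. Both are textbook techniques chosen for illustration; the function names are hypothetical, and bs.normalDistributionQuantile and bs.outliers may use different rules.

```python
import numpy as np
from statistics import NormalDist


def rank_inverse_normal(x):
    """Map each column to standard normal scores through its ranks."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Argsort twice gives 0-based ranks within each column (ties broken by position).
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    # Turn ranks into probabilities strictly between 0 and 1.
    probs = (ranks + 0.5) / n
    # Inverse normal CDF from the standard library, applied element-wise.
    return np.vectorize(NormalDist().inv_cdf)(probs)


def clip_outliers_iqr(x, k=1.5):
    """Clip each column to [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75], axis=0)
    iqr = q3 - q1
    return np.clip(x, q1 - k * iqr, q3 + k * iqr)


values = np.array([[3.0], [1.0], [2.0], [4.0], [100.0]])
print(rank_inverse_normal(values))
print(clip_outliers_iqr(values))
```

Clipping keeps every sample in place and only limits extreme values; removing or flagging outliers instead are equally common choices.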

RNA-Seq oriented pre-processing methods

This subsection shows how to normalise RNA-Seq data with methods specific to this type of data: CPM, TPM, FPKM and DESeq2.

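import bioscience as bs

# Each line below illustrates a different RNA-Seq normalisation method.
bs.tpm(dataset)         # transcripts per million
bs.cpm(dataset)         # counts per million
bs.fpkm(dataset)        # fragments per kilobase per million mapped fragments
bs.deseq2Norm(dataset)  # DESeq2 normalisation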
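
CPM, FPKM and TPM differ mainly in whether and in which order they correct for sequencing depth and gene length. The sketch below assumes a raw count matrix with genes in rows and samples in columns, plus a separate array of gene lengths in base pairs; the layout, the length input and the function names are assumptions for illustration, not a description of how bioScience stores or obtains them.

```python
import numpy as np


def cpm(counts):
    """Counts per million: divide each sample (column) by its library size."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0) * 1e6


def fpkm(counts, lengths_bp):
    """Per-million scaling first, then division by gene length in kilobases."""
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    return cpm(counts) / lengths_kb


def tpm(counts, lengths_bp):
    """Length normalisation first, then rescaling each sample to sum to one million."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    rate = counts / lengths_kb
    return rate / rate.sum(axis=0) * 1e6


counts = np.array([[10, 20], [30, 60]])  # 2 genes x 2 samples
lengths = [1000, 2000]                   # hypothetical gene lengths in bp
print(cpm(counts))
print(fpkm(counts, lengths))
print(tpm(counts, lengths))
```

Because TPM corrects for length before scaling, every TPM column sums to one million, which is not generally true of FPKM.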

For the meaning of each parameter, refer to the API reference.
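
DESeq2 normalisation is usually described as the median-of-ratios method: each gene's geometric mean across samples serves as a pseudo-reference, and each sample's size factor is the median of its count-to-reference ratios. The sketch below implements that idea for a genes-by-samples count matrix; the function names are hypothetical, and bs.deseq2Norm may differ in details such as the handling of zero counts.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Per-sample size factors (genes in rows, samples in columns)."""
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have a geometric mean of 0, so skip them.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log of each gene's geometric mean across samples (the pseudo-reference).
    log_ref = log_counts.mean(axis=1, keepdims=True)
    # Size factor = median over genes of count / reference, computed in log space.
    return np.exp(np.median(log_counts - log_ref, axis=0))


def median_of_ratios_normalize(counts):
    """Divide every count by its sample's size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / median_of_ratios_size_factors(counts)


counts = np.array([[10, 20], [100, 200], [5, 10], [0, 3]])
print(median_of_ratios_size_factors(counts))
print(median_of_ratios_normalize(counts))
```

Here the second sample was sequenced twice as deeply, so its size factor is twice the first one. The last gene has a zero count and is ignored when estimating the size factors, but it is still normalised.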

Binarisation methods

There are currently two ways to binarise a dataset. The binarize function performs a standard binarisation, while the binarizeLevels function returns a list of binarised datasets. The latter uses fuzzy logic to reduce the noise that the binarisation process can introduce into the data.

Examples of both approaches are shown below:

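import bioscience as bs
# Standard binarisation
bs.binarize(dataset)
# Fuzzy binarisation with explicit levels; returns a list of binarised datasets
listDatasets = bs.binarizeLevels(dataset, inactiveLevel=0.2, activeLevel=0.8, soc=0)
# Same function with its default parameter values
listDatasets = bs.binarizeLevels(dataset)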
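
In its simplest form, binarisation compares each value with a threshold. The sketch below uses a hypothetical `threshold` argument and, when none is given, compares each row with its own mean; this default is an illustrative assumption, not the rule used by bs.binarize.

```python
import numpy as np


def binarize_by_threshold(x, threshold=None):
    """Return 1 where a value is above the threshold and 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    if threshold is None:
        # Hypothetical default: compare each row with its own mean.
        threshold = x.mean(axis=1, keepdims=True)
    return (x > threshold).astype(np.int8)


x = np.array([[1.0, 5.0, 9.0], [2.0, 2.0, 8.0]])
print(binarize_by_threshold(x))
print(binarize_by_threshold(x, threshold=4.0))
```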

For the meaning of each parameter, refer to the API reference.
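
One standard way to obtain several binarised datasets with fuzzy logic is to give each value a degree of membership in an "active" set and then take alpha-cuts of that membership at several levels. The sketch below does this for values assumed to be scaled to [0, 1]. The linear ramp between a lower and an upper level only loosely mirrors the inactiveLevel and activeLevel arguments of bs.binarizeLevels; the function names are hypothetical, the soc argument has no counterpart, and this is not bioScience's algorithm.

```python
import numpy as np


def active_membership(x, inactive_level=0.2, active_level=0.8):
    """Fuzzy degree of 'active' for values already scaled to [0, 1].

    0 at or below inactive_level, 1 at or above active_level, linear in between.
    """
    x = np.asarray(x, dtype=float)
    ramp = (x - inactive_level) / (active_level - inactive_level)
    return np.clip(ramp, 0.0, 1.0)


def alpha_cut_binarizations(x, alphas=(0.25, 0.5, 0.75), **levels):
    """One binary matrix per alpha: 1 where the membership degree is >= alpha."""
    membership = active_membership(x, **levels)
    return [(membership >= a).astype(np.int8) for a in alphas]


x = np.array([[0.1, 0.4, 0.9]])
for alpha, binary in zip((0.25, 0.5, 0.75), alpha_cut_binarizations(x)):
    print(alpha, binary)
```

Values clearly below the lower level or above the upper level get the same label in every matrix; only values in the uncertain zone between the two levels change from one cut to the next.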