bioScience quickstart

Every implemented mode has an accompanying example. See “bioScience examples” for more information.

Run in sequential mode (CPU)

“tests/test_integration/test_sequential.py” demonstrates the basic API calls needed to apply a data mining technique to a gene expression dataset and generate results. The code runs sequentially on the CPU.

Load gene co-expression dataset: Load the gene co-expression dataset from the input file. The first call loads a binary dataset, the second loads a non-binary gene co-expression dataset, and the third loads an RNA-Seq dataset.

```
import bioscience as bs

# Binary dataset load
dataset = bs.load(path="datasets/binaryTest3.txt", index_gene=0, naFilter=False, head=0)

# Non-binary dataset load
dataset = bs.load(path="datasets/synthetic3.txt", index_gene=0, naFilter=True, head=0)

# RNA-Seq dataset load (index_lengths points at the gene-length column)
dataset = bs.load(path="datasets/rnaseq.txt", index_gene=0, index_lengths=1, naFilter=True, head=0)
```

Preprocessing: This phase shows the available preprocessing methods. They range from basic methods, such as discretisation, standardisation, normalisation and outlier handling, to methods specific to RNA-Seq data. bioScience also provides methods to binarise a gene expression dataset.

```
# 2.1) Standard preprocessing
bs.discretize(dataset, n_bins=2)
bs.standardize(dataset)
bs.scale(dataset)
bs.normalDistributionQuantile(dataset)
bs.outliers(dataset)

# 2.2) RNA-Seq preprocessing
bs.tpm(dataset)
bs.cpm(dataset)
bs.deseq2Norm(dataset)

# 2.3) Binary preprocessing
bs.binarize(dataset)
listDatasets = bs.binarizeLevels(dataset, inactiveLevel=0.2, activeLevel=0.8, soc=0)
listDatasets = bs.binarizeLevels(dataset)
```
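
To make the RNA-Seq step more concrete, the following is a minimal sketch of the textbook CPM and TPM formulas in plain Python. It is not bioScience's implementation: the function names, the toy counts and the gene lengths are assumptions chosen for illustration, and bs.cpm and bs.tpm may handle details such as multiple samples differently.

```python
def cpm(counts):
    """Counts per million for one sample: scale each count by the library size."""
    total = sum(counts)
    return [c / total * 1_000_000 for c in counts]


def tpm(counts, lengths_kb):
    """Transcripts per million: normalise by gene length first, then scale to one million."""
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total_rate = sum(rates)
    return [r / total_rate * 1_000_000 for r in rates]


# Hypothetical sample: raw read counts for three genes and their lengths in kilobases.
counts = [100, 300, 600]
lengths_kb = [1.0, 2.0, 3.0]

cpm_values = cpm(counts)
tpm_values = tpm(counts, lengths_kb)
print(cpm_values)
print(tpm_values)
```

TPM needs a length for every gene, which is presumably why the RNA-Seq load call passes index_lengths.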

Data mining: This third phase runs the data mining technique chosen by the user. The example runs BiBit, a binary biclustering algorithm. The option mode=1 specifies sequential execution.
```
# BiBit algorithm
# Single dataset
listModels = bs.bibit(dataset, cMnr=2, cMnc=2, mode=1, debug = True)

# List of datasets (if bs.binarizeLevels function is used)
listModels = bs.bibit(listDatasets, cMnr=2, cMnc=2, mode=1, debug = True)
```
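
For readers unfamiliar with BiBit, the core idea can be sketched in a few lines of plain Python: every pair of rows yields a bit pattern through a bitwise AND, and every row that contains the whole pattern joins the bicluster. This is a simplified illustration, not the code behind bs.bibit. The function name bibit_sketch is hypothetical, and the reading of cMnr and cMnc as the minimum number of rows and columns is an assumption.

```python
from itertools import combinations


def bibit_sketch(rows, min_rows=2, min_cols=2):
    """Return biclusters as (row_indices, column_indices) from a binary matrix."""
    n_cols = len(rows[0])
    # Encode each binary row as one integer so a shared pattern is a bitwise AND.
    encoded = [int("".join(str(v) for v in row), 2) for row in rows]
    seen_patterns = set()
    biclusters = []
    for i, j in combinations(range(len(rows)), 2):
        pattern = encoded[i] & encoded[j]
        # Skip patterns with too few active columns or patterns already processed.
        if bin(pattern).count("1") < min_cols or pattern in seen_patterns:
            continue
        seen_patterns.add(pattern)
        # A row belongs to the bicluster when it contains every bit of the pattern.
        members = [r for r, code in enumerate(encoded) if (code & pattern) == pattern]
        if len(members) >= min_rows:
            cols = [c for c in range(n_cols) if (pattern >> (n_cols - 1 - c)) & 1]
            biclusters.append((members, cols))
    return biclusters


# Hypothetical 4 x 4 binary expression matrix (rows are genes, columns are conditions).
matrix = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]
result = bibit_sketch(matrix, min_rows=2, min_cols=2)
for gene_rows, condition_cols in result:
    print(gene_rows, condition_cols)
```

The number of row pairs grows quadratically with the number of genes, which is why the parallel CPU and GPU modes matter for larger datasets.
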
Results: Save gene names for each result generated by the data mining technique.
```
bs.saveGenes(path="/path/", models=listModels, data=dataset)
```

Run in parallel mode (CPU)

“tests/test_integration/test_parallel_cpu.py” demonstrates the basic API calls needed to apply a data mining technique to a gene expression dataset and generate results. The code runs in parallel on the CPU.

Load gene co-expression dataset: Load the gene co-expression dataset from the input file. The first call loads a binary dataset, the second loads a non-binary gene co-expression dataset, and the third loads an RNA-Seq dataset.

```
import bioscience as bs

# Binary dataset load
dataset = bs.load(path="datasets/binaryTest3.txt", index_gene=0, naFilter=False, head=0)

# Non-binary dataset load
dataset = bs.load(path="datasets/synthetic3.txt", index_gene=0, naFilter=True, head=0)

# RNA-Seq dataset load (index_lengths points at the gene-length column)
dataset = bs.load(path="datasets/rnaseq.txt", index_gene=0, index_lengths=1, naFilter=True, head=0)
```

Preprocessing: This phase shows the available preprocessing methods. They range from basic methods, such as discretisation, standardisation, normalisation and outlier handling, to methods specific to RNA-Seq data. bioScience also provides methods to binarise a gene expression dataset.

```
# 2.1) Standard preprocessing
bs.discretize(dataset, n_bins=2)
bs.standardize(dataset)
bs.scale(dataset)
bs.normalDistributionQuantile(dataset)
bs.outliers(dataset)

# 2.2) RNA-Seq preprocessing
bs.tpm(dataset)
bs.cpm(dataset)
bs.deseq2Norm(dataset)

# 2.3) Binary preprocessing
bs.binarize(dataset)
listDatasets = bs.binarizeLevels(dataset, inactiveLevel=0.2, activeLevel=0.8, soc=0)
listDatasets = bs.binarizeLevels(dataset)
```
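
As a rough illustration of what standardisation and scaling usually mean, the sketch below applies a z-score and a min-max rescaling to one expression profile using only the Python standard library. The function names and the sample values are assumptions. Whether bs.standardize and bs.scale use exactly these formulas, or work per gene or per condition, is not stated here.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score: subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Rescale the values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical expression profile of one gene across eight conditions.
expression = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

z_scores = standardize(expression)
scaled = min_max_scale(expression)
print(z_scores)
print(scaled)
```

Both functions assume the profile is not constant. A real implementation would need to guard against a zero standard deviation or a zero range.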

Data mining: This third phase runs the data mining technique chosen by the user. The example runs BiBit, a binary biclustering algorithm. The option mode=2 specifies parallel execution on the CPU.
```
# BiBit algorithm
# Single dataset
listModels = bs.bibit(dataset, cMnr=2, cMnc=2, mode=2, debug = True)

# List of datasets (if bs.binarizeLevels function is used)
listModels = bs.bibit(listDatasets, cMnr=2, cMnc=2, mode=2, debug = True)
```
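
How bioScience distributes work internally in this mode is not described here. The sketch below only illustrates the general pattern that makes pairwise algorithms such as BiBit easy to parallelise: the row pairs are independent, so they can be split into chunks and processed by separate workers. All names are hypothetical. Threads are used to keep the example portable, and CPU-bound Python code would need processes or native code to gain real speed.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_shared_columns(pair_chunk, encoded_rows):
    """For each (i, j) pair in the chunk, count the columns active in both rows."""
    return [
        (i, j, bin(encoded_rows[i] & encoded_rows[j]).count("1"))
        for i, j in pair_chunk
    ]


def parallel_pair_patterns(encoded_rows, workers=2):
    pairs = list(combinations(range(len(encoded_rows)), 2))
    # Round-robin split so every worker receives a similar share of the pairs.
    chunks = [pairs[w::workers] for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = pool.map(
            count_shared_columns, chunks, [encoded_rows] * workers
        )
    # Merge the partial results and restore a deterministic order.
    return sorted(item for part in partial_results for item in part)


# Hypothetical binary rows, each encoded as one integer (one bit per condition).
encoded = [0b1101, 0b1100, 0b0111, 0b1101]
shared = parallel_pair_patterns(encoded, workers=2)
print(shared)
```
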
Results: Save gene names for each result generated by the data mining technique.
```
bs.saveGenes(path="/path/", models=listModels, data=dataset)
```

Run in parallel mode (GPU)

“tests/test_integration/test_parallel_gpu.py” demonstrates the basic API calls needed to apply a data mining technique to a gene expression dataset and generate results. The code runs in parallel on GPU devices.

Load gene co-expression dataset: Load the gene co-expression dataset from the input file. The first call loads a binary dataset, the second loads a non-binary gene co-expression dataset, and the third loads an RNA-Seq dataset.

```
import bioscience as bs

# Binary dataset load
dataset = bs.load(path="datasets/binaryTest3.txt", index_gene=0, naFilter=False, head=0)

# Non-binary dataset load
dataset = bs.load(path="datasets/synthetic3.txt", index_gene=0, naFilter=True, head=0)

# RNA-Seq dataset load (index_lengths points at the gene-length column)
dataset = bs.load(path="datasets/rnaseq.txt", index_gene=0, index_lengths=1, naFilter=True, head=0)
```

Preprocessing: This phase shows the available preprocessing methods. They range from basic methods, such as discretisation, standardisation, normalisation and outlier handling, to methods specific to RNA-Seq data. bioScience also provides methods to binarise a gene expression dataset.

```
# 2.1) Standard preprocessing
bs.discretize(dataset, n_bins=2)
bs.standardize(dataset)
bs.scale(dataset)
bs.normalDistributionQuantile(dataset)
bs.outliers(dataset)

# 2.2) RNA-Seq preprocessing
bs.tpm(dataset)
bs.cpm(dataset)
bs.deseq2Norm(dataset)

# 2.3) Binary preprocessing
bs.binarize(dataset)
listDatasets = bs.binarizeLevels(dataset, inactiveLevel=0.2, activeLevel=0.8, soc=0)
listDatasets = bs.binarizeLevels(dataset)
```
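
The binary preprocessing step can be pictured with a simple threshold sketch: binarising the same matrix at several levels yields one binary dataset per level, which suggests why bs.binarizeLevels returns a list. The functions and the levels below are assumptions for illustration only. They do not reproduce the exact semantics of inactiveLevel, activeLevel or soc, which are not described here.

```python
def binarize_by_threshold(matrix, threshold):
    """Map every expression value to 1 if it reaches the threshold, otherwise 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in matrix]


def binarize_at_levels(matrix, levels):
    """Produce one binary matrix per threshold level."""
    return [binarize_by_threshold(matrix, level) for level in levels]


# Hypothetical expression matrix already scaled to the range [0, 1].
expression = [
    [0.1, 0.5, 0.9],
    [0.3, 0.7, 0.2],
]
binary_datasets = binarize_at_levels(expression, levels=[0.2, 0.5, 0.8])
for binary in binary_datasets:
    print(binary)
```

Higher thresholds leave fewer active cells, so the binary datasets become sparser as the level rises.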

Data mining: This third phase runs the data mining technique chosen by the user. The example runs BiBit, a binary biclustering algorithm. The option mode=3 specifies parallel execution on the GPU, and the example also passes deviceCount=1.
```
# BiBit algorithm
# Single dataset
listModels = bs.bibit(dataset, cMnr=2, cMnc=2, mode=3, deviceCount=1, debug = True)

# List of datasets (if bs.binarizeLevels function is used)
listModels = bs.bibit(listDatasets, cMnr=2, cMnc=2, mode=3, deviceCount=1, debug = True)
```
Results: Save gene names for each result generated by the data mining technique.
```
bs.saveGenes(path="/path/", models=listModels, data=dataset)
```