bioScience quickstart
Every implemented mode has an accompanying example; see “bioScience examples” for more information.
Run on sequential mode (CPU)
“tests/test_integration/test_sequential.py” demonstrates the basic API workflow for applying a data mining technique to gene expression datasets. This code runs sequentially on the CPU.
Load gene co-expression dataset: Load the gene co-expression dataset from the input file. The first option loads a binary dataset, the second loads a non-binary gene co-expression dataset, and the third loads an RNA-Seq dataset. Each call assigns to the same variable, so keep only the one that matches your data.
    import bioscience as bs

    # Binary dataset load
    dataset = bs.load(path="datasets/binaryTest3.txt", index_gene=0, naFilter=False, head=0)

    # Non-binary dataset load
    dataset = bs.load(path="datasets/synthetic3.txt", index_gene=0, naFilter=True, head=0)

    # RNA-Seq dataset load
    dataset = bs.load(path="datasets/rnaseq.txt", index_gene=0, index_lengths=1, naFilter=True, head=0)
Preprocessing: This phase shows the available preprocessing methods. They range from basic methods such as discretisation, standardisation, normalisation and outlier handling, among others, to methods specific to RNA-Seq data. Finally, bioScience also provides methods to binarise a gene expression dataset.
    # 2.1) Standard preprocessing
    bs.discretize(dataset, n_bins=2)
    bs.standardize(dataset)
    bs.scale(dataset)
    bs.normalDistributionQuantile(dataset)
    bs.outliers(dataset)

    # 2.2) RNA-Seq preprocessing
    bs.tpm(dataset)
    bs.cpm(dataset)
    bs.deseq2Norm(dataset)

    # 2.3) Binary preprocessing
    bs.binarize(dataset)
    listDatasets = bs.binarizeLevels(dataset, inactiveLevel=0.2, activeLevel=0.8, soc=0)
    listDatasets = bs.binarizeLevels(dataset)
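To make the RNA-Seq normalisations concrete, the following is a minimal pure-Python sketch of the textbook CPM and TPM formulas. It is not bioScience's implementation: the function names cpm_sketch and tpm_sketch, the genes-by-samples layout and the gene lengths in kilobases are all assumptions made for illustration.

```python
def cpm_sketch(counts):
    """Counts per million: rescale every sample (column) to sum to 1e6."""
    totals = [sum(col) for col in zip(*counts)]  # library size per sample
    return [[value / total * 1e6 for value, total in zip(row, totals)]
            for row in counts]

def tpm_sketch(counts, lengths_kb):
    """Transcripts per million: divide by gene length, then rescale to 1e6."""
    rates = [[value / length for value in row]
             for row, length in zip(counts, lengths_kb)]
    totals = [sum(col) for col in zip(*rates)]   # per-sample sum of rates
    return [[rate / total * 1e6 for rate, total in zip(row, totals)]
            for row in rates]

# Hypothetical toy data: 3 genes (rows) x 2 samples (columns).
counts = [[10, 20],
          [30, 20],
          [60, 60]]
lengths_kb = [1.0, 2.0, 3.0]  # assumed gene lengths in kilobases

cpm = cpm_sketch(counts)
tpm = tpm_sketch(counts, lengths_kb)
```

In both cases each sample column sums to one million afterwards; TPM differs only in correcting for gene length before rescaling.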
Data mining: This third phase runs the data mining technique of the user’s choice. The example runs a binary biclustering algorithm called BiBit, and the option mode=1 specifies that it is executed sequentially.

    # BiBit algorithm
    # Single dataset
    listModels = bs.bibit(dataset, cMnr=2, cMnc=2, mode=1, debug=True)

    # List of datasets (if the bs.binarizeLevels function is used)
    listModels = bs.bibit(listDatasets, cMnr=2, cMnc=2, mode=1, debug=True)
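For readers unfamiliar with BiBit, the core idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the code bioScience runs: the function bibit_sketch is hypothetical, and it assumes that cMnr and cMnc correspond to the minimum number of rows and columns of a bicluster (named min_rows and min_cols here).

```python
from itertools import combinations

def bibit_sketch(rows, min_rows=2, min_cols=2):
    """Simplified BiBit-style search over a binary matrix (list of 0/1 lists)."""
    # Encode every row as one integer so that AND is a single operation.
    words = [int("".join(map(str, row)), 2) for row in rows]
    n_cols = len(rows[0])
    seen = set()
    biclusters = []
    for i, j in combinations(range(len(words)), 2):
        pattern = words[i] & words[j]  # columns active in both rows
        if bin(pattern).count("1") < min_cols or pattern in seen:
            continue
        seen.add(pattern)
        # Every row that contains the whole pattern joins the bicluster.
        members = [k for k, w in enumerate(words) if (w & pattern) == pattern]
        if len(members) >= min_rows:
            cols = [c for c in range(n_cols)
                    if (pattern >> (n_cols - 1 - c)) & 1]
            biclusters.append((members, cols))
    return biclusters

# Hypothetical 4 x 4 binary expression matrix.
matrix = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]
result = bibit_sketch(matrix, min_rows=2, min_cols=2)
```

Each entry of result pairs a list of row indices with the column indices they share. Skipping patterns that were already seen avoids reporting the same bicluster twice.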
Results: Save gene names for each result generated by the data mining technique.
bs.saveGenes(path="/path/", models=listModels, data=dataset)
Run on parallel mode (CPU)
“tests/test_integration/test_parallel_cpu.py” demonstrates the basic API workflow for applying a data mining technique to gene expression datasets. This code runs in parallel on the CPU.
Load gene co-expression dataset: Load the gene co-expression dataset from the input file. The first option loads a binary dataset, the second loads a non-binary gene co-expression dataset, and the third loads an RNA-Seq dataset. Each call assigns to the same variable, so keep only the one that matches your data.
    import bioscience as bs

    # Binary dataset load
    dataset = bs.load(path="datasets/binaryTest3.txt", index_gene=0, naFilter=False, head=0)

    # Non-binary dataset load
    dataset = bs.load(path="datasets/synthetic3.txt", index_gene=0, naFilter=True, head=0)

    # RNA-Seq dataset load
    dataset = bs.load(path="datasets/rnaseq.txt", index_gene=0, index_lengths=1, naFilter=True, head=0)
Preprocessing: This phase shows the available preprocessing methods. They range from basic methods such as discretisation, standardisation, normalisation and outlier handling, among others, to methods specific to RNA-Seq data. Finally, bioScience also provides methods to binarise a gene expression dataset.
    # 2.1) Standard preprocessing
    bs.discretize(dataset, n_bins=2)
    bs.standardize(dataset)
    bs.scale(dataset)
    bs.normalDistributionQuantile(dataset)
    bs.outliers(dataset)

    # 2.2) RNA-Seq preprocessing
    bs.tpm(dataset)
    bs.cpm(dataset)
    bs.deseq2Norm(dataset)

    # 2.3) Binary preprocessing
    bs.binarize(dataset)
    listDatasets = bs.binarizeLevels(dataset, inactiveLevel=0.2, activeLevel=0.8, soc=0)
    listDatasets = bs.binarizeLevels(dataset)
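As a rough illustration of what standardisation and scaling usually mean, the sketch below applies a z-score and a min-max transform to one gene's expression profile. Whether bs.standardize and bs.scale use exactly these definitions is an assumption; the helper names standardize_sketch and scale_sketch are hypothetical.

```python
from statistics import mean, pstdev

def standardize_sketch(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    # Note: a constant profile has sigma == 0 and would need special handling.
    return [(v - mu) / sigma for v in values]

def scale_sketch(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

expression = [2.0, 4.0, 6.0, 8.0]  # hypothetical expression profile of one gene
z_scores = standardize_sketch(expression)
scaled = scale_sketch(expression)
```

After standardisation the profile has mean 0 and standard deviation 1; after scaling it spans the interval from 0 to 1.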
Data mining: This third phase runs the data mining technique of the user’s choice. The example runs a binary biclustering algorithm called BiBit, and the option mode=2 specifies that it is executed in parallel on the CPU.

    # BiBit algorithm
    # Single dataset
    listModels = bs.bibit(dataset, cMnr=2, cMnc=2, mode=2, debug=True)

    # List of datasets (if the bs.binarizeLevels function is used)
    listModels = bs.bibit(listDatasets, cMnr=2, cMnc=2, mode=2, debug=True)
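A pairwise search such as BiBit compares rows two at a time, so one plausible way to parallelise it is to divide the row pairs among workers. The sketch below shows only that partitioning step. How bioScience actually distributes work in mode=2 is not stated here, and split_pairs is a hypothetical helper.

```python
from itertools import combinations

def split_pairs(n_rows, n_workers):
    """Deal all row pairs (i, j) with i < j round-robin into n_workers chunks."""
    chunks = [[] for _ in range(n_workers)]
    for index, pair in enumerate(combinations(range(n_rows), 2)):
        chunks[index % n_workers].append(pair)
    return chunks

# 5 rows give 10 pairs, dealt out to 3 hypothetical workers.
chunks = split_pairs(n_rows=5, n_workers=3)
```

Each chunk could then be handed to a separate process. Round-robin dealing keeps the chunk sizes within one pair of each other.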
Results: Save gene names for each result generated by the data mining technique.
bs.saveGenes(path="/path/", models=listModels, data=dataset)
Run on parallel mode (GPU)
“tests/test_integration/test_parallel_gpu.py” demonstrates the basic API workflow for applying a data mining technique to gene expression datasets. This code runs in parallel on GPU devices.
Load gene co-expression dataset: Load the gene co-expression dataset from the input file. The first option loads a binary dataset, the second loads a non-binary gene co-expression dataset, and the third loads an RNA-Seq dataset. Each call assigns to the same variable, so keep only the one that matches your data.
    import bioscience as bs

    # Binary dataset load
    dataset = bs.load(path="datasets/binaryTest3.txt", index_gene=0, naFilter=False, head=0)

    # Non-binary dataset load
    dataset = bs.load(path="datasets/synthetic3.txt", index_gene=0, naFilter=True, head=0)

    # RNA-Seq dataset load
    dataset = bs.load(path="datasets/rnaseq.txt", index_gene=0, index_lengths=1, naFilter=True, head=0)
Preprocessing: This phase shows the available preprocessing methods. They range from basic methods such as discretisation, standardisation, normalisation and outlier handling, among others, to methods specific to RNA-Seq data. Finally, bioScience also provides methods to binarise a gene expression dataset.
    # 2.1) Standard preprocessing
    bs.discretize(dataset, n_bins=2)
    bs.standardize(dataset)
    bs.scale(dataset)
    bs.normalDistributionQuantile(dataset)
    bs.outliers(dataset)

    # 2.2) RNA-Seq preprocessing
    bs.tpm(dataset)
    bs.cpm(dataset)
    bs.deseq2Norm(dataset)

    # 2.3) Binary preprocessing
    bs.binarize(dataset)
    listDatasets = bs.binarizeLevels(dataset, inactiveLevel=0.2, activeLevel=0.8, soc=0)
    listDatasets = bs.binarizeLevels(dataset)
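Binarisation turns a numeric expression matrix into the 0/1 matrix that a binary biclustering algorithm expects. The sketch below shows the simplest variant, a single fixed threshold. It does not reproduce the semantics of bs.binarize or bs.binarizeLevels (in particular inactiveLevel, activeLevel and soc are not modelled); binarize_sketch and its threshold parameter are hypothetical.

```python
def binarize_sketch(matrix, threshold):
    """Return a 0/1 matrix: 1 where the value is at or above the threshold."""
    return [[1 if value >= threshold else 0 for value in row]
            for row in matrix]

# Hypothetical expression values already scaled to the range 0..1.
expression = [
    [0.10, 0.85, 0.40],
    [0.90, 0.15, 0.75],
]
binary = binarize_sketch(expression, threshold=0.5)
```

Running the same function with several thresholds would give one binary dataset per threshold, which is one way to end up with a list of datasets to mine.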
Data mining: This third phase runs the data mining technique of the user’s choice. The example runs a binary biclustering algorithm called BiBit, and the option mode=3 specifies that it is executed in parallel on GPU devices.

    # BiBit algorithm
    # Single dataset
    listModels = bs.bibit(dataset, cMnr=2, cMnc=2, mode=3, deviceCount=1, debug=True)

    # List of datasets (if the bs.binarizeLevels function is used)
    listModels = bs.bibit(listDatasets, cMnr=2, cMnc=2, mode=3, deviceCount=1, debug=True)
Results: Save gene names for each result generated by the data mining technique.
bs.saveGenes(path="/path/", models=listModels, data=dataset)