bioScience quickstart
=====================

**All implemented modes** are associated with examples; see `"bioScience examples" `_ for more information.

----

Run on sequential mode (CPU)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`"tests/test_integration/test_sequential.py" `_ demonstrates the basic API calls needed to apply a data mining technique to gene expression datasets. This code runs sequentially on the CPU.

#. **Load gene co-expression dataset:** Load the gene co-expression dataset from the input file. The first option loads a binary dataset, the second loads a non-binary gene co-expression dataset, and the third loads an RNA-Seq dataset.

   .. code-block:: python

      import bioscience as bs

      # Binary dataset load
      dataset = bs.load(path="datasets/binaryTest3.txt", index_gene=0, naFilter=False, head=0)

      # Non-binary dataset load
      dataset = bs.load(path="datasets/synthetic3.txt", index_gene=0, naFilter=True, head=0)

      # RNA-Seq dataset load
      dataset = bs.load(path="datasets/rnaseq.txt", index_gene=0, index_lengths=1, naFilter=True, head=0)

#. **Preprocessing:** This phase shows the available preprocessing methods. They range from basic methods such as discretisation, standardisation, normalisation and outlier handling, among others, to methods specific to RNA-Seq data. bioScience also provides methods to binarise a gene expression dataset.

   .. code-block:: python

      # 2.1) Standard preprocessing
      bs.discretize(dataset, n_bins=2)
      bs.standardize(dataset)
      bs.scale(dataset)
      bs.normalDistributionQuantile(dataset)
      bs.outliers(dataset)

      # 2.2) RNA-Seq preprocessing
      bs.tpm(dataset)
      bs.cpm(dataset)
      bs.deseq2Norm(dataset)

      # 2.3) Binary preprocessing
      bs.binarize(dataset)
      listDatasets = bs.binarizeLevels(dataset, inactiveLevel=0.2, activeLevel=0.8, soc=0)
      listDatasets = bs.binarizeLevels(dataset)

#. **Data mining:** This third phase runs the data mining technique of the user's choice. Here, the example runs a binary biclustering algorithm called BiBit. The option ``mode=1`` specifies that the algorithm runs sequentially.

   .. code-block:: python

      # BiBit algorithm
      # Single dataset
      listModels = bs.bibit(dataset, cMnr=2, cMnc=2, mode=1, debug=True)

      # List of datasets (if the bs.binarizeLevels function is used)
      listModels = bs.bibit(listDatasets, cMnr=2, cMnc=2, mode=1, debug=True)

#. **Results:** Save the gene names for each result generated by the data mining technique.

   .. code-block:: python

      bs.saveGenes(path="/path/", models=listModels, data=dataset)

Run on parallel mode (CPU)
^^^^^^^^^^^^^^^^^^^^^^^^^^^

`"tests/test_integration/test_parallel_cpu.py" `_ demonstrates the basic API calls needed to apply a data mining technique to gene expression datasets. This code runs in parallel on the CPU.

#. **Load gene co-expression dataset:** Load the gene co-expression dataset from the input file. The first option loads a binary dataset, the second loads a non-binary gene co-expression dataset, and the third loads an RNA-Seq dataset.

   .. code-block:: python

      import bioscience as bs

      # Binary dataset load
      dataset = bs.load(path="datasets/binaryTest3.txt", index_gene=0, naFilter=False, head=0)

      # Non-binary dataset load
      dataset = bs.load(path="datasets/synthetic3.txt", index_gene=0, naFilter=True, head=0)

      # RNA-Seq dataset load
      dataset = bs.load(path="datasets/rnaseq.txt", index_gene=0, index_lengths=1, naFilter=True, head=0)

#. **Preprocessing:** This phase shows the available preprocessing methods. They range from basic methods such as discretisation, standardisation, normalisation and outlier handling, among others, to methods specific to RNA-Seq data. bioScience also provides methods to binarise a gene expression dataset.

   .. code-block:: python

      # 2.1) Standard preprocessing
      bs.discretize(dataset, n_bins=2)
      bs.standardize(dataset)
      bs.scale(dataset)
      bs.normalDistributionQuantile(dataset)
      bs.outliers(dataset)

      # 2.2) RNA-Seq preprocessing
      bs.tpm(dataset)
      bs.cpm(dataset)
      bs.deseq2Norm(dataset)

      # 2.3) Binary preprocessing
      bs.binarize(dataset)
      listDatasets = bs.binarizeLevels(dataset, inactiveLevel=0.2, activeLevel=0.8, soc=0)
      listDatasets = bs.binarizeLevels(dataset)

#. **Data mining:** This third phase runs the data mining technique of the user's choice. Here, the example runs a binary biclustering algorithm called BiBit. The option ``mode=2`` specifies that the algorithm runs in parallel on the CPU.

   .. code-block:: python

      # BiBit algorithm
      # Single dataset
      listModels = bs.bibit(dataset, cMnr=2, cMnc=2, mode=2, debug=True)

      # List of datasets (if the bs.binarizeLevels function is used)
      listModels = bs.bibit(listDatasets, cMnr=2, cMnc=2, mode=2, debug=True)

#. **Results:** Save the gene names for each result generated by the data mining technique.

   .. code-block:: python

      bs.saveGenes(path="/path/", models=listModels, data=dataset)

Run on parallel mode (GPU)
^^^^^^^^^^^^^^^^^^^^^^^^^^^

`"tests/test_integration/test_parallel_gpu.py" `_ demonstrates the basic API calls needed to apply a data mining technique to gene expression datasets. This code runs in parallel on GPU devices.

#. **Load gene co-expression dataset:** Load the gene co-expression dataset from the input file. The first option loads a binary dataset, the second loads a non-binary gene co-expression dataset, and the third loads an RNA-Seq dataset.

   .. code-block:: python

      import bioscience as bs

      # Binary dataset load
      dataset = bs.load(path="datasets/binaryTest3.txt", index_gene=0, naFilter=False, head=0)

      # Non-binary dataset load
      dataset = bs.load(path="datasets/synthetic3.txt", index_gene=0, naFilter=True, head=0)

      # RNA-Seq dataset load
      dataset = bs.load(path="datasets/rnaseq.txt", index_gene=0, index_lengths=1, naFilter=True, head=0)

#. **Preprocessing:** This phase shows the available preprocessing methods. They range from basic methods such as discretisation, standardisation, normalisation and outlier handling, among others, to methods specific to RNA-Seq data. bioScience also provides methods to binarise a gene expression dataset.

   .. code-block:: python

      # 2.1) Standard preprocessing
      bs.discretize(dataset, n_bins=2)
      bs.standardize(dataset)
      bs.scale(dataset)
      bs.normalDistributionQuantile(dataset)
      bs.outliers(dataset)

      # 2.2) RNA-Seq preprocessing
      bs.tpm(dataset)
      bs.cpm(dataset)
      bs.deseq2Norm(dataset)

      # 2.3) Binary preprocessing
      bs.binarize(dataset)
      listDatasets = bs.binarizeLevels(dataset, inactiveLevel=0.2, activeLevel=0.8, soc=0)
      listDatasets = bs.binarizeLevels(dataset)

#. **Data mining:** This third phase runs the data mining technique of the user's choice. Here, the example runs a binary biclustering algorithm called BiBit. The option ``mode=3`` specifies that the algorithm runs in parallel on the GPU.

   .. code-block:: python

      # BiBit algorithm
      # Single dataset
      listModels = bs.bibit(dataset, cMnr=2, cMnc=2, mode=3, deviceCount=1, debug=True)

      # List of datasets (if the bs.binarizeLevels function is used)
      listModels = bs.bibit(listDatasets, cMnr=2, cMnc=2, mode=3, deviceCount=1, debug=True)

#. **Results:** Save the gene names for each result generated by the data mining technique.

   .. code-block:: python

      bs.saveGenes(path="/path/", models=listModels, data=dataset)
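
----

The data mining step in each mode above runs BiBit on a binary dataset. To give a feel for what a BiBit-style search does, here is a minimal, self-contained sketch in plain Python. It is a simplified illustration of the general idea (AND two rows to obtain a column pattern, then collect every row that contains that pattern) and not bioScience's implementation. The function ``bibit_sketch`` and its parameters ``min_rows`` and ``min_cols`` are hypothetical names; they may play a role similar to ``cMnr`` and ``cMnc``, but that mapping is an assumption.

.. code-block:: python

   from itertools import combinations


   def bibit_sketch(rows, min_rows=2, min_cols=2):
       """Illustrative BiBit-style search (hypothetical helper, not the bioScience API).

       rows: list of equal-length lists of 0/1 values, one list per gene.
       Returns a list of (row_indices, column_indices) tuples.
       """
       n_cols = len(rows[0]) if rows else 0
       # Encode each row as an integer bit mask so a row-wise AND is one operation.
       masks = [sum(bit << col for col, bit in enumerate(row)) for row in rows]

       seen = set()
       biclusters = []
       for i, j in combinations(range(len(masks)), 2):
           # Columns that are active in both rows form the candidate pattern.
           pattern = masks[i] & masks[j]
           if pattern in seen or bin(pattern).count("1") < min_cols:
               continue
           seen.add(pattern)
           # Keep every row that contains all columns of the pattern.
           members = [r for r, mask in enumerate(masks) if (mask & pattern) == pattern]
           if len(members) >= min_rows:
               cols = [c for c in range(n_cols) if (pattern >> c) & 1]
               biclusters.append((members, cols))
       return biclusters


   # Toy binary matrix: 4 genes (rows) by 4 conditions (columns).
   example = [
       [1, 1, 0, 1],
       [1, 1, 0, 0],
       [0, 1, 1, 1],
       [1, 1, 0, 1],
   ]
   for genes, conditions in bibit_sketch(example):
       print(genes, conditions)

Each printed pair lists the row indices and column indices of one bicluster found in the toy matrix. Raising ``min_rows`` or ``min_cols`` in this sketch discards smaller biclusters.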