bioScience quickstart
=====================

**All implemented modes** have associated examples; see
`"bioScience examples" <https://github.com/aureliolfdez/bioscience/tree/main/tests/test_integration>`_
for more information.


----

Run in sequential mode (CPU)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`"tests/test_integration/test_sequential.py" <https://github.com/aureliolfdez/bioscience/tree/main/tests/test_integration/test_sequential.py>`_
demonstrates the basic API workflow for applying a data mining technique to gene expression datasets. The code runs sequentially on the CPU.

#. **Load gene co-expression dataset:** Load the gene co-expression dataset from the input file. The first option loads a binary dataset, the second loads a non-binary gene co-expression dataset, and the third loads an RNA-Seq dataset.
   
   .. code-block:: python

      import bioscience as bs

      # Binary dataset load
      dataset = bs.load(path="datasets/binaryTest3.txt", index_gene=0, naFilter=False, head=0)

      # Non-binary dataset load
      dataset = bs.load(path="datasets/synthetic3.txt", index_gene=0, naFilter=True, head=0)

      # RNA-Seq dataset load
      dataset = bs.load(path="datasets/rnaseq.txt", index_gene=0, index_lengths=1, naFilter=True, head=0)

#. **Preprocessing:** This phase shows several preprocessing methods, from standard ones such as discretisation, standardisation, normalisation and outlier handling to methods specific to RNA-Seq data. bioScience also provides methods to binarise a gene expression dataset.

   .. code-block:: python

      # 2.1) Standard preprocessing (apply only the methods your analysis needs)
      bs.discretize(dataset, n_bins=2)
      bs.standardize(dataset)
      bs.scale(dataset)
      bs.normalDistributionQuantile(dataset)
      bs.outliers(dataset)

      # 2.2) RNA-Seq preprocessing (choose one normalisation)
      bs.tpm(dataset)
      bs.cpm(dataset)
      bs.deseq2Norm(dataset)

      # 2.3) Binary preprocessing
      bs.binarize(dataset)
      # Alternatively, obtain a list of binary datasets from expression levels
      listDatasets = bs.binarizeLevels(dataset, inactiveLevel=0.2, activeLevel=0.8, soc=0)
      # ...or with the default levels
      listDatasets = bs.binarizeLevels(dataset)
      

#. **Data mining:** The third phase runs the data mining technique chosen by the user. This example runs BiBit, a binary biclustering algorithm, and the option ``mode=1`` specifies that it is executed sequentially on the CPU.

   .. code-block:: python

      # BiBit algorithm (cMnr, cMnc: minimum number of rows and columns per bicluster)
      # Single dataset
      listModels = bs.bibit(dataset, cMnr=2, cMnc=2, mode=1, debug=True)

      # List of datasets (when bs.binarizeLevels has been used)
      listModels = bs.bibit(listDatasets, cMnr=2, cMnc=2, mode=1, debug=True)

#. **Results:** Save the names of the genes in each result produced by the data mining technique.

   .. code-block:: python
      
      bs.saveGenes(path="/path/", models=listModels, data=dataset)
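
The RNA-Seq normalisations in the preprocessing step, ``bs.cpm`` and ``bs.tpm``, are named after standard measures: counts per million (CPM) and transcripts per million (TPM). As a rough guide to what they compute, here is a sketch of the textbook formulas in plain Python on a genes × samples list of raw counts. It is an illustration under that assumption, not bioScience's implementation; the helper names ``cpm_sketch`` and ``tpm_sketch`` and the list-of-rows input layout are hypothetical.

.. code-block:: python

   def cpm_sketch(counts):
       """Counts per million (hypothetical helper, textbook definition).

       counts: list of rows, one row per gene and one column per sample.
       Each sample (column) is rescaled so that its counts sum to one million.
       """
       totals = [sum(column) for column in zip(*counts)]
       return [[c / t * 1e6 for c, t in zip(row, totals)] for row in counts]


   def tpm_sketch(counts, lengths_bp):
       """Transcripts per million (hypothetical helper, textbook definition).

       lengths_bp: gene lengths in base pairs, one per row of counts.
       """
       # 1) Reads per kilobase: correct each gene's counts for its length.
       rpk = [[c / (length / 1000) for c in row] for row, length in zip(counts, lengths_bp)]
       # 2) Rescale each sample so that its RPK values sum to one million.
       totals = [sum(column) for column in zip(*rpk)]
       return [[v / t * 1e6 for v, t in zip(row, totals)] for row in rpk]

For example, with counts ``[[10, 20], [30, 60]]`` and gene lengths ``[1000, 2000]``, CPM assigns 250,000 to the first gene and 750,000 to the second in both samples, whereas TPM assigns 400,000 and 600,000 because the second gene is twice as long.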


Run in parallel mode (CPU)
^^^^^^^^^^^^^^^^^^^^^^^^^^^

`"tests/test_integration/test_parallel_cpu.py" <https://github.com/aureliolfdez/bioscience/tree/main/tests/test_integration/test_parallel_cpu.py>`_
demonstrates the basic API workflow for applying a data mining technique to gene expression datasets. The code runs in parallel on the CPU.

#. **Load gene co-expression dataset:** Load the gene co-expression dataset from the input file. The first option loads a binary dataset, the second loads a non-binary gene co-expression dataset, and the third loads an RNA-Seq dataset.
   
   .. code-block:: python

      import bioscience as bs

      # Binary dataset load
      dataset = bs.load(path="datasets/binaryTest3.txt", index_gene=0, naFilter=False, head=0)

      # Non-binary dataset load
      dataset = bs.load(path="datasets/synthetic3.txt", index_gene=0, naFilter=True, head=0)

      # RNA-Seq dataset load
      dataset = bs.load(path="datasets/rnaseq.txt", index_gene=0, index_lengths=1, naFilter=True, head=0)

#. **Preprocessing:** This phase shows several preprocessing methods, from standard ones such as discretisation, standardisation, normalisation and outlier handling to methods specific to RNA-Seq data. bioScience also provides methods to binarise a gene expression dataset.

   .. code-block:: python

      # 2.1) Standard preprocessing (apply only the methods your analysis needs)
      bs.discretize(dataset, n_bins=2)
      bs.standardize(dataset)
      bs.scale(dataset)
      bs.normalDistributionQuantile(dataset)
      bs.outliers(dataset)

      # 2.2) RNA-Seq preprocessing (choose one normalisation)
      bs.tpm(dataset)
      bs.cpm(dataset)
      bs.deseq2Norm(dataset)

      # 2.3) Binary preprocessing
      bs.binarize(dataset)
      # Alternatively, obtain a list of binary datasets from expression levels
      listDatasets = bs.binarizeLevels(dataset, inactiveLevel=0.2, activeLevel=0.8, soc=0)
      # ...or with the default levels
      listDatasets = bs.binarizeLevels(dataset)
      

#. **Data mining:** The third phase runs the data mining technique chosen by the user. This example runs BiBit, a binary biclustering algorithm, and the option ``mode=2`` specifies that it is executed in parallel on the CPU.

   .. code-block:: python

      # BiBit algorithm (cMnr, cMnc: minimum number of rows and columns per bicluster)
      # Single dataset
      listModels = bs.bibit(dataset, cMnr=2, cMnc=2, mode=2, debug=True)

      # List of datasets (when bs.binarizeLevels has been used)
      listModels = bs.bibit(listDatasets, cMnr=2, cMnc=2, mode=2, debug=True)

#. **Results:** Save the names of the genes in each result produced by the data mining technique.

   .. code-block:: python
      
      bs.saveGenes(path="/path/", models=listModels, data=dataset)
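
With ``mode=2`` the BiBit search is distributed across CPU cores. The following sketch shows one way such a search can be split, assuming the published BiBit idea: encode each gene's binary row as a bitset, AND every pair of rows to obtain a candidate column pattern, and collect all rows that contain that pattern. Because the row pairs are independent, they can be dealt out to worker processes and the partial results merged. This is a simplified illustration (no maximality filtering; rows already encoded as Python ``int`` bitsets), not bioScience's parallel implementation. ``search_pairs``, ``parallel_search`` and their parameters are hypothetical, with ``min_rows`` and ``min_cols`` standing in for the role assumed for ``cMnr`` and ``cMnc``.

.. code-block:: python

   from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
   from itertools import combinations


   def search_pairs(words, pairs, min_rows, min_cols):
       """Run a simplified BiBit-style test on one chunk of seed row pairs.

       words[i] is row i of the binary matrix encoded as an int bitset
       (bit j is set when the gene is active in sample j).
       Returns a set of (row indices, column bitmask) biclusters.
       """
       found = set()
       for a, b in pairs:
           pattern = words[a] & words[b]  # columns active in both seed rows
           if bin(pattern).count("1") < min_cols:
               continue
           # Every row that contains the whole pattern joins the bicluster.
           rows = tuple(i for i, w in enumerate(words) if w & pattern == pattern)
           if len(rows) >= min_rows:
               found.add((rows, pattern))
       return found


   def parallel_search(words, min_rows=2, min_cols=2, workers=4, use_processes=True):
       """Deal the seed row pairs round-robin to workers and merge their results."""
       pairs = list(combinations(range(len(words)), 2))
       chunks = [pairs[k::workers] for k in range(workers)]
       executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
       merged = set()
       with executor_cls(max_workers=workers) as pool:
           futures = [pool.submit(search_pairs, words, chunk, min_rows, min_cols)
                      for chunk in chunks]
           for future in futures:
               # Different seed pairs can yield the same bicluster; the set removes duplicates.
               merged |= future.result()
       return sorted(merged)

With ``use_processes=True``, call ``parallel_search`` under an ``if __name__ == "__main__":`` guard so that worker processes can import the module safely. Threads (``use_processes=False``) are convenient for testing but, in CPython, do not run this pure-Python loop in parallel.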

Run in parallel mode (GPU)
^^^^^^^^^^^^^^^^^^^^^^^^^^^

`"tests/test_integration/test_parallel_gpu.py" <https://github.com/aureliolfdez/bioscience/tree/main/tests/test_integration/test_parallel_gpu.py>`_
demonstrates the basic API workflow for applying a data mining technique to gene expression datasets. The code runs in parallel on GPU devices.

#. **Load gene co-expression dataset:** Load the gene co-expression dataset from the input file. The first option loads a binary dataset, the second loads a non-binary gene co-expression dataset, and the third loads an RNA-Seq dataset.
   
   .. code-block:: python

      import bioscience as bs

      # Binary dataset load
      dataset = bs.load(path="datasets/binaryTest3.txt", index_gene=0, naFilter=False, head=0)

      # Non-binary dataset load
      dataset = bs.load(path="datasets/synthetic3.txt", index_gene=0, naFilter=True, head=0)

      # RNA-Seq dataset load
      dataset = bs.load(path="datasets/rnaseq.txt", index_gene=0, index_lengths=1, naFilter=True, head=0)

#. **Preprocessing:** This phase shows several preprocessing methods, from standard ones such as discretisation, standardisation, normalisation and outlier handling to methods specific to RNA-Seq data. bioScience also provides methods to binarise a gene expression dataset.

   .. code-block:: python

      # 2.1) Standard preprocessing (apply only the methods your analysis needs)
      bs.discretize(dataset, n_bins=2)
      bs.standardize(dataset)
      bs.scale(dataset)
      bs.normalDistributionQuantile(dataset)
      bs.outliers(dataset)

      # 2.2) RNA-Seq preprocessing (choose one normalisation)
      bs.tpm(dataset)
      bs.cpm(dataset)
      bs.deseq2Norm(dataset)

      # 2.3) Binary preprocessing
      bs.binarize(dataset)
      # Alternatively, obtain a list of binary datasets from expression levels
      listDatasets = bs.binarizeLevels(dataset, inactiveLevel=0.2, activeLevel=0.8, soc=0)
      # ...or with the default levels
      listDatasets = bs.binarizeLevels(dataset)
      

#. **Data mining:** The third phase runs the data mining technique chosen by the user. This example runs BiBit, a binary biclustering algorithm, and the option ``mode=3`` specifies that it is executed in parallel on GPU devices; ``deviceCount=1`` sets how many GPU devices are used.

   .. code-block:: python

      # BiBit algorithm (cMnr, cMnc: minimum number of rows and columns per bicluster)
      # Single dataset
      listModels = bs.bibit(dataset, cMnr=2, cMnc=2, mode=3, deviceCount=1, debug=True)

      # List of datasets (when bs.binarizeLevels has been used)
      listModels = bs.bibit(listDatasets, cMnr=2, cMnc=2, mode=3, deviceCount=1, debug=True)

#. **Results:** Save the names of the genes in each result produced by the data mining technique.

   .. code-block:: python
      
      bs.saveGenes(path="/path/", models=listModels, data=dataset)
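
One common way for GPU code to parallelise a pairwise search such as BiBit is to give each thread one linear index into the list of seed row pairs and to divide that index space among the available devices. The sketch below illustrates both steps in plain Python, assuming that each of ``deviceCount`` devices receives a contiguous, near-equal share of the pairs. It is a hypothetical illustration of the technique, not bioScience's GPU code; ``pair_count``, ``pair_from_index`` and ``device_ranges`` are invented names.

.. code-block:: python

   def pair_count(n_rows):
       """Number of seed pairs (a, b) with a < b among n_rows rows."""
       return n_rows * (n_rows - 1) // 2


   def pair_from_index(k, n_rows):
       """Map a linear (thread) index k to the k-th pair (a, b), a < b.

       Pairs are numbered row by row: (0, 1), (0, 2), ..., (1, 2), ...
       """
       if not 0 <= k < pair_count(n_rows):
           raise IndexError(f"pair index {k} out of range for {n_rows} rows")
       a = 0
       # Row a owns (n_rows - 1 - a) pairs; skip whole rows until k falls inside one.
       while k >= n_rows - 1 - a:
           k -= n_rows - 1 - a
           a += 1
       return a, a + 1 + k


   def device_ranges(n_rows, device_count):
       """Split [0, pair_count) into contiguous ranges whose sizes differ by at most one."""
       base, extra = divmod(pair_count(n_rows), device_count)
       ranges, start = [], 0
       for device in range(device_count):
           end = start + base + (1 if device < extra else 0)
           ranges.append((start, end))
           start = end
       return ranges

For example, with 4 rows and 2 devices the six pairs split into the ranges ``(0, 3)`` and ``(3, 6)``, and the thread handling index ``k`` works on the pair returned by ``pair_from_index(k, 4)``. A production kernel would typically replace the loop with a closed-form inverse of the triangular numbering.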