bioScience quickstart

Every implemented mode has an accompanying example; see “bioScience examples” for more information.


Run in sequential mode (CPU)

“tests/test_integration/test_sequential.py” demonstrates the basic API workflow for applying a data mining technique to a gene expression dataset. The code runs sequentially on the CPU.

  1. Load gene co-expression dataset: Load the dataset from the input file. The first option loads a binary dataset, the second loads a non-binary gene co-expression dataset, and the third loads an RNA-Seq dataset.

    import bioscience as bs
    
    # Binary dataset load
    dataset = bs.load(path="datasets/binaryTest3.txt", index_gene=0, naFilter=False, head=0)
    
    # Non-binary dataset load
    dataset = bs.load(path="datasets/synthetic3.txt", index_gene=0, naFilter=True, head=0)
    
    # RNA-Seq dataset load
    dataset = bs.load(path="datasets/rnaseq.txt", index_gene=0, index_lengths=1, naFilter=True, head=0)
    
  2. Preprocessing: This phase shows the available preprocessing methods. They range from general-purpose methods (discretisation, standardisation, normalisation and outlier handling, among others) to methods specific to RNA-Seq data. bioScience also provides methods to binarise a gene expression dataset.

    # 2.1) Standard preprocessing
    bs.discretize(dataset, n_bins=2)
    bs.standardize(dataset)
    bs.scale(dataset)
    bs.normalDistributionQuantile(dataset)
    bs.outliers(dataset)
    
    # 2.2) RNA-Seq preprocessing
    bs.tpm(dataset)
    bs.cpm(dataset)
    bs.deseq2Norm(dataset)
    
    # 2.3) Binary preprocessing
    bs.binarize(dataset)
    listDatasets = bs.binarizeLevels(dataset, inactiveLevel=0.2, activeLevel=0.8, soc=0)
    listDatasets = bs.binarizeLevels(dataset)
    
  3. Data mining: This phase runs the data mining technique chosen by the user. The example runs BiBit, a binary biclustering algorithm. The option mode=1 specifies that the algorithm runs sequentially.

    # BiBit algorithm
    # Single dataset
    listModels = bs.bibit(dataset, cMnr=2, cMnc=2, mode=1, debug=True)
    
    # List of datasets (if bs.binarizeLevels function is used)
    listModels = bs.bibit(listDatasets, cMnr=2, cMnc=2, mode=1, debug=True)
    
  4. Results: Save gene names for each result generated by the data mining technique.

    bs.saveGenes(path="/path/", models=listModels, data=dataset)
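
To make the data mining step above more concrete, here is a minimal sketch of the pattern-based idea behind binary biclustering in the style of BiBit: the bitwise AND of two rows gives a candidate column pattern, and every row that contains that pattern joins the bicluster. This is an illustration in plain NumPy, not bioScience's implementation. The function name bibit_sketch and the parameters min_rows and min_cols are hypothetical, and it is only an assumption that they play the same role as cMnr and cMnc.

```python
from itertools import combinations

import numpy as np


def bibit_sketch(matrix, min_rows=2, min_cols=2):
    """Illustrative pattern-based biclustering over a binary matrix.

    Hypothetical helper for explanation only; not part of bioScience.
    Returns a list of (row_indices, column_indices) tuples.
    """
    data = np.asarray(matrix, dtype=bool)
    seen = set()        # patterns already expanded, to avoid duplicates
    biclusters = []
    for i, j in combinations(range(data.shape[0]), 2):
        # Candidate pattern: columns active in both rows.
        pattern = data[i] & data[j]
        if pattern.sum() < min_cols:
            continue
        key = pattern.tobytes()
        if key in seen:
            continue
        seen.add(key)
        # A row belongs to the bicluster if it has every column of the pattern.
        matches = (data & pattern).sum(axis=1) == pattern.sum()
        rows = np.flatnonzero(matches)
        if rows.size >= min_rows:
            biclusters.append((rows.tolist(), np.flatnonzero(pattern).tolist()))
    return biclusters


# Toy binary matrix: 4 genes (rows) x 4 conditions (columns).
toy = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
result = bibit_sketch(toy)
for rows, cols in result:
    print("genes", rows, "conditions", cols)
```

On the toy matrix, rows 0, 1 and 2 share conditions 0 and 1, while rows 0 and 2 additionally share condition 3, so two biclusters are reported.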
    

Run in parallel mode (CPU)

“tests/test_integration/test_parallel_cpu.py” demonstrates the basic API workflow for applying a data mining technique to a gene expression dataset. The code runs in parallel on the CPU.

  1. Load gene co-expression dataset: Load the dataset from the input file. The first option loads a binary dataset, the second loads a non-binary gene co-expression dataset, and the third loads an RNA-Seq dataset.

    import bioscience as bs
    
    # Binary dataset load
    dataset = bs.load(path="datasets/binaryTest3.txt", index_gene=0, naFilter=False, head=0)
    
    # Non-binary dataset load
    dataset = bs.load(path="datasets/synthetic3.txt", index_gene=0, naFilter=True, head=0)
    
    # RNA-Seq dataset load
    dataset = bs.load(path="datasets/rnaseq.txt", index_gene=0, index_lengths=1, naFilter=True, head=0)
    
  2. Preprocessing: This phase shows the available preprocessing methods. They range from general-purpose methods (discretisation, standardisation, normalisation and outlier handling, among others) to methods specific to RNA-Seq data. bioScience also provides methods to binarise a gene expression dataset.

    # 2.1) Standard preprocessing
    bs.discretize(dataset, n_bins=2)
    bs.standardize(dataset)
    bs.scale(dataset)
    bs.normalDistributionQuantile(dataset)
    bs.outliers(dataset)
    
    # 2.2) RNA-Seq preprocessing
    bs.tpm(dataset)
    bs.cpm(dataset)
    bs.deseq2Norm(dataset)
    
    # 2.3) Binary preprocessing
    bs.binarize(dataset)
    listDatasets = bs.binarizeLevels(dataset, inactiveLevel=0.2, activeLevel=0.8, soc=0)
    listDatasets = bs.binarizeLevels(dataset)
    
  3. Data mining: This phase runs the data mining technique chosen by the user. The example runs BiBit, a binary biclustering algorithm. The option mode=2 specifies that the algorithm runs in parallel on the CPU.

    # BiBit algorithm
    # Single dataset
    listModels = bs.bibit(dataset, cMnr=2, cMnc=2, mode=2, debug=True)
    
    # List of datasets (if bs.binarizeLevels function is used)
    listModels = bs.bibit(listDatasets, cMnr=2, cMnc=2, mode=2, debug=True)
    
  4. Results: Save gene names for each result generated by the data mining technique.

    bs.saveGenes(path="/path/", models=listModels, data=dataset)
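
As a rough illustration of why this kind of workload parallelises well on the CPU, the sketch below splits the row pairs of a binary matrix into independent chunks, processes each chunk in a worker, and merges the partial results. It does not show how bioScience implements mode=2: the helper names patterns_for_pairs and parallel_patterns are hypothetical, and a thread pool is used only to keep the example portable (a process pool would be the usual choice for CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

import numpy as np


def patterns_for_pairs(data, pairs, min_cols):
    """Collect the qualifying AND-patterns for one chunk of row pairs."""
    found = set()
    for i, j in pairs:
        pattern = data[i] & data[j]
        if pattern.sum() >= min_cols:
            # Store the pattern as a tuple of active column indices.
            found.add(tuple(np.flatnonzero(pattern).tolist()))
    return found


def parallel_patterns(matrix, min_cols=2, workers=2):
    """Hypothetical helper: find column patterns with the pairs split across workers."""
    data = np.asarray(matrix, dtype=bool)
    pairs = list(combinations(range(data.shape[0]), 2))
    # Round-robin split so every worker gets a similar number of pairs.
    chunks = [pairs[k::workers] for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(
            lambda chunk: patterns_for_pairs(data, chunk, min_cols), chunks
        )
        # Chunks are independent, so merging is a plain set union.
        merged = set().union(*partial)
    return sorted(merged)


toy = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
patterns = parallel_patterns(toy, min_cols=2, workers=2)
print(patterns)
```

Because each pair is evaluated independently, the result does not depend on the number of workers; only the wall-clock time does.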
    

Run in parallel mode (GPU)

“tests/test_integration/test_parallel_gpu.py” demonstrates the basic API workflow for applying a data mining technique to a gene expression dataset. The code runs in parallel on GPU devices.

  1. Load gene co-expression dataset: Load the dataset from the input file. The first option loads a binary dataset, the second loads a non-binary gene co-expression dataset, and the third loads an RNA-Seq dataset.

    import bioscience as bs
    
    # Binary dataset load
    dataset = bs.load(path="datasets/binaryTest3.txt", index_gene=0, naFilter=False, head=0)
    
    # Non-binary dataset load
    dataset = bs.load(path="datasets/synthetic3.txt", index_gene=0, naFilter=True, head=0)
    
    # RNA-Seq dataset load
    dataset = bs.load(path="datasets/rnaseq.txt", index_gene=0, index_lengths=1, naFilter=True, head=0)
    
  2. Preprocessing: This phase shows the available preprocessing methods. They range from general-purpose methods (discretisation, standardisation, normalisation and outlier handling, among others) to methods specific to RNA-Seq data. bioScience also provides methods to binarise a gene expression dataset.

    # 2.1) Standard preprocessing
    bs.discretize(dataset, n_bins=2)
    bs.standardize(dataset)
    bs.scale(dataset)
    bs.normalDistributionQuantile(dataset)
    bs.outliers(dataset)
    
    # 2.2) RNA-Seq preprocessing
    bs.tpm(dataset)
    bs.cpm(dataset)
    bs.deseq2Norm(dataset)
    
    # 2.3) Binary preprocessing
    bs.binarize(dataset)
    listDatasets = bs.binarizeLevels(dataset, inactiveLevel=0.2, activeLevel=0.8, soc=0)
    listDatasets = bs.binarizeLevels(dataset)
    
  3. Data mining: This phase runs the data mining technique chosen by the user. The example runs BiBit, a binary biclustering algorithm. The option mode=3 specifies that the algorithm runs in parallel on the GPU.

    # BiBit algorithm
    # Single dataset
    listModels = bs.bibit(dataset, cMnr=2, cMnc=2, mode=3, deviceCount=1, debug=True)
    
    # List of datasets (if bs.binarizeLevels function is used)
    listModels = bs.bibit(listDatasets, cMnr=2, cMnc=2, mode=3, deviceCount=1, debug=True)
    
  4. Results: Save gene names for each result generated by the data mining technique.

    bs.saveGenes(path="/path/", models=listModels, data=dataset)
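
Binary data suits massively parallel hardware partly because rows can be packed into machine words, so that comparing two genes becomes a bitwise AND followed by a bit count. The sketch below shows that layout on the CPU with NumPy. It is an assumption that a GPU implementation would use a similar encoding; nothing here reflects bioScience internals, and the helper names pack_rows and shared_columns are hypothetical.

```python
import numpy as np


def pack_rows(matrix):
    """Pack each binary row into bytes (8 columns per byte, zero-padded)."""
    return np.packbits(np.asarray(matrix, dtype=np.uint8), axis=1)


def shared_columns(packed, i, j):
    """Count the columns active in both rows i and j of a packed matrix."""
    common = np.bitwise_and(packed[i], packed[j])
    # Unpack only to count set bits; padding bits are zero and add nothing.
    return int(np.unpackbits(common).sum())


toy = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
packed = pack_rows(toy)
print(packed.shape)
print(shared_columns(packed, 0, 1))
print(shared_columns(packed, 0, 2))
```

Each of the four-column rows fits in a single byte here; wider datasets simply use more bytes per row, and every pair comparison stays independent of the others.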