Documentation for vcregression

vcregression is a command-line interface for statistical inference of sequence-function relationships using Gaussian process (GP) regression.

Basic functions of vcregression include:

  • estimation of the variance components from partially observed fitness landscapes
  • calculating the maximum a posteriori (MAP) estimate using GP regression
  • calculating the posterior variance
  • posterior sampling using Hamiltonian Monte Carlo
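
To make the MAP estimate and posterior variance listed above concrete, here is a minimal sketch of generic GP regression on a toy sequence space. It is not the vcregression implementation: the kernel (covariance decaying geometrically with Hamming distance), the function names hamming_kernel and gp_posterior, the two-letter alphabet, and the measurement values are all assumptions chosen for illustration. With Gaussian noise, the MAP estimate coincides with the posterior mean.

```python
import itertools
import numpy as np


def hamming_kernel(seqs_a, seqs_b, decay=0.5):
    """Toy prior covariance, decay ** Hamming distance (an assumed kernel)."""
    a = np.array([list(s) for s in seqs_a])
    b = np.array([list(s) for s in seqs_b])
    # Pairwise Hamming distances between every sequence in a and in b.
    dist = (a[:, None, :] != b[None, :, :]).sum(axis=2)
    return decay ** dist


def gp_posterior(train_seqs, y, noise_var, test_seqs, decay=0.5):
    """Posterior mean and pointwise variance of a zero-mean GP."""
    K = hamming_kernel(train_seqs, train_seqs, decay) + np.diag(noise_var)
    K_star = hamming_kernel(test_seqs, train_seqs, decay)
    prior_var = np.ones(len(test_seqs))  # decay ** 0 on the kernel diagonal
    mean = K_star @ np.linalg.solve(K, y)
    reduction = np.einsum("ij,ji->i", K_star, np.linalg.solve(K, K_star.T))
    return mean, prior_var - reduction


# Toy landscape: all 2**3 sequences, of which only the first five are measured.
all_seqs = ["".join(p) for p in itertools.product("AB", repeat=3)]
train_seqs = all_seqs[:5]
y = np.array([1.0, 0.8, 0.9, 0.2, 0.7])   # made-up measurements
noise_var = np.full(len(train_seqs), 0.01)  # made-up measurement variances

mean, var = gp_posterior(train_seqs, y, noise_var, all_seqs)
for seq, m, v in zip(all_seqs, mean, var):
    print(f"{seq}  mean={m:.3f}  var={v:.3f}")
```

In this sketch the unmeasured sequences keep a visibly larger posterior variance than the measured ones, which is the quantity the posterior variance function reports for a chosen subset of sequences.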

Installation

vcregression is a command-line interface written in Python 3. To install it, clone the repository to a local directory. Before running the software, install all dependencies with the pip package manager using the following command:

pip install -r requirements.txt

Quick Start

Estimation of the variance components

For a quick demonstration of the method using the sample data file smn1data.csv (Wong et al. 2018) [1], first execute the following command to estimate the variance components:

python3 vc_prep.py 4 8 -data data/Smn1/smn1data.csv

MAP estimate

To calculate the maximum a posteriori (MAP) estimate using the lambdas inferred by the command above, execute the following command:

python3 vc_map_estimate.py 4 8 -data data/Smn1/smn1data.csv -lambdas out/lambdas.txt

Posterior variance

Execute the following command to get the posterior variances for a specified subset of sequences:

python3 vc_variance.py 4 8 -seqs data/seqsSmn1.txt -lambdas out/lambdas_star.txt -vars data/varSmn1.txt -seqsvar data/seqspossample.txt
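
The three Quick Start commands can also be chained from a small Python driver. The sketch below is only a convenience wrapper, assuming it is run from the repository root; the helper names build_commands and run_pipeline are hypothetical, and the arguments are copied verbatim from the commands above. It defaults to a dry run that prints each command without executing it.

```python
import subprocess


def build_commands(python="python3"):
    """Return the Quick Start commands as argument lists (copied from above)."""
    return [
        [python, "vc_prep.py", "4", "8",
         "-data", "data/Smn1/smn1data.csv"],
        [python, "vc_map_estimate.py", "4", "8",
         "-data", "data/Smn1/smn1data.csv",
         "-lambdas", "out/lambdas.txt"],
        [python, "vc_variance.py", "4", "8",
         "-seqs", "data/seqsSmn1.txt",
         "-lambdas", "out/lambdas_star.txt",
         "-vars", "data/varSmn1.txt",
         "-seqsvar", "data/seqspossample.txt"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


commands = build_commands()
run_pipeline(commands, dry_run=True)
```

Calling run_pipeline(commands, dry_run=False) would execute the steps in order and raise subprocess.CalledProcessError if any of them exits with a non-zero status.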

[1] Wong et al. 2018. Quantitative Activity Profile and Context Dependence of All Human 5′ Splice Sites.