The qpcR library is an extension to the R environment that assists in the modelling and analysis of quantitative real-time PCR data.
With the qpcR library you can (categorized):
- Fit sigmoidal (four-, five-, six- and seven-parameter) models to single or replicate raw fluorescence data and display the curves with various options. Models include: 4-parameter logistic b4 (Liu & Saint, 2002; Tichopad et al., 2003; Rutledge, 2004, Rutledge & Stewart, 2008; Zhao & Fernald, 2005), 4-parameter log-logistic l4 and 5-parameter logistic/log-logistic b5/l5 (Spiess & Ritz, 2008; Spiess et al., 2008), and unpublished b6/l6 (added linear term) and b7/l7 (added quadratic term) models.
- Conduct a model selection process in which the best sigmoidal model is chosen by nested F-tests on the residual variance or other criteria such as likelihood ratio, Akaike weights or reduced chi-square.
- Further optimize the fitting process by eliminating cycles in the ground and plateau phase, using all possible combinations (an extension of the plateau cycle elimination method in Rutledge, 2004).
- Calculate many measures for the goodness-of-fit, such as the residual variance, R2, adjusted R2, Akaike Information Criterion (AIC), corrected AIC (AICc), Bayesian Information Criterion (BIC), root-mean-squared-error (RMSE), Allen's PRESS statistic and reduced chi2. All these measures can also be used for non-qPCR fits of class 'nls', 'lm', 'glm' etc...
- Make a 'lack-of-fit' test for non-replicates.
- Calculate the goodness-of-fit (by means of RMSE) of all different sigmoidal models within the exponential region of the qPCR curve.
- Bootstrap sigmoidal fits to obtain confidence intervals for all estimated parameters, including those from efficiency and threshold cycle analysis.
- Extract many features of sigmoidal fit, such as threshold cycle, efficiency, first and second derivative maxima. Efficiencies can be deduced from any part of the sigmoidal curve.
- Conduct robust and variance-weighted fitting of sigmoidal models. A weighting expression such as "1/error^2" or "sqrt(fitted)" can be used for fitting with weights that are derived from characteristics of the curve, i.e. error of replicates or fitted values.
- Predict either fluorecence or cycle values from data, and update models.
Other fitting methods:
- Derive values from more classical quantitation methods, such as the ‘window-of-linearity’ method (with/without baseline optimization; Ramakers et al, 2003; Ruijter et al., 2009), exponential fitting after 'take-off point' identification (Tichopad et al., 2003; Zhao & Fernald, 2005) and 'linear regression of efficiency' (with/without baseline optimization; Rutledge & Stewart, 2008).
- Use the mechanistic mak2 model from Boggy et al. (2010) to calculate the initial template fluorescence D0 of any qPCR data, making the use of efficiencies/threshold cycles expendable. In collaboration with Gregory Boggy, an improved mak3 model was developed which has an added slope parameter and results in even better fits.
- Do a maxRatio analysis as in Shain & Clemens (2008) by fitting a cubic spline to the raw fluorescence data after background subtraction.
- Use the mechanistic cm3 model from Carr & Moore (2012), the bilinear model from Buchwald (2007), spline functions and linear-exponential hybrid models.
Curve feature methods:
- Calculate the Cy0 (x-intercept of the tangent on the first derivative maximum) value as described in Guescini et al. (2008), the takeoff-point (outlier from studentized residuals) as in Tichopad et al. (2003), or the midpoint region as in Peirson et al. (2003).
- Conduct smoothing prior to curve fitting based on methods such as Running Mean, Savitzky-Golay, Kalman, Cubic Spline, Exponential Moving Average (EMA) etc. A soon to appear technical paper on this subject shows that Savitzky-Golay and EMA perform best, however smoothing entails a loss of information and is as such certainly debatable...
- Conduct "baselining" by subtracting from the complete data a fixed numerical value, mean/median of a cycle range, linear (most common!) and quadratic baselines or the lower asymptote ('c' parameter) of any of the implemented sigmoidal models by refitting.
Calibration curve methods:
- Conduct a classical calibration curve analysis for the estimation of qPCR efficiency (threshold cycle vs. concentration) by using single or replicate data (Pfaffl, 2001). Calculate confidence intervals for the efficiency and predict unknown concentrations from the threshold cycles of new samples.
- Bootstrap the replicates of dilution the data to obtain bootstrap confidence intervals for the calculated efficiency and R2.
Outlier run detection:
- PCR runs can be analyzed by several published (and some unpublished...) methods of single curve or replicate curve outlier detection. These include univariate methods such as the 'Kinetic Outlier Detection' (KOD; Tichopad et al., 2003) or a sigmoidal outlier detection that compares the cycle difference between first/second derivative maximum (SOD, unpublished), or multivariate methods that employ different features of the curve such as first/second derivative maxima/slope/maximum fluorescence (Tichopad et al., 2010) or slope/maximum fluorescence/fluorescence at first derivative maximum (Sisti et al., 2010).
Batch qPCR analysis:
- Single model lists (modlist) can be created that consist of multiple (up to thousands) of sigmoidal fitted PCR runs. During the creation of these model lists, runs that failed to fit or which are sigmoidal outliers can be automatically removed. Furthermore, the data can be pre-transformed prior to fitting by several methods, such as normalization, baseline subtraction etc. On these model lists, many functions of the 'qpcR package' can be executed by the sapply(...) family of methods in R.
- Replicate model lists (replist) can be created in a batch format which consist of many sets of replicate runs per list item.
- A qPCR dataset can be analyzed in batch (pcrbatch) by all available methods that calculate essential features such as efficiency, threshold cycle, initial template fluorescence F0. The results from all methods are concatenated into one dataframe that can be pasted into Excel or alike. This is useful for comparison of the different values that come out from different methods.
- The 'qpcR' package houses a function for general error analysis that is based on 1) error propagation using first- and second-order Taylor expansion including covariance structure, 2) a permutation approach with optional 'tied' observations that delivers permutation-based confidence intervals and 3) a Monte Carlo simulation-based approach based on a standard or truncated multivariate normal distribution with covariance included. The function is largely based on the propagate function of the same-named package of this author. First-order Taylor expansion in qPCR error analysis is advocated in Nordgard et al. (2010) and also implemented in the qBASE (Hellemans et al., 2007) software, the permutation approach can be found as the 'pairwise reallocation permutation' method in the popular REST software (Pfaffl et al., 2002).
- Gene expression ratios can be calculated on single assay or multiple assay setups, using normalization against reference genes or not. The ratio calculation follows the classical Edelta-ct paradigm (Pfaffl, 2001). Efficiencies as well as threshold cycles for ratio calculation can be collected from the different fit methods or different parts of the sigmoidal curve as described above. In case of replicates, the efficiencies and threshold cycles can be defined as a single value for all replicates, averaged between replicates or taken as the individual efficiency per curve. A batch function can perform ratio calculations on different control and treatment samples with different genes-of-interest/reference genes. All ratio calculations employ the error analysis methods described above: error propagation, permutation and Monte Carlo simulation are obtained for each ratio, which confers reliability to the results. The ratio calculation can be based on the Cq/efficiency/D0 values obtained from the fits or can be supplied as external data from other qPCR softwares.
Reference gene averaging:
- Reference genes can be averaged according to the method described in Vandesompele et al. (2001). Different to the original publication, the 'qpcR' package conducts an ARITHMETIC averaging of the reference genes' threshold cycles prior to ratio calculation, as opposed to the original GEOMETRIC averaging of transformed quantities. The result is the same, due to logarithmic identities of the arithmetic and geometric means. In this implementation, errors in the reference gene values are propagated to the final result by the implemented error propagation function. Reference gene averaging is included in the batch ratio calculation method and can be switched on.
qPCR data visualization:
- Single and multiple qPCR data with possible replicates can be visualized by 4 different plot types. "all" displays all curves in one graphic, "single" displays one curve/one set of replicates in a multiple plot matrix, "3D" will give a 3D-rendering of all curves that can be rotated in all dimensions and "image" will display a "heatmap"-like image plot suitable for high-throughput data.
- Residuals from the sigmoidal plots can be displayed by a specialized "residuals plot" in which one can analyze the fit performance in different regions of the curve.
- Results obtained from the ratio calculations (error propagation, permutation and Monte Carlo simulation) are displayed as multiple histograms/boxplots including 95% confidence intervals.
- Starting from a template curve, artificial qPCR data is generated by a Monte Carlo simulation in which the user can define a homo-/heteroscedastic variance along the curve. For each simulation, different sigmoidal models can be tested in respect to their performance by collecting several measures for goodness-of-fit and an optional model selection step. This function was used to show the good performance of the asymmetric 'b5'/'l5' models in Spiess et al. (2008).
Melting curve analysis:
- A melting curve analysis can be carried out by supplying either raw fluorescence (F) vs. temperature (T) values or already transformed -dF/dT values. In case of raw values, differentiation is performed and the first derivative curve is displayed with the melting peaks. Peaks and Tm values are automatically identified, peak areas calculated and peaks flagged as 'negative' if the peak area is below a user-defined threshold value. If the user supplies a vector containing reference Tm values, an iterative approach using spline smoothing tries to make the melting curve compatible with the reference values.
qPCR data import:
- qPCR data can be imported from all kinds of systems by a sophisticated import function which queries several formatting steps for data without/with reference dyes for normalization (such as ROX), data in wide/long format or stacked/juxtaposed runs. The formatting parameters are stored on the hard drive for subsequent analysis of future runs or for batch analysis of many files in one directory.