R Package: HIBAG
R CRAN: http://cran.r-project.org/web/packages/HIBAG/index.html
Email: Xiuwen Zheng (zhengx@u.washington.edu) or Bruce S. Weir (bsweir@u.washington.edu)
This page was last updated on March 21, 2013
HIBAG is a state-of-the-art software package for imputing HLA types from SNP data, implemented as a package for the R statistical programming language. It is highly accurate and computationally tractable, and researchers can use it with published parameter estimates (provided for subjects of European, Asian, Hispanic and African ancestries) instead of needing access to large training datasets. HIBAG combines attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging improves the accuracy and stability of classifier ensembles built with bootstrap aggregating and random variable selection.
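To make the idea of attribute bagging concrete, here is a minimal toy sketch in base R. It is not HIBAG's implementation: it assumes a simple nearest-centroid classifier as a stand-in for the haplotype-based classifiers, uses the built-in `iris` data instead of SNP genotypes, and the function names `attr_bagging` and `predict_attr_bagging` are hypothetical. It only illustrates the two ingredients named above: a bootstrap sample of the training rows and a random subset of the variables for each classifier, with the ensemble combined by majority vote.

```r
# Toy attribute bagging (illustrative only; not HIBAG's algorithm).
set.seed(1)

attr_bagging <- function(x, y, n_classifier = 25, n_feature = 2) {
  lapply(seq_len(n_classifier), function(k) {
    rows <- sample(nrow(x), replace = TRUE)   # bootstrap sample of the rows
    cols <- sample(ncol(x), n_feature)        # random variable selection
    xb <- x[rows, cols, drop = FALSE]
    yb <- y[rows]
    # class centroids (classes x selected features) for this classifier
    centroids <- apply(xb, 2, function(v) tapply(v, yb, mean))
    list(cols = cols, centroids = centroids)
  })
}

predict_attr_bagging <- function(ensemble, newx) {
  # one column of predicted class labels per individual classifier
  votes <- sapply(ensemble, function(m) {
    xs <- newx[, m$cols, drop = FALSE]
    # squared distance from every sample to every class centroid
    d <- apply(m$centroids, 1, function(ctr) rowSums(sweep(xs, 2, ctr)^2))
    rownames(m$centroids)[apply(d, 1, which.min)]
  })
  # majority vote across classifiers for each sample
  apply(votes, 1, function(v) names(which.max(table(v))))
}

x <- as.matrix(iris[, 1:4])
y <- iris$Species
ensemble <- attr_bagging(x, y)
pred <- predict_attr_bagging(ensemble, x)
accuracy <- mean(pred == y)   # training-set accuracy of the ensemble
```

Each individual classifier sees only part of the data and part of the variables, so single classifiers can be weak; the vote across the ensemble is what stabilizes the prediction.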
The published parameters were estimated from HLA and SNP genotypes of multiple GlaxoSmithKline clinical trials (referred to as “HLARES”) and HapMap. The HIBAG models were built from SNP markers common to the Illumina 1M Duo, OmniQuad, OmniExpress, 660K and 550K platforms. The training data consist of 1) HLARES data of European ancestry, 2) HLARES data of Asian ancestry and HapMap CHB+JPT, 3) HLARES data of Hispanic ancestry, and 4) African American HLARES data and 60 African parents of HapMap YRI.
HLA Nomenclature Updates (important update: April 2010)
Four-digit resolution (high resolution):
Summary of training data set:

Ethnic-specific models:
The standard statistical quantities of prediction quality for a specific HLA allele H:
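As a hedged illustration, the sketch below computes four commonly reported quantities for one allele H: sensitivity, specificity, positive predictive value and negative predictive value. Which quantities the page intends is an assumption here, as is the counting scheme: each individual contributes two allele copies, and copies of H are matched between the true and predicted genotypes within each individual. The function name `allele_quality` and the example genotypes are made up for illustration.

```r
# Per-allele prediction quality (assumed definitions; illustrative data).
allele_quality <- function(true_h1, true_h2, pred_h1, pred_h2, allele) {
  # number of copies of the allele (0, 1 or 2) per individual
  true_n <- (true_h1 == allele) + (true_h2 == allele)
  pred_n <- (pred_h1 == allele) + (pred_h2 == allele)
  tp <- sum(pmin(true_n, pred_n))          # copies correctly predicted
  fn <- sum(pmax(true_n - pred_n, 0))      # true copies that were missed
  fp <- sum(pmax(pred_n - true_n, 0))      # predicted copies that are wrong
  tn <- 2 * length(true_n) - tp - fn - fp  # remaining allele copies
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv         = tp / (tp + fp),
    npv         = tn / (tn + fn))
}

# Four hypothetical individuals, true versus predicted genotypes
q <- allele_quality(
  true_h1 = c("01:01", "02:01", "01:01", "03:01"),
  true_h2 = c("02:01", "02:01", "03:01", "03:01"),
  pred_h1 = c("01:01", "02:01", "02:01", "03:01"),
  pred_h2 = c("02:01", "01:01", "03:01", "03:01"),
  allele  = "02:01"
)
print(round(q, 3))
```

In this toy data there are 2 true positives, 1 false negative, 1 false positive and 4 true negatives among the 8 allele copies.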
    library(HIBAG)

    # Load the published parameter estimates from European ancestry
    model.list <- get(load("European-HLA4.RData"))

    #########################################################################
    # Import your PLINK BED file
    #
    yourgeno <- hlaBED2Geno(bed.fn=".bed", fam.fn=".fam", bim.fn=".bim")
    summary(yourgeno)

    # HLA imputation at HLA-A
    hla.id <- "A"
    model <- hlaModelfromObj(model.list[[hla.id]])
    summary(model)

    # SNPs in the model
    head(model$snp.id)
    # "rs2523442" "rs9257863" "rs2107191" "rs4713226" "rs1362076" "rs7751705"
    head(model$snp.position)
    # 29525796 29533563 29542274 29542393 29549148 29549597

    # best-guess genotypes
    pred.guess <- predict(model, yourgeno, type="response")
    # posterior probabilities
    pred.prob <- predict(model, yourgeno, type="prob")
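Before imputing with a pre-fit model, it can be useful to check how many of the model's SNPs are present in your own data, since a large share of missing SNPs would plausibly hurt accuracy. The sketch below is self-contained and does not call HIBAG: `model_snp_id` holds the six SNP IDs printed by `head(model$snp.id)` in the example above, while `your_snp_id` is a made-up vector standing in for `yourgeno$snp.id`.

```r
# SNP IDs in the pre-fit model (the six IDs shown in the example output)
model_snp_id <- c("rs2523442", "rs9257863", "rs2107191",
                  "rs4713226", "rs1362076", "rs7751705")

# Hypothetical SNP IDs from your own genotype data (invented for illustration)
your_snp_id <- c("rs2523442", "rs2107191", "rs4713226",
                 "rs7751705", "rs0000001")

in_data      <- model_snp_id %in% your_snp_id
missing_snp  <- model_snp_id[!in_data]   # model SNPs absent from your data
missing_prop <- mean(!in_data)           # proportion of model SNPs missing

print(missing_snp)
print(missing_prop)
```

With real data, the same comparison would use `model$snp.id` and `yourgeno$snp.id` directly.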

    library(HIBAG)

    # Import your PLINK BED file
    geno <- hlaBED2Geno(bed.fn=".bed", fam.fn=".fam", bim.fn=".bim")
    summary(geno)

    # HLA genotypes, 01:02/02:01, 05:01/03:01, ...
    train.HLA <- hlaAllele(geno$sample.id, H1=c("01:02", "05:01", ...),
        H2=c("02:01", "03:01", ...), locus="A")

    # Selected SNPs, two options:
    # 1) the flanking region of 500kb on each side,
    #    or an appropriate flanking size without sacrificing predictive accuracy
    snpid <- hlaFlankingSNP(geno$snp.id, geno$snp.position, "A", 500*1000)
    # 2) the SNPs in our pre-fit models
    model.list <- get(load("European-HLA4.RData"))
    snpid <- model.list[["A"]]$snp.id

    # Subset training SNP genotypes
    train.geno <- hlaGenoSubset(geno, snp.sel=match(snpid, geno$snp.id))

    # Building ...
    model <- hlaAttrBagging(train.HLA, train.geno, nclassifier=100,
        verbose.detail=TRUE)
    summary(model)

    # Save your model
    model.obj <- hlaModelToObj(model)
    save(model.obj, file="your_model.RData")

    # Predict ...
    # (newgeno: SNP genotypes of the samples to impute; not defined in this example)
    # best-guess genotypes
    pred.guess <- predict(model, newgeno, type="response")
    # posterior probabilities
    pred.prob <- predict(model, newgeno, type="prob")
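The best-guess genotypes and the posterior probabilities can be used together, for example to flag calls whose posterior probability is low. The sketch below is a self-contained illustration in base R and makes several assumptions: the matrix `post_prob` (rows are candidate genotypes, columns are samples) is invented and is not necessarily the structure returned by `predict()`, and the cutoff of 0.5 is arbitrary.

```r
# Hypothetical posterior probabilities: one column per sample, summing to 1
post_prob <- matrix(
  c(0.90, 0.05, 0.05,    # sample S1
    0.40, 0.35, 0.25,    # sample S2
    0.10, 0.10, 0.80),   # sample S3
  nrow = 3,
  dimnames = list(c("01:01/02:01", "02:01/02:01", "02:01/03:01"),
                  c("S1", "S2", "S3"))
)

# Best-guess genotype = the genotype with the highest posterior probability
best_idx <- apply(post_prob, 2, which.max)
calls <- data.frame(
  sample.id = colnames(post_prob),
  genotype  = rownames(post_prob)[best_idx],
  prob      = apply(post_prob, 2, max),
  stringsAsFactors = FALSE
)

threshold <- 0.5                        # arbitrary cutoff for this example
calls$confident <- calls$prob >= threshold
print(calls)
```

Here sample S2 is flagged as not confident because its best genotype has a posterior probability of only 0.40.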