Help With School!

 

MIDI Music SoftwareMusic Composing SoftwareMIDI Music Composing Software SupportMusic Editing Software LinksOrder Music SoftwareMusic Editing Software Links

Biology Education

The Aspire Q&A program includes many biology-related quizzes:
Aspire Q&A:
Create and use quizzes to quickly test your knowledge of a subject. Includes quizzes that cover physics, chemistry and biology. Or, create your own with the included easy-to-use quiz editor. Click here to get it.

Here are instructions on performing microarray analysis (this is a high-level tutorial intended for graduate students in the biological sciences):

Step-by-Step Tutorial for analyzing Affymetrix expression microarrays
in the R statistical environment:

This tutorial is intended for Microsoft Windows users, but will work on any platform with small modifications (installing packages is not done using ZIP files on other platforms): 

Installing the R statistical environment:
Go to http://www.r-project.org and download the latest version of R. This tutorial is based on version 2.4.0. On the R website select the ‘Cran Mirror’s link and select the download site closest to you. Select the version of R for your operating system. Select the ‘base’ link and then select and run the install executable (something like R-2.4.0-win32.exe) and complete the installation process. 

Installing Bioconductor:
To install Bioconductor, which is a set of extra packages required for microarray analysis, follow these steps:

  1. Run R. You should see a command prompt ‘>’.
  2. Type in this command at the prompt and hit return (this will download the source code for this function):

source("http://www.bioconductor.org/getBioC.R")

  1. Now run the function by typing in this:

getBioC() 

If you need a microarray data set to practice on:
You can download microarray data from this site: http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi
In the section marked ‘RNA interference model of RPS19 deficiency…’ download the file ‘DBA_cel_files.zip’ and unzip the files (Using a utility such as WinZip). For this tutorial the unzipped files are located in the 'c:\MicroarrayData\' directory.

Reading in your microarray files:
Bioconductor needs to know how to group your microarray files by phenotype, so you need to create a text file with information on each CEL file and the phenotype group to which it belongs. In your data directory, create a text file called ‘Phenotypes.txt’ with this info:

Name  FileName    Target
1     CL20030502101AA   Control
2     CL20030506102AA   Control
3     CL20030502103AA   Control
4     CL2004050501AA    EpoSCF10Days
5     CL2004050502AA    EpoSCF10Days
6     CL2004050503AA    EpoSCF10Days

In this tutorial we are only going to use six of the CEL files in the data set, for simplicity sake. Here I have the control erythrocytes at time zero in one group and those treated with erythropoietin and stem cell factor for 10 days in the second group. 

Inside the R environment type in these commands:

library(affy)

This sets the working/default directory (strings are in the C-programming style, so you need two backslashes to denote a single backslash):

setwd("c:\\MicroarrayData\\")

This reads in the phenotype data (the text file we created above):

pd <- read.AnnotatedDataFrame( "RNATargets8.txt", header=TRUE, sep="", row.names=1 )

This reads in the CEL files and links them to our phenotype data (type in all on one line):

rawAffyData <- ReadAffy("CL20030502101AA.CEL", "CL20030506102AA.CEL", "CL20030502103AA.CEL", "CL2004050501AA.CEL", "CL2004050502AA.CEL", "CL2004050503AA.CEL",phenoData=pd,verbose=TRUE)

Each CEL file contains cell expression data for one expression microarray chip. This file contains intensity levels that are matched to each oligo probe on the chip. 

Viewing your CEL data:
To view a picture of the cel file data type in this command (this will display the first CEL file’s data in the data set):

image(rawAffyData[,1])

Viewing the CEL image may uncover defects in the chip or processing of the chip if there obvious smears or large patterns. These chips, though, look fine. 

To plot one CEL file’s data against another use this command:

plot(log(intensity(rawAffyData[,2])),log(intensity(rawAffyData[,3])))

This plots the log-transformed intensity levels of each oligo for both chips (in this case CL20030506102AA and chip CL20030502103AA) against each other. If the intensity levels are exactly identical on both chips you’ll see a straight, thin diagonal line. The more the chips deviate the thicker and more dispersed the plotted data will get. 

This command will plot every paired combination of the specified chips against each other:

mva.pairs(pm(rawAffyData)[,c(1,2,3)]) 

This boxplot will show the ranges of your data across chips:

boxplot(rawAffyData) 

Analyzing your data using the RMA method:
Type in this on the command line:

eset <- expresso(rawAffyData, bgcorrect.method="rma", normalize.method="quantiles", pmcorrect.method="pmonly", summary.method="medianpolish")

This uses the 'expresso' function which allows you to perform several analysis operations at once. It allows you to mix different background correction, normalization, perfect match correction, and summarization methods. Here we’re using the basic RMA method, which uses ‘rma’ background correction, ‘quantiles’ normalization, ‘pmonly’ perfect match correction, and ‘medianpolish’ summarization functions. 

The eset variable now contains the analysis results. To extract the result to a tab-delimited text file use these commands:

sink("ExpressionVals.txt")
write.table(exprs(eset),sep="\t")
sink()

You can now open/import ExpressionVals.txt into a spreadsheet program (like Excel) to view it. Note that the column headers that identify the chips are shifted over, so you will have to manually shift them over one to the right. The row headers are the names of the oligo probes. The values are the normalized expression levels for the specified oligo for each of the microarray chips. 

To perform a t-test on the data (to determine which oligos had a significant change between the control group and the treated group) you can use these commands:

First, create a t-test function by copying and pasting this entire block into R (*note: this is a non-paired t-test):

myTTest <- function(x) {
      # Split the row of expression values based on which phenotype category they are in.

      # note: This assumes there are exactly two phenotypes, no more, no less.
      valsByPhenotype <- split(x, eset$Target)
      # note: Assuming the variances are equal in the t-test.
      # Do a t-test comparison between the expression values of the two phenotypes.
      result <- t.test(x=valsByPhenotype[[1]],y=valsByPhenotype[[2]],var.equal=TRUE)$p.value
      # Return the p-value result.
      return(result)
}

Then you can apply the t-test function to your entire data set like this (this will apply the myTTest function to each row of expression values):

pValues <- esApply(eset,1,myTTest)

Then write out the t-test values to a text file:

sink("TTestVals.txt")
write.table(pValues,sep="\t")
sink()

The 'TTestVals.txt' file will be in the same order as the 'ExpressionVals.txt' file created earlier. It will also have the probe set names so you can match them up.

Performing Global False Discovery Rate Analysis:
Global false discovery rate analysis will try to estimate the number of false positives in your analysis results. One way to perform this using Bioconductor is to use the 'Twilight' R module. To install this module, first download the zip file by going to the bioconductor site ( http://bioconductor.org ) and select 'Browse Packages' and then 'Software'. Find ‘twilight’ in the list and download the latest install ZIP file. Then select the 'Packages'-'Install package(s) from local zip files' option from the menu. 

Once you’ve installed the package you can load the library:

library(twilight)

To perform the global false discovery rate analysis, first create the phenotype pattern in an array like this (indicating the first 3 chips are in the first phenotype group and the next 3 chips are in the second phenotype group):

id <- c(1,1,1,2,2,2)

Then perform the analysis – this command uses the top 5% as a cutoff (*note: this has a 'paired=TRUE' option if your data is paired):

pval <- twilight.pval(exprs(eset),id,quant.ci=0.95)

You can then just type in the variable name pval to see a summary, including the ‘Estimated percentage of non-induced genes’:

pval

Then export the p-value results to a file:

sink("fdrResults.txt")
write.table(pval$result,sep="\t")
sink()