Changes between Version 1 and Version 2 of BIOS_PreparedData


Timestamp:
Sep 19, 2016 4:44:21 PM
Author:
jamverlouw

== Freeze I ==
=== Data available ===
Raw RNA-seq data is available on the grid; see [wiki:BIOS_RnaSeq RNASeq data]. This data has been aligned using the pipeline described at [wiki:BIOS_Pipeline RNAseq alignment and quantification pipeline]; the resulting exon-, transcript- and gene-level counts are described below. Count data is available for the so-called 'Freeze1': the 2116 samples from Groningen (N=626), Leiden (N=654), Rotterdam (N=652) and Maastricht (N=184) that passed QC (see [wiki:BIOS_SampleBlacklist2 RNAseq QC]). This is roughly half of the BIOS RNA-seq data and is the set used for the first papers; the other half has been measured but is still being aligned and quality-controlled. Both raw and TMM-normalized data are available. TMM (trimmed mean of M-values) normalization corrects for differences in library size across subjects; for R code, see the attached script or the R package edgeR, and http://genomebiology.com/2010/11/3/r25.
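The TMM step can be illustrated with a minimal base-R sketch. It is not the attached script: tmm_pair(), tmm_factors() and tmm_cpm() are hypothetical helpers that follow the published method and the defaults of edgeR's calcNormFactors(method = "TMM"), i.e. trimming 30% of M-values and 5% of A-values from each tail and taking a precision-weighted mean. Whether the RNAs matrices hold TMM-scaled counts per million, as tmm_cpm() returns, is an assumption; check the attached script before relying on it.

```r
# Minimal base-R sketch of TMM (trimmed mean of M-values) normalization.
# Hypothetical helpers modelled on edgeR's calcNormFactors(method = "TMM")
# defaults; NOT the attached BIOS script.

# TMM factor of sample 'obs' relative to reference sample 'ref'.
tmm_pair <- function(obs, ref, logratio_trim = 0.3, sum_trim = 0.05) {
  n_obs <- sum(obs)
  n_ref <- sum(ref)
  m_val <- log2((obs / n_obs) / (ref / n_ref))          # M-values (log fold changes)
  a_val <- (log2(obs / n_obs) + log2(ref / n_ref)) / 2  # A-values (mean log abundance)
  # Approximate variance of each M-value; used for inverse-variance weights.
  v <- (n_obs - obs) / n_obs / obs + (n_ref - ref) / n_ref / ref
  # Drop features with a zero count in either sample (infinite M or A).
  fin <- is.finite(m_val) & is.finite(a_val)
  m_val <- m_val[fin]; a_val <- a_val[fin]; v <- v[fin]
  if (length(m_val) == 0 || max(abs(m_val)) < 1e-6) return(1)
  n <- length(m_val)
  lo_m <- floor(n * logratio_trim) + 1; hi_m <- n + 1 - lo_m
  lo_a <- floor(n * sum_trim) + 1;      hi_a <- n + 1 - lo_a
  # Keep features outside the trimmed tails of both the M and A distributions.
  keep <- rank(m_val) >= lo_m & rank(m_val) <= hi_m &
    rank(a_val) >= lo_a & rank(a_val) <= hi_a
  # Precision-weighted mean of the retained M-values, back on the linear scale.
  f <- sum(m_val[keep] / v[keep]) / sum(1 / v[keep])
  if (is.na(f)) f <- 0
  2^f
}

# Normalization factors for every column (sample) of a count matrix.
tmm_factors <- function(counts, ...) {
  lib_size <- colSums(counts)
  # Reference: the sample whose library-size-scaled 75th percentile is
  # closest to the mean of those percentiles across samples.
  f75 <- apply(counts, 2, quantile, probs = 0.75) / lib_size
  ref_col <- which.min(abs(f75 - mean(f75)))
  f <- apply(counts, 2, tmm_pair, ref = counts[, ref_col], ...)
  # Rescale so the factors multiply to one (geometric mean of 1).
  f / exp(mean(log(f)))
}

# Counts per million using TMM-adjusted ("effective") library sizes.
tmm_cpm <- function(counts) {
  eff_lib <- colSums(counts) * tmm_factors(counts)
  sweep(counts, 2, eff_lib, "/") * 1e6
}
```

For real analyses, edgeR itself (calcNormFactors() on a DGEList, followed by cpm()) is the safer choice; the sketch only makes the arithmetic visible. A sample that differs from the others only by sequencing depth gets a factor of 1, while a sample dominated by a few highly expressed features gets a factor below 1, which keeps the remaining features comparable across samples.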
=== Location on VM ===

     

Data is stored as R objects. Loading a file in R (e.g. load('/virdir/Backup/RP3_data/RNASeq/run_01/exoncounts/exon_count_freeze1_R_object.RData')) creates two matrices in your workspace: RNA with the raw data and RNAs with the TMM-normalized data. The row names of these matrices (rownames(RNAs)) contain gene, exon or transcript IDs; the column names (colnames(RNAs)) are the subject BIOS IDs (called uuid in other files). We used
Ensembl release 71 for annotation; see [wiki:BIOS_ReferenceFiles Reference and annotation]. To export the data to a tab-delimited text file, use write.table(RNAs, file='yourfile.txt', quote=FALSE, col.names=TRUE, row.names=TRUE, sep='\t').[[BR]]
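Calling load() in the global workspace can silently overwrite existing RNA or RNAs objects. A minimal sketch, assuming only the object names given above: the hypothetical helper load_counts() loads a file into its own environment and returns both matrices, and the hypothetical export_counts() wraps the write.table() call from the text.

```r
# Hypothetical helpers (not part of the BIOS files). They assume only what
# the text states: each .RData file contains matrices named RNA (raw) and
# RNAs (TMM-normalized).

# Load one count object into a private environment instead of the global
# workspace, so existing RNA/RNAs objects are not overwritten.
load_counts <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the names it restored
  missing <- setdiff(c("RNA", "RNAs"), loaded)
  if (length(missing) > 0) {
    stop("object(s) not found in ", path, ": ", paste(missing, collapse = ", "))
  }
  list(raw = get("RNA", envir = env), tmm = get("RNAs", envir = env))
}

# Write a count matrix as tab-delimited text with the write.table() settings
# from the text: feature IDs as row names, BIOS IDs as the header.
export_counts <- function(mat, file) {
  write.table(mat, file = file, quote = FALSE, col.names = TRUE,
              row.names = TRUE, sep = "\t")
}

# Example on the VM:
# counts <- load_counts('/virdir/Backup/RP3_data/RNASeq/run_01/exoncounts/exon_count_freeze1_R_object.RData')
# dim(counts$tmm)             # features x samples
# head(colnames(counts$tmm))  # BIOS IDs (uuid)
# export_counts(counts$tmm, 'yourfile.txt')
```

Because the header line written this way has one field fewer than the data rows, read.table(file, header = TRUE, sep = '\t', check.names = FALSE) uses the first column as row names when reading the file back; check.names = FALSE keeps BIOS IDs that are not syntactic R names unchanged.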
== Freeze II ==
=== Data available ===
     
=== Data available ===

BIOS phenotype data is stored in a metadatabase; see [wiki:BIOS_Metadatabase this page]. This database can be accessed through so-called views, e.g. from R. Three views were extracted in January 2015 and stored on the VM: [[BR]]
 * phenotype data: view="allPhenotypes", design="phenotypes" [[BR]]
 * RNA-seq sample sheets: view="rnaseq", design="samplesheets" [[BR]]
 * IDs: view="getIds", design="identifiers" [[BR]]
These files are available in .RData and .csv formats.
For explanations of the column names, see [wiki:BIOS_Phenotype Phenotype data]. The phenotype data is not yet complete: we are currently contacting the biobanks to complete their files.
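Because the phenotype views and the count matrices are separate objects, samples have to be matched by ID before analysis, and incomplete phenotype files mean some samples may have no phenotype row. A minimal sketch, assuming the phenotype table has a column named uuid holding the same BIOS IDs as colnames(RNAs) (the text only says the BIOS IDs are called uuid in other files); align_phenotypes() and the CSV file name are hypothetical.

```r
# Hypothetical helper: reorder a phenotype table so its rows line up with
# the columns (samples) of a count matrix. ASSUMPTION: the table has an ID
# column (default "uuid") holding the same BIOS IDs as colnames(RNAs).
align_phenotypes <- function(pheno, counts, id_col = "uuid") {
  if (!id_col %in% names(pheno)) stop("no column named '", id_col, "'")
  idx <- match(colnames(counts), pheno[[id_col]])  # first match per sample
  if (anyNA(idx)) {
    warning(sum(is.na(idx)), " sample(s) have no phenotype row; filled with NA")
  }
  out <- pheno[idx, , drop = FALSE]
  # Unmatched samples come back as all-NA rows; restore their IDs.
  out[[id_col]] <- colnames(counts)
  rownames(out) <- colnames(counts)
  out
}

# Example (hypothetical file name for the exported allPhenotypes view):
# pheno <- read.csv('allPhenotypes.csv', stringsAsFactors = FALSE)
# pheno_aligned <- align_phenotypes(pheno, RNAs)
```

Matching on IDs rather than relying on row order makes missing phenotype records visible as NA rows plus a warning, instead of silently shifting phenotypes onto the wrong samples.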
=== Location on VM ===
