Changes between Initial Version and Version 1 of BIOS_VirtualMachine

Sep 19, 2016 11:57:14 AM (3 years ago)



= BIOS data location and access overview =

Processed files are stored on the BIOS VM, a virtual machine running in SURFsara's HPC cloud. It behaves like a normal Linux server and also supports graphical connections.

The BIOS Metadatabase can be accessed via a web browser or any client program supporting HTTPS.

Raw files are stored on [ SURFsara's Grid], an online data storage system. Accessing the Grid requires some technical skill and completing [ a registration procedure].

[[Image(BIOSdataInfrastructure.png, 600px)]]
= BIOS virtual machine provided by SURFsara =

For safety and privacy reasons, BIOS data (genome, transcriptome, methylome and phenome) is only accessible for downstream analysis on a SURFsara virtual machine (VM). The BIOS VM is managed by Martijn Vermaat and Leon Mei. Because the resources and capacity of this VM are limited, it should only be used for downstream analysis. If you want to work on many BAM files, or run similarly expensive analyses that you would normally run on a cluster, it is probably better to get acquainted with working on the grid directly. If you are not sure, contact Martijn or Leon.

The current test VM has these specs:
 * 16 processors
 * 128 GB RAM
 * 4.5 TB disk space mounted at /virdir. This is where you can keep your analysis data. Files in the /virdir/Backup folder are backed up about once per month. Files in /virdir/Scratch are not backed up.
 * 2 GB soft limit and 3 GB hard limit per user at /home
== BIOS VM Access ==

To get access, please send a request to Leon Mei (`h.mei[at]`) or Martijn Vermaat (`m.vermaat.hg[at]`) with your '''public''' SSH key ([wiki:FgSshKey instructions]).

For remote access from a Linux or Mac OSX terminal, connect with `ssh` using your username and the VM address. Your private SSH key is expected in the standard location `~/.ssh/id_rsa` (alternatively, specify it with `-i`).
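
As an illustration of the `ssh` invocation described above, the following minimal sketch prints a command line without running it. The function name `bios_vm_ssh_command`, the username and the host `vm.example.org` are all hypothetical placeholders; use the username and VM address you received with your account.

```shell
# Print (do not run) an ssh command line for the BIOS VM.
# $1: your username on the VM
# $2: the VM address
# $3: optional path to your private key (defaults to the standard location)
bios_vm_ssh_command() {
    user=$1
    host=$2
    key=${3:-$HOME/.ssh/id_rsa}
    printf 'ssh -i %s %s@%s\n' "$key" "$user" "$host"
}

# Hypothetical values, for illustration only.
bios_vm_ssh_command "your_username" "vm.example.org"
```

Copy the printed line into your terminal once the placeholders have been replaced with real values.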
For terminal access from Windows, use the [ PuTTY] tool and configure the VM IP address, your username, and your private SSH key.

For graphical access from Windows, use [ MobaXterm], as advised by SURF in the HPC cloud documentation (

Alternatively, for graphical access from Windows or Mac OSX, use [ X2Go] and configure the VM IP address, your username, your private SSH key and the session type/desktop manager (Gnome). Or use a remote desktop connection client (for Mac:

[wiki:FgSshKey Step by step instructions for using a public/private SSH pair for access to the VM]

[wiki:FgConnectTroubleshooting Connection troubleshooting page (under construction)]
== RStudio server ==
An RStudio server is running on the BIOS VM:

You can log in with the same username and password as for your SSH session.
== UCSC Genome Browser tracks ==

RP3 data can be viewed in the UCSC Genome Browser by using the [wiki:FgStorageInTheCloud#VirDir WWW export directory on the virdir] and selecting the exported URLs as custom tracks.

Please note that no privacy-sensitive data should be stored here, as it will be world-readable.

Example sessions:

 * [ Coverage tracks for 10 random samples, meta exon track, and the PolyA binning track]
= Grid SRM access from the BIOS VM =
If you need access to raw BIOS data (e.g., RNAseq, methylation), you will have to download it from the Grid SRM storage to the BIOS VM. The instructions below explain how.

'''Note:''' Requesting access to the SRM takes quite some time, and the SRM itself is not easy to learn. If someone at your institute already has access and experience using the SRM, it might be faster and easier to ask that person for help.
== Grid SRM Access ==

Before proceeding, follow the steps in [wiki:FgObtainingGridAccess Obtaining access to grid infrastructure].

=== Prepare a proxy ===

To download data from the Grid SRM to the BIOS VM, you need a proxy and your keys (the grid certificate). You need access to a UI to start a proxy. For example, or another site.
* On the UI there should be a `.globus` folder in your home directory that contains the grid certificate. Copy the `.globus` directory from your local home folder to the UI home folder and make sure the permissions are set correctly (log into a UI and issue the commands `chmod 644 usercert.pem` and `chmod 400 userkey.pem`). These files don't need to be renewed.
  {{{
  .globus/:
  total 8
  -rw-r--r-- 1 mgalen mgalen 1769 Aug 14 16:55 usercert.pem
  -r-------- 1 mgalen mgalen 1751 Aug 14 16:55 userkey.pem
  }}}
* Start the proxy by logging into a UI and running `startGridSession`:
  {{{
  startGridSession
  }}}
* This creates your own x509 proxy file in the `/tmp` directory on the UI, which looks something like this:
  {{{
  -rw------- 1 mgalen   6.1K Aug 27 09:55 x509up_u40208
  }}}
* You may have to change the permissions of this file using `chmod 644 x509up_u40208`. Copy this file to a place on the BIOS VM for later use (for example, also to `/tmp`). Make sure you copy the x509 file associated with your username. The proxy is valid for 7 days, so you need to renew it weekly.

* Now log in to the BIOS VM and use this command to fetch the file you just created on the UI to the BIOS VM.
  {{{
  scp /tmp    (replace ''  with the address of your UI)
  }}}
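
The permission and renewal steps above can be sketched as two small shell helpers. This is a minimal sketch, not part of the official procedure: the function names `secure_globus_dir` and `proxy_is_fresh` are hypothetical, and using the file's modification time as a stand-in for the 7-day validity is an assumption (the proxy itself records its real expiry time).

```shell
# Apply the permissions described above to the grid certificate files.
# $1: path to the .globus directory (for example "$HOME/.globus")
secure_globus_dir() {
    # Certificate: readable by everyone, writable by the owner.
    chmod 644 "$1/usercert.pem" || return 1
    # Private key: readable by the owner only.
    chmod 400 "$1/userkey.pem"
}

# Succeed if the proxy file exists and was modified less than 7 days ago.
# Assumption: the modification time approximates when the proxy was created.
# $1: path to the proxy file (for example /tmp/x509up_u40208)
proxy_is_fresh() {
    [ -n "$(find "$1" -mtime -7 2>/dev/null)" ]
}
```

For example, `proxy_is_fresh /tmp/x509up_u40208 || echo "renew the proxy on the UI"` prints a reminder when the file is missing or older than a week.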
=== Downloading files ===

* Once these files are in place, you can copy data from the Grid SRM to the BIOS VM using curl. For example, log in to the VM and issue the following command, where `-E` points to the path where you put the proxy file. Don't forget to redirect the output from curl to a local filename.
  {{{
  mgalen@cloud-KVM:~$ curl --CApath /etc/grid-security/certificates/ -E /tmp/x509up_u40208 -L >README
  }}}
* You can also upload data to the Grid SRM from the BIOS VM using curl. To upload a local file `test.txt`, use the `--upload-file` (or `-T`) argument:
  {{{
  mgalen@cloud-KVM:~$ curl --CApath /etc/grid-security/certificates/ -E /tmp/x509up_u40208 -L --upload-file test.txt
  }}}
  (In practice, use a more appropriate directory on the Grid SRM than the project root.)
  Instead of specifying the full target name including the filename, you can also specify just the target directory, ending in a `/`. Curl will then use your local filename on the Grid SRM as well:
  {{{
  mgalen@cloud-KVM:~$ curl --CApath /etc/grid-security/certificates/ -E /tmp/x509up_u40208 -L --upload-file test.txt
  }}}
* If you want to delete a file from the Grid SRM, use `-X DELETE` (use this with caution):
  {{{
  mgalen@cloud-KVM:~$ curl --CApath /etc/grid-security/certificates/ -E /tmp/x509up_u40208 -L -X DELETE
  }}}
* To check whether a file exists without downloading it, use the `-I` option (a `200 OK` response means the file exists, `404 Not Found` means it does not):
  {{{
  mgalen@cloud-KVM:~$ curl --CApath /etc/grid-security/certificates/ -E /tmp/x509up_u40208 -L -I
  HTTP/1.1 200 OK
  Date: Tue, 28 Jan 2014 15:26:17 GMT
  ETag: 0000EADAD57CE41D47F5A8F069A7C24F8003_-1773128220
  Last-Modified: Wed, 15 Jan 2014 14:31:15 GMT
  Content-Length: 154
  Server: Jetty(7.3.1.v20110307)
  }}}
  {{{
  mgalen@cloud-KVM:~$ curl --CApath /etc/grid-security/certificates/ -E /tmp/x509up_u40208 -L -I
  HTTP/1.1 404 Not Found
  Content-Type: text/html
  Transfer-Encoding: chunked
  Server: Jetty(7.3.1.v20110307)
  }}}
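
When the existence check above is used in a script, the status line has to be read from the header output. The following is a minimal sketch under stated assumptions: the function name `srm_file_state` is hypothetical, and it takes the last `HTTP/` status line as decisive because curl prints one set of headers per hop when `-L` follows a redirect.

```shell
# Read HTTP response headers on stdin and print one word:
#   exists  - the final status code is 200
#   missing - the final status code is 404
#   unknown - anything else, or no status line at all
srm_file_state() {
    awk '
        # Remember the code of every status line; the last one wins.
        /^HTTP\// { code = $2 }
        END {
            if (code == "200") print "exists"
            else if (code == "404") print "missing"
            else print "unknown"
        }
    '
}
```

A possible use is to pipe the output of the `curl ... -L -I` command shown above into `srm_file_state` and branch on the printed word.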
== Data Storage ==

Read [wiki:FgStorageInTheCloud Storage in the cloud]