Tom's view of what folks are doing or should be doing in the group.

PostDocs
***********************************

SUFIAN
*******
should be involved in all aspects of the lab: research, funding, publications,
conferences, management etc...

Specific plans/projects, August 2014
a) is working on MD simulations of nucleosome A using Human and Mouse variants of the
   Mammary Tumor Virus system
   i) should provide a written background summary (see e.g. Flaus 2004) on sin mutants and mobility, MMTV
   ii) provide the necessary literature justifying the choice of MMTV and HMTV sequences, and all details
       see also the work of Anna Drago from Summer 2014
   iii) use Bishop's scripts for making, minimizing, equilibrating, simulating ANY nucleosome
        in /simulations/Examples
b) work with SASTRA on sin mutants
   i) write up the methods used in existing sin mutant sims, including proper citations etc...
      this is the manuscript in progress
   ii) summarize the analysis that was done by Suma
       this should include collecting all of the scripts she used AND
       working w/ Venkat and Pardhu to devise a method to "automate" the running of these scripts
       and publishing data in ibiomes
c) learn NAMD and VMD ... see tutorials at www.ks.uiuc.edu
   these work hand in hand with item a)-iii) and c)-ii)
d) should be involved in the proposal and allocation process
   i) allocation on TACC's stampede via XSEDE
   ii) allocation on LONI's new machine SuperMic via short LONI request
       friendly user period
   iii) NIH grants: deadlines Oct 5 and Feb 5
        Oct 5: resubmit of ICM-ToolKit
               new submit of MD studies of Sin Mutants
        Feb 5: something w/ Kim & Ge at LSU-CCT
        other opportunities for self funding
e) scheduling of teas/talks/tutorial and LASIGMA meetings/reports

Graduates
***********************************
Each should have their own well defined project and know how it integrates with the
overall laboratory effort.
Priya (collab w/ Jianghu and Joohyun at LSU CCT) graduating Feb/March 2015
******
working to develop a web interface/gateway ICM++GB that allows for interactive chromatin modeling
w/in the genome browser experience
  i) learn dalliance b/c this is what Jianghua will use
  ii) learn JSON as this will be the communication layer
  iii) learn what SWIG does
  iv) make sure that the linear model is generalizable s.t. it can take positioning inputs
      from self, icm, dalliance, or any arbitrary other source via JSON

Gyanadeep: (ICM development) graduating in May 2015
******
ICM kernel is merely an implementation of equations 9,10,11 AND 12,13,14
from /home/tmbshare/refs/Hass1995.pdf

tools needed
i) a data structure that stores: (call it a BP... basepair, so we have BP helix parms and BP coords)
   a) sequence information
   b) the 12 DNA helical parameters (i.e. 12 floats for each bp)
   c) a set of coordinates representing the directors (CA, H1, H2, H3) (i.e. 12 floats for each bp)
   the data structure should use
   i) dynamic memory allocation for storage of XYZ and HP data
   ii) Armadillo's vector/matrix structures should be used AND so memory alloc should be through
       Armadillo's library routines
   expectation of size
   ideally will be able to handle 1,000,000 or possibly 10,000,000 bp in one par descriptor:
     so (12 HP + 12 XYZ) floats/bp * 4 bytes/float * 10,000,000 bp is less than 1 GB of RAM for this (960 MB)
   for processing trajectory data I may want to read in 10,000 to 100,000 par files
     that's (12 HP + 12 XYZ) * 4 bytes * 147 bp * 100,000 frames, which is still only 1,411 MB or less than 1.5 GB
ii) supporting tools
   a) par2xyz   equations 9,10,11
   b) xyz2par   equations 12,13,14
   c) readpar   populates the structure above given a file name
      the format of the par file is Olson's ".par" data set..
      essentially 3 header lines then NBP lines w/ "X-Y" for sequence, then %8.3f format for each HP datum
      EACH LINE IS 3 characters, a space, then 12 * "%8.3f"
   d) readseq   reads a string of A,C,G,T (possibly other) and fills in the Sequence part of the data structure
   e) writepar  inverse of readpar... given a filename and structure... write it to file
   f) writexyz  write xyz data in VMD recognized XYZ format
      format is 2 header lines then column data w/ name and xyz data
        4000
        COMMENT TcB par2xyz
        CA 0.00000 0.00000 0.00000
        H1 1.00000 0.00000 0.00000
        H2 0.00000 1.00000 0.00000
        H3 0.00000 0.00000 1.00000
        CA -0.01227 0.23463 3.22933
        .
        .
        .
   g) occupy or fillpar: fill the par data structure with helical parameter values from different
      sets of helical parameters
      Some thought needs to go into this, but the idea is: given multiple "par" descriptors, how do we
      merge/unite them into one set
      e.g. a length of DNA may be described as
        0 0 0 0 0 .... 1 1 1 1 . 0 0 0 ... 3 3 3 3 3 3 . 0 0
      where the 0's represent a fill using sequence specific parameter values from par "0",
      1's represent a fill using parameter values from par "1",
      and 3's represent etc...etc..
      this "occupancy" array is merely a pointer/linked list to other "BP" structures
   NOTE: fillpar and readseq will be required to populate the data structure,
         but readpar populates the data structure in full

Venkat: (collab w/ Joohyun and TACC on workflows and local guru) graduating October 2014
******
Responsible for helping glue high performance simulations together w/ the workflows
given a set of inputs, should be able to run NAMD jobs wherever and verify that the simulations "worked"
and ran efficiently and successfully...
details of the biology/system not necessarily important
how to link up w/ analysis is important, but details of analysis not important.
  1) manyjob/bigjob point man... includes ssh, simulation management, file naming conventions
     for inputs and outputs from NAMD running on LONI, TACC, local, Cerberus
  2) IDE/ECLIPSE/SVN/DOXYGEN coordinator/point man
  3) SWIG...
     for scripting anything in C++
  4) NAMD on stampede, loni and cerberus
  5) automation and verification of MD simulations and restarting when they fail!

Pardhu (collab w/ Cheatham Lab Utah and Julien (former TEC Lab grad)) graduating in Feb/March 2015
******
this is listed more or less in priority order
  i) get all of our existing sims AND existing analysis organized and published in ibiomes lite
     (note Bishop already put 90% of the files there. this should run and be stable and independent of any other work)
     includes the following studies:
       ACGT:     16 sims * 16ns(?) = 256 ns total
       Yeast:    336 sims * 20ns(?) = 6720 ns total
       Biologic: 4-6 sims * ~100ns = ?? ns
       NFR:      5*21=105 sims * ~20ns = 2100 ns total
  ii) as Sufian identifies analysis done by Suma, this should be applied to all files in our ibiomes lite AUTOMATICALLY
  iii) get the iRODS version of ibiomes running
  iv) begin to develop iRODS rules that can achieve ii) above

Undergraduates
***********************************

James Liman (collab w/ Patrick Shipman at CSU Fort Collins) graduating May 2015
******
a) working on mathematical and helical parameter analysis of nucleosome superhelix.
   This is a redo of Bishop's Superhelix Geometry 2008(?) manuscript /home/tmbshare/refs/Bish2008.pdf
   Immediate issues
   i) format/review and eval Patrick Shipman's mathematical analysis of helices
   ii) redo the Fourier KO-KI for all nucleosomes
       done: a) collect all nucleosome pdbs from RCSB
       done: b) apply KO and KI tools
       ???   c) assemble data: RMSD of KO vs wavenumber and KI vs wavenumber
                use Bish2008.pdf as guide
   iii) goal: extend Bish2008.pdf to consider
       a) intra-basepair parameters... e.g. propeller, opening
       b) energy analysis using stiffnesses; the stiffnesses are:
          ideal DNA values for stiffnesses (see ICM web values)
          sequence specific values for stiffness and conformation (see ICM web values)
          conformations to consider are the TH/SH/Ro-Sl-Tw/ and KO-KI helices
       c) should see if can get values from recent ABC effort (i.e.
          updates to the ICM web values)
          see /home/tmbshare/refs/ABC.2014.pdf and /home/tmbshare/refs/ABC.2014.supp.pdf
b) little fe demo user
c) lammps and energy minimizer for the web, using models developed by Korolev and Nars
   see /home/tmbshare/refs/Korolev2012.pdf and /home/tmbshare//refs/Korolev2010.pdf
   using Jiang & Pugh's positioning data and the MMTV as our target play set
   to "fix"
   Bishop is working on a manuscript for this
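
Sketch for the ICM supporting tools (Gyanadeep's list above): the par and XYZ file layouts and the size estimates are concrete enough to try out quickly. This is a minimal sketch, not the kernel itself. It is in Python only for brevity (the notes call for C++ with Armadillo), the function names other than readpar/writepar/writexyz are hypothetical, and the contents of the 3 par header lines are assumed to be opaque text that is kept verbatim.

```python
import io

N_HP = 12            # helical parameters per base pair (from the notes)
N_XYZ = 12           # director coordinates per base pair: CA, H1, H2, H3 times x, y, z
BYTES_PER_FLOAT = 4  # single precision, as assumed in the size estimates


def storage_bytes(n_bp, n_frames=1):
    """Bytes needed to hold HP and XYZ data as 4-byte floats."""
    return (N_HP + N_XYZ) * BYTES_PER_FLOAT * n_bp * n_frames


def format_par_line(step, params):
    """One data line: 3-character label, a space, then 12 fields of %8.3f."""
    if len(step) != 3 or len(params) != N_HP:
        raise ValueError("expected a 3-character label and 12 parameters")
    return step + " " + "".join("%8.3f" % p for p in params)


def parse_par_line(line):
    """Inverse of format_par_line, reading fixed-width 8-character columns."""
    step = line[:3]
    fields = [line[4 + 8 * i: 12 + 8 * i] for i in range(N_HP)]
    return step, [float(f) for f in fields]


def writepar(handle, headers, records):
    """Write 3 header lines followed by one line per base pair."""
    if len(headers) != 3:
        raise ValueError("a par file has exactly 3 header lines")
    for header in headers:
        handle.write(header + "\n")
    for step, params in records:
        handle.write(format_par_line(step, params) + "\n")


def readpar(handle):
    """Return (headers, records); header text is assumed opaque and kept as-is."""
    lines = handle.read().splitlines()
    headers, body = lines[:3], lines[3:]
    return headers, [parse_par_line(line) for line in body if line.strip()]


def writexyz(handle, comment, atoms):
    """XYZ layout from the notes: atom count, comment line, then name x y z."""
    handle.write("%d\n%s\n" % (len(atoms), comment))
    for name, (x, y, z) in atoms:
        handle.write("%s %.5f %.5f %.5f\n" % (name, x, y, z))
```

With these definitions, storage_bytes(10_000_000) gives 960,000,000 bytes and storage_bytes(147, 100_000) gives 1,411,200,000 bytes, matching the 960 MB and 1,411 MB figures in the notes. Fixed-width parsing assumes every value fits in 8 characters; a value that overflows %8.3f would need a different reader.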