Sep 22, 2013

How not to lose a research project!

One thing I learned early on is that if you are not organized about your research projects, they can one day just 'disappear'. By disappear I mean this: you worked on a project in, say, 2008, and in 2013 you really need the dataset and code from it, but you no longer know where they are after several rounds of reformatting, moving to a new desktop, and whatever else happened in between. The key to avoiding this hot mess is to keep things organized from the very start of a project, meaning from the moment you find some related papers and strike upon a brilliant new research idea.

I have gone through several rounds of trial and error in keeping my projects organized electronically, and perhaps you can get some ideas from the directory structure that I use.
  • First, I have a folder called projects that holds all files related to ALL my research projects.
  • Then, within the projects folder, I create a subfolder for each of my projects. Note that I give each project a name for the sake of clarity. Please don't name your folders "project1", "project2" or "current" and "previous", etc. Give proper names like <AutoSum> or <OpinionSummarizerWithLDA2008>. Something highly identifying. You can give the project an initial name and change it to a better one once you have a clearer idea of the project itself.
  • Within each project folder, I separate the contents as follows:
    • source/ - if you have any source code, that goes here
    • reading/ - all your literature survey material, useful articles, etc. go here
    • experiments/ - all files pertaining to the experiments and results go here
      • input/ - all my input data
      • output/ - the output of my experiments
      • results/ - the Excel spreadsheets or other documents created for analysis
    • paper-writeup/ - all files relating to paper writing for conference or journal submission reside here.
      • images/ - all the original images (e.g. PPT files with flow charts) that were created for the paper.
      • tables/ - all files related to the tables in my paper. I tend to create tables in LyX and then paste the source into my LaTeX document.
    • final-submission/ - all the files that were submitted as part of the camera-ready copy
    • distribute/ - all the files that were distributed for public use. For example, you may choose to release your code, your dataset and other documents related to the project. You want to make sure you keep the exact same version that is available to the public. I can't emphasize enough how important this is.
      • dataset/
      • source/
The above format works quite well for me. If I need the dataset released in 2008, this structure makes it easy to get to: I just go to project_name -> distribute -> dataset. And when I need to do backups or move to a new machine, I only have to copy the projects folder. That is usually the first folder I send for backup.
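
If you would rather not create these folders by hand every time, the layout above can be scripted. Here is a minimal sketch in Python, assuming the folder names from my list; the function name create_project and the SUBFOLDERS constant are just names I made up for illustration, so adjust them to taste.

```python
from pathlib import Path

# Folder layout taken from the list above, relative to one project folder.
# SUBFOLDERS and create_project are illustrative names, not part of any tool.
SUBFOLDERS = [
    "source",
    "reading",
    "experiments/input",
    "experiments/output",
    "experiments/results",
    "paper-writeup/images",
    "paper-writeup/tables",
    "final-submission",
    "distribute/dataset",
    "distribute/source",
]


def create_project(projects_root, project_name):
    """Create the folder skeleton for one project and return its path."""
    project_dir = Path(projects_root) / project_name
    for sub in SUBFOLDERS:
        # parents=True creates intermediate folders such as experiments/;
        # exist_ok=True makes it safe to re-run on an existing project.
        (project_dir / sub).mkdir(parents=True, exist_ok=True)
    return project_dir


# Example call (the location of the projects folder is an assumption):
# create_project(Path.home() / "projects", "OpinionSummarizerWithLDA2008")
```

Because existing folders are left alone, running it again on an old project only fills in whatever subfolders are missing and never touches the files inside.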
