“Do you have other flow plots showing mitotic activation using phosphatase inhibitors?” “Can we get these charts with error bars?” “Can I get a spread of all of Stephanie’s images that match your flow plots?” Sound familiar? I ALWAYS got questions like those from my PI during lab meetings, through email, random drop-ins in the lab and even text messages. Of course, the frequency of these demands increased as we got closer to publication. My time spent running experiments decreased precipitously in exchange for diving into the Burnham server, retrieving raw data and curating it for presentation and publication. How many times have you had to search for past data, figures or protocols that belonged to you or someone else?
My anecdotes and yours aside, recent research indicates that engineers and so-called “knowledge workers” spend at least 20 to 30% of their time at work searching for information.*
Part of the problem lies with the nature of data itself. Data comes in many formats, compressed or uncompressed, raw, processed or curated, from lists of text and numbers in neatly separated columns and rows to complicated matrices embedding audio, video or “stack” information. Each data type, of course, requires a separate program to read the specific file type. Until you can fully “see” the data displayed, the filenames and extensions give you only a slight glimpse of the much larger picture.
However, the largest part of the data management problem is the separation of data from the context in which it was generated. Ideally, every aspect of the data (where it came from, who generated it and when, what format it is in, what its purpose is, and whether it is immediately useful or ought to be archived) should be annotated and easily apparent. These metadata are what make data meaningful in the first place, and are absolutely required prior to storing data if you have any hope of retrieving it in the future.
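To make the idea concrete, here is a minimal Python sketch of what such an annotation might look like if it were stored alongside each file. It is only an illustration under stated assumptions: the `DataRecord` class, its field names, the `find` helper and the sample values are all hypothetical, and do not describe any particular LIMS, ELN or product.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class DataRecord:
    """Hypothetical metadata record; every field name here is an assumption."""
    filename: str
    source: str            # where the data came from (instrument, core, collaborator)
    generated_by: str      # who generated it
    generated_on: date     # when it was generated
    file_format: str       # what format it is in
    purpose: str           # what it is for
    archive: bool = False  # True if it ought to be archived rather than kept at hand

    def to_json(self) -> str:
        """Serialize the record, e.g. to save as a sidecar file next to the data."""
        record = asdict(self)
        record["generated_on"] = self.generated_on.isoformat()
        return json.dumps(record, indent=2)


def find(records, **criteria):
    """Return the records whose fields exactly match every given criterion."""
    return [
        r for r in records
        if all(getattr(r, key) == value for key, value in criteria.items())
    ]


# Made-up sample records, for illustration only.
records = [
    DataRecord("plate1.fcs", "flow cytometry core", "Researcher A",
               date(2020, 1, 15), "FCS", "mitotic activation assay"),
    DataRecord("field3.tif", "imaging core", "Researcher B",
               date(2020, 1, 16), "TIFF", "mitotic activation assay", archive=True),
]

flow_files = find(records, file_format="FCS")
to_archive = find(records, archive=True)
```

Even a record this small answers the "who, when, what and why" questions without opening the file, which is the point of annotating before storing.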
This is not to say the world is devoid of tools to help alleviate these problems. On the contrary, several laboratory information management systems (LIMS) and electronic laboratory notebooks (ELNs) have been developed. A quick search reveals a myriad of LIMS and ELNs from nearly as many publishers. Many come with fancy graphical interfaces, easy-to-use search features and structured filesystems to reduce search time. Surely there must be one that solves our dilemma? Ironically, the sheer number of available “solutions” is an indication of how difficult and persistent this problem is.
In short, and unfortunately, LIMS are where “data goes to die”. Unless the data are 1) indexed properly for every data type produced, 2) easily discoverable and 3) able to convey where they “fit” in your current workflow, the LIMS will not perform better than your standard N drive. The number of indices becomes a battle of attrition too. You will end up with more indices than actual data by the time the data are “fully annotated”, and you simply spend the time in the reverse direction: annotating data to make them 100% discoverable will cost as much time as, if not more than, what you would spend attempting to discover them.
ELNs don’t fare much better. Without a way to amalgamate data sets coming from other people in the lab or to visualize a process, the notebook archive will collect just as much electronic dust as that 1980s Peechee.
In addition, neither the LIMS nor the ELN has a system to tether the communication stream surrounding a given experiment. The data always seem to be divorced from any communication about their generation, and a second or third platform must be searched to find pertinent information about them. We have all experienced searching those long email chains by file type, sender, or some oddly specific keyword to recall any nuance about the data. Compile that into the database!
To solve this enigma, we decided to meld a communication system (think Twitter or Slack, but for your data) with a cloud-based data repository. First, by combining these two components, we tether communication about the data to the data itself. Second, we implemented a process dashboard that literally spells out each step in a given experiment, revealing an easy-to-understand process. Finally, we made it so that each step in a given experiment could be driven by a different person. Different responsibilities, different data, different protocols, same centralized location. Suddenly the two-core, five-person study that generates seven different file types becomes a manageable, visually intuitive workflow. The solution we’ve introduced is called FlowJo Envoy.
*Referenced here: http://utrconf.com/top-3-reasons-why-we-spend-so-much-time-searching-for-information/ http://www.ejitime.com/materials/IDC%20on%20The%20High%20Cost%20Of%20Not%20Finding%20Information.pdf