In the research lab where I work, we've been developing a data processing pipeline for several years. This includes not only a program but also a new file format, based on HDF5, for a specific type of data. While HDF5 looked compelling on paper, we ran into many issues with it. Recently, despite the high cost of switching, we decided to abandon this format in our software.
In this post, I'll describe what HDF5 is and the issues that made us move away from it.
There are many data visualization tools out there. Yet I believe we still lack a robust, scalable, and cross-platform visualization toolkit that can handle today's massive datasets.
Most existing tools target simple plots with a few hundred or a few thousand points: bar plots, scatter plots, histograms, and the like. Typically, these figures represent aggregated statistical quantities. Maps are also particularly popular, and there are now excellent open-source tools for them.
Perhaps contrary to common belief, this is not the end of the story. Academia and industry have far more complex visualization needs, and I've always been dissatisfied with the tools at our disposal.
My latest book was released a few weeks ago. It has been one of the most challenging projects I've ever undertaken, and not necessarily for the reasons I originally expected. Here is the story of the fifteen months I spent writing the IPython cookbook.