Friday 14 October 2011

What should a routine system for climate data records look like?

Here are my personal views on what form the system built in the next phase of the ESA CCI should take in order to provide routine access to high-quality climate data records. This vision is built around what is required for sea surface temperature (SST).


If you aren't interested in the text, at least look at the picture (below).



1. The Climate Change Initiative (CCI) should contribute a long-term scientific legacy of climate-quality satellite observations with uncertainty estimates. CCI climate data records (CDRs) should be as precious to Earth System Science in 100 years' time as the fundamental meteorological and oceanographic observations made in the 19th century are to us now. Practically, this implies a transition from the present Phase 1 activities (user consultation, algorithm selection, system definition and prototyping, off-line production) to routine CDR production and problem-solving. Routine production will support provision of climate services, the scientific/societal/policy demand for which will likely be long-term.



2. ESA have wisely required scientists and system engineers to work together in the prototyping and system specification in Phase 1. This close co-operation must continue through Phase 2 and thereafter: transition is not merely a software implementation task, and should include embedded science teams for problem-solving in the CDRs.


3. The maturity and functionality of prototype software from Phase 1 will vary between ECVs. For SST and several other variables with functional prototypes, Phase 2 should focus on commissioning existing prototypes in routine, reliable infrastructure and using this for initial operations (including, I'd suggest, at least one improvement cycle in Phase 2; see figure above). This may involve code revision/refactoring, although complete rewriting/re-engineering of code from DPMs should not be necessary (and would likely be scientifically counter-productive).

4. During Phase 2 (in parallel with routine production), we should demonstrate the process of going through at least one cycle of feedback and improvement of the CDR. A concept for this is presented in the figure above. Including this aspect in Phase 2 will prove the approach for the later phases of ‘continuous development and operations’ that should continue the work of the CCI long term.

5. Therefore, ‘routine CDR generation’ must be viewed broadly, as in the figure, to include
·      routine, reliable provision of climate-quality observations via robust implementation of excellent scientific methods; and
·      the agile scientific work-flow that delivers upgraded CDRs in response to new requirements and problems.

6. Continuous development is essential in parallel to routine generation because
·      we continually learn how better to reduce uncertainties and improve stability of CDRs
·      experiences of users will always uncover new problems in these huge, complex data sets
·      inputs (level0 and/or level1) will periodically be reprocessed and improved by agencies
·      new sensors will require seamless integration into the CDR, while preserving stability and other quality aspects
·      new user requirements will emerge, demanding new, value-added or better quality CDRs.

7. Routine generation must include continuous availability of a full, consistently reprocessed, continuously updated CDR. In the figure, this is the purpose of the overlapping periods between CDR versions. The new version running in parallel is what a meteorological agency would designate "pre-operational". Pre-operational provision gives users time to move across to the next CDR version.
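
To make the overlap idea concrete, here is a minimal sketch in Python of how the status of each CDR version on a given date might be worked out. Everything in it is an assumption for illustration only: the class name `CdrVersion`, the function `available`, the version labels and all the dates are hypothetical, and the rule that the old version is retired on the day the new one becomes operational is just one possible policy, not something defined by the CCI.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CdrVersion:
    """One version of a CDR and its (hypothetical) lifecycle dates."""
    name: str
    released: date                  # first made available, as pre-operational
    operational_from: date          # date it becomes the operational version
    retired: Optional[date] = None  # None means it is still being provided

    def status(self, on: date) -> Optional[str]:
        """Return the status on a given date, or None if not available."""
        if on < self.released:
            return None
        if self.retired is not None and on >= self.retired:
            return None
        if on < self.operational_from:
            return "pre-operational"
        return "operational"


def available(versions: List[CdrVersion], on: date) -> Dict[str, str]:
    """Map each version available on a date to its status on that date."""
    result = {}
    for version in versions:
        status = version.status(on)
        if status is not None:
            result[version.name] = status
    return result


# Hypothetical timeline: v2 runs in parallel with v1 for six months
# before taking over as the operational version.
versions = [
    CdrVersion("v1", date(2014, 1, 1), date(2014, 1, 1), date(2016, 1, 1)),
    CdrVersion("v2", date(2015, 7, 1), date(2016, 1, 1)),
]

during_overlap = available(versions, date(2015, 10, 1))
after_switch = available(versions, date(2016, 6, 1))
print(during_overlap)
print(after_switch)
```

During the overlap both versions are returned, one operational and one pre-operational, which is the window in which users can compare them and migrate. After the switch only the newer version remains.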

8. The improvement cycle (loop from problem identification to reprocessing) requires agile implementation of scientific innovations (e.g., new algorithms) within a robust and traceable system environment. The full effect on the CDR is apparent only on reprocessing the full archive. The system must therefore be capable of maintaining routine delivery of ‘new’ data while simultaneously undertaking rapid (preferably multiple) reprocessing as part of this improvement cycle.

9. Distinctions between near-real time operations and routine CDR generation are real and should be recognised. One distinction is the need for the CDR improvement cycle as part of the operations, discussed above. Another is that stability and consistency take on greater importance for a CDR. Lastly, for a CDR, short-delay delivery with a lag of days is generally acceptable. This time delay can, and should, be exploited to maximize the climate-quality aspects of the CDRs, beyond what is possible with near-real time provision. (For example, processing for consistency across a constellation, or for stability in time, can take advantage of the short time lag in delayed-mode delivery.)