It was the Best of Times, It was the Worst of Times (for Data Wonks)

As we hit mid-summer, I begin to look forward to the things I love about the fall: jackets and sweaters, my kids going back to school and four months of being up to my ears in utility data. This is the time of year that I plan for one of my most challenging annual tasks - updating MEEA’s tracking data to include the latest round of utility annual reporting on energy efficiency spending and savings. It's a labor of love that occupies much of my time September through January so we can release new estimates at our annual Midwest Energy Solutions Conference in February.

Why, you ask, is this process so challenging and time consuming? It used to be due to difficulty finding dockets, but these days, the biggest headaches have to do with data quality.

What Is “Data Quality”?

I should note that by ‘data quality,’ I do not mean whether the savings, spending and other metrics reported are real – utilities are required to go through an evaluation, measurement and verification (EM&V) process to determine those values. When I’m talking about data quality here, I’m talking about the completeness of the data and the ability to obtain meaningful and comparable data across numerous utilities and states – the breadth and depth of the publicly available data that we need for tracking and analysis.

The sheer variety of approaches to reporting is astounding. First, there is the difference in regulatory regimes from state to state and what the commission or legislature has required. Then there is the utility layer. Some utilities file concise annual reports with all of the summary information up front. In other cases, you can only find individual program reports and nothing has been added up, or the data is buried deep in 700 pages of testimony and exhibits. One utility even reports savings in one docket and spending in another. Some have spreadsheets, others just PDFs, and, in some cases, you still find printed documents that have been re-scanned for electronic posting. And don't even get me started on how data that are required to be shared in one state are proprietary trade secrets - protected by redaction and non-disclosure agreements - in a neighboring state. Sometimes there are even differences in releasing or redacting between utilities within the same state.

Energy efficiency data is the Wild West. Even with the help of tools like third-party databases, it can be a daunting task to find the specific data needed for broad statewide, regional or national analyses of energy efficiency programs and portfolios.

What's the Big Deal? 

This is more than a run-of-the-mill frustration for data analysts. Data quality issues affect our teams, organizations and, ultimately, the impact we have. The better the data quality, the easier it is for data professionals to do our jobs of collecting and analyzing it. In turn, that analysis enables MEEA and other EE stakeholders to educate policymakers and the public on the benefits of utility EE investments. An effective, data-driven approach to effecting change requires good, consistent data on the front end.

The Fix is Data Standardization

Some states in the Midwest have made good efforts at standardized summary reporting. Indiana required annual scorecard reports from every utility in the same docket on the same day under its energy efficiency standard, but when the state legislature repealed that standard in 2014, the standardized, centralized reporting was unfortunately lost with it. Over the past several years, Illinois has created annual summary reporting templates through its Stakeholder Advisory Group (SAG) statewide collaborative. Iowa and Minnesota tend to have very open data environments, and though there is some variation between utilities in presentation, there is usually no trouble finding deep data. The bad examples still outweigh the good ones, though.

My favorite thinking on fixing the energy efficiency data problem comes from the Electricity Markets and Policy Group at Berkeley Lab. In their 2014 analysis of the cost of energy efficiency programs, the authors included a discussion of the data problems they found and proposed a framework for what level of data enables what type of analysis (see pp. 54-57 of the report).

Roughly, what they suggest is a three-tiered hierarchy of energy efficiency data: baseline reporting, reporting for intra-state and regional assessment and reporting for regional and national assessment. Each additional layer of data enables deeper and more comprehensive analysis. The data requirements increase with each successive level of the hierarchy:

  • baseline reporting includes projected and actual spending and savings, cost-effectiveness metrics, and program descriptions;
  • the state-regional level adds cost breakdowns, itemized costs and benefits, net-to-gross ratios and participation data; and
  • the regional-national level includes methodologies and assumptions, conversion factors, and incentive level data.
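For the data wonks who think better in code, here is a minimal Python sketch of how that hierarchy could be checked against a single utility filing. It is only an illustration under my own assumptions: the field names, the `Tier` enum and the `highest_tier` and `missing_fields` helpers are hypothetical paraphrases of the list above, not the report's schema or anything MEEA actually runs.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Reporting tiers, ordered from least to most demanding."""
    NONE = 0
    BASELINE = 1
    STATE_REGIONAL = 2
    REGIONAL_NATIONAL = 3


# Fields each tier ADDS on top of the tier below it.
# Hypothetical names paraphrasing the list above, not an official schema.
TIER_FIELDS = {
    Tier.BASELINE: {
        "projected_spending", "actual_spending",
        "projected_savings", "actual_savings",
        "cost_effectiveness", "program_description",
    },
    Tier.STATE_REGIONAL: {
        "cost_breakdown", "itemized_costs_benefits",
        "net_to_gross_ratio", "participation",
    },
    Tier.REGIONAL_NATIONAL: {
        "methodology_assumptions", "conversion_factors", "incentive_levels",
    },
}

ORDERED_TIERS = (Tier.BASELINE, Tier.STATE_REGIONAL, Tier.REGIONAL_NATIONAL)


def available_fields(report: dict) -> set:
    """Treat a field as available only if it is present and not None
    (None stands in here for redacted or unreported values)."""
    return {name for name, value in report.items() if value is not None}


def highest_tier(report: dict) -> Tier:
    """Return the highest tier for which this tier and every lower tier
    are fully populated."""
    have = available_fields(report)
    reached = Tier.NONE
    for tier in ORDERED_TIERS:
        if not TIER_FIELDS[tier] <= have:
            break  # a gap at this level blocks all higher levels
        reached = tier
    return reached


def missing_fields(report: dict, target: Tier) -> list:
    """List the fields still needed to reach the target tier."""
    have = available_fields(report)
    needed = set()
    for tier in ORDERED_TIERS:
        if tier <= target:
            needed |= TIER_FIELDS[tier]
    return sorted(needed - have)


# Example: a filing with full baseline data plus one state-regional field.
example = {name: "reported" for name in TIER_FIELDS[Tier.BASELINE]}
example["net_to_gross_ratio"] = 0.85  # made-up value for illustration
print(highest_tier(example).name)
print(missing_fields(example, Tier.STATE_REGIONAL))
```

The tiers are cumulative in this sketch, so a filing that publishes conversion factors but redacts actual spending still supports no tier at all, which matches the idea that each layer builds on the one beneath it.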

The same group followed up in 2015 with a spreadsheet-based tool for energy efficiency reporting that could be adopted by state regulators as a standardized-but-customizable template for energy efficiency reporting.

A Better Way is Possible

It's my dream that someday every utility in every state will use some version of that tool or a similar mechanism and that all of the data we need to analyze state, regional and national trends in energy efficiency programs will be readily available and readily comparable. In the meantime, I have dockets to dig through.