A recent bus shelter ad in San Francisco asserts that we are all data nerds. Putting aside any hyperbole or social irony in the ad, many of us in the local tech industry are playing with numbers a lot these days. This love affair with data-driven decision-making is enabled by new streams of data and increasingly cheap digital storage. Well-publicized successes inspire companies founded on the notion that value can be derived from datasets such as imagery, electric grid usage, and records of human activities and lifestyles. Data science has become a new category of jobs and degrees, and an umbrella for applications of statistics, machine learning, and visualization. Amid this hive of buzzwords, I find it useful to think about data science through the framework of Bloom’s taxonomy, which orders learning objectives from knowledge and comprehension up through application, analysis, synthesis, and evaluation. The EE-Scope team that I lead is developing tools that aim to answer the question “why.”
From my physics background, I know that answering “why” in the purely physical domain is challenging: we aim for results that will survive the scientific process, and at some point the questions become ones that cannot be tested experimentally. When humans are involved, both the questions and the answers can be more accessible and much messier. Residential electricity consumption data reflect the real complications and unpredictability of individual human lives. In developing the energy analytics of EE-Scope at PARC, we aim to respect the limitations of the data and to derive value by asking the right questions.
When a tool provides a new stream of data, it can be fun to see what is there. With new smart meters, the first thing an electric utility or a homeowner can see is simply how much energy is consumed, at finer time resolution (hourly instead of monthly, for example) and with less delay than before. Many companies provide dashboards that display the time-series data to consumers or to the utility. There is excitement in first seeing what is happening, and there is an art in choosing how to present that information.
To derive value from the new data stream, comprehension and application are important; perhaps these are the answers to the question “how.” By predicting future load from patterns in consumption, a utility can more reliably balance supply and demand. A utility or a consumer may also find value in comparing consumption across consumers or over time. Disaggregating electricity consumption by appliance adds another dimension to that understanding.
Comparisons and labeling provide context and may motivate change. If data exploration ends here, however, the consumer of the data (the homeowner or the utility, in this example) needs some intuition about the root causes of the observed patterns in order to make appropriate changes. Where that intuition is lacking, there is an opportunity for a third party to provide value.
In developing EE-Scope at PARC, we are focused on trying to answer “why” questions. Patterns, correlations, sorting, and comparisons help build comprehension of the data. However, without understanding causation (such as why Bob uses more energy than Alice or why this neighborhood consumed more electricity this June than last June), it is hard for consumers or for utilities to choose the best way to react. If Bob wants to reduce his consumption to be more like Alice’s, he needs to understand why he is consuming more energy. Maybe he has two teenagers in his house and Alice lives alone. Maybe his air-conditioned house lacks insulation while Alice’s is better insulated. In each case, understanding why Bob uses more electricity than Alice is critical to understanding what and how much Bob can realistically change in his behavior or his home in order to meet his goal. With EE-Scope, we focus on using available data and a minimum viable model to learn what is in the data and what is of value.
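To make the Bob-and-Alice reasoning concrete, here is a minimal sketch using a common textbook technique, degree-day regression; it is not EE-Scope’s own model, which this post does not describe. It assumes daily consumption can be modeled as a temperature-independent base load plus a cooling term proportional to cooling degree-days (CDD) above a balance point. The 65 °F balance point, the helper names `cooling_degree_days` and `fit_base_and_slope`, and the consumption numbers are all hypothetical. A higher base load hints at occupancy and appliance use (the two teenagers), while a steeper cooling slope hints at air conditioning in a poorly insulated house.

```python
def cooling_degree_days(mean_temp_f, balance_point_f=65.0):
    """Degrees by which the day's mean temperature exceeds the balance point."""
    return max(0.0, mean_temp_f - balance_point_f)


def fit_base_and_slope(daily_kwh, daily_mean_temp_f, balance_point_f=65.0):
    """Ordinary least squares fit of kWh = base + slope * CDD.

    Hypothetical helper for illustration. Returns (base_kwh_per_day, kwh_per_cdd).
    """
    x = [cooling_degree_days(t, balance_point_f) for t in daily_mean_temp_f]
    n = len(x)
    if n != len(daily_kwh) or n < 2:
        raise ValueError("need at least two days of matching kWh and temperature data")
    mean_x = sum(x) / n
    mean_y = sum(daily_kwh) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("CDD does not vary, so the cooling slope is not identifiable")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, daily_kwh))
    slope = sxy / sxx
    base = mean_y - slope * mean_x
    return base, slope


# Synthetic daily data for five days (illustrative only, not measurements).
temps_f = [60, 70, 75, 80, 85]  # CDD: 0, 5, 10, 15, 20
bob_kwh = [20, 27.5, 35, 42.5, 50]
alice_kwh = [8, 10.5, 13, 15.5, 18]

print(fit_base_and_slope(bob_kwh, temps_f))    # (20.0, 1.5)
print(fit_base_and_slope(alice_kwh, temps_f))  # (8.0, 0.5)
```

In this synthetic data Bob’s excess shows up in both terms, so both explanations would deserve a closer look. A regression like this only suggests likely causes; real data would also bring noise, heating load, and missing readings.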
To meet state mandates, California utilities often incent their customers to reduce their energy consumption. Incentive programs may be offered on a first-come, first-served basis, which invites two failure modes. First, some homes may receive information about irrelevant offers (such as a home without a pool receiving mailings about pool pumps); this undermines the customer’s trust in the utility and may make the customer less likely to pay attention to other messaging. Second, the people who take advantage of an offer may not be the ones who would benefit the most. In claiming a rebate for weatherizing his home, Bob may be taking that incentive away from his neighbor, whose home is even leakier. In either case, the utility has to work harder and spend more ratepayer money to achieve its energy efficiency goals.
PARC EE-Scope is a tool to help utilities better design and target energy efficiency incentive programs, so that the right customers are offered appropriate incentives (for structural or behavioral changes). Using data that utilities already have access to, EE-Scope outputs comparative metrics that indicate the likely reasons why some houses use more energy than others. Targeting programs based on such metrics can make them more cost-effective.
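As a rough illustration of metric-based targeting, and of how it avoids the two failure modes of first-come, first-served programs, the sketch below assumes each home already has an estimated cooling slope (extra kWh per cooling degree-day) and a flag for whether it has a pool. The `Home` record, its field names, and the `budget` parameter are hypothetical and are not EE-Scope’s interface. Weatherization rebates go to the homes with the steepest cooling slopes rather than to whoever responds first, and pool-pump mailings go only to homes that have pools.

```python
from dataclasses import dataclass


@dataclass
class Home:
    """Hypothetical per-home record; the fields are illustrative assumptions."""
    home_id: str
    cooling_slope: float  # estimated extra kWh per cooling degree-day
    has_pool: bool


def target_weatherization(homes, budget):
    """Offer rebates to the `budget` homes with the steepest cooling slopes."""
    ranked = sorted(homes, key=lambda h: h.cooling_slope, reverse=True)
    return [h.home_id for h in ranked[:budget]]


def target_pool_pump(homes):
    """Mail pool-pump offers only to homes that actually have a pool."""
    return [h.home_id for h in homes if h.has_pool]


homes = [
    Home("bob", cooling_slope=1.5, has_pool=False),
    Home("neighbor", cooling_slope=2.4, has_pool=True),
    Home("alice", cooling_slope=0.5, has_pool=False),
]
print(target_weatherization(homes, budget=1))  # ['neighbor']
print(target_pool_pump(homes))                 # ['neighbor']
```

With only one rebate available, the leakier neighbor is chosen over Bob. A real program would likely weigh expected savings against rebate cost rather than rank on a single metric.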
Are we all data nerds? We each lack some amount of training, data, time, access, or interest to base every decision on sound analytics, and so we frequently trust others to analyze data for us; we ask doctors to interpret lab results and mechanics to interpret dynamometer tests. Big data allows us to be even bigger consumers of analytics, trusting third parties with both our data and our ignorance, asking them to look through a “microscope” of some sort and tell us the results. The quality of what we are told depends on the quality of the data, the quality of the questions, and the respect given to uncertainties inherent in the data and the analysis. In building EE-Scope, PARC aims for the upper tiers of Bloom’s taxonomy, hoping to bring value to utilities so that they can improve their business and more easily meet efficiency goals.