# 1. Introduction

Methods to forecast photovoltaic (PV) power generation are expected to play an important role in integrating PV systems into current power grids, because they can anticipate occasional strong variations in PV power generation caused by changes in the weather. The information provided by PV power forecasts can help power companies and users prepare for such events. Many methods to forecast PV power generation have been proposed for different temporal and spatial scales [1-3], and comprehensive reviews of several approaches are also available [4, 5].

Regardless of the method and input data used, the strong dependence of PV power on weather conditions makes it difficult to produce accurate forecasts consistently. The problem becomes even more acute at locations with unstable weather and for time scales longer than a few hours. In such cases, besides a point forecast of PV power generation for a given time and location, it is useful to also have information about the uncertainty of that forecast. Uncertainty can be expressed in several ways; one of them is through prediction intervals, which are expected to contain a future observation with a given confidence level.

Thus, the objective of this study is to present a simple method to calculate prediction intervals for one-day-ahead forecasts of the power generation of single PV systems. The method is based on maximum likelihood estimation and on the concept of similarity between the input data of PV power forecasts for different hours and days.

To validate it, the method was used to calculate prediction intervals with different theoretical confidence levels for one year of hourly forecasts of the power generation of two PV systems installed at different locations in Japan. The PV power forecasts were produced with a previously proposed method [6] based on numerical weather prediction data and a support vector regression algorithm. The performance of the interval calculation method was assessed by comparing the pre-specified confidence levels used to calculate the intervals with the annual forecast error coverage the intervals actually achieved. In addition, the method was compared with two naïve reference approaches to evaluate the size of the prediction intervals relative to their forecast error coverage.

# 2. Prediction Interval Methods

In this section, the proposed method for calculating prediction intervals is presented and compared with previous approaches. Two naïve reference methods are also presented; their purpose is to provide a basis for assessing the performance of the proposed method.

## 2.1 Proposed method

For any point forecast $f_i$ of PV power generation, the desired prediction interval is one that contains the true PV power generation $y_i$ with a given confidence level. One way to approach this problem is to make assumptions about the distribution of the forecast error $e(x) = f(x) - y$, where $x$ represents the input variables used to make the forecast $f(x)$. If the true distribution of the forecast error is known, prediction intervals with a given confidence level can be obtained from it directly. If it is not known, one option is to assume that the forecast errors follow a known distribution and to estimate its parameters via maximum likelihood estimation [7].

In a previous study, Lin and Weng [8] proposed calculating prediction intervals in this way for forecasts made with support vector machines. They estimated the forecast error of the method from the errors of a cross-validation procedure on the training data used to build the forecast model. Furthermore, they assumed that the forecast errors followed either a symmetric Gaussian distribution, as shown in Eq. 1, or a symmetric Laplacian distribution, as shown in Eq. 2. From these assumptions, the scale parameter σ of each distribution can be estimated by maximizing the likelihood: for the Gaussian distribution, σ is the root mean square error of the forecasts, and for the Laplacian distribution, σ is their mean absolute error.
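Assuming zero-mean errors $e = f(x) - y$, the Gaussian and Laplacian densities in their standard forms, and the maximum likelihood estimates of σ obtained from $N$ past errors $e_i$ by setting the derivative of the log-likelihood (up to additive constants) to zero, can be written as:

```latex
\begin{aligned}
\text{Gaussian:}\quad & p(e) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{e^2}{2\sigma^2}\right),
\qquad \frac{\partial}{\partial\sigma}\Big[-N\ln\sigma - \sum_{i=1}^{N}\frac{e_i^2}{2\sigma^2}\Big] = 0
\;\Rightarrow\; \hat\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} e_i^2} \;\;(\text{RMSE}) \\
\text{Laplacian:}\quad & p(e) = \frac{1}{2\sigma}\exp\!\left(-\frac{|e|}{\sigma}\right),
\qquad \frac{\partial}{\partial\sigma}\Big[-N\ln(2\sigma) - \sum_{i=1}^{N}\frac{|e_i|}{\sigma}\Big] = 0
\;\Rightarrow\; \hat\sigma = \frac{1}{N}\sum_{i=1}^{N} |e_i| \;\;(\text{MAE})
\end{aligned}
```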

If the forecast error follows a known distribution, then for a given probability 1 − s the prediction interval limits can be calculated from the upper s-th percentile $p_s$ of the distribution of the absolute forecast error, Eq. 3. In Eq. 3, $L_{lim}$ and $U_{lim}$ are the lower and upper limits of the prediction interval.
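For a symmetric error distribution the interval is centered on the forecast. Under that assumption, and with $e_i = f_i - y_i$, one way to write the limits and the probability they enclose is:

```latex
L_{lim} = f_i - p_s, \qquad U_{lim} = f_i + p_s, \qquad
P\big(L_{lim} \le y_i \le U_{lim}\big) = P\big(|e_i| \le p_s\big) = 1 - s
```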

For a Gaussian distribution, $p_s$ is given by Eq. 4, where Φ⁻¹ is the normal quantile function.

For a Laplacian distribution, $p_s$ is given by Eq. 5.
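Assuming the zero-mean densities above, both percentiles follow from requiring $P(|e| \le p_s) = 1 - s$; for example, at a confidence level of 95% ($s = 0.05$) this gives about $1.96\,\sigma$ for the Gaussian case and about $3.00\,\sigma$ for the Laplacian case, although the two σ estimates (RMSE and MAE) are not the same quantity:

```latex
\begin{aligned}
\text{Gaussian:}\quad & P(|e| \le p) = 2\,\Phi\!\left(\frac{p}{\sigma}\right) - 1 = 1 - s
\;\Rightarrow\; p_s = \sigma\,\Phi^{-1}\!\left(1 - \frac{s}{2}\right) \\
\text{Laplacian:}\quad & P(|e| > p) = 2\int_{p}^{\infty}\frac{1}{2\sigma}e^{-u/\sigma}\,du = e^{-p/\sigma} = s
\;\Rightarrow\; p_s = -\sigma\ln s
\end{aligned}
```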

According to Lin and Weng [8], a similar approach was also used by Platt [9] for classification. However, this kind of approach presents two problems when applied to PV power generation forecasting. First, the forecast errors are estimated from a cross-validation procedure. PV power forecasting is a time series problem, so cross-validation applied directly will not yield good error estimates for the forecast model. Second, the calculation of σ as proposed implies that the forecast error distribution depends on the input only through the forecasted value [8]. In other words, each forecast model has prediction intervals of a single width regardless of the values of the input variables. In PV power forecasting this assumption is problematic, because forecasts for different times of day and weather conditions should have prediction intervals of different sizes. For example, forecasts for hours at the beginning and end of the day should have narrower prediction intervals than forecasts for hours around noon.

In fact, in a previous study we showed that applying this approach without modification is not effective for PV power generation forecasting, and we proposed a simple modification based on the target hour of the forecasts to improve the prediction intervals [10].

In this study, we propose using past forecast errors instead of the errors of a cross-validation procedure applied to training data. Furthermore, a criterion based on input data similarity is used to obtain suitable prediction intervals according to the forecast hour and input data. The hypothesis behind this approach is that, for a specific location and a given time, similar input data should yield similar PV power forecast errors, and these errors should belong to the same distribution. Thus, the prediction interval of a PV power forecast for sunny weather at noon will be based on past forecast errors for sunny weather at noon. Calculated this way for a given location, the prediction intervals vary with the input data, weather conditions and target hour of the forecasts.
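A minimal Python sketch of the two maximum likelihood estimates and of the half-width $p_s$ follows, assuming zero-mean errors. The function names `mle_sigma` and `half_width` are hypothetical, and only the standard library is used (`statistics.NormalDist.inv_cdf` supplies Φ⁻¹). Under the proposed method, `errors` would be the past forecast errors of the similar hours rather than cross-validation errors.

```python
import math
from statistics import NormalDist


def mle_sigma(errors, dist):
    """Maximum likelihood scale of a zero-mean error distribution.

    Gaussian -> root mean square error; Laplacian -> mean absolute error.
    """
    if not errors:
        raise ValueError("need at least one past forecast error")
    n = len(errors)
    if dist == "gaussian":
        return math.sqrt(sum(e * e for e in errors) / n)
    if dist == "laplacian":
        return sum(abs(e) for e in errors) / n
    raise ValueError(f"unknown distribution: {dist}")


def half_width(sigma, confidence, dist):
    """Upper s-th percentile p_s of |e| for a confidence level 1 - s."""
    s = 1.0 - confidence
    if not 0.0 < s < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    if dist == "gaussian":
        # p_s = sigma * Phi^-1(1 - s/2)
        return sigma * NormalDist().inv_cdf(1.0 - s / 2.0)
    if dist == "laplacian":
        # p_s = -sigma * ln(s)
        return -sigma * math.log(s)
    raise ValueError(f"unknown distribution: {dist}")
```

For example, `half_width(mle_sigma(errs, "laplacian"), 0.95, "laplacian")` returns the half-width of a 95% interval centered on the forecast, before the physical constraints of this section are applied.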

To identify past input data similar to the input data of a target forecast, the Euclidean distance was used as the similarity measure. To calculate the prediction interval of a PV power forecast for a given hour, the input data that generated that forecast are compared with the input data of the hourly forecasts made in the previous 60 days, and the n% most similar hours are retrieved and used to calculate the prediction interval. Based on a preliminary assessment of how much data is necessary, and how similar the data must be, to obtain good prediction intervals, n was set to 5% (42 hours) of the initial data set. The results of this preliminary assessment are in the initial version of this paper, presented at the 2014 International Conference of Electrical Engineering [11].
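The similarity step can be sketched as a nearest-neighbour search by Euclidean distance. This is an illustrative sketch, not the authors' code: the helper name `most_similar_errors` is hypothetical, and the text does not say whether the input variables are scaled before computing distances (with inputs in different units, some normalization would likely be needed). With `fraction=0.05`, a pool of 840 past hourly forecasts gives the 42 hours quoted above; reading 840 as 60 days of 14 forecast hours each is our assumption.

```python
import math


def most_similar_errors(target_input, past_inputs, past_errors, fraction=0.05):
    """Return the past forecast errors whose inputs are closest to target_input.

    Inputs are assumed to be already scaled; the scaling is not specified.
    """
    if len(past_inputs) != len(past_errors):
        raise ValueError("inputs and errors must be aligned")
    if not past_inputs:
        raise ValueError("empty pool of past forecasts")
    # Number of similar hours to keep (e.g. 5% of the 60-day pool)
    k = max(1, round(fraction * len(past_inputs)))
    # Rank past hours by Euclidean distance to the target input vector
    ranked = sorted(
        range(len(past_inputs)),
        key=lambda i: math.dist(target_input, past_inputs[i]),
    )
    return [past_errors[i] for i in ranked[:k]]
```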

Finally, two physical constraints were adopted in the proposed method. First, the lower limit of the prediction intervals was bounded below by zero, the minimum PV power generation. Second, the upper limit of the prediction intervals for a PV system was bounded above by its maximum theoretical power generation at the same hour under the same extraterrestrial insolation conditions.
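The two physical constraints amount to clipping the symmetric interval. In this sketch the name `constrained_interval` is hypothetical, and `p_max` stands for the hour's maximum theoretical generation, which we assume is computed as in reference method 2 (section 2.3).

```python
def constrained_interval(forecast, p_s, p_max):
    """Interval forecast ± p_s clipped to the physical range [0, p_max]."""
    lower = max(0.0, forecast - p_s)    # PV output cannot be negative
    upper = min(p_max, forecast + p_s)  # nor exceed the theoretical maximum
    return lower, upper
```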

## 2.2 Reference method 1

If past forecast data are available, a simple approach to obtaining prediction intervals is to use these data directly, without any assumption about the distribution of the forecast errors. In this case, the intervals are estimated directly from the quantiles of the data sets. For example, to calculate a prediction interval with a 90% confidence level for a given forecast, it is enough to identify the 5% and 95% quantiles of the forecast errors of the past forecasts whose input data were similar to those of the target forecast.

This method may work well with databases containing many years of past forecasts. In this study, where several years of past data are not available, its application provides an assessment of the validity of the hypothesis made in section 2.1.
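A sketch of reference method 1 in Python, assuming the intervals are built from the empirical quantiles of the similar hours' forecast errors. The linear-interpolation quantile rule and the function names are our assumptions, since the text does not specify an estimator. Because the errors are defined as $e = f - y$, the true value is $y = f - e$, so the upper error quantile sets the lower interval limit.

```python
import math


def empirical_quantile(values, q):
    """Quantile q in [0, 1] by linear interpolation between order statistics."""
    data = sorted(values)
    if not data:
        raise ValueError("empty sample")
    pos = q * (len(data) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])


def reference1_interval(forecast, similar_errors, confidence=0.90):
    """Distribution-free interval from the error quantiles of similar hours."""
    s = 1.0 - confidence
    q_lo = empirical_quantile(similar_errors, s / 2)
    q_hi = empirical_quantile(similar_errors, 1 - s / 2)
    # e = f - y  =>  y = f - e, so the error quantiles swap ends
    return forecast - q_hi, forecast - q_lo
```

With only 42 errors, the tail quantiles at high confidence levels are set by the few most extreme errors in the sample, which is consistent with the poor coverage of this method reported in section 5.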

## 2.3 Reference method 2

A different reference approach defines the prediction intervals by the minimum and maximum possible values of the PV power generation forecasts for each hour of the day. In this way, the intervals always contain the true PV power generation and provide 100% forecast error coverage. In practice, however, the resulting intervals are so wide that they have no practical application.

Nevertheless, this method provides a reference for the size of the intervals obtained with the method proposed in section 2.1: if the proposed method yields intervals as wide as those of this reference method, they will not be useful.

To obtain the maximum possible PV power for every forecast hour, the horizontal extraterrestrial insolation for that hour is used. With this information, the PV power generation was calculated using the model in Eq. 6, proposed by Mellit & Pavan [2].

In Eq. 6, $P_{pv}$ is the photovoltaic power generated in kW, A is the total module area in m², $\eta_{pv}$ is the module conversion efficiency, $\eta_{bos}$ is the balance-of-system efficiency and G is the insolation in kW/m². To obtain a maximum theoretical value of PV power, G was taken as the horizontal extraterrestrial insolation. Furthermore, to avoid problems with shading, module tilt and orientation angles, and the first and last hours of daylight, a correction of 5% of the rated power of the PV system was added to $P_{pv}$.
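Reading Eq. 6 as the product of the quantities just defined, $P_{pv} = A\,\eta_{pv}\,\eta_{bos}\,G$ (an assumption based on the symbol list), the upper bound of reference method 2 can be sketched as below. The function names and the example values in the test are hypothetical, and computing the hourly horizontal extraterrestrial insolation `g0_kw_m2` itself is left out.

```python
def reference2_upper(area_m2, eta_pv, eta_bos, g0_kw_m2, rated_kw, margin=0.05):
    """Theoretical maximum PV power for one hour, in kW."""
    # Eq. 6 as read here: P_pv = A * eta_pv * eta_bos * G,
    # with G set to the horizontal extraterrestrial insolation
    p_pv = area_m2 * eta_pv * eta_bos * g0_kw_m2
    # Correction of 5% of the rated power (shading, orientation, dawn/dusk)
    return p_pv + margin * rated_kw


def reference2_interval(area_m2, eta_pv, eta_bos, g0_kw_m2, rated_kw):
    """Reference method 2: from zero up to the theoretical maximum."""
    return 0.0, reference2_upper(area_m2, eta_pv, eta_bos, g0_kw_m2, rated_kw)
```

The same upper value is presumably what caps the intervals of the proposed method under its second physical constraint.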

# 3. Forecast Method

The prediction interval methods described in section 2 can provide intervals for PV power generation forecasts made with any kind of method, since they depend only on the past input data used and the outputs the forecast method yielded. In this study, they were applied to forecasts made with a method based on support vector regression, hourly extraterrestrial insolation and numerical weather prediction data. These data are available on the day preceding the forecast day, so the forecast horizon is one day ahead. The numerical weather prediction data are grid point value forecasts of the meso-scale model (GPV-MSM) of the Japan Meteorological Agency.

The input data used for each forecast hour are listed in Table 1. The method provides hourly forecasts from hourly input data, and for each forecast day the model is trained with the hourly input data and measured PV power of the previous 60 days. Details of the algorithm setup and its application are given in previous studies [6, 12].

**Table 1.** Input data used for each forecast hour. \*The values for the forecast hour and the preceding hour are used as input.
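The 60-day training window and the "forecast hour and preceding hour" inputs noted under Table 1 can be illustrated with two small helpers. They are hypothetical and cover only data selection; the support vector regression setup is described in [6, 12], and which variables use the preceding hour is given in Table 1, which is not reproduced here. The similarity pool of section 2.1 spans the same 60 days.

```python
def training_window(day_index, window_days=60):
    """Indices of the past days used to train the model for day `day_index`."""
    start = max(0, day_index - window_days)
    return list(range(start, day_index))


def lagged_features(hourly_values, hour):
    """Input pair for one variable: value at the forecast hour and the hour before."""
    if hour < 1:
        raise ValueError("need a preceding hour")
    return [hourly_values[hour], hourly_values[hour - 1]]
```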

# 4. PV Systems Data

Prediction intervals were calculated for one year (2010) of hourly power forecasts for two PV systems. One PV system is located in Saitama prefecture, north of Tokyo, and the other in Aichi prefecture, southwest of Tokyo. Both systems have a rated power of 10 kW; their specifications and installation conditions are given in Table 2.

**Table 2.** PV system specifications and installation data.

These two PV systems were chosen because they provide examples of PV power forecasts with high average annual forecast errors (PV2 in Table 2) and with low average annual forecast errors (PV1 in Table 2). Thus, the performance of the prediction interval methods can also be assessed for different levels of forecast error.

# 5. Results

Fig. 1 presents the annual forecast error coverage achieved with each confidence level: Fig. 1a shows the results for PV system 1 and Fig. 1b those for PV system 2. Each panel also contains a dotted line representing the ideal behavior, in which the forecast error coverage equals the confidence level, as well as the results obtained with reference method 1.

**Fig. 1.** Annual forecast error coverage of the prediction intervals versus the pre-specified confidence levels used to calculate them, for a PV system with low forecast errors (a) and one with high forecast errors (b).
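Here the annual forecast error coverage is read as the fraction of forecast hours whose measured power lies inside the prediction interval, so the ideal dotted line corresponds to this fraction equaling the confidence level. A minimal sketch, with a hypothetical function name and the assumption that only hours with forecasts are passed in:

```python
def coverage(observed, lower, upper):
    """Fraction of hours whose measured power lies inside [lower, upper]."""
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("series must be aligned")
    if not observed:
        raise ValueError("empty series")
    hits = sum(lo <= y <= hi for y, lo, hi in zip(observed, lower, upper))
    return hits / len(observed)
```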

Based on the results in Fig. 1a and Fig. 1b, reference method 1 clearly performs poorly regardless of the PV system and confidence level. This reflects the fact that a data set of 42 hours with similar input data is not sufficient for directly estimating prediction intervals.

Regarding the distribution assumptions, the results in Fig. 1 show that the difference between the Laplacian and Gaussian distributions was small. However, assuming a Laplacian distribution clearly made the prediction interval method follow the slope of the ideal curve closely for both PV systems. With the Gaussian distribution assumption, the achieved forecast error coverage tended to exceed the pre-specified confidence level at low levels (85% and 90%) and to fall short of it at high levels (95% and 97.5%). This behavior is most noticeable in Fig. 1a.

Comparing the two PV systems, the generally higher forecast errors of PV system 2 naturally led the proposed method to yield wider prediction intervals than for PV system 1. The overall result was a higher forecast error coverage for PV system 2 than for PV system 1. However, the difference was not large: in the worst case it was at most 1.5 percentage points.

Another important factor in evaluating a prediction interval method is the size of the intervals it yields for different pre-specified confidence levels.

For PV power prediction intervals, the interval size can be regarded as a kind of reserve power that the PV system operator needs to deal with the forecast error. For example, if the forecast for a given hour underestimates the true value, there will be more power than expected. This excess has to be absorbed, wasted, or sent elsewhere in the power grid or to a battery, so that the balance between power demand and supply is maintained. If the excess power is not wasted, the upper limit of the prediction interval can be seen as a measure of the reserve capacity prepared to store surplus PV power generation.

On the other hand, if the forecast overestimates the true value, power has to be supplied by the grid or by a battery to fill the gap between what was expected and what was generated. In this case, the lower limit of the prediction interval expresses the reserve capacity available to supply power when the forecast is an overestimate.

In both cases, the prediction intervals can be seen as a measure of how much power has to be reserved, as illustrated in Fig. 2.

**Fig. 2.** Prediction intervals as a measure of the reserve power needed to deal with PV power generation fluctuations.

Since reserve power implies costs, intervals should be only as wide as necessary. The size of the intervals for given confidence levels therefore provides a useful measure for comparing prediction interval methods.
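Reading the upward reserve of an hour as $U_{lim} - f_i$ and the downward reserve as $f_i - L_{lim}$ (an assumption), the annual reserve can be sketched as below. How the hourly values are aggregated is not stated in the text; summing them over the year and dividing by a normalizer (the rated power for Fig. 3, the annual generation for Fig. 4) is our assumption, as is the function name.

```python
def reserve_ratio(forecasts, lower, upper, normalizer):
    """Summed reserve implied by the intervals, divided by a normalizer."""
    if not (len(forecasts) == len(lower) == len(upper)):
        raise ValueError("series must be aligned")
    # Capacity kept to absorb surplus generation (forecast too low)
    upward = sum(hi - f for f, hi in zip(forecasts, upper))
    # Capacity kept to cover shortfalls (forecast too high)
    downward = sum(f - lo for f, lo in zip(forecasts, lower))
    return (upward + downward) / normalizer
```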

It should be noted that a proper prediction interval method yields intervals that ultimately reflect the level of forecast error: if the forecast errors for a given hour or weather condition are high, the corresponding prediction interval should be wide. Therefore, comparisons of interval sizes only make sense between different prediction interval methods applied to the same forecasts, to evaluate which one better reflects the characteristics of the forecast errors.

An initial evaluation of the interval sizes is presented in Fig. 3(a) for PV system 1 and Fig. 3(b) for PV system 2. The reserve power is normalized by the rated power of the PV system, and the value required for each pre-specified confidence level is shown. Fig. 3(a) and Fig. 3(b) also show the reserve power required by reference methods 1 and 2.

**Fig. 3.** Annual reserve power required by each prediction interval method for different confidence levels, for a PV system with low forecast errors (a) and one with high forecast errors (b).

The results in Fig. 3 show that reference method 2 covers 100% of the forecast errors. However, the reserve power it required was significantly higher than that required by the proposed method. For example, in Fig. 3(a), using the Laplacian distribution assumption with a confidence level of 97.5%, a forecast error coverage of 97.1% was achieved with 36% less reserve power than reference method 2.

Comparing the distribution assumptions, the Gaussian distribution generally required less reserve power than the Laplacian distribution. However, as shown in Fig. 1, its effective forecast error coverage was also slightly lower than that achieved with the Laplacian distribution assumption.

In Fig. 3(a) and Fig. 3(b), reference method 1 required the lowest reserve power regardless of the confidence level. However, the results in Fig. 1 show that such low reserve power values were associated with poor forecast error coverage, which actually makes it the worst of the methods evaluated.

Fig. 3(b) shows that the differences between the reserve power required by reference method 2 and that required by the other methods are smaller than in Fig. 3(a). For the PV system with high forecast errors, the proposed method was less effective than for the PV system with low forecast errors. For example, in Fig. 3(b), using the Laplacian distribution assumption with a confidence level of 97.5%, a forecast error coverage of 98.2% was achieved with 18% less reserve power than reference method 2, half the reduction achieved for PV system 1 in Fig. 3(a).

The performance of each method can be better understood by directly comparing the effective forecast error coverage with the corresponding reserve power for each method and PV system, as shown in Fig. 4.

**Fig. 4.** Annual forecast error coverage versus reserve power for different PV systems and prediction interval methods.

To express the required reserve power relative to what is actually generated, the reserve power in Fig. 4 was normalized by the annual power generation of each PV system.

The results in Fig. 4 indicate that using the Gaussian distribution with the proposed method generally yielded narrower prediction intervals (expressed as the reserve power ratio) than using the Laplacian distribution. However, as also noted in Fig. 1(a), for the PV system with low forecast errors the Gaussian distribution assumption led to a coverage markedly above the confidence level at low confidence levels and slightly below it at high confidence levels. With the Laplacian distribution assumption, the proposed method behaved more uniformly and better approximated the pre-specified confidence levels.

For the PV system with high forecast errors, the Gaussian distribution was a better fit, and it also achieved the lowest reserve power ratio.

These results can be understood by considering the shapes of the Laplacian and Gaussian curves and the distributions of the forecast errors of both PV systems. For PV system 1, with generally low forecast errors, most forecast errors are close to zero, and their frequency decreases sharply as their absolute value increases. This behavior better resembles the sharply peaked shape of the Laplacian distribution. For PV system 2, with generally higher forecast errors, small forecast errors are less frequent than for PV system 1, yielding a forecast error distribution closer to the Gaussian curve.

Comparing the results of the proposed method with those of reference method 2, the benefit in terms of reduced reserve power is clear. For example, covering 97.1% of the forecast errors of PV system 1 required a reserve of 1.5 times the total PV power generated in the year, whereas covering all forecast errors with reference method 2 required nearly 2.35 times the total annual PV generation.
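These two ratios can be checked against the 36% reduction reported for Fig. 3(a) (same PV system, Laplacian assumption, 97.5% confidence level, 97.1% coverage). Assuming both figures aggregate the reserve in the same way and differ only in a constant normalizer per PV system, the relative reduction does not depend on the normalization:

```latex
1 - \frac{1.5}{2.35} \approx 1 - 0.638 = 0.362 \approx 36\%
```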

Fig. 5 shows examples of prediction intervals calculated with the proposed method for a given day, for PV system 1 with the Laplacian distribution assumption. The green line indicates the upper limit of the prediction interval obtained with reference method 2.

**Fig. 5.** Examples of PV power generation forecasts with prediction intervals calculated with the proposed method.

# 6. Conclusion

The objective of this study was to present a simple method to calculate prediction intervals for PV power generation forecasts. The method is based on maximum likelihood estimation and on the concept of similarity between the input data used in the forecasts.

The results showed that the proposed method with the Laplacian distribution assumption is more suitable for PV systems with low forecast errors, whereas for PV systems with high forecast errors the Gaussian distribution assumption was more suitable.

Nevertheless, focusing only on the relation between forecast error coverage and the confidence levels of the intervals, the Laplacian distribution is recommended: the Gaussian distribution assumption showed a stronger tendency than the Laplacian one to produce overly wide prediction intervals at low confidence levels and overly narrow ones at high confidence levels.

Based on the results, it can be concluded that the proposed method for calculating prediction intervals for PV power generation forecasts is valid. The forecast error coverage it achieved approximated the confidence levels of the intervals well, and it required significantly less reserve power than reference method 2. Moreover, it requires only 60 days of past forecasts, which makes it a useful option when large databases of past PV power generation and forecast data are not available.

Still, the results are based on data from PV systems representing extreme cases of annual forecast error. In further studies, a comprehensive analysis covering a wide range of PV systems installed in Japan will be carried out to better characterize the validity of the method and of the forecast error distribution assumptions.