Advanced Data Quality Analysis of Data Exchange Platforms

The scenario aims to encourage the development of tools and services for general-purpose data quality checks that are flexible enough to adapt to the different needs of data exchanges among TSOs, DSOs and consumers. The developed tool is expected to be part of the project’s middleware and to measure the quality of exchanged data based on the proposed methodologies. Given the rich content of the ENTSO-E Transparency Platform (TP), applicants are encouraged to demonstrate their methodologies on TP data, both to test the performance of the developed tool and to identify data quality issues on the TP.

Specific service

The service applies advanced machine learning algorithms to the data exchanged between different players in the TSO-DSO-Consumer value chain. The developed service will also help the Transparency Platform enhance the quality of its data by highlighting anomalies. In particular, the service should be able to detect outliers in timeseries where standard methodologies are not sufficient.

The service will be linked to T5.4, which relates to AI and big data. It will also benefit the overall OneNet architecture by enhancing data quality where large amounts of data come from distributed sources.

For development purposes, data from the Transparency Platform can be used as input to the tool. The output will be the result of the data quality analysis.

Addressed to:

The development of the expected tool under this scenario requires expertise in the fields of Big Data Analytics, Machine Learning and advanced AI methodologies to perform data quality measurements in the energy domain. Start-ups and SMEs that work in the field of data service provision and aim to be part of the TSO-DSO-Consumer value chain are welcome to apply.

Description of the scenario

The service provider downloads the data from the Transparency Platform (starting with one data item).

The service provider develops an algorithm that applies machine learning to timeseries. The algorithm is trained on historical data from the Transparency Platform and is used to detect outliers or incorrect data in newly published data as well as in the Transparency Platform's historical data. Once proven, the service will also be used for data quality measurements on the data exchanged in OneNet demos. After completion of the OneNet project, the tool developed under this call shall be accessible to the beneficiaries of the project on a royalty-free basis.
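The train-then-score workflow described above can be illustrated with a minimal Python sketch. It is not a machine learning model and the scenario does not prescribe any particular algorithm: as a stand-in, it fits a simple per-hour profile (median and scaled median absolute deviation) on historical values and then scores new observations against it. The function names, the threshold of 3.5 and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def fit_hourly_profile(history):
    """Fit a per-hour baseline from (hour_of_day, value) pairs.

    Returns {hour: (median, scaled_mad)}. This is an assumed, deliberately
    simple stand-in for the "training" step of a real model.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)

    profile = {}
    for hour, values in by_hour.items():
        med = median(values)
        # 1.4826 scales the MAD to be comparable with a standard deviation
        # for normally distributed data.
        scaled_mad = 1.4826 * median(abs(v - med) for v in values)
        profile[hour] = (med, scaled_mad)
    return profile


def flag_suspicious(profile, new_points, threshold=3.5):
    """Return the new (hour, value) pairs that deviate strongly from the profile."""
    flagged = []
    for hour, value in new_points:
        if hour not in profile:
            continue  # no history for this hour, so no judgement is made
        med, scaled_mad = profile[hour]
        if scaled_mad > 0 and abs(value - med) / scaled_mad > threshold:
            flagged.append((hour, value))
    return flagged


# Hypothetical historical load values (MW) for 03:00 and 18:00.
history = [(3, v) for v in (2980, 3010, 3040, 2995, 3025)]
history += [(18, v) for v in (5980, 6010, 6050, 5990, 6030)]

profile = fit_hourly_profile(history)
new_points = [(3, 3050), (18, 3100), (18, 6020)]
suspicious = flag_suspicious(profile, new_points)
print(suspicious)  # [(18, 3100)]
```

The value 3100 lies well inside the overall range of the history, so a single global threshold would not flag it. It is only suspicious because it appears at 18:00, which is the kind of context a trained model is expected to capture.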

Important information for applicants

The applicants are expected to use data exchanged in OneNet demos as well as the ENTSO-E Transparency Platform data to develop tools and services that perform data quality checks and assess the data quality on the platform with the proposed methodologies. As the Transparency Platform offers a wide variety of data, the applicants can focus on a smaller subset of data items under the scope of this scenario.

The following parts of this section will provide a general overview of the Transparency Platform data and data download options from the platform.

Available Data on the Transparency Platform

In accordance with Regulation 543/2013, the ENTSO-E Transparency Platform was launched on 5 January 2015. Following the launch of the new platform, the www.entsoe.net website, on which TSOs had voluntarily published some market data since 2011, was de-commissioned in March 2015. The historical data from 2011-2014, which was previously published on entsoe.net, is available to download from the Transparency Platform > Data Pre-5.1.15 section.

Currently, data on the Transparency Platform is published under seven domains.

Figure 1. Data Domains on Transparency Platform

Load: Within this domain, actual total load and load forecast data with various horizons (day, week, month, year) are presented.

Generation: Data regarding installed capacities, generation and generation forecast is presented.

Transmission: Data about cross-border power transfers and forecasted capacities is published.

Balancing: Under this domain, data related to keeping the electricity grid balanced is published. This includes bid data, accepted offers and activated reserves, including prices, as well as the balancing state of the areas.

Outages: Within this domain, data regarding planned maintenance and forced outages in the grid is published.

Congestion Management: Data about actions taken to relieve overloaded parts of the transmission grid is published.

System Operations: Data about operational agreements and frequency quality is published.

Data Download Options from Transparency Platform

The TP offers several data export alternatives. In order to meet different user needs, the following options are available:

  • GUI export: users who are interested in a limited amount of data can export it directly from the web GUI.
  • SFTP: suitable for bulk data downloads, but the data available on SFTP is refreshed only once every hour.
  • RESTful API: serves users who are interested in the most recent data updates. Limitations apply to API requests in terms of the number of requests per minute, the number of files that can be downloaded per request and the time window allowed for queries, depending on the data item of interest.
  • Data Repository: allows downloads of up to 50 MB. Requests are processed asynchronously in the background, without imposing a load on the platform, and are served through the user's preferred communication channel (web service or ECP).
  • Subscriptions: users can subscribe to a data feed in which the platform pushes updates to the user’s endpoint through a web service or ECP channel.
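Because API queries are limited in the time window they may cover, a long historical period usually has to be split into several requests. The Python sketch below shows one way to do that and to build the corresponding query URLs. The endpoint, the parameter names and the codes (`A65` for total load, `A16` for realised values) are recalled from the public API guide and should be treated as assumptions to verify against the current guide. The one-year window, the placeholder token and the example area code are likewise illustrative. No request is actually sent.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

# Assumed endpoint; verify against the current Transparency Platform API guide.
API_URL = "https://web-api.tp.entsoe.eu/api"


def split_period(start, end, max_days=365):
    """Yield (chunk_start, chunk_end) pairs covering [start, end).

    Each chunk spans at most `max_days` days. The default of 365 is an
    assumed limit; the real limit depends on the data item.
    """
    step = timedelta(days=max_days)
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + step, end)
        yield cursor, chunk_end
        cursor = chunk_end


def build_load_query(token, area_code, start, end):
    """Build a query URL for actual total load (parameter names are assumptions)."""
    params = {
        "securityToken": token,
        "documentType": "A65",   # assumed code: system total load
        "processType": "A16",    # assumed code: realised values
        "outBiddingZone_Domain": area_code,
        "periodStart": start.strftime("%Y%m%d%H%M"),
        "periodEnd": end.strftime("%Y%m%d%H%M"),
    }
    return f"{API_URL}?{urlencode(params)}"


# Two years of data become three requests: 365 + 365 + 1 days (2016 is a leap year).
urls = [
    build_load_query("MY_TOKEN", "10YCZ-CEPS-----N", chunk_start, chunk_end)
    for chunk_start, chunk_end in split_period(datetime(2015, 1, 5), datetime(2017, 1, 5))
]
print(len(urls))
```

A real client would additionally pause between requests to respect the per-minute request limit mentioned above.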

Table 1. Overview of Download Options

Download Option  | File Type       | Data Updates
Web GUI          | xml, csv, xlsx  | Almost real time
SFTP             | csv             | Every hour
Restful API      | xml             | Almost real time
Data Repository  | xml (zipped)    | Almost real time
Subscriptions    | xml             | Almost real time

Conditions for Use of Transparency Platform Data

The conditions for use of TP data are defined in a dedicated section of the Terms and Conditions as follows:

“In accordance with the applicable legislation, the Data User shall, when using of the Transparency Platform Data for any purpose whatsoever:

  • Use the Transparency Platform Data in good faith and always comply with good business practices regarding the re-use of publicly available data;
  • Mention the ENTSO-E Transparency Platform as the source of publication of the data, in accordance with good industry practices and comply with all reasonable requests from ENTSO-E regarding the visibility of the ENTSO-E Transparency Platform origin of the re-used Transparency Platform Data;
  • Be only allowed to make reference to the ENTSO-E Transparency Platform as the source of publication of the re-used data. It is therefore expressly prohibited to use the ENTSO-E Transparency Platform name or the ENTSO-E name in any manner that is likely to cause confusion regarding the possible existence of any kind of sponsorship or of endorsement of any use of the Transparency Platform Data by the Data User;
  • Not cause prejudice to the copyright or related right on a Transparency Platform Data, which may be owned by the concerned Primary Owner of Data. In case of a risk to cause prejudice to said right, the Data User shall seek the prior agreement of the holder of the copyright or related right. Notwithstanding this requirement, as a facilitation for the Data User, ENTSO-E publishes on the Transparency Platform and regularly updates the list of the Transparency Platform Data which can be freely re-used with no need to seek for the prior agreement of the respective Primary Owner of Data. The Data User has responsibility to check this list before each re-use of the Transparency Platform Data.”

Already developed methodology to detect outliers

In 2018, ENTSO-E members entered into a Memorandum of Understanding (MoU) that establishes requirements for the quality of the data provided by TSOs.

The initial proof of concept (PoC) carried out for the Actual Total Load [6.1.A] data item produced acceptable quality analysis results based on the Median Absolute Deviation (MAD) technique:

(https://www.academia.edu/5324493/Detecting_outliers_Do_not_use_standard_deviation_around_the_mean_use_absolute_deviation_around_the_median).
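For readers unfamiliar with the MAD technique, the following Python sketch shows its general form: a value is flagged when its distance from the median exceeds a chosen multiple of the scaled median absolute deviation. This is not the PoC implementation, which is not described here. The threshold of 3.0 and the sample load values are assumptions made for illustration.

```python
from statistics import median

# Scaling constant that makes the MAD comparable to a standard deviation
# for normally distributed data.
CONSISTENCY_CONSTANT = 1.4826


def mad_outliers(values, threshold=3.0):
    """Return the indices of values more than `threshold` scaled MADs from the median."""
    med = median(values)
    scaled_mad = CONSISTENCY_CONSTANT * median(abs(v - med) for v in values)
    if scaled_mad == 0:
        # At least half of the values equal the median, so the score is
        # undefined; this sketch then flags nothing rather than everything.
        return []
    return [i for i, v in enumerate(values) if abs(v - med) / scaled_mad > threshold]


# Hypothetical hourly load values in MW with a missing reading (0)
# and a value that looks like a misplaced decimal point (51400).
load_mw = [5120, 5180, 5075, 5210, 5150, 0, 5195, 5160, 51400, 5135]
print(mad_outliers(load_mw))  # [5, 8]
```

Unlike the mean and standard deviation, the median and MAD are barely affected by the two extreme values, so both are flagged while the normal readings are left alone.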

However, extending the PoC to other data items has shown that the nature of their data is not always suitable for MAD analysis, which creates an opportunity to apply new techniques (e.g. machine learning).

Some more concrete examples where the MAD technique did not show promising results:

Detected anomalies (mostly false positives) are marked in red:

Benefits for Third Parties of getting involved in the scenario

The applicants will have the chance to be part of the growing collaboration among TSOs, DSOs and consumers by providing data services. They will also gain a good knowledge of the concepts and infrastructures being developed in this field, which will help them align their services with growing market needs.

Incorporation of Third Parties for network operators and household consumers

The expected services and tools for data quality measurements will ensure that the data exchanged among players meets high quality standards. It is very important for network operators to have precise information before taking any data-driven action, and the developed services and tools can help eliminate the risk of acting on incorrect data. From the household consumers’ perspective, data quality checks will also help avoid negative financial outcomes that result from unrealistic commitments caused by the unintended exchange of incorrect data.

Added value for the OneNet project

As the energy markets evolve and develop, more and more different types of players are becoming active in the grid. The number of players and the frequency of interaction among them increase sharply with the growing share of distributed energy systems. These developments also challenge the grid operators from an operational point of view. It is therefore crucial to have effective communication among the increasing number of stakeholders. The services and tools to be developed under this scenario aim to address the need for data quality measurements that maintain high data quality standards among the players and eliminate misleading information. Quality measurements will be performed regularly at the desired frequency in order to identify suspicious exchanges, and the corresponding parties will be informed.