Datasets & Software

This is a comprehensive list of datasets and software published under open licenses by the GECKO project.

Living Lab Data Repository: Energy Consumption dataset in Germany 

Keywords: Energy consumption, Eco-feedback, Intelligent personal assistant, Home appliances, Smart homes


Repository: Github Repository
License: MIT License

Summary:
The Living Lab Data Repository offers an anonymized dataset of appliance usage and energy consumption patterns collected in Germany. Researchers can use it to build predictive models and study energy usage behavior, while technology developers can draw on it to design energy-efficient appliances and smart technologies. The intended audience spans academia, industry, and environmental advocacy groups: academic researchers and data scientists can run empirical studies on it; engineers, innovators, and entrepreneurs can use it to develop products aligned with energy efficiency goals; and environmentalists and advocates can use it to raise awareness and support initiatives that reduce energy consumption and environmental impact.

Requirements:
The repository contains a README.md file with detailed information about the data. A format conversion notebook (format_convert.ipynb) is provided in the repository; users can run it to convert the original data into a workable form. The data file contains all the data in CSV format.

Gecko Background:
This ongoing project is the outcome of ESR12’s research efforts, centered on 11 living lab users located in Germany. As a significant outcome of the GECKO project, this dataset combines both quantitative and qualitative data, aligning closely with the project’s objectives. Our primary goal is to develop an eco-feedback system capable of fostering energy-saving behaviors and enhancing user awareness of energy consumption. To achieve this, we asked each living lab user to identify the 3-5 most energy-consuming appliances in their home. By analyzing these energy usage patterns, we aim to explore the most effective and efficient forms of eco-feedback.

Explainable Patent Classification: Explainable Deep Learning Models for Patent Classification

Keywords: Explainable AI, Explainable Text Classification, Layer-wise Relevance Propagation (LRP) 


Repository: Github Repository
License: Apache-2.0 license

Summary:
State-of-the-art methods for multi-label patent classification rely on deep neural networks (DNNs), which are complex and often considered black boxes due to their opaque decision-making processes. In this repository, ESR13 implemented several novel explainable deep patent classification frameworks by introducing layer-wise relevance propagation (LRP) to provide human-understandable explanations for predictions. They trained several DNN models, including Bi-LSTM, CNN, and CNN-BiLSTM, and propagated the predictions backward from the output layer to the input layer of the model to identify the relevance of words for individual predictions. Based on the relevance scores, they then generated explanations by visualizing the relevant words for the predicted patent class. Experiments on two datasets comprising two million patent texts demonstrated high performance in terms of various evaluation measures. The explanations generated for each prediction highlight important relevant words that align with the predicted class, making the prediction more understandable.
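The backward propagation of relevance described above can be illustrated with a minimal sketch of the LRP epsilon rule for a single fully connected layer. This is not the repository's code: the function name lrp_dense, the toy words, and all numeric values are assumptions made for illustration, and the bias term is left out to keep the example short.

```python
def lrp_dense(activations, weights, relevance_out, eps=1e-9):
    """Redistribute output relevance to the inputs of one dense layer (LRP-epsilon).

    activations:   input activations a_j (length J)
    weights:       weights[j][k] connects input j to output k (J x K)
    relevance_out: relevance R_k assigned to each output (length K)
    """
    n_in, n_out = len(activations), len(relevance_out)
    # Pre-activations z_k = sum_j a_j * w_jk (bias omitted in this sketch).
    z = [sum(activations[j] * weights[j][k] for j in range(n_in)) for k in range(n_out)]
    # Stabilise the denominator so a near-zero z_k does not blow up.
    z_stab = [zk + (eps if zk >= 0 else -eps) for zk in z]
    # Each input receives a share of R_k proportional to its contribution a_j * w_jk.
    return [
        sum(activations[j] * weights[j][k] / z_stab[k] * relevance_out[k] for k in range(n_out))
        for j in range(n_in)
    ]


# Toy example: three "word" activations feeding two class scores (all values made up).
words = ["battery", "the", "electrode"]
activations = [1.0, 0.2, 0.8]
weights = [[0.9, -0.1], [0.05, 0.05], [0.7, 0.2]]
# Explain class 0 only: start with its score as the relevance to propagate.
score_0 = sum(a * w[0] for a, w in zip(activations, weights))
relevance_in = lrp_dense(activations, weights, [score_0, 0.0])
# Words sorted from most to least relevant for the explained class.
ranked = sorted(zip(words, relevance_in), key=lambda pair: pair[1], reverse=True)
```

In a full model the same redistribution would be applied layer by layer until the input words are reached, and the per-word relevance scores would then drive the visualization.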

Requirements:
This source code repository can be used to implement any explainable text classification task. The requirements are listed as follows:
– A pretrained word-embedding model for the semantic representation of your input text. It can be pretrained with Word2Vec, GloVe, etc. In the code, ESR13 used a FastText model pretrained on Wikipedia texts.
– The Python packages listed in the code

Gecko Background:
This project is the outcome of a secondment of ESR13 with OWN GmbH, located in Berlin, Germany. OWN GmbH is one of the industrial partners in the GECKO project, and ESR13 had a planned secondment with them under the GECKO grant agreement. Since ESR13’s PhD focuses on human-centered Explainable AI, he worked with their team on the patent classification task and contributed to making its predictions explainable to users. The resulting paper was presented at the XAI2023 conference in Lisbon, Portugal, and has been published by Springer.

Reference:
Shajalal, M., Denef, S., Karim, M. R., Boden, A., & Stevens, G. (2023, July). Unveiling Black-Boxes: Explainable Deep Learning Models for Patent Classification. In World Conference on Explainable Artificial Intelligence (pp. 457-474). Cham: Springer Nature Switzerland.

Plegma Lab Dataset: Domestic appliance-level and aggregate electricity demand with metadata from Greece

Keywords: Scientific community and society, Energy and society, Energy efficiency, Energy management, Business and industry


Repository: Github Repository, Dataset Repository
License: CC BY 4.0

Summary:
The Plegma dataset is a practical resource aimed at improving energy efficiency and analyzing consumption patterns in homes. This dataset, resulting from a year-long study across 13 households in Greece, captures detailed electricity demand data at both aggregate and appliance-specific levels at high-frequency intervals of 10 seconds. Additionally, it includes environmental parameters, building characteristics, and sociodemographic information, offering a comprehensive view of energy usage. Its uniqueness lies in its focus on the Mediterranean region, providing insights into specific consumption behaviors influenced by local climate and lifestyle.

The dataset is structured to support various applications, including demand response and non-intrusive load monitoring (NILM), and comes in a format that is compatible with different software platforms. It is accompanied by open-source Jupyter notebooks for data preprocessing and visualization, making it accessible to a wide audience of researchers and practitioners. The Plegma dataset serves as a resource for developing energy-saving services and applications that can be customized to meet the needs of different regions and individual households.

By combining detailed energy usage data with contextual information, the Plegma dataset contributes to the ongoing efforts in sustainable energy research. It showcases the possibility of conducting detailed energy monitoring in real-life settings and provides a foundation for future research and innovations in the field of energy efficiency and management.

Requirements:
The data is provided in CSV format and is therefore usable in most popular software packages, such as MS Excel, MATLAB, and SPSS, and in any programming language.
The Plegma_README file is a valuable resource that details the organization of the dataset, explaining the structure, naming convention, and specific contents of each file, which allows users to locate and utilize the data they need efficiently.
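As an illustration of working with 10-second CSV readings, the following sketch sums power samples into energy. It is only a sketch under stated assumptions: the column names timestamp and P_agg, the sample values, and the assumption that readings are active power in watts are all hypothetical, so the Plegma_README should be consulted for the real file layout.

```python
import csv
import io

SAMPLE_PERIOD_S = 10  # sampling interval stated in the dataset summary

# Stand-in for one of the CSV files; the column names here are hypothetical,
# so check the Plegma_README for the real ones.
sample_csv = io.StringIO(
    "timestamp,P_agg\n"
    "2023-01-01 00:00:00,360\n"
    "2023-01-01 00:00:10,360\n"
    "2023-01-01 00:00:20,720\n"
    "2023-01-01 00:00:30,\n"
)


def energy_kwh(rows, power_column, period_s=SAMPLE_PERIOD_S):
    """Sum power samples (assumed to be in W) into energy (kWh), skipping missing readings."""
    joules = 0.0
    for row in rows:
        value = row[power_column].strip()
        if value:  # empty cells are treated as gaps, not as zero power
            joules += float(value) * period_s
    return joules / 3.6e6  # 1 kWh = 3.6e6 J


total = energy_kwh(csv.DictReader(sample_csv), "P_agg")
```

Treating empty cells as gaps rather than zeros is a design choice of this sketch; for long outages, interpolation or explicit gap reporting may be more appropriate.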

Gecko Background:
This project is the outcome of ESR14’s research efforts, focusing on living labs, conducted in collaboration with Plegma Labs, the National Technical University of Athens, and the University of Strathclyde. As a tangible outcome of the GECKO project, this dataset stands out for its dual nature, encompassing both quantitative and qualitative data, thereby affirming the project’s objectives. This work has been published in the Nature journal Scientific Data.

Reference:
Athanasoulias, S., Guasselli, F., Doulamis, N., Doulamis, A., Ipiotis, N., Katsari, A., … & Stankovic, V. (2024). The Plegma dataset: Domestic appliance-level and aggregate electricity demand with metadata from Greece. Scientific Data, 11(1), 376.

ELECTRIcity: An efficient Transformer for Non-Intrusive Load Monitoring

Keywords: NILM; non-intrusive load monitoring; transformers; energy disaggregation; imbalanced data; deep learning 


Repository: Github Repository
License: MIT License

Summary:
Non-Intrusive Load Monitoring (NILM) describes the process of inferring the consumption pattern of appliances by only having access to the aggregated household signal. Sequence-to-sequence deep learning models, which attempt to identify the appliance power consumption pattern within the aggregated power signal, have been firmly established as state-of-the-art approaches for NILM. To overcome the limitations of the recurrent models that have been widely used in sequential modeling, this paper proposes a transformer-based architecture for NILM. Our approach, called ELECTRIcity, utilizes transformer layers to accurately estimate the power signal of domestic appliances by relying entirely on attention mechanisms to extract global dependencies between the aggregate and the domestic appliance signals. An additional benefit of the proposed model is that ELECTRIcity works with minimal dataset pre-processing and without requiring data balancing. Furthermore, ELECTRIcity introduces an efficient training routine compared to other traditional transformer-based architectures. According to this routine, ELECTRIcity splits model training into unsupervised pre-training and downstream task fine-tuning, which improves predictive accuracy and reduces training time. Experimental results indicate ELECTRIcity’s superiority compared to several state-of-the-art methods.

This code repository provides the code to reproduce the paper “ELECTRIcity: An efficient Transformer for Non-Intrusive Load Monitoring”.
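To give a feel for the attention mechanism the summary refers to, here is a minimal pure-Python sketch of scaled dot-product self-attention over a short window of readings. It is a generic textbook illustration, not the ELECTRIcity implementation: the function names, the 2-dimensional embeddings, and the values are all made up, and a real transformer learns separate query, key, and value projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this time step to every time step in the window.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        all_weights.append(weights)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))
        ])
    return outputs, all_weights


# Toy window of four time steps, each embedded as a 2-d vector (made-up values).
window = [[0.1, 0.0], [0.1, 0.1], [2.0, 1.5], [0.2, 0.0]]
# The same vectors serve as queries, keys and values in this sketch.
context, attention = self_attention(window, window, window)
```

Because every time step attends to every other one in a single operation, dependencies across the whole window are captured without the step-by-step recurrence of an RNN.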

Requirements:
The repository contains a README.md file, which contains all necessary information to execute the code.

Gecko Background:
The software has been used to develop the aforementioned model, which resulted in a journal publication from ESR4.

References:
Sykiotis, S., Kaselimi, M., Doulamis, A., & Doulamis, N. (2022). ELECTRIcity: An efficient transformer for non-intrusive load monitoring. Sensors, 22(8), 2926.

Electricity consumption measurements from three dairy farms in Germany

Keywords: NILM, industrial NILM, energy disaggregation, sustainable farming 


Repository: University of Strathclyde Portal
License: CC BY 4.0

Summary:
The dataset contains energy consumption data from three dairy farms located in Germany. The data was recorded over a period of one year, with a sampling period of 10 seconds. High-consumption appliances commonly used at dairy farms, such as milking robots, compressors, vacuum and water pumps, and cleaning equipment, are present in the dataset. The dataset is intended for research on industrial non-intrusive load monitoring, demand-response analysis, analysis of appliance usage, deriving consumption and time-of-use statistics, energy conservation advice, etc.
Data collection was performed by a GECKO project partner, Discovergy GmbH.

Requirements:
Available recordings are stored in .csv files and can be accessed using any CSV reader.

Gecko Background:
The dataset was used by ESR3 for a mini-project on deep learning-based non-intrusive load disaggregation on dairy farms to support the transition to sustainable farming. The transfer of trained deep learning models between different farms was also investigated. A paper resulting from this study was presented at the 2022 IEEE International Conference on Smart Computing (SMARTCOMP).

Link to paper: IEEE International Conference on Smart Computing (SMARTCOMP) 2022 Paper

A Weakly Supervised Active Learning Framework for Non-Intrusive Load Monitoring 

Keywords: Non-Intrusive Load Monitoring, Deep learning, Weak Supervision, Active learning 


Link to Dataset/Software: Github Repository
License: Creative Commons Attribution 4.0 International

Summary:
Training deep learning models in a supervised manner relies on the availability of large amounts of labelled data, which is often hard to collect. Collecting unlabelled data is often straightforward, but labelling is challenging because it requires time, expertise, or expensive equipment. Active learning approaches are designed to iteratively examine the available unlabelled data and extract only the most informative samples from it. Only the extracted samples are then labelled and used for incremental training of models, instead of labelling and using all available data, without compromising performance. This framework was designed as part of research on weakly supervised active learning for non-intrusive load monitoring, but it can be adjusted to other deep learning models for various kinds of tasks, not limited to non-intrusive load monitoring or weak labelling.
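The selection step of an active learning loop can be sketched as follows. The summary does not say which informativeness criterion the framework uses, so this example swaps in entropy-based uncertainty sampling, a common choice; the function names, window ids, and probabilities are hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def select_most_informative(pool_predictions, budget):
    """Return the ids of the `budget` unlabelled samples the model is least sure about."""
    ranked = sorted(
        pool_predictions,
        key=lambda sample_id: entropy(pool_predictions[sample_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs for an unlabelled pool: P(appliance off), P(appliance on).
pool_predictions = {
    "window_a": [0.98, 0.02],
    "window_b": [0.55, 0.45],
    "window_c": [0.80, 0.20],
    "window_d": [0.50, 0.50],
}
to_label = select_most_informative(pool_predictions, budget=2)
```

In a full loop, the selected windows would be labelled, added to the training set, and the model retrained before the next selection round.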

Requirements:
The provided implementation is written in Python. Different deep learning models can be embedded in the active learning loop. Note: the repository consists of two parts: a weakly supervised deep learning non-intrusive load monitoring model and an active learning framework. Only the active learning framework was created by GECKO ESR3.

Gecko Background:
The framework was used by ESR3 for a mini-project in collaboration with Giulia Tanoni, which investigated the use of weak instead of strong labels inside an active learning framework for multi-label classification in non-intrusive load monitoring. The resulting paper has been submitted to the journal Integrated Computer-Aided Engineering.

Co-design – Part 1: Workshops to explore current imaginaries behind smart home technologies development and use

Keywords: co-design, workshop, smart technology, imaginary, multi-stakeholder


Link to Dataset/Software: Download from Zenodo
License: Creative Commons Attribution 4.0 International

Summary:
This dataset represents the data collected during a series of independent in-person workshops with professionals developing smart technology, its early-adopters, and late/non-adopters. This data collection has been part of a PhD research project, but can be further explored by qualitative researchers interested in human factors, human-computer interaction, IoT adoption, and analysis of human relationships with technology.

Requirements:
The referred dataset is preferably used with qualitative methods, such as thematic analysis or grounded theory. The dataset is accessible using any word processing software (e.g., OpenOffice, Microsoft Word) and picture visualizing/processing software (e.g., Windows Photo Viewer, Mac Preview).

Gecko Background:
This qualitative dataset is the first part of ESR10’s study on co-designing smart home technologies. All data was collected and analyzed in four consecutive parts following a multi-stakeholder co-design process on responsible and just smart home technologies. The current dataset, Part 1, was important to characterize the dominant imaginaries shaping the design of and interactions with smart home technologies. The data collected during the subsequent parts of the referred study are also available at Zenodo.

Co-design – Part 2: Workshop with professionals, early-adopters, and late/non-adopters to design interventions for a more responsible and just future with smart home technologies

Keywords: co-design, workshop, smart technology, responsible research and innovation, design justice 


Link to Dataset/Software: Download from Zenodo
License: Creative Commons Attribution 4.0 International

Summary:
This dataset represents the data collected during a series of two in-person workshops: one with professionals developing smart technology and its early-adopters, and a second one with late/non-adopters of smart technology. The first workshop had four groups of participants and the second had three. The data is divided by group. This data collection has been part of a PhD research project but can be further explored by qualitative researchers interested in human factors, human-computer interaction, IoT adoption, and analysis of human relationships with technology.

Requirements:
The referred dataset is preferably used with qualitative methods, such as thematic analysis or grounded theory. The dataset is accessible using any word processing software (e.g., OpenOffice, Microsoft Word) and picture visualizing/processing software (e.g., Windows Photo Viewer, Mac Preview).

Gecko Background:
This qualitative dataset is part of ESR10’s study on co-designing smart home technologies. All data was collected and analyzed in four consecutive parts following a multi-stakeholder co-design process on responsible and just smart home technologies. The current dataset, Part 2, was important to understand how co-design methodologies can affect the imaginaries behind smart home technologies. The data collected during the previous and subsequent parts of the referred study are also available at Zenodo.

Co-design – Part 3: Focus group with professionals, early-adopters, and late/non-adopters to better detail each intervention designed, who would be involved, and how it would happen

Keywords: co-design, focus group, smart technology, systemic change, transition


Link to Dataset/Software: Download from Zenodo
License: Creative Commons Attribution 4.0 International

Summary:
This dataset represents the data collected during a series of one-on-one online semi-structured interviews with professionals developing smart technology, its early-adopters, and late/non-adopters. The interviews were conducted via Microsoft Teams. This data collection has been part of a PhD research project but can be further explored by qualitative researchers interested in human factors, human-computer interaction, IoT adoption, and analysis of human relationships with technology.

Requirements:
The referred dataset is preferably used with qualitative methods, such as thematic analysis or grounded theory. The dataset is accessible using any word processing software (e.g., OpenOffice, Microsoft Word).

Gecko Background:
This qualitative dataset is part of ESR10’s study on co-designing smart home technologies. All data was collected and analyzed in four consecutive parts following a multi-stakeholder co-design process on responsible and just smart home technologies. The current dataset, Part 4, was important to understand what the implications of co-design for the development of responsible and just smart home technologies would be. The data collected during the previous and subsequent parts of the referred study are also available at Zenodo.

Smart meter electricity of a Household in Germany with Electric Vehicle Charging Annotation 

Keywords: NILM, EV, energy disaggregation, transportation, three-phase


Link to Dataset/Software: University of Strathclyde portal
License: Creative Commons Attribution 4.0 International

Summary:
This dataset contains energy consumption data of a single household in Germany with an electric vehicle charger, covering a period of 1 year, timestamped and sampled at 1-minute intervals. It is intended for research into energy conservation and advanced energy services, ranging from non-intrusive appliance load monitoring, demand response measures, tailored energy and retrofit advice, appliance usage analysis, and consumption and time-use statistics to smart home/building automation. Data collection was performed by a GECKO project partner, Discovergy GmbH.

Requirements:
Available recordings are stored in .csv files and can be accessed using any CSV reader.

Gecko Background:
The dataset was used by ESR2 for deep learning-based non-intrusive load disaggregation in the residential sector to support the transition to electrified transportation. Transfer learning with deep learning models was used to demonstrate that EV load can be disaggregated from unseen households. In addition, the dataset was used for lifecycle assessment of electric vehicles across different parts of the UK and Europe, based on actual energy data and end-users’ practices. Two papers were published: one in Energies (MDPI) and one in Transportation Research Procedia (Elsevier).

References:
Vavouris, A., Stankovic, L., & Stankovic, V. (2023). Integration of drivers’ routines into lifecycle assessment of electric vehicles. Transportation Research Procedia, 70, 322-329. https://doi.org/10.1016/j.trpro.2023.11.036

Vavouris, A., Stankovic, L., Stankovic, V., & Shi, J. (2022). Benefits of three-phase metering for load disaggregation. In Proceedings of the 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (pp. 393-397). (BuildSys ’22). Association for Computing Machinery (ACM). https://doi.org/10.1145/3563357.3566149

Appliance Phase Identification on ECO Dataset

Keywords: NILM, energy disaggregation, three-phase  


Link to Dataset/Software: University of Strathclyde portal
License: Creative Commons Attribution 4.0 International

Summary:
Supplementary material to the ECO dataset [1], indicating the power phase to which each appliance is connected.
[1] Christian Beckel, Wilhelm Kleiminger, Romano Cicchetti, Thorsten Staake, and Silvia Santini. 2014. The ECO data set and the performance of non-intrusive load monitoring algorithms. In Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys ’14). Association for Computing Machinery, Memphis, Tennessee, 80–89. ISBN: 9781450331449. DOI: 10.1145/2674061.2674064

Requirements:
Available recordings are stored in .csv files and can be accessed using any CSV reader.

Gecko Background:
The dataset was used by ESR2 for deep learning-based non-intrusive load disaggregation for the residential sector to increase the performance of ML methods exploiting the information of three-phase loads. The dataset augments a pre-existing public dataset with additional information. One paper was published in ACM BuildSys’22.

References:
Vavouris, A., Stankovic, L., Stankovic, V., & Shi, J. (2022). Benefits of three-phase metering for load disaggregation. In Proceedings of the 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (pp. 393-397). (BuildSys ’22). Association for Computing Machinery (ACM). https://doi.org/10.1145/3563357.3566149