Fig. 6 | Scientific Data

Fig. 6

From: IDSEM, an invoices database of the Spanish electricity market

Pipeline for creating the database. The process relies on several dictionaries: customers’ names and surnames, marketers’ and distributors’ information, names of villages and streets, and financial institutions. It also uses several document templates that are initially configured from real invoices. The pipeline has three main steps: the first simulates the contents of the bills and stores the labels in JSON files; the second fills in the templates using the labels from the previous step; the third converts the results to PDF format. The output is the database, composed of a training directory and a test directory.
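
The three steps in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionaries, the HTML template, the field names, the file names, and the `build_database` helper are all hypothetical stand-ins, and the PDF conversion of step 3 is only marked with a comment because the caption does not say which tool performs it.

```python
import json
import random
import tempfile
from pathlib import Path
from string import Template

# Hypothetical miniature dictionaries; the real ones are not shown in the caption.
NAMES = ["Ana", "Luis", "Marta"]
SURNAMES = ["García", "López", "Ruiz"]
MARKETERS = ["Marketer A", "Marketer B"]
STREETS = ["Calle Mayor", "Avenida del Mar"]

# Hypothetical template; the real templates are configured from real invoices.
TEMPLATE = Template(
    "<html><body><h1>$marketer</h1>"
    "<p>$customer, $street</p><p>Total: $total EUR</p></body></html>"
)


def simulate_labels(rng):
    """Step 1: draw the contents of one bill from the dictionaries."""
    return {
        "customer": f"{rng.choice(NAMES)} {rng.choice(SURNAMES)}",
        "marketer": rng.choice(MARKETERS),
        "street": rng.choice(STREETS),
        "total": f"{rng.uniform(20, 200):.2f}",
    }


def build_database(root, n_train, n_test, seed=0):
    """Write labels and filled templates into training and test directories."""
    rng = random.Random(seed)
    root = Path(root)
    for split, count in (("training", n_train), ("test", n_test)):
        out = root / split
        out.mkdir(parents=True, exist_ok=True)
        for i in range(count):
            labels = simulate_labels(rng)
            # Step 1 (continued): store the labels as a JSON file.
            (out / f"invoice_{i}.json").write_text(
                json.dumps(labels, ensure_ascii=False, indent=2),
                encoding="utf-8",
            )
            # Step 2: fill the template with the labels.
            (out / f"invoice_{i}.html").write_text(
                TEMPLATE.substitute(labels), encoding="utf-8"
            )
            # Step 3 (not implemented here): convert the filled
            # template to PDF with a tool of your choice.
    return root
```

Calling `build_database(tempfile.mkdtemp(), n_train=3, n_test=2)` would produce a `training` directory with three label/document pairs and a `test` directory with two. Keeping each JSON file next to the document generated from it means the labels remain available as ground truth for every invoice.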
