Automated Data Refresh Validation in Power BI — What, Why, and How

Using Power Automate to validate Power BI data refreshes



Generated with AI (Microsoft Designer)

Every BI team spends considerable time and effort on validating data. These efforts are justified and necessary at the beginning of a project or when a new data source is brought in. However, for an already established reporting solution, such efforts must be minimized so that the team can focus on other essential activities. One solution to reduce validation efforts is to automate the validation.

This article will present the concept of refresh assessment for Power BI in complex projects. We will also briefly discuss some manual data validation approaches, highlight their shortcomings, and explain why a BI team should consider automating data validation for their reporting solutions.


Before we talk about refresh assessment and automation, let’s look at some processes and problems that an analytics team has on their hands.

Data refresh and the dilemma of BI teams

In Power BI, data refresh applies to import or mixed mode datasets (rebranded as semantic models) and refers to updating the data stored in the dataset from the data sources. Depending on the specifics and requirements of the reporting solution, the data refresh can be scheduled or ad-hoc. In complex reporting solutions, it’s a common requirement that a subset of tables be refreshed on different dates in a month.

The data refresh process varies from team to team and depends on the overall data architecture and components involved. A generic data architecture looks like this: Power BI sources the data from a SQL Database (Azure, Synapse, on-prem, etc.) and may connect directly to files such as CSV, Excel, etc., hosted on OneDrive or SharePoint. Azure Data Factory moves data from the data lake to the SQL Database.

A generic data architecture (Image by author)

In an ideal world, this is where the story ends. On the day of a refresh, your pipelines ran perfectly, the data was refreshed in Power BI, the users didn’t complain, and everyone lived happily ever after. 🙂

Alas, things are messier in the real world:

  • maybe your pipelines failed or didn’t run completely, or
  • facts were loaded correctly, but the dimensions didn’t update or
  • you refreshed your Power BI, and data disappeared from some visuals or
  • the numbers don’t make any sense, or they’re wrong altogether

Numerous factors can go wrong on the day of refresh. Whatever the reason, it distracts the team from focusing on other important tasks. The trouble increases manifold if your data sources are unreliable and the data often changes unpredictably, for example, with manual files or a non-cooperative data source team.

In such scenarios, you can't refresh your Power BI dataset without validating the data and numbers before they reach Power BI. If you refresh first and only then discover the issues, it's already too late. At this juncture, the team has to address some crucial questions, such as:

  • Should the team focus on delivering new changes and features or validating data with each refresh?
  • How do we maintain users' trust in our reports without investing too much effort in data refresh and validation activities?

The solution to these questions is twofold: refresh assessment and automation.

Refresh Assessment

The refresh assessment aims to analyze the impact of data refresh on our Power BI reports. For example, assume we have a sales & marketing dataset that tracks multiple KPIs across the department. The dataset contains the correct data and shows the right KPI values. Let’s call it “ideal state A”.

Before the next refresh, we will assess the impact of the data refresh on our dataset. If the effect looks positive and as expected and the dataset will be in the “ideal state B” after the refresh, we will refresh the dataset. Otherwise, the team will investigate/validate the issues and data, and refresh activities will be paused until all the problems have been resolved. In other words, refresh assessment is validating your data before refreshing the Power BI dataset.

For complex reporting solutions, refresh assessment is a two-step approach. These two steps are a necessity rather than a choice, as each addresses a different part of the data validation process.

Multi-step refresh assessment approach (Image by Author)
  1. At the data source (DB or files): To confirm the availability of new and historical data, especially when unreliable or manual data sources are involved. These checks are performed to ensure all the tables have been loaded correctly and data is available for all the reporting dimensions. Basic data profiling and aggregation can achieve these goals with the help of an SQL query, a pivot table, etc.
  2. In Power BI: To confirm the correctness of KPI values because Power BI is where all your calculation and reporting logic is built. Checks at the source alone are insufficient and fail to show what the KPIs will look like after the refresh. This step may require a comparison of “now” (current data) with the “will-be” (data after refresh) state of the dataset.
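
The first step, profiling at the data source, can be sketched in a few lines of code. Below is a minimal illustration using Python's built-in sqlite3 module as a stand-in for the real database; the table name, column names, and thresholds are hypothetical, not from the actual solution:

```python
import sqlite3

# Hypothetical profiling check: confirm the freshly loaded fact table has a
# plausible row count and covers every expected reporting dimension before
# Power BI is refreshed. Names and thresholds are illustrative.
def profile_fact_table(conn, fact_table, dim_column, expected_dims, min_rows):
    cur = conn.cursor()
    row_count = cur.execute(f"SELECT COUNT(*) FROM {fact_table}").fetchone()[0]
    loaded_dims = {r[0] for r in cur.execute(
        f"SELECT DISTINCT {dim_column} FROM {fact_table}")}
    return {
        "row_count_ok": row_count >= min_rows,
        "missing_dims": sorted(set(expected_dims) - loaded_dims),
    }

# Demo with an in-memory database standing in for the real SQL source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                 [("EMEA", 100.0), ("APAC", 75.5)])
result = profile_fact_table(conn, "fact_sales", "region",
                            expected_dims=["APAC", "EMEA", "NA"], min_rows=2)
print(result)  # missing_dims flags that 'NA' has not been loaded
```

In practice, the same queries can live in a SQL script or a pivot table; the point is that the checks are mechanical and therefore automatable.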

Let’s look at a real-life example to understand the process and efforts better. We have a “Unified Growth Monitor” dataset and report that tracks several KPIs across the organization with the following notable characteristics:

  • 33 KPIs
  • 18 diverse data sources: Automated (on-prem DB, Synapse) and Manual files (CSV and Excel)
  • New data is available at different intervals: Monthly on various workdays (for example, on W1, W7, W14, etc.) and Quarterly for different KPIs
Unified Growth Monitor reporting solution (Image by author)

And yes, our data model is not a star schema. A star schema can’t support such a crazy reporting solution.

Refresh Process with Manual Refresh Assessment

The days when we refreshed our data were chaotic and required multiple team members to monitor, assess, and validate the pipelines and data. Before we talk about our refresh process, let’s lay down some important business and technical requirements:

  • All the KPIs from a data source refresh at the same time
  • Multiple data sources can be refreshed at the same time
  • A data source may be required to be refreshed more than once a month on demand

On the day of refresh, our Data Engineer keeps an eye on the pipelines and ensures that they have run successfully and completely. Once the data is available in the database, our Quality Analyst (QA) performs data profiling and aggregation checks to ensure that data is available in all the facts and dimensions. Additionally, the QA validates the manual files. In the best-case scenario, when no data issues are observed, our QA spends around an hour per data source on these checks.

Unified Growth Monitor: Manual refresh assessment (Image by author)

With a green light from the QA, our Power BI Developer (PBI Dev) performs the second part of the assessment. PBI Dev creates a replica of our production dataset and uploads the replica to the service. Then, he does a complete or selective table refresh on the replica. Once the data is refreshed, he validates the latest and historical KPI values. We refresh the production dataset if all the KPIs and data look positive. On average, our PBI Dev spends around an hour and a half per data source in the best-case scenario when no data issues are observed.
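
The "now" vs "will-be" comparison the PBI Dev performs on the replica can be thought of as a small diff with a tolerance. A minimal sketch, with hypothetical KPI names and an assumed 5% tolerance rather than the team's actual rules:

```python
# Hypothetical comparison of KPI values between the production dataset
# ("now") and the refreshed replica ("will-be"). KPI names and the 5%
# tolerance are illustrative assumptions.
def compare_kpis(now, will_be, tolerance=0.05):
    """Flag KPIs that vanished or whose relative change exceeds the tolerance."""
    flagged = {}
    for kpi, old in now.items():
        new = will_be.get(kpi)
        if new is None:
            flagged[kpi] = "missing after refresh"
        elif old and abs(new - old) / abs(old) > tolerance:
            flagged[kpi] = f"changed {100 * (new - old) / old:+.1f}%"
    return flagged

now = {"Revenue": 1_200_000, "Active Users": 58_000}
will_be = {"Revenue": 1_190_000}  # Active Users vanished, e.g. a bad dim load
print(compare_kpis(now, will_be))  # → {'Active Users': 'missing after refresh'}
```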

While a two-step refresh assessment process can prevent faulty data from reaching production most of the time, it has several shortcomings.

Why doesn’t a manual approach work?

A manual refresh assessment process struggles with all the challenges any other manual process has to deal with, such as being time-intensive with higher turn-around times, prone to errors, and rigid with a limited scope. Some additional challenges make a manual approach inefficient.

  • Validation of dependent data across different sources can be challenging: Suppose you have facts in a file and must use conformed dimensions from a database, or you must use alternate dimensions from a file while your facts are in the database.
  • Complex queries to aggregate data and calculate KPIs
  • Extra time and efforts to update the validation queries: Your data model will change with time; you may have a new dimension table, or the calculation of a KPI has evolved. The team has to spend extra effort to update and test the queries.
  • Complexity and turn-around time increase with the number of dimensions
  • Access restrictions or limitations against the data source: Maybe you don’t have direct access to the data source or have other limitations that prevent you from accessing the data source.

A practical solution to most challenges is to bring automation to your refresh assessment.

Automated Refresh Assessment

Before discussing the technical details of automating our refresh assessment, let’s examine how we benefited from automation in our “Unified Growth Monitor” reporting solution.

Refresh Process with Automated Refresh Assessment

Unified Growth Monitor: Automated refresh assessment (Image by author)

Our Data Engineer monitors our pipelines as usual. However, we have freed our QA to focus on other tasks unless we identify any issues with our data. Our PBI Dev triggers the Power Automate-based refresh assessment and reviews the differences between the "now" and "will-be" states of the dataset. If everything looks as expected, we refresh the production dataset.

All of it takes around 10 minutes per refresh. So whether it’s one data source or all 18 together, 10 minutes and we’re done. In the best-case scenario, of course. If there are any issues, then verbose highlighting rules assist our PBI Dev with identifying areas to investigate further.
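
The highlighting rules mentioned above can be imagined as simple threshold checks that map a KPI's deviation to a traffic-light status. A minimal sketch with assumed warn/fail thresholds (our actual rules are more verbose):

```python
# Hypothetical traffic-light "highlighting rule": map the relative deviation
# between the current ("now") and post-refresh ("will-be") value of a KPI to
# a status. The 2% warn and 10% fail thresholds are illustrative.
def highlight(now, will_be, warn=0.02, fail=0.10):
    if now == 0:
        return "OK" if will_be == 0 else "FAIL"
    deviation = abs(will_be - now) / abs(now)
    if deviation >= fail:
        return "FAIL"
    if deviation >= warn:
        return "WARN"
    return "OK"

print(highlight(100, 101))  # → OK   (1% deviation)
print(highlight(100, 105))  # → WARN (5% deviation)
print(highlight(100, 80))   # → FAIL (20% deviation)
```

In the report, such a status column drives conditional formatting so the PBI Dev can scan straight to the KPIs that need investigation.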

How do we automate refresh assessments?

In our scenario, automation builds on the fact that your Power BI dataset already contains all the relationships, calculations, and data needed for validation. Power BI is also a great reporting tool to visualize and highlight the validation results. Finally, Power Automate provides fantastic no-code/low-code automation possibilities with out-of-the-box support for Power BI.
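
For instance, the Power BI REST API exposes an executeQueries endpoint that runs a DAX query against a dataset, and Power Automate's "Run a query against a dataset" action builds on the same capability. The sketch below only constructs such a request rather than sending it; the dataset ID is a placeholder, the measure name is hypothetical, and a real call would additionally need an Azure AD bearer token:

```python
import json

# Build (but do not send) a request for the Power BI REST API's
# executeQueries endpoint. The dataset ID below is a placeholder and
# [Total Revenue] is a hypothetical measure; authentication is omitted.
def build_execute_queries_request(dataset_id, dax_query):
    url = (f"https://api.powerbi.com/v1.0/myorg/datasets/"
           f"{dataset_id}/executeQueries")
    body = {
        "queries": [{"query": dax_query}],
        "serializerSettings": {"includeNulls": True},
    }
    return url, json.dumps(body)

url, body = build_execute_queries_request(
    "00000000-0000-0000-0000-000000000000",
    'EVALUATE ROW("Revenue", [Total Revenue])',  # hypothetical measure
)
print(url)
```

Running the same DAX query against the production dataset and the refreshed replica yields the "now" and "will-be" values to compare.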

Automating refresh assessment is a four-step process:

Steps to automate refresh assessment (Image by author)

Each step consists of multiple actions, but that’s all for this article. So grab a coffee and head to the following article, where we will look at detailed steps to set up a simple automated refresh validation solution.

Conclusion

To maintain user trust, validating your data before refreshing your Power BI reports is critical. However, the team must balance data validation efforts against the focus on new deliveries. Most teams rely on a manual approach to validating their data refreshes, which comes with severe limitations and shortcomings. Automation can help teams save time and effort while improving the quality of their data refresh assessments.
