Case Study

Or: How one may come to love discovering horribly performing queries

Problem

Recently, a difficult case was presented. An administrator faced frustrated and impatient executives. Quarterly reports were unavailable to help the same make critical decisions for the business, a large Fortune 500 enterprise. When these reports were run, the server churned to a halt, and even reports that normally ran fast were impacted. Ultimately, the problem reports timed out without completion. Although other users were then able to run less intensive reports again, no amount of patience could grant the executives knowledge from the failed reports necessary to inform their decisions. The reporting system used new software, a front end provided by another large Fortune 500 company. Since the reports were all created by the third party company, the administrator knew little about their structure or design. The third party engaged but could not discover root cause nor suggest any workarounds to allow the quarterly reports to complete.

Identifying root cause

In the initial investigation it was confirmed there were no hardware problems on the server and the configuration appeared generally correct. No other obvious issues were discovered. Since the quarterly reports were already suspect, we captured a profiler trace from SSAS to see the underlying queries, and discovered there numerous queries belonging to the same report, but one in particular that actually caused the issue, failing to complete and running for a very long time. Because of this, no duration was reported in the trace, but using the query's Query Begin event, we were able to extract its MDX and parameter list.

Once the query was obtained, experimentation was done, but this first required de-parameterization. Parameters – expressed as <@ParameterName> in the captured query from the profiler trace – were replaced with their equivalent text as identified from the parameters XML in the trace. Then the de-parameterized version of the query was run in Management Studio. Clauses were systematically commented out, and ultimately one calculation was discovered as the culprit. When the calculation was substituted, simply returning 0, the query returned in a few seconds.

Gravity of the problem

Having identified the particular calculation in question, it was broken down to reveal that it had been significantly over-thought by the original developers at the third party company who created the reporting database as a vendor to the administrator experiencing the problem. The calculation was a morass. It called its own self recursively with each time period, for each successive period prior to itself in the time dimension of the database. Then with each recursively called period, the same calculation again recursively called each of that period's successive periods again. The complexity of this algorithm was not only exponential, but factorial! While a time dimension with only a few members might return relatively quickly, even a modest number of time periods would multiply execution time by hundreds or thousands of times. A slightly larger number of time periods as discovered, caused execution time of the calculation to exceed the age of the known universe, completion of which would tax the courage of even the most patient CEO.

Resolution

Fortunately, most calculations with factorial complexity do not actually require it and may therefore be redesigned with complexity much reduced. In this case, the complex calculation proved unnecessary. By reducing the summations being performed recursively in the excessively complex way, it became clear the purpose of the calculation was just to create a variation of a simple sum and difference of several underlying measure values, for each time period between the first period of its year and the current period (within the calculation's context). Essentially it was a Year-to-Date calculation, which a well-meaning developer at the vendor had tortured to death. Rewriting the query to perform a simple SUM() over the underlying base measures for this range of periods reduced complexity to linear. Even with a vast number of members, the time in which the calculation will execute will always be proportional to the number of time periods covered, and its performance will be very good (assuming some other new horrendous calculation is not invoked by one of the underlying measures of course).

In the end, the change seems small, compared to the improved productivity it afforded. The original calculation of factorial complexity:

member [Measures].[Queue Load] as
iif([Time].Currentmember is OpeningPeriod([Time].[(All)]),
	[Measures].[QL]
	,
	(
		([Time].Currentmember,[Measures].[QL]) +
		([Time].Currentmember.NextMember,[Measures].[Queue Load]) +
		([Time].Currentmember.NextMember,[Measures].[Outgoing]) -
		([Time].Currentmember.Nextmember,[Measures].[Incoming Adjusted])
	)
)


And the modified member of linear complexity:

member [Measures].[Queue Load] as
sum([Time].CurrentMember:OpeningPeriod([Time].[(All)]), [Measures].[QL])
	 + sum([Time].NextMember:OpeningPeriod([Time].[(All)]), [Measures].[Outgoing])
	 - sum([Time].NextMember:OpeningPeriod([Time].[(All)]), [Measures].[Incoming Adjusted])

Delivering the resolution

At this point in the investigation things were going very well. However, the de-parameterized test query on which experimentation occurred was still not ready for substitution back into the original report. The report was heavily parameterized, so the tedious process must be repeated in reverse, to substitute parameter values with their @ParameterName form back into the original query, to be used in the report. Or... one may use the SSAS Long Running Query Performance Helper as described in How to use.

How this tool can simplify such an investigation

The time extracting the problem query from the trace, de-parameterizing it, and re-parameterizing it subsequently to update the source report, would account for half an hour for someone experienced in this kind of investigation. For someone with less experience, it could take longer. The tedious task is error prone and frustrating, and simply not very enjoyable, so the tool automates it.

Last edited May 26, 2014 at 4:30 AM by jburchel, version 9