I have an SSAS cube that is scheduled to refresh every 30 minutes. It runs a single query on a remote server, pulls the results back to the Analysis Services server, and then redeploys the cube. The query takes around 30 seconds to run, and the entire job usually completes in under 1 minute.
Sometimes, though, the job takes a LOT longer than this. Every now and then it runs for around 2 hours and then fails. Throughout this period, performance on the source database is noticeably affected, e.g. reports start timing out. I can see the query in Activity Monitor on the database server, copy the SQL, paste it into another query window, run it, wait for it to complete, come back to Activity Monitor and see that the job's query is STILL running!
The remote job seems to stick at:
Task State = SUSPENDED;
Wait Type = IO_COMPLETION;
No locks.
Most of the time, nobody else is logged into the database when this happens.
I have tried the following:
- redeployed the cube;
- deleted the cube and redeployed it from scratch;
- changed the schedule of the job (a few times);
- tuned the query that executes;
- checked security, etc.
Here is an example of a typical log from this morning:
00:30:00 – 42 seconds
01:00:00 – 42 seconds
01:00:23 – job failed (but why was it run again 23 seconds after the last one started?)
01:30:00 – not run!
02:00:00 – not run!
02:30:00 – not run!
03:00:00 – 59 seconds
03:30:00 – 47 seconds
04:00:00 – 74 seconds
04:30:00 – job failed (no indication why or how long it took to fail, but I am guessing it took a while as the next two runs were missed)
05:00:00 – not run!
05:30:00 – not run!
06:00:00 – 83 seconds
06:30:00 – 46 seconds
07:00:00 – job failed (but no idea why and the next two runs were missed)
07:30:00 – not run!
08:00:00 – not run!
08:30:00 – 51 seconds
09:00:00 – 48 seconds
09:30:00 – 122 seconds
10:00:00 – 73 seconds
10:30:00 – 41 seconds
11:00:00 – 47 seconds
11:30:00 – 3,519 seconds!! (just long enough to miss the next schedule, but the job appears to have worked!)
12:00:00 – not run!
12:30:00 – 44 seconds
etc...
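
To sanity-check my reading of the log, here is a rough Python sketch of the arithmetic behind the "not run!" entries. It assumes the scheduler simply skips any 30-minute slot that comes due while the previous run is still active. That is my guess at the behaviour, not something I have confirmed, and `skipped_slots` is just a helper name I made up for this illustration.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=30)  # the job is scheduled every 30 minutes


def skipped_slots(start, duration_seconds):
    """Return the scheduled start times (HH:MM:SS) that fall while a run is still active.

    Assumption (not confirmed): a slot is skipped whenever the previous
    run has not finished by the time that slot comes due.
    """
    begin = datetime.strptime(start, "%H:%M:%S")
    end = begin + timedelta(seconds=duration_seconds)
    skipped = []
    slot = begin + INTERVAL
    while slot < end:
        skipped.append(slot.strftime("%H:%M:%S"))
        slot += INTERVAL
    return skipped


# The 11:30:00 run took 3,519 seconds, so it finished at 12:28:39.
print(skipped_slots("11:30:00", 3519))  # ['12:00:00']

# A normal run (122 seconds) does not overlap the next slot.
print(skipped_slots("09:30:00", 122))   # []
```

Under that assumption, the 3,519-second run accounts for the missed 12:00:00 slot and the 12:30:00 run going ahead as normal.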