Slow Model Performance Over Large Forecast Datasets

Hey Everyone,
I’m working on rebuilding an old Excel system for forecasting production at an industrial mineral plant, and I was hoping to get some advice on optimizing model performance.
The mine engineers output a data file for what they anticipate the site can produce in terms of tons and quality. It’s then brought into the model, where that output file is translated (all simple equations) into human-readable values. Where I’m running into problems is that these forecast sheets have daily data per machine, so there end up being some very large DataLink tables. I’ve increased the RAM allotment to 8GB (usage doesn’t appear to go higher than 3GB), and it looks like Quantrix is fully utilizing all 8 threads available to it on the i7 I’m running.
Does anyone else work on large data sets and have any advice on maximizing performance?
The two main equations it’s running are:

  1. a weighted average rolling daily machine production up to a total site value (a sum for tons, a weighted average for quality data)
  2. a weighted standard deviation for quality data

It runs those two equations to roll production up to both standard-week and monthly Quantrix Timelines.
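
For reference, these are the standard forms of the two statistics, with \(t_i\) the tonnage and \(q_i\) the sampled quality of machine \(i\) (a sketch of the math, not necessarily the exact denominators the model uses):

```latex
\bar{q}_w = \frac{\sum_i t_i \, q_i}{\sum_i t_i}
\qquad
\sigma_w = \sqrt{\frac{\sum_i t_i \, (q_i - \bar{q}_w)^2}{\sum_i t_i}}
```

(That is the biased, population form of the weighted standard deviation; a frequency-corrected denominator is also common.)
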
I’d attach the model file, but forward looking mineral production statements tend to be very sensitive and I don’t have legal clearance to post it.
It runs quickly and normally with the smaller-scale data sets I’ve used in the past; this is primarily a large-data-set optimization problem.

Edit:

I’ve added a picture of the equations; they’re weighted averages and standard deviations (there’s also a sum calculation for total tons, but that’s simple and performant). The data comes into two tables in Quantrix, which list tons per machine, per day, per site (TonSource) and chemical quality per machine, per day, per site (StatSource). The calculation matrix is linked to the source tables through categories that auto-generate based on the input (the “using as” statements). The SelectBetween statements relate the source tables’ date columns to the Timeline category stored in the Calendar matrix (Start and End are timelinedate() and Start + 7 respectively, for the weekly “Other” ranges).
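
Quantrix aside, here is a minimal pandas sketch of that same weekly roll-up, just to make the computation concrete. The column names (site, machine, date, tons, quality) and the sample numbers are hypothetical, not from the real model:

```python
import numpy as np
import pandas as pd

# Hypothetical daily source rows: tons and sampled quality per site/machine/day,
# standing in for the TonSource and StatSource tables.
daily = pd.DataFrame({
    "site":    ["A"] * 4,
    "machine": ["M1", "M1", "M2", "M2"],
    "date":    pd.to_datetime(["2019-05-01", "2019-05-02"] * 2),
    "tons":    [900.0, 850.0, 100.0, 120.0],
    "quality": [2.0, 2.1, 6.0, 5.8],
})

def weighted_rollup(g: pd.DataFrame) -> pd.Series:
    w, q = g["tons"], g["quality"]
    mean = np.average(q, weights=w)               # tonnage-weighted average quality
    var = np.average((q - mean) ** 2, weights=w)  # biased weighted variance
    return pd.Series({"tons": w.sum(), "w_avg": mean, "w_sd": np.sqrt(var)})

# Bucket the daily rows into weekly ranges, playing the role of the
# SelectBetween against the Calendar matrix's [Start, Start + 7) Timeline
# ranges, then roll up per site.
weekly = daily.groupby(["site", pd.Grouper(key="date", freq="W")]).apply(weighted_rollup)
print(weekly)
```
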

I’m also attaching this reference on weighted standard deviations, since they aren’t the most common thing out there. We’re using weighted values because a given amount of production tonnage has a sampled chemical quality, and when the tonnage outputs of several machines are aggregated, the quality values need to be weighted by each machine’s relative tonnage, since the machines don’t produce at equal rates.
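
A quick illustration with made-up numbers: if machine 1 produces 900 t at a 2.0% quality value and machine 2 produces 100 t at 6.0%, an unweighted mean reports 4.0%, but the actual blend grades out at the tonnage-weighted value:

```latex
\bar{q}_w = \frac{900 \times 2.0 + 100 \times 6.0}{900 + 100} = 2.4\%
```
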

immeralcode Answered question May 6, 2019

Thanks for posting the formulas. Those are impressive. Nice work! I don’t think they should be overly burdensome to your calculation. One thing you mentioned was:

“The calculation matrix is linked to the source tables through categories that auto-generate based on the input (the ‘using as’ statements).”
Do you mean that you are using Dynamic Category Generation for these categories?  

If so, I have seen some significant performance problems when using this feature on LARGE data sets.

braddo Posted new comment July 20, 2020

That is correct: some categories, notably Products and Qualities, dynamically generate based on the input file. There’s currently a debate about expanding from two mineral product classifications to three, and potentially adding items to categories. I was using the dynamic generation to try to future-proof the model so I wasn’t adjusting it while things kick back and forth in those discussions.
