I’m working on redoing an old Excel system for forecasting production at an industrial mineral plant and I was hoping to get some advice on optimizing model performance.
The mine engineers output a data file for what they anticipate the site can produce in terms of tons and quality, its then brought into the model where that output file is translated (all simple equations) into human readable values. Where I’m running into problems is that these forecast sheets have daily data per machine, and there ends up being some very large DataLink tables. I’ve increased the RAM allotment to 8GB (it doesn’t appear to go higher than 3GB) and it looks like Quantrix is fully utilizing all 8 threads available to it on the i7 I’m running.
Does anyone else work on large data sets and have any advice on maximizing performance?
The two main equations its running are:
- a weighted average of the daily machine production to a total site value (sum for tons, w-avg for quality data)
- a weighted standard deviation for quality data
It runs those two equations to roll up production to both a standard week and monthly Quantrix Timelines.
I’d attach the model file, but forward looking mineral production statements tend to be very sensitive and I don’t have legal clearance to post it.
It appears to run quickly and normally with the smaller scale data sets I’ve used in the past, its primarily an optimizing for large data sets problem.
I’ve added a picture of the equations, they’re weighted averages and standard deviations (there’s also a sum calculation for total tons but that’s simple and performant). The data comes into two tables in Quantrix which list tons per machine per day per site (for TonSource), and chemical quality per machine, per day, per site (StatSource). The calculation matrix is linked to the source tables through categories which auto generate based on the input (the using as statements). The Selectbetween statements are to relate the source tables’ date columns to the Timeline category stored in the Calendar matrix (Start and End are timelinedate() and start+7 respectively to for weekly “Other” ranges).
I’m also attaching this on weighted standard deviations since these aren’t the most common thing out there. We’re using weighted values because a given amount of production tonnage has a sampled chemical quality and when the tonnage outputs of several machines are aggregated together, they need to be based on the relative tonnage of each machine since they don’t produce in equal rates.
I’m putting this in an answer because I’m not seeing any of my comments actually posting to your answer thread.
That is correct I did use dynamic categories for Products and Qualities due to some flux about potentially expanding to the third product and adding more quality parameters to sample analysis. I was attempting to future proof the application a bit by making those dynamic to the imported data table, but I’ll try configuring a model without those to see if there’s any improvement.
Thanks for posting the formulas. Those are impressive. Nice Work! I don’t think they should be overly burdensome to your calculation. One thing you mentioned was
” The calculation matrix is linked to the source tables through categories which auto generate based on the input (the using as statements).”
Do you mean that you are using Dynamic Category Generation for these categories?
If so I have seen some significant performance problems while using this feature on LARGE data sets.
Notice: compact(): Undefined variable: groupby in /home/quantrixwebsite/public_html/answers/wp-includes/class-wp-comment-query.php on line 853
I have worked with some large models. I recommend you get more RAM on your machine. I currently have a machine with 32GB of RAM and I adjust Quantrix to have full access to all of it, if necessary. I am curious to see the formulas for your 2 main equations, can you post a image of them?