So, it was day in the hackshop and our programming sysadmin was trying to fix a little display problem when showing thermometer data.

So, there are thermometers in both Fridges, Freezers, heating systems and others.
For alerts, you set a level (trigger level) and a time (count of samples) and simply trigger if the last X have been above the threshold level.

Not a big problem, simple, elegant, linear speed.

But when you show data, for example the "current temp, and this days max temp", and not working "real time" you need to look at a whole data-set of points, find the max, and show it. This isn't a difficult operation O(n log n) with a suitable sort operation.

Now, in real life, fridges and freezers have thermostats that cause temperatures to show up more like triangle waves than anything else. This is easy to deal with (you simply don't deal with it, see how easy!).

Then there are things like "automatic defrost".

Automatic defrost means that every X hours, the temperature in a freezer will spike up to ~+1 degree, and then down to -20 again, in a short time span ( ~35 minutes from start to end ). This then prevents frost from forming as badly as normally.

This also means that the day current/max temperatures _always_ are stuck at -20,+1 . A big gap, and not at all representing reality. Or well, representing, sort of how Westrbo Baptist Church represents Christianity.

So, we need to show only those high points, that have been high for a period of time T.

So, "max" for the day is the highest temperature V that has been at this point, or HIGHER for T samples. It's not enough that it's been T samples _total_ for the day, but they need to be consecutive.

The graphical Method

=================
This is unknown in complexity, but could work. It's a good visualization, but I haven't gotten this to actually work.

* You draw your graph
* You draw a line along the max value.
* You move your line down,
* You find all intersections
* You trace the distance between your intersections.
* When your distance between intersections becomes > T, and the graph stays above it, you have found your first match.

The Mold in the Hills Method

=======================
* Pick a Threshold
* Start walking the list until you reach the threshold.
* Mark this point, walk forwards until you reach the next
* if this point is > T points away, you have a hill.
* Destroy all points T as you walk forwards to the next hill.
* Once you only have hills left, increase your threshold
* Keep destroying all data until you have no hills >T wide left. Done!

The window to the World Method

===========================
This is a bit quadric bruteforce version.

Point a window at the first T values.
* Sort a copy of this window
* Take the min & max values. The Min is your new highscore.
* re-center your window to start one step right of the Min value.
* sort your window
* compare min with your highscore, if higher, move one step right of your min, and store highscore.
* If your max is lower than your highscore, jump T units ahead.
* if your min is lower than your highscore, jump to the value closest to your highscore.

This method will take a ton of copies, and lots of sorting of your world.

The "I like lists" Method

===================
Aka. I have no idea what I'm thinking anymore method. (never tried, might work, pigeonhole-principle )

* Prepare a copy of the first T values.
* find the min of that copy, that is your highscore.
* Center your world on the highscore ( skipping values forwards )
* check if your lowest point is still the highscore
* * if so, skip to one right of highscore, start looking forwards.
* if not, re-center your area around the lowest.
* once your lowest is higher than highscore, you have a jackpot and move on.

* walk forwards, keeping copies of the last T values.

Dataset

======
So, if you want to try your hand at an elegant solution to this problem, please be my guest. A dataset can be downloaded from us.

Comments

So if I'm not mistaken, you need a data structure with the capabilities "insert datapoint", "remove datapoint", and "find min" (or max)? Then you'd just use this with a "rolling window" (insert item x+T, remove item x-1, see if item x equals the current min)? I'm pretty sure a balanced search tree would do it. Probably a skip list too. In both cases, the expected behaviour should be something like O(n log T).

By the way, I make no claims about real-world efficiency/relevance of the above. I just treated it like a cute theory problem. :-)

That's not a bad solution, though it means at least one linear search, and then a parse through the tree + compare, it might still work a lot better than the methods shown above (Which are all theoretically quadric)

Thinking more about this data-structure. In a dual-linked queue, a insert is O(1) and delete is O(1) but find min/max is n log n or so, meaning that we totally end up with a quadric equation. Is there really a data-structure that both allows fast inserts&deletes _and_ has easy to access min&max values? All the ones I'm trying sort of end up with nasty effects..

Like I said, either self-balancing search trees (AVL trees or red-black trees) or skip lists. Neither exists natively in Python, though they surely exist as libraries. Skip lists are said to have easier code. (The TreeSet/TreeMap classes in Java are implemented as red-black trees, apparently.) In all of these, insert, delete, min, and max are all O(log n) time. But yes, I was also surprised at how heavy artillery was needed just for that combination (insert, delete, min/max).

What about using signal processing and maybe a library like numpy? It seems you only need to filter out your defrosting pattern. Your max on a sample of time also looks like a filtering problem (a convolution of the signal with an appropriate window). Maybe it is time to reopen a signal processing book? :)

@Magnus: Well, doing a full state iteration ( pos 0 => pos N) and for every step doing insert, pop, max ( a python list has an n-log-n min/max/sort operation as well for this input data (that's almost always in-order or in reverse order. )

@Jul Yes, it seems like what's needed seems to be a low-pass filter with a frequency band of T time units. So yeah, looks as if it's time to head to the library for a solution. Never thought I'd do FFT on temperature signals....

You're just taking the maximum of the N-lookback rolling minimum across the time series. Rolling min/max can be computed for the series in O(n) time and O(k) space using the ascending minima algorithm. http://richardhartersworld.com/cri/2001/slidingmin.html

Just to clarify my variables, n is the length of the series, and k is the length of the window.

Stuff samples into array with T elements, create a min-heap (size T) of the array values. Each new sample replaces the oldest sample in the array and also updates the heap while at the same time checking if the saved max value needs to be updated. Total time complexity: O(n log T). Memory used: O(T). I hacked up something here: https://github.com/art4711/maxminsample

Seems pretty trivial, doable in O(N*T) worst case if I'm not mistaken. The challenge as stated is essentially "find the slice of length T that has the highest minimum". I'd start by looking at slice s1, namely the first T values from the start of the list. Find the minimum of that slice. The next candidate slice s2 starts just to the right of the minimum element of s1, and so on. (in other words: we know we can skip forward until s1's minimum value falls out of the slice. As long as it's in the mix we can't improve the minimum.)