|
Dear All,
I had a 'hanging' Linode which did not trigger Lassie. Without external service monitoring this is troublesome: the hang goes undetected, so it might take a while until it is reported or noticed.
If it were possible to analyze the Linode's CPU load, network and I/O graph data for anomalies (e.g. >95% or <5%) over a sample period (e.g. 30 mins), it would be great to get an e-mail from the Linode Manager, or alternatively to have it initiate the Lassie reboot as well.
In my case of a 'hanging' system, the I/O dropped from a steady 0.1 blocks/s to zero, the outgoing traffic dropped from avg 25 kB to 0.4 kB, and the CPU load from avg 8% to 4%. I wish I had a trigger on the I/O monitoring data.
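To illustrate the kind of check I have in mind, here is a rough Python sketch. It is only an assumption of how such a trigger might look, not how the Linode Manager works: the function names, the 5% ratio and the idea of feeding it a list of recent graph samples are all made up for this example.

```python
from statistics import mean


def is_anomalous(samples, low=5.0, high=95.0):
    """Hypothetical check: True if every sample in the window stays
    below `low` or every sample stays above `high` (e.g. CPU percent)."""
    if not samples:
        return False
    return all(s < low for s in samples) or all(s > high for s in samples)


def dropped_from_baseline(baseline, window, ratio=0.05):
    """Hypothetical check: True if the average of the recent window fell
    below `ratio` times the average of the earlier baseline samples."""
    if not baseline or not window:
        return False
    base = mean(baseline)
    return base > 0 and mean(window) < ratio * base


# Values taken from the hang described above, one sample per 5 minutes
# (the sampling interval is an assumption).
io_alert = dropped_from_baseline([0.1] * 6, [0.0] * 6)    # I/O in blocks/s
net_alert = dropped_from_baseline([25.0] * 6, [0.4] * 6)  # outgoing traffic
cpu_alert = is_anomalous([4.0] * 6)                       # CPU load in percent
```

With the numbers from my hang, the I/O and traffic drops would both be flagged relative to their earlier averages, and the CPU load would be flagged by the fixed <5% threshold. A real implementation would need care to avoid false alarms on Linodes that are simply idle.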
I am not sure if this is feasible; it might involve some tricky coding. The benefits for uptime and continuous operation would be huge, and it would help us avoid the cost and complexity of clustering.
Although this is rare (I haven't had such a case in a long time), chances are that fellow Linode users experience similar effects once in a while.
Thank you for your consideration, Frank
|