A mildly shocking discovery we made about Java's ThreadPoolExecutor: our large training cluster would somehow never fill up with one specific kind of training task. The culprit was eventually traced to the ThreadPoolExecutor, which gracefully queued up all the tasks. I had always thought new threads would be created whenever the queue was not empty and the number of threads was below maxPoolSize. Obviously, I had not read the fine print in the JavaDoc. It describes several queuing strategies: Direct Handoffs (never queue, always create a new thread until the max is reached), Unbounded Queues (always queue once corePoolSize threads are busy, so the pool never reaches max threads) and Bounded Queues (queue up to a certain size, then start creating threads until max threads is reached, then reject).
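To make the unbounded-queue behaviour concrete, here is a minimal sketch (not our production code). The class name, the helper method and the pool sizes are made up for illustration; note that the JDK constructor calls the upper limit maximumPoolSize. Every task blocks on a latch, so none can finish while we look at the pool:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class UnboundedQueueDemo {

    /** Submits `tasks` blocking tasks and returns {pool size, queue size}. */
    static int[] poolAndQueueSize(int core, int max, int tasks) throws InterruptedException {
        // Unbounded LinkedBlockingQueue: offer() always succeeds, so the
        // executor never has a reason to grow beyond `core` threads.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return new int[] { pool.getPoolSize(), pool.getQueue().size() };
        } finally {
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = poolAndQueueSize(2, 8, 10);
        // prints "pool size = 2, queued = 8"
        System.out.println("pool size = " + r[0] + ", queued = " + r[1]);
    }
}
```

With 2 core threads and a limit of 8, ten blocked tasks leave the pool at 2 threads with 8 tasks waiting, which is exactly the "cluster never fills up" symptom.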
What I had assumed, a combination of Direct Handoffs and an Unbounded Queue (use as many threads as needed until maxPoolSize is reached, then queue without bound), was obviously not one of them, even though I would think it is the preferable strategy in most cases. Anyway, we implemented it using a customized LinkedBlockingQueue.
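For the curious, one way such a customized queue can look is sketched below. This is an assumption about the general technique, not a copy of what we deployed: the names EagerThreadPoolDemo, EagerQueue and newEagerPool are hypothetical. The idea is that offer() refuses tasks while the pool can still grow, which pushes the executor into starting a new thread, and a rejection handler puts the task into the queue if that attempt fails.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class EagerThreadPoolDemo {

    /** Hypothetical queue that declines tasks while the pool can still grow. */
    static final class EagerQueue extends LinkedBlockingQueue<Runnable> {
        private transient volatile ThreadPoolExecutor executor; // set after the pool is built

        void setExecutor(ThreadPoolExecutor executor) {
            this.executor = executor;
        }

        @Override
        public boolean offer(Runnable task) {
            ThreadPoolExecutor pool = executor;
            if (pool != null && pool.getPoolSize() < pool.getMaximumPoolSize()) {
                return false; // execute() reacts by trying to start a new thread
            }
            return super.offer(task); // pool is at its limit: queue without bound
        }
    }

    static ThreadPoolExecutor newEagerPool(int core, int max) {
        EagerQueue queue = new EagerQueue();
        // If starting a thread fails (e.g. the pool hit the limit concurrently),
        // fall back to queuing instead of dropping the task.
        RejectedExecutionHandler enqueue = (task, pool) -> {
            if (pool.isShutdown()) {
                throw new RejectedExecutionException("executor is shut down");
            }
            try {
                pool.getQueue().put(task); // put() does not go through the overridden offer()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RejectedExecutionException(e);
            }
        };
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60L, TimeUnit.SECONDS, queue, enqueue);
        queue.setExecutor(pool);
        return pool;
    }

    /** Submits `tasks` blocking tasks and returns {pool size, queue size}. */
    static int[] poolAndQueueSize(int core, int max, int tasks) throws InterruptedException {
        ThreadPoolExecutor pool = newEagerPool(core, max);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return new int[] { pool.getPoolSize(), pool.getQueue().size() };
        } finally {
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = poolAndQueueSize(2, 8, 10);
        // prints "pool size = 8, queued = 2"
        System.out.println("pool size = " + r[0] + ", queued = " + r[1]);
    }
}
```

One caveat of this sketch: it starts a new thread even when an existing one is idle, as long as the pool is below its limit. A simpler alternative that may be good enough in many cases is to set corePoolSize equal to the maximum and call allowCoreThreadTimeOut(true) on a plain unbounded queue.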
Lesson learned: (as when dealing with the bank) always read the fine print ;).