From: stanislove
Date: 19.05.09 12:28
From: bastrakov (http://bastrakof.livejournal.com/)
Date: 20.05.09 11:44
7# Errors / Broken jobs
If a job fails, an entry is written to the alert log and a user trace file is created. Since we are all monitoring our alert log anyway (aren't we?!), monitoring job failures is relatively straightforward. The job is retried after 1 minute (if its execution interval was originally set to more than 1 minute), and Oracle then uses a binary exponential backoff algorithm, doubling the wait after each failure, until the backoff interval exceeds the original job submission interval.
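To make the retry timing concrete, here is a minimal Python sketch of the schedule as described above. It is only a model of that description, not Oracle's actual implementation: the function name `retry_delays` is made up for illustration, and it assumes that once the doubled backoff exceeds the job's own interval, retries simply happen at that interval.

```python
def retry_delays(interval_s, retries, first_retry_s=60):
    """Return the wait in seconds before each successive retry.

    Hypothetical model: the first retry comes after `first_retry_s`,
    each later wait doubles, and the wait never exceeds the job's
    own interval (assumption based on the description above).
    """
    delays = []
    backoff = first_retry_s
    for _ in range(retries):
        # Once the backoff exceeds the job interval, fall back to the interval.
        delays.append(min(backoff, interval_s))
        backoff *= 2
    return delays


# Hourly job: the wait doubles from 1 minute until it reaches the 1-hour interval.
print(retry_delays(3600, 8))  # [60, 120, 240, 480, 960, 1920, 3600, 3600]

# Job scheduled every 10 seconds: the interval is already below 1 minute,
# so under this model every retry comes after 10 seconds.
print(retry_delays(10, 3))    # [10, 10, 10]
```

Under this model, a job with a short interval gets no benefit from the backoff at all, because its interval is always smaller than the doubled wait.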
If a job fails 16 times in succession, it is marked as BROKEN and will no longer execute automatically. Whilst 16 times may sound fine, if your schedule is every 10 seconds, then all you need is 160 seconds' worth of problems and your job is dead. Some options for keeping jobs out of this state are:
* A second job that looks every 'n' minutes for jobs marked as broken with 16 failures, and tries to re-run them. (If a broken job can be run successfully, then it is no longer considered broken).
* Build your own binary exponential backoff routine into the job processing itself. When the job fails, double the execution interval using DBMS_JOB.CHANGE and then raise the error. In this way, you are giving the job more time to rectify itself.
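The second option can be sketched in Python as a small model of the control flow. Everything here is hypothetical: `BackoffJob` and its attributes are illustrative names, not an Oracle API. In a real job, the line that doubles the interval would be a call to DBMS_JOB.CHANGE from inside the job's PL/SQL, and the upper bound and the reset on success are assumptions added for the sketch.

```python
class BackoffJob:
    """Hypothetical stand-in for a scheduled job with its own backoff."""

    def __init__(self, action, base_interval_s, max_interval_s):
        self.action = action                  # the work the job performs
        self.base_interval_s = base_interval_s
        self.max_interval_s = max_interval_s  # assumed cap, not from the source
        self.interval_s = base_interval_s     # interval currently in effect

    def run(self):
        try:
            self.action()
        except Exception:
            # On failure, double the interval (in Oracle this is where
            # DBMS_JOB.CHANGE would be called), then re-raise so the
            # failure is still recorded by the scheduler.
            self.interval_s = min(self.interval_s * 2, self.max_interval_s)
            raise
        # Assumption: a successful run restores the original interval.
        self.interval_s = self.base_interval_s
```

Re-raising matters: swallowing the error would hide the failure from the alert log, while raising it after lengthening the interval keeps the failure visible and spaces out the next attempts.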
From: bastrakov (http://bastrakof.livejournal.com/)
Date: 20.05.09 13:11
If a job returns an error while Oracle is attempting to execute it, Oracle tries to execute it again. The first attempt is made after one minute, the second attempt after two minutes, the third after four minutes, and so on, with the interval doubling between each attempt. If the job fails 16 times, Oracle automatically marks the job as broken and no longer tries to execute it. However, between attempts, you have the opportunity to correct the problem that is preventing the job from running. This will not disturb the retry cycle, and Oracle will eventually attempt to run the job again.
When you submit a job it is considered not broken.
There are two ways a job can break:
* Oracle has failed to successfully execute the job after 16 attempts.
* You have marked the job as broken, using the procedure DBMS_JOB.BROKEN.
If a problem has caused a job to fail 16 times, Oracle marks the job as broken.