[Repost] tf.train.MonitoredTrainingSession() explained

Original address: https://blog.csdn.net/mrr1ght/article/details/81006343. This article has been abridged.

MonitoredTrainingSession definition

As the name suggests, tf.train.MonitoredTrainingSession() creates a session for monitored training. Its return value is an instance of the tf.train.MonitoredSession class, which is covered below.

MonitoredTrainingSession(
    master='',
    is_chief=True,
    checkpoint_dir=None,
    scaffold=None,
    hooks=None,
    chief_only_hooks=None,
    save_checkpoint_secs=600,
    save_summaries_steps=USE_DEFAULT,
    save_summaries_secs=USE_DEFAULT,
    config=None,
    stop_grace_period_secs=120,
    log_step_count_steps=100)
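The signature takes both hooks and chief_only_hooks, and which of them are active depends on is_chief. The pure-Python sketch below illustrates that rule.

Hook and effective_hooks are hypothetical names, not TensorFlow APIs. Putting chief-only hooks first is an assumption of this sketch. The default checkpoint/summary saving configured by the save_* arguments is not modeled here.

```python
from typing import List, Optional


class Hook:
    """Hypothetical stand-in for tf.train.SessionRunHook; it only carries a name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f"Hook({self.name!r})"


def effective_hooks(is_chief: bool,
                    hooks: Optional[List[Hook]] = None,
                    chief_only_hooks: Optional[List[Hook]] = None) -> List[Hook]:
    """Return the hooks that would be active on one worker (hypothetical helper)."""
    active: List[Hook] = []
    if is_chief:
        # chief_only_hooks are activated only on the chief ...
        active.extend(chief_only_hooks or [])
    # ... while ordinary hooks run on every worker.
    active.extend(hooks or [])
    return active


nan_hook = Hook("nan_check")
save_hook = Hook("checkpoint")
print(effective_hooks(True, [nan_hook], [save_hook]))   # [Hook('checkpoint'), Hook('nan_check')]
print(effective_hooks(False, [nan_hook], [save_hook]))  # [Hook('nan_check')]
```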

Args:

  • is_chief: Used in distributed training to indicate whether this worker is the chief. If True, it is responsible for initializing and restoring the underlying TensorFlow session. If False, it waits for the chief to initialize or restore the TensorFlow session.
  • checkpoint_dir: A string. Optional path to a directory from which variables are restored from checkpoint files.
  • scaffold: A Scaffold used to gather or build supporting ops. If not specified, a default scaffold is created. It is used to finalize the graph.
  • hooks: An optional list of SessionRunHook objects. You can define your own SessionRunHook objects or use predefined ones, for example tf.train.StopAtStepHook() to set a stopping condition, or tf.train.NanTensorHook(loss) to stop training if loss becomes NaN.
  • chief_only_hooks: A list of SessionRunHook objects. These hooks are activated only if is_chief == True; otherwise they are ignored.
  • save_checkpoint_secs: How often, in seconds, the default checkpoint saver saves a checkpoint. If save_checkpoint_secs is set to None, no checkpoints are saved.
  • save_summaries_steps: How often, in global steps, the default summary saver writes summaries to disk. If both save_summaries_steps and save_summaries_secs are set to None, the default summary saver is not used. The default is 100.
  • save_summaries_secs: How often, in seconds, the default summary saver writes summaries to disk. If both save_summaries_steps and save_summaries_secs are set to None, no default summaries are saved. Not enabled by default.
  • config: An instance of tf.ConfigProto used to configure the session. It is passed as the config argument of the tf.Session constructor.
  • stop_grace_period_secs: Number of seconds given to threads to stop after close() has been called.
  • log_step_count_steps: The frequency, in global steps, at which the global steps/sec rate is logged.
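Three arguments describe a trigger that fires either every N seconds or every N global steps:

  • save_checkpoint_secs
  • save_summaries_steps
  • save_summaries_secs

The class below is a minimal sketch of such a trigger. In this simplified rule the first call always fires. EveryNSecsOrSteps is a hypothetical name. TensorFlow has its own tf.train.SecondOrStepTimer, whose exact behavior may differ.

```python
import time
from typing import Optional


class EveryNSecsOrSteps:
    """Hypothetical simplified trigger: fires every `every_secs` seconds or
    every `every_steps` global steps (not both). Both None means disabled."""

    def __init__(self, every_secs: Optional[float] = None,
                 every_steps: Optional[int] = None) -> None:
        if every_secs is not None and every_steps is not None:
            raise ValueError("set either every_secs or every_steps, not both")
        self.every_secs = every_secs
        self.every_steps = every_steps
        self._last_step: Optional[int] = None
        self._last_time: float = 0.0

    def should_trigger(self, global_step: int, now: Optional[float] = None) -> bool:
        if self.every_secs is None and self.every_steps is None:
            return False  # like passing None for both save_* arguments
        now = time.time() if now is None else now
        if self._last_step is None:
            fire = True  # in this sketch the first call always fires
        elif self.every_steps is not None:
            fire = global_step >= self._last_step + self.every_steps
        else:
            fire = now >= self._last_time + self.every_secs
        if fire:
            self._last_step, self._last_time = global_step, now
        return fire


# Default summary schedule described above: every 100 global steps.
summaries = EveryNSecsOrSteps(every_steps=100)
print([s for s in range(350) if summaries.should_trigger(s)])  # [0, 100, 200, 300]

# A checkpoint schedule of save_checkpoint_secs=600, driven by a fake clock.
checkpoints = EveryNSecsOrSteps(every_secs=600)
print(checkpoints.should_trigger(0, now=0.0),      # True (first call)
      checkpoints.should_trigger(50, now=599.0),   # False
      checkpoints.should_trigger(51, now=600.0))   # True
```

Passing an explicit now makes the time-based branch deterministic, which is handy for testing.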

Returns: a MonitoredSession instance.

tf.train.MonitoredSession() usage example

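import tensorflow as tf  # TensorFlow 1.x API (tf.compat.v1 in TensorFlow 2)

# Constructor arguments are elided ("..."); fill them in for your model.
saver_hook = tf.train.CheckpointSaverHook(...)
summary_hook = tf.train.SummarySaverHook(...)
with tf.train.MonitoredSession(session_creator=tf.train.ChiefSessionCreator(...),
                               hooks=[saver_hook, summary_hook]) as sess:
    while not sess.should_stop():  # a hook or exhausted input can end the loop
        sess.run(train_op)  # train_op is assumed to be defined elsewhere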
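To see why the `while not sess.should_stop()` loop terminates, here is a toy, framework-free sketch:

  • A hook in the spirit of tf.train.StopAtStepHook calls request_stop() on a run context.
  • After that, should_stop() returns True.

RunContext, StopAtStep and ToyMonitoredSession are hypothetical stand-ins, not TensorFlow classes.

```python
class RunContext:
    """Hypothetical stand-in for a run context: lets a hook ask the loop to stop."""

    def __init__(self) -> None:
        self.stop_requested = False

    def request_stop(self) -> None:
        self.stop_requested = True


class StopAtStep:
    """Hypothetical hook in the spirit of tf.train.StopAtStepHook(last_step=...)."""

    def __init__(self, last_step: int) -> None:
        self.last_step = last_step

    def after_run(self, run_context: RunContext, global_step: int) -> None:
        if global_step >= self.last_step:
            run_context.request_stop()


class ToyMonitoredSession:
    """Toy session: run() does one step, bumps the global step and notifies hooks."""

    def __init__(self, hooks) -> None:
        self.hooks = hooks
        self.global_step = 0
        self._stop = False

    def __enter__(self) -> "ToyMonitoredSession":
        return self

    def __exit__(self, *exc) -> bool:
        return False  # do not swallow exceptions in this toy

    def should_stop(self) -> bool:
        return self._stop

    def run(self, train_op) -> None:
        train_op()  # the "training step"
        self.global_step += 1
        ctx = RunContext()
        for hook in self.hooks:
            hook.after_run(ctx, self.global_step)
        self._stop = self._stop or ctx.stop_requested


steps_run = []
with ToyMonitoredSession(hooks=[StopAtStep(last_step=5)]) as sess:
    while not sess.should_stop():
        sess.run(lambda: steps_run.append(sess.global_step))
print(len(steps_run))  # 5
```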

Args:

  • session_creator: A factory object used to create the session, for example a ChiefSessionCreator.
  • hooks: A list of tf.train.SessionRunHook instances.
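Passing a session creator, rather than a ready-made session, lets the session be rebuilt on demand. That is what makes the recovery step in the Run sequence below possible. The sketch illustrates the idea in plain Python.

CountingSessionCreator, RecoveringSession and the two error classes are hypothetical stand-ins. TensorFlow's session creators do expose a create_session() method, but the retry logic here is a simplification.

```python
class AbortedError(Exception):
    """Hypothetical stand-in for tf.errors.AbortedError."""


class UnavailableError(Exception):
    """Hypothetical stand-in for tf.errors.UnavailableError."""


class CountingSessionCreator:
    """Hypothetical session creator: each create_session() call returns a new session id."""

    def __init__(self) -> None:
        self.created = 0

    def create_session(self) -> int:
        self.created += 1
        return self.created


class RecoveringSession:
    """Keeps the creator so a broken session can be replaced and the call retried."""

    def __init__(self, session_creator: CountingSessionCreator) -> None:
        self._creator = session_creator
        self._sess = session_creator.create_session()

    def run(self, step):
        while True:  # retries without limit; a real system might cap this
            try:
                return step(self._sess)
            except (AbortedError, UnavailableError):
                # Discard the broken session, build a fresh one, then retry.
                self._sess = self._creator.create_session()


def flaky_step(session_id: int) -> str:
    """Fails on the first session, as if a remote worker had restarted."""
    if session_id == 1:
        raise AbortedError("session aborted")
    return f"ran on session {session_id}"


creator = CountingSessionCreator()
sess = RecoveringSession(creator)
print(sess.run(flaky_step))  # ran on session 2
print(creator.created)       # 2
```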

Returns: a MonitoredSession instance.

 

  • Initialization: When a MonitoredSession is created, it does the following, in order:

    • Calls the begin() method of each hook in the hooks list
    • Finalizes the graph via scaffold.finalize()
    • Creates the session
    • Initializes the model with the initialization ops provided by the Scaffold
    • If checkpoint_dir contains a checkpoint, restores the variables from it
    • Starts the queue runner threads
    • Calls hook.after_create_session()
  • Run: When run() is called, it does the following, in order:

    • Calls hook.before_run()
    • Calls TensorFlow's session.run() with the fetches and feed_dict merged from the caller and the hooks (this is where tf.Session().run(fetches, feed_dict) is actually called)
    • Calls hook.after_run()
    • Returns the session.run() results the caller asked for
    • If an AbortedError or UnavailableError occurs, recovers or reinitializes the session before running run() again
  • Close: When close() is called, it does the following, in order:

    • Calls hook.end()
    • Closes the queue runner threads and the session
    • If the MonitoredSession is used as a context manager (a with block), suppresses the OutOfRange error, which signals that all inputs have been processed.
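The initialization, run and close sequences above can be traced with a toy, framework-free sketch. TraceHook, ToySession and fake_run are hypothetical. The sketch models only three things:

  • the order in which hooks are called
  • the merging of hook fetches with the caller's fetches into a single run call
  • suppression of an OutOfRange-style error when the with block exits

Graph finalization, initialization/restore and queue runners are omitted. In this sketch, end() is called on every exit.

```python
from typing import Any, Callable, Dict, List


class OutOfRangeError(Exception):
    """Hypothetical stand-in for tf.errors.OutOfRangeError (input exhausted)."""


class TraceHook:
    """Hypothetical hook that logs each lifecycle call and requests one extra fetch."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name, self.log = name, log

    def begin(self) -> None:
        self.log.append(f"{self.name}.begin")

    def after_create_session(self) -> None:
        self.log.append(f"{self.name}.after_create_session")

    def before_run(self) -> Dict[str, str]:
        self.log.append(f"{self.name}.before_run")
        return {self.name: "loss"}  # extra tensor this hook wants fetched

    def after_run(self, outputs: Dict[str, Any]) -> None:
        self.log.append(f"{self.name}.after_run({outputs[self.name]})")

    def end(self) -> None:
        self.log.append(f"{self.name}.end")


class ToySession:
    """Toy model of the call order; not TensorFlow's implementation."""

    def __init__(self, hooks: List[TraceHook],
                 run_fn: Callable[[Dict[str, str]], Dict[str, Any]]) -> None:
        self.hooks, self.run_fn = hooks, run_fn
        for h in hooks:
            h.begin()
        # Graph finalization, session creation, init/restore and queue
        # runners would happen here; they are omitted in this sketch.
        for h in hooks:
            h.after_create_session()

    def run(self, fetches: str) -> Any:
        merged = {"caller": fetches}
        for h in self.hooks:
            merged.update(h.before_run())  # combine hook fetches with the caller's
        outputs = self.run_fn(merged)      # a single underlying run call
        for h in self.hooks:
            h.after_run(outputs)
        return outputs["caller"]           # hand back only what the caller asked for

    def close(self) -> None:
        for h in self.hooks:
            h.end()

    def __enter__(self) -> "ToySession":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        # Returning True swallows OutOfRangeError (input exhausted); any other
        # exception propagates.
        return exc_type is not None and issubclass(exc_type, OutOfRangeError)


def fake_run(fetches: Dict[str, str]) -> Dict[str, str]:
    """Pretend to evaluate every requested fetch."""
    return {key: f"value_of_{name}" for key, name in fetches.items()}


log: List[str] = []
with ToySession([TraceHook("a", log), TraceHook("b", log)], fake_run) as sess:
    print(sess.run("train_op"))  # value_of_train_op
    raise OutOfRangeError("input exhausted")  # swallowed by __exit__
print(log)
```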


MARSGGBO Original

2019-10-21 11:23:38

Origin www.cnblogs.com/marsggbo/p/11712591.html