Post Archive

2018

Curiosity-driven Exploration by Self-supervised Prediction

Reinforcement learning problems with sparse environment rewards often suffer from very inefficient learning. One way to improve this is to design an intrinsic reward that encourages the agent to explore the unknown parts of the environment, thereby raising its chance of success. The paper proposes the Intrinsic Curiosity Module (ICM) to compute this intrinsic reward. ICM has two parts: (1) a forward dynamics model, which learns to predict the next featurized state from the current state and the action taken, and (2) an inverse dynamics model, which learns to infer the action the agent took from two consecutive featurized states, thereby learning, in a self-supervised way, a representation of the raw state that keeps only what is actually relevant to the agent's behavior. Using the prediction error in the featurized space (rather than on the raw state) as the intrinsic reward prevents random or agent-irrelevant information from interfering with the curiosity reward that drives exploration.
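
A minimal sketch of the idea in PyTorch with toy dimensions; the layer sizes and names (STATE_DIM, FEAT_DIM, the single-layer encoder) are my own illustration, not the paper's reference implementation:

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, FEAT_DIM = 16, 4, 8  # toy sizes, not from the paper

class ICM(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: maps the raw state into a learned feature space.
        self.encoder = nn.Sequential(nn.Linear(STATE_DIM, FEAT_DIM), nn.ReLU())
        # Forward dynamics model: predicts phi(s') from phi(s) and the action.
        self.forward_model = nn.Linear(FEAT_DIM + N_ACTIONS, FEAT_DIM)
        # Inverse dynamics model: predicts the action from phi(s) and phi(s').
        self.inverse_model = nn.Linear(FEAT_DIM * 2, N_ACTIONS)

    def forward(self, s, s_next, a_onehot):
        phi, phi_next = self.encoder(s), self.encoder(s_next)
        pred_phi_next = self.forward_model(torch.cat([phi, a_onehot], dim=-1))
        action_logits = self.inverse_model(torch.cat([phi, phi_next], dim=-1))
        # Intrinsic reward: prediction error in the featurized space.
        intrinsic_reward = 0.5 * (pred_phi_next - phi_next).pow(2).sum(dim=-1)
        return intrinsic_reward, action_logits

# The forward loss (feature prediction error) and the inverse loss
# (cross-entropy on the taken action) are minimized jointly.
icm = ICM()
s, s_next = torch.randn(2, STATE_DIM), torch.randn(2, STATE_DIM)
a = torch.tensor([1, 3])
a_onehot = nn.functional.one_hot(a, N_ACTIONS).float()
reward, logits = icm(s, s_next, a_onehot)
loss = reward.mean() + nn.functional.cross_entropy(logits, a)
loss.backward()
```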

3 min read

Meta-Critic Networks for Sample Efficient Learning

Extending the actor-critic framework of reinforcement learning, a single meta-critic guides many different tasks of the same type, with the goal of training a meta-critic that can effectively guide new few-shot learning problems. For the meta-critic to tell different tasks apart, the learning traces must be encoded and fed to it as an additional input.

The meta-critic therefore has two parts: (1) a Meta-Value Network (MVN), which plays the role of the conventional value function, and (2) a Task-Action Encoder Network, which encodes the learning trace (the history of states, actions, and rewards) before passing it to the MVN. Experiments cover sine & linear function regression (supervised learning) and the cartpole control problem (reinforcement learning).
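
A rough sketch of the two networks in PyTorch with made-up dimensions; TaskActionEncoder and MetaValueNetwork below are illustrative stand-ins, not code from the paper:

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, TASK_EMB_DIM = 4, 1, 8  # toy sizes

class TaskActionEncoder(nn.Module):
    """Encodes a learning trace (state, action, reward history) into a task embedding."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(STATE_DIM + ACTION_DIM + 1, TASK_EMB_DIM, batch_first=True)

    def forward(self, trace):          # trace: (batch, steps, state + action + reward)
        _, h = self.rnn(trace)
        return h[-1]                   # (batch, TASK_EMB_DIM)

class MetaValueNetwork(nn.Module):
    """Scores a (state, action) pair conditioned on the task embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM + TASK_EMB_DIM, 32), nn.ReLU(),
            nn.Linear(32, 1))

    def forward(self, s, a, task_emb):
        return self.net(torch.cat([s, a, task_emb], dim=-1))

# The task embedding is what lets one shared critic tell related tasks apart.
encoder, mvn = TaskActionEncoder(), MetaValueNetwork()
trace = torch.randn(2, 10, STATE_DIM + ACTION_DIM + 1)  # a 10-step history per task
value = mvn(torch.randn(2, STATE_DIM), torch.randn(2, ACTION_DIM), encoder(trace))
```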

2 min read

Neural Architecture Search with RL

Train a Controller (an LSTM) to automatically design good neural network architectures. The Controller outputs the hyperparameters of a child model; the child model is trained on the data and its validation accuracy is computed, which is fed back to the Controller as the reward. The Controller updates itself with the policy gradient method from reinforcement learning, so that the architectures it generates next are better (i.e., the child networks achieve higher validation accuracy).
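
A toy REINFORCE-style sketch of the controller loop in PyTorch, assuming a tiny discrete search space and a dummy evaluate() stand-in for actually training a child model; this illustrates the idea and is not the paper's implementation:

```python
import torch
import torch.nn as nn

CHOICES = [16, 32, 64, 128]  # e.g. candidate hidden-layer widths
N_LAYERS = 3                 # the controller picks one width per layer

class Controller(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.cell = nn.LSTMCell(len(CHOICES), hidden)
        self.head = nn.Linear(hidden, len(CHOICES))

    def sample(self):
        h = torch.zeros(1, self.cell.hidden_size)
        c = torch.zeros(1, self.cell.hidden_size)
        x = torch.zeros(1, len(CHOICES))
        log_probs, arch = [], []
        for _ in range(N_LAYERS):
            h, c = self.cell(x, (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            idx = dist.sample()
            log_probs.append(dist.log_prob(idx))
            arch.append(CHOICES[idx.item()])
            x = nn.functional.one_hot(idx, len(CHOICES)).float()
        return arch, torch.stack(log_probs).sum()

def evaluate(arch):
    # Placeholder: train the child model described by `arch` and return its
    # validation accuracy; here it is just a dummy score for illustration.
    return sum(arch) / (max(CHOICES) * N_LAYERS)

controller = Controller()
opt = torch.optim.Adam(controller.parameters(), lr=1e-3)
arch, log_prob = controller.sample()
reward = evaluate(arch)
loss = -log_prob * reward  # REINFORCE: make good architectures more likely
opt.zero_grad(); loss.backward(); opt.step()
```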

1 min read

Deduction and Induction

Deduction and induction are both common methods of reasoning, but I always find it confusing to tell them apart. The Merriam-Webster dictionary website has an article explaining the differences between deduction, induction, and abduction.

~1 min read

Matrix Factorization with NN

Approaching the problem of matrix factorization from a neural network point of view is quite interesting. The idea is that, when ignoring biases and activation functions, a fully-connected network is just a series of successive linear transformations that can be expressed as successive matrix multiplications. So finding two matrices \(\mathbf{M}_1\) and \(\mathbf{M}_2\) such that \(\mathbf{M}_1 \mathbf{M}_2 = \mathbf{M}\) is analogous to training a one-hidden-layer network to produce a net transformation \(\mathbf{M}\). If training succeeds, the two weight matrices \(\mathbf{W}_1\) and \(\mathbf{W}_2\) are our (non-unique) answer.
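
As a quick illustration, here is a minimal toy sketch assuming a square matrix and a plain least-squares objective; since a bias-free, activation-free two-layer network is just the product of its weight matrices, the sketch optimizes the two factors directly. Sizes, optimizer, and step count are arbitrary choices.

```python
import torch

n = 8
M = torch.randn(n, n)                        # the matrix we want to factor
W1 = torch.randn(n, n, requires_grad=True)   # plays the role of the first layer
W2 = torch.randn(n, n, requires_grad=True)   # plays the role of the second layer
opt = torch.optim.Adam([W1, W2], lr=0.01)

for step in range(3000):
    # Drive the net transformation W1 @ W2 toward the target M.
    loss = ((W1 @ W2 - M) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final mean squared error: {((W1 @ W2 - M) ** 2).mean().item():.6f}")
```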

Exploring this as an exercise actually led me to another interesting question:

What is the difference between (1) a matrix that is randomly generated, and (2) a matrix that is the product of two matrices that are randomly generated?

~1 min read

Hosting a Jekyll Site on GitHub Pages

On August 6, 2018, I came back to the idea of using Jekyll to set up a site to host some of my notes. Both Jekyll and the so-simple-theme have evolved since I first used them in March 2017, so I decided to build the site again with the updated theme.

1 min read
Back to Top ↑

2017

Hello Jekyll

I have been thinking about setting up a homepage/blog, but have never decided how to do it. I have some requirements in mind but haven't found a simple, ideal solution. One of the difficult requirements is LaTeX support. A more abstract one is that I would like to keep the original notes simple, preferably in plain text.

1 min read
Back to Top ↑
