New work by Turing Award winner Yoshua Bengio: "Generative Flow Networks" expand the frontiers of deep learning


Reprinted from: Heart of the Machine

Recently, a 70-page paper titled "GFlowNet Foundations," a new study by Turing Award winner Yoshua Bengio, has been attracting attention.

After Geoffrey Hinton's capsule networks, Bengio, another giant of deep learning, has now put forward his own ideas about the future direction of the AI field. In this work, the authors propose an important new concept: Generative Flow Networks (GFlowNets).


GFlowNets are inspired by the way information propagates in temporal-difference RL methods (Sutton and Barto, 2018). Both rely on a consistency principle for credit assignment, which is only achieved asymptotically as training converges. Since the number of paths through the state space grows exponentially, computing gradients exactly is intractable; both families of methods therefore rely on local consistency between components and on a training objective such that, if all learned components are locally consistent with one another, the resulting system performs correct global estimation.

As for what GFlowNets can do, Emmanuel Bengio, one of the paper's authors, offered some answers: "We can do many things with GFlowNets: perform general probabilistic operations on sets and graphs, such as handling otherwise intractable marginalization problems, estimating partition functions and free energies, computing conditional probabilities of supersets given a subset, estimating entropy, mutual information, and so on."


The paper provides a formal theoretical foundation and an expanded set of theoretical results for GFlowNets, along with a broader range of active-learning scenarios in which they apply. The properties of GFlowNets make them well suited to modeling and sampling from distributions over sets and graphs, to estimating free energies and marginal distributions, and to learning energy functions from data as a learnable, amortized alternative to Markov chain Monte Carlo (MCMC).

The key property of GFlowNets is that they learn a policy that samples a composite object s through a sequence of steps, such that the probability P_T(s) of sampling object s is approximately proportional to the value R(s) of a given reward function applied to that object. A typical example is training a generative model from a dataset of positive examples: the GFlowNet is trained to match a given energy function, converting it into a sampler, which we view as a generative policy because the composite object s is constructed through a sequence of steps. This resembles what MCMC methods achieve, except that GFlowNets do not require a lengthy stochastic search in object space, and so avoid the mode-mixing difficulty that MCMC methods face. GFlowNets handle this challenge by turning it into the amortized training of a generative policy.
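
As a toy illustration of this sampling target (my own sketch, not code from the paper; the bit-string objects and the reward function are hypothetical), the terminating distribution a trained GFlowNet should match is simply R(s)/Z over the composite objects:

```python
import itertools

# Toy composite objects: length-3 bit strings, built one bit at a time.
# A trained GFlowNet's terminating distribution should satisfy P_T(s) ≈ R(s) / Z.
def reward(s):
    # Hypothetical reward: prefer strings containing more ones.
    return 1.0 + 2.0 * s.count("1")

objects = ["".join(bits) for bits in itertools.product("01", repeat=3)]
Z = sum(reward(s) for s in objects)            # partition function
target = {s: reward(s) / Z for s in objects}   # distribution the GFlowNet is trained to match

for s, p in target.items():
    print(f"P_T({s}) = {p:.3f}")
```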

An important contribution of the paper is the notion of a conditional GFlowNet, which can be used to compute free energies over joint distributions on different kinds of objects, such as sets and graphs. Such marginalizations also allow estimating entropy, conditional entropy, and mutual information. GFlowNets can further be generalized to estimate multiple flows corresponding to richer outcomes (rather than a scalar reward function), analogous to distributional reinforcement learning.

The paper extends the theory of the original GFlowNet (Bengio et al., 2021) to include a formula for computing marginal probabilities over subsets of variables (or the corresponding free energies), which can now be applied to subsets or subgraphs of larger sets or graphs; the application of GFlowNets to estimating entropy and mutual information; and the introduction of an unsupervised form of GFlowNet (trained from observations only, with no reward function) and of GFlowNets that can sample from a Pareto frontier.

While basic GFlowNets are closer to bandit algorithms (since the reward is only provided at the end of a sequence of actions), they can be extended to take intermediate rewards into account and to sample according to returns. The original GFlowNet formulation was also limited to discrete and deterministic environments, and the paper suggests how to lift both of these restrictions. Finally, while the basic formulation assumes a given reward or energy function, the paper considers how the GFlowNet can be learned jointly with the energy function, opening the door to novel energy-based modeling approaches and to modular structures for the energy function and the GFlowNet.


Paper address: https://arxiv.org/pdf/2111.09266.pdf

Heart of the Machine gives a brief introduction to the main chapters of the paper below; for more details, please refer to the original paper.

GFlowNets: Learning Flows

The researchers consider the general problem introduced in Bengio et al. (2021), in which some constraints or preferences on the flows are given. The goal is to find estimators F̂(s) and P̂(s→s′|s) that best match the desired state flow function F(s) or transition probability function P(s→s′|s), even when the given constraints may not correspond to a proper flow. They therefore call this type of learning machine a Generative Flow Network (GFlowNet for short).

The paper gives the formal definition of a GFlowNet.


It is worth noting that the GFlowNet state space can easily be modified to accommodate an underlying state space in which transitions do not form a directed acyclic graph (DAG).

Estimating transition probabilities from terminal flows: in the setting of Bengio et al. (2021), the terminal flows correspond to a terminal reward function R that is a deterministic function of the state, with the terminating flow given by F(s→s_f) = R(s).


This makes it possible to extend the framework and handle random rewards in various ways.

GFlowNets can be used as an alternative to MCMC sampling. The GFlowNet approach amortizes the cost into an up-front computation that trains the generator, so that each new sample is then very cheap to produce (a single configuration is constructed, with no chain required).
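
The following sketch (my own, with `forward_policy` as a hypothetical stand-in for a trained network P̂_F) shows why sampling is cheap once training is done: a single configuration is built with one policy call per construction step, and no burn-in chain is needed.

```python
import random

def forward_policy(state):
    """Hypothetical stand-in for a trained P_F(s -> s' | s):
    returns the candidate actions and their probabilities."""
    actions = ["0", "1", "stop"] if state else ["0", "1"]
    p = 1.0 / len(actions)
    return actions, [p] * len(actions)

def sample_one(max_len=8):
    """Build a single composite object with one policy call per step: no Markov chain."""
    state = ""
    while len(state) < max_len:
        actions, probs = forward_policy(state)
        action = random.choices(actions, weights=probs)[0]
        if action == "stop":
            break
        state += action
    return state

print(sample_one())
```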

Flow matching and detailed balance losses. To train a GFlowNet, one needs a training procedure that implicitly enforces the desired constraints and preferences. The researchers convert the flow-matching or detailed-balance conditions into loss functions that can be minimized.
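
Here is a minimal sketch of a flow-matching loss, assuming a tiny hand-coded DAG with a table of learnable edge log-flows (a real GFlowNet would predict these with a neural network; the graph and rewards are hypothetical): at every interior state, total inflow must equal total outflow, the terminating edge carries flow equal to the reward, and the squared difference of the logs serves as the loss.

```python
import torch

# Tiny DAG: s0 -> {a, b} -> c; each of a, b, c may also terminate, with reward R.
R = {"a": 1.0, "b": 2.0, "c": 3.0}
edges = [("s0", "a"), ("s0", "b"), ("a", "c"), ("b", "c")]
log_F = {e: torch.zeros(1, requires_grad=True) for e in edges}  # learnable edge log-flows

def flow_matching_loss(eps=1e-8):
    """Flow-matching condition at every interior state: inflow == outflow,
    where the terminating edge carries flow F(s -> s_f) = R(s)."""
    loss = 0.0
    for s in ("a", "b", "c"):
        inflow = sum(log_F[(u, v)].exp() for (u, v) in edges if v == s)
        outflow = sum(log_F[(u, v)].exp() for (u, v) in edges if u == s) + torch.tensor(R[s])
        loss = loss + (torch.log(eps + inflow) - torch.log(eps + outflow)) ** 2
    return loss

optimizer = torch.optim.Adam(log_F.values(), lr=0.1)
for _ in range(500):
    optimizer.zero_grad()
    flow_matching_loss().backward()
    optimizer.step()

# With a perfect fit, the initial-state flow F(s0) equals Z = R(a) + R(b) + R(c) = 6.
print(sum(log_F[(u, v)].exp().item() for (u, v) in edges if u == "s0"))
```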

For the reward function, the researchers consider a setting in which the reward is stochastic rather than a deterministic function of the state. With a reward-matching loss like that of Equation 44, the effective target of the terminal flow F(s→s_f) is the expected reward E_R[R(s)], since this is the value that minimizes the expected loss over R(s) given s.


With a reward-matching loss like that of Equation 43, the effective target of the log terminal flow log F(s→s_f) is instead the expected log-reward E_R[log R(s)]. This shows that GFlowNets can be generalized to match stochastic rewards when a reward-matching loss is used.

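
A small numerical check of these two claims (my own sketch, not from the paper): for a fixed state s with a random reward R(s), the minimizer of a squared loss on F recovers E_R[R(s)], while a squared loss on log F recovers E_R[log R(s)].

```python
import numpy as np

rng = np.random.default_rng(0)
# Samples of a random reward R(s) for one fixed state s (hypothetical distribution).
R = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

# Squared loss on F itself (Eq. 44 style): the minimizer of E[(F - R)^2] is E[R].
F_linear = R.mean()

# Squared loss on log F (Eq. 43 style): the minimizer of E[(log F - log R)^2] is exp(E[log R]).
F_log = np.exp(np.log(R).mean())

print(F_linear, np.exp(0.5))  # E[R] for a lognormal(0, 1) is exp(1/2)
print(F_log, 1.0)             # exp(E[log R]) = exp(0) = 1
```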

Furthermore, GFlowNets can be trained offline, just like offline reinforcement learning. Regarding direct credit assignment in GFlowNets, the researchers note that sampling trajectories with a GFlowNet can be viewed as sampling a sequence of states from a stochastic recurrent neural network. A complication is that this kind of network does not directly output a prediction to be matched against a target, and the states may be discrete (or a mix of discrete and continuous).

Conditional flows and free energies

This chapter focuses on conditional flows and free energies.

A salient property of flows is that the normalization constant Z can be recovered from the initial-state flow F(s_0) when the detailed-balance or flow-matching conditions are satisfied (Corollary 3). Z is also the partition function associated with a given terminal reward function R specifying the terminal transition flows. A figure in the paper illustrates how a GFlowNet can be conditioned: given a state s, a new set of flows (shown on the right of the figure) is created from the original flows and transition flows (on the left).
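
As a concrete illustration of recovering Z from the initial-state flow (a toy sketch on a tree-shaped state space, not the paper's code; the reward is the same hypothetical one used above): pushing the terminal rewards backward with F(s) equal to the sum of its children's flows gives the state flow, and the flow at the root equals the partition function Z.

```python
from itertools import product

# Toy tree of states: prefixes of length-3 bit strings; only complete strings receive reward.
def reward(s):
    return 1.0 + 2.0 * s.count("1")   # hypothetical reward

def state_flow(prefix, max_len=3):
    """Exact state flow by backward recursion on a tree:
    F(s) = R(s) if s is complete, else the sum of its children's flows."""
    if len(prefix) == max_len:
        return reward(prefix)
    return sum(state_flow(prefix + b, max_len) for b in "01")

Z = sum(reward("".join(bits)) for bits in product("01", repeat=3))
print(state_flow(""), Z)   # F(s_0) equals the partition function Z
```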


Free energy is a general formulation of marginalization operations (i.e., sums over very many terms) associated with an energy function. The researchers note that the ability to estimate free energies opens the door to interesting applications, for which the computationally expensive Markov chain Monte Carlo (MCMC) method has historically been the dominant approach.

The free energy of a state, F(s), is formally defined in the paper.


How can the free energy be estimated? The researchers consider a special case of the conditional GFlowNet that allows the network to estimate the free energy F(s): a conditional GFlowNet is trained whose conditioning input x is an earlier state s of the trajectory.

The state-conditional GFlowNet is defined accordingly in the paper, with F(s|s) defined as the conditional self-flow of the state.


The researchers note that energy-based models can be trained together with a GFlowNet. Specifically, the GFlowNet is trained to turn the energy function into the corresponding approximate sampler, so that the GFlowNet can serve as an alternative to MCMC sampling.
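
A minimal sketch of that interface (my own illustration; `energy` is a hypothetical user-supplied function): the energy E(x) is converted into a reward R(x) = exp(−E(x)), and the GFlowNet sampler is then trained so that its terminating distribution approximates the Boltzmann distribution P(x) ∝ exp(−E(x)), the same target an MCMC chain on E would sample from.

```python
import math
from itertools import product

def energy(x):
    # Hypothetical energy over length-3 bit strings: low energy when adjacent bits agree.
    return sum(abs(int(a) - int(b)) for a, b in zip(x, x[1:]))

def reward(x):
    # The reward a GFlowNet would be trained to match: R(x) = exp(-E(x)).
    return math.exp(-energy(x))

xs = ["".join(bits) for bits in product("01", repeat=3)]
Z = sum(reward(x) for x in xs)
boltzmann = {x: reward(x) / Z for x in xs}  # target distribution P(x) ∝ exp(-E(x))
print(boltzmann)
```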

In addition, GFlowNets can also be used for active learning. In the active-learning scheme used by Bengio et al. (2021), the GFlowNet is used to sample candidates x, and the reward R(x) is expected to be generally large because the GFlowNet samples approximately in proportion to R(x).

Multi-flows, Distributional GFlowNets, Unsupervised GFlowNets, and Pareto GFlowNets

Similar to distributional reinforcement learning, it is interesting to generalize GFlowNets to capture not only the expected value of the achievable terminal reward, but also other distributional statistics. More generally, a GFlowNet can be thought of as a family of flows, each of which models, in its own flow, a particular future environmental outcome of interest.

The paper then gives the definition of an outcome-conditioned GFlowNet.


In practice, GFlowNets can never be perfectly trained, so such outcome-conditioned GFlowNets should be regarded in the same way as goal-conditioned policies or upside-down RL in reinforcement learning. In the future, these outcome-conditioned GFlowNets could be extended to stochastic rewards or stochastic environments.

Furthermore, training an outcome-conditioned GFlowNet can only be done offline, since the conditioning input (such as the final return) may only be known after the trajectory has been sampled.
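
A sketch of why this is naturally an offline procedure (my own illustration, in the spirit of hindsight relabeling rather than the paper's exact algorithm; the policy and outcome functions are hypothetical): trajectories are collected first, the realized outcome is computed afterwards, and only then can each trajectory be stored as training data for the policy conditioned on that outcome.

```python
import random

def behavior_policy(state):
    # Hypothetical exploration policy used to collect trajectories.
    return random.choice("01")

def outcome(final_state):
    # The conditioning input, e.g. the final return, known only after the trajectory ends.
    return final_state.count("1")

replay = []
for _ in range(4):
    state = ""
    for _ in range(3):
        state += behavior_policy(state)
    y = outcome(state)           # observed only in hindsight
    replay.append((state, y))    # stored for offline, outcome-conditioned training
print(replay)
```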


For the full table of contents, please refer to the original paper.

