Natural Image Statistics: A Probabilistic Approach to Early Computational Vision

Outline of the visual system
In this chapter, we review very briefly the structure of the human visual system. This
exposition contains a large number of terms which are likely to be new for readers
who are not familiar with neuroscience. Only a few of them are needed later in this
book; they are given in italics for emphasis.
3.1 Neurons and firing rates
Neurons
The main information processing workload of the brain is carried by nerve cells, or neurons. Estimates of the number of neurons in the brain typically vary between 10^10 and 10^11. What distinguishes neurons from other cells are their
special information-processing capabilities. A neuron receives signals from other
neurons, processes them, and sends the result of that processing to other neurons. A
schematic diagram of a neuron is shown in Fig. 3.1, while a more realistic picture is
given in Fig. 3.2.
Axons How can such tiny cells send signals to other cells which may be far away?
Each neuron has one very long formation called an
axon which connects it to other
cells. Axons can be many centimeters or even a couple of metres long, so they can
reach from one place in the brain to almost any other. Axons have a sophisticated
biochemical machinery to transmit signals over such relatively long distances. The
machinery is based on a phenomenon called
action potential.
Action potentials An action potential is a very short (1 ms) electrical impulse
travelling via the axon of the neuron. Action potentials are illustrated in Fig. 3.3.
Due to their typical shape, action potentials are also called spikes. Action potentials
are fundamental to the information processing in neurons; they constitute the signals
by which the brain receives, analyzes, and conveys information.
Fig. 3.1: A schematic diagram of information-processing in a neuron. Flow of information is from
right to left.
Fig. 3.2: Neurons (thick bodies, some lettered), each with one axon (thicker line going up) for sending out the signal, and many dendrites (thinner lines) for receiving signals. Drawing by Santiago
Ram´on y Cajal in 1900.

Action potentials are all-or-none, in the sense that they always have the same
strength (a potential of about 100 mV) and shape. Thus, a key principle in brain function is that the meaning of a spike is not determined by what the spike is like (because they are all the same), but rather by where it is, i.e. which axon it is travelling
along, or equivalently, which neuron sent it. (Of course, the meaning also depends
on when the spike was fired.)

Fig. 3.3: An action potential is a wave of electrical potential which travels along the axon. It travels
quite fast, and is very short both in time and its spatial length (along the axon). The figure shows
the potentials in different parts of the axon soon after the neuron has emitted two action potentials.
Signal reception and processing At the receiving end, action potentials are input
to neurons via shorter formations called dendrites. Typically, an axon has many
branches, and each of them connects to a dendrite of another neuron. Thus, the axon
could be thought of as output wires along which the output signal of the neuron is
sent to other neurons; dendrites are input wires which receive the signal from other
neurons. The site where an axon meets a dendrite is called a synapse. The main cell
body, or soma, is often thought of as the main “processor” which does the actual
computation. However, an important part of the computation is already done in the
dendrites.
Firing rate The output of the neuron consists of a sequence of spikes emitted
(a “spike train”). To fully describe such a sequence, one should record the time
intervals between each successive spike. To simplify the situation, most research in
visual neuroscience has concentrated on the neurons’
firing rates, i.e. the number of
spikes “fired” (emitted) by a neuron per second. This gives a single scalar quantity
which characterizes the activity of the cell. Since it is these action potentials which
are transmitted to other cells, the firing rate can also be viewed as the “result” of the
computation performed by the neuron, in other words, its output.
Actually, most visual neurons are emitting spikes all the time, but with a relatively low frequency (of the order of 1 Hz). The “normal” firing rate of the neuron
when there is no specific stimulation is called the
spontaneous firing rate. When the
firing rate is increased from the spontaneous one, the neuron is said to be active or
activated.
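To make the definition concrete, here is a toy sketch (in Python with NumPy; the spike times and the baseline value are made up for illustration) of how a firing rate is obtained from a recorded spike train and compared with the spontaneous rate.

```python
import numpy as np

spike_times = np.array([0.05, 0.32, 0.33, 0.41, 0.78, 0.95])  # hypothetical spike times (seconds)
duration = 1.0                                                 # length of the recording window (s)

firing_rate = len(spike_times) / duration       # spikes per second (Hz), a single scalar output
spontaneous_rate = 1.0                          # hypothetical baseline, of the order of 1 Hz
is_activated = firing_rate > spontaneous_rate   # the cell is "active" when above the baseline
print(firing_rate, is_activated)
```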

Computation by the neuron How is information processed, i.e. how are the incoming signals integrated in the soma to form the outgoing signal? This question
is extremely complex and we can only give an extremely simplified exposition here.
A fundamental principle of neural computation is that the reception of a spike at a dendrite can either excite the receiving neuron (increase its firing rate) or inhibit it (decrease its firing rate), depending on the neuron from which the signal came.
Furthermore, depending on the dendrite and the synapse, some incoming signals
have a stronger tendency to excite or inhibit the neuron. Thus, a neuron can be
thought of as an elementary pattern-matching device: its firing rate is large when
it receives input from those neurons which excite it (strongly), and no input from
those neurons which inhibit it. A basic mathematical model for such an action is to
consider the firing rate as a linear combination of incoming signals; we will consider
linear models below.
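As a toy sketch of this pattern-matching idea (Python/NumPy; the weights and input rates are invented for illustration), the neuron's output can be modelled as a weighted combination of its incoming signals, with positive weights playing the role of excitation and negative weights the role of inhibition.

```python
import numpy as np

incoming_rates = np.array([5.0, 0.0, 2.0, 8.0])      # firing rates of presynaptic neurons (Hz)
synaptic_weights = np.array([0.5, -1.0, 0.2, -0.3])  # positive: excitatory, negative: inhibitory

# Elementary pattern matching: a linear combination of the inputs,
# clipped at zero because a firing rate cannot be negative.
output_rate = max(0.0, float(synaptic_weights @ incoming_rates))
print(output_rate)
```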
Thinking in terms of the original visual stimuli, it is often thought that a neuron
is active when the input contains a feature for which the neuron is specialized — but
this is a very gross simplification. Thus, for example, a hypothetical “grandmother
cell” is one that only fires when the brain perceives, or perhaps thinks of, the grandmother. Next we will consider the actual response properties of neurons in the visual system.
3.2 From the eye to the cortex
Figure 3.4 illustrates the earliest stages of the main visual pathway. Light enters
the eye, reaching the
retina. The retina is a curved, thin sheet of brain tissue that
grows out into the eye to provide the starting point for neural processing of visual
signals. The retina is covered by more than a hundred million photoreceptors,
which convert the light into an electric signal, i.e. neural activity.
From the photoreceptors, the signal is transmitted through a couple of neural
layers. The last of these retinal processing layers consists of ganglion cells, which send the output of the retina (in the form of action potentials) away from the eye using
their very long axons. The axons of the ganglion cells form the optic nerve. The
optic nerve transmits the visual signals to the lateral geniculate nucleus (LGN) of
the thalamus. The thalamus is a structure in the middle of the brain through which
most sensory signals pass on their way from the sensory organs to the main sensory
processing areas in the brain.
From the LGN the signal goes to various other destinations, the most important
being the visual cortex at the back of the head, where most of the visual processing
is performed. Cortex, or cerebral cortex to be more precise, means here the surface
of the two cerebral hemispheres, also called the “grey matter”. Most of the neurons
associated with sensory or cognitive processing are located in the cortex. The rest of
the cerebral cortex consists mainly of axons connecting cortical neurons with each
other, or the “white matter”.

The visual cortex contains some 1/5 of the total cortical area in humans, which
reflects the importance of visual processing to us. It consists of a number of distinct
areas. The primary visual cortex, or V1 for short, is the area to which most of the
retinal output first arrives. It is the most widely-studied visual area, and also the
main focus in this book.

Fig. 3.4: The main visual pathway in the human brain.
3.3 Linear models of visual neurons
3.3.1 Responses to visual stimulation
How to make sense of the bewildering network of neurons processing visual information in the brain? Much of visual neuroscience has been concerned with measuring the firing rates of cells as a function of some properties of a visual input. For
example, an experiment might run as follows: An image is suddenly projected onto a
(previously blank) screen that an animal is watching, and the number of spikes fired
by some recorded cell in the next second is counted. By systematically changing
some properties of the stimulus and monitoring the elicited response, one can make
a quantitative model of the response of the neuron. An example is shown in Fig. 3.5.
Such a model mathematically describes the response (firing rate) r_j of a neuron as a function of the stimulus I(x,y).
In the early visual system, the response of a typical neuron depends only on
the intensity pattern of a very small part of the visual field. This area, where light
increments or decrements can elicit increased firing rates, is called the (classical)
receptive field (RF) of the neuron. More generally, the concept also refers to the
particular light pattern that yields the maximum response.

Fig. 3.5: A caricature of a typical experiment. A dark bar on a white background is flashed onto
the screen, and action potentials are recorded from a neuron. Varying the orientation of the bar
yields varying responses. Counting the number of spikes elicited within a fixed time window following the stimulus, and plotting these counts as a function of bar orientation, one can construct a
mathematical model of the response of the neuron.
So, what kind of light patterns actually elicit the strongest responses? This of
course varies from neuron to neuron. One thing that most cells have in common is
that they don’t respond to a static image which consists of a uniform surface. They
respond to stimuli in which there is some change, either temporally or spatially;
such change is called
contrast in vision science.
The retinal ganglion cells as well as cells in the lateral geniculate nucleus typically have a circular center-surround receptive field structure: some neurons are excited by light in a small circular area of the visual field, but inhibited by light in a
surrounding annulus. Other cells show the opposite effect, responding maximally to
light that fills the surround but not the center. This is depicted in Figure 3.6a).
3.3.2 Simple cells and linear models
The cells that we are modelling are mainly in the primary visual cortex (V1). Cells
in V1 have more interesting receptive fields than those in the retina or LGN. The so-called simple cells typically have adjacent elongated (instead of concentric circular)
regions of excitation and inhibition. This means that these cells respond maximally
to
oriented image structure. This is illustrated in Figure 3.6b).
Fig. 3.6: Typical classical receptive fields of neurons early in the visual pathway. Plus signs denote regions of the visual field where light causes excitation, minus signs regions where light inhibits responses. a) Retinal ganglion and LGN neurons typically exhibit center-surround receptive-field organization, in one of two arrangements. b) The majority of simple cells in V1, on the other hand, have oriented receptive fields.

Linear models are the ubiquitous workhorses of science and engineering. They are also the simplest successful neuron models of the visual system. A linear model for a visual neuron¹ means that the response of a neuron is modelled by a weighted
sum of the image intensities, as in
r_j = \sum_{x,y} W_j(x,y) I(x,y) + r_0,    (3.1)
where W_j(x,y) contains the pattern of excitation and inhibition for light for the neuron j in question. The constant r_0 is the spontaneous firing rate. We can define the spontaneous firing rate to be the baseline (zero) by subtracting it from the firing rate:

\tilde{r}_j = r_j - r_0,    (3.2)

which will be done in all that follows.
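As a minimal sketch of Equations (3.1) and (3.2) (Python/NumPy; the receptive field, stimulus, and spontaneous rate are all hypothetical values), the linear response is just a pixel-wise weighted sum of the image.

```python
import numpy as np

def linear_response(W, I, r0=0.0):
    """Eq. (3.1): r_j = sum_{x,y} W_j(x,y) I(x,y) + r0."""
    return float(np.sum(W * I)) + r0

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 32))   # hypothetical weight pattern W_j(x,y)
I = rng.standard_normal((32, 32))   # hypothetical stimulus I(x,y)
r0 = 1.0                            # spontaneous firing rate

r = linear_response(W, I, r0)       # Eq. (3.1)
r_tilde = r - r0                    # Eq. (3.2): response with the baseline subtracted
print(r, r_tilde)
```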
Linear receptive-field models can be estimated from visual neurons by employing a method called reverse correlation. In this method, a linear receptive field is estimated so that the mean square error between the estimated r_j in Equation (3.1) and the actual firing rate is minimized, where the mean is taken over a large set of visual stimuli. The name “reverse correlation” comes from the fact that the general solution to this problem involves the computation of the time-correlation of stimulus and firing rate. However, the solution is simplified when temporally and spatially uncorrelated (“white noise”, see Section 4.6.4) sequences are used as visual stimuli: in this case, the optimal W_j is obtained by computing the average stimulus over those stimuli which elicited a spike. Examples of estimated receptive fields are shown in Fig. 3.7.
¹ Note that there are two different kinds of models one could develop for a visual neuron. First, one can model the output (firing rate) as a function of the input stimulus, which is what we do here. Alternatively, one could model the output as a function of the direct inputs to the cell, i.e. the rates of action potentials received in its dendrites. This latter approach is more general, because it can be applied to any neuron in the brain. However, it is not usually used in vision research because it does not tell us much about the function of the visual system unless we already know the response properties of those neurons whose firing rates are input to the neuron via dendrites, and just finding those cells whose axons connect to a given neuron is technically very difficult.
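The white-noise special case of reverse correlation can be sketched as a spike-triggered average, as below (Python/NumPy; the simulated cell, stimulus ensemble, and spike generation are hypothetical stand-ins for real recordings).

```python
import numpy as np

rng = np.random.default_rng(1)
size, n_stimuli = 16, 20000
W_true = rng.standard_normal((size, size))      # hypothetical "true" receptive field

sta = np.zeros((size, size))                    # running spike-triggered sum
n_spikes = 0
for _ in range(n_stimuli):
    I = rng.standard_normal((size, size))       # white-noise stimulus frame
    rate = max(0.0, float(np.sum(W_true * I)))  # rectified linear response
    spikes = rng.poisson(rate)                  # simulated spike count for this frame
    sta += spikes * I
    n_spikes += spikes

W_est = sta / max(n_spikes, 1)                  # average stimulus over spikes ~ W_true (up to scale)
print(np.corrcoef(W_est.ravel(), W_true.ravel())[0, 1])
```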

Fig. 3.7: Receptive fields of simple cells estimated by reverse correlation based on single-cell
recordings in a macaque monkey. Courtesy of Dario Ringach, UCLA.
3.3.3 Gabor models and selectivities of simple cells
How can we describe the receptive field of simple cells in mathematical terms? Typically, this is based on modelling the receptive fields by Gabor functions, reviewed in
Section 2.4.2. A Gabor function consists of an oscillatory sinusoidal function which
generates the alternation between the excitatory and inhibitory (“white/black”) areas, and a Gaussian “envelope” function which determines the spatial size of the receptive field. In fact, when comparing the receptive fields in Fig. 3.7 with the Gabor functions in Fig. 2.12b)-c), it seems obvious that Gabor functions provide a reasonable model for the receptive fields.
Using a Gabor function, the receptive field is reduced to a small number of parameters (a small code sketch parameterizing such a receptive field is given below):
- Orientation of the oscillation
- Frequency of oscillation
- Phase of the oscillation
- Width of the envelope (in the direction of the oscillation)
- Length of the envelope (in the direction orthogonal to the oscillation). The ratio of the length to the width is called the aspect ratio.
- The location in the image (on the retina)
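The sketch below (Python/NumPy; all parameter values are arbitrary and only meant as an illustration) constructs a receptive field from exactly these parameters, as a sinusoidal oscillation multiplied by a Gaussian envelope.

```python
import numpy as np

def gabor_rf(size, freq, orientation, phase, width, length, x0=0.0, y0=0.0):
    """Gabor receptive field: a sinusoid times an elongated Gaussian envelope."""
    coords = np.arange(size) - size / 2
    X, Y = np.meshgrid(coords, coords)
    # Rotate coordinates: u runs along the oscillation, v orthogonal to it.
    u = (X - x0) * np.cos(orientation) + (Y - y0) * np.sin(orientation)
    v = -(X - x0) * np.sin(orientation) + (Y - y0) * np.cos(orientation)
    envelope = np.exp(-(u**2 / (2 * width**2) + v**2 / (2 * length**2)))
    oscillation = np.cos(2 * np.pi * freq * u + phase)
    return envelope * oscillation

# Illustrative parameters; aspect ratio = length / width = 2.
W = gabor_rf(size=32, freq=0.1, orientation=np.pi / 4, phase=0.0, width=3.0, length=6.0)
```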

These parameters are enough to describe the basic selectivity properties of simple
cells: a simple cell typically gives a strong response when the input consists of a
Gabor function with approximately the right (“preferred”) values for all, or at least
most, of these parameters (the width and the length of the envelope are not critical for all simple cells). Thus, we say that simple cells are selective for frequency,
orientation, phase, and location.
In principle, one could simply try to find a Gabor function which gives the best
fit to the receptive field estimated by reverse correlation. In practice, however, more
direct methods are often used since reverse correlation is rather laborious. Typically, what is computed are tuning curves for some of these parameters. This was
illustrated in Fig. 3.5. Typical stimuli include two-dimensional Fourier gratings (see
Fig. 2.5) and simple, possibly short, lines or bars. Examples of such analyses will
be seen in Chapters 6 and 10.
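An orientation tuning curve of the kind sketched in Fig. 3.5 can be simulated as follows (Python/NumPy; the model cell is a hypothetical linear-nonlinear unit with a Gabor receptive field, and the gratings are generated rather than shown to a real neuron).

```python
import numpy as np

coords = np.arange(32) - 16
X, Y = np.meshgrid(coords, coords)

# Model cell: Gabor receptive field (preferred orientation pi/4, frequency 0.1 cycles/pixel)
# followed by half-wave rectification.
u = X * np.cos(np.pi / 4) + Y * np.sin(np.pi / 4)
v = -X * np.sin(np.pi / 4) + Y * np.cos(np.pi / 4)
W = np.exp(-(u**2 / (2 * 3.0**2) + v**2 / (2 * 6.0**2))) * np.cos(2 * np.pi * 0.1 * u)

orientations = np.linspace(0.0, np.pi, 37)
tuning = []
for theta in orientations:
    # Grating at the cell's preferred frequency but with varying orientation.
    grating = np.cos(2 * np.pi * 0.1 * (X * np.cos(theta) + Y * np.sin(theta)))
    tuning.append(max(0.0, float(np.sum(W * grating))))  # linear stage + rectification

preferred = orientations[int(np.argmax(tuning))]          # close to pi/4 for this cell
```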
3.3.4 Frequency channels
The selectivity of simple cells (as well as many other cells) to frequency is related
to the concept of “frequency channels” which is widely used in vision science. The
idea is that in the early visual processing (something like V1), information of different frequencies is processed independently. Justification for talking about different
channels is abundant in research on V1. In fact, the very point in using Gabor models
is to model the selectivity of simple cells to a particular frequency range.
Furthermore, a number of psychological experiments point to such a division
of early processing. For example, in Figure 3.8, the information in the high- and low-frequency parts is quite different, yet observers have no difficulty in processing (reading) them separately. This figure also illustrates the practical meaning of frequency selectivity: some of the cells in V1 respond to the “yes” letters but do not respond to the “no” letters, while for other cells, the responses are the other way round. (The responses depend, however, on viewing distance: stimuli which are low-frequency when viewed from a close distance will be high-frequency when viewed from far away.)

Fig. 3.8: A figure with independent (contradictory?) information in different frequency channels. a) The original figure. b) Low-frequency part of the figure in a), obtained by taking the Fourier transform and setting to zero all high-frequency components (those whose distance from zero frequency is larger than a certain threshold). c) High-frequency part of the figure in a). The sum of the figures in b) and c) equals a).
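The decomposition used in Fig. 3.8 can be sketched as follows (Python/NumPy; the image is a placeholder array and the cutoff frequency is arbitrary): the Fourier transform is split into a low-frequency and a high-frequency part which sum back to the original image.

```python
import numpy as np

def split_frequency_channels(image, cutoff=0.1):
    """Split a grey-scale image into low- and high-frequency parts via the Fourier transform.

    cutoff is a radial frequency threshold in cycles/pixel."""
    F = np.fft.fft2(image)
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    low_mask = np.sqrt(fx**2 + fy**2) <= cutoff
    low = np.real(np.fft.ifft2(F * low_mask))
    high = np.real(np.fft.ifft2(F * ~low_mask))
    return low, high

image = np.random.default_rng(2).standard_normal((64, 64))  # placeholder image
low, high = split_frequency_channels(image)
assert np.allclose(low + high, image)                        # the two parts sum to the original
```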
3.4 Nonlinear models of visual neurons
3.4.1 Nonlinearities in simple-cell responses
Linear models are widely used in modelling visual neurons, but they are definitely a
rough approximation of the reality. Real neurons exhibit different kinds of nonlinear
behaviour. The most basic nonlinearities can be handled by adding a simple scalar
nonlinearity to the model, which leads to what is simply called a linear-nonlinear
model.
In the linear-nonlinear model, a linear stage is followed by a static nonlinearity f:

\tilde{r}_j = f\Big( \sum_{x,y} W_j(x,y) I(x,y) \Big).    (3.3)
A special case of the linear-nonlinear model is half-wave rectification, defined by

f(\alpha) = \max\{0, \alpha\}.    (3.4)
One reason for using this model is that if a neuron has a relatively low spontaneous
firing rate, the firing rates predicted by the linear model may then tend to be negative.
The firing rate, by definition, cannot be negative.
We must distinguish here between two cases. Negative firing rates are, of course,
impossible by definition. In contrast, it is possible to have positive firing rates that
are smaller than the spontaneous firing rate; they give a negative \tilde{r}_j in Equation (3.2). Such firing rates correspond to the sum term in Eq. (3.1) being negative, but not so large that r_j becomes negative. However, in V1, the spontaneous firing rate tends
to be rather low, and the models easily predict negative firing rates for cortical cells.
(This is less of a problem for ganglion and LGN cells, since their spontaneous firing
rates are relatively high.)
Thus, half-wave rectification offers one way to interpret the purely linear model
in Eq. (3.1) in a more physiologically plausible way: the linear model combines the
outputs of two half-wave rectified (non-negative) cells with reversed polarities into
a single output r_j: one cell corresponds to the linear RF W_j and the other to the RF -W_j.
The linear-nonlinear model is flexible and can accommodate a number of other
properties of simple cell responses as well. First, when the linear model predicts
small outputs, i.e., the stimulus is weak, no output (increase in firing rate) is actually
observed in simple cells. In other words, it seems there is a
threshold which the
stimulus must attain to elicit any response. This phenomenon, combined with half-wave rectification, could be modelled by using a nonlinearity such as

f(\alpha) = \max(0, \alpha - c)    (3.5)

where c is a constant that gives the threshold.
Second, due to biological properties, neurons have a maximum firing rate. When
the stimulus intensity is increased above a certain limit, no change in the cell response is observed, a phenomenon called
saturation. This is in contradiction with
the linear model, which has no maximum response: if you multiply the input stimulus by, say, 1,000,000, the output of the neuron increases by the same factor. To take
this property into account, we need to use a nonlinearity that saturates as well, i.e.
has a maximum value. Combining the three nonlinear properties listed here leads us
to a linear-nonlinear model with the nonlinearity
f(\alpha) = \min(d, \max(0, \alpha - c))    (3.6)

where d is the maximum response. Figure 3.9 shows the form of this function.
Alternatively, we could use a smooth function with the same kind of behaviour,
such as

f(\alpha) = \frac{d \alpha^2}{c + \alpha^2},    (3.7)

where c is another constant that is related to the threshold c in (3.6).
Fig. 3.9: The nonlinear function in (3.6).
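For concreteness, here is a sketch of the two nonlinearities (Python/NumPy; the constants c and d are chosen arbitrarily) that can be plotted to reproduce the shape shown in Fig. 3.9 and its smooth counterpart.

```python
import numpy as np

def f_piecewise(alpha, c=1.0, d=10.0):
    """Eq. (3.6): threshold, half-wave rectification, and saturation."""
    return np.minimum(d, np.maximum(0.0, alpha - c))

def f_smooth(alpha, c=1.0, d=10.0):
    """Eq. (3.7): smooth saturating nonlinearity d * alpha^2 / (c + alpha^2)."""
    return d * alpha**2 / (c + alpha**2)

alpha = np.linspace(-5.0, 20.0, 200)
y_piecewise, y_smooth = f_piecewise(alpha), f_smooth(alpha)  # e.g. plot both with matplotlib
```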
3.4.2 Complex cells and energy models
Although linear-nonlinear models are useful in modelling many cells, there are also
neurons in V1 called
complex cells for which these models are completely inadequate. These cells do not show any clear spatial zones of excitation or inhibition.
Complex cells respond, just like simple cells, selectively to bars and edges at a particular location and of a particular orientation; they are, however, relatively invariant
to the spatial phase of the stimulus. An example of this is that reversing the contrast
polarity (e.g. from white bar to black bar) of the stimulus does not markedly alter
the response of a typical complex cell.
The responses of complex cells have often been modelled by the classical ‘energy
model’. (The term ‘energy’ simply denotes the squaring operation.) In such a model
(see Figure 3.10) we have

r_j = \Big( \sum_{x,y} W_{j1}(x,y) I(x,y) \Big)^2 + \Big( \sum_{x,y} W_{j2}(x,y) I(x,y) \Big)^2

where W_{j1}(x,y) and W_{j2}(x,y) are quadrature-phase Gabor functions, i.e., they have a phase-shift of 90 degrees, one being odd-symmetric and the other being even-symmetric. It is often assumed that V1 complex cells pool the responses of simple
cells, in which case the linear responses in the above equation are outputs of simple
cells.
The justification for this model is that since the two linear filters are Gabors in
quadrature-phase, the model is computing the local Fourier “energy” in a particular
range of frequencies and orientations, see Equation (2.16). This provides a model of
a cell which is selective for frequency and orientation, and is also spatially localized,
but does not care about the phase of the input. In other words, it is phase-invariant. (This will be discussed in more detail in Chapter 10.)
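The energy model above can be sketched as follows (Python/NumPy; the quadrature Gabor pair and the test gratings are hypothetical): two linear filters with a 90-degree phase difference are applied, squared, and summed, and the result is roughly unchanged when the phase of the input grating is shifted.

```python
import numpy as np

coords = np.arange(32) - 16
X, Y = np.meshgrid(coords, coords)
u = X * np.cos(np.pi / 4) + Y * np.sin(np.pi / 4)
envelope = np.exp(-(X**2 + Y**2) / (2 * 5.0**2))

W1 = envelope * np.cos(2 * np.pi * 0.1 * u)  # even-symmetric Gabor
W2 = envelope * np.sin(2 * np.pi * 0.1 * u)  # odd-symmetric Gabor (quadrature phase)

def complex_cell_response(I):
    """Energy model: sum of squared outputs of the quadrature-phase linear filters."""
    return float(np.sum(W1 * I)) ** 2 + float(np.sum(W2 * I)) ** 2

# Phase invariance: gratings differing only in phase give nearly the same response.
for phase in (0.0, np.pi / 2, np.pi):
    print(phase, complex_cell_response(np.cos(2 * np.pi * 0.1 * u + phase)))
```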
The problem of negative responses considered earlier suggests a simple modification of the model, where each linear RF again corresponds to two simple cells. The output of a linear RF is divided into its positive and negative parts, which are half-wave rectified. In this case, the half-wave rectified outputs are further squared so that they compute the squaring operation of the energy model. In addition, complex cells saturate just as simple cells do, so it makes sense to add a saturating nonlinearity to the model as well.

Fig. 3.10: The classic energy model for complex cells. The response of a complex cell is modelled
by linearly filtering with quadrature-phase Gabor filters (Gabor functions whose sinusoidal components have a 90 degrees phase difference), taking squares, and summing. Note that this is purely
a mathematical description of the response and should not be directly interpreted as a hierarchical
model summing simple cell responses.
3.5 Interactions between visual neurons
In the preceding models, V1 cells are considered completely independent units:
each of them just takes its input and computes its output. However, different kinds of interactions between the cells have been observed.
The principal kind of interaction seems to be inhibition: when a cell j is active, the response of another cell i is reduced from what it would be without cell j being active. To be more precise, let us consider two simple cells whose receptive fields W_i and W_j are orthogonal (for more on orthogonality see Chapter 19). Take, for example, two cells in the same location, one with vertical and the other with horizontal orientation. Take any stimulus I_0 which excites the cell with RF W_j. For example, we could take a stimulus which is equal to the receptive field W_j itself. Now, we add
another stimulus pattern, say I_1, to I_0. This simply means that we add the intensities pixel-by-pixel, showing the following stimulus to the retina:

I(x,y) = I_0(x,y) + I_1(x,y)    (3.8)

The added stimulus I_1 is often called a mask or a pedestal.
The point is that by choosing I_1 suitably, we can demonstrate a phenomenon which is probably due to interaction between the two cells. Specifically, let us choose a stimulus which is equal to the receptive field of cell i: I_1 = W_i. This is maximally excitatory for cell i, but it is orthogonal to the receptive field of cell j. With this kind of stimulus, the typical empirical observation is that cell j has a lower firing rate for the compound stimulus I = I_0 + I_1 than for I_0 alone. This inhibition cannot be explained by the linear models (or the linear-nonlinear models): the mask I_1 should have no effect on the linear filter stage, because the mask is orthogonal to the receptive field W_j. So, to incorporate this phenomenon in our models, we must include some interaction between the linear filters: the outputs of some model cells must reduce the outputs of others. (It is not completely clear whether this empirical phenomenon is really due to interaction between the cells, but that is a widely-held view, so it makes sense to adopt it in our models.)

Fig. 3.11: Interaction between different simple cells. a) Original stimulus I_0 of a simple cell, chosen here to be equal to the receptive field W_j. b) Masking pattern I_1 which is orthogonal to I_0. c) Compound stimulus I. The response to I is smaller than the response to I_0, although the linear model predicts that the responses should be equal.
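The linear prediction discussed above (and asked for in Exercise 1) can be checked numerically with the following sketch (Python/NumPy; the two receptive fields are hypothetical Gabor-like patterns made exactly orthogonal): adding a mask orthogonal to W_j leaves the linear response of cell j unchanged.

```python
import numpy as np

coords = np.arange(32) - 16
X, Y = np.meshgrid(coords, coords)
envelope = np.exp(-(X**2 + Y**2) / (2 * 5.0**2))

W_j = envelope * np.cos(2 * np.pi * 0.1 * X)             # "vertical" cell
W_i = envelope * np.cos(2 * np.pi * 0.1 * Y)             # "horizontal" cell
W_i = W_i - W_j * np.sum(W_i * W_j) / np.sum(W_j * W_j)  # make the mask exactly orthogonal to W_j

I0 = W_j.copy()   # stimulus that strongly excites cell j
I1 = W_i.copy()   # mask, orthogonal to W_j

# Identical linear responses (up to floating-point error), unlike what is observed in real cells.
print(np.sum(W_j * I0), np.sum(W_j * (I0 + I1)))
```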
This phenomenon is typically called “contrast gain control”. The idea is that
when there is more contrast in the image (due to the addition of the mask), the
system adjusts its responses to be generally weaker. It is thought to be necessary
because of the saturating nonlinearity in the cells and the drastic changes in illumination conditions observed in the real world. For example, the cells would be responding with the maximum value most of the time in bright daylight (or a brightly
lit part of the visual scene), and they would be responding hardly at all in a dim
environment (or a dimly lit part of the scene). Gain control mechanisms alleviate this problem by normalizing the variation of luminance over different scenes, or different parts of the same scene. For more on this point, see Section 9.5.²

² In fact, different kinds of gain control mechanisms seem to be operating in different parts of the visual system. In the retina, such mechanisms normalize the general luminance level of the inputs, hence the name “luminance gain control”. Contrast gain control seems to be done after that initial gain control. The removal of the mean grey-scale value (DC component) that we do in later chapters can be thought to represent a simple luminance gain control mechanism.
This leads us to one of the most accurate currently known simple-cell models, in terms of predictive power, the divisive normalization model. Let W_1, ..., W_K denote the receptive fields of those cells whose receptive fields are approximately in the same location, and σ a scalar parameter. In the divisive normalization model, the output of the cell corresponding to RF W_j is given by

r_j = \frac{ f\big( \sum_{x,y} W_j(x,y) I(x,y) \big) }{ \sum_{i=1}^{K} f\big( \sum_{x,y} W_i(x,y) I(x,y) \big) + \sigma^2 },    (3.9)
where f is again a static nonlinearity, such as half-wave rectification followed by squaring. This divisive normalization model provides a simple account of contrast gain control mechanisms. In addition, it also automatically accounts for such simple-cell nonlinearities as response saturation and threshold. In fact, if the input stimulus is such that it only excites cell j, and the linear responses in the denominator are all zero except for the one corresponding to cell j, the model reduces to the linear-nonlinear model in Section 3.4.1. If we further define f to be the square function, we get the nonlinearity in Equation (3.7).
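As a sketch of the divisive normalization model in Equation (3.9) (Python/NumPy; the pool of receptive fields, the stimulus, and σ are all hypothetical, and f is taken to be half-wave rectification followed by squaring):

```python
import numpy as np

def f(alpha):
    """Static nonlinearity: half-wave rectification followed by squaring."""
    return np.maximum(0.0, alpha) ** 2

def divisive_normalization(W_pool, I, j, sigma=1.0):
    """Eq. (3.9): the response of cell j, normalized by the pooled responses of all K cells."""
    linear = np.array([np.sum(W * I) for W in W_pool])  # linear responses of the whole pool
    return f(linear[j]) / (np.sum(f(linear)) + sigma**2)

rng = np.random.default_rng(3)
W_pool = [rng.standard_normal((16, 16)) for _ in range(5)]  # K = 5 hypothetical receptive fields
I = rng.standard_normal((16, 16))                           # hypothetical stimulus
print(divisive_normalization(W_pool, I, j=0))
```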
3.6 Topographic organization
A further interesting point is how the receptive fields of neighbouring cells are related. In the retina, the receptive fields of retinal ganglion cells are necessarily linked
to the physical position of the cells. This is due to the fact that the visual field
is mapped in an orderly fashion to the retina. Thus, neighbouring retinal ganglion
cells respond to neighbouring areas of the visual field. However, there is nothing to
guarantee the existence of a similar organization further up the visual pathway.
But the fact of the matter is that, just like in the retina, neighbouring neurons in
the LGN and in V1 tend to have receptive fields covering neighbouring areas of the
visual field. This phenomenon is called
retinotopy. Yet this is only one of several
types of organization. In V1, the orientation of receptive fields also tends to shift
gradually along the surface of the cortex. In fact, neurons are often approximately
organized according to several functional parameters (such as location, frequency,
orientation) simultaneously. This kind of
topographic organization also exists in
many other visual areas.
Topographical representations are not restricted to cortical areas devoted to vision, but are present in various forms throughout the brain. Examples include the
tonotopic map (frequency-based organization) in the primary auditory cortex and
the complete body map for the sense of touch. In fact, one might be hard pressed to find a brain area that would not exhibit any sort of topography.
3.7 Processing after the primary visual cortex
From V1, the visual signals are sent to other areas, such as V2, V4, and V5, called extrastriate areas, since another name for V1 is the “striate cortex”. The function of some
of these areas (mainly V5, which analyzes motion) is relatively well understood,
but the function of most of them is not really understood at all. For example, it
is assumed that V2 is the next stage in the visual processing, but the differences
in the features computed in V1 and V2 are not really known. V4 has been variously described as being selective to long contours, corners, crosses, circles, “non-Cartesian” gratings, colour, or temporal changes (see the references section below).
Another problem is that the extrastriate cortex may be quite different in humans
and monkeys (not to mention other experimental animals), so results from animal
experiments may not generalize to humans.
3.8 References
Among general introductions to the visual system, see, e.g., (Palmer, 1999). A most
interesting review of the state of modelling of the visual cortex, with extensive references to experiments, is in (Carandini et al, 2005).
For a textbook account of reverse correlation, see e.g. (Dayan and Abbott, 2001);
reviews are (Ringach and Shapley, 2004; Simoncelli et al, 2004). A classic application of reverse correlation for estimating simple-cell receptive fields is (Jones and Palmer,
1987b; Jones et al, 1987; Jones and Palmer, 1987a). For spatiotemporal extensions
see (DeAngelis et al, 1993a,b). LGN responses are estimated, e.g., in (Cai et al,
1997), and retinal ones, e.g. in (Davis and Naka, 1980).
The nonlinearities in neuron responses are measured in (Anzai et al, 1999b;
Ringach and Malone, 2007); theoretical studies include (Hansel and van Vreeswijk,
2002; Miller and Troyer, 2002). These studies concentrate on the “thresholding”
part of the nonlinearity, ignoring saturation. Reverse correlation in the presence of
nonlinearities is considered in (Nykamp and Ringach, 2002).
A review on contrast gain control can be found in (Carandini, 2004). The divisive
normalization model is considered in (Heeger, 1992; Carandini et al, 1997, 1999).
More on the interactions can be found in (Albright and Stoner, 2002). For review
of the topographic organization in different parts of the cortex, see (Mountcastle,
1997).
A discussion on our ignorance of V2 function can be found in (Boynton and Hegdé, 2004). Proposed selectivities in V4 include long contours (Pollen et al, 2002), corners and related features (Pasupathy and Connor, 1999, 2001), crosses,
circles, and other non-Cartesian gratings (Gallant et al, 1993; Wilkinson et al, 2000),
as well as temporal changes (Gardner et al, 2005). An alternative viewpoint is that
the processing might be quite similar in most extrastriate areas, the main difference
being the spatial scale (Hegdé and Essen, 2007). A model of V5 is proposed in
(Simoncelli and Heeger, 1998).
Basic historical references on the visual cortex include (Hubel and Wiesel, 1962,
1963, 1968, 1977).
3.9 Exercises
Mathematical exercises
1. Show that the addition of a mask which is orthogonal to the receptive field, as in Section 3.5, should not change the output of the cell in the linear model.
2. What is the justification for using the same letter d for the constants in Equations (3.6) and (3.7)?
Computer assignments
1. Plot the function in Equation (3.7) and compare it with the function in (3.6).
2. Receptive fields in the ganglion cells and the LGN are often modelled with a “difference-of-Gaussians” model in which W(x,y) is defined as

\exp\Big( -\frac{1}{2\sigma_1^2} \big[ (x - x_0)^2 + (y - y_0)^2 \big] \Big) - a \exp\Big( -\frac{1}{2\sigma_2^2} \big[ (x - x_0)^2 + (y - y_0)^2 \big] \Big)    (3.10)

Plot the receptive fields for some choices of the parameters. Find some parameter
values that reproduce a center-surround receptive field.

