Rethinking Vision and Action

Action is an important arbiter of whether an individual or a species will survive. Yet action has not been well integrated into the study of psychology. Action, or motor behavior, is a field apart. This is traditional science with its need for specialization. The sequence in a typical laboratory experiment of see → decide → act provides the rationale for broad disciplinary categorizations. With renewed interest in action itself, surprising and exciting anomalous findings at odds with this simplified caricature have emerged. They reveal a much more intimate coupling of vision and action, which we describe. In turn, this prompts us to identify and dwell on three pertinent theories deserving of greater notice.


INTRODUCTION
[W]hether accompanied by consciousness or not, all brain excitation has ultimately but one end, to aid in the regulation of motor coordination. . . . The nature of the problem and current trends in our thinking make it necessary at this time to emphasize particularly the dependence of the mental upon motor activity.
Roger Sperry (1952, pp. 298-99)

Roger Sperry shared the Nobel Prize in Physiology or Medicine with David Hubel and Torsten Wiesel in 1981. Sperry's work on split-brain patients indicated that higher-order cognitive function and even aspects of consciousness are brain based and can be addressed scientifically. Sperry, however, held very different views 30 years before his Nobel award. In a strongly worded essay titled "Neurology and the mind-brain problem" (Sperry 1952), he disavowed the then-current interests in cognition and sensory processing. He insisted on an alternative framework for understanding mind and brain, in which sensory processing, subjective experience, and associative memory should be subordinated to the most obvious, most important function of the mind and brain: the coordination of movement.
Ulric Neisser was an acknowledged leader of the cognitive revolution, and his landmark book titled Cognitive Psychology signaled the end of behaviorism and the beginning of a new field (Neisser 1967). However, his own role in this field was surprisingly brief, and he soon became disillusioned by what he regarded as a sterile enterprise. Less than 10 years later, in his book Cognition and Reality (Neisser 1977), he argued that studying cognition independent of action was almost pointless, that cognition and action were always occurring conjointly, that organisms were in an endless cycle of perceiving and acting, and that each process could not be studied in isolation.
Despite such strong views expressed by the early Sperry and the later Neisser, the discipline of psychology has been mostly content to keep to its traditional subdisciplines. There have been some important major exceptions, but the territories are well established, with major meetings, journals, and societies devoted to each area. The specific topics change over the years, but academic and research fields continue, with many making evident progress. A perusal of introductory psychology textbooks reveals no chapters on action or motor behavior. Given the importance of action, the discipline of psychology itself has been curiously negligent (Rosenbaum 2005).
If we look at textbooks, or go to meetings, we can sense an implicit picture as to how the whole brain is organized to create action. This picture has seldom been outlined in any formal sense, yet it has been tacitly assumed. Figure 1 shows this most simplified conception. For many who have studied visual psychophysics, especially to characterize the earliest stages of vision, this approach has been an unquestionable success. A case in point is the measurement of the absolute threshold, the smallest amount of light that can be perceived in the dark. Almost by magic, it seems that querying the whole person is all that is needed to show that just one quantum is sufficient to excite one rod photoreceptor (Hecht, Shlaer & Pirenne 1942).
An enduring and founding idea in cognitive science has been that there are distinct processing stages, and once a particular stage has completed its work, the job is handed over to the next (Donders 1868, Sternberg 1969). However, not all seemingly pure vision experiments are so easily isolated; below we show that the presence of or potential for action alters the findings significantly, casting doubt on the wisdom of such narrowing down. Conversely, ostensible signature results obtained from a pure, isolated motor system may not be so pure after all and are influenced significantly by a wide range of processes outside the motor system.
The overall plan for this review is as follows. We start by stating that while the serial caricature described in Figure 1 still holds sway, its perspective is likely too narrow. After some preliminaries, we describe unusual ways in which vision has a much more intimate relationship with the details of motor execution. This more entangled coupling of vision and action prompts us to look out for broader, more integrative ideas. We thus conclude by placing these anomalous results in the context of three theoretical perspectives: ideo-motor theory, attributed to William James (1890); Bjorn Merker's (2005) theory on the motor origins of consciousness; and Paul Cisek's (2007) affordance competition hypothesis.

SEEING, DECIDING/WILLING, AND ACTING
The paradigm of partially coherently moving random dots has been particularly revealing in showing a nice sequence of serial stages related to seeing and deciding. It starts with Newsome & Pare's (1988) experiments with monkeys trained to discriminate the left or right movement of a tiny number of dots in a field of dots moving in random directions. Newsome and colleagues then went on to obtain thresholds for the whole organism (a behaving monkey) and for individual middle temporal (MT) cortical cells. The result was dramatic: Some cells had sensitivities on a par with the behaving monkey (Newsome et al. 1989). It thus seems possible that these neurons are a critical link in a chain of events that could count as perceiving. A tiny cluster of neurons in area MT was deemed the bottleneck deciding whether the monkey would report seeing the stimulus. Supporting this view, electrical stimulation at the same site biased the choice of the monkey appropriately (Salzman et al. 1990). All this seems to be in line with the picture depicted in Figure 1, identifying the seeing process.
Filling this out more, Shadlen & Newsome (2001) recorded from single neurons in the parietal cortex and used this same moving dots paradigm, but they drew out the process so that the percentage of coherently moving dots increased slowly out of the noise, allowing them to track the decision process more fully. In so doing, they characterized the neurons in the parietal cortex, which, closer to the motor output, are presumed to reflect the decision to act. These results support long-standing drift diffusion models, indicating that after a required accumulation of evidence in parietal neurons, the monkey decides. As such, these neurons reflect a neural decision process and seem to be part of the causal chain of seeing and deciding. This accumulator model (Ratcliff 1978) is widely accepted, although it seems very different in the mouse (Harvey et al. 2012), and the buildup could be sudden rather than gradual, as has been assumed (Latimer et al. 2015). While these two later findings challenge the drift diffusion idea, they still support the hypothesis that this area plays the key decision-making function. Taken together, these experiments with moving random dots provide a nice analysis of a sequence of stages: Information processing in an area closer to perception is followed by processes related to the decision to act. Again, this gives credence to the general scheme outlined in Figure 1.
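The accumulation-to-bound idea behind these models can be made concrete in a few lines of simulation. The sketch below is a minimal random-walk version of a drift diffusion model, not the specific model fit in the studies above; the drift, noise, and bound values are illustrative assumptions.

```python
import random

def drift_diffusion_trial(drift, noise=1.0, bound=30.0, rng=None):
    """Simulate one trial of a minimal accumulator (drift diffusion) model.

    Evidence starts at zero and takes noisy steps whose mean (the drift)
    stands in for motion coherence; a choice is made when either bound
    is crossed. Returns the choice and the number of steps taken.
    """
    rng = rng or random.Random()
    evidence, steps = 0.0, 0
    while abs(evidence) < bound:
        evidence += drift + rng.gauss(0.0, noise)
        steps += 1
    return ("right" if evidence > 0 else "left"), steps

# Stronger drift (higher coherence) yields faster, more accurate choices.
rng = random.Random(0)
trials = [drift_diffusion_trial(0.5, rng=rng) for _ in range(200)]
accuracy = sum(choice == "right" for choice, _ in trials) / len(trials)
mean_steps = sum(steps for _, steps in trials) / len(trials)
```

With these parameters the bound is typically reached in roughly bound/drift steps and nearly all choices land on the correct side; shrinking the drift toward zero lengthens decision times and pushes accuracy toward chance, the speed-accuracy pattern such models capture.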
Going closer to the output side and more explicitly measuring and manipulating neural activity in relation to the motor responses themselves, we see evidence for sequential stages, as nicely reviewed by Haggard (2019; see also Calvo-Merino et al. 2006). These studies have relied on electrical stimulation of various brain regions of human patients treated for epilepsy. Not surprisingly, stimulation using implanted electrodes closest to the output, say in the motor cortex, is accompanied by movement in human subjects. However, when asked, the subjects do not own the movement (Desmurget et al. 2009): They deny that they willed it. This places the motor cortex at the point closest to the output. In a nearby premotor area, movements are also elicited by stimulation, but now the subject affirms having willed them. Going to the parietal cortex, stimulation here shares some characteristics with this premotor area, but now one can elicit the sense of a willed action with no action itself: We have will without action. With more intense stimulation, the action then occurs in addition to the will. These stimulation studies support the idea that going from the parietal cortex forward to the motor cortex (M1), there is a set of sequential processes, something akin to intention followed by action.
Overall, the studies above are consistent with the idea of a sequence of stages from sensory to motor, with a reasonable series of intervening steps. However, below we report examples at odds with this framework, showing that vision and action are coupled in unexpectedly intimate ways, through highly specific but as yet poorly understood processes.

A NEW ANATOMICAL CONTEXT
Before continuing, we mention some anatomical facts to provide a supportive context for what follows. The most well-established concept in neuroscience is that the brain is highly differentiated, made up of distinct areas with obvious order and patterns. This occurs at so many levels that it will not be detailed here, except to mention some pertinent points. Most important is that these divisions and subdivisions are even more clearly patterned and stable if we consider the phylogenetic history of vertebrates and mammals (Butler & Hodos 2005). The visual cortex is located at the posterior part of the brain, the somatic cortex (with the homunculus topography) is more central, and the corresponding body homunculus for what is called the motor cortex is adjacent and anterior. The motor system is similarly subdivided, with primary, supplementary, and other more peripheral motor structures as well as large, important subcortical motor structures, such as the basal ganglia. Here, too, there is internal ordering, with certain parts considered more as the input side (e.g., the head of the caudate nucleus) and other portions considered nearer to the output (e.g., globus pallidus, putamen); very close to the output is the substantia nigra.
However, other studies indicate that this picture obscures another significant aspect: connectivity. There is substantial interconnectivity between very disparate regions of the brain, and these connections are likely to be important. This can be seen in both classical and modern anatomical tracer studies, where neurons from widely distant sites are found to be interconnected. Recordings from single neurons support this view. Despite the clear anatomical separation of the visual and motor cortex, single neurons in the motor system respond to high-level visual stimuli. Most well-known, provocative, and controversial is the proclaimed existence of so-called mirror neurons. Here, neurons in premotor areas of macaque monkeys, very close to the primary motor cortex, fire both during specific motor actions and when the monkey sees another animal or human perform the same motor action (Rizzolatti et al. 2001). This has aroused unprecedented interest even outside of science, with strong claims and critiques (Heyes & Catmur 2022). While mindful of the importance of these claims for our own thinking, we deem it better to comment only on selected and important empirical facts that have resulted from these ideas. More important for our purposes is the fact that besides the so-called mirror neurons, many more neurons in motor cortices respond to a range of specific visual stimuli, and many to specific actions of monkeys or humans (Hatsopoulos & Suminski 2011).
A similar invasion of vision into the basal ganglia is also evident. Hikosaka and colleagues indicate that specific visual responses are seen in the head and the tail of the caudate nucleus (the largest structure in the basal ganglia). Neurons at the head of the caudate respond to recent contingencies, perhaps reflecting short-term visual memory, whereas neurons in the tail have very specific responses to patterned stimuli that are very stable over time (Kim & Hikosaka 2013, 2015). The positioning of the latter structure very close to the inferior temporal (IT) cortex may explain these properties.
Of interest is the likely existence of a retinotopic visual map at the very output of the basal ganglia. Substantia nigra neurons project to the superior colliculus and essentially control its function in the generation of saccadic eye movements. They fire at a very high rate and tonically inhibit this structure. When they get a punctate localized signal from the caudate nucleus, inhibiting just some nigral neurons, the inhibitory drive is released to the desired part of the colliculus and a saccade to a retinotopic locus occurs (Hikosaka & Wurtz 1983). This indicates something quite unexpected from a serial sensory-to-motor sequence scheme. It indicates there must be a visual retinotopic map at one of the well-accepted outputs of a cerebral motor system. Independent evidence for such a visual map in the substantia nigra has been recently corroborated using fMRI recordings in humans, showing different responses to different loci in the visual field (DeSimone et al. 2015).
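The disinhibition logic just described can be sketched in a few lines: tonic nigral inhibition clamps the colliculus, and a localized pause releases one retinotopic locus. The rates and the ten-locus map below are invented for illustration, not measured values.

```python
def collicular_drive(nigral_rate, tonic_rate=80.0):
    """Net excitability of one retinotopic locus of the superior colliculus.

    Nigral neurons fire tonically at a high rate and inhibit the
    colliculus; only a pause in nigral firing (driven by caudate input)
    releases a locus enough to permit a saccade there.
    """
    return max(0.0, tonic_rate - nigral_rate)

# A toy retinotopic map of nigral firing rates: caudate input pauses
# nigral firing at locus 3, disinhibiting just that part of the map.
nigral_map = {locus: 80.0 for locus in range(10)}
nigral_map[3] = 5.0
drive = {locus: collicular_drive(rate) for locus, rate in nigral_map.items()}
saccade_target = max(drive, key=drive.get)
```

In this toy map, every locus except the paused one stays fully suppressed, so the saccade goes to the single disinhibited location, which is why a visual retinotopic map must exist at this motor output.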
Neurophysiological evidence for the motor system's influence on visual structures is also well documented. This manifests itself in so-called gain fields, where it was shown initially that eye position powerfully influences the firing rate of visual neurons in the parietal cortex in a very specific way (Andersen & Mountcastle 1983). The receptive fields remained the same spatially, and only their strength varied as a function of eye position: thus the term "gain field." Subsequently, gain fields also became evident earlier in the extrastriate cortex, and possibly even in V1. In addition, gain field inputs were not restricted to eye position; some were influenced by body and limb positions (Snyder 2000). Computational models then revealed that by combining retinotopic and gain field information, localization of visual stimuli with respect to the head and body is theoretically feasible (Zipser & Andersen 1988, Lehky et al. 2016).
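The gain-field signature is easy to state computationally: the retinotopic tuning curve keeps its shape and position while eye position only scales its amplitude. The toy neuron below, with made-up parameters, illustrates exactly that signature; it is a sketch, not a model of any recorded cell.

```python
import math

def gain_field_response(retinal_pos, eye_pos, pref_retinal=0.0,
                        gain_slope=0.05, sigma=5.0):
    """Toy parietal neuron with a gain field.

    The Gaussian retinotopic receptive field (centered on `pref_retinal`)
    keeps its shape and location regardless of gaze; eye position only
    scales the response linearly: the defining gain-field signature.
    """
    tuning = math.exp(-((retinal_pos - pref_retinal) ** 2) / (2 * sigma ** 2))
    gain = max(0.0, 1.0 + gain_slope * eye_pos)
    return tuning * gain

# The receptive field peaks at the same retinal location for any gaze;
# only the response amplitude changes with eye position.
positions = [p / 2 for p in range(-40, 41)]
profile_straight = [gain_field_response(p, eye_pos=0.0) for p in positions]
profile_deviated = [gain_field_response(p, eye_pos=10.0) for p in positions]
```

Because the product of retinal tuning and eye-position gain carries information about both variables, a downstream readout over a population of such neurons can in principle recover head-centered location (retinal position plus eye position), which is the point of the Zipser & Andersen network models.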
These neurophysiological findings strongly suggest that the coupling between vision and action is much stronger than implied by the received assumptions depicted in Figure 1: Very specific visual responses, and the structures to mediate them, extend well into the motor system. Conversely, positions of the eye and body modify visual responses.
What follows is a survey of a selected range of psychological and behavioral phenomena that indicate that the coupling between vision and action is much closer than originally imagined and could in part rest on the anatomical substrates just described. Then we review theoretical perspectives that may provide a broader conceptual framework within which to understand vision and action.

FUSING VISION AND ACTION: RAMACHANDRAN'S EXISTENCE PROOF
Before recounting a series of important experimental findings, we dwell on one set of well-publicized clinical findings by Ramachandran & Altschuler (2009). The reported results rest on a powerful visual illusion that, when harnessed, is effective in rehabilitating wrist injuries and restoring movement in stroke patients. Most dramatically, it plays a role in eliminating long-standing intractable pain in patients with amputated limbs. Prior to Ramachandran's work, there was very little in the way of treatment to permanently remove this persistent pain.
The apparatus and procedure were stunning in their simplicity. The patient depicted in Figure 2 was missing much of his left arm. However, a mirror was placed so that if the subject peered into the mirror, he could see a reflection of his intact right arm. Despite the artificiality and crudeness of the setup, where there was little attempt to hide the real situation, many subjects saw the mirror reflection of the right hand as their left. When the patient was asked to make bilaterally symmetrical movements of the arms (including the phantom), the results were nothing short of spectacular. The left paralyzed phantom arm was perceived by the patient as moving. Even after a decade of persistent pain, training with this simple visual illusion led to its disappearance. Its application has now been extended to the rehabilitation of wrist injuries and hemiparesis in stroke patients (e.g., Altschuler et al. 1999). Two very recent papers performing meta-analyses to assess improvement for both lower (Broderick et al. 2018) and upper (Zeng et al. 2018) limb extremities attest to the therapeutic efficacy of this procedure. How is this mediated in the motor system? Do the visually responsive neurons in the motor system we just mentioned play a role? Whatever the reason, it is clear that a simple visual illusion carried the day in medicine when drugs and surgery failed. And it is important that this result is not restricted to phantom limb problems, which could conceivably be considered a special case; the fact that it helps patients with stroke and wrist injuries puts this concern to rest. Ramachandran claims that this calls into question what he calls a hierarchical and serial organization of vision and action. Furthermore, this finding demonstrates that vision plays a key role in determining one's body image, which in turn is critical for the motor system.

Figure 1 is admittedly an extreme view, and given current and widespread knowledge it does not hold up as an intellectually defensible theoretical account. Yet it likely persists as a fallback, particularly in the absence of broader theoretical accounts that would provide alternatives. In this review, we present empirical evidence along with relevant theories, some not widely known, to showcase needed alternatives. Therefore, in this section, we present examples of how visual concepts, imagery, and imagination can have unexpected motor consequences.

Song & Nakayama (2008a) presented subjects with a row of three buttons, arranged horizontally on a screen in front of the observer (see Figure 3). Arabic numerals appeared in the middle button. Instructions were to hit the middle button if the numeral 5 appeared, hit the left button for numerals 1-4, and hit the right button for numerals 6-9. As such, it was a simple task, where one of just three discrete responses was required. Measurement of hand trajectories revealed something unexpected, especially if one thinks that we have a set of serial stages, as depicted in Figure 1, where processing completed at one stage is passed on to the next. In Figure 3, we see that there is a progressive skewing of the trajectory toward the 5 button as the numeral increases from 1 to 4. This supports the idea that our motor system is influenced by a hypothetical number line corresponding to what we have been taught in school, that is, that the numbers are lined up as a continuum as we go from left to right. While subjects eventually reach the target, along the way it is as if they were aiming at points on an imaginary number line. More recently, researchers have found a similar pattern even in children as young as 5 years old (Erb et al. 2018). In addition, this basic finding has been replicated and extended extensively by Dehaene and colleagues (Dotan et al. 2019). Similar leakages of earlier visual and mental representations have been seen in other discrete pointing tasks as well (Spivey & Dale 2006, Finkbeiner et al. 2008, Song & Nakayama 2009, Song 2017).

Leakage of a Visual-Numeric Concept into the Action System
So, what does this mean? At the very least, it suggests that the coupling of earlier stages to the motor system is more entangled and does not deliver such discrete signals as might have been expected from separate serial stages.

Imagining Action and Watching Others Act
Figure 3: Spatial number-line concept leaking into motor trajectories. Participants are shown a single-digit Arabic numeral in a center square and asked to compare its value with the standard number 5. They then reach for and touch one of three squares on the screen: the left one for "less than 5," the center one for "equal to 5," or the right one for "greater than 5." The panels depict examples in which the value of the target is (a) equal to or (b-e) less than the standard. Panels b-e demonstrate gradual shifts of the reach trajectories toward the center square as the difference in value between the target and the standard decreases.

Before the 2006 World Cup, one of the world's best footballers, Ronaldinho, said, "When I train, one of the things I concentrate on is creating a mental picture of how best to deliver the ball to a teammate, preferably leaving him alone in front of the rival goalkeeper" (Cumming & Ramsey 2009, p. 5). Such mental simulation, through either motor imagery (i.e., imagining the execution of an action without physically performing it) or action observation, has been advocated for years in the sports psychology and training literature. Coaches and educators are enthusiastic about its benefits, and there has been much documentation, as reviewed by Cumming & Ramsey (2009) and more recently by Guillot et al. (2021).
However, it is not clear what might be going on. At one extreme, it is possible that such practices have only a very general effect and that the benefits accrue from increased motivation and enthusiasm, as has been found more generally in coaching situations. Alternatively, very specific and beneficial alterations of the neural networks of the motor and premotor systems could be at play (Kreilinger et al. 2021).
Several lines of study support the view and practices of coaches and educators, showing that there are likely to be specific consequences beyond just motivating and encouraging higher performance. First there are neuroimaging data, then there is support from targeted electrophysiological studies, and finally there is the emerging field of neuro-prosthesis. We describe these in turn.

Brain imaging correlates of action, imagining action, and observing action.
Does brain activity show similarities between action observation and imagination and the action itself? This has been of interest for a very long time (Jeannerod 2001). Hundreds of neuroimaging studies starting in the mid-1990s have been reported. To address this question systematically, several meta-analysis studies have been conducted. Most notably, a very large coordinate-based meta-analysis was recently done by Hardwick et al. (2018). They compared data from human neuroimaging studies, examining the brain networks involved in motor imagery, action observation, and actual movement execution. The latter usually consisted of making flexion or extension movements of the hand, arm, or legs. Action itself had the most circumscribed cortical pattern, around M1. The authors reported that motor imagery and action observation recruited roughly the same premotor-parietal cortical networks. However, contrary to some earlier reports, action observation did not reliably activate the primary motor cortex, M1. So, while motor imagery recruited a similar subcortical network to movement execution, action observation did not consistently recruit any subcortical areas. These data demonstrate the similarities in the networks for motor imagery, action observation, and movement execution, while highlighting key differences. The lack of subcortical motor structure activation with the observation of motor actions seems of particular interest, insofar as it suggests that the basal ganglia are somewhat more closely related to the motor actions themselves than are some premotor areas, which are perhaps more flexibly related to actions and action intentions. In sum, it seems very clear that action observation and action imagination have very specific effects on premotor and motor systems.

Excitability of specific muscle groups during action imagination and action observation.
The results above, as strong and conclusive as they are, do not show that action observation or action imagination actually activates the exact neural networks responsible for the specific actions. By itself, fMRI is just too gross a method to probe at such a specific level. To address this question, Fadiga et al. (1995) used transcranial magnetic stimulation (TMS) of the motor cortex during the observation of the actions of others. Pioneered by Barker et al. (1985), TMS was shown to elicit electrical activity, called motor evoked potentials (MEPs), in peripheral muscles, as recorded by surface electrodes at the sites of specific muscles.
Yet TMS of the motor cortex (M1) itself is a blunt instrument, activating MEPs in a wide range of adjacent muscles. Nevertheless, this broad response can be taken as a baseline upon which one can see specific modulations of each muscle's MEPs, in particular those responsible for various actions. The individual muscles involved in thumb or elbow flexion or extension were recorded while subjects were either observing or imagining this same range of actions. The MEPs from those specific muscles were found to be differentially increased for the corresponding actions (Fadiga et al. 1998). This supports the long-standing hypothesis that visual training and experience can have highly specific and beneficial motor effects, as claimed by coaches and practitioners of sports and rehabilitation medicine.

Neuro-prosthesis.
Too large a topic to cover in any detail here, but clearly of interest, is the dramatic success of otherwise paralyzed patients with transected spinal cords who can manipulate a robot arm to grasp a cup of water, bring it to the mouth, tip it, and drink. This was accomplished either by recording from permanently implanted multi-electrode arrays in the primary motor cortex (Hochberg et al. 2006) or by using similar arrays in the parietal cortex (Andersen et al. 2019). Added to this were major advances in computer power and machine learning. The result is that after a period of learning with the apparatus, the patient imagines the robot arm doing the task, and the task is accomplished. We see this as a vindication, or at least an illustration, of ideomotor theory, which we discuss in Section 8.1.
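The decoding step behind such prostheses can be illustrated with the classic population-vector scheme, a simpler ancestor of the machine-learning decoders actually used in these studies: each model neuron is cosine tuned to movement direction, and the intended direction is read out as the sum of preferred-direction vectors weighted by baseline-subtracted firing rates. All tuning parameters here are invented for illustration.

```python
import math

def cosine_rate(pref, movement, baseline=10.0, depth=8.0):
    """Cosine-tuned firing rate of a model motor cortical neuron."""
    return baseline + depth * math.cos(movement - pref)

def population_vector(rates, prefs, baseline=10.0):
    """Decode intended direction by summing each neuron's preferred-direction
    vector, weighted by its baseline-subtracted firing rate."""
    x = sum((r - baseline) * math.cos(p) for r, p in zip(rates, prefs))
    y = sum((r - baseline) * math.sin(p) for r, p in zip(rates, prefs))
    return math.atan2(y, x)

# 64 model neurons with preferred directions spread uniformly around the circle.
prefs = [2 * math.pi * i / 64 for i in range(64)]
intended = 1.0  # radians: the direction the user imagines moving
rates = [cosine_rate(p, intended) for p in prefs]
decoded = population_vector(rates, prefs)
```

With noiseless cosine tuning and uniformly spread preferred directions, the population vector recovers the imagined direction exactly; real decoders must instead be fit to noisy recorded activity, which is where the machine-learning advances mentioned above come in.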

MOTOR SYSTEM INFLUENCES ON VISION
Back in Section 3, we listed a surprising number of studies in which single units all through the basal ganglia and motor cortex showed reliable responses to visual stimuli. Moreover, there are motor influences on the visual system as well, one example being the widespread modulation of visual receptive fields by eye and body position.

Visual Enhancement Led by Actions
The question is whether simply learning a motor act would have a direct influence on vision. We have already reviewed evidence that in humans, simply watching the actions of others leads to very specific potentiation of the same muscles that are active in the observed subjects. Is there some reverse connection as well, such that motor learning alone modifies visual capacities? Point-light walkers (Johansson 1973) have been a boon to those interested in human actions, providing a stripped-down version of human activity without other identifying information. Yet, with this very unusual sparse stimulus, observers can identify gender, specific persons, and general mood (Cutting & Kozlowski 1977). Of particular interest is whether people are good, or perhaps even better, at recognizing their own identity when all other identifying visual clues are missing.
Surprisingly, this topic has a long history. Long ago, Wolff (1932) showed that persons otherwise disguised by baggy clothing, with heads obscured, were able to identify themselves from films of themselves viewed sagittally. Later, many other researchers used point-light walkers and made the same claims (Cutting & Kozlowski 1977, Beardsworth & Buckner 1981). Most recently, Loula et al. (2005) conducted the most systematic study to date, finding that people were able to identify themselves more easily than the point-light walkers of others. We look at ourselves in the mirror frequently, but never, or at least very rarely, do we see ourselves walking, particularly as viewed from the side.
While each of these studies was conducted with care, taken together, they are not fully convincing. Wolff's (1932) subjects could have recognized some other aspect, such as the particular baggy clothes they might have remembered wearing when serving as a model. The latest study (Loula et al. 2005) did not show the self-advantage except for very expressive movements, in particular dancing and boxing. As mentioned in that paper, college students are quite self-conscious about their own expressive movements and may have remembered some more general, possibly identifiable aspect of their own performances, such as their verve and skill. So, while the self-versus-other paradigm has intrinsic appeal, it should not be our sole source of information. It is clear that other approaches are needed.
Clearly what is needed is to develop a new motor competence by itself, without vision, and then determine whether this is reflected in some test of visual performance. Mindful of these issues, Casile & Giese (2006) used point-light stimuli and had subjects do a same-different task on pairs of videos that were either the same or only slightly different. In the normal pattern of walking, the arms are 180° out of phase with the legs, and presumably it would be relatively easy for subjects to discern subtle differences from this commonly experienced pattern. This was indeed the case. Two unusual arm motion patterns, 225° and 270° out of phase, not associated with usual stable walking, were also presented, and deviations at these phase angles were much more difficult to discern. The main experiment was to test subjects before and after training sessions in which blindfolded subjects were trained to oscillate their arms with the 270° phase relationship. Some subjects became expert in making these movements, others less so. After blindfolded motor-only training, improvement in performance occurred only for the key 270° condition, showing, as hypothesized, that blindfolded motor training increased visual skill. While the overall effect of the training was modest, the authors' claim was made more persuasive insofar as good visual performance after training was positively related to the amount of nonvisual training.
In terms of experimental design, we think this is some of the best evidence for a deep connection between action and vision. However, we are concerned that it is based on a single experiment and on very few measurements. It would be nice if the same group or others could replicate such an experiment or conduct others with this clarity of design. Furthermore, could this imply that there is some connection between proprioception and vision? Partially addressing this issue, Saygin et al. (2004) reported that observers viewing point-light stimuli did show activation in the premotor cortex.
Guo & Song (2019) developed a dual-task paradigm in which participants prepared an action (e.g., grasping) while concurrently performing an orientation discrimination task. They experimentally manipulated the fluency of the grasping action by requiring subjects to use either an easy or a hard grasp. Ruling out dual-task costs, they found that fluent action led to improved perceptual discrimination. Subsequently, they reported that orientation discrimination also improved following precision-grasping training. Because this is a very low-level visual discrimination task, this finding raises the further question of whether the improvement is mediated by even earlier visual mechanisms than suggested by the studies mentioned above.

Better Perception in Relation to the Active Hand
From the perspective of the primate motor system, hands loom large. Very pertinent to the topic of this review, there exist neurons in monkey parietal cortex that have dual and matching receptive fields, one somatic and one visual. In a study of tool use, Iriki et al. (1996) described such neurons in the intraparietal sulcus by placing a monkey's arm and hand on a table waist-high in front of it. In one neuron, the somatosensory receptive field was essentially the whole surface of the palm.
The visual receptive field was tested by introducing food pellets in the area and recording the neural responses as the monkey used a hand or a rake to reach them.
When the monkey was using a hand to retrieve more distant food pellets, the area of visual responsiveness was circumscribed to the area of the hand itself. After using the rake, the area of visual responsiveness became much more extended along the axis of the rake. This in turn raised the issue of body image, suggesting that body image was extended along the rake itself as a result of using it.
Such results cry out for psychological experiments to more fully understand the behavioral significance of these multimodal visual/tactile neurons. Researchers have devised behavioral experiments to examine their possible role. Reed et al. (2006) showed that when human subjects placed their hands near the left or right side of a computer monitor, reaction times to stimuli placed nearer to the hand were shorter. The results here are robust and have been replicated repeatedly, and many of these studies have been reviewed by Tseng et al. (2012) and Perry et al. (2016).
Pertinent to our topic, it needs to be established whether enhanced visual processing is specifically related to planned action or merely reflects greater attention allocated near the hand. Reed et al. (2010) addressed this question, showing that while both processes were operative, a specific relation to action was evident. Employing a rake, the tool also used by Iriki et al. (1996), they showed similar visual processing advantages adjacent to the hand-held rake, but only when it was used as a tool.
More recently, Thomas (2015, 2017) has taken this idea further by specifying the exact type of action used. She had subjects manipulate objects in two different ways. In one situation they were to use a precision grip, using thumb and forefingers, and in a second situation they had to use a power grip. She hypothesized that each type of action had different visual requirements, with the precision grip requiring high-resolution spatial information and the power grip generally requiring dynamic processing. She obtained the predicted results, a double dissociation of visual performance appropriate to each grip, hinting that this reflected corresponding magno- and parvocellular function. McManus & Thomas (2020) showed that there were limitations as to when a tool could qualify as part of a body image. They found that only with hand-held tools was the effect of tool use evident. The study does show a bias toward a natural use of tools, but the success of teleoperated systems (in medical surgery and other applications) does not rule out that with much more extended training, more indirect tools could also be incorporated as part of the body. This is only a short description of a wealth of studies. It indicates that it is very likely that changes in visual processing tailored to specific motor tasks occur. However, the level of visual processing that is involved is mostly unspecified. The dissociation between two different kinds of visual processing and two kinds of hand grip (Thomas 2015, 2017) is quite surprising in its specificity, going far beyond some kind of general attentional modulation. Because the parvo and magno visual streams are thought to merge at higher levels, one possible implication is that the motor system influences vision closer to the periphery in the extrastriate cortex. The anatomical locus of such adaptations has been left unspecified, but Iriki et al.'s (1996) work on the parietal cortex is suggestive.
In addition, in another study, there is some hint that the tuning of receptive fields of monkey V2 neurons can be modified in relation to specific actions (Perry et al. 2015).

Visual Search in an Action Context
Traditional work on visual search has characterized the role of low-level visual features and perceptual organization in determining how quickly targets are detected and where attention is directed (Treisman 1985, Wolfe 1994). On this basis, salience or heat maps have been calculated (Itti & Koch 2000), accounting for a good range of laboratory findings. However, the actions in these cases were just button clicks. No attempt was made to see whether these findings would apply to situations in which subjects are actually doing things, or whether they would predict the deployment of attention and eye movements in naturalistic settings.

Eye movements in natural settings.
The study of attention and eye movements with respect to action in more natural settings has not been neglected. This includes research on reading (O'Regan 1990, Rayner 1995), music reading (Weaver 1943, Land & Furneaux 1997), and steering a car (Land & Lee 1994). Although these studies represent real-world settings, all share the repetitive nature of laboratory experiments, and the actions are extremely simple. Many things we do are not like this. Food preparation, housework, gardening, and carpentry involve a succession of subtasks different from the ones just described. In addition, subtasks often involve actions with one or more objects, often in cluttered environments. To make stew, we could be peeling and chopping carrots and then placing them in a pot. Both steps involve two or more objects.
Along these lines, we are fortunate to have two landmark studies. Making a cup of tea in England (Land et al. 1999) and making a peanut butter and jelly sandwich in America (Hayhoe 2000) have the requisite sequences of actions and subactions. Each study recorded video from the mobile observer's viewpoint and aligned it with eye gaze direction with sufficient accuracy and precision (1° or so). Later, the two studies were compared and summarized in a joint paper (Land & Hayhoe 2001). While there were differences between them, many important commonalities were evident.
Most important and surprising was that eye movements were closely tied both temporally and spatially to the requirements of the specific subtask. For example, making tea is certainly a well-learned or overlearned task. People are often listening to music and have no awareness of where their eyes are pointing. Yet each action is reliably preceded by an eye movement to the next object of interest, approximately half a second before the hand moves (in the case of tea making). Furthermore, the exact patterns of eye movements and hand movements were highly consistent across subjects, suggesting strong task demands. Hayhoe (2000) put many distracting irrelevant objects on the table for sandwich making. Even in such cluttered environments, where previous research would predict that conspicuous low-level image properties would draw the eye toward the distracting objects, this rarely occurred. Adding to this, in an entire tea-making sequence consisting of 250 saccades, only one or two were irrelevant to the task (Land et al. 1999). This indicates the limited applicability of image-based visual search models (Wolfe 1994, Itti & Koch 2000), which may be rendered irrelevant by action-based considerations.

Focal versus distributed attention.
The studies discussed above show persuasively the importance of the ongoing cycle of actions in directing our attention and thus our perception. Missing, however, is a better description of the nature of the attention required for directed action toward goals and objects. Especially important is focal attention, deployed very locally in the service of discerning fine detail. This is to be contrasted with distributed attention, where larger areas are apprehended. This variable range of attention was articulated in a zoom lens model of attention (Eriksen & St. James 1986) and supported by studies by Jonides (1980) and Sperling & Melchner (1978). Distributed attention would necessarily cover a wider spatial area with low spatial resolution in comparison to focal attention, which would permit the appreciation of fine detail over a much smaller spatial locus (Nakayama 1990, Nakayama & Martini 2011). A persuasive argument for the distinction between distributed and focal attention is provided by Sagi & Julesz (1985). In their study, multiple targets were placed within a homogeneous texture field of distractors (short horizontally oriented line segments). The targets, three diagonal lines, could vary randomly, being oriented at either 45° or 135°. The three oblique targets were arranged as a triangle, and the subject had to determine whether they formed a right triangle. To do this task, each target needs to be localized sufficiently accurately, and the task is done quickly, much more easily than when subjects are asked to identify the targets (as oriented at either 135° or 45°). Thus, the locations of multiple odd items can be readily determined even when orientation identity (a finer discrimination) cannot be reported. Most importantly, it seems that distributed attention easily handles the locations of the three odd targets, and focal attention, which would identify the orientation of the oblique targets, is unnecessary.
From this study it would seem that distributed attention would be adequate for ordinary eye and hand movements, especially where no high-acuity task is required.
The very specific eye movement behavior found by Land & Hayhoe (2001) is suggestive of this, but the issue was not directly addressed. Kowler et al.'s (1995) well-cited study was a good first start. Previous to this, it was not clear whether saccades even required attention, although Fischer (1987) had made this claim. Careful efforts by Kowler et al. (1995) yielded clear results: Attention was deployed just before a saccade, although only a very small amount of attention was evident. This study adopted a straightforward design using letter recognition at the target site of the saccade. Also using letter recognition, Deubel et al. (1998) showed that just before a reaching movement, letter recognition was better at the target site than for nearby letters. These studies were the first to show increased visual discrimination in relation to the goal site of a future action, applicable to saccades and to reaches.

Is focal attention really necessary for actions?
While the results described above are clear, they do not show that focal attention is required for these motor actions. The characteristics of such dual-task experiments naturally bias the subject to perform a certain way, often identified as demand characteristics. If subjects are asked to do letter recognition, they will comply and act accordingly and thus show signs of focal attention. We cannot assume, however, that focal attention would really be needed if it were not asked for. For all we know, distributed attention might do just as well. The localization of an element, possibly sufficient for a directed motor action, would not require fine discernment of its shape.
What is needed is to test for focal attention without the confounding effects of a fine target discrimination task. In other words, is there a way to show the presence of focal attention without asking for it directly? An opening came from a study of visual search that revealed a characteristic behavioral signature for focal attention (Bravo & Nakayama 1992).
In this study, each observer participated in four different visual search conditions. A multielement display of red and green diamonds was presented against a dark background, similar to what is shown in the top row of Figure 4. The target had the opposite color from the distractors, which could vary randomly in number on any given trial. There were two different tasks. First was the usual detection task, which has been conducted many times and documented in hundreds of papers: The subject had to press a key as soon as they saw the odd target. The second task was a very simple extension of the basic visual search task but had never been reported on previously. Here, the participant was required to report a subtle aspect of the odd-colored target: whether the diamond was truncated on the left or on the right. This second task was different from the classic visual search detection task in that a detail of the target had to be reported. This presumably required the deployment of focal attention.
Individual trials within each of the two tasks were either (a) blocked, that is, the target color remained the same over many trials; or (b) mixed, that is, the target and distractor color changed randomly from trial to trial. Results for the detection task were as expected. Reaction times were very short and did not depend on the number of distractors, confirming the results of published studies that are not discussed here. The interpretation here is that for the first task, that of simple detection, focal attention is not necessary and the result is just a one-shot recognition task with distributed attention to the whole display [as more fully explained by Nakayama & Joseph (1998) and Nakayama & Martini (2011)].
The results for the second task are shown in the left bottom panel of Figure 4. Reaction times declined with increasing numbers of distractors for the mixed condition. In the blocked condition, while reaction times were still relatively high, reaction times were lower than in the mixed condition and there was no increase or decrease associated with varying the number of distractors.
Focal attention is required to do this second discrimination task. Deploying focal attention in a sparse array of distractors is a challenge. For example, in the mixed condition with only two distractors, where the target color changes randomly, exactly which item is the odd target is not obvious, and attentional deployment to it takes more time. This is not the case for the larger arrays of distractors, where the odd target is much more evident.
How, then, is it easy for focal attention to be deployed in the blocked condition, even with just two distractors? Well, of course, the subject could actively remember that the odd target had always been the red or the green one. Surprisingly, Maljkovic & Nakayama (1994) discovered that subjects do not use such a higher-level cognitive strategy. Instead, there is an implicit, unconscious short-term memory aiding focal attentional deployment. Each time a target color is presented, it primes attentional deployment for subsequent trials of the same color, and this influence wanes over time. As such, trials as far back as a dozen trials before the current one can influence the reaction time for the same color. The time course of this is depicted in the top panel of Figure 5 [Figure 5: time course of priming for perception (top, from Maljkovic & Nakayama 1994) and saccades (bottom, from McPeek et al. 1999), plotting the advantage of repeating the same color against trial position in the past or future relative to the current trial; no effect is expected from future events]. Priming here can accumulate more or less linearly over many seconds and thus explains the reaction times in the blocked condition (see lower left graph in Figure 4). Taking these findings together, we have a three-part signature of focal attentional deployment: It is deployed more quickly with more distractors when target color varies randomly, it is faster and constant when target color remains the same (Figure 4), and it has a memory with a characteristic time function (Figure 5).
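The priming-memory account just described can be captured in a minimal quantitative sketch. The geometric decay, the memory span of about a dozen trials, and all numeric parameters below are illustrative assumptions of ours, not values fitted by Maljkovic & Nakayama (1994); the function and parameter names are hypothetical.

```python
import random

def simulate_rts(colors, base_rt=600.0, gain=25.0, decay=0.8, memory=12):
    """Reaction time per trial under a simple priming-memory model.

    Every past trial (up to `memory` trials back) whose target color
    matches the current one contributes a benefit that decays
    geometrically with its distance into the past, mimicking the
    waning influence reported by Maljkovic & Nakayama (1994).
    All parameter values are illustrative, not fitted to data.
    """
    rts = []
    for i, color in enumerate(colors):
        benefit = sum(gain * decay ** k
                      for k in range(1, memory + 1)
                      if i >= k and colors[i - k] == color)
        rts.append(base_rt - benefit)
    return rts

random.seed(1)
blocked = simulate_rts(["red"] * 40)
mixed = simulate_rts([random.choice(["red", "green"]) for _ in range(40)])

# Benefit accumulates across repeats: blocked RTs never rise, and the
# fully primed blocked RT beats the average of the mixed sequence.
assert all(b1 >= b2 for b1, b2 in zip(blocked, blocked[1:]))
assert blocked[-1] < sum(mixed[20:]) / len(mixed[20:])
```

Such a model reproduces the qualitative blocked-versus-mixed pattern: repeated colors accumulate a steadily growing benefit, while random alternation leaves only partial, fluctuating priming.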
We can now address the question at hand: whether focal attention accompanies motor actions without the need for a fine-grained visual discrimination task. Would this same signature of focal attention derived from a perceptual study have the same characteristics? The bottom middle and right graphs of Figure 4 show that the same signature of focal attention is seen as a function of distractor number and of mixed versus blocked conditions for saccades and manual pointing, respectively. It is clear that the same pattern seen in the perceptual experiment is obtained (Figure 4, left bottom). Adding further specificity to the common signature of focal attention, Figure 5 (bottom) shows the same memory time function for saccadic eye movements as was seen for perception (Figure 5, top). Also, not shown here are comparable results for manual pointing in both humans (Song & Nakayama 2006) and monkeys.
What links all these very similar experiments is that what is primed is not the individual indicator of the priming, perceptual or motor, but the target of previous focal visual attention allocations. This is further strengthened by reports using a mixture of tasks (key presses, saccades, and pointing) presented at random. Here, there was near-equivalent priming between trials in which the target required a different motor response (Moher & Song 2014). This reinforces that the common process primed is focal attention, not any particular motor response. Overall, this shows that focal attention, in distinction to distributed attention, accompanies all visually directed motor acts, even in situations in which high-acuity visual processing is not required. Does this mean that there is a special role for focal attention, unrelated to its perceptual function (Allport 1987)? Here, recent work on possible reference frames for motor action could be relevant. Head-based, eye-based, and arm-based systems are conceivable. Because of gain fields (see Section 3), all coordinate frameworks and origins can in principle be derived from each other. However, Andersen and colleagues have found through experimentation that just one reference frame is the rule, the eye-based one (Snyder 2000, Cohen & Andersen 2002). They speculate that this allows more natural integration across sensory inputs, especially from hearing. An eye-based system centered on the fovea also identifies the focus of the action and places it in the animal's environment explicitly; it is thus jointly represented with respect to the body and the surrounding environment. Such ideas are not original, and they have been articulated by others (Ballard et al. 1997, Land 2012) who have advocated a role for eye position independent of perception in aiding motor behavior.
In essence, this perspective links an egocentric framework to an allocentric (environmental) representation, possibly of great importance, being the coordinate origin of action in relation both to the body and to the pertinent part of the world.
One additional series of experiments shows perhaps an even more direct example of how, in an action context, visual search results can be completely different from the usual pattern, and how even in what might seem a straightforward vision experiment, motor goals reverse the usual pattern of results. Moher et al. (2015) showed that the typical disruptive effect of a perceptually salient distractor in an otherwise normal visual search array was significantly reduced when a motor response was to be made to the odd target. This finding, along with Hayhoe's (2000) lack of otherwise expected distractibility during natural behavior, provides mounting evidence for the importance of action in what might seem to be an experiment on vision only.

Paradoxically, a Stable Aspect of Conscious Visual Perception Is Not Influenced by Action
A long-known and perhaps most fundamental aspect of visual perception is that it is not influenced by action. We make saccades 2-3 times a second, at more than double our resting heart rate, with the retinal image shifting abruptly each time. Yet this seems to be of no consequence for us as perceivers; we do not even know we are doing it. Similarly, as we move our heads and bodies, the retinal image changes as well, but the world remains essentially steady and present. The stability of the perceived visual world has been recognized as an issue at least since Helmholtz (1896), and for almost a century his explanation has been tacitly assumed, that is, that there is a compensating efference copy of the eye movement signal that is sufficient to stabilize the perceived world. Recently, Bridgeman (2007) critically reviewed the topic and showed that such explanations have never been adequate. In addition, they may not be as necessary as previously assumed. In their place, he suggests that we are only modestly aware of our visual world at all, citing change blindness (Simons & Levin 1997). This topic is closely related to visual consciousness and consciousness more generally, which we take up below as we discuss ideomotor theory and Merker's (2005) theory on the motor origins of consciousness.

PARALLEL PROCESSING IN THE MOTOR SYSTEM
Parallel processing in vision and other sensory systems is well recognized. For the motor system, however, this aspect has not been so clear. At any given moment our bodies can be doing only a few things at a time. Along these lines, behaviorists argued that even complex actions are inherently serial: Each tiny action is linked to the one before by associative learning. In a classic essay, Lashley (1951) argued against such serial chaining, citing Spoonerisms, speech errors whereby different parts of sentences get transposed, for example, "our queer old dean" instead of "our dear old queen." He argued that only a hierarchical parallel system, holding both words in question at the same time, would make an error like this, whereas a serial chaining system would not. Parallel processing is evident for both saccades and reaching in a visual search-like paradigm with stimuli similar to those seen in Figure 4. Because of the priming just described, a formerly well-primed color will often appear as a distractor on a particular trial, and thus focal attention will initially be drawn to it even though it is not the odd-colored target. Testing human subjects, McPeek et al. (2000) often found pairs of saccades in quick succession, the first to the primed distractor and the second to the odd-colored target. The temporal intervals between the two saccades were unusually short, too short (0 to 100 ms) for the second saccade to have been prepared and launched after the first. In other words, preparation of the two saccades overlapped, such that the second saccade could be programmed even before the first saccade was launched. Based on this human study, McPeek et al. (2000) concluded that the two saccades were prepared concurrently, that is, in parallel.
Adding much further support to this idea, McPeek & Keller (2002) found essentially the same result in the superior colliculus of alert, behaving monkeys, where they directly observed neural activity corresponding to the site of the second saccade. In a simple yet elegant experiment, they observed this activity, at the appropriate spatial location in the colliculus, while the first saccade was in flight. Thus, the preparation of the second saccade overlapped with the first in time. This, to our knowledge, is the first detailed neural description of motor parallelism, identifying the neural structures and the timing of activity of two specific concurrent processes. Corresponding behavioral evidence for concurrent processing related to reaching/pointing has also been demonstrated in humans (Song & Nakayama 2008b) and in monkeys, based on the timing of curved hand trajectories, indicating that parallel processing is not just some peculiarity of eye movements and that it is likely to be a more general property of motor behavior.
We should also mention that there is another very different example of parallel processing that we humans are doing all the time. In multitasking, we can be listening to the radio, opening a window, and driving in traffic, all seemingly in parallel. This is a highly researched area in cognitive psychology, and it is clear that many processes can happen at the same time with little evident cost.
However, with certain dual-task situations, there is an irreducible limit to parallelism (Pashler 1984). There is a hypothetical central process that cannot be shared in the course of dual-task situations. During this time, called the psychological refractory period (PRP), this process cannot be shared, and if the two tasks overlap in time, performance will inevitably drop. This is an almost universal finding. However, Anthony Greenwald and colleagues have found that there are some exceptions to this. Because this is best understood in terms of the theories we will present, we leave this for the next section.
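The central-bottleneck logic behind the PRP can be stated as simple stage arithmetic. The sketch below is a minimal rendering of the standard bottleneck account, with stage durations chosen purely for illustration; the function name and parameter values are our assumptions, not figures from Pashler (1984).

```python
def prp_rt2(soa_ms, pre=150.0, central=200.0, post=100.0):
    """Task-2 reaction time under a central-bottleneck model of the PRP.

    Each task runs a perceptual stage (`pre`), a central stage that
    only one task may occupy at a time (`central`), and a motor stage
    (`post`). Task 2 begins `soa_ms` after task 1; if task 1 still
    occupies the central stage when task 2 is ready for it, task 2
    must wait. Stage durations are illustrative, not fitted to data.
    """
    t1_central_end = pre + central
    t2_central_start = max(soa_ms + pre, t1_central_end)
    t2_response = t2_central_start + central + post
    return t2_response - soa_ms  # RT measured from task-2 onset

# At short SOAs, every millisecond of overlap adds a millisecond of
# waiting (slope of -1); at long SOAs, task 1 no longer interferes.
assert prp_rt2(0) == prp_rt2(50) + 50
assert prp_rt2(1000) == prp_rt2(2000)
```

This single `max` operation is the bottleneck: it produces the characteristic RT2-versus-SOA curve (a slope of -1 at short SOAs flattening to zero at long ones) that is the empirical signature of the PRP.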

THEORETICAL PERSPECTIVES
We have surveyed a small sample of contemporary findings, many of which challenge the simplest assumptions and call for some broad thinking to at least supplement the see → decide → act framework. We survey three theories below that are directly relevant for the experimental results we have described. In doing so, we also acknowledge that we are selecting the ideas that seem suitable for our argument.

Ideomotor Theory
Most enduring is ideomotor theory, described by William James (1890) and long forgotten but revived in several guises, notably by Hommel et al. (2001). All are far broader in scope than the see → decide → act framework. Ideomotor theory proposes two phases for representing actions (see Witt 2018). At first, associations are learned between an action and its perceptual/sensory effects. The perceptual effects of an action include the effects on the external environment and on the body itself. As an example, consider grabbing an apple from a tree branch. The sensory effects include seeing the apple, the tactile feeling of the apple, the feeling of the body and arm outstretched, and then the feeling of the apple freeing itself from the moving branch. According to ideomotor theory, associations are learned between each of the sensory effects of grabbing, pulling, etc. Once these associations are learned, and ever after, in a second phase the simple thought of getting the apple is sufficient to trigger the action. Actions are not represented as a sequence of component movements; rather, actions are represented just by their sensory/perceptual outcomes. This is a uniquely psychological theory, in contrast to a physical, neural, or computational theory. There is no specification as to the masses, forces, velocities, joint angles, neurons, etc. The person knows little or nothing of these. The actor simply imagines and desires a perceptual outcome and it's done.
At least implicitly, this theory asserts without question the existence of consciousness and, furthermore, that consciousness leads to action. Not surprisingly, this was vehemently denounced by the early behaviorists (Thorndike 1913) as magical thinking, and until very recently, consciousness was a word avoided in science. Now, after more than 100 years, it is a topic of great interest, with many well-cited theories and reviews (Baars 1993, Tononi 2008, Dehaene 2014).
The long-standing shunning of consciousness, however, was never complete. For over a century, vision scientists have depended on visual awareness/consciousness to conduct their perceptual and psychophysical studies. As scientists and even as materialists, they just forged ahead. Motor scientists who took ideomotor theory seriously sidestepped the question of how mere thoughts could change the physical world with its firmly established causal laws. No one has really found a satisfactory explanation as to how mind and matter interact (Searle 1992). The beauty of science historically is that some scientists have not paid much attention to such troubling contradictions and have just gone on with their work, occasionally with great success.

Perhaps the most exciting new results pertinent to ideomotor theory come from neuroprosthetic devices. Their evident success serves as an ideomotor theory demonstration project. Sampling from just 100 neurons in parietal or motor cortex allows a tetraplegic person to manipulate a robot arm so as to grasp a cup and bring it to the mouth, thus allowing them to drink (Hochberg et al. 2006, Andersen et al. 2019). The instructions seem to be straight out of an ideomotor theory playbook: "Just imagine the arm grasping the cup, drawing it near to you. . ." In early training, subjects are presented with a computer screen and are instructed to move the cursor to desired loci. At other times, the successfully manipulated robot arm is completely separate from the patient's body, sitting in front of the patient, about a meter away in full view. Earlier, we mentioned unusual examples of body images, including the hand in a mirror box and a rake. Now added to this are much more unusual things, such as a cursor on a screen and a robot arm not even attached to the body. All of this speaks to the remarkable malleability of the body image serving motor requirements, in evident accord with ideomotor theory. Where in the brain might these systems reside? The existence of visually responsive neurons in the premotor and even the motor cortex itself has been well established, so they are likely candidates.
It has been reported that just imagining or observing a particular motor pattern selectively activates specific muscles of the hand, potentiating just the muscles of the thumb and forefinger (Fadiga et al. 1995). This again attests to the extremely strong coupling between visual consciousness and very specific muscle contractions.
Making this connection to ideomotor theory with even greater specificity, Umiltà et al. (2008) trained monkeys to pick up food with two kinds of pliers attached to their fingers: regular pliers, where one closes the finger grip to grasp the food, and reverse pliers, where one has to open the grip for the pliers to close. Monkeys learn to use both expertly, also switching from one kind to the other. The real test for ideomotor theory came after training. Do the premotor neurons respond in sync with monkey's physical grip status, opening or closing the fingers, or do they respond in synchrony with the opening and closing of the pliers themselves, independently of the physical hand grip? The result was clear: Neurons responded according to the state of the pliers and not of the hand. The body image of the pliers overrode that of the fingers for this particular situation. Goal and function determine a new body image of the pliers, supplanting that of the real body itself.
Ideomotor theory, proposed over 100 years ago, is surprisingly relevant and provides a framework to account for some of the most technically advanced and elegant studies reported.

Merker (2005, 2007) proposed a new theory of consciousness, arguing that the challenges faced by mobile animals very early in animal evolution led to its appearance. The basic premise is that animals, especially highly mobile ones in open environments, need to distinguish afferent input from the environment from afferent input generated by their own actions. The traditional solution offered is that an efference copy of the motor command allows for a cancellation of the self-generated afferent input. As mentioned earlier, Bridgeman (2007) has critically reviewed this literature in humans and has found such explanations insufficient. Merker (2005) acknowledges that efference copy likely works for some animals, say the earthworm, where the sensory stimuli and the actions are relatively simple. However, for animals that move in a more open environment with many sense organs, some of which move relative to the body, it is just too complicated to compensate for each action and subaction. What is needed is a forum where all kinds of disparate information provide the best estimate of the important realities, the state of the body in relation to the environment, so that decisions can be made based on motivational and emotional states. Furthermore, this is an arena where adaptive associative learning can occur most broadly. One needed aspect of this reality function is consciousness. It should be pointed out that, contrary to most contemporary views, consciousness is not a higher neural function. It is present in animals that do not have a neocortex as well as in many others (Merker 2007). As one neuroscientist put it, "Consciousness is not critically related to being smart; it is not just clever information-processing.
Consciousness is the experience of the body and world, without necessarily understanding what one is experiencing" (Panksepp 2007, p. 102).

Consciousness as a User Interface for an Evolving Motor System
Merker posited the existence of consciousness in the most primitive vertebrates, identifying specific neural structures, especially the tectum, hypothalamus, and basal ganglia. These are present in all vertebrates and even in vertebrate precursors, having appeared 500 million years ago. Based on Merker's reasoning, Barron & Klein (2016) also argued for the existence of consciousness in insects, as did Land (2012).
Before continuing, we note some accepted points about consciousness, which arguably apply also to these more ancient phyla. In his global workspace account of consciousness, Baars (1993) observed that consciousness (a) has limited capacity, that is, only very few things can get processed at a time; and (b) allows widely different parts of the brain to be connected to a common space, thus allowing great flexibility. In sum, it is a highly adaptable, low-bandwidth, general-purpose system. Because of the limited capacity of consciousness, it is almost axiomatic that there needs to be something like attention to select what is to be processed (Cohen et al. 2012). Baars (1993), however, did not emphasize or specify any particular contents of consciousness.
For Merker (2005), the content is all-important: This is the body, the environment, and their relationship. Mobile animals, in order to survive, need a reality space, a user-friendly interface for the motor system. This is what consciousness is for. Almost all brain activity lies outside the realm of consciousness. According to ideomotor theory and established empirical evidence, essentially the whole motor system is outside of this realm. Likewise, much of the visual system itself lies outside of consciousness (Merker 2007, figure 5). We are not conscious of the activities of the retinotopic cortex, with its wildly fluctuating inputs accompanying saccades.
Probably the best hint as to the existence of a reality space comes from some spectacular studies of single cells in the medial temporal lobe of rats and mice, richly deserving of the 2014 Nobel Prize (Burgess 2014). Place cells, grid cells, and head direction cells help to accurately specify the position of the animal in a local environment. These cells transcend individual sensory and motor systems, combining inputs from many parts of the brain (O'Keefe & Dostrovsky 1971, Taube et al. 1990, Moser et al. 2008). If this reality function is so significant for mobile animals, it should be seen very early in phylogeny in supposedly lowly creatures. Consider the fruit fly Drosophila melanogaster, a tiny insect about 2.5 mm long with a brain 250 µm in width. Deep in this brain lies the tiniest ring of neurons in the central complex, an integrative brain region. Using calcium imaging to record neural activity, Seelig & Jayaraman (2015) placed the fly in an apparatus where it could rotate freely in a local environment with a vertical luminous bar as a landmark. What is astonishing is that when the fly rotates relative to this landmark, a corresponding position on the ring becomes active with each successive rotation (see also Fisher et al. 2019). Not only does the animal have head direction cells, which is astonishing in itself, but the cells are also organized spatially as if they were points on a compass. In this insect brain, there is a geometric representation of the local environment. The miniaturization here is extreme: a reality readout function in the Drosophila brain's central complex that is far smaller than the width of a human hair.
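The compass-like organization described above can be caricatured in a few lines of code. What follows is a toy sketch, not a model from the studies cited: we assume 16 idealized "neurons" with evenly spaced preferred headings, a Gaussian bump of activity, and perfect integration of rotation. Real central-complex dynamics are far richer.

```python
import numpy as np

N = 16                                                 # idealized neurons on the ring
prefs = np.linspace(0, 2 * np.pi, N, endpoint=False)   # evenly spaced preferred headings

def bump(heading, width=0.5):
    """Population activity: a Gaussian bump centered on the current heading."""
    d = np.angle(np.exp(1j * (prefs - heading)))       # angular distance, wrapped to (-pi, pi]
    return np.exp(-d**2 / (2 * width**2))

def decode(activity):
    """Population-vector readout of heading from the ring activity."""
    return np.angle(np.sum(activity * np.exp(1j * prefs)))

# As the animal rotates relative to the landmark, the active position
# on the ring rotates correspondingly, like a needle on a compass.
heading = 0.0
for angular_velocity in [0.3, 0.3, 0.3]:               # radians per time step (invented values)
    heading = (heading + angular_velocity) % (2 * np.pi)
    activity = bump(heading)
    print(f"true heading {heading:.2f} rad, decoded {decode(activity):.2f} rad")
```

The point of the sketch is only that a spatially ordered ring of cells plus a movable activity bump suffices to implement a geometric heading representation that any downstream circuit can read out.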

Concerning Merker's (2005) claim that consciousness is both needed and present in the most primitive of animals, new evidence is pertinent. Earlier, we mentioned that one of the important characteristics of consciousness is its very limited bandwidth. Very few things can be processed and only slowly, and thus an attentional mechanism is needed to select what will be processed by this limited, precious resource. This has been well documented and reviewed elsewhere (Cohen et al. 2012). It follows that if we can demonstrate selective attention in an organism, we have some assurance that there is a reason for this, and at least provisionally, that reason is to steer consciousness and to control its access.
In a recent comprehensive review surveying the possibility of selective attention in insects, de Bivort & van Swinderen (2016) found at least eight persuasive examples of such a process. Building on years of work on selective attention in humans and nonhuman primates, researchers have brought many different techniques to well-designed insect experiments: gross electrical recordings analogous to human EEG, highly sophisticated calcium imaging of very local brain signals, and a variety of behavioral tests. All showed selective attention according to the current standards of evidence.
Researchers on consciousness have long lamented that animals do not speak and thus cannot provide information as to the contents of consciousness. These studies are the best evidence so far for insect consciousness, and while the reasoning is indirect, the conclusions are as exciting as they are significant. What is appealing is that this is a new way to think specifically about where and when in the animal kingdom consciousness might arise, and in particular what specific function it might fulfill. This issue of a plausible function has been sorely missing during the last two decades of consciousness studies. Contrast this with the work of Tononi (2008), whose theory does not address the fundamental biological question as to the function of consciousness.
We suggest there is a yet-to-be-explored field of sentient ethology that could lie ahead; the consciousness of our phyletic ancestors could be very basic, although now unfathomable to us.1 Research on change blindness (Simons & Levin 1997) and inattentional blindness (Mack & Rock 1998) shows that our own visual consciousness is extremely limited and impoverished; yet we cope so well with this in ways that are not understood. Insects and our own vertebrate ancestors are likely to also have a very limited visual consciousness, perhaps even more so. Nevertheless, this consciousness plays a needed role. All this is to say that consciousness could be extremely important historically, but its essential nature is not the higher, exalted, complex function it now seems to have.
Merker's theory is important because up until now, ideomotor theory was based mostly on psychological observations and experimentation on humans. What is important is that his account provides a much broader context in which to put ideomotor theory, situating it within an evolutionary framework. More broadly, it puts the evolving motor system center stage for our study of the brain.
As an example of the benefits of thinking of ideomotor theory conjointly with Merker's ideas, we can reconsider the PRP (psychological refractory period), referring to that process in multitasking situations that cannot be shared (Pashler 1984). As just mentioned, in earlier work Greenwald and associates found a set of exceptions in which the PRP does not exist, using examples of stimuli and responses that were dubbed ideomotor compatible (e.g., Greenwald & Shulman 1973, Greenwald 2003). These were more natural combinations of tasks that intuitively a person could do, say, giving a verbal response by naming a letter that was presented or pressing a key with the left hand in response to a left-facing arrow. Related to this is a whole class of stimuli and responses that are dubbed stimulus-response (S-R) compatible, for example, the Simon effect, where there is a bodily congruence between the response and the stimulus presented. Much of this is related to how the body would respond in a real situation with real objects at hand, not an arbitrary mixture of stimulus-response choices. If true, this would emphasize the significance of a real body acting in a natural as opposed to a contrived situation. This, in turn, adds further weight to the view that a coherent representation of the body image is critical for action.

1 Endless discussion exists as to what consciousness is like in animals, especially those who are very different from us. We offer here something from our own experience that might assist in stimulating one's imagination. Most of us drive cars on controlled-access highways, where for hours we might be listening to music or talking with companions and where we are only dimly aware of the surrounding vehicles moving along with us. Nevertheless, we have a situational awareness of these vehicles that can guide our actions, if needed. Although there is no evidence that this is anything like insect consciousness, it is an example of a reality function that is different from our usual description of consciousness, one that is functional and flexible.

Affordance Competition Theory
Perhaps the most vocal, radical, and consistent critic of the see → decide → act framework has been Cisek (2007) with his affordance competition hypothesis. Cisek is well versed in the long history of cognitive science, and his broader critique is extensive and far-reaching, rejecting the currently reigning frameworks. His approach involves an almost completely different set of ideas and concepts, tracing the brains and behaviors of representative phyla over the past half billion years (Cisek 2019). Whether one accepts this view or not, we can still consider his theory of action. Affordance competition theory identifies two issues regarding motor behavior: action selection and action specification.
Action selection is important because of the finite physical (not virtual) existence of the body: It all has to conform to the laws of physics, in particular, mechanics. We have two arms and two legs, and doing anything (say, reaching toward the left) entails a whole series of postural adjustments so as to keep the center of gravity of the body appropriately placed. Therefore, specific muscles in the abdomen, pelvis, and legs need to be flexed, often in advance of the movements of the arms. This would seem to indicate an early decision as to which action should be chosen. Much of the time, with some notable exceptions, we can do only one or a few things at a time, and all the postural preparation for these tasks is needed.
Action specification is the exact trajectory to be taken; the exact muscle forces and joint angles need corresponding specification. Surprisingly, it is the endpoint that is specified and achieved, often without fixed values for the several joint angles on which it depends: these can vary considerably from reach to reach, while the pointing to the final destination is less variable and surprisingly accurate (Todorov 2004). This would seem to be an important finding for any account of motor control.
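The redundancy behind this finding can be illustrated with a textbook toy example (not taken from Todorov's work): even a planar two-link arm admits two distinct joint configurations, elbow up and elbow down, that place the hand at exactly the same endpoint, and arms with more joints admit a continuum of such solutions. The link lengths and target below are arbitrary illustrative values.

```python
import numpy as np

l1 = l2 = 1.0                      # link lengths (arbitrary)
target = np.array([1.2, 0.8])      # desired endpoint (arbitrary, within reach)

def forward(theta1, theta2):
    """Forward kinematics: endpoint of a planar two-link arm."""
    return np.array([
        l1 * np.cos(theta1) + l2 * np.cos(theta1 + theta2),
        l1 * np.sin(theta1) + l2 * np.sin(theta1 + theta2),
    ])

# Standard inverse kinematics: two branches reach the same endpoint.
x, y = target
c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
theta2 = np.arccos(c2)                                   # elbow angle magnitude
phi = np.arctan2(l2 * np.sin(theta2), l1 + l2 * np.cos(theta2))
base = np.arctan2(y, x)

solutions = [(base - phi, theta2),    # elbow-down configuration
             (base + phi, -theta2)]   # elbow-up configuration

for t1, t2 in solutions:
    print(f"joints ({t1:.2f}, {t2:.2f}) -> endpoint {forward(t1, t2)}")
```

The joint angles of the two solutions differ substantially, yet both reach the target exactly, which is the geometric kernel of the observation that endpoints can be accurate while joint angles vary.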
According to Cisek, traditional theories assume that these issues are resolved in a serial manner, that we decide what to do before planning how to do it. Affordance competition theory rejects this hypothesis, asserting that these two processes must occur simultaneously and, furthermore, that they continue even during the overt performance of these movements.
Just as there are no specific, repeatable joint angles for exact pointing, the process is fluid and dynamically distributed with many competing tendencies, yet it resolves in accurate functional actions. The brain is continuously using sensory information to specify potential actions available in the world (so-called affordances) while at the same time collecting cues for selecting which one is most appropriate at a given moment. Cisek's theory is much more specific and extensive than this, identifying specific brain structures and pathways for such interactions. For our purposes, we need not dwell on any of these details.
As the name of the theory implies, behavior emerges through a dynamic winnowing process in which selection and action need to evolve together. This is well-accepted practice in other, faraway realms. For example, it happens in industry, where the nature of business choices and implementation is complex. Following the practice of concurrent engineering (Prasad 1996), the decision to manufacture a truck over a sedan requires simultaneous exploration of many specifics about implementation, availability of raw materials, transportation, foreign labor costs, etc. The final outcome is the result of a dynamic, iterative, interactive process. As such, decision making here has a parallel with Cisek's brain model.
Taking affordance competition seriously implies that the popular field of decision making as described in Section 2 (see Gold & Shadlen 2007) is unlikely to have applicability outside the well-controlled environments explored. The simple accumulation of evidence to make a decision is only a tiny fraction of what is required. Moreover, there is plenty of behavioral evidence that even in very restricted laboratory situations, behavior does not conform to a serial model but is more accurately described by the conception advanced by Cisek. Case in point: the leakage revealed in the numerosity experiment depicted in Figure 3, which indicates a higher-level cognitive influence on motor trajectories (Song & Nakayama 2008a); the quick changes in saccadic direction (McPeek et al. 2000); and the curved hand trajectories (Song & Nakayama 2008a,b; Song et al. 2009) all show persuasively that Cisek's theory captures something significant and characteristic about behavior. Indeed, even in the simplest of situations, there seems to be a dynamic process where variability is always present, but somehow the goal itself is attained. Although some have called such examples "changes of mind" (Resulaj et al. 2009), in light of affordance competition, they simply represent the normal operation of the motor system.
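A minimal numerical caricature can make the curved-trajectory claim concrete. This is deliberately simplified and every parameter value is invented for illustration; it is not Cisek's model. Two response options accumulate support in parallel while the "hand" is already moving, pulled by both goals at once, so the trajectory starts out intermediate and curves toward the eventual winner (noise is omitted to keep the sketch deterministic).

```python
import numpy as np

evidence = np.array([0.6, 0.4])    # sensory support for left vs. right target (assumed)
leak, inhibition, dt = 0.5, 0.4, 0.1
targets = np.array([[-1.0, 1.0],   # left target position
                    [ 1.0, 1.0]])  # right target position

a = np.zeros(2)                    # accumulator activities for the two options
pos = np.zeros(2)                  # current hand position
path = [pos.copy()]

for _ in range(60):
    # Each accumulator is driven by its evidence, leaks, and is suppressed
    # by its competitor; selection and movement unfold simultaneously.
    drift = evidence - leak * a - inhibition * a[::-1]
    a = np.clip(a + dt * drift, 0.0, None)
    w = a / a.sum()                                # momentary relative weighting
    pos = pos + 0.02 * (w @ targets)               # hand pulled by both goals at once
    path.append(pos.copy())

path = np.array(path)
print("final position:", path[-1])
```

Because movement begins before the competition is resolved, the early heading is a compromise between the two goals and only gradually bends toward the stronger one, qualitatively like the curved hand trajectories and mid-flight corrections cited above.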

Psychology Back in Action?
We have reported that vision and action are entangled almost everywhere. This suggests that vision and action should be part of a single field, not subdivided as they are now. Perception, attention, selection, awareness, and goals are all closely related to action. The material just presented and the theories reviewed are suggested pointers. Despite more than 100 years of neglect, we suggest that psychology, with its unique set of approaches and broad scope, could contribute substantially to our understanding of vision and action, and of the mind and brain as well.

DISCLOSURE STATEMENT
The authors are not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.