Welcome to my blog.  I want this blog to become a place where we can share information, good references, and ideas about how vision and perception interact to create our visual experiences.  


Two Good Books for the Journey Ahead

I will stick by my earlier strong recommendation of Vision Science - Photons to Phenomenology by Stephen Palmer.   It is still my go-to textbook but, as I mentioned earlier, it is big and kind of pricey at $80 on Amazon.  Please see my 18 June post for a convenient direct link to the proper Amazon page. 

Let me mention two additional books that are more modest in depth but which cover the full scope of this blog and are less expensive.    

The first is Perception by Irvin Rock.   He was a pioneer in vision science.  In this book he has described visual perception in very lucid terms.   It has the kind of clarity that only comes from someone deeply experienced in their field.  The paperback version is very reasonably priced on Amazon.  I bought mine used for under $6, including postage.  I highly recommend the book.  


The second book is Cognition and the Visual Arts by Robert Solso.  It is a less rigorous treatment of visual perception but it smoothly bridges the full scope of this blog, from vision through aesthetics.  It is an easy and interesting read.



Looking Beyond the Primal Sketch

Let's return now to the raw Primal Sketch and to the remarkable scientist, David Marr, who formulated the notion of a Primal Sketch as part of his systemic scheme for thinking about vision. 

Stephen Palmer says of David Marr's last book, entitled Vision, that it is "one of the most influential books on vision ever written...."  In addition to the sweeping range of technical insights in Vision, perhaps Marr's most enduring contribution has been his organizing principle, usually referred to as Marr's "tri-level hypothesis."  It seems to me that Marr could be the Lewis and Clark of vision science.  His tri-level hypothesis has helped to organize the scientific exploration of vision, not only during his lifetime but even to the present. 

What is this tri-level hypothesis?  Simply put, think about vision in three levels.  Like the layers in a Dagwood sandwich, these three levels are integrally linked.  They become three ways to think about vision science. 

Keep in mind what a small sliver of the vision science field we are considering here in this blog.  Vision science includes at least: neuroscience, cognitive psychology, computer science, psychophysics, ophthalmology, neurology, and even linguistics.   Even though some of Marr's specific ideas have been surpassed by subsequent research, his tri-level hypothesis remains a key organizing principle in what I have been able to read of vision science. 

The top level, which Marr called the "computational" level, represents the key questions that vision tries to answer for creatures with eyes and why each question matters to those creatures.  I think of it as the goal level.  In his book, entitled Vision, Marr defines the top level goal of vision: "What does it mean, to see?  The plain man's answer (and Aristotle's, too) would be, to know what is where by looking.  In other words, vision is the process of discovering from images what is present in the world and where it is."  

Imagine that we are ready to market a new "Seeing Machine" which is as good as normal human vision and approved for sale.  The sales brochure for our miraculous product might include the promises of what the machine is intended to do, e.g. provide the user with 20-20 vision within a specified range of uses.  That is the goal implied at Marr’s computational level.  Marr also includes the important question of why each goal exists.  What does a seeing creature gain from vision?  In an evolutionary sense, the answer seems quite simple -- survival.  Our Seeing Machine sales brochure would certainly stress its role in the user's safety.  More than survival, it seems to me that human vision provides a major aesthetic impact upon a person's feelings about life.   Our sales brochure ought to include claims about improvements to quality of life. 

Marr called the middle level the "representation and algorithm" or just the "representational" level -- the scheme by which vision tries to achieve its goals.  His notion of a Primal Sketch is an example of thinking about vision at a "representational" level.  Returning to our Seeing Machine, users will probably need a user's manual.  Unlike normal human vision, which becomes effortless by early childhood, our user will need some training to become adept in using this Seeing Machine.  The user's manual might include details of the steps that the machine will follow (maybe a flowchart or other schematic).  These details would be useful not only to explain how the Seeing Machine works but also to permit the user to recognize and diagnose problems which might occur during its use. 

Briefly, Marr called the bottom level the "implementation" level, the precise details on how specific components work, individually and together.  This would be the Seeing Machine maintenance manual, including detailed engineering information about the Seeing Machine components, how they are interconnected, and how each component works.

I want this blog to operate within what David Marr would consider to be the representational level of visual perception.  We may need to make brief excursions to the top, "computational," level when we consider aesthetics in more detail.  Time will tell.  I'll stick by my promise in the 24 June post and try to avoid the "fine details of neural anatomy and physiology."  For our purposes, I see little point in dwelling at the nitty-gritty "implementation" level.

Next we will try to draw the ideas from Marr and Palmer into a flow chart for visual perception.  This “representational” level integration should give us a framework for considering the key features of visual perception that affect us aesthetically.


Visual Perception, in Five Easy Pieces

It has been too long since my last posting.  I certainly didn't intend that but life happens.   So let's get started. 

I have promised you a flowchart-like overview of the visual perception process.  True to form, Stephen Palmer's Vision Science book has exactly what we need.  He describes visual perception in five stages: 

1. Retinal Image – This stage, of course, refers to the role of the eyes in sensing incident light but it also includes the transmission of visual information to the brain. 

2. Image-based Processing – None of your brain's efforts to reconstruct a suitably accurate 3-D perception of the world would be possible without this initial systematic sampling and analysis of the raw 2-D information that your brain receives from your retinas.  That is the sum and substance of what Palmer calls the Image-based Stage.  It is also the stage which David Marr called the raw and full "Primal Sketch." 

No surprises here.  In Palmer’s own words, this is the stage in which your brain "detect(s) local edges and lines, link(s) local edges and lines together more globally, match(es) up corresponding images in the left and right eye, define(s) two-dimensional regions in the image and detect(s) other image-based features, such as line terminations and 'blobs.' " I will post an exemplar of the image-based process immediately following this post. 

3. Surface-based Processing – Here, your brain recovers what Palmer calls "the intrinsic properties of visual surfaces in the external world that would have produced the features that were discovered in the image-based stage." (emphasis added) (I have emphasized this portion of the Palmer quote because it is a crucial aspect of how we need to think about all of these “steps.”  More on that at the end of this post.) 

Palmer also stresses fundamental differences between the surface-based representation of the external world and the image-based representation.  "The surface-based processing produces a spatial layout of visible surfaces in 3-D, whereas image-based processing refers to image features in the 2-D pattern of light falling on the retina."   

4. Object-based Processing – In this stage, your brain calls upon its prior knowledge to create hypothetical 3-D objects from the surface-based scene.  These objects are fully hypothesized in 3-D.  They include not just the surfaces that were visible in the retinal image but also those surfaces in a 3-D scene that you could not have seen from your perspective.   

For example, I am writing this blog on a Macbook Pro laptop.  I see the screen on my side of the open lid but I cannot see the other side.  Yet I know, from general life experience, that the lid is a 3-D solid with sides and a back surface.  More specifically, I know that the lid of my particular laptop has a lighted white Apple logo on that back surface, even though I cannot see it.    

5. Category-based Processing – In this final stage, your brain recognizes the specific object as a member of a broader category of similar objects, e.g. I know that my Macbook Pro is a computer and not a bathroom scale with a cute Apple logo nightlight. 

Along with this categorization comes the functional understanding of the object's "affordance" – what benefit the object “affords” if you use it. Imagine that you have worked very hard for several hours outside on a hot, humid day.  Your spouse brings you an ice-cold bottle of your favorite beverage.  You look at that icy glass bottle from a distance.  You recognize its shape and the label on the “object” but your powerful feeling of anticipation is based upon the bottle's category-based “affordance.”

Finally, let’s briefly return to the earlier quotation from Palmer about Surface-based processing. “…the intrinsic properties of visual surfaces in the external world that would have produced the features that were discovered in the image-based stage.”

Palmer’s wording is extremely important: starting with Surface-based processing, the brain makes conjectures about what actual visible surfaces might have existed that could have produced the patterns observed in the Primal Sketch.  Up to this point, all these steps have seemed to be, in Palmer’s words, “data-driven,” like a bottom-up assembly line in which each stage simply hands its results to the next.  His point here is that these stages also involve conjecture, in this case heuristic assumptions, about later steps.  Palmer calls this top-down direction “hypothesis-driven.”

These conclusions imply a very high degree of interaction (even iteration) among the portions of the brain responsible for the stages of visual perception.  These stages of visual perception might operate in parallel in order to blend the “data-driven” benefits with those of the concurrent “hypothesis-driven” approaches. 

As promised here, I will shortly post a simple Primal Sketch exemplar to better illustrate the Image-based stage.  In the next post (hopefully tomorrow!) we will return to the whole idea of heuristics in visual perception.  Heuristics certainly increase the accuracy of our perceptions but they also introduce some surprises that, while uncommon, are well known in human visual experience. 


A Simple Primal Sketch Exemplar


I find the written description of the Primal Sketch to be a bit vague without examples.  Palmer's book has several great examples but, to honor his copyright, I have created some other examples for this post.  They are much more limited than Palmer's but I hope they still serve a useful purpose.  

Recall that the information from the retina arrives in numeric form (specifically the neural spike rates which indicate the brightness of the light at each point).  The numerical data are arranged in what Palmer calls a "mosaic."  Maybe we can call each point within the mosaic a "pixel" since each is an element within the retinal picture.  What Palmer calls a mosaic I earlier called a "checkerboard" but for this example, it seems even clearer to think of it simply as a spreadsheet of numeric entries. 

To generate data for this spreadsheet, I first created a simple, computer 3-D scene and rendered it into a 2-D grayscale image.  I then extracted a very small sample square (47x47 pixels) for this particular demonstration.  From the sample square, I recorded the grayscale values in the spreadsheet for each point (pixel) within my small sample square.  (Although the sample is actually square, the spreadsheet is not because the individual spreadsheet cells are not square.) 
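To make the "spreadsheet" idea concrete, here is a minimal Python sketch.  The 6x6 grid of numbers is made up for illustration (it is not my actual 47x47 sample), and the name as_spreadsheet is just my own label for the helper.

```python
# Hypothetical 6x6 grayscale sample (0 = black, 255 = white).
# These values are invented for illustration; they are not the 47x47 data.
SAMPLE = [
    [200, 200, 200, 200, 200, 200],
    [200, 200, 200, 200, 200, 200],
    [200, 200,  60,  60, 200, 200],
    [200, 200,  60,  60, 200, 200],
    [200, 200, 200, 200, 200, 200],
    [200, 200, 200, 200, 200, 200],
]


def as_spreadsheet(grid):
    """Format a grid of grayscale values as rows of right-aligned numbers."""
    return "\n".join(" ".join(f"{value:3d}" for value in row) for row in grid)


print(as_spreadsheet(SAMPLE))
```

Even in this tiny grid, the dark square in the middle is easier to find by scanning for the 60s than by "seeing" it, which is the same difficulty the full-size spreadsheet presents.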


Do you see any images in this figure?  I actually do see the hint of what looks like an eye; it looks lighter because the differing digits in the changing squares affect the grayscale values in those squares.  That said, the effect I see is too subtle to work with.  Palmer's examples seem equally meaningless in this numeric form.  We need something more than simple visual inspection of a matrix in order to make sense of this "visual" information. 

Your brain conducts the Image-based process with a search for "edges" — boundaries between two adjacent regions which differ in brightness.  Why?  What's the big deal about edges? Why should edges even matter?  Because they do matter in the real world.   Look away from your computer screen for a moment and observe your surroundings.  I don't know what you see but it is easy to know important aspects of what you are seeing.  You are seeing light reflected from the surfaces of the objects around you, e.g., people, trees, water, furniture, buildings, landscapes, even blue sky or fog.  Since most objects have definite dimensions, the surfaces of those objects do as well.  Since the brightness of light reflected from the surface of an object usually differs from the brightness of the scene's background, the edges of surfaces are almost always discernible.
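The definition of an edge given above (a boundary between adjacent regions that differ in brightness) can be sketched in a few lines of Python.  This is only a toy: it looks along a single row of pixels, and both the threshold of 50 and the brightness values are assumptions I picked for illustration.

```python
def find_edges(row, threshold=50):
    """Return each index i where brightness jumps between pixel i and pixel i + 1.

    The threshold is an arbitrary illustrative choice, not a value from Palmer.
    """
    return [
        i
        for i in range(len(row) - 1)
        if abs(row[i + 1] - row[i]) >= threshold
    ]


# Hypothetical scan line: bright background, a dark object, bright background.
scan_line = [200, 200, 198, 60, 62, 60, 201, 200]
print(find_edges(scan_line))
```

Small wobbles in brightness (200 to 198, or 60 to 62) are ignored; only the two large jumps at the borders of the dark object are reported.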

Palmer describes schemes for edge searches using specialized filters.  These schemes are simple to understand but not particularly easy for me to implement so I relied upon the creative work of others.  The Filter Forge community (www.filterforge.com) has developed a collection of over 8,000 Photoshop filters for nearly as many uses.  I chose their Sobel Edge Detector filter; it is not as elegant as some of the filters Palmer describes but it seems up to this particular task.  Palmer would have applied his filters directly to the numerical matrix above because that is actually what takes place within the Primal Sketch.  I cheated and applied my Sobel Edge Detector filter to the 47x47 pixel image from which the numerical matrix was derived.  Here are the results.
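For readers who want to see what a Sobel filter does to the numbers themselves, here is a minimal sketch in plain Python.  It uses the standard 3x3 Sobel kernels, but it is my own illustration: it is not the Filter Forge filter's actual implementation, nor one of Palmer's filters, and the test grid is invented.

```python
import math

# Standard 3x3 Sobel kernels: SOBEL_X responds to left-right brightness change,
# SOBEL_Y responds to top-bottom brightness change.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(grid):
    """Return the gradient magnitude at each pixel of a grayscale grid.

    Border pixels are left at 0.0 because the 3x3 window does not fit there.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    value = grid[r + dr][c + dc]
                    gx += SOBEL_X[dr + 1][dc + 1] * value
                    gy += SOBEL_Y[dr + 1][dc + 1] * value
            out[r][c] = math.hypot(gx, gy)
    return out


# Hypothetical 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 100, 100, 100] for _ in range(5)]
edges = sobel_magnitude(image)

# Mark strong responses with '#' (the cutoff of 200 is an arbitrary choice).
for row in edges:
    print("".join("#" if value >= 200 else "." for value in row))
```

The filter responds only where the brightness changes from one side of the 3x3 window to the other, so the flat regions stay blank and the vertical boundary lights up.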

The edges within both the actual image and the edge detection filter image are pretty rough because of the small (47x47) pixel resolution.  I hope you see the idea of this edge detector.  Let's try a larger image.  The left-hand image below is the Sobel Edge Detection filter applied to a larger portion of my rendered scene.  The rendered size of the actual image is 350x240, a total of 84,000 pixels.  


Try to ignore the "Actual Image" for a moment and just consider the "Edge Detection Filter" image in more detail.  Please spend some time with this on your own before proceeding.  

Good.  Here are some of my thoughts.  

Although the top front edge is not visible, I can't imagine (literally speaking) any consistent 3-D object that lacks this invisible edge, so my mind just assumes (demands) that the invisible edge is really there.

From the configuration of the closed loop of three front edges and three top edges, it is hard for me to imagine any shape other than something resembling a brick on its side.  The top left and right edges are not exactly parallel and they seem to converge to some very distant vanishing point behind the figure.     This classic cue of visual depth perspective strongly reinforces the "brick" interpretation in my mind. 

Regarding the little triangle in the lower left "back" side, the interpretation as a shadow cast on the ground plane seems clear but I can also imagine it as a back lid flap that is partially opened.  Honestly, I can't be sure how I would feel if I had not already seen the Actual Image.  Ignoring first impressions can be like unringing a bell.

Finally, let's consider the circular bump and dimple on the front face of the "brick."  The perimeters of the two rims are identical as are the height and depth of the bump and dimple respectively. Which one do you believe is the bump, the left one or the right one?  More importantly, why do you feel that way? 

We'll return to that question briefly in the next post.



Dimples and Bumps

Welcome back.  For those of us who live on the eastern coast of the US, I hope Hurricane Irene turned out to be as timid with you as she was with me.  We did pretty well. 

Let's return briefly to the question of the dimple and the bump from the last post.  Which is the dimple and which is the bump in the Original Image below?  I'm going to assume that, in the Original Image, you picked circular shape A as the dimple and B as the bump. 

Original Image 

The second question is why A and B each appear that way.  Can you force your perception to see circular shape A as a bump?  I can't do that without changing the direction of the lighting.  The circular shape B is the bump but what makes it a bump?  Pay attention to the lighting; it is the key.   I didn't tell you that the lighting was coming from above but that is probably what you assumed, as did I.  I'll speculate that we have become accustomed to seeing shadows cast from light sources (like the sun) that are most often ABOVE the scene.  When the light is from above, circular shape A with the shadow on top must be the dimple and B must be the bump.

Of course, you and I occasionally see items illuminated from below but these are the exception to the rule.  Those exceptions often have a special, even eerie look such as the illuminated face of a ghoulish monster.  Maybe the look seems eerie because it is not what we are used to seeing.  

Below, I have flipped the same image 180 degrees.   Now circular shape A has become the bump and B is now the dimple.  At least that is one interpretation.  It is not quite that clear, however, because the lighted "top" surface on the box is now on the bottom of the image.  Maybe this demonstrates that the scene is now bottom-lighted.   It is an ambiguous call although my mind still favors a "light-above" assumption that would make A the bump.

Original Image Flipped


A key conclusion here is that our perception depends upon subtle assumptions that our brain makes about the geometry of the lighting.  Such assumptions are essential in solving the visual perception problem but they remain merely assumptions.  They are what you might want to think of as conjectures or educated guesses.  Our guesses are educated by past experience and they are essential to what Palmer calls the "Heuristic Process" of vision.    
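The "light-above" assumption can be written down as a tiny rule.  The Python sketch below is only a caricature of what the brain does: it reduces each circular shape to a top-to-bottom list of brightness values (the numbers are invented), assumes the light comes from above, and calls the shape a bump if its upper half is brighter and a dimple if its upper half is darker.

```python
def classify_shape(profile):
    """Classify a shape from its top-to-bottom brightness profile.

    Assumes the light comes from above, which is an assumption about the
    scene, not a fact. Under that assumption a bump catches light on its
    upper half, while a dimple is shaded on its upper half.
    """
    half = len(profile) // 2
    if half == 0:
        return "ambiguous"  # too few samples to compare top and bottom
    top = sum(profile[:half]) / half
    bottom = sum(profile[-half:]) / half
    if top > bottom:
        return "bump"
    if top < bottom:
        return "dimple"
    return "ambiguous"


def rotate_180(profile):
    """Rotating the image 180 degrees reverses the top-to-bottom profile."""
    return profile[::-1]


# Hypothetical brightness values, listed from the top of each shape down.
shape_a = [40, 60, 120, 180, 220]   # shaded on top
shape_b = [220, 180, 120, 60, 40]   # lit on top

print(classify_shape(shape_a), classify_shape(shape_b))
print(classify_shape(rotate_180(shape_a)), classify_shape(rotate_180(shape_b)))
```

Nothing about the shapes changes when the image is turned over; only the fixed assumption about the lighting makes the two labels trade places.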

More about the "Heuristic Process" of vision in the next post.


Visual Intelligence Examined

Visual Intelligence:  How We Create What We See, by Donald D. Hoffman 

This is an inexpensive, 200-page paperback that is written in a style closer to Scientific American than a university textbook or scholarly paper.  It is fearless in the face of difficult topics but with clear explanations.  I highly recommend it.

According to the preface, "We have long known about IQ and rational intelligence.   And, in part because of recent advances in neuroscience and psychology, we have begun to appreciate the importance of emotional intelligence.  But we are largely ignorant that there is even such a thing as visual intelligence. ... The culprit in our ignorance is visual intelligence itself.  Vision is normally so swift and sure, so dependable and informative, and apparently so effortless that we naturally assume that it is indeed effortless.  But the swift ease of vision, like the graceful ease of an Olympic ice skater, is deceptive."  

Hoffman's core hypothesis: "Vision is not merely a matter of passive perception, it is an intelligent process of active construction." ... "Just as scientists intelligently construct useful theories based upon experimental evidence, so your visual system intelligently constructs useful visual worlds based on images at the eyes." 

I was initially skeptical that anyone could directly experience visual intelligence outside of carefully controlled laboratory experiments.   The case of visual intelligence seemed analogous to another important human function -- metabolism.  I know that metabolism, like visual intelligence, occurs within my body and that it does so without any conscious effort.  I can also experience the results of metabolism, just as I experience the results of vision: my energy for physical activities, my body temperature, and changes in my body weight are just a few examples.  Analogously, I experience the results of visual intelligence during every waking moment but how can I break through and directly experience visual intelligence at work?   Is it possible, through simple and guided personal examples, to glimpse the essence of visual intelligence?  Emphatically yes. 

Hoffman's book demolished my skepticism.  After a useful twenty-page orientation, Hoffman begins a guided tour of visual intelligence in the form of rules that our brains use to construct our visual experiences from ambiguous retinal images.  He begins with a "meta-rule," the Rule of Generic Views, followed by thirty-five specific rules.  For instance, Rule 24 says to "put light sources overhead."  We encountered Rule 24 when interpreting the earlier "dimple and bump" problem.  Hoffman's brief description of each rule includes one or more figures intended to illustrate that rule through your visual interactions with the figures. Interactions with those figures are precisely the simple and personal examples I needed to directly experience my visual intelligence at work.


Horses and Zebras

In Vision Science, Palmer stresses that vision is a "heuristic process in which inferences are made about the most likely environmental condition that could have produced a given image.  The process is heuristic because it makes use of inferential rules of thumb -- based on the additional assumptions -- that are not always valid and so will sometimes lead to erroneous conclusions, as in the case of perceptual illusions."

The book I reviewed in the last posting, Visual Intelligence, demonstrates the kind of heuristics that underpin visual perception.  Remember that the author, Donald Hoffman, stresses that "your visual system intelligently constructs useful visual worlds based on images at the eyes."  

Your brain certainly does not KNOW exactly what 3-D objects created the 2-D images at your eyes so it must make educated guesses.  These guesses are the heuristic inferences to which Palmer was referring. 

Before we dive into several examples of heuristics in visual perception, let's make clear exactly what we mean by a "heuristic" or a "heuristic inference."  In 1949, a distinguished British physician, Dr. Richard Asher, captured the essence of heuristic inferences in a most compelling manner.  He wrote an article in The Lancet medical journal entitled "The Seven Sins of Medicine" -- seven important pitfalls that young doctors should avoid in their new practices. 

Asher's "Sin Number 5" was "Love of the Rare."  He wrote, "The desire for rare and interesting disease causes many medical students and young doctors to seek the bizarre rather than seeing a mundane diagnosis."  The now famous rule of thumb that captures this idea, usually credited to Dr. Theodore Woodward rather than to Asher himself:  "If you hear hoof-beats, think horses.  Not zebras."   In the parlance of what doctors call differential diagnosis, start by ruling out all the common possible diagnoses, the "horses," before considering the rare ones, the "zebras."  Why?  Because "horse" inferences are more likely to be correct. 

The same applies to the heuristics of visual perception.  Hoffman offers 35 specific "rules" (heuristics) which describe what the brain assumes in 3-D, based upon what it "sees" in the 2-D retinal information.  These rules all call for “horse” inferences, not by coincidence but, again, because "horse" inferences are more likely to be correct.  
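One common way to put numbers on "horses, not zebras" is Bayes' rule: weigh how common each explanation is (the prior) by how well it fits the evidence (the likelihood).  This framing is my own addition, not something Asher, Palmer, or Hoffman spelled out in the passages quoted here, and every number in this Python sketch is made up for illustration.

```python
def most_likely(priors, likelihoods):
    """Return (best_hypothesis, posteriors) using Bayes' rule.

    priors: how common each hypothesis is, as a dict keyed by hypothesis.
    likelihoods: how well each hypothesis explains the evidence, same keys.
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    posteriors = {h: value / total for h, value in unnormalized.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Made-up numbers: horses are far more common, and hoof-beats fit both equally.
priors = {"horse": 0.99, "zebra": 0.01}
likelihoods = {"horse": 0.9, "zebra": 0.9}

best, posteriors = most_likely(priors, likelihoods)
print(best, round(posteriors[best], 2))
```

When the evidence fits both explanations equally well, the common one wins.  Only evidence that a horse explains very poorly (stripes, say) can overturn the prior, which is also how an occasional perceptual illusion can arise when the "horse" guess happens to be wrong.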

More on that in the next post.
