Perceptron Explained

In machine learning, the perceptron (or McCulloch–Pitts neuron) is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.

History

The artificial neuron network was invented in 1943 by Warren McCulloch and Walter Pitts in "A logical calculus of the ideas immanent in nervous activity".[1]

In 1957, Frank Rosenblatt was at the Cornell Aeronautical Laboratory, where he simulated the perceptron on an IBM 704.[2] [3] Later, he obtained funding from the Information Systems Branch of the United States Office of Naval Research and the Rome Air Development Center to build a custom-made computer, the Mark I Perceptron. It was first publicly demonstrated on 23 June 1960. The machine was "part of a previously secret four-year NPIC [the US' National Photographic Interpretation Center] effort from 1963 through 1966 to develop this algorithm into a useful tool for photo-interpreters".[4]

Rosenblatt described the details of the perceptron in a 1958 paper.[5] His organization of a perceptron is constructed of three kinds of cells ("units"): AI, AII, R, which stand for "projection", "association" and "response". He presented at the first international symposium on AI, Mechanisation of Thought Processes, which took place in November 1958.[6]

Rosenblatt's project was funded under Contract Nonr-401(40) "Cognitive Systems Research Program", which lasted from 1959 to 1970,[7] and Contract Nonr-2381(00) "Project PARA" ("PARA" means "Perceiving and Recognition Automata"), which lasted from 1957 to 1963.[8]

In 1959, the Institute for Defense Analysis awarded his group a $10,000 contract. By September 1961, the ONR had awarded a further $153,000 worth of contracts, with $108,000 committed for 1962.[9]

The ONR research manager, Marvin Denicoff, stated that ONR, rather than ARPA, funded the Perceptron project because the project was unlikely to produce technological results in the near or medium term. ARPA funding ran to the order of millions of dollars, while ONR funding was on the order of 10,000 dollars. Meanwhile, the head of IPTO at ARPA, J.C.R. Licklider, was interested in 'self-organizing', 'adaptive' and other biologically-inspired methods in the 1950s; but by the mid-1960s he was openly critical of these, including the perceptron. Instead he strongly favored the logical AI approach of Simon and Newell.[10]

Mark I Perceptron machine

The perceptron was intended to be a machine, rather than a program, and while its first implementation was in software for the IBM 704, it was subsequently implemented in custom-built hardware as the Mark I Perceptron with the project name "Project PARA", designed for image recognition. The machine is currently in the Smithsonian National Museum of American History.[11]

The Mark I Perceptron had three layers of units: S-units (sensory), A-units (association) and R-units (response).

Rosenblatt called this three-layered perceptron network the alpha-perceptron, to distinguish it from other perceptron models he experimented with.[12]

The S-units are connected to the A-units randomly (according to a table of random numbers) via a plugboard, to "eliminate any particular intentional bias in the perceptron". The connection weights are fixed, not learned. Rosenblatt was adamant about the random connections, as he believed the retina was randomly connected to the visual cortex, and he wanted his perceptron machine to resemble human visual perception.[13]

The A-units are connected to the R-units, with adjustable weights encoded in potentiometers, and weight updates during learning were performed by electric motors.[14] The hardware details are in an operators' manual.[15]
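The division of labour described above (fixed random S-to-A wiring, adjustable A-to-R weights) can be sketched in software. The following is only an illustrative sketch, not a reconstruction of the Mark I: the layer sizes, the +1/-1/0 coding of connections, the thresholds and the update rule (the standard perceptron rule) are all assumptions, and the names `S_TO_A`, `a_layer` and `train_r_unit` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes, chosen small for illustration only.
N_S, N_A = 16, 64

# S -> A wiring: drawn at random once and never changed afterwards.
# Assumption: each connection is excitatory (+1), inhibitory (-1) or absent (0).
S_TO_A = rng.choice([-1, 0, 1], size=(N_A, N_S))


def a_layer(s):
    """Binary A-unit activations for a binary S-unit pattern s."""
    return (S_TO_A @ s > 0).astype(int)


def train_r_unit(patterns, labels, epochs=50):
    """Learn the A -> R weights of one R-unit.

    Assumption: the standard perceptron update rule is used.
    """
    w = np.zeros(N_A)
    b = 0.0
    for _ in range(epochs):
        for s, y in zip(patterns, labels):
            a = a_layer(s)
            y_hat = int(w @ a + b > 0)
            # Only the A -> R weights and bias are adjusted; S_TO_A stays fixed.
            w += (y - y_hat) * a
            b += y - y_hat
    return w, b
```

In this sketch the random first layer acts as a fixed feature map, and learning touches only the final layer, which mirrors the statement that the S-to-A connection weights are "fixed, not learned".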

In a 1958 press conference organized by the US Navy, Rosenblatt made statements about the perceptron that caused a heated controversy among the fledgling AI community; based on Rosenblatt's statements, The New York Times reported the perceptron to be "the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence."[16]

From 1960 to 1964, the Photo Division of the Central Intelligence Agency (CIA) studied the use of the Mark I Perceptron machine for recognizing militarily interesting silhouetted targets (such as planes and ships) in aerial photos.[17] [18]

Principles of Neurodynamics (1962)

Rosenblatt described his experiments with many variants of the Perceptron machine in the book Principles of Neurodynamics (1962). The book is a published version of a 1961 report.[19]

Among the variants are:

The machine was shipped from Cornell to the Smithsonian in 1967, under a government transfer administered by the Office of Naval Research.

Perceptrons (1969)

Although the perceptron initially seemed promising, it was quickly proved that perceptrons could not be trained to recognise many classes of patterns. This caused the field of neural network research to stagnate for many years, before it was recognised that a feedforward neural network with two or more layers (also called a multilayer perceptron) had greater processing power than perceptrons with one layer (also called a single-layer perceptron).

Single-layer perceptrons are only capable of learning linearly separable patterns.[20] For a classification task with some step activation function, a single node will have a single line dividing the data points forming the patterns. More nodes can create more dividing lines, but those lines must somehow be combined to form more complex classifications. A second layer of perceptrons, or even linear nodes, is sufficient to solve many otherwise non-separable problems.
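The XOR function on two binary inputs is a standard example of a pattern that is not linearly separable. The sketch below is illustrative only: it searches a small, arbitrarily chosen grid of weights for a single threshold unit that reproduces a target truth table (a finite check, not a proof), and then hand-wires a two-layer network of threshold units that computes XOR. The function names, the grid and the chosen weights are assumptions made for this example.

```python
import itertools

import numpy as np


def step(z):
    """Step activation; assumed convention: 1 for z >= 0, else 0."""
    return (np.asarray(z) >= 0).astype(int)


# The four binary input patterns and their XOR labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
XOR = np.array([0, 1, 1, 0])


def single_unit_can_fit(targets, grid=np.linspace(-2, 2, 9)):
    """Search a coarse grid of (w1, w2, b) for one unit matching the targets."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        if np.array_equal(step(X @ np.array([w1, w2]) + b), targets):
            return True
    return False


def two_layer_xor(x):
    """Two hidden threshold units (OR, AND) combined by a third unit."""
    h_or = step(x @ np.array([1, 1]) - 0.5)   # fires if at least one input is 1
    h_and = step(x @ np.array([1, 1]) - 1.5)  # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)           # OR and not AND
```

Here `single_unit_can_fit(XOR)` finds no solution on the grid, while `single_unit_can_fit` applied to the AND truth table `[0, 0, 0, 1]` does; `two_layer_xor(X)` reproduces the XOR labels because each hidden unit draws one dividing line and the output unit combines them.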

In 1969, a famous book entitled Perceptrons by Marvin Minsky and Seymour Papert showed that it was impossible for these classes of network to learn an XOR function. It is often incorrectly believed that they also conjectured that a similar result would hold for a multi-layer perceptron network. However, this is not true, as both Minsky and Papert already knew that multi-layer perceptrons were capable of producing an XOR function. Nevertheless, the often-miscited Minsky and Papert text caused a significant decline in interest and funding of neural network research. It took ten more years until neural network research experienced a resurgence in the 1980s.[20] This text was reprinted in 1987 as "Perceptrons - Expanded Edition" where some errors in the original text are shown and corrected.

Subsequent work

Rosenblatt continued working on perceptrons despite diminishing funding. The last attempt was Tobermory, built between 1961 and 1967 for speech recognition.[21] It occupied an entire room.[22] It had 4 layers with 12,000 weights implemented by toroidal magnetic cores. By the time of its completion, simulation on digital computers had become faster than purpose-built perceptron machines.[23] He died in a boating accident in 1971.

The kernel perceptron algorithm had already been introduced in 1964 by Aizerman et al.[24] Margin-bound guarantees for the perceptron algorithm in the general non-separable case were given first by Freund and Schapire (1998),[25] and more recently by Mohri and Rostamizadeh (2013), who extended previous results and gave new and more favorable L1 bounds.[26] [27]

The perceptron is a simplified model of a biological neuron. While the complexity of biological neuron models is often required to fully understand neural behavior, research suggests a perceptron-like linear model can produce some behavior seen in real neurons.[28]

The solution spaces of decision boundaries for all binary functions and learning behaviors are studied by Liou et al.[29]

Definition

In the modern sense, the perceptron is an algorithm for learning a binary classifier called a threshold function: a function that maps its input $\mathbf{x}$ (a real-valued vector) to an output value $f(\mathbf{x})$ (a single binary value):

$$f(\mathbf{x}) = h(\mathbf{w} \cdot \mathbf{x} + b)$$

where $h$ is the Heaviside step function, $\mathbf{w}$ is a vector of real-valued weights, $\mathbf{w} \cdot \mathbf{x}$ is the dot product $\sum_{i=1}^m w_i x_i$, where $m$ is the number of inputs to the perceptron, and $b$ is the bias. The bias shifts the decision boundary away from the origin and does not depend on any input value.

Equivalently, since $\mathbf{w} \cdot \mathbf{x} + b = (\mathbf{w}, b) \cdot (\mathbf{x}, 1)$, we can add the bias term $b$ as another weight $w_{m+1}$ and add a coordinate $1$ to each input $\mathbf{x}$, and then write it as a linear classifier that passes through the origin:

$$f(\mathbf{x}) = h(\mathbf{w} \cdot \mathbf{x})$$

The binary value of $f(\mathbf{x})$ (0 or 1) is used to perform binary classification on $\mathbf{x}$ as either a positive or a negative instance. Spatially, the bias shifts the position (though not the orientation) of the planar decision boundary.
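The definition can be written out in a few lines of code. This is a minimal sketch rather than a reference implementation: the function names are hypothetical, the example weights are arbitrary, and the value of the step function at exactly zero (taken here as 1) is an assumed convention. It also checks the equivalence between the form with an explicit bias and the augmented form in which the bias becomes an extra weight on a constant input of 1.

```python
import numpy as np


def heaviside(z):
    """Heaviside step; assumed convention: 1 for z >= 0, else 0."""
    return 1 if z >= 0 else 0


def perceptron(x, w, b):
    """Threshold function f(x) = h(w . x + b)."""
    return heaviside(np.dot(w, x) + b)


def perceptron_augmented(x, w, b):
    """Same classifier with the bias folded in as weight w_{m+1}."""
    w_aug = np.append(w, b)    # (w, b)
    x_aug = np.append(x, 1.0)  # (x, 1)
    return heaviside(np.dot(w_aug, x_aug))


# Arbitrary example parameters for illustration.
w = np.array([2.0, -1.0])
b = -0.5
```

With these example values, the input `[1.0, 1.0]` gives `w . x + b = 0.5` and is classified as 1, while `[0.0, 1.0]` gives `-1.5` and is classified as 0; both functions agree on every input because the two dot products are term-by-term identical.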

In the context of neural networks, a perceptron is an artificial neuron using the Heaviside step function as the activation function. The perceptron algorithm is also termed the single-layer perceptron, to distinguish it from a multilayer perceptron, which is a misnomer for a more complicated neural network. As a linear classifier, the single-layer perceptron is the simplest feedforward neural network.

Power of representation

Information theory

From an information theory point of view, a single perceptron with K inputs has a capacity of 2K bits of information.[30] This result is due to Thomas Cover.[31]

Specifically, let $T(N, K)$ be the number of ways to linearly separate $N$ points in $K$ dimensions; then

$$T(N, K) = \begin{cases} 2^N & K \geq N \\ 2 \sum_{k=0}^{K-1} \binom{N-1}{k} & K < N \end{cases}$$
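The counting function can be evaluated directly to see where the figure of 2K comes from. The sketch below assumes Cover's result in its usual form, $T(N, K) = 2^N$ for $K \geq N$ and $T(N, K) = 2 \sum_{k=0}^{K-1} \binom{N-1}{k}$ for $K < N$, which applies to points in general position and a threshold unit with no separate bias term; the function names are hypothetical.

```python
from math import comb


def T(N, K):
    """Number of linearly separable labelings of N points in K dimensions.

    Assumes Cover's counting function for points in general position.
    """
    if K >= N:
        return 2 ** N
    return 2 * sum(comb(N - 1, k) for k in range(K))


def separable_fraction(N, K):
    """Fraction of all 2^N labelings that a single unit can realise."""
    return T(N, K) / 2 ** N
```

For example, `T(4, 2)` is 8 out of 16 possible labelings. More generally, `separable_fraction(2 * K, K)` is exactly one half for any K, because the first K binomial coefficients of an odd row sum to half the row total; the fraction is 1 for N up to K and falls towards 0 as N grows past 2K, which is the sense in which the capacity is about 2K.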

Notes and References

  1. McCulloch, W.; Pitts, W. (1943). "A Logical Calculus of Ideas Immanent in Nervous Activity". Bulletin of Mathematical Biophysics. 5 (4): 115–133. doi:10.1007/BF02478259.
  2. Rosenblatt, Frank (1957). "The Perceptron—a perceiving and recognizing automaton". Report 85-460-1. Cornell Aeronautical Laboratory.
  3. Rosenblatt, Frank (March 1960). "Perceptron Simulation Experiments". Proceedings of the IRE. 48 (3): 301–309. doi:10.1109/JRPROC.1960.287598. ISSN 0096-8390.
  4. O'Connor, Jack (2022-06-21). "Undercover Algorithm: A Secret Chapter in the Early History of Artificial Intelligence and Satellite Imagery". International Journal of Intelligence and CounterIntelligence: 1–15. doi:10.1080/08850607.2022.2073542. ISSN 0885-0607. S2CID 249946000.
  5. Rosenblatt, F. (1958). "The perceptron: A probabilistic model for information storage and organization in the brain". Psychological Review. 65 (6): 386–408. doi:10.1037/h0042519. PMID 13602029. ISSN 1939-1471.
  6. Rosenblatt, Frank (1959). "Two Theorems of Statistical Separability in the Perceptron". Symposium on the Mechanization of Thought, National Physical Laboratory, Teddington, UK, November 1958, vol. 1. London: H. M. Stationery Office.
  7. Rosenblatt, Frank (1971). Cognitive Systems Research Program. Technical report, Cornell University, Ithaca, NY, 72.
  8. Muerle, John Ludwig (1963). Project Para, Perceiving and Recognition Automata. Buffalo, NY: Cornell Aeronautical Laboratory, Incorporated.
  9. Penn, Jonathan (2021-01-11). Inventing Intelligence: On the History of Complex Information Processing and Artificial Intelligence in the United States in the Mid-Twentieth Century. doi:10.17863/cam.63087.
  10. Guice, Jon (1998). "Controversy and the State: Lord ARPA and Intelligent Computing". Social Studies of Science. 28 (1): 103–138. doi:10.1177/030631298028001004. JSTOR 285752. PMID 11619937. ISSN 0306-3127.
  11. "Perceptron, Mark I". National Museum of American History. 2023-10-30.
  12. Nilsson, Nils J. (2009). "4.2.1. Perceptrons". The Quest for Artificial Intelligence. Cambridge: Cambridge University Press. ISBN 978-0-521-11639-8.
  13. Anderson, James A.; Rosenfeld, Edward (2000). Talking Nets: An Oral History of Neural Networks. The MIT Press. doi:10.7551/mitpress/6626.003.0004. ISBN 978-0-262-26715-1.
  14. Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer. ISBN 0-387-31073-8.
  15. Hay, John Cameron (1960). Mark I perceptron operators' manual (Project PARA). Buffalo: Cornell Aeronautical Laboratory. https://web.archive.org/web/20231027213510/https://apps.dtic.mil/sti/tr/pdf/AD0236965.pdf (2023-10-27).
  16. Olazaran, Mikel (1996). "A Sociological Study of the Official History of the Perceptrons Controversy". Social Studies of Science. 26 (3): 611–659. doi:10.1177/030631296026003005. JSTOR 285702. S2CID 16786738.
  17. "Perception Concepts to Photo-Interpretation". www.cia.gov. 2024-11-14.
  18. Irwin, Julia A. (2024-09-11). "Artificial Worlds and Perceptronic Objects: The CIA's Mid-century Automatic Target Recognition". Grey Room (97): 6–35. doi:10.1162/grey_a_00415. ISSN 1526-3819.
  19. Rosenblatt, Frank (15 March 1961). Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Report Number VG-1196-G-8, Cornell Aeronautical Laboratory. The work reported in this volume has been carried out under Contract Nonr-2381(00) (Project PARA) at C.A.L. and Contract Nonr-401(40) at Cornell University.
  20. Sejnowski, Terrence J. (2018). The Deep Learning Revolution. MIT Press. p. 47. ISBN 978-0-262-03803-4.
  21. Rosenblatt, Frank (1962). "A Description of the Tobermory Perceptron". Cognitive Research Program, Report No. 4. Collected Technical Papers, Vol. 2. Edited by Frank Rosenblatt. Ithaca, NY: Cornell University.
  22. Nagy, George (1963). System and circuit designs for the Tobermory perceptron. Technical report number 5, Cognitive Systems Research Program, Cornell University, Ithaca, New York.
  23. Nagy, George (1991). "Neural networks-then and now". IEEE Transactions on Neural Networks. 2 (2): 316–318.
  24. Aizerman, M. A.; Braverman, E. M.; Rozonoer, L. I. (1964). "Theoretical foundations of the potential function method in pattern recognition learning". Automation and Remote Control. 25: 821–837.
  25. Freund, Y.; Schapire, R. E. (1999). "Large margin classification using the perceptron algorithm". 37 (3): 277–296. doi:10.1023/A:1007662407062. S2CID 5885617.
  26. Mohri, Mehryar; Rostamizadeh, Afshin (2013). "Perceptron Mistake Bounds". arXiv:1305.0208 [cs.LG].
  27. https://mitpress.mit.edu/books/foundations-machine-learning-second-edition
  28. Cash, Sydney; Yuste, Rafael (1999). "Linear Summation of Excitatory Inputs by CA1 Pyramidal Neurons". 22 (2): 383–394. doi:10.1016/S0896-6273(00)81098-3. PMID 10069343.
  29. Liou, D.-R.; Liou, J.-W.; Liou, C.-Y. (2013). Learning Behaviors of Perceptron. iConcept Press. ISBN 978-1-477554-73-9.
  30. MacKay, David (2003-09-25). Information Theory, Inference and Learning Algorithms. p. 483. ISBN 9780521642989.
  31. Cover, Thomas M. (June 1965). "Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition". IEEE Transactions on Electronic Computers. EC-14 (3): 326–334. doi:10.1109/PGEC.1965.264137. ISSN 0367-7508.