"A Logical Calculus of the Ideas Immanent in Nervous Activity" is a 1943 article written by Warren McCulloch and Walter Pitts.[1] The paper, published in the journal The Bulletin of Mathematical Biophysics, proposed a mathematical model of the nervous system as a network of simple logical elements, later known as artificial neurons or McCulloch-Pitts neurons. These neurons receive inputs, compute a weighted sum, and fire an output signal based on a threshold function. By connecting these units in various configurations, McCulloch and Pitts demonstrated that their model could perform all logical functions.
It is a seminal work in computational neuroscience, computer science, and artificial intelligence. It was a foundational result in automata theory. John von Neumann cited it as a significant result.[2]
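The artificial neuron used in the original paper is slightly different from the modern version. They considered neural networks that operate in discrete time steps $t = 0, 1, \dots$.
The neural network contains a number of neurons. Let the state of neuron $i$ at time $t$ be $N_i(t)$, which is either 0 ("not firing") or 1 ("firing"). Each neuron also has a firing threshold $\theta$, and it fires when its total input reaches that threshold.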
Each neuron can connect to any other neuron (including itself) with positive synapses (excitatory) or negative synapses (inhibitory). That is, each neuron can connect to another neuron with a weight $w$ taking an integer value. A peripheral afferent is a neuron with no incoming synapses.
We can regard each neural network as a directed graph, with the nodes being the neurons and the directed edges being the synapses. A neural network has a circle (or circuit) if and only if the graph contains a directed cycle.
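Let $w_{ij}(t)$ be the connection weight from neuron $j$ to neuron $i$ at time $t$. The next state of neuron $i$ is then
$N_i(t+1) = H\left(\sum_{j=1}^{n} w_{ij}(t) N_j(t) - \theta_i(t)\right)$,
where $H$ is the Heaviside step function (outputting 1 if its argument is greater than or equal to 0, and 0 otherwise).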
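The update rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the rule $N_i(t+1) = H(\sum_j w_{ij} N_j(t) - \theta_i)$ with fixed weights and thresholds; the names `heaviside`, `step`, and the three-neuron example net are hypothetical and do not come from the paper.

```python
from typing import List


def heaviside(x: float) -> int:
    """Step function: 1 when the argument is non-negative, else 0."""
    return 1 if x >= 0 else 0


def step(weights: List[List[int]], thresholds: List[int], state: List[int]) -> List[int]:
    """One synchronous update; weights[i][j] is the synapse from neuron j to neuron i."""
    return [
        heaviside(sum(w * s for w, s in zip(row, state)) - theta)
        for row, theta in zip(weights, thresholds)
    ]


# Hypothetical 3-neuron net: neurons 0 and 1 play the role of inputs, and
# neuron 2 fires when both fired on the previous step (two weight-1 synapses,
# threshold 2). The inputs have no incoming synapses, so in this toy run they
# simply fall silent after one step instead of being driven externally.
weights = [[0, 0, 0], [0, 0, 0], [1, 1, 0]]
thresholds = [1, 1, 2]
next_state = step(weights, thresholds, [1, 1, 0])
```

Here `next_state` has neuron 2 firing because its weighted input exactly reaches the threshold, which is the boundary case the "greater than or equal to 0" convention decides.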
The paper used, as a logical language for describing neural networks, Language II from The Logical Syntax of Language by Rudolf Carnap with some notations taken from Principia Mathematica by Alfred North Whitehead and Bertrand Russell. Language II covers substantial parts of classical mathematics, including real analysis and portions of set theory.[3]
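To describe a neural network with peripheral afferents $N_1, N_2, \dots, N_p$ and non-peripheral neurons $N_{p+1}, N_{p+2}, \dots, N_n$, they considered logical predicates of the form $Pr(N_1, N_2, \dots, N_n, t)$, where $Pr$ is a predicate function (a function that outputs a truth value), $N_1, \dots, N_p$ are predicates that take $t$ as an argument, and $t$ is the only free variable in the predicate. Intuitively, $N_1, \dots, N_p$ specify the binary input patterns going into the neural network over all time, and $Pr(N_1, N_2, \dots, N_n, t)$ is a function that takes those binary input patterns and constructs an output binary pattern $Pr(N_1, N_2, \dots, N_n, 0), Pr(N_1, N_2, \dots, N_n, 1), \dots$.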
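A logical sentence $Pr(N_1, N_2, \dots, N_n, t)$ is realized by a neural network if and only if there exist a time delay $T \geq 0$, a neuron $i$ in the network, and an initial state $N_{p+1}(0), \dots, N_n(0)$ for the non-peripheral neurons, such that for any time $t$, the truth value of the logical sentence is equal to the state of neuron $i$ at time $t+T$. That is,
$\forall t = 0, 1, 2, \dots: \quad Pr(N_1, N_2, \dots, N_n, t) = N_i(t+T)$.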
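To make the role of the delay $T$ concrete, the following sketch checks realization on a finite input trace. It assumes the sentence $Pr(t) = N_1(t) \wedge N_2(t)$ and a single neuron $N_3$ with two weight-1 synapses and threshold 2; the helper names `run_and_neuron` and `realizes`, the input traces, and the choice $N_3(0) = 0$ are illustrative assumptions.

```python
from typing import List, Sequence


def run_and_neuron(n1: Sequence[int], n2: Sequence[int]) -> List[int]:
    """Return N3(0), N3(1), ... for a neuron with two weight-1 inputs and threshold 2."""
    n3 = [0]  # assumed initial state N3(0) = 0
    for a, b in zip(n1, n2):
        n3.append(1 if a + b - 2 >= 0 else 0)
    return n3


def realizes(sentence: Sequence[int], neuron: Sequence[int], delay: int) -> bool:
    """Check sentence(t) == neuron(t + delay) for every t covered by the trace."""
    return all(sentence[t] == neuron[t + delay] for t in range(len(sentence)))


n1 = [1, 0, 1, 1, 0]
n2 = [1, 1, 0, 1, 0]
sentence = [a & b for a, b in zip(n1, n2)]  # Pr(t) = N1(t) AND N2(t)
n3 = run_and_neuron(n1, n2)

print(realizes(sentence, n3, delay=1))  # the neuron lags the sentence by one step
print(realizes(sentence, n3, delay=0))
```

A finite trace can only falsify realization, not prove it for all $t$; the check is meant to show why a nonzero delay is part of the definition.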
In the paper, they considered some alternative definitions of artificial neural networks and showed them to be equivalent; that is, neural networks under one definition realize precisely the same logical sentences as neural networks under another definition.
They considered three forms of inhibition: relative inhibition, absolute inhibition, and extinction. The definition above is relative inhibition. By "absolute inhibition" they meant that if any negative synapse fires, then the neuron will not fire. By "extinction" they meant that if at time $t$ any inhibitory synapse fires on a neuron $i$, then $\theta_i(t+j) = \theta_i(0) + b_j$ for $j = 1, 2, 3, \dots$, until the next time an inhibitory synapse fires on $i$. It is required that $b_j = 0$ for all large $j$.
Theorems 4 and 5 state that these are equivalent.
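The difference between relative and absolute inhibition at the level of a single neuron can be sketched as follows. This is an illustrative reading of the two definitions, not code from the paper; the function names and the example weights are assumptions.

```python
from typing import Sequence


def fires_relative(weights: Sequence[int], inputs: Sequence[int], theta: int) -> int:
    """Relative inhibition: negative weights simply subtract from the weighted sum."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= theta else 0


def fires_absolute(weights: Sequence[int], inputs: Sequence[int], theta: int) -> int:
    """Absolute inhibition: any active inhibitory synapse vetoes firing outright."""
    if any(w < 0 and x == 1 for w, x in zip(weights, inputs)):
        return 0
    excitation = sum(w * x for w, x in zip(weights, inputs) if w > 0)
    return 1 if excitation >= theta else 0


weights = [1, 1, 1, -1]  # three excitatory synapses, one inhibitory
inputs = [1, 1, 1, 1]    # every presynaptic neuron fires
relative_out = fires_relative(weights, inputs, theta=2)
absolute_out = fires_absolute(weights, inputs, theta=2)
```

The same neuron gives different outputs under the two conventions here. The equivalence stated in the paper is about what networks as a whole can realize, so translating between conventions may require rewiring rather than reusing the same weights.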
They considered three forms of excitation: spatial summation, temporal summation, and facilitation. The definition above is spatial summation (which they pictured as having multiple synapses placed close together, so that the effect of their firing sums up). By "temporal summation" they meant that the total incoming signal is
$\sum_{\tau=0}^{T} \sum_{j=1}^{n} w_{ij}(t) N_j(t-\tau)$
for some $T \geq 1$. By "facilitation" they meant the same as extinction, except that $b_j \leq 0$. Theorem 6 states that these are equivalent.
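Temporal summation can be sketched as a sum over a sliding window of past states. The sketch below assumes the window covers $\tau = 0, \dots, T$, that weights are constant in time, and that nothing fired before $t = 0$; the function name `temporal_sum` and the layout of `history` are hypothetical.

```python
from typing import List, Sequence


def temporal_sum(weights_i: Sequence[int], history: List[List[int]], t: int, T: int) -> int:
    """Sum of w_ij * N_j(t - tau) over tau = 0..T and all j; history[s][j] holds N_j(s)."""
    total = 0
    for tau in range(T + 1):
        if t - tau < 0:
            continue  # assumption: no neuron fired before t = 0
        total += sum(w * n for w, n in zip(weights_i, history[t - tau]))
    return total


# Two presynaptic neurons that never fire at the same moment.
history = [[1, 0], [0, 0], [0, 1]]
windowed = temporal_sum([1, 1], history, t=2, T=2)  # both past spikes are counted
instant = temporal_sum([1, 1], history, t=2, T=0)   # only the current step is counted
```

With a threshold of 2, the windowed signal would make the neuron fire while the instantaneous one would not, which is the effect temporal summation is meant to capture.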
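They considered neural networks that do not change, and those that change by Hebbian learning. That is, they assumed that at $t = 0$ some excitatory synaptic connections are not active. If at any time $t$ both $N_i(t) = 1$ and $N_j(t) = 1$, then any latent excitatory synapse between $i$ and $j$ becomes active. Theorem 7 states that these are equivalent.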
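The activation of latent synapses by coincident firing can be sketched as a set update. This is a simplified illustration in which a synapse is represented as a pair `(i, j)` and becomes permanently active once both endpoints fire at the same step; the function name and data layout are assumptions.

```python
from typing import List, Set, Tuple

Synapse = Tuple[int, int]  # (i, j): a synapse between neurons i and j


def activate_latent(latent: Set[Synapse], active: Set[Synapse], state: List[int]) -> None:
    """Move every latent synapse whose two endpoint neurons both fire into the active set."""
    for i, j in list(latent):  # copy, since the set is modified while looping
        if state[i] == 1 and state[j] == 1:
            latent.remove((i, j))
            active.add((i, j))


latent: Set[Synapse] = {(2, 0), (2, 1)}
active: Set[Synapse] = set()
activate_latent(latent, active, state=[1, 0, 1])  # neurons 0 and 2 fire together
```

After the call, only the synapse between neurons 2 and 0 has become active; the one involving the silent neuron 1 stays latent.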
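They considered "temporal propositional expressions" (TPE), which are propositional formulas with one free variable $t$. For example, $N_1(t) \vee N_2(t) \wedge \neg N_3(t)$ is such an expression. Theorems 1 and 2 together showed that neural nets without circles are equivalent to TPE.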
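As an illustration of how a loop-free net can realize such an expression, the sketch below wires three threshold neurons to compute $N_1 \vee (N_2 \wedge \neg N_3)$, reading $\wedge$ as binding tighter than $\vee$. The wiring, the relay neuron used to equalize delays, and all names are assumptions made for the example; the resulting net has a total delay of two steps.

```python
from itertools import product
from typing import Sequence


def neuron(weights: Sequence[int], inputs: Sequence[int], theta: int) -> int:
    """Threshold unit: fires iff the weighted input sum reaches theta."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= theta else 0


def net(n1: int, n2: int, n3: int) -> int:
    """Two-layer feedforward net for N1 or (N2 and not N3)."""
    delayed_n1 = neuron([1], [n1], 1)              # relay: copies N1 one step later
    n2_and_not_n3 = neuron([1, -1], [n2, n3], 1)   # fires iff N2 fires and N3 does not
    return neuron([1, 1], [delayed_n1, n2_and_not_n3], 1)  # OR of the two, one step later


# Truth table over all eight input combinations.
table = {bits: net(*bits) for bits in product((0, 1), repeat=3)}
```

The relay neuron is there because every synapse costs one time step, so both arguments of the final OR need to arrive with the same delay.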
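For neural nets with loops, they noted that "realizable $Pr$ may involve reference to past events of an indefinite degree of remoteness". Such nets can then encode sentences like "There was some $x$ such that $x$ was a $\psi$", or $(\exists x)(\psi x)$.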
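A single neuron with an excitatory synapse onto itself gives a small example of such a reference to the indefinite past. The sketch below assumes a neuron $M$ with threshold 1 that receives one weight-1 synapse from an input and one from itself, so that once it fires it keeps firing; the name `latch_trace` and the input sequence are illustrative.

```python
from typing import List, Sequence


def latch_trace(inputs: Sequence[int]) -> List[int]:
    """Return M(0), M(1), ... where M(t+1) = H(x(t) + M(t) - 1) and M(0) = 0."""
    m = 0
    trace = [m]
    for x in inputs:
        m = 1 if x + m - 1 >= 0 else 0  # self-loop keeps M firing once triggered
        trace.append(m)
    return trace


inputs = [0, 0, 1, 0, 0]
trace = latch_trace(inputs)
```

At every step, `trace[t + 1]` is 1 exactly when the input fired at some time up to $t$, so the neuron's state expresses "there was an earlier time at which the input fired", however long ago that was.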
As a remark, they noted that a neural network, if furnished with a tape, scanners, and write-heads, is equivalent to a Turing machine, and conversely, every Turing machine is equivalent to some such neural network. Thus, these neural networks are equivalent to Turing computability, Church's lambda-definability, and Kleene's primitive recursiveness.
The paper built upon several previous strands of work.[5] [6]
On the symbolic logic side, it built on the previous work by Carnap, Whitehead, and Russell. This was contributed by Walter Pitts, who had a strong proficiency with symbolic logic. Pitts provided mathematical and logical rigor to McCulloch’s vague ideas on psychons (atoms of psychological events) and circular causality.[7]
On the neuroscience side, it built on previous work by the mathematical biology research group centered around Nicolas Rashevsky, of which McCulloch was a member. The paper was published in the Bulletin of Mathematical Biophysics, which was founded by Rashevsky in 1939. During the late 1930s, Rashevsky's research group was producing papers that were difficult to publish in other journals at the time, so Rashevsky decided to found a new journal exclusively devoted to mathematical biophysics.[8]
Also in Rashevsky's group was Alston Scott Householder, who in 1941 published an abstract model of the steady-state activity of biological neural networks. The model, in modern language, is an artificial neural network with a ReLU activation function.[9] In a series of papers, Householder calculated the stable states of very simple networks: a chain, a circle, and a bouquet. Walter Pitts' first two papers formulated a mathematical theory of learning and conditioning. His next three were mathematical developments of Householder’s model.[10]
In 1938, at age 15, Pitts ran away from home in Detroit and arrived at the University of Chicago. Later, he walked into Rudolf Carnap's office with Carnap's book filled with corrections and suggested improvements. He studied under Carnap and attended classes from 1938 to 1943. He wrote several early papers on neuronal network modelling and regularly attended Rashevsky's seminars in theoretical biology. The seminar attendees included Gerhard von Bonin and Householder. In 1940, von Bonin introduced Jerome Lettvin to McCulloch. By 1942, both Lettvin and Pitts had moved into McCulloch's home.[11]
McCulloch had been interested in circular causality from studies of causalgia after amputation, epileptic activity of surgically isolated brain, and Lorente de Nó's research showing that recurrent neural networks are needed to explain vestibular nystagmus. He had difficulty treating circular causality until Pitts demonstrated how it can be handled with the appropriate mathematical tools of modular arithmetic and symbolic logic.
Both authors' affiliation in the article was given as "University of Illinois, College of Medicine, Department of Psychiatry at the Illinois Neuropsychiatric Institute, University of Chicago, Chicago, U.S.A."
The paper led to work on neural networks and their link to finite automata. Marvin Minsky, who was influenced by McCulloch, built SNARC (1951), an early example of a neural network, and wrote a PhD thesis on neural networks (1954).[12]
McCulloch chaired the ten Macy conferences (1946–1953) on "Circular Causal and Feedback Mechanisms in Biological and Social Systems". These were a key event in the beginning of cybernetics and of what later became known as cognitive science. Pitts also attended the conferences.[13]
In the 1943 paper, they described how memories can be formed by a neural network with loops in it, or with alterable synapses, which operates over time and implements the logical universals "there exists" and "for all". This was generalized to spatial objects, such as geometric figures, in their 1947 paper How we know universals.[14] Norbert Wiener found this significant evidence for a general method by which animals recognize objects: scanning a scene under multiple transformations and finding a canonical representation. He hypothesized that this "scanning" activity is clocked by the alpha wave, which he mistakenly thought was tightly regulated at 10 Hz (rather than the 8–13 Hz range that modern research shows).
McCulloch worked with Manuel Blum in studying how a neural network can be "logically stable", that is, can implement a Boolean function even if the activation thresholds of individual neurons are varied.[15] They were inspired by the problem of how the brain can perform the same functions, such as breathing, under the influence of caffeine or alcohol, which shift the activation threshold over the entire brain.