This is a Week12 | Lecture page, written Sun Apr 6 21:09:13 PDT 2008.
Think of a gang, patrolling at the front gate of a castle, protecting an area on a bridge.
Looking inside each members of the gang game, we may see
SteeringForce +=
wander() * wanderAmount * wanderProirity +
obstacleAvoidance() * avoidAmount * avoidPriority +
aligness() * alignAmount * alignPriority +
cohesion() * cohesionAmount * cohesionPriority +
seperation() * seperationAmount * seperationPriority +
alignment() * alignAmount * alignPriority +
wallAvoidance() * wallAvoidanceAmount * wallAvoidPriority +
chasePrey() * chaseAmount * chasePriority +
avoidPrey() * avoidAmount * avoidPriority
What is going on here? A round robin between two models:
steer | | v Player ---> thrust
How to control a whole host of behaviors:.
When in doubt, pattern movement. At any "X", going in direction "Y", pick any neighbor at random, favoring those nearer your current compass bearing:
pattern[0]={1111111111
1111111111
1111111111
1111111111
1111111111
1111111111
1111111111}
pattern[1]={1...1..111
.1.1..1...1
1.1.......1
.1..111.1.1
1..1...1.1.
1.1........
11.........}
pattern[2]=
Pick patterns at random. Note: pattern[0] means random walk.
Actor X drops bread crumbs. Actor Y steers towards zones of higher bread crumb results.
Bread crumbing have some nice computational properties; e.g. very fast to compute.
Kind of pattern movements, for groups.
Look around to a distance of "D" for an plus/minus "X" degrees from your current compass bearing:
With that knowledge, work out how much to operate, align, cohere:
![]() | Separation: steer to avoid crowding local flock mates (give this one the highest priority) |
| Alignment: steer towards the average heading of local flock mates |
| Cohesion: steer to move toward the average position of local flock mates,
maybe weighted by your compass bearing
741 973 741 |
(See also Lennard-Jones functions for combining the above into one equation.)
Do a weighted sum of the neighborhood, giving a "leader" more weight than the rest of the flock.
Now the problem of programming the flock reduces to just the problem of programming the leader.
Note: the leader can be selected dynamically:
Predict where the food will be, steer towards there.
Predict where the prey will be, steer away from there.
Or, with knowledge of the terrain, find a predator's blind spot (behind a rock) and go steer to there.
Extend "feelers".
If they fall within "r" of some obstacle, then insert a steering force "b" to deflect the actor
Given a terrain map of regions (rooms, valleys with passes between them, oceans connected by straights),
To compute way points:
M1= A B C D E F G
A . 1 . . . . .
B . . 1 . . . .
C . . . 1 1 . .
D . . . . . . .
E . . . . . 1 .
F . . . . . . 1
G . . . . . . .
M2= A B C D E F G
A . 1 . . . . .
B 1 . 1 . . . .
C . 1 . 1 1 . .
D . . 1 . . . .
E . . 1 . . 1 .
F . . . . 1 . 1
G . . . . . 1 .
If you want some geography, add distance measure instead of just "1"
D1= A B C D E F G
A . 4 . . . . .
B . . 2 . . . .
C . . . 9 7 . .
D . . . . . . .
E . . . . . 8 .
F . . . . . . 3
G . . . . . . .
These numbers can reflect difficulty in traversing some region (e.g. mud, quicksand, steep slopes)
Navigation in two steps: go straight to the nearest way point, then path follow from weigh point to weigh point.
1 function Dijkstra(Graph, source):
2 for each vertex v in Graph: // Initializations
3 dist[v] := infinity // Unknown distance function from source to v
4 previous[v] := undefined
5 dist[source] := 0 // Distance from source to source
6 Q := copy(Graph) // All nodes in the graph are unoptimized - thus are in Q
7 while Q is not empty: // The main loop
8 u := extract_min(Q) // Remove and return best vertex from nodes in two given nodes
// we would use a path finding algorithm on the new graph, such as depth-first search.
9 for each neighbor v of u: // where v has not yet been removed from Q.
10 alt = dist[u] + length(u, v)
11 if alt < dist[v] // Relax (u,v)
12 dist[v] := alt
13 previous[v] := u
14 return previous[]
When wondering a path, remember to stagger a little and steer more at corners.
And for crowds to walk paths, add obstacle avoidance.
Recall our geography maps:
D1= A B C D E F G
A . 4 . . . . .
B . . 2 . . . .
C . . . 9 7 . .
D . . . . . . .
E . . . . . 8 .
F . . . . . . 3
G . . . . . . .
Remember that these numbers can reflect difficulty in traversing some region (e.g. mud, quicksand, steep slopes)
An influence map is a second "geography" map that changes at runtime. E.g. suppose you keep dying every time you walk from G to A:
Inf= A B C D E F G
A . . . . . . .
B . . . . . . .
C . . . . . . .
D . . . . . . .
E . . . . . . .
F . . . . . . .
G 10 . . . . . .
So now we can do A* to minimize g(x)+h(x) where "h=D+Inf"
So if the above methods are all standard, the enemy knows them too.
Q1: How can you reason about the enemy reasoning about your reasoning?
Q2: How you do the above faster?
See http://www.ocf.berkeley.edu/~yosenl/extras/alphabeta/alphabeta.html
There is a common illusion that we first generate a n-ply minmax tree and then apply alpha-beta prune. No, we generate the tree and apply alpha-beta prune at the same time (otherwise, no point).
How effective is alpha-beta pruning? It depends on the order in which children are visited. If children of a node are visited in the worst possible order, it may be that no pruning occurs. For max nodes, we want to visit the best child first so that time is not wasted in the rest of the children exploring worse scenarios. For min nodes, we want to visit the worst child first (from our perspective, not the opponent's.) There are two obvious sources of this information:
When the optimal child is selected at every opportunity, alpha-beta pruning causes all the rest of the children to be pruned away at every other level of the tree; only that one child is explored. This means that on average the tree can searched twice as deeply as before?a huge increase in searching performance.
This is a Lecture | Week10 page, written Sat Mar 15 19:31:42 PDT 2008.
To understand graph-based abduction, it is best to look at its historical origins.
In the beginning was Doyle's justification-based truth maintenance system (TMS). A TMS is a knowledge representation method for representing both beliefs and their dependencies. The name truth maintenance is due to the ability of these systems to restore consistency.
My own belief is that once you can infer from observations to consequences, then sort out the result into consistent sets of beliefs, then a whole host of applications are just queries to those sets of beliefs structure.
In Doyle's scheme ( J. Doyle. A Truth Maintenance System. AI. Vol. 12. No 3, pp. 251-272. 1979):
In the JTMS, rules take the form:
if INS and not OUTS then"conc" is in.(And if you aren't "in", you are "out").
The current set of "in" literals and "out" literals are called an environment (a.k.a. world, context, scenario, etc etc)
Now if realize that "conc" is "in":
Note that the JTMS can reel from environment to environment with no memory of the past. At each step, it might be repeating all the inference it used during the last change of environment.
In DeKleer's ATMS (assumption-based TMS: J. de Kleer (1986). An assumption-based TMS. Artificial Intelligence, 28:127-16) the dependency network is pre-computers and cached. This lets us quickly jump from environment to environment.
The ATMS is a smart place for some problem solver to store its conclusions, justifications for those conclusions.
query ---> --
\
problemSolver --> justifications --> atms <-> lattice <-> environments
\ /
\--<--- beliefs ---<----/
Here:
The ATMS offers several services:
Example follows. And btw, I no longer think this is the way to do this stuff. But we'll get to that later.
In the following example, a lattice has been built using the following knowledge:
The following graphical notation is used:
Not shown in the above are the conclusions associated with each assumption- but you can imagine them sitting on each node.
The ATMS is incremental: the above network is changed every time a new
Default reasoning: if we want to believe in "B" then it is fast to find what else we can assume (follow parent links).
Example of monitoring: Some oracle declares that "C" can't happen. So we can quicky rule out anything that uses "C" (follow the parent links). Actually, given the nogood of {a,b,e}, then that single observation of "not C" rules out everything except {a,b}, {a,b,d}.
Example of planning: To avoid some known badness at {a,c,d,e} run down the lattice to the smallest subset that you can direct control over (e.g. try to turn off either {a,c} or {d,e}).
Example of dependency driven backtracking: We are told that that whole "not C" was a mistake. So:
I thought I could better than the ATMS (pride cometh before the fall). In 1993 I was generating multiple environments and my inference engines were running slow. I was worried about the lattice construction time- all those subset calculations.
My thesis was an experiment in speeding up environment generation using as a side-effect of backward chaining from "goals" back to "ins". The idea was, as you ascend from the "goals" to the "ins", the algorithm keeps the top-most controversial assumption yet found. When you arrive at the "ins", then these top-most controversial assumptions are the most important issues to consider- and we use these to generate the environments.
More experiments with an ISSAMP-variant by OWEN 2002 ... 2007 (On the advantages of approximate vs. complete verification: bigger models, faster, less memory, usually accurate Owen, D.; Menzies, T.; Heimdahl, M.; Jimin Gao Software Engineering Workshop, 2003. Proceedings. 28th Annual NASA Goddard Volume , Issue , 3-4 Dec. 2003 Page(s): 75 - 81) showed that the random search scaled much better, worked better for larger models, less effected by to changes in the input model etc etc
Random world generation with LURCH
LURCH does not report the key assumptions that selects for different outcomes so the lattice construction problem remains
This is a Lecture | Week8 page, written Mon Mar 3 18:21:22 PST 2008.
(Reference: Science, 20 June 1980, Vol. 208. no. 4450, pp. 1335 - 1342 "Expert and Novice Performance in Solving Physics Problems", Jill Larkin, John McDermott, Dorothea P. Simon, and Herbert A. Simon. )
Much of the early AI research was informed by a cognitive psychology model of human expertise. The model has two parts:
Reasoning was, according to this mode: a match-act cycle:
In this model
Q: How to experts know what is relevant?
Much, much work on mapping this cognitive psychological insight into AI programs
(Reference: "An Expert System for Raising Pigs" by T.J. Menzies and J. Black and J. Fleming and M. Dean. The first Conference on Practical Applications of Prolog 1992 . Available from http://menzies.us/pdf/ukapril92.pdf )
PIGE: Australia's first exported rule-based expert systems
Written by me, 1987 who trapped the expertise of pig nutritionists in a few hundred rules.
( V.L. Yu and L.M. Fagan and S.M. Wraith and W.J. Clancey and A.C. Scott and J.F. Hanigan and R.L. Blum and B.G. Buchanan and S.N. Cohen, 1979, Antimicrobial Selection by a Computer: a Blinded Evaluation by Infectious Disease Experts, Journal of American Medical Association, vo. 242, pages 1279-1282)
Given patient symptoms, what antibiotics should be prescribed?
Approx 500 rules.
Extensively studied:
Compared to humans, performed very well:
Why did it perform so well?
John McDermott, DEC, 1980s. Two systems:
With strong theoretical support and excellent industrial case studies, the future for AI seemed bright.
Begin the AI summer (mid-1980s). Just like the Internet bubble of 2000: fledging technology, over-hyped, immature tool kits.
And after the summer, came the fall. Enough said.
Happily, then came the 1990s, data mining, and results from real-time AI planners, and stochastic theorem provers.
By 2003, I could say I was an AI-nerd and people would not run away.
Here's a good introduction on rule-based programming. Read it (only pages 1,2,3,4,5,6,7) carefully to learn the meaning of:
Backward chaining supports how and why queries:
Forward chaining supports only how not why.
Define match, resolve, act
Some nice advantages for software engineering:
As rule-based programs became more elaborate, their support tools more intricate, the likelihood that they had any human psychological connection became less and less.
As we built larger and larger rule bases, other non-cognitive issues became apparent:
Note that both these problems come from the global nature of match: all rules over all working memory.
Go to simpler representations. e.g. state machines (used in gaming, next lecture)
As used in many rule-based tools including OPS5, ART, JESS. Compile all the rules into a big network wiring all the conditions together.
If searching/understanding all is too much, find ways to built the whole from local parts. Have separate rule bases for each part.
Q: How to divide the system?
E.g. here's Clancy's abstraction of MYCIN:
And here's his abstraction of an electrical
fault localization expert system:
Note that the same abstract problem solving method occurs in both
A whole mini-industry arose in the late 1980s, early 1990s defining libraries of supposedly reusable problem solving methods.
e.g. Here's "diagnosis"

Here's "monitoring":

Note that you could write and reuse rules based to handle select, compare, specify.
And there's some evidence that this is useful. Marcus and McDermott built rule bases containing 7000+ rules via role-limiting methods. That is, once they knew their problem solving methods, they write specialized editors that only let users enter in the control facts needed to just run those methods.
For a catalog of problem solving methods, see Cognitive Patterns By Karen M. Gardner (contributors James Odell, Barry McGibbon), 1998, Cambridge University Press
We study AI to learn useful tricks. Sometimes, we also learn something about people. Here's a radically different, and successful, method to the above. What does it tell us about human cognition?
Ripple-down rules is a maintenance technique for rule-based programs initially developed by Compton. Ripple-down rules are best understood by comparison with standard rule-based programming. In standard rule-based programming, each rule takes the form
rule ID IF condition THEN actionwhere condition is some test on the current case. In the 1970s and early 1980s rule-based systems were hailed as the cure to the ills of software and knowledge engineering. Rules are useful, it was claimed, since they: represent high-level logic of the system expressed in a simple form that can be rapidly modified.
However, as these systems grew in size, it became apparent that rules were surprisingly harder to manage. One reason for this was that, in standard rule-based programming, all rules exist in one large global space where any rule can trigger any other rule. Such an open architecture is very flexible. However, it is also difficult to prevent unexpected side-effects after editing a rule. Consequently, the same rule-based programs hailed in early 1980s (XCON) became case studies about how hard it can be to maintain rule-based systems.
Many researchers argued that rule authors must work in precisely constrained environments, lest their edits lead to spaghetti knowledge and a maintenance nightmare. One such constrained environment is ripple-down rules that adds an EXCEPT slot to each rule:
rule ID1 IF condition THEN EXCEPT rule ID2 THEN conclusion because EXAMPLE
Here, ID1,ID2 are unique identifiers for different rules; EXAMPLE is the case that prompted the creation of the rule (internally, EXAMPLEs conjunction of features); and the condition is some subset of the EXAMPLE, i.e., condition ⊆ EXAMPLE (the method for selecting that subset is discussed below). Rules and their exceptions form a ripple-down rule tree:
At run time, a ripple-down rule interpreter explores the rules and their exceptions. If the condition of rule ID1 is found to be true, the interpreter checks the rule referenced in the except slot. If ID2's rule condition is false, then the interpreter returns the conclusion of ID1. Else, the interpreter recurses into rule ID2.
(Note the unbalanced nature of the tree, most patches are shallow. This is a common feature of RDRs).
Ripple-down rules can be easier to read than normal rules:
But in practice, Compton advises hiding the tree behind a difference-list editor. Such an editor constrains rule authoring as follows:
condition2 = EXAMPLE1 - condition1 condition2 ⊆ EXAMPLE2
When users work in a difference list editor, they watch EXAMPLEs running over the ripple-down rules tree, intervening only when they believe that the wrong conclusion is generated. At that point, the editor generates a list of features to add to condition2 (this list is generated automatically from the above equations). The expert picks some items from this list and the patch rule is automatically to some leaf of the ripple-down rules tree.
Ripple-down rules are a very rapid rule maintenance environment. Compton et.al. report average rule edit times between 30 to 120 seconds for rule bases up to 1000 rules in size [Compton 2005]
AFAIK, ripple-down-rules are the current high-water mark in knowledge maintenance.
Q: what does this tell us about human knowledge?
This is a Lecture | Week6 page, written Sun Feb 17 09:36:15 PST 2008.
A Sudoku puzzle:
One of the hardest Sudoku's ever found: the dreaded Al Escargot problem:
Encoding Sudoku into CNF (so it can be solved by,say, MAXWALKSAT.
Here s[x,y,z]=true means that the square at x,y has value z. The following rules expand into propositional formulae:
Note that the encoding can be very large. But that is the game with using MAXWALKSAT (how to encode a domain into the low level representation used by the solver).
This is a Lecture | Week6 page, written Sun Feb 17 09:28:49 PST 2008.
As engineers, you need to sensibly select technologies. Some exciting candidate technologies are too bleeding edge, or too slow or cumbersome to use in practice. And you have to decide which is which.
The next two lectures offer examples of this process. Note that in both case, the gaming industry uses technology (b) while theoreticians prefer (a).
In your career you'll have to strike your own balance between the limitations of the simpler method (b) vs the extra functionality offered by the more complex method (a).
The words figure and fictitious both derive from the same Latin root, fingere (to form, create). Beware!
--M.J. Moroney
Everything is vague to a degree you do not realize :till you have tried to make it precise. --Bertrand Russell
I'm pretty certain that I need better ways to handle uncertainty. Many expressions of human knowledge are not crisp ideas. Rather, they are hedges, shades of gray.
Consider four problems in programming a game:
The problem is that conventional principles of logic are very unfriendly towards uncertainty. In classical logic, things are either true or false. You do, you don't. Worse, if a model contains a single inconsistency, the whole theory is false. So one little doubt, one little "this or not this" and you are screwed. Not very helpful for introducing gradual degrees of hedged knowledge that may be contradictory.
This is not just a game playing problem.
The drawbacks of classical logic are well known.
Theoreticians like Lofti Zadeh offer a Principle of Incompatibility :
Philosophers like Karl Popper claim that accessing 100% evidence on any topic a fool's goal. All "proofs" must terminate on premises; i.e. some proposition that we accept as
true without testing, but which may indeed be wrong.
In terms of most human knowledge, a recursion to base premises is
fundamentally impractical. For example:
There is much evidence for Popper's thesis. Gathering all the information required to remove all uncertainties can be very difficult:
The (in)famous "Limits to Growth"
study attempted to predict the international effects of continued global economic growth. Less than 0.1% of the data required for the models was available
Experience with mathematical simulation suggest that an over-enthusiasm for quantitative analysis can confuse, rather than clarify, a domain. For systems whose parts are not listed in a catalog, which evolve together, which are difficult to measure, and which show unexpected capacity to form new connections, the results of (quantitative) simulation techniques have been less than impressive:
Anyway, why seek total knowledge? Some levels of uncertainty are a good thing:
When co-ordinating multiple agents, it is good to leave some room for
local improvisation. Consider the phrase "close enough for jazz":
Sometimes we can learn more by allowing some uncertainties. Creative solutions and insights can sometimes be found in a loosely constrained space than may be impossible in tightly constrained, certain spaces. For example,
PROXIMITY
SIZE close medium far
-------- ----------------------
tiny medium low low
small high low low
moderate high medium low
large high high medium
So we need to change classical logic. Uncertainty is unavoidable. Despite this, we persist and continue building systems of increasing size and internal complexity. Somehow, despite uncertainty, we can handle uncertainty and doubt and predict the future a useful percent of the time. How do we do it? How can we teach AI algorithms to do it?
Officially, there are rules. Unofficially, there are shades of gray. Fuzzy logic is one way to
handle such shades of gray.
For example, consider the rule ``if not raining then play golf''. Strange to say, sometimes people play golf even in the rain. Sure, they may not play in a hurricane but a few drops of rain rarely stops the dedicated golfer.
Fuzzy logic tries to capture these shades of gray. Our golf rule means that we rush to play golf in the sunshine, stay home during hurricanes, and in between we may or may not play golf depending on how hard it is raining.
Fuzzy logic uses fuzzy sets. Crisp sets have sharp distinctions between true and falsehood (e.g. it is either raining or not). Fuzzy sets have blurred boundaries (e.g. it is barely raining). Typically, a membership function is used to denote our belief in some concept. For example, here are some membership functions for the belief that a number is positive. All these beliefs have the same membership function:
1/(1+exp(-1*x*crisp))
Or, in another form...
(defun crisp (x a b crisp)
"Return a point in a sigmoid function centered at (a - b)/2"
(labels ((as100 (x min max)
(+ -50 (* 100 (/ (- x min) (- max min))))))
(/ 1 (+ 1 (exp (* -1 (as100 x a b) crisp))))))
(defun fuzzy-grade (x &optional (a 0) (b 1) (slope 0.1))
(crisp x a b slope))
where the crisp parameter controls how sharp is the boundary in our beliefs:

Note that for large crisp values the boundary looks like classical logic: at zero a number
zaps from positive to negative.
However, in the case of our golfing rule, there are
shades of gray in between sunshine and hurricanes.
We might believe in playing golf, even when
there is a little rain around.
For these situations, our beliefs are hardly crisp, as modeled by setting
crisp to some value less than (say) one.
The crisp value lets us operationalize a set of linguistic variables or hedges such as ``barely'', ``very'', ``more or less'', ``somewhat,'' ``rather,'' ``sort of,'' and so on. In this approach, analysts debrief their local experts on the relative strengths of these hedges then add hedge qualifiers to every rule. Internally, this just means mapping hedges to crisp values. For example:
definitely: crisp=10 sort of: crisp=0.5 etc
Alternatively, analysts can ask the local experts to draw membership functions showing how the degrees of belief in some concept changes over its range. For example:

Sometimes, these can be mapped to mathematical functions as in the following:
(defun fuzzy-triangle (x a b c) (max 0 (min (/ (- x a) (- b a)) (/ (- c x) (- c b))))) (defun fuzzy-trapezoid (x a b c d) (max 0 (min (/ (- x a) (- b a)) 1 (/ (- d x) (- d c)))))
Note that the function need not be represented mathematically. If the local experts draw any shape at all, we could read off values from that drawing and store them in an array. For example, an analyst can construct an array of values for various terms, either as vectors or matrices. Each term and hedge can be represented as (say) a 7-element vector or 7x7 matrix. Each element of every vector and matrix a value between 0.0 and 1.0, inclusive, in what might be considered intuitively a consistent manner. For example, the term ``high'' could be assigned the vector
0.0 0.0 0.1 0.3 0.7 1.0 1.0
and ``low'' could be set equal to the reverse of ``high,'' or
1.0 1.0 0.7 0.3 0.1 0.0 0.0
The AND, OR, NOT operators of boolean logic exist in fuzzy logic, usually defined as the minimum, maximum, and complement; i.e.
[1] truth (not A) = 1.0 - truth (A)
[2] truth (A and B) = minimum (truth(A), truth(B))
[3] truth (A or B) = maximum (truth(A), truth(B))
Or, in another form...
(defmacro $and (&rest l) `(min ,@l)) (defmacro $or (&rest l) `(max ,@l)) (defun $not (x) (- 1 x))
When they are defined this way, the are called the Zadeh operators, because they were originally defined as such in Zadeh's original papers.
Here are some examples of the Zadeh operators in action:
[1] truth (not A) = 1.0 - truth (A)


[2] truth (A and B) = minimum (truth(A), truth(B))



[3] truth (A or B) = maximum (truth(A), truth(B))



Here are some more examples:
Here's yet another example
Some researchers in fuzzy logic have explored the use of other interpretations of the AND and OR operations, but the definition for the NOT operation seems to be safe.
Note that if you plug just the values zero and one into the definitions [1],[2],[3], you get the same truth tables as you would expect from conventional Boolean logic. This is known as the EXTENSION PRINCIPLE, which states that:
Assume that we have some fuzzy sub membership functions for combination of TALL and OLD things defined as follows:
function tall(height) {
if (height < 5 ) return Zero;
if (height <=7 ) return (height-5)/2;
return 1;
}
function old(age) {
if (age < 18) return Zero;
if (age <= 60) return (age-18)/42;
return 1;
}
function a(age,height) { return FAND(tall(height),old(age)) }
function b(age,height) { return FOR(tall(height), old(age)) }
function c(height) { return FNOT(tall(height)) }
function abc(age,height) {
return FOR(a(age,height),b(age,height),c(height)) }
(The functions FNOR, FAND, FOR are the Zadeh operators.)
For compactness, we'll call our combination functions:
a = X is TALL and X is OLD
b = X is TALL or X is OLD
c = not (X is TALL)
abc= a or b or c
Here's the OLDness functions:

Here's the TALLness function:

Here's (c); i.e. the not TALLness function:

Here's (a): TALL and OLD

Here's (b): TALL or OLD

Here's (abc): a or b or c

Methodology, a fuzzy logic session looks like this:
Suppose we have an output fuzzy set AMOUNT which has three members: ZERO, MEDIUM, and HIGH. We assign to each of these fuzzy set members a corresponding output value, say 0 for Zero, 40 for MEDIUM, and 100 for HIGH. A Defuzzification procedure might then be:
return (ZERO*0 + MEDIUM*40 + HIGH*100)/(0+40+100)
How close is the enemy?
How large is the enemy's forces?
What is the size of the threat?
PROXIMITY
SIZE close medium far
-------- ----------------------
tiny medium low low
small high low low
moderate high medium low
large high high medium
We need some membership functions:
(defun dist (d what)
(case what
(range '(close medium far))
(close (fuzzy-triangle d -30 0 30))
(medium (fuzzy-trapezoid d 10 30 50 70))
(far (fuzzy-grade d 0.3 50 100))
(t (warn "~a not known to dist" what))))
(defun size (s what)
(case what
(range '(tiny small moderate large))
(tiny (fuzzy-triangle s -10 0 10))
(small (fuzzy-trapezoid s 2.5 10 15 20))
(moderate (fuzzy-trapezoid s 15 20 25 30))
(large (fuzzy-grade s 0.3 25 40))
(t (warn "~a not known to size " what))))
And we'll need to model the above table:
(defun fuzzy-deploy (d s what)
(macrolet (($if (x y) `($and (dist d ',x) (size s ',y))))
(case what
(range '(low medum high))
(low ($or ($if medium tiny)
($if far tiny)
($if medium small)
($if far small)))
(medium ($or ($if close tiny)
($if medium moderate)))
(high ($or ($if close small)
($if close moderate)
($if close large)
($if medium moderate)))
(t (warn "~a not known to fuzzy-deploy" what)))))
Finally, we need a defuzzication function:
(defun defuzzy-deploy (&key distance size)
"Return how many marines to deploy?"
(let ((low (fuzzy-deploy distance size 'low ))
(medium (fuzzy-deploy distance size 'medium))
(high (fuzzy-deploy distance size 'high )))
(round
(/ (+ (* 10 low) (* 30 medium) (* 50 high))
(+ low medium high)))))
So, how many marines to deploy?
(defuzzy-deploy :distance 25 :size 8)) ==> 19
(What I'm about to say seemed like a good
idea at the time- early 1990s. But most folks who took this seriously
ran into computational problems and had to switch from discrete-space
logic to some continuous variant. However, IMHO,
recent empirical stochastic
results suggest that this all might be worth a second look.
And, much to my surprise,
I see that
others think the same.
Also, technically speaking, what I'm about to show you is my own local
variant of abduction called graph-based abduction that comes
from my
1995 Ph.D. thesis.
For a more general treatment, that is more
standard, see
Bylander T., Allemang D., Tanner M.C., Josephson J.R.
The computational complexity of. abduction, Artificial Intelligence Journal
49(1-3), pp. 25-60, 1991.)
Republicans aren't pacifists by quakers are. Is Nixon (who was both a republican and a quaker) a pacifists?
This is a horrible problem. Observe that it can't be solved by sitting at the Nixon note and querying his immediate parents. You have to search into the network for contradictory conclusions. That is, it can't be solved by some fast local propagation algorithm.
If you can't gather more information, you have two options:
Alternatively, if we are allowed to probe the world for more information, we can:
That is, unlike fuzzy logic (where everything is quickly inferred from hard-wired numbers set at design time), here we must extract and explore the topology of our doubts, then probe the key points.
Formally, this is abduction.
Welcome to abduction, the logic of "maybe". Abduction can be
contrasted with induction and deduction as follows:
rules + antecedents ==> consequences
<antecedent,consequence> <antecedent,consequence> <antecedent,consequence> <antecedent,consequence> ==> rules
rules + consequences ==> antecedentsNote that abduction is not a certain inference since there may exist multiple rules that explain the consequence. For example:
In practice, all these methods run together:
(That's the official story. My own experience is that by the time you get an abductive device going, you have much of the machinery required for deduction and induction. But don't tell anyone,)
Formally, abduction looks like this:
Without rule2, abduction becomes deduction:
With rule2, abduction gets very slow:
That is, when the accountants run the race,
rules 1 & 2 have to be modified:
Assumptions define worlds of belief.
Assumptions come in two forms:
Note that the least informative assumptions are the non-base, non-controversial assumption (the yawn set).
(BTW, is your head spinning yet? Don't you wish we were still doing fuzzy logic?)
An example makes all this clear. Here's a theory:
In this example, our inputs are:
Our goals are to explain the outputs:
We have some background knowledge
The above theory supports the following consistent chains of reasons that start at the inputs and end at the outputs:
In the above, companyProfits=up and companyProfits=down are controversial assumptions. They are also base since they depend on no upstream controversial assumptions.
We say that one world of belief is defined by each maximal consistent subsets of the base controversial assumptions. By collecting together chains of reasons consistent with each such subset, we find two worlds:
So our example leads us to two possibilities:
Each world is internally consistent (by definition) so predictions can be made without checking for inconsistencies. So, ignoring the world generation cost (which can be scary), the subsequent prediction cost is very cheap.
Note the connection of abduction to deduction and induction.
When multiple worlds can be generated and a best operator selects the preferred world: i.e.
Different applications for abduction arise from different best operators: For example, classification is just a special case of prediction where some subset of the goals have special labels (the classes).
Another special case of prediction is validation where we score a theory by the maximum of the number of goals found in any world. This is particularly useful for assessing theories that may contain contradictions (e.g. early life cycle requirements).
Yet another special case of prediction is planning. Here, we have knowledge of how much it costs to use part of the theory and planning-as-abduction tries to to maximize coverage of the goals while minimizing the traversal cost to get to those goals. Note that, the directed graph in the generated worlds can be interpreted as an order of operations.
Monitoring can now be built as a a post-processor to planning. Once all the worlds are generated, we cache them (? to disk). As new information arrives, any world that contradicts that new data is deleted. The remaining worlds contain the remaining space of possibilities.
Minimal fault diagnosis means favoring worlds that explain most diseases (goals) with the fewest inputs; i.e.
Probing is a special kind of diagnostic activity that seeks the fewest tests that rule out the most possibilities. In this framework, we would eschew probes to non-controversial assumptions and favor probes to the remaining base assumptions.
The list goes on. Explanation means favoring worlds containing ideas that the user has seen before. This implies maintaining a persistent user profile storing everything the user has seen prior to this session with the system.
Tutoring is a combination of explanation and planning. If the best worlds we generate via explanation are somehow sub-optimum (measured via some domain-specific criteria) we use the planner to plot a path from the explainable worlds to the better worlds. This path can be interpreted as a lesson plan.
The list goes on. But you get the idea. We started with uncertainty and got to everything else.
Managing uncertainty is a core process in many tasks.
Abduction as a unifying principle for implementing uncertain reasoning.
One Ring to rule them all, One Ring to find them, One Ring to bring them all and in the darkness bind them
Well, we all know what comes next...
Abduction belongs to a class of problems called NP-hard. That is, there no known fast and complete algorithm for the general abductive problem. And we say that after decades of trying to find one.
So if everything is abduction then everything is hard. Hmmm... sounds about right.
But I gave up on abduction when all my abductive inference engines ran into computational walls. The following runtimes come from validation-as-abduction using an interpreted language on a 1993 computer. But the general shape of this graph has been seen in other abductive inference engines.
But I'm starting to think I should come back to abduction. Here's one experiment with an ISSAMP-style world generation method for validation-as-abduction. Repeatedly, this abductive inference engine built one consistent world, making random choices as it went. The algorithm terminated when new goals stopped appearing in the generated worlds. This algorithm ran in polynomial time (as opposed to the complete validation-as-abduction method shown on the left.
It is hardly surprising that an incomplete random search runs faster than a complete one. But the real significant finding was that in the region where both algorithms terminated, the random search found 98\% of the goals found by the complete search.
Why? Well, that's another story for another time but if you are interested, then read this (which is an ok paper) or this (which is a really nice paper).
By now you probably have a pretty strong view on the relative merits of fuzzy logic or abduction (implementation complexity, generality, etc). So what are you going to use in your commercial AI work?
Well, that depends. If you are just playing games then what-the-hell, use fuzzy logic.
And if you are trying to diagnose potentially life threatening diseases using the least cost and fewest number of invasive procedures, you might consider the complexity of abduction well worth the implementation effort.
But you're engineers- you decide.
This is a Lecture | Week2 | Search page, written Sun Jan 13 16:11:05 PST 2008.
Search is a universal problem solving mechanism in AI. The sequence of steps required to solve a problem is not known a priori and it must be determined by a search exploration of alternatives.
In computer science, a state space is a description of a configuration of discrete states used as a simple model of machines. Formally, it can be defined as a tuple [N, A, S, G] where:
Often, there are oracles that offer clues on the current progress in the search
While "g(x)" can be known exactly (since it is a log of the past) while "h(x)" is a heuristic guess-timate (since we have not searched there yet).
The state space is what state space search searches in. Graph theory is helpful in understanding and reasoning about state spaces.
A state space has some common properties:
In this lecture, we will explore two kinds of search engines:
Ordered search is useful for problems with some inherent ordering (e.g.) walking a maze.
Unordered search is useful for problems where ordering does not matter. For example, if a simulator accepts N inputs, then an unordered search might supply M ≤ N inputs and the rest are filled in at random.
In practice, S may not be pre-computed and cached. Rather, the successors to some state "s" may be computed using some "successors" function only when, or if, we ever reach "s". In the following code:
(defun tree-search (states goal-p successors combiner)
(labels ((next () (funcall successors (first states)))
(more-states () (funcall combiner (next) (rest states))
(cond ((null states) nil) ; failure. boo!
((funcall goal-p (first states)) (first states)) ; success!
(t (tree-search ; more to do
(more-states) goal-p successors combiner))))
Example: Monkey pushing a chair under a banana, climbs the chair, eats the banana
(defparameter *banana-ops*
(list
(op 'climb-on-chair
:preconds '(chair-at-middle-room at-middle-room on-floor)
:add-list '(at-bananas on-chair)
:del-list '(at-middle-room on-floor))
(op 'push-chair-from-door-to-middle-room
:preconds '(chair-at-door at-door)
:add-list '(chair-at-middle-room at-middle-room)
:del-list '(chair-at-door at-door))
(op 'walk-from-door-to-middle-room
:preconds '(at-door on-floor)
:add-list '(at-middle-room)
:del-list '(at-door))
(op 'grasp-bananas
:preconds '(at-bananas empty-handed)
:add-list '(has-bananas)
:del-list '(empty-handed))
(op 'drop-ball
:preconds '(has-ball)
:add-list '(empty-handed)
:del-list '(has-ball))
(op 'eat-bananas
:preconds '(has-bananas)
:add-list '(empty-handed not-hungry)
:del-list '(has-bananas hungry))))
If the search space is small, then it is possible to write it manually (see above).
More commonly, the search space is auto-generated from some other representations (like the two examples that follow).
Here's walking a maze where "op" is inferred from the maze description:
(defparameter *maze-ops*
(mappend #'make-maze-ops
'((1 2) (2 3) (3 4) (4 9) (9 14) (9 8) (8 7) (7 12) (12 13)
(12 11) (11 6) (11 16) (16 17) (17 22) (21 22) (22 23)
(23 18) (23 24) (24 19) (19 20) (20 15) (15 10) (10 5) (20 25))))
(defun make-maze-op (here there)
"Make an operator to move between two places"
(op `(move from ,here to ,there)
:preconds `((at ,here))
:add-list `((at ,there))
:del-list `((at ,here))))
(defun make-maze-ops (pair)
"Make maze ops in both directions"
(list (make-maze-op (first pair) (second pair))
(make-maze-op (second pair) (first pair))))
Here's stacking some boxes till they get into some desired order.
(defun move-op (a b c)
"Make an operator to move A from B to C."
(op `(move ,a from ,b to ,c)
:preconds `((space on ,a) (space on ,c) (,a on ,b))
:add-list (move-ons a b c)
:del-list (move-ons a c b)))
(defun move-ons (a b c)
(if (eq b 'table)
`((,a on ,c))
`((,a on ,c) (space on ,b))))
(defun make-block-ops (blocks)
(let ((ops nil))
(dolist (a blocks)
(dolist (b blocks)
(unless (equal a b)
(dolist (c blocks)
(unless (or (equal c a) (equal c b))
(push (move-op a b c) ops)))
(push (move-op a 'table b) ops)
(push (move-op a b 'table) ops))))
ops))
General search methods to:
M(b,d) <= b^d*(1 - 1/b)^(-2)As the branching factor increases, the extra overhead of DFID's repeated search over breadth-first "b^d" search approaches unity:
This stochastic tree search can be readily adapted to other problem types.
Eg#1: scheduling problems
Eg#2: N-queens with the lurch algorithm. From state "x", pick an out-arc at random. Follow it to some max depth. Restart.
Q: Why does Isamp work so well. work so well?
Like tree search, but with a *visited* list that checks if some state has not been reached before.
Visited list can take a lot of memory
A* search: standard game playing technique:
Strangely, order may not matter. Rather than struggle with lots of tricky decisions,
S. Kirkpatrick and C. D. Gelatt and M. P. Vecchi Optimization by Simulated Annealing, Science, Number 4598, 13 May 1983, volume 220, 4598, pages 671680,1983.
s := s0; e := E(s) // Initial state, energy. sb := s; eb := e // Initial "best" solution k := 0 // Energy evaluation count. WHILE k < kmax and e > emax // While time remains & not good enough: sn := neighbor(s) // Pick some neighbor. en := E(sn) // Compute its energy. IF en < eb // Is this a new best? THEN sb := sn; eb := en // Yes, save it. FI IF random() > P(e, en, k/kmax) // Should we move to it? THEN s := sn; e := en // Yes, change state. FI k := k + 1 // One more evaluation done RETURN sb // Return the best solution found.
Note the space requirements for SA: only enough RAM to hold 3 solutions. Very good for old fashioned machines.
But what to us for the probability function "P"? This is standard function:
FUNCTION P(old,new,t) = e^((old-new)/t)Which, incidentally, looks like this:
That is, we jump to a worse solution when "random() > P" in two cases:
Note that such crazy jumps let us escape local minima.
Kautz, H., Selman, B., & Jiang, Y. A general stochastic approach to solving problems with hard and soft constraints. In D. Gu, J. Du and P. Pardalos (Eds.), The satisfiability problem: Theory and applications, 573586. New York, NY, 1997.
State of the art search algorithm.
FOR i = 1 to max-tries DO
solution = random assignment
FOR j =1 to max-changes DO
IF score(solution) > threshold
THEN RETURN solution
FI
c = random part of solution
IF p < random()
THEN change a random setting in c
ELSE change setting in c that maximizes score(solution)
FI
RETURN failure, best solution found
This is a very successful algorithm. Here's some performance of WALKSAT (a simpler version of MAXWALKSAT) against a complete search
You saw another example in the introduction to this subject.
The movie at right shows two AI algorithms t$
solve the "latin's square" problem: i.e. pick an initial pattern, then try to
fill in the rest of the square with no two colors on the same row or column.
This stochastic
local search kills the latin squares problem (and, incidently, many other problems).
This is a Week1 | Lecture | Start page, written Sat Dec 1 08:36:24 PST 2007.
AI is
the study and design of intelligent agents
where an intelligent agent is a system that
perceives its environment and takes actions
which maximizes its chances of success.
Newell (1982) characterized the actions of such an knowledge level agent as...
(Actually, it probably also means looking after your leap, so you can learn from the past to be more rational in the future.)
Note that Newell makes no commitments as to how the knowledge level is operationalized. Underneath the knowledge level there could be any number of substrates (biological, mechanical, a collection of wind-powered beer cans, whatever) that implement rationality.
Now you might object at this separation of "rationality" from "humanity". You might protest that the only thing that can be rational like a person is another person. And many people would agree with you.
But I don't think that I am the only kind of thing that can think. That would be like saying that only birds can fly and that air planes, which don't flap their wings, don't "really" fly.
What I do think is that there is some abstract notion of flying/thinking that is independent of birds/humans. Like Spock said: "Intelligence does not require bulk, Mr. Scott".
Every computer scientist knows this to be true. Two generations of algorithms research has shown that there exist properties of computation that are independent of what processor the algorithm runs on, or the implementation language. Dijkstra once said "computer science is no more about computers than astronomy is about telescopes"- and he could have been talking about AI.
Not convinced?
Well, try another example. Do you think that a robot could/should walk like a human?
(see movie, right, of Dinesh Pai's
Platonic Beast).
This little fellow walks by occasionally throwing a spare limb over
the top of itself. Such a move would tear us apart, but it is the natural way
to do it for that kind of walking thing.
And here's another example:
The movie at right shows two AI algorithms trying to
solve the "latin's square" problem: i.e. pick an initial pattern, then try to
fill in the rest of the square with no two colors on the same row or column.
So, once again, how we best think is a local decision, based on the properties of the thing doing the thinking. And just because humans do it one way, does not mean that that is the best way for AIs to do it.
(Note an opposing view to the above. The literature is full of claims that
AI works like people do. For example in Edward Feigenbaum's knowledge transfer view- which I don't agree with- building knowledge-based systems was like
"mining the jewels in the
expert's head"; i.e. looking at the cogs and wheels in people's head and replicating them on a computer.
While I agree that at the knowledge level, beer cans can think like
the wet-ware between our ears, I think we need to respect the substrate in order to select the best method for implementing rationality.
And
whatever substrate we select,
some issues will be the same; e.g.
Newell's knowledge level insight and
issues relating to representation and search.)
Based on all this, I offer two predictions for the future. One, that we will see a growing number of rational computers but, two, they are going to be aliens (i.e. won't work exactly like human intelligence) with very different motivations, needs, and desires to us. Instead, the 21st century will see a menagerie of many different kinds of intelligence. Some you'll know about, like the book-buying assistants wired into Amazon.com that sometimes send you recommendations about what books to read. And some you won't even see-
But does this sounds crazy to you? Too optimistic? Where is the proof, you might demand, that this different-to-humans AI-approach is equal to (or better than) the human way?
Well, there's lots of proof. AI is no longer a bleeding-edge technology -- hyped by its proponents and mistrusted by the mainstream. AI has achieved much:
But don't expect AI to be all flash and dazzle. In the 21st century, AI is not necessarily amazing. Rather, it's often routine. Evidence for AI technology's routine and dependable nature abounds. See for example, the list of applications in a special issue I edited 21st-Century AI - Proud, Not Smug IEEE Intelligent Systems (May/June 2003).
Hopefully, that is enough motivation for you. As Nils Nilsson says:
This is a Lecture page, written Mon Dec 3 10:28:39 PST 2007.
(Warning: this article may contain heresies.
Also, it is quite dated since I wrote it in the mid- 1990s. So this is a passionate argument about stuff most folks don't care about anymore.)
Human "knowledge" is a context-sensitive, approximate and inaccurate hypotheses that requires continually testing. Compton argues that knowledge is a context-sensitive construct [1]. Models may be inappropriate when used out of the context in which they were elicited [2].
Knowledge representation theorists stress that our KBs are approximate surrogates of reality [3-5]; i.e. there accuracy is doubtful. Contrast this view with Ed Feigenbaum who described knowledge engineering as "mining the jewels in the expert's head" [6]; i.e. KBs were representations of what experts actually have in their heads. The changeover from the Feigenbaum "expertise transfer" view and the modern "knowledge-modelling" view seems to have occurred in the early nineties [7]. O'Hara notes that some KR theorists still make occasional claims that their KR theory has some psychological basis. However, when pressed, their public line is that representations are models/ surrogates only [8].
Popper argues that all knowledge is an hypothesis since nothing can ever be ultimately proved; our currently believed ideas are merely those that have survive active attempts to refute them [9]. Compton describes knowledge acquisition (KA) cycles where "test" is the dominate technique [10, 11]. Elsewhere we have argued that KE methodologies based on testing can out-perform alternative, more complicated, methodologies [12].
Our knowledge representation (KR) research is somewhat flawed if we uncritically enshrine our knowledge bases (KB). Clancey argues the knowledge structures found during knowledge acquisition (frames, rules, etc) are structures created on-the-fly in response to the specifics of the situation in which they were elicited (the example being studied, the experts used, etc); i.e. they have little/no isomorphism with structures present in an expert's information processing system. Clancey is silent on where these structures come from (but hints that the substrate may be neural). [13]
This is a Lecture | Week3 page, written Tue Jan 22 12:24:46 PST 2008.
This subject is about inhuman AI, all the tricks that computers can use to be smart that humans may or may not use.
Just to give the humans' a little more equality in this subject, today were going to talk about humans and AI. The field of cognitive science is devoted to discovering more about human intelligence using insights from a range of other areas including:
Brief notes on all these follow. Note that:
Human brain cells are very different to computer chips. In your brain, there is:
A never cell can have up to 1000 dendritic branches, making connections with tens of thousands of other cells. Each of the 1011 (one hundred billion) neurons has on average 7,000 connections to other neurons.
It has been estimated that the brain of a three-year-old child has about 1016 synapses (10 quadrillion). This number declines with age, stabilizing by adulthood. Estimates vary for an adult, ranging from 1015 to 5 x 1015 synapses (1 to 5 quadrillion).
Just to say the obvious- that's a BIG network.
Neuro-physiology is a very active field. The latest generation of MRI scanners allow for detailed real-time monitoring of human activity, while they are performing cognitive tasks.
This field shows great promise but, as yet, they are still working on locomotion and pain perception and vision and have yet to rise to the level of model-based reasoning.
The field of neural networks originally began as an experiment in exploiting massive repetition of a single simple structure, running in parallel, to achieve cognition. As the field evolved, it turned more into some curve fitting over non-linear functions (and the tools used to achieve that fit have become less and less likely to have a biological correlate).
For another example of AI research, initially inspired by a biological metaphor, see genetic algorithms.
Noam Chomsky is one the towering figures of the 20th century. He's a linguistic and a political commentator. Every few years he disappears to re-emerge with a new book the redefines everything. For example, a lot of computer science parsing theory comes from Chomsky's theory of language grammars.
In AI circles, Chomsky is most famous for his argument that we don't learn language. Rather, we are born with a general grammar and , as a child grows all up, all they are doing is filling some empty slots referring to the particulars of the local dialect.
This must be so, argues Chomsky otherwise language acquisition would be impossible.
The implications are staggering. Somewhere in the wet-ware of the brain there is something like the grammars we process in computer science. At its most optimistic, this also means that grammar-based languages (like LISP, etc) have what it takes to reproduce human cognition. We'll return to this below (when we talk about the "physical symbol system hypothesis").
But is there really a "language" of thought? Or is this just an interpretation of chemicals sloshing around the dendrites (under the hood) which we interpret as language.
Well, there is evidence of some model-based manipulation by our wet ware. In classic mental rotation experiments, it was shown that the time required to check if some object was rotation of another was linear proportional to the size of the rotation. It is as if some brain box is reaching out to a sculpture of the thing we are looking at, the turning it around at some fixed rate.
Anyway, if ask a philosopher, "is it really neurons, or are their symbolic
models in between our ears?", they might answer who cares?. Whatever stance works best
is the right one.
Daniel Dennett asks a simple questions. Try and beat a chess playing program. What are you going to do?
Which is the right stance? The answer is, it depends. What do you want to do? Stop being short circuited by a loose wire? You want the physical stance? Beat the program at chess? You want the intentional stance.
Bottom line: a computer is not just "a machine". It is a mix of things, some of which are best treated like any other intelligence.
Don't believe me? Well, pawn to king four and may the best stance win.
(By the way, for a good introduction to AI and philosohy, see The Mind's I.)
I think therefore I am. I don't think therefore...
There used to be a savage critic by certain philosophers along the lines that AI was impossible. For example, John Searle is a smart guy. His text Speech Acts: An Essay in the Philosophy of Language. 1969 is listed as one of the most cited works on the 20th century.
In one of the most famous critiques of early AI, Searle invented the Chinese Room: an ELIZA-like AI that used simple pattern look ups to react to user utterances. Searle argued that this was nonsense- that such a system could never be said to be "really" intelligent.
Looking back on it all, 27 years later, the whole debate seems wrong-headed. Of course ELIZA was the wrong model for intelligence- no internal model that is refined during interaction, no background knowledge, no inference, no set of beliefs/desires/goals, etc.
Searle's argument amounts to "not enough- do more". And we did. Searle's kind of rhetoric (that AI will never work) fails in the face of AI's many successes.
Here's some on-line resources on the topic:
There is some mathematical support for Searle's pessimism. In 1930, the philosophical world was shaken to its foundation in 1930 by
a mathematical paper that proved:
That is, formal systems have fundamental limits.
So Godel's theorem gives us an absolute limit to what can be achieved by "formal systems" (i.e. the kinds of things we can write with a LISP program).
Godel's theorem might be used to argue against the "logical school" of AI. If formal logics are so limited, then maybe we should ignore them and try other procedural / functional representations instead:
Godel's theorem is somewhat arcane. He showed that somethings were unknowable but he did not say what those things are.
Enter Steve Cook. In 1971, he showed that commonly studied problems (e.g. boolean satisfiability) belong to a class of systems for which the solution takes exponential time to compute.
An army of algorithms researchers have followed Cook's lead and now there are vast catalogues of commonly studied programs for which there is no known fast (less than exponential time) and complete (no heuristic search) solution.
O.K., so formal systems can never be omniscient, but how good do you have to be to "as smart as humans"?
The answer is, sometimes, not very smart at all. The cognitive psychology literature is full of examples where humans repeatedly reason in characteristic sub-optimal ways (see the wonderful Wikipedia page listing 35 decision-making biases, 28 biases in probability and belief, 20 social biases, and 7 memory errors).
In fact, one the early successes of AI was not replicated some human cognitive skills, but also human cognitive failings. In the 1970s, AI researchers adopted the physical symbol system hypothesis:
The AI work did not come in isolation. The physical symbol system hypothesis, for example, owed much to decades of psychological research. In particular, the cognitive psychology research that evolved as a reaction to behaviorism (from the early part of the 20th century). In its most extreme view, behaviorism denied all internal states and allowed for only the objective study of externally observable behavior (i.e. no mental life, no internal states; thought is covert speech).
Well, that flew for a few decades then it just ran out of steam. After decades of trying to map human behavior into (what seems now) trite stimulus response models, cognitive psychology made the obvious remark that the same input yields different outputs from different people because of their internal models. That is, intelligence is not just a reaction to the world. Rather, it is the careful construction and constant review of a set of internal states of belief which we use to decide how to best act next.
Maybe just because a representational system like LISP is limited does not mean that it is useless:
And anyway, if I am only trying to be as good as human intelligence, then sometimes I don't need to try too hard.
So please sleep easy tonight. And keep typing away at LISP.
This is a Lecture | Philosophy page, written Sat Dec 15 06:26:11 PST 2007.
John Searle is a smart guy.
His text Speech Acts: An Essay in the Philosophy of Language. 1969 is
listed as one of the most cited works on the 20th century.
In one of the most famous critiques of early AI, Searle invented the Chinese Room: an ELIZA-like AI that used simple pattern look ups to react to user utterances. Searle argued that this was nonsense- that such a system could never be said to be "really" intelligent.
Looking back on it all, 27 years later, the whole debate seems wrong-headed. Of course ELIZA was the wrong model for intelligence- no internal model that is refined during interaction, no background knowledge, no inference, no set of beliefs/desires/goals, etc.
Searle's argument amounts to "not enough- do more". And we did. Searle's kind of rhetoric (that AI will never work) fails in the face of AI's many successes.
Here's some on-line resources on the topic:
This is a Week1 | Lecture page, written Fri Jan 11 12:42:23 PST 2008.
AI and advanced AI techniques. Spring 2008. LCSEE, WVU