We know one mechanism that produced an intelligence (although a pretty messed-up one): evolution. Initially algorithmically very simple, evolution changes heritable traits at random & evaluates the results for reproductive fitness. But biological evolution is obscenely inefficient because intelligence is only one element of reproductive fitness, & selection is extremely coarse-grained: on the level of a whole genome rather than individual traits.
From the definition of intelligence as an ability to predict/plan by discovering & projecting patterns, it follows that a fitness function specific to intelligence is the predictive correspondence of the recorded input patterns.
Correspondence is a representational analog of reproduction, maximized by an internalized evolution:
the heritable traits for predictions are past inputs, variation is a change of their coordinates & resolution, selected by their fitness: cumulative match to the following inputs. This match is discovered by comparison: an iterative subtraction, which also adds syntactic variation by derivation.
Match (fitness) should be quantified on the lowest level of comparison: this makes the selection process more incremental & the subsequent search exponentially more efficient.
This lowest level is the comparison between two single-variable inputs, & the match is partial identity: the complement of the difference, or the smaller of the comparands. This is also a measure of analog compression: a sum of bitwise AND between uncompressed comparands (represented by strings of ones). This adds a whole new dimension to Bayesian logic: I quantify partial match or occurrence (a micro-dimension of prediction), just like Bayesian logic adds quantified partial probability (a macro-dimension of prediction) to classical logic.
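A minimal sketch of this lowest-level comparison, assuming the inputs are non-negative integers. The function names are illustrative and not part of the original text, and reading "complement of the difference" as the larger comparand minus the absolute difference is an assumption.

```python
from __future__ import annotations


def match(a: int, b: int) -> int:
    """Partial identity of two magnitudes: the smaller of the comparands."""
    return min(a, b)


def complement_of_difference(a: int, b: int) -> int:
    """Assumed reading: the larger comparand minus the absolute difference."""
    return max(a, b) - abs(a - b)


def unary(n: int, width: int) -> list[int]:
    """Uncompressed representation: n ones padded with zeros to `width`."""
    return [1] * n + [0] * (width - n)


def match_by_and(a: int, b: int) -> int:
    """Sum of bitwise AND over the two unary strings."""
    width = max(a, b)
    return sum(x & y for x, y in zip(unary(a, width), unary(b, width)))
```

Under these assumptions all three functions agree for any pair of non-negative integers, which is one way to see why the smaller comparand can serve as the measure of match.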
To speed up, the search algorithm must incorporate increasingly complex shortcuts to discover better predictions. Speed is what it’s all about; otherwise we could just sit back & let biological evolution do the job. More complex predictions (patterns) & pattern discovery algorithms (meta-patterns) are derived from the past inputs with incremental comparison range & derivation depth. This includes introspective cognition: reinput & comparison among the higher derivatives.
The most basic predictive shortcuts are based on the assumption that the environment is not random:
- past patterns are decreasingly predictive with the distance from subsequent inputs.
- each pattern is increasingly predictive with the accumulated match among constituent inputs, & decreasingly predictive with the difference between them.
A core algorithm based on these assumptions would be an iterative step that selectively increases range of search & resulting complexity of the patterns in proportion to their projected cumulative match:
The original inputs are single variables produced by senses, such as pixels in case of visual perception.
Their subsequent comparison by iterative subtraction generates new variables: length & aggregate value of partial match & of partial miss (derivatives) for each variable of the comparands. The inputs are integrated into patterns (higher-level inputs) if the additional projected match is greater than the system's average for the computational resources necessary for additional syntactic complexity. This compressive syntax expands with every new level of search: each variable of an input pattern conditionally forms its own pattern.
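A rough sketch of this step on a single row of pixels, under several assumptions that are not in the text: inputs are non-negative integers, only adjacent pixels are compared, and the "system's average" is a fixed hypothetical threshold `ave`. The `Pattern` class and `form_patterns` function are illustrative names.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Pattern:
    positive: bool   # True if each comparison in the span matched above `ave`
    length: int = 0  # number of comparisons in the span
    match: int = 0   # aggregate partial match (sum of smaller comparands)
    miss: int = 0    # aggregate partial miss (sum of signed differences)


def form_patterns(pixels: list[int], ave: int) -> list[Pattern]:
    """Compare adjacent pixels and group same-sign comparisons into patterns."""
    patterns: list[Pattern] = []
    for prev, cur in zip(pixels, pixels[1:]):
        m = min(prev, cur)   # partial match
        d = cur - prev       # partial miss (derivative)
        positive = m > ave   # assumed selection rule: match above average
        if not patterns or patterns[-1].positive != positive:
            patterns.append(Pattern(positive))
        p = patterns[-1]
        p.length += 1
        p.match += m
        p.miss += d
    return patterns
```

For example, `form_patterns([10, 12, 11, 2, 3, 9], ave=5)` yields two patterns: a positive one of length 2 (match 21, miss 1) and a negative one of length 3 (match 7, miss -2). Each of these variables could in turn be compared across patterns on the next level of search.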
On the other hand, if predictive value (projected match) falls below the system's average, the input pattern is aggregated with adjacent "subcritical" patterns by iterative addition, into a lower-resolution input. Aggregation results in a "fractional" projection range for constituent inputs, as opposed to "multiple" range for matching inputs. By increasing the magnitude of the input it also increases the average projected match: a subset of the magnitude. Aggregation generates averages to evaluate match of the future inputs, & to determine their coordinates & resolution. So, alternative integrated|aggregated representations of inputs are produced by iterative subtraction|addition (the neural analogs are inhibition & excitation). Again, the choice between the two is determined by prior accumulated comparison of the respective inputs.
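A minimal sketch of aggregation by addition, assuming a fixed hypothetical grouping `factor` rather than a selection driven by accumulated comparison. The `aggregate` function is an illustrative name, not part of the original text.

```python
from __future__ import annotations


def aggregate(inputs: list[int], factor: int) -> list[int]:
    """Sum each run of `factor` adjacent inputs into one lower-resolution input."""
    return [sum(inputs[i:i + factor]) for i in range(0, len(inputs), factor)]


raw = [2, 3, 9, 1]
coarse = aggregate(raw, 2)  # [5, 10]

# Match between adjacent inputs, taken as the smaller comparand.
raw_matches = [min(a, b) for a, b in zip(raw, raw[1:])]           # [2, 3, 1]
coarse_matches = [min(a, b) for a, b in zip(coarse, coarse[1:])]  # [5]
```

In this toy case the single match at the coarser resolution (5) is larger than any match among the raw inputs, which illustrates how a greater input magnitude raises the average projected match at the cost of resolution.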
Thus, cognition is a form of evolution where variation does not proceed by altering the inputs directly: a prediction can only be derived from experience. Rather, variation redefines the source & resolution of inputs, & generates derivatives (higher syntax / higher level inputs) by comparing them. Such variation is incremental rather than random: the resolution & syntactic complexity of inputs increase or decrease in proportion to their relative cumulative match, which is the selection criterion. I consider this to be a higher phase of metaevolution, a subject of my other knol. Cognition is driven by predictive fitness, where the patterns themselves are dispensable, compared to biological evolution driven by reproductive fitness, where the patterns (genomes) are an end in themselves.
The biggest hangup people usually have is that this kind of algorithm is obviously very simple, while working intelligence is obviously very complex. But, as I tried to explain, additional complexity is learnable and should only improve speed, rather than change the "direction" of cognitive evolution (although it may save some eons). The main criterion for such an algorithm is the ratio of benefit: predictive power, to cost: complexity.
I would summarize such an algorithm as Comparison-Projection, a more constructive analog to Jeff Hawkins' Memory-Prediction. Hope this makes sense. I have a far more advanced work-in-progress, but need feedback on these core premises first. If correct, they are already way ahead of any other approach to my knowledge.





Richard Bunker
How to filter out the improbable seems to me to be the key
In any case, it would be a delight to hear from you Vitya/Burya. rick at bunkerplanet dot com.
PS: I am boris"."k"@"verizon"
David Fendrich
AGI
I have written a few articles on self improving AI here: http://seedai.blogsp
In those, I agree with much of what is written here, for example "If we want to talk about improving programs, we have to define what it means to improve one's intelligence, and thus what it means to be intelligent. We want intelligent systems to be useful. Useful intelligence is, just as science, about prediction, planning and pattern recognition. These are all so intertwined as to be more or less the same thing."
You are very welcome to read and post your thoughts on my articles.
You're right, it sounds very similar on a high level, & I am sure there are many people who'd agree with the definition. But I don't know of anyone who used it to derive a universal, low-level, quantitative criterion to select inputs & algorithms. The key is to start from the beginning: raw sensory inputs, & "test" their predictive value, in the process discovering more & more complex patterns. That's what scalability is all about: if you can't evaluate pixels, it'll be super-exponentially more difficult to start from more complex data. That's why I think Cyc, NLP, & high-level approaches in general are hopeless for AGI.
I am sorry, but your "Intelligence test" idea, besides being entirely hypothetical & presumably externally administered, has it exactly backwards. Just like many Algorithmic Learning approaches, you want to generate patterns & algorithms, instead of discovering them in the real world. Quite simply, we predict from experience: these patterns & algorithms will have *no* predictive value beyond mere chance, unless they're derived from the environment. Notice that the difference between patterns & algorithms is strictly in the origin: the former are discovered & the latter are "invented".
Francesco Lentini
How about semantics?
Semantics (meaning) must be learned from experience, starting from sensory inputs. What I suggest is a conditionally iterative learning algorithm, & syntax here is simply a record of operations performed by this algorithm on a given set of inputs. Such record is necessary to maintain comparability(readab
Thanks for the pointer, I'll take a look.
Returning now to your general intelligence definition, the focal point is the criterion of improvement. Can you explain more precisely what this criterion should be, and/or can you furnish a practical example?
Appreciate your interest in my "focal point". The criterion for intelligence is *predictive correspondence concentration*, or relative cumulative match of expectations to the following inputs. I've defined match on the lowest, single-variable, level. It's the same on higher levels, where inputs are multi-variable sequences. As long as you synchronize the syntax of the comparand sequences, the total match is the sum of corresponding variables' matches between the sequences. I suppose you're looking for NL-level examples, & that's where it gets extremely ambiguous. That sort of data went through a huge number of process iterations, & you have to rely on intuition to track it.
Take a look at "On Intelligence" by Jeff Hawkins, he is a lot better at high-level examples than I am.
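The sum-of-matches rule in this reply can be sketched as follows, assuming each multi-variable input is a mapping from variable name to a non-negative integer, and that "synchronizing the syntax" means comparing only the variables present in both inputs. Both the representation and the `sequence_match` name are illustrative assumptions.

```python
from __future__ import annotations


def sequence_match(a: dict[str, int], b: dict[str, int]) -> int:
    """Total match: sum of per-variable matches over the shared variables."""
    shared = a.keys() & b.keys()  # synchronize the syntax of the comparands
    return sum(min(a[name], b[name]) for name in shared)


expected = {"length": 3, "match": 21, "x": 4}
following = {"length": 2, "match": 25}
total = sequence_match(expected, following)  # min(3, 2) + min(21, 25) = 23
```

Variables without a counterpart (here `x`) contribute nothing to the total in this sketch. How to treat them in general is left open.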