## Profile

Deen Abiola
5,028 followers|362,470 views

## Stream

### Deen Abiola

Shared publicly  -

We might as well start thinking about the proper treatment of agents that are less able than us (whatever we may be). This paper presents the case for subjective experience and the basis of consciousness as present in insects.

http://www.pnas.org/content/early/2016/04/13/1520084113.full.pdf

Generalized Ethics.

Stronger reason to view those remote controlled roaches as unethical. Argument: though they are not sophisticated enough to value their life (which is different from avoiding death), they have sophisticated enough mental representations to have simple plans and goals [+]. Thus robotic control implements impinge on their agency. Axiom: All the worst crimes on conscious entities are attacks on their agency. Combining all this, it is thus worse to control the roaches than to kill them.

[+] "Several insect species have been shown to be able to plot novel routes based on learned landmarks and goals, evidencing a spatial relation of landmark information (98, 99)."

"Motivational factors will influence the prioritization of targets and therefore action selection, but the location of targets will also influence which is selected, and which actions should be taken (8)."

"These findings show how the brain responses to environmental stimuli in insects are not driven simply by the primary sensory input, but rather by egocentric characteristics: whether the context of the stimulus needs to be updated or whether the stimulus is a point of immediate navigational reference (101, 102)."

"For humans at least, this spatial “model” is further enhanced by processing within the subcortical dorsal pulvinar (one of the thalamic nuclei, part of the basal ganglia) (32), which adds color, three-dimensionality, and an egocentric first-person perspective to the human conscious experience of space."
====

[+] "A compelling demonstration of the command function of the insect brain for the total behavioral system of the insect is the effect of focused injection of neurotransmitter agonists and antagonists to the region of the central complex (CX) of the insect brain."

[+] "Although there is some variation in the structure of the CX between insect orders, this structure features in all insects and crustaceans and is likely homologous to the arcuate body of arachnids (87, 135, 136), suggesting a form of the CX evolved before the radiation of insects, crustaceans, and spiders."

[+] Action selection in nematodes is driven by shifts in global brain dynamics (119), and there is sufficient plasticity in the nematode nervous system for their responses to vary with system state (120). However, there is no evidence that nematodes can actively hunt for things beyond their immediate sensory environment. Hungry nematodes respond to starvation with increased locomotion and dispersal in a random, rather than directed, search (121, 122). By contrast, hungry rodents, ants, and bees will navigate to places where they have previously encountered food. Their internal state of hunger triggers a highly directional and oriented food search focused on locations where food was previously experienced, even if no food stimuli are currently present (123–125).

I agree, see my reply to  where I sketch out that there are situations where curtailing agency in the short term brings about a better outcome in the long run.

In this case I was also careful to carve out the particular cases of indefinite chaining (which does happen) and caging as the real problem.
---

But I do not think your argument is really against the statement: "All the worst crimes on conscious entities are attacks on their agency."

Your examples are limiting in the short term but actually open up more opportunities in the long run by, for one, increasing the chance of surviving and remaining healthy. I don't think remote controlling a roach has any of that long-run positive balance.

Yeah, I'm saying that all the worst crimes are attacks on agency, not that all limits on agency are the worst crimes.

### Deen Abiola

On learning (simple) programs, using machine learning to make languages more lenient to errors and their error messages less useless, advantages (and limitations) of restricting to smoother differentiable programs (e.g. 'neural' Turing machines), some on AlphaGo, machine learning technical debt and limitations. Interpretability, reuse and composition of models: no good answer.
-------
Take away: still oh so far from AI.﻿

The Go thing really depended on which expert you were talking to. Often, most of them were not experts in machine learning but in some other vaguely related field like physics or regular programming.

Turing predicted that Go and chess and the like (back in 1948) would be easier than language processing. And he was right.

Myself, in 2013, I made a comment:
>  It is why computers can do all sorts of thinking things (finance, trading, prediction, calculations, chess - soon go and poker) much better than us but find kicking, throwing and vision so difficult.

And in June last year I sketched out the design of what might go into a Go playing bot (which isn't impressive: as Norvig points out, their approach is what anyone current would have guessed, but it's one thing to sketch something out and another thing altogether to actually figure it out in full!).

> But computers, they have the ability to learn from self play and an immense amount of perfect memory.
>
> It's only a matter of time till someone combines a clever iteration of many-layer neural nets with monte carlo tree search and Go goes on the pile of things that don't count as requiring intelligence anymore.

> Go will likely be next and after that Poker

### Deen Abiola

The blue whale heart is huge but not as large as has been commonly depicted.

More fascinating to me about whales is not their size but that their size increase does not come with a concomitant increase in cancer rates.

This is known as Peto's Paradox and "is the observation, due to Richard Peto, that at the species level, the incidence of cancer does not appear to correlate with the number of cells in an organism. [1] For example, the incidence of cancer in humans is much higher than the incidence of cancer in whales. [2] This is despite the fact that a whale has many more cells than a human. If the probability of carcinogenesis were constant across cells, one would expect whales to have a higher incidence of cancer than humans".
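To make the naive expectation in that last sentence concrete, here is a minimal sketch. It assumes every cell independently has the same tiny lifetime chance `p` of turning cancerous, so the chance that at least one does is 1 - (1 - p)^N. The per-cell probability, the cell counts and the function name are illustrative assumptions of mine, not measurements.

```python
import math


def naive_cancer_risk(n_cells: float, p_per_cell: float) -> float:
    """Chance that at least one of n_cells independent cells turns cancerous.

    Computes 1 - (1 - p)^N via log1p/expm1 so that tiny p values
    are not lost to floating-point rounding.
    """
    return -math.expm1(n_cells * math.log1p(-p_per_cell))


# Hypothetical, order-of-magnitude inputs chosen only for illustration.
P_PER_CELL = 1e-15
HUMAN_CELLS = 3e13
WHALE_CELLS = 1000 * HUMAN_CELLS  # assumed: roughly 1000x the cells of a human

human_risk = naive_cancer_risk(HUMAN_CELLS, P_PER_CELL)
whale_risk = naive_cancer_risk(WHALE_CELLS, P_PER_CELL)
print(f"human: {human_risk:.3f}, whale: {whale_risk:.3f}")
```

Under these assumptions the whale's risk saturates near certainty while the human's stays small. The paradox is that real whales show nothing of the sort.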

Puzzle: Amongst the warm blooded, how come this relationship between size and robustness doesn't hold as much for birds in general? (Birds--parrots especially--can be very small and yet have life-spans in the range of humans. Bats and naked mole-rats (NMRs) buck this trend in mammals too). What's going on there?


Whale anatomy is just bizarre.  No other animal can survive the changes in compression involved.  A blue whale can dive 500 meters and resurface without getting the bends.  That's a water pressure of over 5 megapascals, about 750 PSI.  Some whales are capable of diving 3000 meters for a total time of 138 minutes.  That's about 30 megapascals, roughly 4400 PSI.  The air in the whale's lungs is compressed to far less than a tenth of its original size - along with everything else.  Just amazes me....
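A quick sanity check on the depth-to-pressure arithmetic: a minimal sketch using P = P_surface + ρ·g·h for the pressure and Boyle's law for the gas volume. It assumes a constant seawater density of 1025 kg/m³ and an ideal gas at constant temperature, both simplifications; the constants and function names are my own.

```python
RHO_SEAWATER = 1025.0   # kg/m^3, typical seawater density (assumed constant with depth)
G = 9.81                # m/s^2, gravitational acceleration
P_SURFACE = 101_325.0   # Pa, one standard atmosphere
PA_PER_PSI = 6894.757   # pascals in one pound per square inch


def pressure_at_depth(depth_m: float) -> float:
    """Absolute pressure in pascals under depth_m metres of seawater."""
    return P_SURFACE + RHO_SEAWATER * G * depth_m


def gas_volume_fraction(depth_m: float) -> float:
    """Boyle's law: fraction of its surface volume a fixed amount of gas occupies."""
    return P_SURFACE / pressure_at_depth(depth_m)


for depth in (500, 3000):
    p = pressure_at_depth(depth)
    print(f"{depth} m: {p / 1e6:.1f} MPa, {p / PA_PER_PSI:.0f} psi, "
          f"gas volume x{gas_volume_fraction(depth):.4f}")
```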

### Deen Abiola

#The Current Best Hypothesis is that the Brain is Computable

There are many (most?) people who dispute the idea that the brain is computable; there is something different and special about the human brain, they say. It is not possible to refute this for now, but my own stance is a basic one: you may be right that the brain is somehow magical, but my position is simpler and, all things being equal, more likely to end up as the correct one.

The argument that the brain is not a machine broadly comes from three groups: those who lean on science and say something, something quantum or other (quantum gravity if you want to be really fancy); those who think it is something magical, such as possessing a soul; and a final group who simply argue that the brain is not computable.

##The Brain is not Computable

It is not uncommon to see the argument put forward that the brain is not computable, that what computers do is mere mechanistic cranking of mathematical algorithms. This is true, but who's to say the brain is not also doing this?

Occam's razor, Bayesian smoothing and regularization are all tools to keep one from over-fitting the evidence and failing to generalize. They are not laws, but tools to help you minimize your regret (make the fewest learning mistakes) over time. They do not say your idea must be simple, only that it should not say more than is justified by the data. The idea that the brain is computable fits within this regime as the hypothesis that is the simplest fit to the data. Why?

I often hear the point made that since people once compared the brain to clockwork and steam engines (comparisons we now know to be false), what makes you think an equivalence (and not just analogy) with computers won't show the same failing in time? Small aside: the comparison between steam engines and the brain, thanks to the link between thermodynamics and information, is actually more interesting than one might at first think.

###Universal Turing Machines

Turing Machines are, unlike a clock, universal. They can emulate any machine or procedure that is "effectively calculable". Our physical theories might use crutches such as real numbers or infinities but are, at the end of the day, only testable using computable procedures and numbers. This is what sets Turing Machines apart: any testable quantitative theory about the universe we can expect to devise will be simulatable (given enough time) on a Turing Machine. (Note: this is not the same thing as the Church-Turing Thesis. Instead of placing the restriction on the universe, as CT does, it places it on any testable theory that compresses data, that is, any theory that is more than a map from observation to expected outcome.)

Even for the case that some physical things like the brain cannot be computed, it is simpler to believe that whatever non-computability the brain exploits is not unique to the exact biochemical make up of brains.

##Machines cannot have Souls

Interestingly, Occam's Razor applies here too, and my argument is short. Even if Souls are a property of the universe unexplainable by science, it is still simpler to believe that the pattern and arrangement of matter that ends up with things acquiring souls is not unique to a gelatin soup of fats and proteins. Something that thinks and acts as if it is conscious, is conscious (in essence, I drop the extra requirement that the object must also be an organic, human-brain-like thing). That, in a nutshell, is also Turing's argument.

But what is fascinating is that computer science has made the idea of a soul a scientific and testable hypothesis. If we do build intelligences (and maybe some of them will be more intelligent than humans in every way measurable) and yet they never wake up or attain consciousness or anything resembling it (that is, nothing ever passes for consistently conscious but humans), then this is very suggestive of something unique and special about human beings. Until then, that hypothesis is unnecessarily complex.

##Quantum Mechanics

Quantum mechanics is the go to argument for people who want to appear scientific even while talking nonsense. However, it is possible that the brain does something that our current machines cannot.

It is overwhelmingly unlikely that the brain is a *Quantum Computer*. What we know about quantum mechanics makes this highly unlikely considering how wet, noisy and hot the brain is. It is implausible that coherent and entangled states could remain in such a situation. Additionally, humans do poorly at things we expect Quantum Computers will be good at (things such as factoring, perceiving quantum interactions intuitively—simulating quantum evolution). In fact, regular Turing Machines already outpace us in many areas; we don't focus as much on the fact that we're terrible at deductive reasoning, arithmetic or enumerating the possibilities of a large search space; for those things, it did not take long for computers to surpass human ability.

But, suppose the brain was not quantum mechanical but still leveraged quantum mechanical artifacts for its functioning—artifacts unavailable to our machines—then it is possible that current efforts will not lead to AGI.

In a certain trivial sense everything is quantum mechanical in that an agent adhering to predictions based on the theory will be able to explain the world with the highest accuracy. Of course, with such a broad definition then even the computer you are currently reading this on is a Quantum one. Not at all a helpful distinction.

Yet there is also a non-trivial sense in which quantum effects can be leveraged. We see this with our current processors; part of the difficulty with getting higher speeds and lower power is that (amongst other reasons) quantum tunneling effects are getting in the way. Biological homing mechanisms and photosynthesis have also been implicated in taking advantage of quantum effects.

Evolution is extremely powerful at coming up with unexpected uses for subtle phenomena. Consider the following, from a fascinating [article](http://cs.nyu.edu/courses/fall11/CSCI-GA.2965-001/geneticalgex):

>A program is a sequence of logic instructions that the computer applies to the 1s and 0s as they pass through its circuitry.  So the evolution that is driven by genetic algorithms happens only in the virtual world of a programming language. What would happen, Thompson asked, if it were possible to strip away the digital constraints and apply evolution directly to the hardware?  Would evolution be able to exploit all the electronic properties of silicon components in the same way that it has exploited the biochemical structures of the organic world?
>
>In order to ensure that his circuit came up with a unique result, Thompson deliberately left a clock out of the primordial soup of components from which the circuit evolved.  Of course, a clock could have evolved. The simplest would probably be a "ring oscillator": a circle of cells that change their output every time a signal passes through.
>
> But Thompson reckoned that a ring oscillator was unlikely to evolve because only 100 cells were available.  So how did evolution do it—and without a clock? When he looked at the final circuit, Thompson found the input signal routed through a complex assortment of feedback loops.  He believes that these probably create modified and time-delayed versions of the signal that interfere with the original signal in a way that enables the circuit to discriminate between the two tones. "But really, I don't have the faintest idea how it works," he says.  One thing is certain: the FPGA is working in an analogue manner.
>
>Up until the final version, the circuits were producing analogue waveforms, not the neat digital outputs of 0 volts and 5 volts.  Thompson says the feedback loops in the final circuit are unlikely to sustain the 0 and 1 logic levels of a digital circuit. "Evolution has been free to explore the full repertoire of behaviours available from the silicon resources," says Thompson.
>
>Although the configuration program specified tasks for all 100 cells, it transpired that only 32 were essential to the circuit's operation.  Thompson could bypass the other cells without affecting it. A further five cells appeared to serve no logical purpose at all—there was no route of connections by which they could influence the output.  And yet if he disconnected them, the circuit stopped working. It appears that evolution made use of some physical property of these cells—possibly a capacitive effect or electromagnetic inductance—to influence a signal passing nearby.  Somehow, it seized on this subtle effect and incorporated it into the solution.
>
>But how well would that design travel?  To test this, Thompson downloaded the fittest configuration program onto another 10 by 10 array on the FPGA. The resulting circuit was unreliable. Another challenge is to make the circuit work over a wide temperature range. On this score, the human digital scheme proves its worth.  Conventional microprocessors typically work between -20 °C and 80 °C. Thompson's evolved circuit only works over a 10 °C range, the temperature range in the laboratory during the experiment.  This is probably because the temperature changes the capacitance, resistance or some other property of the circuit's components.

Although this is the result of a genetic algorithm, it shares something with its natural counterpart: the exploitation of subtle effects and specificity to the environment it was evolved within. The article shows us two things: that evolution is not bounded by man's windowed creativity, and that, even if our current designs do not leverage some subtle effect while brains do, there's no reason why we could not build a process that searches over hardware to leverage similarly powerful processes. The search could be more guided: instead of random mutations, we could have something that learns via reinforcement what actions to take for a given state of components and connections (and another that suggests components, to inject freshness). We would then select the best performing programs from the pool as the basis of the next round and appropriately reward the proposal generators.
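For readers who have not seen one, the select-the-best-from-the-pool loop can be sketched in a few lines. This is only the plain random-mutation baseline on a toy problem (maximize the number of 1 bits in a bit string), not Thompson's FPGA setup and not the reinforcement-guided variant suggested above; the fitness function, parameters and names are all stand-ins of mine.

```python
import random


def fitness(genome):
    """Toy stand-in for 'how well the circuit performs': number of 1 bits."""
    return sum(genome)


def mutate(genome, rate, rng):
    """Return a copy of genome with each bit flipped with probability rate."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]


def evolve(pop_size=30, genome_len=40, generations=60, keep=10, rate=0.02, seed=0):
    """Keep the fittest `keep` genomes each round and refill with mutated copies."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        survivors = population[:keep]  # carried over unchanged (elitism)
        children = [mutate(rng.choice(survivors), rate, rng)
                    for _ in range(pop_size - keep)]
        population = survivors + children
    population.sort(key=fitness, reverse=True)
    best_per_generation.append(fitness(population[0]))
    return population[0], best_per_generation


best, history = evolve()
print("best fitness, first vs last generation:", history[0], history[-1])
```

Because the survivors are carried over unchanged, the best score can never get worse from one round to the next. A guided search would replace `mutate` with a learned proposal step and keep the rest of the loop.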

Returning to the quantum: what if there were something subtle about ion channels or neuron vesicles that allowed more powerful computation than one might expect? Perhaps something akin to a very noisy quantum annealing process is available to all animal brains' optimization and problem solving processes? The advantage need not even be quantum; it might be that subtle electromagnetic effects or whatever are leveraged in a way that allows more efficient computation per unit time. This argument is one I've never seen made, yet it still consists of much extra speculation. Plausible though it is, I will only shift the weight of my hypotheses in that direction if we hit some insurmountable wall in our attempts to build thinking machines. For now, after seeing how very inherently mathematical the operations we perform with [our language are](http://www.iro.umontreal.ca/~memisevr/dlss2015/talk_Montreal_part2_pdf.pdf#page=28) (some may object that this is cherry picking, but that is irrelevant: the point is that the fact this is possible at all is highly suggestive and strongly favors moving away from skepticism), it is premature to hold such (and other) needlessly complex hypotheses on the uniqueness of the human brain.

##Conclude

I have not argued against the soul, nor against the idea that the brain is incomputable or somehow special; instead I've argued that such hypotheses are unnecessary given what we know today. And even indirectly, when we look at history, we see one where assumptions of specialness have tended not to hold. The Earth is not the center of the universe, the speed of light is finite, simultaneity is undefined, what can be formally proven in any given theory is limited, a universal optimal learner is impossible, most things are computationally intractable, entropy is impossible to escape, most things are incomputable, most things are unlearnable (and not interesting), there is only a finite amount of information that can be stored within a particular volume (which is dependent on surface area and not volume), the universe is expanding, baryonic matter makes up only a fraction of the universe, Earth-like planets are common, some animals are capable of some mental feats that humans are not, the universe is fundamentally limited to being knowable by probabilistic means (this is not the same thing as saying the universe is non-deterministic)!

While one cannot directly draw any conclusions on the brain from these, when constructing our prior (beliefs) it perhaps behooves us to take these as evidence suggesting a weighting away from hypotheses reliant on exception and special clauses.

http://imgs.xkcd.com/comics/turing_test.png﻿

Bleh, the sad fucker isn't worth the time.﻿

### Deen Abiola

This is a really pretty visualization of how decision trees work. It's less about machine learning proper, which is actually a strength since it can be that much more concrete.

My only super tiny quibble is with overview point 1). I'd say instead that drawing boundaries applies to discriminative learners only and not to more probabilistic methods (also, not all learners are statistical, but apparently there's a duality between sampling and search which muddies neat divisions).

I'd also further characterize over-fitting as memorizing the data: the model's complexity (number of parameters) is unjustified given the data, or outmatches the data available. It stems from a lack of smoothing, which is when you don't filter out noise but instead just explain every little detail using some really impressive bat deduction [1]. Humans do this when concocting conspiracies or reasoning based on stereotypes, initial impressions and anecdotes.

[1] http://tvtropes.org/pmwiki/pmwiki.php/Main/BatDeduction
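The "memorizing the data" picture of over-fitting can be shown with a small sketch: a model that simply stores every training point (a 1-nearest-neighbour lookup) against a two-parameter straight line, both fitted to noisy samples of a line. The data-generating rule `y = 2x + 1`, the noise level and all the names here are assumptions made up for the example.

```python
import random


def make_data(n, rng, noise=1.0):
    """Noisy samples of the assumed true relationship y = 2x + 1."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, noise)) for x in xs]


def fit_line(data):
    """Ordinary least squares for y = slope * x + intercept (the smooth model)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def memorize(data):
    """1-nearest-neighbour: answer with the y of the closest stored x (the memorizer)."""
    return lambda x: min(data, key=lambda point: abs(point[0] - x))[1]


def mse(model, data):
    """Mean squared error of model over data."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
train, test = make_data(200, rng), make_data(1000, rng)
line, memo = fit_line(train), memorize(train)

print(f"line:      train {mse(line, train):.2f}, test {mse(line, test):.2f}")
print(f"memorizer: train {mse(memo, train):.2f}, test {mse(memo, test):.2f}")
```

The memorizer explains every training point perfectly, noise included, and pays for it on fresh data; the line ignores the noise and generalizes better.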

A Visual Introduction to Machine Learning

If you haven't seen this yet, it's pretty awesome!﻿
What is machine learning? See how it works with our animated data visualization.

### Deen Abiola

I think there's a third option: wake* the road network up. Children playing on the road (or anyone crossing) should have earpieces hooked up to a giant distributed computation, with cars running simulations of the next 5 seconds and planning accordingly. It's not far-out tech: anyone thinking of crossing the street would have access to some whispering Bayesian network that looks at the conditions of all the cars within some radius and suggests an optimal time to cross, if at all. This would all but eliminate the already rare trolley problem. This system would also be able to learn if we put black boxes in cars to gather lots of data on the more plausible accident scenarios. Assuming car ownership is even still a thing and the car hasn't been hacked, a car might decide not to even turn on (thanks to whatever models were learned from in-car black boxes) after deeming the probability of loss of control too high.

*I use wake in the loose sense of the emergence of interesting long range coherent oscillations. And something like a kindly giant that moves people out of the way so they're less likely to get stepped on.

I think a big part of the solution will be to stop thinking of cars as individual units and instead start realizing that traffic consisting of self driving cars will be a single extended thing unto itself.

Ever since seeing this article a few days ago, it's been bugging me. We know that self-driving cars will have to solve real-life "trolley problems:" those favorite hypotheticals of Philosophy 101 classes wherein you have to make a choice between saving, say, one person's life or five, or saving five people's lives by pushing another person off a bridge, or things like that. And ethicists (and even more so, the media) have spent a lot of time talking about how impossible it will be to ever trust computers with such decisions, and why, therefore, autonomous machines are frightening.

What bugs me about this is that we make these kinds of decisions all the time. There are plenty of concrete, real-world cases that actually happen: do you swerve into a tree rather than hit a pedestrian? (That's greatly increasing the risk to your life -- and your passengers' -- to save another person)

I think that part of the reason that we're so nervous about computerizing these ethical decisions is not so much that they're hard, as that doing this would require us to be very explicit about how we want these decisions made -- and people tend to talk around that very explicit decision, because when they do, it tends to reveal that their actual preferences aren't the same as the ones they want their neighbors to think they have.

For example: I suspect that most people, if driving alone in a vehicle, will go to fairly significant lengths to avoid hitting a pedestrian, including putting themselves at risk by hitting a tree or running into a ditch. I suspect that if the pedestrian is pushing a stroller with a baby, they'll feel even more strongly this way. But as soon as you have passengers in the car, things change: what if it's your spouse? Your children? What if you don't particularly like your spouse?

Oddly, if you think about how we would feel about such decisions being made by a human taxi driver, people's reactions seem different, even though there's the same loss of autonomy, and now instead of a rule you can understand, you're subject to the driver's secret decisions.

I suspect that the truth is this:

Most people would go to more lengths than they expect to save a life that they in some way cared about.

Most people would go to more lengths than they are willing to admit to save their own life: their actual balance, in the clinch, between protecting themselves and protecting others isn't the one they say it is. And most people secretly suspect that this is true, which is why the notion of the car "being programmed to kill you" in order to save other people's lives -- taking away that last chance to change your mind -- is frightening.

Most people's calculus about the lives in question is actually fairly complex, and may vary from day to day. But people's immediate conscious thoughts -- who they're happy with, who they're mad at -- may not accurately reflect what they would end up doing.

And so what's frightening about this isn't that the decision would be made by a third party, but that even if we ourselves individually made the decision, setting the knobs and dials of our car's Ethics-O-Meter every morning, we would be forcing ourselves to explicitly state what we really wanted to happen, and commit ourselves, staking our own lives and those of others on it. The opportunity to have a private calculus of life and death would go away.

As a side note, for cars this is less actually relevant, because there are actually very few cases in which you would have to choose between hitting a pedestrian and crashing into a tree which didn't come from driver inattention or other unsafe driving behaviors leading to loss of vehicle control -- precisely the sorts of things which self-driving cars don't have. So these mortal cases would be vanishingly rarer than they are in our daily lives, which is precisely where the advantage of self-driving cars comes from.

For robotic weapons such as armed drones, of course, these questions happen all the time. But in that case, we have a simple ethical answer as well: if you program a drone to kill everyone matching a certain pattern in a certain area, and it does so, then the moral fault lies with the person who launched it; the device may be more complex (and trigger our subconscious identification of it as being a "sort-of animate entity," as our minds tend to do), but ultimately it's no more a moral or ethical decision agent than a spear that we've thrown at someone, once it's left our hand and is on its mortal flight.

With the cars, the choice of the programming of ethics is the point at which these decisions are made. This programming may be erroneous, or it may fail in circumstances beyond those which were originally foreseen (and what planning for life and death doesn't?), but ultimately, ethical programming is just like any other kind of programming: you tell it you want X, and it will deliver X for you. If X was not what you really wanted, that's because you were dishonest with the computer.

The real challenge is this: if we agree on a standard ethical programming for cars, we have to agree and deal with the fact that we don't all want the same thing. If we each program our own car's ethical bounds, then we each have that individual responsibility. And in either case, these cars give us the practical requirement to be completely explicit and precise about what we do, and don't, want to happen when faced with a real-life trolley problem.﻿
The computer brains inside autonomous vehicles will be fast enough to make life-or-death decisions. But should they? A bioethicist weighs in on a thorny problem of the dawning robot age.

...The whole point is to move the domain of discourse out of these false dilemmas and start thinking about what sort of new paradigms could evolve. An area with lots of children will result in the network adjusting itself such that the manner of driving makes trolley problems a close to nil occurrence.

It's simple. Self driving cars will make these sorts of issues less common (they will also make them more visible). Communicating self driving cars will further reduce these issues. Communicating self driving cars that include humans in their planning and actions over a wide radius, as well as models learnt from the past--all leading to something that functions holistically--are best of all.

(Oh and self-aware cars. As in a car that is reluctant to drive based on its level of injury.)﻿

### Deen Abiola

> Although one might have thought that the relationship between language and math would depend strongly on the domain of mathematics under consideration, we found no support for this hypothesis. Except for a small additional activation in posterior inferotemporal and posterior parietal cortex for geometry statements, all problems in algebra, analysis, topology, and geometry induced correlated and overlapping activations that systematically spared language areas.

> Using elementary algebraic and arithmetic stimuli, previous fMRI and neuropsychological research in nonmathematicians also revealed a dissociation between mathematical and syntactic knowledge (19, 22, 26, 45). Together, those results are inconsistent with the hypothesis that language syntax plays a specific role in the algebraic abilities of expert adults.

> Importantly, however, they do not exclude a transient role for these areas in the acquisition of mathematical concepts in children (10). Imaging studies of the learning process would be needed to resolve this point

> Our main goal was to explore the relationships between high-level mathematics, language, and core number networks.

> In mathematicians, we found essentially no overlap of the math-responsive network with the areas activated by sentence comprehension and general semantic knowledge. We observed, however, a strong overlap and within-subject similarity of the math-responsive network with parietal and inferior temporal areas activated during arithmetic calculation and number recognition (SI Appendix, Table S7).

> In particular, bilateral ventral inferior temporal areas corresponding to the visual number form area (18, 37) were activated by high-level mathematics as well as by the mere sight of numbers and mathematical formulas. The latter activations were enhanced in mathematicians.

> Corre- spondingly, a reduced activation to faces was seen in the right fusiform gyrus. Those results are analogous to previous findings on literacy, showing that the acquisition of expertise in reading shifts the responses of left ventral visual cortex toward letters and away from faces (38 – 40)

> Our results should not be taken to imply that the IPS, IT, and PFC areas that activated during mathematical reflection are spe- cific to mathematics. In fact, they coincide with regions previously associated with a “ multiple-demand ” system (29) active in many effortful problem-solving tasks (30) and dissociable from language- related areas (46).

> Some have suggested that these regions form a “general problem solving” or “general purpose network” active in all effortful cognitive tasks (47). Several arguments, however, question the idea that this network is fully domain-general.

> First, we found no activation of this network during equally difficult reasoning with nonmathematical semantic knowledge. In fact, the easiest mathematical problems caused more activation than the most difficult nonmathematical problems (Fig. 5), and even meaningless mathematical problems caused more activation than meaningful general-knowledge problems (Fig. 4).

> Second, other studies have found a dissociation between tightly matched conditions of linguistic versus logical or arithmetical problem solving (19, 48). Overall the existing literature suggests that the network we identified engages in a variety of flexible, abstract, and novel reasoning processes that lie at the core of mathematical thinking, while contributing little to other forms of reasoning or problem solving based on stored linguistic or semantic knowledge

http://www.pnas.org/content/early/2016/04/06/1603205113.full.pdf

Marie Amalric and Stanislas Dehaene
(Medical Xpress)—A pair of researchers with Université Paris-Sud and Université Paris-Saclay has found via fMRI human brain studies that the neural networks used to process mathematics are different from those that are used to process language. In their paper published in Proceedings of the National Academy of Sciences, Marie Amalric and Stanislas Dehaene describe experiments they conducted with volunteers willing to undergo fMRI scans while enga...

To do: Look into faces and right fusiform gyrus. Lower Priority: intersect [ IPS, IT, and PFC ]

### Deen Abiola

Shared publicly  -

#Analogies as (Endo)Functors, an Experiment | http://metarecursive.com/lab/Analogy_as_Functors.htm

What does intelligence mean? I've for some time been thinking about a framework within which one could treat questions on intelligence—whether of human or AI or corporation or evolution—in the spirit of Leibniz. Instead of rambling discussions where everyone brings lots of extravagant baggage, I wish to some day be able to think through (even if not calculate) answers under the framework and ask, do you have something better? (This is not rhetorical, if you do then I can stop wasting my time right away).

The framework starts from intuition, guided by experience of what framing is most fruitful when the task is principled generation of hypotheses. The framework is also grounded mathematically: I achieve this by starting from computational principles and working backwards to what basic math adequately expresses the concepts of interest. Although the framework is broad, being based on computing theory and the link between information theory and thermodynamics, I'll zoom in and focus this article on analogy because I was able to perform an experiment around it.

##Analogical Reasoning

One of the most important aspects of intelligence is analogical reasoning. Analogical reasoning is when concepts from one space are mapped to concepts in another space.  For example, I might ask: what is the biological construct most similar to this physics concept? Or, I might say: what do I know that is like this new thing I am seeing?

By noticing how one thing is similar to another thing, I accomplish two things. First, I pave the way towards being able to code both concepts in terms of something more general and thus achieve a better compression of both. Second, by leveraging knowledge of the known structure, I can more quickly learn, probe and design experiments to better understand the new structure. The most creative individuals are really good at exploiting connections between things. Analogies do have the downside that, if the structure is not actually shared across the full space, you end up with a worse understanding of the new space than if you had not tried to accelerate by leveraging analogy.

###Analogies as Functors

Thinking about analogy, the most obvious representation is that it's a structure preserving map from one space to another (although in reality, either in the brain or practically in an AI, I doubt it is ever so direct). In category theoretic language, an analogy can be thought of as a Functor from one category to another. If you know A and wish to learn about B, then you must first learn F, a functor from $A \to B$  and then using knowledge about A, and your guesses about the structure of F, you design experiments with F(A) to further learn about F and B (and how B diverges from A).

From this inspiration, I considered a simpler scenario for a computable experiment. Instead of functors and categories, which I have only a rudimentary understanding of, I use vector spaces and linear transformations. Suppose you have a vector space model *A* of one topic and you learn a model *B* for another topic; what you then want is some *X* such that *AX=B*. In this model scenario, the linear transformation *X* is the analogical mapping/functor and can be solved for easily enough using QR factorization. The problem now is how to construct said spaces and then how to test whether *X* really is an analogical mapping.
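As a rough illustration of the *AX=B* step, here is a minimal sketch using NumPy. The matrices are synthetic stand-ins (the names `A`, `B`, `X_true` and `solve_analogy_map` are hypothetical, and the sizes are arbitrary); it only shows that a reduced QR factorization recovers the map when there are at least as many sample rows as dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: each row is one sample vector from a topic space.
n_samples, dim = 300, 50
A = rng.standard_normal((n_samples, dim))   # source space
X_true = rng.standard_normal((dim, dim))    # the map we pretend not to know
B = A @ X_true                              # target space

def solve_analogy_map(A, B):
    """Least-squares solution of A X = B via a reduced QR factorization."""
    Q, R = np.linalg.qr(A)                  # A = Q R, with R square (dim x dim)
    return np.linalg.solve(R, Q.T @ B)      # solve R X = Q^T B

X = solve_analogy_map(A, B)
print(np.allclose(X, X_true))
```

With real topic vectors there is no exact `X_true`; the same call would return the least-squares best fit instead.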

This too is easily done: there are a multitude of vector space models, including Latent semantic indexing, random indexing, GloVe, Holographic Reduced Representations and word2vec. For my own purposes, across the requirements of speed, memory use, data requirements, online operation (learning cleanly on the go) and accuracy, random indexing dominates all other choices. The curious thing about random projections is that they are "anti-fragile": they actually turn the curse of dimensionality into a boon. There are even some neuroscience models that hypothesize the brain makes extensive use of random projections. In our case, because random vectors in high dimensions are almost orthogonal to each other, even small nudges are enough to get useful results. In the case of small data this is useful.
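The near-orthogonality claim is easy to check numerically. The sketch below is illustrative only: the sparsity (about a tenth of the entries nonzero) and the helper names are assumptions, not the settings used in the experiment. It compares the average absolute cosine between pairs of random ternary vectors in a low and a high dimension.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_index_vector(dim, nonzero, rng):
    """Sparse ternary vector: `nonzero` entries set to +1 or -1, the rest 0."""
    v = np.zeros(dim)
    idx = rng.choice(dim, size=nonzero, replace=False)
    v[idx] = rng.choice([-1.0, 1.0], size=nonzero)
    return v

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mean_abs_cosine(dim, pairs=200):
    """Average |cosine| over random pairs; nearer 0 means nearer orthogonal."""
    nonzero = max(2, dim // 10)   # assumed sparsity, chosen for illustration
    sims = [abs(cosine(random_index_vector(dim, nonzero, rng),
                       random_index_vector(dim, nonzero, rng)))
            for _ in range(pairs)]
    return float(np.mean(sims))

low, high = mean_abs_cosine(10), mean_abs_cosine(1000)
print(low, high)   # the 1000-dimensional average is much closer to 0
```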

###Digression: When word2vec analogies are not useful

You might have heard of word2vec, a shallow neural network based method that approximately "reduces/factorizes" pointwise mutual information on words, using co-occurrence stats. Its ability to handle analogies using vector math seems impressive at first but, upon closer inspection, turns out to be a narrow gimmick with limited applicability.

The word2vec analogies fail quite often beyond well represented examples, e.g. king:man :: woman:... vs baseball : sabermetrics::basketball:.... They work best for obvious, i.e. one-step, analogies on single words, which in general is not useful. Those analogies are too simple and limited to serve as a demonstration of analogies as mappings across (sub)spaces.
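For readers who have not seen it, the single-word analogy trick amounts to the vector arithmetic sketched below. The three-dimensional vectors are made up by hand for illustration; they are not word2vec output, and `analogy` is a hypothetical helper.

```python
import numpy as np

# Made-up toy vectors (axes roughly: royalty, maleness, femaleness).
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.0, 0.2, 0.2]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, vectors):
    """Answer a : b :: c : ? with the nearest neighbour of b - a + c."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "king", "woman", vectors))   # prints "queen"
```

This is a single offset between single words, which is why it says little about mapping one whole (sub)space onto another.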

(Aside: All vector space models based on context cannot learn to distinguish antonyms from synonyms.)

###Digression 2: When Big Data is worse than Small Data

Big data is not always best. For example, when I wish to narrowly focus on two topics, I do not care for vectors that have been trained on the entirety of Wikipedia. They will fail to surface the connections I am most interested in. Consider the case where I want to find analogies between learning and evolution: even when I simplify the problem to cute SAT style analogies, the results I get are completely useless.

##Space

The analogies I build are based on mapping from one space to another space. The source of the vectors can be anything (including word2vec; my criticism is of the analogy demos, not the quality of the generated "concept vectors") but I use random indexing vectors since, as previously stated, they are Pareto-optimal for my requirements (e.g. speed). For this demonstration, I proceed by extracting and building vector space models of the Wikipedia pages for evolution and reinforcement learning separately. The vectors have dimensionality 250. After this, I 'solve' for the analogy F from reinforcement learning to evolution, which is represented by a 250x250 matrix.

Because the vector dimensionality is 250, I need at least 250 sample vectors so that the equation is not underdetermined. Then, if I want to look at concepts in evolution in terms of reinforcement learning or vice versa, I simply multiply the vectors by the appropriate solved-for matrix. To my surprise, it worked. (Why was I surprised? Because I generated a matrix randomly filled with 1s, 0s and -1s, performed a slipshod approximation of sparse matrix multiplication against co-occurrence frequencies, ran QR factorization on that, multiplied the solved-for matrix with my random matrix, and it worked out interesting analogies! As I will emphasize again and again, "Human unique" Intelligence is over-estimated.) You can scroll to the bottom to see some examples or check the full results [here](http://metarecursive.com/lab/snippets/analogy0-ev-to-RL.html) and [here](http://metarecursive.com/lab/snippets/analogy0-RL-to-ev.html); it's almost all interesting.
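A minimal sketch of this kind of pipeline follows, under loudly stated assumptions: the two "texts" are one-line toy sentences rather than Wikipedia pages, the window size and sparsity are arbitrary, and rows of the two spaces are paired through words that occur in both texts (the post does not say how rows were aligned). With far fewer than 250 rows, `numpy.linalg.lstsq` returns a minimum-norm solution rather than a well-determined map.

```python
import numpy as np
from collections import defaultdict

def index_vectors(vocab, dim, rng):
    """One random ternary (+1, 0, -1) index vector per word."""
    return {w: rng.choice([-1.0, 0.0, 1.0], size=dim, p=[0.05, 0.9, 0.05])
            for w in vocab}

def context_vectors(tokens, dim=250, window=2, seed=0):
    """Random indexing: sum the index vectors of each word's neighbours."""
    rng = np.random.default_rng(seed)
    index = index_vectors(sorted(set(tokens)), dim, rng)
    context = defaultdict(lambda: np.zeros(dim))
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                context[word] += index[tokens[j]]
    return dict(context)

# Toy stand-ins for the two pages.
text_a = "the agent learns a policy that maximizes reward over states".split()
text_b = "the genome encodes a phenotype that maximizes fitness over generations".split()
space_a = context_vectors(text_a, seed=1)
space_b = context_vectors(text_b, seed=2)

# Assumption: pair up rows through words present in both texts.
shared = sorted(set(space_a) & set(space_b))
A = np.stack([space_a[w] for w in shared])
B = np.stack([space_b[w] for w in shared])
F, *_ = np.linalg.lstsq(A, B, rcond=None)   # 250x250 map from space A to space B

print(F.shape, shared)
```

Mapping a concept is then a single product, for example `space_a["policy"] @ F`, followed by a nearest-neighbour search among the vectors of the other space.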

There are a couple of impressive examples I'd like to highlight. Selection (from natural selection) is mapped to search, iteration, improvement, gradient and evaluation. Those are all extremely on point! Survival is mapped to states, actions, regret and moves. One of those is a non-trivial insight, as it is a link that was only recently suggested (more in part 2). It also maps survival to non-episodic; this is correct and a link I'd never made before. Genes too are well mapped, with history a low scoring (appropriately so) match.

In the other direction, from reinforcement learning to evolution, the map also has some standout samples. Policy in particular is another observation I'd never considered before. Policies are what an RL agent learns in order to behave optimally at each particular state. What's the most appropriate match? The genome and the phenotype of course; alleles, survival and gene are also all linked and pretty good matches. Reinforcement is matched with breeding, existence and elimination; learning to adapted and surviving; exploration to species, individuals, population and mutation.

Function too, is a curious one and highlights what I mean by advantages of small data. F(function) is judged as very close to hypothesis, idea and observations. In reinforcement learning function is in the sense of value function or action value function, and indeed value functions are much closer to hypothesis and observations than to purpose, graph or _chart_—which is what something based on more data would have reckoned.

These are all rather impressive and together, especially when one considers cosine similarity values and vector dimensionality, virtually impossible to have all occurred by chance.
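To put a rough number on "by chance": for unrelated random directions in 250 dimensions, the cosine similarity has a standard deviation of 1/sqrt(250), about 0.063. The sketch below checks this with isotropic Gaussian vectors, which is a simplifying assumption and not a model of the actual topic vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, trials = 250, 10000

u = rng.standard_normal((trials, dim))
v = rng.standard_normal((trials, dim))
cos = np.sum(u * v, axis=1) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))

print(cos.std(), 1 / np.sqrt(dim))   # both close to 0.063
print(np.abs(cos).max())             # largest chance similarity seen in these trials
```

Under that assumption, top scores of 0.8 to 0.9 like those in the tables sit more than ten standard deviations from zero.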

##Conclusion

So what am I to make of this? Haphazardly multiplying co-occurrence counts from a couple of wiki pages with a random matrix is enough to make associations that would give many humans trouble. Why? I hypothesize two reasons working in conjunction. The first is that text is not random; it is structured for the purpose of passing along information. I wonder if a concept such as Potential Intelligence, inspired by potential energy, is appropriate. The other, I think, is that this is another instance of Moravec's paradox. If you list the things that you intuitively would think require the most computational power and intelligence, you'd probably have written that list upside down with respect to the correct order. If you want to understand intelligence (in animals, in humans and in AI), you would do well to take a look at Moravec's Paradox. After you've done that, tell me if you think the conventional presentation of the "Clever Hans Effect" makes even a tiny bit of sense.

###What's Next

In this post I explicitly computed an "analogy" as a map between two spaces. But although, to my surprise, the method worked, I doubt that such a map is ever explicitly made in the brain. In Project Int.Aug, this method is not used: it's too rigid, and it works on only two concepts when I'm usually considering multiple concepts at a time. Secondly, although the map works alright, especially for well represented words, it outputs nonsense in many instances. In a future post I'll show the practical method in use, including various ways of constructing subspaces and also composition, that go beyond single words and two concepts at a time while also possessing increased accuracy.

In my next post I talk about how AI and intelligence amplification differ in their treatment of A, F and B. I'll go over my choice of these two topics (and why not others? Well, one aspect is that a recurring theme in my writing is evolution as learning, and so I'm usually trying to amplify my understanding of related topics by e.g., mapping between reinforcement learning <=> regret <=> evolution).

Meta: I hope that if before, you had not a clear idea of reinforcement learning but had knowledge of evolution (which is more widely known), the analogies (not mine!) have put some of the concepts in context.

##Examples

In the below, the words in all caps are the source (evolution) and the table following are the target (RL) nearest neighbors—remember, the word vectors have been learned completely separately and are being linked using the matrix F.

SELECTION

Name           | Sim
-------------- | ---------
search         | 0.9078385
iteration      | 0.5983979
improvement    | 0.5010358
evaluation     | 0.434157

EVOLUTION

Name        | Sim
----------- | ---------
computing   | 0.8000988
maintaining | 0.4766024
finding     | 0.4171107

SURVIVAL

Name         | Sim
------------ | ---------
states       | 0.7352338
actions      | 0.4152263
non-episodic | 0.4131915
samples      | 0.4096077
*regret       | 0.4069343*
moves        | 0.3897748

KNOWN

Name         | Sim
------------ | ---------
improve      | 0.8170525
demonstrate  | 0.6549885
change       | 0.6319368
observe      | 0.6311502
compute      | 0.6282258
unify        | 0.619693

GENES

Name        | Sim
----------- | ---------
evaluation  | 0.8402829
improvement | 0.6233622
space       | 0.4338689
iteration   | 0.4281315
search      | 0.407939
history     | 0.3024744

ENVIRONMENT

Name                | Sim
------------------- | ---------
computation         | 0.8335705
domain              | 0.6284672
expectation         | 0.6255608
class               | 0.6169248
set                 | 0.6092064
description         | 0.6053985
--------

The examples in the other direction, from RL to evolution:

_POLICY_

Name           | Sim
-------------- | ---------
principle      | 0.8019225
environment    | 0.5997837
phenotype      | 0.5960799
Survival       | 0.5955663
genome         | 0.5697417
gene           | 0.5480887
alleles        | 0.5475575

_METHODS_

Name         | Sim
------------ | ---------
offspring    | 0.8382673
genes        | 0.4183966
mutations    | 0.4166025
produce      | 0.3901573
individuals  | 0.3897389

BASED

Name            | Sim
--------------- | ---------
According       | 0.880974
Due             | 0.7900774
due             | 0.7368965
appears         | 0.7197356

REINFORCEMENT

Name          | Sim
------------- | ---------
breeding      | 0.8270694
policies      | 0.5013691
existence     | 0.4515101
elimination   | 0.4443123
mates         | 0.4295801

LEARNING

Name            | Sim
--------------- | ---------
applied         | 0.480039
surviving       | 0.4398851

EXPLORATION

Name          | Sim
------------- | ---------
species       | 0.4411014
vestigial     | 0.4271055
sexually      | 0.4088307
region        | 0.3941363
genes         | 0.3930252
individuals   | 0.3822998
population    | 0.375715
mutations     | 0.3743682

### Deen Abiola

Shared publicly  -

It is very easy to come up with a feature, test it on a handful of cases or a toy data-set and then declare success. What is difficult and immensely frustrating is throwing away features which ultimately end up being more effort than they're worth. For example, take the case of a summarization; if I find myself spending more time trying to decipher its meaning than it would have taken to read the actual text itself, then the feature has failed. The summary should give a good enough idea of the text, with a high probability, in a manner that doesn't lead to frustration. This is a high threshold that has led to my throwing away the vast majority of ideas.

##The Tiers of a Feature

I group features into five tiers: Mud, Plastic, Pyrite, Gold and Palladium.

*Mud features* are noise, they're the most abundant class of feature I think up; they make everything worse with their mere existence. Testing ideas can sometimes be depressing because most of them will turn out to be mud. They're too numerous to enumerate. On the positive side, I seem to have gained the ability to automatically detect and cut short such ideas.

*Pyrite features* are novelty ideas that look promising and show enormous potential, only to fall flat in practical application. They aren't failures per se; for example, I count ideas that are not fast enough, or that work but are ultimately a gimmick, among these. I'd say most of the remaining ideas fall into this category. One example is the phrase based summaries I showcased in the previous article; I will talk about how they fail later in this one.

*Plastic features* are borderline useful but not memorable or worth it. They're relatively reliable but likely, you would not care much if they were gone. Another common reason a feature doesn't make the cut is if its runtime is too slow or its algorithmic complexity is too high to work in real-time on an average machine. Many webpages can have tens of thousands of words (as grounding, that's about 80 textbook pages), and there will be instances where you end up eating 3 or 4 such webpages in parallel while browsing normally, and you want results in not more than a second.

Another scenario might be analyzing dozens or more of pages at a time but still not going over a few seconds without results. Meeting those constraints has occupied much of my time, and has been the cause of my throwing away a lot of ideas—getting something that's both fast and actually reliable is difficult. Later in this article, I'll talk about escalation and my approach with UX to get around this where possible.

A different requirement on some algorithms is that they be able to learn, in realtime, using minimal memory. These last two eliminate both cutting edge ideas such as recurrent neural networks, which I spent a couple of weeks experimenting on, and old ideas disguised as new, such as word2vec. Even Conditional Random Fields and my Hidden Markov Model implementation proved too slow for speedy use. A corollary to this is that turnaround time on ideas is much slower with, say, Deep Neural Networks.

This limits the rate of experimentation and since most ideas are mud, and there is a great deal more to do than figure out an appropriate architecture, and the tech is not yet sufficiently better to be worth the cost, I decided to drop that branch of the tree (after 14 hours nursing an RNN—the scenario is not unlike obsessing over a graphics engine when you're trying to build a game). Hopefully, Moore's Law will address this in time but for now, they're useless for the sort of tasks one will encounter in an intelligence amplifier setting.

*Gold features* are extremely useful and reliable but perhaps only truly shine in a few contexts. An example would be the graphs of my previous post. It's not often I use such features but when I do, they're very useful for quickly getting some idea of a long and complex piece—papers are one example.

*Platinum/Palladium features* are rare and pivotal, they're what make the software something you'd want to incorporate into your daily routine. Some you use everywhere, others are used in only a handful of (but still important) scenarios.

##Escalation

There are two senses in which I use escalation, both of them inspired by games: a) the software must be useful at all skill levels, and b) it should not overload the user (or symbiote) with options, which is most often how a) is achieved. As the post has grown too long, I've decided to split the discussion here. A future post will discuss a).

Typically, today, your only choices when given a text are to either read it now, never read it or save it to never read it later. What Project Int.Aug does is introduce layers below and above (or to the side of) that. You can get a few words and topics, look into people, places, locations, sections of emphasis and concepts. You can read a summary at different levels of detail, you can read the text or you can explore a network representation of the text. As for the last one, the network, I am not certain that exploring it in full detail is actually any faster than reading the text, but I have found it (and hopefully you will too) a more enjoyable way to approach texts.

##What Games are Best at

The best games are really good at escalating difficulty, gradually introducing complexity and well utilizing a contextual interface that responds intelligently to your situation. Most software is not like that. In the next article I'll talk about how I try to emulate that, but here I'll focus on my attempt to escalate and hide away things yet keeping them highly accessible.

While using different applications or browsing, you can invoke a ~500x300 transparent window (for single screen folk, annoyingness is still not completely worked out, perhaps shift in favor of pop up). The window is purposely kept small as it's meant to be taken in at a glance (there's the option to move analysis to a full window). Then, the easiest to parse features should be most quickly computed and displayed. This includes keyword extraction, top nouns and top verbs. But how is this useful? Consider the choice of visiting a link today. It's a very wasteful task that involves invoking a new browser tab or window instance, skimming or looking at the title and then deciding that this was a waste of the last 20 seconds of your life. Trying to predict the content of a link is too inaccurate. However, being able to quickly peek at a handful of words from the text is an excellent compromise. Incidentally, I later realized that it's much harder to skim when you're blind, so the ability to extract key sections or query a document (my approximation of non-linear reading) is useful there too.

That last is an example of the guiding principle of this software. If an application is going to have any chance of being part of a larger system of amplified intelligence, then it needs to minimize friction. Minimizing friction is required before the illusion of an extended self can even be considered. There are, I believe, two parts to friction. Latency: things need to happen at speeds below the conscious threshold, or if not possible, meaningful feedback needs to occur at similar speeds. The other important aspect is prediction, but more specifically, preconscious prediction. When interacting with any system in the world, we're constantly making predictions on how it will respond to our actions; in the case of software, features which are difficult to quickly learn to predict at a preconscious level induce too much friction (our brains do not like this). This is not the same as saying the software must be dumbed down, only that it be easy to use, easy to learn and easy to grow with (essentially, be useful at all levels of skill—hard things will have some threshold you can't go below but let there be useful easy things too). Having to constantly guess what the software will do, and only being able to do so with an accuracy of < 100% is an absolute failure.

Less (but still) important than friction is that the cost of utilizing the feature be lower than the gained value, and that it is unambiguously better than what it is replacing. Consider a method that constantly offers irrelevant keywords, misclassifies people as locations at too high a rate or a word similarity function that induces more cognitive noise than clarity (even if it works perfectly). Finding out what is helpful in day to day use has not been easy. Consider that speed and accuracy are at odds with each other (always defer in favor of speed, a few percentage points gain is just not worth it if you're going to lose even more than a hundred milliseconds per instance, because scale).

##Measures are Useless

In machine learning papers, unsupervised methods are typically scored under some measure. In reality, I've found such results useless for actually gauging the real life utility of a method. The only real way to see how well a method works is to incorporate it into my daily activity and note if it relieves or adds to cognitive overhead.

##The Interface

In building Project Int.Aug I have roughly 5 key goals:

* Augment ability to recall sites, papers, etc. that I have read, visited etc. I should not have to remember the exact wording. This solves the problem of too many tabs and bookmarks.
* Augment association by displaying contextually useful definitions when called upon; can be clippings, parts of a paper etc. to my current document, site or copied selection.
* Augment ability to research and search. Show useful associations between topics of what I'm researching and reduce my ramp up time. Consider a "more like this" feature, across personal documents and search in general. Allow querying of multiple pages and useful search agents to map out a few branches out of a search tree.  For example, someone claims a new mathematical result—is it really? Document and concept vectors across a broad swathe of papers. This system should be able to, with you, interactively refine possible prior work, even if you do not have it on your machine.
* Reduce the amount of reading I have to do. Unless I'm reading for entertainment or edification, reading is a waste of time because I'm only going to remember a few words anyway, so get me those words that would be the only things I would have remembered had I read the piece.
* Make use of the data and trails (as well as ability to share these) we all generate while going through our day to day activity, in a way beneficial to us (corporations are already very good at this though mainly for their personal benefit).
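As one way to picture the "more like this" goal in the list above, here is a minimal sketch that ranks documents by cosine similarity to a query. It uses plain word-count vectors as a stand-in for the document and concept vectors mentioned there; `more_like_this`, `doc_vector` and the toy documents are hypothetical names and data.

```python
import numpy as np
from collections import Counter

def doc_vector(text, vocab):
    """Word-count vector of `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def more_like_this(query, docs, top=2):
    """Return the names of the `top` documents most similar to `query`."""
    vocab = sorted({w for t in [query, *docs.values()] for w in t.lower().split()})
    q = doc_vector(query, vocab)
    scored = []
    for name, text in docs.items():
        d = doc_vector(text, vocab)
        sim = float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))
        scored.append((sim, name))
    return [name for sim, name in sorted(scored, reverse=True)[:top]]

docs = {
    "rl_notes": "policy reward agent states actions reward",
    "evolution": "genes selection survival population mutation",
    "cooking": "salt butter onion garlic pan",
}
print(more_like_this("agent policy reward", docs, top=1))   # prints ['rl_notes']
```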

I'll focus here on reducing required reading. Sometimes I forget that I'm trying to build an IA and not an AI. This means that spending too much time trying to get some piece perfect is counter-productive in the face of all that needs to be done; finding a high enough rate of signal over noise and UX to filter through these is more important. Our brains should not be passive in this relationship; they have an incredible and so far unique ability to just cut through so much of a search space: this is our form of creativity. On the other hand, we're not very good at considering alternatives that we deem counter-intuitive (I believe this to be a victim of our tendency towards confirmation bias); however, computers can be good at this and that is their form of creativity. Combining those two with a good interface creates something formidable indeed.

An example of poor results is the association based summaries; they can be very hit or miss:

*Sample 1*

> Drugs that inhibit this molecule are currently routinely used to protect: attack the parasites that cause them using small molecule drugs/is used/run experiments using laboratory robotics; attack the parasites that cause them using small molecule drugs: make it more economical/to find a new antimalarial that targets DHFR/To improve this process; the robot can help identify promising new drug candidates: demonstrating a new approach/independently discover new scientific knowledge/increases the probability; an anti-cancer drug inhibits a key molecule known: say researchers writing/to automate/can be generated much faster; an artificially-intelligent 'robot scientist' could make drug discovery faster: select compounds that have a high probability/does not have the ability to synthesise such compounds/has the potential to improve the lives;

*Sample 2*:

> The more such Internet users deploy “ do not track ” software: to make their users more valuable/assimilate more learning material/creating more flexible scheduling options and opportunities; is..far fetched Robotic caregiving makes far more sense: to use an adjective that makes sense only/change our sense/would the robotic seal appear a far less comparatively; is..less refined any more humane social order could arise: changing social norms/enables “ social networking ”/making some striking comparisons; one more incremental step: amasses around one person ’s account/want high ones/needs anchors; Data scientists create these new human kinds even: to create it/to create certain kinds/perfecting a new science; is..virtuous or vicious

Both of these are distillations of a much longer piece (the second being that of a very long and complex one). Trying to make sense of these is difficult and ultimately makes this a feature I consider pyrite (the difficulty lies in the fact that the type of similarity it surfaces is not appropriate for this task). However, utility is task dependent; while it is not sufficiently useful for a single article, I have a hypothesis that it will work better, as a kind of broad overview, when searching many pages at a time. The same failing as a single piece summarizer is true for the single word version of the "association" based summaries:

>ai develop/comprehend/detect is...specific, good, former
Similar: ai, researcher, arm, decline, hundred
>
Similar: facebook, memory-based, weston, arm, boss
>
>memory use/see/discern is...central, neural, biological
Similar: memory, use, reason, understanding, over
>
Similar: google, ai, university, baidu, try
>
>computer give/develop/detect is...implicit, top, brainy
Similar: computer, world, give, pattern, journal

It is easy to get stuck in a track trying to fix this rather than remaining focused on the bigger picture. For example, one option would be to build an n-th order Markov chain specific to the text and then a general language model of sentences to try to generate the shortest, most likely sentence expansion of these phrases. But why? A big part of this project has been learning how to do the least amount of work to get something good enough for what I want, or else scrap it, because of how much needs doing. Sometimes the simplest thing is complex but oftentimes, especially if you've made things modular and composable ahead of time, the method might have a surprisingly simple implementation (some might point out that composability hides complexity, which is exactly the point).
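For concreteness, the Markov chain option set aside above might look something like the sketch below. It is an illustration of the idea only, not something the project uses; the function names are hypothetical, and the general language model half of the idea is left out.

```python
from collections import defaultdict, Counter

def build_chain(tokens, order=2):
    """Map each run of `order` words to a count of the words that follow it."""
    chain = defaultdict(Counter)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        chain[state][tokens[i + order]] += 1
    return chain

def most_likely_expansion(chain, seed, max_words=5):
    """Greedily extend `seed` with the most frequent next word."""
    out = list(seed)
    order = len(seed)
    for _ in range(max_words):
        options = chain.get(tuple(out[-order:]))
        if not options:
            break
        out.append(options.most_common(1)[0][0])
    return out

text = ("the robot can help identify promising new drug candidates "
        "and the robot can help automate drug design").split()
chain = build_chain(text, order=2)
print(most_likely_expansion(chain, ("the", "robot"), max_words=2))
```

Even this toy version needs tie-breaking, stopping rules and a notion of "shortest sensible sentence", which hints at how quickly the effort grows.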

On the other hand, there are features which work really rather well: topic and keyword extraction, extracted summaries, concept and directional vectors. The named entity recognition aspect is more a plastic feature, it's okay but will more than serve as the basis of a question answering system (for example it sometimes labels books, papers, websites or genetic loci as locations which actually makes a lot of sense). You can see for yourself the output of the analysis of 7 randomly selected websites of varying complexity. The summaries in particular are surprisingly good; most extractive summaries work best for simple news pieces but completely fall apart with interviews, forums, papers or long narrative reads: this method degrades gracefully from simple news articles to interviews and thread posts. You can see some examples [in this link](http://sir-deenicus.github.io/home/nlp.log0.html), under the Full Summary sections. There are two methods to generate summaries, one using phrases and another sentences. Sometimes the phrases are better (in particular for short or news pieces) but the fuller sentence based summaries are more consistently better:

*Example of a better phrase based summary*
>Artificially intelligent robot scientist 'Eve' could boost search. Drugs that inhibit this molecule are currently routinely used to protect. Eve is designed to automate early-stage drug design. a compound shown to have anti-cancer properties might also be used. an anti-cancer drug inhibits a key molecule known. an artificially-intelligent 'robot scientist' could make drug discovery faster. attack the parasites that cause them using small molecule drugs. new drugs is becoming increasingly more urgent. the robot can help identify promising new drug candidates

*Example of topics*:
> brain-based physiology of creativity, the human cerebellum, monkey cerebellum
>
> global poverty, AI risk, computer science, effective altruists, effective altruism, billion people, Repugnant Conclusion : the idea
>
> artificial intelligence, last year, few months, common sense, memory-based AI, Facebook AI researcher, Facebook AI boss, crusade for the thinking machine
>
> drug discovery, mass screening, machine learning, Robot scientists, robot scientist, fight against malaria
>
> feedback and control mechanisms of Big Data, Blog Theory : Feedback and Capture in the, sociotechnical system : Particular political economies, effect of “ bombshell ” surveillance

*Examples from Directional vectors*.

These vectors capture some directionality (which provides some refinement in capturing context); as such, you can recover common antecedent or succedent words.

>Similar to drug: drug
>
>Top 3 preceedings for drug: Concepts: exist, choose, early-stage | Index: compound, positives., early-stage
>
>Top 3 post/next words for drug: Concepts: target., design., discovery | Index: discovery, candidate
>
> ==================
>
>Similar to scientist: scientist
>
>Top 3 preceedings for scientist: Concepts: robot | Index: robot, clinical
>
>Top 3 post/next words for scientist: Concepts: 'eve' | Index: be, them
>
> ==================
>
>Similar to self: self, tool
>
>Top 3 preceedings for self: Concepts: construct, ‘data, algorithmic | Index: algorithmic, network, premack
>
> Top 3 post/next words for self: Concepts: balkinization, commit, comprehensively | Index: setting
>
> ==================
>
> Similar to risk: risk, researcher, obsession.
>
> Top 3 preceedings for risk: Concepts: existential, ai, recoil | Index: ai, existential, human
>
> Top 3 post/next words for risk: Concepts: panel, estimate, charity | Index: of
>
> ==================
>
>Similar to altruist: altruist, intervention., altruism
>
>Top 3 preceedings for altruist: Concepts: effective, lethality. | Index: effective, maximum
>
>Top 3 post/next words for altruist: Concepts: groups., explain, don | Index: potential, though
>
> ==================
>
> Similar to people: people
>
> Top 3 preceedings for people: Concepts: serious, marginalize, part | Index:
>
> Top 3 post/next words for people: Concepts: seek | Index: who, in, seek
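
As a rough illustration of how directional vectors can recover preceding and following words, here is a minimal sketch that assumes plain random indexing: every word gets a fixed sparse random index vector, and each word accumulates one vector for its left neighbours and another for its right neighbours. The dimensionality, the sparsity and all function names are assumptions made for illustration, not the library's actual code.

```python
import math
import random
from collections import defaultdict

DIM = 256      # assumed dimensionality
NONZERO = 8    # assumed number of +1/-1 entries per index vector


def index_vector(word):
    """Sparse random +1/-1 vector, seeded by the word so it is reproducible."""
    rng = random.Random(word)
    vec = [0.0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def directional_vectors(tokens):
    """Accumulate separate left-context and right-context vectors per word."""
    left = defaultdict(lambda: [0.0] * DIM)
    right = defaultdict(lambda: [0.0] * DIM)
    for i, word in enumerate(tokens):
        if i > 0:
            for j, x in enumerate(index_vector(tokens[i - 1])):
                left[word][j] += x
        if i + 1 < len(tokens):
            for j, x in enumerate(index_vector(tokens[i + 1])):
                right[word][j] += x
    return left, right


def top_words(context_vec, vocab, k=3):
    """Words whose index vectors lie closest to an accumulated context vector."""
    ranked = sorted(vocab, key=lambda w: cosine(context_vec, index_vector(w)), reverse=True)
    return ranked[:k]


tokens = ("the robot scientist tests the drug candidate and "
          "the robot scientist finds the drug target").split()
left, right = directional_vectors(tokens)
vocab = sorted(set(tokens))
print("before 'scientist':", top_words(left["scientist"], vocab, k=1))
print("after 'drug':", top_words(right["drug"], vocab, k=2))
```

Keeping the two directions apart is what lets a query distinguish "what tends to come before this word" from "what tends to come after it"; a single bag-of-words context vector would merge the two.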

Sometimes the result is less than ideal, but this is where UX can help. For example, consider a sentence starting with "They": your first question will no doubt be, who? One way to fix this is to let the reader hover over text and get an inline display showing the context of the sentence. However, hovering only works when popups are sparse; otherwise the interaction becomes very annoying, with things popping up on every mouse move. Instead, I've resorted to having text selection trigger a context search. Another issue is that sometimes stories are too short and can be improved with length. You don't want too many options, however, so there are two modes, long and short, each a set of parameters that gives good results for a broad set of articles (the examples are all "short").

The interface consists of three tabs: one for topics, one for gists/summaries and a final one for entities. The gists are further separated into phrases and sentences (though if over the next few days I find that phrases induce too much cognitive overhead, I'll drop them), and the entities into people, locations, orgs, etc. (a literal etc.). You can easily use keyboard navigation or have the summaries read to you at high speed. You can also select text for more context.

There is a lot that needs to be done per text (generate document-specific vectors, tokenize, tag parts of speech, generate chunks, extract entities, extract keywords, generate summaries). Each of these steps takes milliseconds for the average document but can rise to 1-3 seconds for really long texts (a thread with 1000+ replies). Updating asynchronously, with the most important results (keywords) displayed first, works around this speed issue. The important bit is that, because we are not building AI, we have more room for error, so long as signal overwhelms noise and we have good friction-removing tools to work around the errors. In this way you can choose to go into as little or as much depth as you want (escalation), and unlike the case with skimming, the probability of hitting the important bits is significantly better than random.
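
The asynchronous updating described above might look roughly like the following sketch. The stage names, the sleep-based delays and the `display` callback are placeholders invented for illustration; real stages would do actual analysis work.

```python
import asyncio


async def stage(name, seconds, result):
    """Placeholder for a real analysis step; the sleep stands in for compute time."""
    await asyncio.sleep(seconds)
    return name, result


async def analyse(text, display):
    tokens = text.split()  # shared preprocessing, done once up front
    tasks = [
        asyncio.create_task(stage("keywords", 0.01, tokens[:3])),
        asyncio.create_task(stage("entities", 0.10, [t for t in tokens if t.istitle()])),
        asyncio.create_task(stage("summary", 0.20, " ".join(tokens[:5]))),
    ]
    # Hand each result to the UI as soon as it is ready instead of waiting for all.
    for finished in asyncio.as_completed(tasks):
        name, result = await finished
        display(name, result)


shown = []
asyncio.run(analyse("Eve is designed to automate early-stage drug design",
                    lambda name, result: shown.append(name)))
print(shown)
```

The point of the design is that the slowest stage never blocks the fastest one, so the reader sees keywords almost immediately even on very long threads.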

Sometimes all I can see are the failings and shortfalls, and then I feel down because things seem so far from the imagined ideal. But then I ask myself which of two people trying to learn something new would be better off: one with Project Int.Aug or the other with browsers and Google. Without a doubt, the person using tools like Project Int.Aug is far better equipped. I could spin in circles, continuously replacing internal algorithms with something better, forever chasing perfection, but if the goal is to move forward to motorbikes and even hoverbikes of the mind, I've got to release something and get outside input.

But right now, in a world of walkers, Project Int.Aug is an electric bicycle for the mind*.

---

*If you're working on something like this too, please let me know!

Here you can look at the performance of the methods across a [random sample of 7 websites](http://sir-deenicus.github.io/home/nlp.log0.html)

The network for the [sample image](http://sir-deenicus.github.io/home/crow_sents.html):

![alt text](images/birdbrain.png)

Yeah, I appreciate the complexity and difficulty, and I'm very vaguely aware of the large network of existing knowledge sitting in my head that I can tap on a whim to make associations and comparisons with some new advance. It gives me a convenient filter that I never have to think about, which of course helps draw out the important bits.

But still, it'd be great to have an automated system that "put me out of a job" for this task so to speak :)﻿

### Deen Abiola

Shared publicly  -

#Summarization via Visualization and Graphs.

Also, there's a typo (okay, at least one) in the Iran Agreement (well, in the version posted on medium and as of this writing...).

_(G+ note: inferior duplicate of the Medium version posted here: https://medium.com/@sir.deenicus/summarization-via-visualization-and-graphs-4b33454db3d6)_

Ah, this is not part of the order of posting I planned, but...it's not every day you get to analyze (and find a trivial mistake in) a government document. Since May, I've been writing a really fast, thread-safe, fully parallel NLP library because everything else I've tried is either too bloated, too slow to run or train, not thread-safe, too academic, too license-encumbered or uses too much memory.

More pertinently, I've also been on a life-long quest to figure out some way to effectively summarize documents. Unfortunately, technology is, as yet, too far away from my dream intelligent abstract summarizer. Every single one of my apparently clever ideas has been unmasked as an impostor and pretender, always self-annihilating in an exasperating puff of failure. Sigh.

However, I have been able to combine ideas that work efficiently on today's machines to arrive at a compromise (plenty more on that in the future). One key idea has been representing text at a layer above just strings: think Google's word2vec, but requiring orders of magnitude less computation and data for good results (to be more specific, I use reflective random indexing and directional vectors, which go just a bit beyond bag of words).

Once vectors have been generated (it took my machine 500 ms to do this) and sentences have been tagged with parts of speech, interesting possibilities open up. For example, the magnitude of a vector is an indication of how important a word is. It's similar to word count but orders words in a way that better reflects a word's importance (counts, once you remove common stopwords, are actually infuriatingly good at this already; infuriating because it can be hard to come up with something both better and less dumb). It can also work when few words are repeated, so it's more flexible. Applying this to the Iran document I get as the top 10 most important nouns:

> "iran, iaea, fuel, year, centrifuge, reactor, uranium, enrichment, research, joint"

And for verbs:

> "include, test, verify, modernise, permit, fabricate, redesign, monitor, intend, store"
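
The magnitude ranking can be sketched as follows, assuming plain (not reflective) random indexing with a one-word window. The dimensionality, sparsity, window size and function names are all assumptions for illustration and are not taken from the library.

```python
import math
import random
from collections import defaultdict

DIM = 256      # assumed dimensionality
NONZERO = 8    # assumed number of +1/-1 entries per index vector


def index_vector(word):
    """Sparse random +1/-1 vector, seeded by the word so it is reproducible."""
    rng = random.Random(word)
    vec = [0.0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec


def context_vectors(tokens, window=1):
    """Sum the index vectors of each word's neighbours (plain random indexing)."""
    ctx = defaultdict(lambda: [0.0] * DIM)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                for d, x in enumerate(index_vector(tokens[j])):
                    ctx[word][d] += x
    return ctx


def rank_by_magnitude(ctx):
    """Order words by the Euclidean norm of their context vector, largest first."""
    return sorted(ctx, key=lambda w: math.sqrt(sum(x * x for x in ctx[w])), reverse=True)


tokens = "iran fuel iran reactor iran uranium iran centrifuge iran".split()
ranking = rank_by_magnitude(context_vectors(tokens))
print(ranking)
```

In this toy input the word that appears in the most contexts accumulates the largest vector, which is why the score behaves like a word count while still depending on what surrounds the word.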

Such keyword lists are useful, and being able to select a link, press a hot key and get a small window displaying a similar result for any page will, I think, be a useful capability to have in one's daily information processing toolkit. However, such a summary is limited. One idea is to take the top nouns and find their nearest neighbors, but limit them to verbs and adjectives. Here's what I get:

> iran: include/produce/keep is...future, subsequent, consistent
>
>year: keep/conduct/initiate is...more, future, consistent
>
> iaea: monitor/verify/permit is...necessary, regular, daily
>
> fuel: fabricate/intend/meet is...non-destructive, ready, international
>
> uranium: seek/enter/intend is...future, natural, initial
>
> reactor: modernise/redesign/support is...iranian, international, light
>
> centrifuge: occur/remain/continue is...single, small, same
>
> production: include/need/produce is...current, future, consistent
>
>use: include/produce/meeting is...subsequent, initial, destructive
>
>arak: modernise/redesign/support is...light, iranian, international
>
>research: modernise/redesign/support is...international, appropriate, light
>
>jcpoa: declare/implement/verify is...necessary, consistent, continuous

Reading this, I see the results are almost interpretable. There's the IAEA who will monitor Iran and JCPOA too, or something...I'm guessing. There's lots of emphasis on Iran's future and modernization, as well as limitations on uranium production and instruments—centrifuges in particular—in use (at this point, I'd like to point out that I've absolutely not even looked at the original document and don't ever plan to). I don't know if this method will ultimately prove useful; a lot of work involves experimenting with what actually works in day to day use. Some features are simply not worth the cognitive overhead of even just knowing they exist.
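
The neighbour lookup restricted by part of speech can be sketched like this. The three-dimensional vectors and the tags are made-up toy values chosen only to show the filtering step; real vectors would come from the indexing stage.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def neighbours(word, vectors, tags, allowed, k=3):
    """Nearest neighbours of `word`, restricted to words whose tag is in `allowed`."""
    candidates = [w for w in vectors if w != word and tags[w] in allowed]
    candidates.sort(key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)
    return candidates[:k]


# Made-up toy vectors and tags, for illustration only.
vectors = {
    "iaea": [1.0, 0.0, 0.0], "monitor": [0.9, 0.1, 0.0], "verify": [0.8, 0.2, 0.1],
    "daily": [0.7, 0.0, 0.3], "fuel": [0.0, 1.0, 0.0], "fabricate": [0.1, 0.9, 0.0],
}
tags = {"iaea": "NOUN", "fuel": "NOUN", "monitor": "VERB",
        "verify": "VERB", "fabricate": "VERB", "daily": "ADJ"}

verbs = neighbours("iaea", vectors, tags, {"VERB"}, k=2)
adjectives = neighbours("iaea", vectors, tags, {"ADJ"}, k=1)
print(f"iaea: {'/'.join(verbs)} is...{', '.join(adjectives)}")
```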

It was at this point I decided to graph the result. The basic idea is: connect all the words with edge weights computed from pairwise cosine similarities, but limit connections to be of the type VERB=>NOUN=>VERB, then apply a maximum spanning tree to prune the edges and make it actually readable. The idea is that, instead of just grouping words by similarity, we impose some grammatical structure and, hopefully, get something a bit more structured.
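
A minimal sketch of that pruning step, assuming Kruskal's algorithm run on edges sorted from heaviest to lightest, with edges only allowed between a noun and a verb. The vectors and tags are made-up toy values, and the function name is hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def max_spanning_tree(vectors, tags):
    """Kruskal's algorithm on noun-verb edges, taking the heaviest edges first."""
    nouns = [w for w in vectors if tags[w] == "NOUN"]
    verbs = [w for w in vectors if tags[w] == "VERB"]
    edges = sorted(((cosine(vectors[n], vectors[v]), n, v) for n in nouns for v in verbs),
                   reverse=True)
    parent = {w: w for w in vectors}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    tree = []
    for weight, noun, verb in edges:
        root_n, root_v = find(noun), find(verb)
        if root_n != root_v:  # keep the edge only if it joins two components
            parent[root_n] = root_v
            tree.append((noun, verb, round(weight, 3)))
    return tree


# Made-up toy vectors and tags, for illustration only.
vectors = {"iran": [1.0, 0.2, 0.0], "iaea": [0.9, 0.4, 0.1], "fuel": [0.0, 1.0, 0.1],
           "monitor": [1.0, 0.3, 0.1], "fabricate": [0.1, 1.0, 0.0]}
tags = {"iran": "NOUN", "iaea": "NOUN", "fuel": "NOUN",
        "monitor": "VERB", "fabricate": "VERB"}

tree = max_spanning_tree(vectors, tags)
for noun, verb, weight in tree:
    print(noun, "<->", verb, weight)
```

Because only noun-verb pairs are ever offered as edges, every path in the resulting tree alternates between the two word classes, which gives the VERB=>NOUN=>VERB shape without any extra bookkeeping.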

It was while browsing that graph I found the typo:

![alt text](images/chennals_iran.png)

I'm fairly certain that "Chennals" is not some fancy Nuclear Engineering jargon.

![alt text](images/whitehouse-typo.png)

##Network Examples

I also built a graph using an algorithm utilizing inputs from a phrase chunker, which then tries to build short understandable phrases (verb dominant phrases can only link to noun phrases), another on sentences and another from paragraphs. The gray shaded and golden edge nodes tend to be most important and are worth zooming into. Around those will be all the most similar phrases/sentences/paragraphs.

##Click for: [Single Words Example](http://sir-deenicus.github.io/home/test_single_word_vbnouns.html)

![alt text](images/QKy5v9bhf5.gif)

Although this graph visualization was originally meant to compare and contrast (via orthogonal vectors) two or more documents, it works well enough as a summarization tool. In case you're curious, the graph visualization toolkit I'm using is the excellent vis.js (I welcome any suggestions that'll improve on the sometimes cluttered layout).

##Click for: [Phrases Example Network](http://sir-deenicus.github.io/home/test_phrases.html)

The Phrases example is clearly more comprehensible than the single word approach but is not without flaws: there are incomplete thoughts and redundancies. On the other hand, we see that similar phrases are grouped together. It's worth noting that each phrase is represented by a single (200D) vector, hence the groupings are not based on string similarities. And, despite the algorithm not lowercasing all words, the method still groups differently cased words together, suggesting that it captures something more than "these words tend to be near each other". It also groups conjugations and phrases in a non-trivial sense, as seen with higher level groupings like:

* produce fuel assemblies/fuel core reloads/fuel will be exhausted/spent fuel
* can be used/future use

Those are not just cherry-picked samples, as you can see for yourself in the link above. The method holds generally in all documents I've tried. Additionally, it's worth remembering that nodes aren't just grouped by similarity but must also meet the very basic noun phrase-ish => verb phrase-ish structure I mentioned. The goal is to get something sufficiently comprehensible while being non-linear and more exploratory. By zooming in and out and hiding irrelevant nodes, I can go into more or less depth as I please. This, together with basic question answering on arbitrary text, forms my very basic approximation of non-linear reading/knowledge acquisition. You can think of skimming as a far-distant ancestor of this approach.

##Click for: [Sentences Example](http://sir-deenicus.github.io/home/test_sents.html)

##Click for: [Paragraphs Example](http://sir-deenicus.github.io/home/test_paras.html)

Zooming out is, I've found, important when dealing with longer text items (it removes clutter). Then you can click a node, which hides anything not in its neighborhood, making it easier to read when zoomed in. Other useful features are the ability to search for a word and the ability to hover over nodes to get at their text.

![alt text](images/summary_0.png)

##Text Summaries

Similar to connecting verbs and nouns, I tried connecting augmented noun phrases (built with a very, very simple rule on how to join phrases to maximize coherence) to verb phrases (built the same way). With that, for the top 5 phrases, I got:

>"2. Iran will modernise the Arak heavy water research reactor to support peaceful nuclear research and radioisotopes production:
to be a multi-purpose research reactor comprising radio-isotope production/to support its peaceful nuclear research and production needs and purposes/to monitor Iran ’s production
>
>Iran ’s uranium isotope separation-related research and development or production activities will be exclusively based:
to any other future uranium conversion facility which Iran might decide to build/to verify the production/to minimise the production
>
>Iran ’s enrichment and enrichment R&D activities are:
to meet the enrichment and enrichment R&D requirements/conducting R&D/to enable future R&D activities
>
>Iran will maintain no more than 1044 IR-1 centrifuge machines:
will use no more than 348 IR-1 centrifuges/are only used to replace failed or damaged centrifuges/balancing these IR-1 centrifuges
>
>Iran will permit the IAEA to implement continuous monitoring:
will permit the IAEA to implement continuous monitoring/will permit the IAEA to verify the inventory/will allow the IAEA to monitor the quantities

This, I think, is actually a pretty decent summary. It's far from perfect, but I've got a much better idea of what's in the document despite it being fairly short. It's also not a verbatim extractive summarizer, since it's constructing and combining phrases, which incidentally also ends up compressing sentences. (Although...if a proper generalizing summarizer were a human, this would be like the last common ancestor of humans and mice. Or maybe lice. Sigh.)

Closer to more typical extractive methods is a very simple method I came up with that generates vectors for sentences using RRI. The method takes the largest magnitude sentence and then finds the nearest sentence that gets within x% of its magnitude (I have x=50%). A sum of all met vectors is kept, and a sentence must have > 0.7 similarity with this memory vector. This is repeated for all sentences. I've found that this method tends to create far more fluid summaries than is typical for extractive summarizers while working on almost all document types (even doing a fair job on complex papers and forum threads). For this Agreement, we get the below at 10% of the original document length:

----------
##More Fluid Extracted Summary:

"Destructive and non-destructive testing of this fuel including Post-Irradiation-Examination (PIE) will take place in one of the participating countries outside of Iran and that country will work with Iran to license the subsequent fuel fabricated in Iran for the use in the redesigned reactor under IAEA monitoring.

Iran will not produce or test natural uranium pellets, fuel pins or fuel assemblies, which are specifically designed for the support of the originally designed Arak reactor, designated by the IAEA as IR-40. Iran will store under IAEA continuous monitoring all existing natural uranium pellets and IR-40 fuel assemblies until the modernised Arak reactor becomes operational, at which point these natural uranium pellets and IR-40 fuel assemblies will be converted to UNH, or exchanged with an equivalent quantity of natural uranium.

Iran will continue testing of the IR-6 on single centrifuge machines and its intermediate cascades and will commence testing of up to 30 centrifuge machines from one and a half years before the end of year 10. Iran will proceed from single centrifuge machines and small cascades to intermediate cascades in a logical sequence.

Iran will commence, upon start of implementation of the JCPOA, testing of the IR- 8 on single centrifuge machines and its intermediate cascades and will commence the testing of up to 30 centrifuges machines from one and a half years before the end of year 10. Iran will proceed from single centrifuges to small cascades to intermediate cascades in a logical sequence.

In case of future supply of 19.75% enriched uranium oxide (U3O8) for TRR fuel plates fabrication, all scrap oxide and other forms not in plates that cannot be fabricated into TRR fuel plates, containing uranium enriched to between 5% and 20%, will be transferred, based on a commercial transaction, outside of Iran or diluted to an enrichment level of 3.67% or less within 6 months of its production.

Enriched uranium in fabricated fuel assemblies from other sources outside of Iran for use in Iran’s nuclear research and power reactors, including those which will be fabricated outside of Iran for the initial fuel load of the modernised Arak research reactor, which are certified by the fuel supplier and the appropriate Iranian authority to meet international standards, will not count against the 300 kg UF6 stockpile limit.

This Technical Working Group will also, within one year, work to develop objective technical criteria for assessing whether fabricated fuel and its intermediate products can be readily converted to UF6. Enriched uranium in fabricated fuel assemblies and its intermediate products manufactured in Iran and certified to meet international standards, including those for the modernised Arak research reactor, will not count against the 300 kg UF6 stockpile limit provided the Technical Working Group of the Joint Commission approves that such fuel assemblies and their intermediate products cannot be readily reconverted into UF6. This could for instance be achieved through impurities (e.g.  burnable poisons or otherwise) contained in fuels or through the fuel being in a chemical form such that direct conversion back to UF6 would be technically difficult without dissolution and purification.

Iran will permit the IAEA to monitor, through agreed measures that will include containment and surveillance measures, for 25 years, that all uranium ore concentrate produced in Iran or obtained from any other source, is transferred to the uranium conversion facility (UCF) in Esfahan or to any other future uranium conversion facility which Iran might decide to build in Iran within this period.

If the absence of undeclared nuclear materials and activities or activities inconsistent with the JCPOA cannot be verified after the implementation of the alternative arrangements agreed by Iran and the IAEA, or if the two sides are unable to reach satisfactory arrangements to verify the absence of undeclared nuclear materials and activities or activities inconsistent with the JCPOA at the specified locations within 14 days of the IAEA’s original request for access, Iran, in consultation with the members of the Joint Commission, would resolve the IAEA’s concerns through necessary means agreed between Iran and the IAEA. " ﻿
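
The sentence selection rule that produced this summary is described only briefly, so the following is one possible reading of it rather than the actual implementation: start from the largest-magnitude sentence, keep a running sum (the memory vector), and repeatedly add the sentence nearest to the memory, provided its magnitude is at least 50% of the seed's and its cosine similarity with the memory exceeds 0.7. The toy vectors, the parameter names and the stopping rule are assumptions.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine(a, b):
    denom = norm(a) * norm(b)
    return sum(x * y for x, y in zip(a, b)) / denom if denom else 0.0


def summarise(sent_vectors, min_ratio=0.5, min_sim=0.7, budget=None):
    """Greedy selection; returns the chosen sentence indices in document order."""
    norms = [norm(v) for v in sent_vectors]
    seed = max(range(len(sent_vectors)), key=norms.__getitem__)
    memory = list(sent_vectors[seed])  # running sum of the selected vectors
    chosen = {seed}
    budget = budget or len(sent_vectors)
    while len(chosen) < budget:
        candidates = [
            i for i in range(len(sent_vectors))
            if i not in chosen
            and norms[i] >= min_ratio * norms[seed]
            and cosine(memory, sent_vectors[i]) > min_sim
        ]
        if not candidates:
            break
        best = max(candidates, key=lambda i: cosine(memory, sent_vectors[i]))
        chosen.add(best)
        memory = [m + x for m, x in zip(memory, sent_vectors[best])]
    return sorted(chosen)


# Made-up toy sentence vectors, for illustration only.
sentences = [[3.0, 1.0, 0.0], [4.0, 1.0, 0.0], [0.0, 3.0, 0.0], [0.5, 0.1, 0.0]]
print(summarise(sentences))
```

In the toy input, the third sentence is long enough but points in a different direction, and the fourth points the right way but is too small, so both are left out. Requiring similarity to the accumulated memory rather than to the seed alone is one plausible reason the selected sentences read as a connected passage.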

Thanks, it means a lot to hear that. I've been working on it on and off for the past year, but it's only in the past 3 months that things have been stable enough for me to really focus on it.

### Deen Abiola

Shared publicly  -

This is a really pretty visualization of how Decision trees work. It's less about machine learning proper, which is actually a strength since it can be that much more concrete.

My only super tiny quibble is with point 1) of the overview. I'd say instead that drawing boundaries applies to discriminative learners only and not to more probabilistic methods (also, not all learners are statistical, but apparently there's a duality between sampling and search which muddies neat divisions).

I'd also further characterize over-fitting as memorizing the data: the model complexity (number of parameters) is unjustified given, or outmatches, the available data. It stems from a lack of smoothing, which is when you don't filter out noise but instead just explain every little detail using some really impressive bat deduction [1]. Humans do this when concocting conspiracies or reasoning based on stereotypes, initial impressions and anecdotes.

[1] http://tvtropes.org/pmwiki/pmwiki.php/Main/BatDeduction

A Visual Introduction to Machine Learning

If you haven't seen this yet, it's pretty awesome!﻿
What is machine learning? See how it works with our animated data visualization.

### Deen Abiola

Shared publicly  -

Just took a few days off after the DARPA Robotics Challenge. In case anyone is interested, here are the sideline reports that I sent back to CSAIL each evening, along with some video summaries.
Day 1:
MIT had a great (though not perfect) run yesterday, and I couldn't be prouder.
Long story short, we made a human operator error when transitioning the robot from the driving mode to the "egress" mode, and forgot to turn the driving controller off. This conspired through a series of events into a tragic faceplant out of the car onto the asphalt. Our right arm was broken, as were a few of our key sensors (an arm encoder). We called a reset -- taking a 10 min penalty -- got the robot back up and ready to go... But our right arm was hanging completely limp. That was unfortunate because we were planning on doing all of the tasks right-handed.
In an incredible display of poise and cleverness from the team, and an impressive showing from the algorithms, we were able to adapt and perform almost all of the tasks left-handed. The only point we had to skip was the drill (we need both hands to turn the drill on). Even the walking on terrain and stairs looked fantastic despite having 10 kg flopping passively at the end of one arm.
After the officials' review of the video, we were awarded the egress point and are in 4th place (the best of the non-wheeled robots). The robot is fixed and we know that we are capable of beating the top scores from yesterday in our run today. It's scheduled for 1:30pm Pacific. Wish us luck!
- Russ
Day 2:
Day 2 was a roller coaster. Boston Dynamics was able to repair the robot damage from Day 1 that same evening -- they are amazing. But when we got in to test the robot very early on Day 2, the robot powered down after just a minute or two of operation. It turned out that a small problem with the coolant lines had overheated the PDB and main pump motor. The next 8 hours were chock-full of high-stress robot debugging by Boston Dynamics and MIT (the heat caused collateral damage to the CPU BIOS and hard disks). Even at the start line we had a complete wrist failure and a last-minute actuator hot swap. I can only speak for myself, but I was physically and emotionally exhausted.
We finally started our run 30 min late. It started fantastically well. We actually passed the other top teams that were running on the parallel courses but had started up to 30 min earlier. We drove, egressed, walked through the door, turned the valve, picked up the drill, turned it on. And then... We pushed a little too hard into the wall. The wrist temperature was rising -- if we tripped the temperature fault then the wrist would have shut off completely (not good when you're holding a drill in a wall). We had to back off before the cut. Then we started cutting but the bit slipped out of the wall during the cut. The operators saw it and tried to go back to fix, but the drill has a 5 min automatic shutoff. Once off, it's extremely hard to turn back on. Our very real opportunity to win the entire competition slipped away from us in an instant.
We knew we had to get all of the points (and quickly) to win, so we tried the only thing we could. We told the robot to punch the wall. The drywall didn't fall. After a few tries something happened -- it looked like a lightning bolt hit the robot, some sort of fault caused the robot to fall. Our recovery and bracing planner kicked in automatically and the robot fell gently to the ground. But we had to pull it off the course to stand it up and start again.
With the win now out of reach, we decided to finish strong by doing the rough terrain and stairs (two of our favorites). They were beautiful to watch.
Our team had far more perception and planning autonomy than any of the other teams I was able to observe (most used teleop; ultimately the tasks were too easy). Our tools and our team were definitely capable of winning. There was just too much luck involved, and it wasn't our day.
We're incredibly disappointed, but I couldn't be prouder of our team and the tools. The amount of adversity that they overcame even this week is incredible. They did it with brains and class.
- Russ
https://youtu.be/2eBVsByQs4E… (tells the story)
https://youtu.be/GA-M1pMtANs… (shows the robot and our interface in action)﻿

Interesting view into the challenges and work that go on behind the scenes of these demanding competitions. Congrats to the teams and their impressive results - thanks for sharing!﻿
Work
Skills
Information Synthesist
Story
Tagline
For me, building software is like sculpting. I know what is there but I just need to get rid of all the annoying rock that is in the way
Introduction
I like trying to write

I post now, mostly as a duplicated devlog on a project of mine whose goal is an intelligence amplification tool as inspired by the visions of Engelbart, Vannevar Bush and Licklider. I am, in order of skill, interested in:
1. Functional Programming
2. Machine Learning
3. Artificial Intelligence
4. Mathematics
5. Computation Theory
6. Complexity Theory
7. Bioinformatics
8. Physics
9. Neurobiology
I am also super interested in sustainable energy, synthetic biology and the use of technology to improve human living.

I believe the proper way to understand quantum mechanics is in terms of a Bayesian probability theory and that the many-worlds interpretation is the way it applies to the universe physically. Still trying to find a philosophically synergistic combo.

I also do bballing and bboying/breaking/"breakdance".

I have some "hippie" beliefs, like dolphins are persons. All dolphins, whales, great apes, elephants and pigs should not be eaten, murdered or kept in captivity. I would really like to see the results of giving dolphins an appropriate interface to internet access.

Spent some time solving bioinformatics problems on Rosalind. It's a Project Euler for bioinformatics. Try it out if you enjoy algorithms and want to get some idea of biotech: http://rosalind.info/users/deen.abiola/

Favourite Books: Chronicles of Amber, Schild's Ladder, Diaspora, Permutation City, Blindsight, Ventus, The Peace War, Marooned in Realtime, A Fire Upon the Deep, Accelerando, The Death Gate Cycle, MythAdventures, A Wizard of Earthsea, Tawny Man Trilogy, The Malloreon, The Riftwar Cycle and Harry Potter

Basic Information
Gender
Male