## Profile

Deen Abiola
4,711 followers|321,075 views

## Stream

### Deen Abiola

Shared publicly  -

This is a great article pointing out the dangers of getting most of your information on the state of AI (or anything) from press releases. Examples always show the best-case scenario and rarely acknowledge the existence of pathological cases highlighting how far the methodologies have to go (this is something I don't do: either my examples are representative or I'll mention the limitations). The papers themselves are almost always more balanced.

The DeepMind games, for example, are worth looking at in detail: you'll get a more grounded idea of what's possible (limited control/pole-balancing games), what's novel (a better value prediction model, good!) and what's merely prestige fodder (cherry-picked examples, misleading statements on who was first to first base).

But there is one section where I disagree with the essay.  Quoting:

> To sum up, CNN+RNN technology doesn’t understand numbers or colors. It doesn’t understand meaning of words. It’s nowhere near a real AI - probably closer to ten thousand monkeys striking keys at random in an attempt to replicate Shakespeare’s works.

That's a really unfair assessment. It's absolutely not closer to chaotic playwright monkeys. Random search would not get you to those results in a billion years. So it's closer to intelligence. Animal intelligence. But also very alien and very limited.

What's been learned is a mapping from pixels to words (vectors) to sequences of words. Ultimately, giant computations on some set of functions. But there is real understanding, and even though it's not at all focused on what we would view as most salient or important, it has hooked onto some meaningful set of discriminatory features which allow it to reason and make good predictions on examples from the same distribution as the training set. It's also achieved a compression of the data. That compression is a measure of understanding (see Brahe vs. Kepler). Alas, the errors tell us that even this style of understanding could be greatly improved, that the discriminatory features -- alien though they might be -- are still far from optimal.

I think, though, that what people mean when they say it doesn't understand is that it hasn't learned a generative model, and moreover hasn't learned a model from which non-trivial differences from the example set can be generated (Kepler vs. Newton). In other words, if it really understood then it could tell stories, answer questions and infer non-visible states. Ultimately, though these machine learning models might be able to learn, they can't reason their way out of a paper bag. They're really rather inflexible. It is in that way that they can be both intelligent and incredibly stupid.
What you wanted to know about AI 2015-03-16 Recently a number of famous people, including Bill Gates, Stephen Hawking and Elon Musk, warned …

Totally agree with your rebuttal of the million monkeys claim, btw.

### Deen Abiola


So, um. Re-sharing to issue a correction/clarification from the original author of the quote, who weighed in*, stating:

> What I believe I said (around 1970) was “Intelligence is whatever machines haven't done yet”. If and when artificial intelligence surpasses human intelligence, people might conclude, as you propose, that there is no such thing as intelligence. Or they might simply redefine intelligence as “whatever humans haven't done yet” as they try to catch up with AI.

//I'm thinking, though, that the humans are not going to be very happy with such a state of affairs, preferring instead to point out that intelligence is completely overrated anyway. Was it even ever good for anything?

* So yeah, that was totally unexpected. Still getting used to the idea that, thanks to the internet, pioneers once found only in books and archives can temporarily exist as real people too.

Tesler's Theorem states that "AI is whatever hasn't been done yet." From this we can deduce that once AI reaches human parity we will have to conclude that there is no such thing as intelligence.﻿

### Deen Abiola


I’m Nobody! Who are you?
Are you – Nobody – too?
Then there’s a pair of us!
Don’t tell! they’d advertise – you know!

How dreary – to be – Somebody!
How public – like a Frog –
To tell one’s name – the livelong June –
To an admiring Bog!

-- Emily Dickinson


Yes, it does on mine.

### Deen Abiola


#Machine Learning is not Magic: Some useful Intuitions.

One thing I find amusing is when people talk about Machine Learning as if it's some kind of magic pixie dust you sprinkle over your program, thus giving it special intelligence powers. When really, machine-learned models, as typically used, are scripts in a simple language. That sentence needs some unraveling: what I mean by magic, and what I mean by scripting.

Magic

People often talk as if you can throw machine learning at any problem and have it magically figure things out. This is very much like the zoom-enhance trope. Actually, it is exactly like zoom-enhance, since scaling up is itself a kind of inference. Just as you can't fill in details that aren't there, you can't learn something that is either unapproachably complex (incompressible) or whose dynamics aren't stationary. For example, you can't throw machine learning at market data and think it'll just work. Sure, it'll learn something, but that something is almost certainly a quirky coincidence of that sample. Even if it tests well out of sample, that would only mean the dynamics are as yet unchanged. Another example: you can't throw data at an algorithm and have it figure out how a viral outbreak is going to progress. Similarly, complexity-wise, you can't throw a genome at an ML algorithm and have it try to predict physical attributes. In this case, the data just isn't there, or more accurately, each current state acts as data to feed the next state -- you'd have to literally compute the full organism to get it right. This is the kind of thing Wolfram calls computational irreducibility.

On the other hand, there are lots of useful problems that are stationary or close enough to it (speech, image, translation), and lots that, even if they aren't stationary, we should in principle be able to build algorithms that adapt in time (the edge of current feasibility). Then there are complex-seeming problems that might not be as impossible as they seem. Take protein folding: what trick has evolution figured out? Protein folding is NP-complete; even a quantum computer shouldn't be able to help there. So what's going on -- how can biological systems make such short work of it? Would a suitably advanced algorithm -- something beyond deep learning, able to reify its abstractions, perform deduction as well as induction -- be able to figure out the hidden pattern, the hidden shortcut? I think so. But once again it's important to remember that AI isn't magic; these are the same sort of computations that happen when you query a database (search) and save a jpeg or mp3 (compression).

Scripting

The most important thing to keep in mind is that the amazing 'neural network' or what have you is running on a computer. That is, it is bounded to be no more powerful than a Turing Machine and, in particular, is almost always less powerful as a computing substrate than most programming languages. In principle, that Support Vector Machine or Random Forest could have been hand-coded. There is nothing special going on; in fact, many learning algorithms operate in what is essentially a propositional calculus, having no quantifiers. The models, being fixed, function exactly as scripts would.
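To make the "scripts" point concrete, here's a minimal sketch: a decision tree "learned" from data is, once fixed, nothing but ordinary if/else code. (The feature names and thresholds below are invented for illustration, not taken from any real trained model.)

```python
# A trained model is just a fixed program. A small decision tree, for instance,
# unrolls into plain if/else statements -- nothing a programmer couldn't have
# written by hand. (Thresholds here are made up, purely illustrative.)

def learned_tree(petal_length, petal_width):
    """A toy 'learned' classifier written out as an ordinary script."""
    if petal_length < 2.5:
        return "setosa"
    elif petal_width < 1.8:
        return "versicolor"
    else:
        return "virginica"

print(learned_tree(1.4, 0.2))  # setosa
```

The learner's only role was to pick the thresholds; the artifact it leaves behind is a fixed script like this one.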

Tarpits

Turing completeness is an attribute (for machines at least) such that, if you've attained it, nothing can 'think' things beyond you. Quoting Wikipedia: "In any Turing complete language, it is possible to write any computer program, so in a very rigorous sense nearly all programming languages are equally capable. Turing tarpits show that theoretical ability is not the same as usefulness in practice."

One can think of various machine learning algorithms in an analogous manner. For example, a neural network with one hidden layer is a universal approximator of continuous functions from one finite-dimensional space to another. But a shallow network dwells in the depths of the computational-learning equivalent of Turing's tarpit. The big deal about deep learning is more layers, which lead to large increases in expressivity, much like the difference between [Brainfuck](http://en.wikipedia.org/wiki/Brainfuck) and BASIC. Further, recent papers have found that shallow networks can in fact represent, with good fidelity, the same functions as deeper ones. This tells us that, if there's anything to be said about deep learning, it's that it is a way to imbue a problem with structure in such a way as to simplify search: nascent abstractions which improve search by biasing toward more promising paths.

Okay, so the important takeaways are:

1) Paying attention to complexity and, especially, to how quickly the underlying dynamics change is vital -- most modern algorithms, deep learning included, don't do well with rapidly changing problems with (or without) higher-order complexity.

2) Machine learning algorithms are really no more than Turing Machines, and often less.

3) A neural network that is a universal approximator is still not very effective, due to a lack of expressivity; the big deal of deep learning with respect to NNs can be viewed as giving neural nets (which represent programs as tables of numbers) better tools with which to program themselves -- like going from machine language to assembly language.

And so with 3) we hit the utility of machine learning: it is a particular form of auto-programming. Learned models are functions which compute maps from one space to another in such a way that distances and structures are preserved as closely as possible. What separates a good learner from a bad one is how complex the regularities it can identify are, and how liable it is to get stuck at local optima. Generalization comes from having these functions exploit structure in the problem so that future instances are correctly mapped.

You can imagine a map from pixel intensity values or waveforms to vectors representing words (just numbers!), or maps from sequences to sequences whose elements just happen to capture word senses and contexts: implicit but not deep meaning, though still enough to provide a great deal of utility. Since they do not require that a full table be memorized, these models can be viewed as computing a particular kind of compression. The compression represents understanding of the patterns in use, without caring for a deeper why. This friction underlies what people mean when they say AI has no true understanding. It does understand; its concerns are just very narrow.

The incredible philosophical consequences of learning as exactly a form of programming are to follow.

comic 1: http://tvtropes.org/pmwiki/pmwiki.php/Main/EnhanceButton source: http://www.phdcomics.com/comics.php?f=1156


Finally got around to a second and third read.
You must be rather steeped in the machine/deep learning community; it seems I only get exposed to the tiny fraction of material (decent/credible advances) that percolates through, and so haven't really suffered much of the pixie-dust effect that bedevils you.

RE protein folding: nature has of course had billions of years with which to conduct a search of structures across the protein landscape, conducting a search experiment every 20 minutes (in bacteria at least), multiplied by the number of individuals in the species. But that is just finding structures; raw chemistry folds them. And regarding folding, I think specific, discrete shortcuts are possible but (for what it's worth) probably not the NP shortcut you refer to. Sequences that are known to routinely form standard structural subunits such as beta-sheets or alpha-helices should allow shortcuts to the final structure. However, sequence only gives you the immediate protein structure, which is fine if that is the functional protein, but requires an additional level of calculation if it is only a subunit that must self-assemble with others to produce the final functional protein (e.g. many membrane channel proteins are comprised of 5, 7, or 9 identical protein subunits).

RE the overall weave: simply great work weaving together the concepts of machine learning, tarpits, compression, scripts, search, auto-programming, and mapping. Your linking together of these ideas provides a very useful framework to keep in mind when thinking about any of these topics. Also, your comment "They require harnessing real numbers and infinite precision measurements, with the latter running against QM intuitions" is something I've also been thinking about recently.

On to the next post!

I also appreciated the comment about comparisons of practical vs. theoretical limitations, and the thoughts on being able to do more, practically, with ML than with explicit programming alone.

From memory -- and I did read and re-read your comments a lot -- you never fully articulated the points you hint at here in the integrated-consciousness discussions I've seen you in. I would love to hear, and I'm sure many others would benefit also from, you expanding on these in a clear manner. What are the theoretical implications that we remain ignorant of? This is also the first time I've seen you raise the nature of time in the discussion -- surely Turing complete is Turing complete? Why does time -- faster or slower processing -- matter? I thought you maintained the position that consciousness is time-invariant, something I was disputing in those other posts, but your comment here seems to imply that is not the case. Feel free to reply, or not -- I'm sure we'll revisit at a later date regardless.

### Deen Abiola


Distilling meaning to a number between 0 and 65,535

What is a concept? What does a word mean? A good reader quickly learns that meaning does not come from a word but rather from the words around it, the words it tends to keep company with. That is the key motivation behind the Distributional Hypothesis:

> The basic idea of distributional semantics can be summed up in the so-called Distributional hypothesis: linguistic items with similar distributions have similar meanings.

There are many ways to leverage this; one of the oldest is Latent Semantic Indexing, where Singular Value Decomposition (SVD) is used to find the associations between words in similar contexts -- words that tend to fill the same idea-shaped hole. The problem is that it's slow: for my needs, anything slower than linear is almost always unacceptable.
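A minimal sketch of the SVD idea, on a made-up four-word, four-document count matrix (the tiny corpus and the two-dimensional cut are my own choices, purely illustrative):

```python
import numpy as np

# Toy latent semantic indexing: rows are words, columns are documents,
# entries are raw counts. A truncated SVD projects words into a low-rank
# space where words used in similar contexts land near each other.
words = ["car", "automobile", "flower", "petal"]
counts = np.array([
    [2, 3, 0, 0],   # car
    [1, 2, 0, 0],   # automobile
    [0, 0, 3, 1],   # flower
    [0, 0, 2, 2],   # petal
], dtype=float)

U, s, Vt = np.linalg.svd(counts, full_matrices=False)
word_vecs = U[:, :2] * s[:2]   # keep only the top 2 latent dimensions

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# "car" ends up far closer to "automobile" than to "flower"
print(cos(word_vecs[0], word_vecs[1]) > cos(word_vecs[0], word_vecs[2]))  # True
```

The slow part on real data is the SVD itself, which is what motivates the online alternative below.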

There's another idea, Random Indexing, which has the added benefit of being online: each new document or word does not start the learning process again from scratch. The idea is to keep extremely high-dimensional but sparse random vectors for each word. These word vectors are then used to update a context vector as the model grazes on various sentences. The detailed how of this is something I will save for another post, but the same kind of semantic indexing as SVD is achieved, at several orders of magnitude less cost. There are many ways I use this, including document similarity, summarization, paragraph segmentation and query expansion, but the simplest example is in finding similar words. Say I put in 'thus'; then, without explicit training, it knows similar words from having analyzed text, and I get a result like: "(thus, 1), (then, 0.95), (therefore, 0.94), (hence, 0.92)". This is useful when dealing with new jargon and you wish to know where to go next (i.e. interactively searching dozens of pages at a time).
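A toy sketch of the scheme as I understand it from the description above: sparse ternary index vectors per word, and context vectors updated online as sentences stream past. (The dimensions, window size and helper names are my own choices, not from any particular implementation.)

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, NONZERO = 1000, 10   # high-dimensional but very sparse index vectors

index_vectors = {}    # fixed random "fingerprint" per word
context_vectors = {}  # accumulated context per word -- updated online

def index_vector(word):
    """Lazily assign each word a sparse random +/-1 vector."""
    if word not in index_vectors:
        v = np.zeros(DIM)
        pos = rng.choice(DIM, NONZERO, replace=False)
        v[pos] = rng.choice([-1.0, 1.0], NONZERO)
        index_vectors[word] = v
    return index_vectors[word]

def observe(sentence, window=2):
    """Online update: each word's context vector absorbs the index
    vectors of its neighbors. No retraining from scratch."""
    toks = sentence.lower().split()
    for i, w in enumerate(toks):
        ctx = context_vectors.setdefault(w, np.zeros(DIM))
        for j in range(max(0, i - window), min(len(toks), i + window + 1)):
            if j != i:
                ctx += index_vector(toks[j])

observe("thus the theorem follows")
observe("hence the theorem follows")
# 'thus' and 'hence' now have highly similar context vectors,
# purely from having appeared in the same company.
```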

The problem is, I now have this 1000-dimensional vector and I'd like to find words which have similar meanings -- usages (or contexts) -- to this new word I've never seen before. Tricks like kd-trees are not going to help here. Random hyperplanes to the rescue.

The idea is a form of locality-sensitive hashing (another post) where similar things hash down to the same bucket. I generate a hash function for my vector by generating a random vector r, and a bit is set according to: if <r, v> >= 0 then 1 else 0. With, say, 16 of these, the probability of collision should also double as a sort of similarity function, and 16 of them means I can represent the hash as a single 16-bit number. So two things happen here. I store the semantic dictionary with a single 16-bit number as the key; words with similar contexts will tend to fall in the same bucket, and that single number represents a particular concept in my dictionary. Also, even with a linear search, I can use bit twiddling to get the Hamming distance of a 16- or 32-bit number much more quickly than calculating the cosine similarity of two extremely high-dimensional vectors.
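A minimal sketch of that hashing step (the dimensions, seed and helper names are mine; the post doesn't specify an implementation):

```python
import numpy as np

rng = np.random.default_rng(42)
DIM, BITS = 1000, 16
hyperplanes = rng.standard_normal((BITS, DIM))  # one random vector r per bit

def hash16(v):
    """Sign of <r, v> for each of 16 random hyperplanes -> one 16-bit key."""
    key = 0
    for b in (hyperplanes @ v >= 0):
        key = (key << 1) | int(b)
    return key

buckets = {}  # key -> list of words; similar vectors tend to collide
def insert(word, vec):
    buckets.setdefault(hash16(vec), []).append(word)

v = rng.standard_normal(DIM)
v2 = v + 0.01 * rng.standard_normal(DIM)  # a near-duplicate context vector
insert("thus", v)
insert("hence", v2)

print(0 <= hash16(v) < 2**16)  # True: the key fits in a 16-bit integer
```

Since the chance of any one bit differing is proportional to the angle between the two vectors, near-duplicates land in the same (or a Hamming-adjacent) bucket with high probability.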

Then, by scanning through and changing just one bit of the bitmap the word's concept vector hashes down to, I also get a pretty good neighborhood of words similar to my vector. I can then do the more expensive cosine similarity operation (with magnitudes pre-calculated) on this much-reduced space. My tests show a 95-99% reduction in the space searched; this acts as a sort of parameter-less approximate nearest neighbor (not all near neighbors are returned). Reducing the bits in the key, so there are more collisions, results in a more thorough but still efficient search. For example, using 8-bit keys results in still surprisingly sensible divisions -- I had to search only 10% of the space. This method allows a very quick use of context to get an idea of what a never-before-met word might mean -- almost like what humans do.
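The bit-flip probe can be sketched like so (toy bucket contents invented for illustration):

```python
# Given a 16-bit key, flipping each bit in turn enumerates every bucket within
# Hamming distance 1; candidates from those buckets are then re-ranked with the
# exact (expensive) cosine similarity on a much smaller set.

BITS = 16

def hamming(a, b):
    """Bit-twiddled Hamming distance between two small integer keys --
    far cheaper than a cosine over 1000-dimensional vectors."""
    return bin(a ^ b).count("1")

def probe_keys(key):
    """The key itself plus every key one bit-flip away."""
    return [key] + [key ^ (1 << i) for i in range(BITS)]

def candidates(key, buckets):
    out = []
    for k in probe_keys(key):
        out.extend(buckets.get(k, []))
    return out

buckets = {0b0000000000000001: ["thus", "hence"],
           0b0000000000000011: ["therefore"],
           0b1111000000000000: ["tokyo"]}
print(candidates(0b0000000000000001, buckets))
# ['thus', 'hence', 'therefore'] -- 'tokyo' is too many bit-flips away
```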

An awesome corollary is that I could take a document, reduce it to its key topic words, take their average and then hash that down to a single integer. Pretty nifty, eh? And now for a twist.

Imagine two entities that have taken this concept to its logical extreme. To communicate entire documents they spit out single numbers, with perfect extraction and hashing. Vectors are shared such that each number is decompressed to its topics, the topics automatically generated into full ideas and expanded out to trees. Communication is really dense, involving code numbers for shared references and compressed thought trees... While communicating with streams of numbers is not really tenable, I do think something like communicating thought trees is possible. More on that later.

## Appended

Examples below use 16 bit keys showing various clusters from: http://nplusonemag.com/issue-13/essays/stupidity-of-computers/ (I have vectors derived from all my papers and notes but it's easier to analyze single pieces of text)

[You can see the full list using an 8 bit key and a hamming distance of 1 at: http://sir-deenicus.github.io/home/rvec1.htm]

31990 -> ["situations"; "perceptions"; "our"; "ontologies"]
22775 -> ["pseudocode"; "algorithm"; "aggressive"]

From a computer Go article:

30924 -> ["rematch"; "loses"; "1965"]
2910 -> ["winner"; "second"; "favorite"; "event"]
52 -> ["tokyo"; "japanese"]

Sometimes the words are not synonyms, as in 22778 -> ["remained"; "meaningful"; "ambiguities"]

Excellent point. But assigning meaning is just a more general version of the ability to use context: http://ramscar.wordpress.com/2014/06/23/the-errors-in-my-answer-to-darwin/

With this algorithm you can actually count without a reference to glabr, and you can search for the word with the closest meaning and get an idea of what the word is*. Without any assignments, this algorithm can already use language in a non-trivial manner. We humans need to bootstrap from experience, but operating just from a graph built from context has much utility.

The point is that the meaning cannot be learned from the structure of the word itself. Maybe you can sometimes extract parts of speech but that's about it. Meaning is mostly from use and context.

See more here: http://en.wikipedia.org/wiki/Distributional_semantics. There is more to meaning than that, of course, but it's a solid basis; better theories will generalize Distributional Semantics rather than invalidate it.

(* In actuality you can start from zero, track contexts and know what words mean on a very basic level; getting to human-level understanding is more than just knowing which words have similar-shaped holes. Words are assigned to objects, but those assignments, like spelling, are mostly arbitrary. The bulk of the importance is in where a word is used. To repeat: one can imagine a form of intelligence that has no need for word assignment to perform high-level language tasks.)

### Deen Abiola


A lot of people think this is amusing, silly even; they couldn't be more wrong. I suspect sarcasm detection is pretty close to AI-complete. Solving it would require many powerful advances in natural language parsing.

Difficult sarcasm is usually global to the text (that is, you can't just look at a specific sentence; you have to be able to use the sentence structure to judge whether the divergence between this sentence and the mood of the text at large is legit)...

Or even more difficult - requires common sense and some level of cultural embeddedness (e.g. inside jokes, shared cultural assumptions).

Kind of like how proving a silly arithmetic conjecture could lead to revolutionary advances in mathematics.

The Secret Service wants software that detects social media sarcasm. "Our objective is to automate our social media monitoring process. Twitter is what we analyze. This is real time stream analysis. The ability to detect sarcasm and false positives is just one of 16 or 18 things we are looking at. We are looking for the ability to quantify our social media outreach." Some of the other things the agency wants are "the ability to identify social media influencers, analyze data streams in real time, access old Twitter data and use heat maps. And it wants the software to be compatible with Internet Explorer 8."
Think you're up to the job? You're probably not, but the Secret Service is accepting proposals.

And when other countries' Three-Letter Agencies catch on, you'll get the advanced self-referential game of Dear Internet Oracle, How Many Spooks Could Spooks Spook If Spooks Could Spook Spooks?

### Deen Abiola


The abstract of DeepMind's recent publication in Nature [2] on learning to play video games claims: "While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces.” It also claims to bridge "the divide between high-dimensional sensory inputs and actions.” Similarly, the first sentence of the abstract of the earlier tech report version [1] of the article [2] claims to "present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.”

However, the first such system [3] was created earlier at the Swiss AI Lab IDSIA, former affiliation of three authors of the Nature paper [2].

The system [3] indeed was able to "learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning” (quote from the abstract [2]), without any unsupervised pre-training. It was successfully applied to various problems such as video game-based race car driving from raw high-dimensional visual input streams.

It uses recent compressed recurrent neural networks [4] to deal with sequential video inputs in partially observable environments, while DeepMind's system [2] uses more limited feedforward networks for fully observable environments and other techniques from over two decades ago, namely, CNNs [5,6], experience replay [7], and temporal difference-based game playing like in the famous self-teaching backgammon player [8], which 20 years ago already achieved the level of human world champions (while the Nature paper [2] reports "more than 75% of the human score on more than half of the games”).

Neuroevolution also successfully learned to play Atari games [9].

The article [2] also claims "the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks”. Since other learning systems also can solve quite diverse tasks, this claim seems debatable at least.

Numerous additional relevant references can be found in Sec. 6 on "Deep Reinforcement Learning” in a recent survey [10]. A recent TED talk [11] suggests that the system [1,2] was a reason why Google bought DeepMind, indicating commercial relevance of this topic.

References

[1] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller. Playing Atari with Deep Reinforcement Learning. Tech Report, 19 Dec. 2013, http://arxiv.org/abs/1312.5602

[2] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis. Human-level control through deep reinforcement learning. Nature, vol. 518, pp. 529–533, 26 Feb. 2015.
http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html

[3] J. Koutnik, G. Cuccu, J. Schmidhuber, F. Gomez. Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning. In Proc. Genetic and Evolutionary Computation Conference (GECCO), Amsterdam, July 2013. http://people.idsia.ch/~juergen/gecco2013torcs.pdf
Overview: http://people.idsia.ch/~juergen/compressednetworksearch.html

[4] J. Koutnik, F. Gomez, J. Schmidhuber. Evolving Neural Networks in Compressed Weight Space. In Proc. Genetic and Evolutionary Computation Conference (GECCO-2010), Portland, 2010. http://people.idsia.ch/~juergen/gecco2010koutnik.pdf

[5] K. Fukushima (1979). Neural network model for a mechanism of pattern recognition unaffected by shift in position - Neocognitron. Trans. IECE, J62-A(10):658–665.

[6] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel. Back-propagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, 1989

[7] L. Lin. Reinforcement Learning for Robots Using Neural Networks. PhD thesis, Carnegie Mellon University, Pittsburgh, 1993.

[8]  G. Tesauro. TD-gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215–219, 1994.

[9] M. Hausknecht, J. Lehman, R. Miikkulainen, P. Stone. A Neuroevolution Approach to General Atari Game Playing. IEEE Transactions on Computational Intelligence and AI in Games, 16 Dec. 2013.

[10] J. Schmidhuber. Deep Learning in Neural Networks: An Overview. Neural Networks, vol. 61, 85-117, 2015 (888 references, published online in 2014). http://people.idsia.ch/~juergen/deep-learning-overview.html

[11] L. Page. Where’s Google going next? Transcript of TED event, 2014

#machinelearning
#artificialintelligence
#computervision
#deeplearning
http://people.idsia.ch/~juergen/naturedeepmind.html﻿

Thanks! But this was Juergen Schmidhuber's post. He is a pioneer in the field and you can understand him being miffed by the lack of acknowledgement of prior art.

I found this annoying myself -- the misleading statements that, for example, essentially ignored the really impressive early work by Gerald Tesauro on TD-Gammon.

### Deen Abiola


Best of all (to me at least) is the future section. I really liked that, and I'm glad he stressed the power and importance of interface design. I spend a lot of time thinking about that, and it is in many ways much harder than implementing ML algorithms -- it's less well traveled and much harder to tame. Ari Gesher of Palantir, Licklider and Vinge are the only others I know who have truly emphasized the potential there.

Quoting Vinge, 1993:

> Note that I am not proposing that AI research be ignored or less funded. What goes on with AI will often have applications in IA, and vice versa. I am suggesting that we recognize that in network and interface research there is something as profound (and potential wild) as Artificial Intelligence. With that insight, we may see projects that are not as directly applicable as conventional interface and network design work, but which serve to advance us toward the Singularity along the IA path.

All signs seem to point to this being the path we're barreling down. (Note: singularity here just boils down to awesome problem-solving ability far, far beyond what we have today. There's nothing magical or rapturous about it. I'm betting most people will barely even notice.)

Understanding Deep Neural Networks

Over the last few years, Deep Neural Networks (DNNs) have been increasingly used in various applications, from speech recognition (http://goo.gl/vVVCPT) to computer vision and image classification (http://goo.gl/1pNzn4).

But what exactly are DNNs, and how do they work?

A former Google intern has written a blog post that uses dimensionality reduction and interactive visualizations to help understand what exactly is happening “under the hood”, giving one an intuition for the internal operations of DNNs.
In a previous post, we explored techniques for visualizing high-dimensional data. Trying to visualize high dimensional data is, by itself, very interesting, but my real goal is something else. I think these techniques form a set of basic building blocks to try and understand machine learning, ...

Thanks for the clarification. Clearly I've over-weighted the "dimensionality reducing" properties of the SOM and inferred too widely. Indeed, a SOM is both a clustering method and a projection.

### Deen Abiola


#Learning and Computation

Not only are we constantly modeling the world mathematically, animal brains are also constantly proving things about it. The proofs are consistent even if not necessarily sound or complete. I've not seen it spelled out anywhere before, but it's an incredible consequence of the fact that, in essence, machine learning is a method to search for programs. For some people this might all read as obvious, and that is a good thing, I think. For me, this is a good case study for my philosophy of not keeping things compartmentalized but instead trying to think in terms of bridges from many things to many other things.

In a previous post (Machine Learning is not Magic), I tried to lay the ground for this essay by building some intuitions about learning algorithms: emphasizing them as functions or maps and computations, talking about compression and generalization, and identifying causality vs. being satisfied with correlations (not in so many words; that deserves its own essay). This essay will more closely link learning with search, logic and computation. It's also worthwhile to spend some time thinking about why machine learning is so desirable: namely, because most of the useful things we do are not conscious.

In fact, one of the most interesting things to come out of attempts to build AI is what's known as Moravec's Paradox: all the simple stuff we take for granted -- vision, speech, walking etc. -- was expected to be easy, while chess, math and puzzles were thought of as the harder things to implement. It turned out to be the opposite, not just because concepts like common sense are less available and so harder, but because they take a great deal more computational resources while being of higher algorithmic complexity. It's easy to forget how amazingly impressive what virtually every human baby manages in a relatively short amount of time is: learning the incredibly complex sequences and patterns behind language, as well as the states of mind they communicate; learning vision, sounds, walking; building a physics model of the world; modeling other humans -- all with minimal guidance (and without being conscious too... mostly... does it get in the way?†)!

The bulk of our intelligence is not in our rudimentary puzzle-solving abilities; trying to build an AI has taught us that most of the things we thought required minimal intelligence are actually, in an objective sense, some of the most complex things we do. Moravec also suggested that evolution never got around to optimizing (or saw little benefit in) the deeper logical reasoning computers find easy. That makes sense, but I think there's another, more important issue. Reasoning with exact numbers in the kind of extremely large, error-free scratch space computers enjoy requires a great deal of energy, while evolution's priority was minimizing energy usage above all else. So while computers are excellent at forward reasoning and search, humans are (for now) better at pattern matching and recognition, since the latter does not require maintaining a massive state space.

There's another consequence of the paradox. It's common to think that insofar as automation is a problem for jobs, more education will fix it. But Moravec's paradox suggests that the trade jobs, and the jobs requiring the kind of higher-order pattern matching that so sets us apart, will be safest. It's the management jobs, the rote jobs and the entry-level jobs across large swathes of industry that will fall: in law, in science, in computing and, yes, in manufacturing. As long as your job doesn't present constant novel problems that are difficult to automate, you're a target. This means it's the graduates, often young, who will be the most affected. Fixing the problem will require something more significant than pushing everyone through university.

## Iceberg

Suppose I wanted to write a program that could recognize cats or faces or whatever; writing such a program by hand is effectively impossible because a lot of human intelligence is not surfaced. Much of what we do is not made available to us -- it's unconscious -- hence, for example, we don't know how it is we tell one person's voice from another's, even given only a very short and distorted listening time. This means that even if there were a simple program behind it, it would not be possible for us to write it ourselves.

So what machine learning really is, is auto-programming. Indeed, much of how machine learning is currently used -- batch training -- is exactly like scripting, in the sense that you run something external to your code that is fixed at development time and hopefully increases flexibility and the appearance of intelligence at runtime. In other words, the end products should be considered little, limited programs and not something magical. Limited in what sense, you might ask? The biggest one: the programs most models learn, even the infamous feed-forward deep learning ones, are not Turing complete. This is mostly a good thing; it makes search, and reasoning about them, much easier.

But how do we find these programs automatically? The first thing to recognize is that the space of programs is mind-bogglingly huge; you need some way to be guided quickly to the specific program you're looking for. The way this is typically done is to feed in lots of data, paired with some algorithm that uses errors to decide on a direction. The data help divide and categorize the space of examples while also constraining the search to a particular locale of program space. The model that gets output, together with the ML algorithm, specifies a program that can hopefully accomplish the task we set it. Barring exceptions like genetic programming -- and, I'd argue, decision trees and forests -- the model itself is not the program. You can't run the model; it's a set of parameters for the algorithm. The algorithm is a function which, when run on input data, gives us the desired output most of the time. The end result is that, thanks to machine learning, programs get written for tasks which have insufficient conscious availability to have allowed us to write them ourselves.

It's evident then, though not often remarked upon, that the output of a learning algorithm is a program (or at least parameters for one). This means that learning is a particular kind of search: a search whose end point should be a program that is also a compression of the visited samples, a.k.a. generalization. Some people make a distinction between search and optimization, but I view this as artificial, serving only to bury opportunities to make connections under trivial details. Optimization might typically happen on differentiable manifolds -- guided by gradients -- but all that is, is a very specific kind of search: more principled but also more limited (relatively speaking; the space is still huge, and basically all state-of-the-art work is done there) and likely not how the brain implements its specifics.
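
To see how error signals turn blind search into guided search, here is a minimal toy sketch of my own (not from the post): blind random search and gradient descent are both searching the same one-parameter program space, the latter just uses the errors to pick a direction.

```python
# Toy example: find the program y = w*x that fits data generated by y = 3x.
import random

data = [(x, 3.0 * x) for x in range(10)]          # target "program": y = 3x

def loss(w):
    return sum((w * x - y) ** 2 for x, y in data)

# 1) Blind search: propose random parameters, keep the best one seen.
random.seed(0)
best = min((random.uniform(-10, 10) for _ in range(10000)), key=loss)

# 2) Gradient descent: let the error signal decide the direction.
w = 0.0
for _ in range(100):
    grad = sum(2 * (w * x - y) * x for x, y in data)
    w -= 0.001 * grad

assert abs(best - 3.0) < 0.1 and abs(w - 3.0) < 0.01
```

Both searches land near w = 3, but gradient descent gets there in a hundred cheap steps rather than ten thousand blind guesses.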

## Models as Programs?

For something like a neural network or a support vector machine, it's not immediately obvious how training is like searching for a program. Consider a neural network: for our purposes, you can think of it as tables of numbers representing the connections between the nodes of a network. Learning involves tweaking those numbers so that when an input vector, a column of data, is fed in, gates (functions like f(x) = log(1 + exp x)) at each node turn on or off in such a way that input/output pairs similar to those seen at training time are generated. It's a function, a functional program. But how is training like search? And what role do the parameters play?
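
To make the picture concrete, here is a tiny forward pass, a sketch with made-up weights, where the whole "network" really is just tables of numbers plus the gate function above:

```python
# Minimal illustration: a network as weight tables plus gates; evaluating
# it is just running a small functional program on an input vector.
# All weights are invented for illustration.
import math

def softplus(x):                       # the gate f(x) = log(1 + exp(x))
    return math.log(1.0 + math.exp(x))

W1 = [[0.5, -1.0], [1.5, 0.3]]         # connections: input -> hidden
W2 = [1.0, -2.0]                       # connections: hidden -> output

def network(v):
    hidden = [softplus(sum(w * x for w, x in zip(row, v))) for row in W1]
    return sum(w * h for w, h in zip(W2, hidden))

print(network([1.0, 2.0]))
```

Training would mean tweaking the entries of W1 and W2 until the input/output behavior matches the data; the structure of the program never changes, only the numbers in the tables.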

The output parameters are not the program; instead, the parameters tune the model's evaluation algorithm so that it becomes a function specialized to the training data. What I mean by specialization is hard to find a metaphor for, and the best I can point to comes from computer science itself. The way a general learning algorithm is specialized to a particular function by its parameters can be loosely compared to the relationship between a regular expression engine and a state machine: the input data tune the parameters into a program in a similar way that a particular pattern specializes general regular expression matching into a specific automaton (a Levenshtein automaton, say, or a deterministic finite automaton). That's a rough analogy, and if it makes no sense, an example based on decision trees might prove clearer.
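
One way to picture this specialization in code (my own loose illustration; the name classify_iris and its weights are invented): a generic evaluation routine plus concrete parameters yields one fixed, specialized function, much as a regex engine plus a pattern yields one specific matcher.

```python
# Generic evaluator + learned parameters -> one specialized program.
def specialize(weights, bias):
    def program(inputs):
        # the evaluation algorithm is the same for every parameter setting
        return sum(w * x for w, x in zip(weights, inputs)) + bias
    return program

# Hypothetical "learned" values; the closure is the specialized function.
classify_iris = specialize([0.4, -1.2], 0.5)
print(classify_iris([2.0, 1.0]))       # 0.4*2 - 1.2*1 + 0.5 = 0.1
```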

Decision trees are more obviously programs. They're programs like: if x = 3 or y < 2 then if x = 0 then Car else ... etc.; they serve to partition the problem space, and the search is guided by trying to maximize the information gain of each if/else split in the data. A bunch of nested if-then-else statements is readily seen as a program, and so the search for the best tree is a search, guided by concepts from information theory, for the program which best partitions and explains the observed data. I'd argue that decision trees, especially because of their interpretability, are closer (on a conceptual level) to how brains represent knowledge than neural networks are. (Genetic programming, meanwhile, is without any ambiguity a search for programs.)
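
The claim can be made literal: a hand-written toy tree (labels and thresholds invented for illustration) really is nothing but nested if/else statements partitioning the input space.

```python
# A decision tree written out as the program it is.
def classify(x, y):
    if x == 3 or y < 2:
        if x == 0:
            return "Car"
        return "Truck"
    return "Bicycle"

print(classify(0, 1))   # falls through both splits to "Car"
```

Tree learners like ID3 or CART search for the nested structure itself, choosing each split to maximize information gain over the training data.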

## Programs and proofs

Training a model is exactly a search for a program. One of the most profound concepts in computer science is the Curry-Howard isomorphism, which links the type systems of computer programs to proofs in logic. The link is not a trivial one; it's rich and deep and has led to powerful theorem provers, but here it suffices to draw the correspondence from types to theorems and from programs to proofs (and from evaluation to proof normalization). What's interesting here is that a machine learning algorithm searches for programs, and programs are proofs, hence the act of learning is equivalent, in some non-trivial sense, to searching for a proof. But a proof of what? It's not clear exactly, since nothing like types are specified. That doesn't matter much, since types can be inferred, and often the systems in use are simply typed and decidable. The types are not the interesting part, being usually very general [e.g. List of <string|number> implies Member of Set {a,b,c}]; it's the proofs that are of interest. And since the types are so general, the proofs will not necessarily cover the phenomena even when the theorem is satisfied. But embodied in the programs will be partial proofs about the set of observations visited.
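
The correspondence is concrete enough to demonstrate in a proof assistant. A minimal sketch in Lean (my illustration, not from the original post): writing a program that inhabits a type is, literally, proving the corresponding theorem.

```lean
-- Curry–Howard in miniature: a program of type A → A is a proof that
-- A implies A.
theorem identity (A : Prop) : A → A :=
  fun a => a            -- the identity program is the proof

-- Function application corresponds to modus ponens
-- (implication elimination).
theorem modus_ponens (A B : Prop) : A → (A → B) → B :=
  fun a f => f a
```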

And as plenty of often-cruel experiments have taught us, even things we take for granted†††, like vision and hearing, are actually mostly learned. So there's this beautiful observation: in learning to see, every child is learning/searching for a program, building a proof about the behavior of light and objects out in the world. The same is true for hearing, language, walking and so on, not to mention higher-order learning. And having learned things, animals end up carrying lots of little programs, evaluating the world and thus carrying out proofs on every observation (e.g. modus ponens), while also (for the ones that can learn) constantly searching for new proofs.

## Rounding it all Up

* Learning algorithms are searching for functions
* For Turing machines, functions are programs and programs are proofs
* Learning algorithms are searching for proofs
* Learned things are programs hence proofs
* Running a program is, roughly, expanding and going through a proof.
* Every learning animal is constantly searching for proofs and proving things about the world
* For each program to be useful, it should have less descriptive complexity than a lookup table matching every observation to an output. As such, the program can be viewed as having compressed the table of observations††.
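
The compression bullet can be made concrete with a toy sketch of my own: the same observations stored as a lookup table versus generated by a short program. The program's description is far smaller than the table it reproduces (Kepler's laws versus Brahe's tables).

```python
# The same data: a raw table of observations vs. a 19-character program.
table = {x: 3 * x + 1 for x in range(1000)}     # the "observations"

program = "lambda x: 3 * x + 1"                 # a short description
f = eval(program)

assert all(f(x) == y for x, y in table.items()) # reproduces every entry
assert len(program) < len(str(table))           # and compresses the table
```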

## Evolution as Learning

There is a way in which this gets really interesting when applied to evolution -- not just the program aspect but especially the learning angle. I'll assume for now that the Church-Turing thesis is valid -- not that the universe is a computer, but that everything is computing itself. So a rock could be replaced, without any loss, by a Turing machine that computes a rock as a byproduct or output.

Now consider that evolution is a learning algorithm. This is well known and no longer controversial; you can look at it like this. Imagine traits for some population: number of legs, color of fur, height, scales or fur, etc. Each organism is placed in an environment. Over time, the ones that die shift the distribution of traits, and how this happens is very much like an algorithm computing probability distributions over traits, similar to what happens in machine learning (in the manner in which the distribution is balanced/evolved, you can link it to Bayesian learning or, for sexual reproduction, to game playing with weighted majority). Mutation is not the main story; it's random and serves mainly to introduce the raw material for selection. The traits are far less specific, of course, corresponding more to genetic units/alleles, but the key idea remains.
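
A toy simulation of that argument (the trait, fitness function and all numbers are my own assumptions): selection alone, with no mutation at all, acts as an update rule that shifts the population's trait distribution toward what the environment favors, much like a learning algorithm updating a distribution.

```python
# Selection as a learning rule over a trait distribution.
import random

random.seed(1)
population = [random.uniform(0, 10) for _ in range(1000)]   # some trait

def fitness(trait):
    return -abs(trait - 7.0)          # environment favors trait near 7

for generation in range(20):
    # the less fit half "dies"; survivors replicate to refill the population
    survivors = sorted(population, key=fitness)[500:]
    population = survivors + [random.choice(survivors) for _ in range(500)]

mean = sum(population) / len(population)
assert abs(mean - 7.0) < 0.5          # distribution has concentrated near 7
```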

So we have evolution as search/learning. But what is being learned? I don't know, but it is interesting to look at the driving goal of evolution at all scales: replication. From simple RNA viruses all the way up to humans, replicating something close to the current arrangement of atoms is a (the?) major driver.

Then, with evolution as learning, and assuming everything can be swapped out without loss for an equivalent Turing machine (or less), each organism is effectively the result of a search for a proof about what it takes to maintain and replicate states in this universe!

But is assuming a Turing machine too strong? Okay, suppose the Church-Turing thesis is wrong. A Turing machine can still efficiently simulate, with arbitrary precision, all classical systems. Allow extensions (e.g. the quantum Turing machine), and take into account that everything a Turing machine can do, the universe can do as well, while the opposite is not true: Turing machines are a subset of the universe. So this argument should still hold, its only weakness being the possibility that the essence of evolution and cognition requires not just incomputable things, but incomputable things that are physically harnessable only by brains and the building blocks of living things. That is a very strong assumption which piles on unnecessary complexity, violating Occam's razor.

The common counterargument -- that we once analogized minds to clockwork and are merely repeating the mistake with computers -- fails once you take into account that clockwork is not universal in the way a universal Turing machine is. Here is what Gödel had to say:

>"It may also be shown that a function which is computable ['reckonable'] in one of the systems Si, or even in a system of transfinite type, is already computable [reckonable] in S1. Thus the concept 'computable' ['reckonable'] is in a certain definite sense 'absolute', while practically all other familiar metamathematical concepts (e.g. provable, definable, etc.) depend quite essentially on the system to which they are defined"

________________

† Consciousness

I'm not sure why people put self-consciousness up as some kind of pinnacle. I suppose that, as a seemingly unique human trait, it's placed on a pedestal and worshipped as something that separates us from all other kinds of intelligences. Yet if you look closer, it's not difficult to see that consciousness is very over-rated.

1. As I pointed out above, some of the most incredible feats of intelligence are performed by barely or not-yet-conscious human babies.
2. A lot of wisdom, Zen quotes and quotes on mastery are about shutting down your conscious mind. A novice dancer or martial artist is conscious of all their movements, which are jerky and awkward, whereas graceful and fluid movements are only doable once the knowledge has been transferred to the unconscious. This is true not just for automatic movements but for the highest levels of creativity too.
3. The state of flow, where conscious awareness is dimmed and the boundary between self and task is lessened, results in the highest levels of performance.
4. People often speak of sleeping on a problem, not thinking about it consciously, and having the solution come to them seemingly spontaneously (the brilliant Poincaré wrote extensively of this as his method).

It seems that all the effortless, graceful and masterful acts are done by the non-conscious part of the brain.

Consciousness is also limited. A large distributed entity would forgo it due to latency, and an entity wanting to maintain parallel levels of awareness and threads of cognition will likely find the serial aspect of consciousness too limiting. An entity does not need to be conscious to let the goals of others affect which actions it selects, nor to model itself against the background of some environment. Consciousness is a tool only: a bookkeeping, blame-assigning, goal-maintaining tool that has somehow morphed into a pointy-haired boss seeking to claim credit for everything that happens in the brain. In fact, I have trouble imagining a selfish, jealous, spiteful and petty non-conscious entity.

†† Unreasonable Effectiveness of Mathematics

Assuming the Church-Turing thesis lets all sorts of beautiful links between evolution, learning, logic, proofs and physics fall out. With compression as generalization, we also get links between entropy, Kolmogorov complexity and learning. The physics comes from the link between programs and cartesian closed or monoidal categories. As time goes on, I'm starting to find the Effectiveness of Mathematics not just very reasonable but also almost... tautological.

†††  You can see the experiments on the poor kittens here: [Development of the Brain depends on the Visual Environment](http://www.nature.com/nature/journal/v228/n5270/abs/228477a0.html), <https://computervisionblog.wordpress.com/2013/06/01/cats-and-vision-is-vision-acquired-or-innate/>.

The wikipedia article on the critical period hypothesis is also worth a look: <http://en.wikipedia.org/wiki/Critical_period_hypothesis#Deaf_and_feral_children>

But my favorite example is that susceptibility to the Müller-Lyer illusion is sensitive to whether someone grew up in a city or a desert.

>It has been shown that perception of the Müller-Lyer illusion varies across cultures and age groups.

> Segall, Campbell and Herskovitz[4] compared susceptibility to four different visual illusions in three population samples of Caucasians, twelve of Africans, and one from the Philippines. For the Müller-Lyer illusion, the mean fractional misperception of the length of the line segments varied from 1.4% to 20.3%. The three European-derived samples were the three most susceptible samples, while the San foragers of the Kalahari desert were the least susceptible.

(I like the image below because of its clockwork-like aspect but cannot find any attribution. My least favorite part about the internet.)

Very nice summary.  I'm in basic agreement.  I want to clear up one thing.  I don't think that natural selection is in any way random; just the opposite.  What I was saying (still am) is that if mutation is the only source of new form then the evolution of new form will be modeled by models based on the idea of randomness.  (BTW, I have, in my own thinking, banned "random" as a concept--so far utterly without any functional loss.)﻿

### Deen Abiola

Shared publicly  -


Our social networks are broken. Here's how to fix them.

1.

You can't really blame us for building Facebook the way we have. By “we” I mean we billion-plus Facebook users, because of course we are the ones who built Facebook. Zuckerberg Inc. might take all the credit (and profit) for Facebook's success, but all the content and contacts on Facebook -- you know, the part of the service we users actually find valuable -- were produced, curated, and distributed by us: by you, and me, and our vast network of friends. So you can't blame us for how things turned out. We really had no idea what we were doing when we built this thing. None of us had ever built a network this big and important before. The digital age is still mostly uncharted territory.

To be fair, we've done a genuinely impressive job given what we had to work with. Facebook is already the digital home to a significant fraction of the global human population. Whatever you think of the service, its size is nothing to scoff at. The population of Facebook users today is about the same as the global human population just 200 years ago. Human communities of this scale are more than just rare: they are historically unprecedented. We have accomplished something truly amazing. Good work, people. We have every right to be proud of ourselves.

But pride shouldn't prevent us from being honest about these things we build -- it shouldn't make us complacent, or turn us blind to the flaws in our creation. Our digital social networks are broken. They don't work the way we had hoped they would; they don't work for us. This problem isn't unique to Facebook, so throwing stones at only the biggest of silicon giants won't solve it. The problem is with the way we are thinking about the task of social networking itself. To use a very American analogy, our existing social networking tools suffer from the equivalent of a transmission failure: we can get the engine running, but we are struggling to put that power to work. We see the potential of the internet, but we're at a loss as to how to direct all this activity into genuinely positive social change. What little social organization the internet has made possible is fleeting and unreliable, more likely to raise money for potato salad than it is to confront (much less solve) any serious social problem. Arguably, our biggest coordinated online success to date has been the Ice Bucket Challenge; even if we grant the meme has had a positive impact, what change to the social order has come with it? What new infrastructure or social conscience was left in its wake? In terms of social utility, the IBC was like a twitching finger from an otherwise comatose patient: it may give us some hope, but who knows what else.

Of course, many opportunists have found clever ways to capitalize on the existing network structure, and a few have made a lot of money in the process. The economy is certainly not blind to the latent power of the internet. But as a rule, these digital opportunities are leveraged for purely private gain. The best the public can hope for is that successful digital businesses will turn out cheap services that we can shackle ourselves to like domesticated animals. There have been enough major successes of this model that in the year 2014 we’ve come to accept our fate as unpaid digital domestic labor. There is no longer any hope of using the internet to reorganize the people from post-capitalist consumers into fully empowered digital citizens, because it has become clear that our digital tools have simply been used to standardize the post-capitalist consumer lifestyle on a global scale.

We need to realize that half a million human bodies walking down a street with cell phones and hand-written signs still have more political power than Facebook groups or Twitter streams ten million strong. We still live in an age where an afternoon walk with a few like-minded people can outrun the social influence of a digital collective an order of magnitude larger. You might have expected a digital population to overwhelm our naked ancestors, but if anything the opposite has proven true. When TwitchPlaysPokemon rallied 1.16 million people to beat Pokemon in 16 days, everyone who participated recognized that we accomplished an amazing thing. But we also had to acknowledge, without any cognitive dissonance, that each of us could have beaten the game ourselves in about a day and a half.

Okay, okay, so our social networks are broken, and we haven’t even begun to count the ways. There are niche digital communities accomplishing amazing feats of cooperation, but all of us with all our gadgets are not yet as strong as some of us plain old boring people, doing the things we've been doing for centuries like voting and assembling. Why not?

2.

Our social networks were originally designed to function like an interactive digital rolodex: a system for managing and engaging a list of social and professional contacts. To someone thinking about life in the digital age around the turn of the century, the idea made a lot of sense: how else would we find our friends in a place as wild and disorganized as the internet without a book of contacts? Social networks today vary only slightly from this original design. Some networks emphasize interpersonal relationships and others emphasize content engagement, but the differences in networking tools ultimately have little to do with the liveliness of the communities they serve. Users are willing to put up with a lot of UI nonsense in order to engage with the communities they care about. A passionate community might thrive on a poorly designed network, and a high-end design might fail to attract any community at all. From the user's point of view, these communities are attractive for two reasons: their members and their interests. Who is on this network, and what are they talking about?

So if we're being honest with ourselves, then this is the unflinching truth: the growth of social networking happened despite the tools we've built, not because of them. We are social creatures; we want to share ourselves with each other. In the age of industry and capital, satisfying this need to share had become almost impossible. When our digital tools offered the promise of overcoming our alienation and reconnecting with each other, we jumped at the opportunity. We became refugees fleeing failed states on wifi. As digital immigrants we have suffered through the privacy violations, UI disasters, and the untold hells of political irrelevance that are common to all immigrant stories. And we've done it for nothing more than the promise to connect with each other, if only to share a picture of our pets. The idea that any one company or service would take credit for the epic digital migration we've collectively accomplished over the last decade is ludicrous; we're the species who figured out how to communicate through tin cans and string. The growth of social networking is what happens when you give the internet to enough huddled masses yearning to breathe free.

But it turns out that we don't use our social networks like an interactive rolodex. In fact, the relationships we used to have with the people in our rolodex tend to be one-dimensional and alienating in exactly the way we came to these digital spaces to avoid. Instead of a list-management tool, people's online behavior appears to require something more like a living room, or (depending on where and how you live) your porch or kitchen table or stoop: a space to visit with each other; where we can showcase our triumphs, complain about our problems, share our hopes, gossip about our friends, and discuss the happenings of the day; where the atmosphere is jovial and hospitable and supportive. In short, we are trying to build a home, in the midst of a community of homes, together with the people we want in our lives. In some homes you can talk about politics or religion, in others you can't; in some you'll be subject to hundreds of photos of vacations and babies and pets, and in others you'll find the accumulated markings and detritus of a real life lived. A relationship planner with multimedia messaging is nice, but what we really want are digital living spaces where we can be together as a community.

It shouldn’t surprise anyone that we’d approach the task of community-building in this way: by carving out spaces for ourselves and our friends around focal objects. This is how we humans have always developed our communities: not by managing lists of people through which to channel our communication, but by organizing spaces that can accommodate all of our activity. Communication is but a tool in service of that cooperation. I’m not just talking about a “digital commons”, in the sense of a public space for cooperating beyond the purview of one’s home. We don’t even have our own spaces managed right; we are still shackled to these monolithic centralized services that own and manage our relationships according to their grand designs. In such an atmosphere we have trouble motivating even our friends and family to action, much less something as quixotic and ethereal as “the public”. Expecting these digital communities to engage seriously in politics is like expecting toddlers to engage seriously in politics.

So let’s be concrete. Now that we’ve all gathered around the digital hearth and can hear each other speak, let’s think again about what is missing from this generation of social networking tools, and what we need to see in the next.

3.

Today’s social networks are centralized. Our homes are decentralized.

The kind of activity and commotion that’s common in one person’s home might be intolerable in another’s. That’s fine; we’re different people entertaining different communities, and we build our homes to accommodate our specific community needs. Any social network must be as sensitive to these variations as we are. A network under central management is forced to ignore the differences between us and homogenize our interactions to maintain order at large scales. This leveling of variation is a necessary feature of any centrally-managed network, and it can have a number of unfortunate consequences for the communities we can form on them. Some standardization is good, even healthy. But the details matter, and the wrong standards can destroy a community. The larger the network, the more likely any central management will simply be insensitive to community-level needs.

Putting our data in the hands of a central network authority also makes it all the more likely that the information will be released without our consent, either deliberately or accidentally, and this alone can be a deciding factor in whether and to what extent a person will participate in a network. But the problem with centralized network management is even more fundamental than privacy. When we share something with a friend on a centralized network, we're also implicating the central management in that exchange. It is because central management plays a role in every network exchange that they are in a position to violate our privacy in the first place. This ubiquitous presence can become a dominant influence on our interaction, making our relationships develop according to the needs and interests of the network managers, which may diverge arbitrarily from our own. The effect is a little like trying to manage your home with a state official looking over your shoulder and archiving all your activities, filtering not just for legality but also for targeted advertising. As digital immigrants we've come to accept that we're being overseen, but we should also realize that these are not the conditions under which we do our best work. When unknown third parties with unknown interests are not only present in our interactions, but can radically disrupt the structure of those relationships without notice, then we're not very likely to devote serious time and effort to cultivating digital spaces to meet our cooperative needs. As a result, our networks remain flimsy, makeshift, liable to blow away at any second -- these are no conditions in which to build a home that we can do anything meaningful in.

Building a community of homes means building spaces that can self-organize in response to the needs of our various overlapping communities without oversight and central control. There is no center to our vast network of friends; there is no vantage point from which to micromanage our relationships but our own. The point is not that our networks cannot be managed; the point is that we need to be the management. We need a network where our data remains ours, and where the terms and conditions of our social lives are set exclusively by us.

With today’s networks, my identity is an option in a pull-down menu. In our homes, we develop who we are through what we do and who we do it with.

A rolodex is a centralized leveling tool: a person's critical details are made to fit on a small standardized card in a roll of functionally identical cards. It is left to the user to construct the network from these details: to evaluate the strength of the relationship, the relative importance it might have for our projects, and the way it fits into the larger fabric of our social lives. Today's social network continues the tradition, encouraging people to fit cookie-cutter identities to maximize advertising revenue. No consideration is paid to how these constraints on identity formation might impact our ability to form and sustain a vibrant community. This helps to explain why people mostly use online social networking to manage relationships they began offline, where they have more direct control over their identity and reputation. Exclusively online relationships typically take much longer to develop familiarity and trust, simply because we are witness to substantively less activity from the other. Talking to grandma online is easy enough because who we are and what we mean has already been established elsewhere. The same familiarity isn't available generally: a random internet person could be anyone and want anything. This cannot be the basis for social cooperation.

Functional social networks develop through the construction of differentiated reputations. We each have different strengths and weaknesses, and by working together we learn how we each fit into all our overlapping projects. By being forced into pre-fab identities, we lose the ability to track how we might best cooperate, or how our identities evolve as a result of what we've done together. Instead, we're left to cobble together a pale imitation of reputation from what little data we have access to, in terms of likes, shares, and followers (or their equivalents), as if the quality and utility of our work depended only on the number of people who saw it. There's nothing wrong with followers and likes per se, but when these are our only resources for organizing, we end up with bizarre distortions of a healthy community. In such an environment we tend to overvalue the activity of celebrities and become suspicious of everyone else, simply because we have no other common resources for making finer distinctions. None of these tools reveal how our networks might be put towards our various social ends, because ultimately it is not our ends these networks serve.

Building a community of homes requires building identities with reputations we control through the work we do and the communities we engage with. When we control our identities, and when the feedback we receive reflects the value of the work that we do, then we will finally feel the responsibility and commitment that only a functional community can generate. We need a social network that can provide the tools for managing our reputations across the many diverse communities we engage with, that understands how these reputations change with context, and how our collective strengths can be stitched together to compose a much greater whole.

Today, a successful social networking campaign achieves virality. A successful home achieves results.

We have no other tools for judging the success of our activity online except in terms of raw audience size. In this degenerate capacity we can conceive of no other strategic goal but virality: spreading a message quickly and widely. The goal of virality admits up front that we are powerless to effect change ourselves. Instead, the best we can hope for is that prolific exposure through synchronized spamming will bring the message to the feet of the people with the resources to do something about it. As digital immigrants our own voices do not carry far enough, and we are in no position to do anything about the cries of our neighbors. So instead, we’ve relegated ourselves to being the messengers in our own social networks, delivering notes between the already-powerful and pretending to live in the same communities as them.

In a functioning community, the strength of the signal tends to correlate with the urgency of the message. The messages that spread the fastest are usually the biggest emergencies requiring the most immediate attention. The messages that spread the widest tend to be the information people need for coordinating their activities across great distances. But most of our cooperation is local and not terribly urgent, and therefore doesn’t depend on raw signal strength. Viral appeals to our collective attention cannot be the only tool in our kit for getting our messages across.

Although our attention is among the most precious of our limited resources, we nevertheless produce it continuously and nearly without effort. We eagerly give it away to the things we find interesting and worthwhile without expecting anything in return. It is by paying attention that we imbue our world with structure and meaning; this is ultimately what we are all here to do. Our collective attention is distributed across an enormous variety of projects and communities, and that distribution reflects our self-organized division of labor and value: what we consider worthwhile enough to spend our time doing. We use that distribution to decide where we will spend our attention next, and through this collective management of attention we are capable of organizing all of our productive social systems. So when we engage each other on existing social networks, when we each like and share according to our own interests and tastes, we expect that the resulting community will reflect some consensus of our participation-- that the network will be “better”, according to the standards of “better” as indicated by our contributions.

Existing social networks don’t function that way at all: our engagement is harvested for advertisers, and whatever feedback it generates is lost in the noise of the greater economy. There’s no reason whatsoever to hope that our networks will develop in response to our activity and values, because we know that they are responding to other values and using our activity for other purposes. They’ve hijacked the spaces we’ve selected for our homes and they are exploiting us for all we’re willing to give. Meanwhile, all the attention we pay goes to waste, utterly failing to secure the expected return on investment, having been traded away for dollars of ad space. We still dismiss hashtag campaigns as slacktivism, as if our impotence were a character flaw. The truth is we’re doing the best with the tools we’re given, and ultimately these social networks were never built to work that way in the first place.

Building a community of homes, one that really works for all of us, requires a whole new approach to the economy of attention, one that understands how the organization of the system emerges from the activity of its many distinct parts. We need new tools for networking, not just to make connections but to hook the right communities up in the right way so that we can all accomplish what none of us could alone.

The digital homes described above do not yet exist; we have yet to build them. As digital immigrants we’ve been tossed between halfway homes for years, so the significance of this challenge might not have fully registered. Partial solutions exist, but only piecemeal and scattershot across the available networks; no solution has met these problems with the elegance and comprehensiveness necessary to bring social networking into a new era.

But that’s about to change. People are obviously thinking about the next generation of social networking, and for the last few weeks I’ve been working with a team of developers on a distributed networking service built on the blockchain, one that bakes security, reputation, and community management directly into the basic feature set. We’re set to announce within the next few days, when I hope to tell you much more about the details of the project. Until then I hope the comments here give some insight into our philosophical approach to the design.

#attentioneconomy   #digitalpolitics   #socialnetworking

// You can engage a public GDocs version of this essay here: http://goo.gl/hBtQHi﻿

### Deen Abiola

Shared publicly  -

There really is something to be said about achieving genuine augmented recall. Imagine putting a lot of time and energy into communicating non-trivial thoughts and then having them disappear into some practically unreachable place - never to be encountered again; with the only sign of having gone through the process being perhaps some slight unconscious rearrangement of your mind.

That sad state of being is actually how we've lived for thousands of years, with usually little difference between the things we've done and those we haven't. It seems silly that this continues within the digital arena. So much emptied time.

For example.

*

Google+ comments take a bit of Rube Goldberg maneuvering to retrieve. The API does not provide a means to easily list all the comments of a user (nor does Takeout). I assume this might be because comments are stored in a manner that does not easily link them with users. And although there is an easy way to retrieve your posts, I expect I'm not alone in believing that comments carry as much if not more value than posts. I've made almost 200 posts but nearly 2000 comments; in terms of time commitment my posts are almost noise.

Happily, I was able to retrieve the comments by: 1) searching my name using the API, which returns a list of activities (posts), and then 2) transforming each activity by taking its id, fetching the list of comments for that post, and filtering out all the comments which are not mine. This method is not perfect, however: there are some posts returned by search where the API retrieves no comments by me, even though when I manually check those posts I do in fact have comments on them. There were about 200 of these (I assume some kind of history error). Nonetheless, I retain 90% of my comments, so that's OK, I guess. (Also, I'm not bothered enough to check, but I think search keeps returning next-page tokens even when the results are empty.)
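For concreteness, the pipeline above can be sketched in Python (the working version is in F#, linked at the end of this post). `search` and `get_comments` here are hypothetical stand-ins for the Google+ API calls, and the page cap guards against the empty-results-with-token quirk just mentioned:

```python
def retrieve_my_comments(search, get_comments, author, max_pages=50):
    """Collect one author's comments across all activities a search returns.

    Hypothetical signatures: search(page_token) -> (activity_ids, next_token),
    get_comments(activity_id) -> list of (author_name, comment_text).
    """
    mine, token = [], None
    for _ in range(max_pages):   # cap pages: search may hand out tokens forever
        activity_ids, token = search(token)
        if not activity_ids:     # empty page despite a token: stop early
            break
        for activity_id in activity_ids:
            mine.extend(text for who, text in get_comments(activity_id)
                        if who == author)
        if token is None:
            break
    return mine
```

The filtering by author name is the same trick as the F# version: the API gives comments per post, so the only way to get "my" comments is to fetch everything and keep the ones whose author matches.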

*

Why would I want my comments? Well, for one, the effort of creating them is less wasted if they do not automatically evaporate once written. But the real value is, in order: 1) As a source for a remembrance agent. 4) For search. 100) Archival purposes.

Expanding:

100) Who knows how long G+ will persist.

4) Although G+ allows you to search comments, the interface is terrible. Searching requires either opening G+ (an expensive action), or, even if it's already open, waiting through a noticeable delay before the search box displays low-information-density results. You then still have to do a quadratic search for your comment through all the posts' comments. Super inefficient, and it violates Zipf's principle of least effort: http://en.wikipedia.org/wiki/Principle_of_least_effort.

But most of all 1). A remembrance agent is a tool which uses your current information context to suggest relevant related material. Thus far my sources are Evernote clippings, exomind clippings, PDFs, the title of every website I've ever visited, automatic topic extraction of certain sites (a level of detail between titles and clippings), my notes and G+ comments. In the specific case of comments, I can cease repeating myself and automatically have on hand anything I've ever said related to the current topic of interest.
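The retrieval core of such an agent can be quite simple. A minimal Python sketch, assuming the sources have already been collected into plain-text documents; a real version would use proper indexing (TF-IDF weighting, stemming, etc.) rather than raw bag-of-words cosine similarity:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def suggest(context, documents, k=3):
    """Return the k stored documents most similar to the current context."""
    ctx = Counter(context.lower().split())
    ranked = sorted(documents,
                    key=lambda d: cosine(ctx, Counter(d.lower().split())),
                    reverse=True)
    return ranked[:k]
```

The point is the workflow, not the ranking function: whatever you are currently reading or writing becomes the query, and your accumulated clippings, notes, and comments become the corpus.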

It still amazes me that in this day and age people offer only poorly recalled details about something they've read online. This tells me that our intellect-augmenting artifacts could be much improved.

**********

The code to get the comments relies on the awesome F# Data Type Provider for json inference: https://gist.github.com/sir-deenicus/9a61fd067a50c59e62bd

// search for my name, then for each activity fetch its comments and keep only mine
searchGplus "20" None ["query", "Deen+Abiola"]
|> Seq.toArray
|> Array.collect (fst4 >> getComments >> Seq.toArray >> Array.filter (fst4 >> strcontains "Deen Abiola"))

:  Lots of good stuff here, and quite a bit I agree with.

I recently noted that I'd love to have a browser-oriented search feature which searched recently visited Web pages for keywords as I constantly find myself at the end of paths that I've forgotten the origins of.

There's a longer set of frustrations and suggestions for future Web clients as well:  http://redd.it/256lxu

On G+ search:  it's fast and comprehensive, but the results-set presentation stinks:  you cannot expand/collapse all content, it's an 'endless page' that cannot be back-navigated to.  And the syntax is pathetic:  no user/author specification, comment/post distinction, community subsetting (without first navigating to that community), date ranges, and more.

If I could marry the speed and comprehensiveness of G+ search with the syntax of Reddit I'd be in heaven.  Or find something else to bitch about.

One practice I've adopted is creating posts based on my more significant comments.  And, generally, relocating those to Reddit (which has much better post search features).

I'd really like to have a browser that's based on a bibliographic tool (zotero, calibre), has messaging capabilities (usenet-ish), RSS, and some degree of note-taking.  Actually, KDE's Kontact PIM might be a related foundation.

I actually don't want something that remembers everything (there's too much drek), but key pieces would be nice.

Chrome history search is indeed pathetic (I've got an extension and it still sucks).  Funny how they're encouraging you to put yet more information online...

:  your "funnel" / storage model is fairly close to what I'm starting to do, though only imperfectly (this response is getting composed in an external editor session only so I can view comments on the thread and respond to them, but not being saved).  For my primary "thoughts and ideas" site on reddit, I have a local "Drafts" directory with ... about 30 articles I'm working on to some extent or another.  That constitutes a bit of a local archive.

For the larger project that's based around, I've got both a Notes directory with ~150 entries ranging from 4 lines to 1000 (mean: 100, median: 51).  And a box of offline index cards (100s) for bibliographic and other notes.  It's interesting the affordances each approach offers.  And for a digression, see Ted Nelson's rant on how word-processing "cut and paste" got it wrong ...

If I could force myself to always use the local funnel that would be great, and one of the benefits of tools such as Usenet or mailing lists is that content was also presumed locally originated such that the store was inherently created or collectable.

A failure of online messaging / discussion systems is that they're designed with the interests (and constraints) of the provider in mind, not of authors and users.

: Damnit, now you've triggered another half-memory of decay states that I can't recall the source of...﻿

### Deen Abiola

Shared publicly  -

Text Dump; Cooperative Learning Preview. Note: automatically generated. I originally intended to title this "guest post", but if I claim this set of algorithms is an extension of a mind then surely third person would be inappropriate!? The task: find an informative path from "amplituhedron" to "dirichlet distribution".

*****************
I think Projective space because in the approach, the [[On shell and off shell|on-shell]] scattering process "tree" is described by a positive [[Grassmannian]], a structure in [[algebraic geometry]] analogous to a [[convex polytope]], that generalizes the idea of a [[simplex]] in [[projective space]]. | 0.3

*****************
I think Principle of locality because amplituhedron theory challenges the notion that space-time [[Principle of locality|locality]] and [[unitarity (physics)|unitarity]] are necessary components of a model of particle interactions. | 0.29

*****************
I think Subatomic particles because when the volume of the amplituhedron is calculated in the [[1/N expansion|planar limit]] of  [[N=4_super_Yang-Mills |''N''&nbsp;=&nbsp;4 ''D''&nbsp;=&nbsp;4 supersymmetric Yang–Mills theory]], it describes the [[scattering amplitude]]s of [[subatomic particles]]. | 0.29

*****************
I think Toy theory because since the planar limit of the ''N''&nbsp;=&nbsp;4 supersymmetric Yang–Mills theory is a [[toy theory]] that does not describe the real world, the relevance of this technique for more realistic quantum field theories is currently unknown, but it provides promising directions for research into theories about the real world. | 0.27

Trying: Projective space,
Trying: On shell and off shell,
Trying: Polytope,
Trying: Geometry,
Trying: Convex polytope,
Trying: Simplex,
Trying: Algebraic geometry,
Trying: Principle of locality,
Trying: Unitarity (physics),
Trying: Subatomic particles,
Trying: 1/N expansion,
Trying: Toy theory,
Choices: (Simplex, 0.735697974634171)

I pick "Simplex"

*****Trying: Simplex*****

*****************
I think Categorical distribution because in probability theory, the points of the standard ''n''-simplex in $(n+1)$-space are the space of possible parameters (probabilities) of the [[categorical distribution]] on ''n''+1 possible outcomes. | 0.37

*****************
I think Hasse diagram because the [[Hasse diagram]] of the face lattice of an ''n''-simplex is isomorphic to the graph of the (''n''+1)-[[hypercube]]'s edges, with the hypercube's vertices mapping to each of the ''n''-simplex's elements, including the entire simplex and the null polytope as the extreme points of the lattice (mapped to two opposite vertices on the hypercube). | 0.3

*****************
I think Convex hull because
Specifically, a '''''k''-simplex''' is a ''k''-dimensional [[polytope]] which is the [[convex hull]] of its ''k''&nbsp;+&nbsp;1 [[Vertex (geometry)|vertices]]. | 0.3

*****************
I think Polytope because
Specifically, a '''''k''-simplex''' is a ''k''-dimensional [[polytope]] which is the [[convex hull]] of its ''k''&nbsp;+&nbsp;1 [[Vertex (geometry)|vertices]]. | 0.3

*****************
I think Determinant because where each column of the ''n''&nbsp;×&nbsp;''n'' [[determinant]] is the difference between the [[vector (geometry)|vectors]] representing two vertices. | 0.3

*****************
I think Pythagorean theorem because It can be calculated from the first property using the [[Pythagorean theorem]] (choose any of the two square roots), and so the second vector can be completed: | 0.3

*****************
I think Probability theory because especially in numerical applications of [[probability theory]] a [[Graphical projection|projection]] onto the standard simplex is of interest. | 0.29

*****************
I think Graphical projection because especially in numerical applications of [[probability theory]] a [[Graphical projection|projection]] onto the standard simplex is of interest. | 0.29

*****************
I think Generalized barycentric coordinates because these are known as [[generalized barycentric coordinates]], and express every polytope as the ''image'' of a simplex: $\Delta^{n-1} \twoheadrightarrow P$. | 0.29

*****************
I think Dot product because The second property means the [[dot product]] between any pair of the vectors is $-1/n$. | 0.28

Trying: Categorical distribution,
Trying: Hasse diagram,
Trying: Convex hull,
Trying: Vertex (geometry),
Trying: Polytope,
Trying: Vector (geometry),
Trying: Determinant,
Trying: Pythagorean theorem,
Trying: Probability theory,
Trying: Graphical projection,
Trying: Generalized barycentric coordinates,
Trying: Dot product,
Choices: (Categorical distribution, 0.936616301012039)

I pick "Categorical distribution"

*****Trying: Categorical distribution*****

*****************
I think Statistical inference because this distribution plays an important role in [[hierarchical Bayesian model]]s, because when doing [[statistical inference|inference]] over such models using methods such as [[Gibbs sampling]] or [[variational Bayes]], Dirichlet prior distributions are often marginalized out. | 0.44

*****************
I think Variational Bayes because this distribution plays an important role in [[hierarchical Bayesian model]]s, because when doing [[statistical inference|inference]] over such models using methods such as [[Gibbs sampling]] or [[variational Bayes]], Dirichlet prior distributions are often marginalized out. | 0.44

*****************
I think Prior distribution because   This means that in a model consisting of a data point having a categorical distribution with unknown parameter vector '''p''', and (in standard Bayesian style) we choose to treat this parameter as a [[random variable]] and give it a [[prior distribution]] defined using a [[Dirichlet distribution]], then the [[posterior distribution]] of the parameter, after incorporating the knowledge gained from the observed data, is also a Dirichlet. | 0.44

*****************
I think Mixture model because  [[mixture model]]s and models including mixture components), the Dirichlet distributions are often "collapsed out" ([[marginal distribution|marginalized out]]) of the network, which introduces dependencies among the various categorical nodes dependent on a given prior (specifically, their [[joint distribution]] is a [[Dirichlet-multinomial distribution]]). | 0.41

*****************
I think Uniform distribution (continuous) because   This reflects the fact that a Dirichlet distribution with $\boldsymbol\alpha = (1,1,\ldots)$ has a completely flat shape — essentially, a [[uniform distribution (continuous)|uniform distribution]] over the [[simplex]] of possible values of '''p'''. | 0.41

*****************
I think Binomial distribution because   The [[joint distribution]] of the same variables with the same Dirichlet-multinomial distribution has two different forms depending on whether it is characterized as a distribution whose domain is over individual categorical nodes or over multinomial-style counts of nodes in each particular category (similar to the distinction between a set of [[Bernoulli distribution|Bernoulli-distributed]] nodes and a single [[binomial distribution|binomial-distributed]] node). | 0.41

*****************
I think Joint distribution because   The [[joint distribution]] of the same variables with the same Dirichlet-multinomial distribution has two different forms depending on whether it is characterized as a distribution whose domain is over individual categorical nodes or over multinomial-style counts of nodes in each particular category (similar to the distinction between a set of [[Bernoulli distribution|Bernoulli-distributed]] nodes and a single [[binomial distribution|binomial-distributed]] node). | 0.41

*****************
I think Multinomial distribution because </ref> This imprecise usage stems from the fact that it is sometimes convenient to express the outcome of a categorical distribution as a "1-of-K" vector (a vector with one element containing a 1 and all other elements containing a 0) rather than as an integer in the range 1 to ''K''; in this form, a categorical distribution is equivalent to a multinomial distribution for a single observation (see below). | 0.4

*****************
I think Posterior predictive distribution because the [[posterior predictive distribution]] of a new observation in the above model is the distribution that a new observation $\tilde{x}$ would take given the set $\mathbb{X}$ of ''N'' categorical observations. | 0.4

*****************
I think Gibbs sampling because   For example, in a [[Dirichlet-multinomial distribution]], which arises commonly in natural language processing models (although not usually with this name) as a result of [[collapsed Gibbs sampling]] where [[Dirichlet distribution]]s are collapsed out of a [[Hierarchical Bayesian model]], it is very important to distinguish categorical from multinomial. | 0.4

*****************
I think Dirichlet distribution because   For example, in a [[Dirichlet-multinomial distribution]], which arises commonly in natural language processing models (although not usually with this name) as a result of [[collapsed Gibbs sampling]] where [[Dirichlet distribution]]s are collapsed out of a [[Hierarchical Bayesian model]], it is very important to distinguish categorical from multinomial. | 0.4

*****************
I think Collapsed Gibbs sampling because   For example, in a [[Dirichlet-multinomial distribution]], which arises commonly in natural language processing models (although not usually with this name) as a result of [[collapsed Gibbs sampling]] where [[Dirichlet distribution]]s are collapsed out of a [[Hierarchical Bayesian model]], it is very important to distinguish categorical from multinomial. | 0.4

Trying: Statistical inference,
Trying: Variational Bayes,
Trying: Prior distribution,
Trying: Mixture model,
Trying: Uniform distribution (continuous),
Trying: Binomial distribution,
Trying: Joint distribution,
Trying: Multinomial distribution,
Trying: Posterior predictive distribution,
Trying: Gibbs sampling,
Trying: Dirichlet distribution,
Trying: Collapsed Gibbs sampling,
Choices: (Dirichlet distribution, 1.94965859296322)

I pick "Dirichlet distribution"

*****Trying: Dirichlet distribution*****

stopping here﻿
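The trace above amounts to a greedy walk over article links: score every candidate link out of the current article, pick the best, and repeat until the target is reached. A minimal sketch, where `get_scored_links` is a hypothetical stand-in for the sentence-scoring step (the "I think X because ... | 0.3" lines), and the aggregation of scores across repeated mentions is elided:

```python
def greedy_path(start, target, get_scored_links, max_steps=20):
    """Follow the highest-scoring link at each article until target is hit.

    get_scored_links(article) -> list of (linked_article, score) pairs.
    Returns the path of articles visited, or None on a dead end / step cap.
    """
    path, current = [start], start
    for _ in range(max_steps):
        if current == target:
            return path
        links = get_scored_links(current)
        if not links:
            return None                        # dead end: no outgoing links
        current = max(links, key=lambda pair: pair[1])[0]
        path.append(current)
    return path if current == target else None
```

Greedy selection is the simplest policy; a beam over the top few choices, or backtracking when a branch dies, would make the walk less brittle at the cost of more page fetches.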

I'd say always better to be working on it, but I'm guilty of the same﻿
People
In his circles
345 people
Have him in circles
4,711 people
Work
Skills
Information Synthesist
Story
Tagline
For me, building software is like sculpting. I know what is there but I just need to get rid of all the annoying rock that is in the way
Introduction
I like trying to write

I post now mostly as a duplicated devlog for a project of mine whose goal is an intelligence-amplification tool, inspired by the visions of Engelbart, Vannevar Bush and Licklider. I am, in order of skill, interested in:
1. Functional Programming
2. Machine Learning
3. Artificial Intelligence
4. Mathematics
5. Computation Theory
6. Complexity Theory
7. Bioinformatics
8. Physics
9. Neurobiology
I'm also super interested in sustainable energy, synthetic biology and the use of technology to improve human living.

I believe the proper way to understand quantum mechanics is in terms of Bayesian probability theory, and that the many-worlds interpretation is the way it applies to the universe physically. Still trying to find a philosophically synergistic combo.

I also do bballing and bboying/breaking/"breakdance".

I have some "hippie" beliefs, like: dolphins are persons. All dolphins, whales, great apes, elephants and pigs should not be eaten, murdered or kept in captivity. I would really like to see the results of giving dolphins an appropriate interface to internet access.

Spent some time solving bioinformatics problems on Rosalind. It's a Project Euler for bioinformatics. Try it out if you enjoy algorithms and want to get some idea of biotech: http://rosalind.info/users/deen.abiola/

Favourite Books: Chronicles of Amber, Schild's Ladder, Diaspora, Permutation City, Blindsight, Ventus, The Peace War, Marooned in Realtime, A Fire Upon the Deep, Accelerando, Deathgate Cycle, MythAdventures, A Wizard of Earthsea, Tawny Man Trilogy, The Mallorean, The Riftwar Cycle and Harry Potter
Basic Information
Gender
Male