Russ Abbott
10,733 followers
Applying CS concepts to questions in philosophy.

Post has shared content
This is terrific.
This is one of the most gameable things I have ever seen.

Not only could this be a trap, it could theoretically be an entire biome.

https://www.youtube.com/watch?v=My4RA5I0FKs

Post has attachment
“In retrospect Sandy Hook marked the end of the US gun control debate,” Dan Hodges, a British journalist, wrote in a post on Twitter two years ago, referring to the 2012 attack that killed 20 young students at an elementary school in Connecticut. “Once America decided killing children was bearable, it was over.”

Via +Craig Froehle via +Kee Hinckley

Post has attachment
This graph shows X's progress with a fixed alpha parameter of 0.5. In the previous trial (https://plus.google.com/+RussAbbott1/posts/BtUtmsj3RB5), alpha started out high and then decreased as training progressed. For this problem a fixed alpha of 0.5 is much better.

Alpha is the learning rate. It controls how much to weigh new information vs. old information. A high alpha says that new information should be given more weight than old information, even though the old information may be based on thousands of games and the new information is just the current game.

Perhaps the reason the current game counts as much as all past games is that the current game is played given the information already drawn from past games. So the current game generates information in the context of a now better player.
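The post describes alpha only in words. A minimal sketch, assuming the usual incremental update (the post's actual code isn't shown): each new sample moves the estimate a fraction alpha of the way toward it. With alpha = 0.5 the newest sample ends up with weight 0.5, the same as everything before it combined, which matches the observation that the current game counts as much as all past games. The function names are hypothetical.

```python
def blend(old_estimate: float, new_sample: float, alpha: float) -> float:
    """Move the estimate a fraction alpha of the way toward the new sample.

    Equivalent to (1 - alpha) * old_estimate + alpha * new_sample.
    """
    return old_estimate + alpha * (new_sample - old_estimate)


def sample_weights(n_samples: int, alpha: float) -> list:
    """Weight each of n_samples carries in the final estimate, oldest first.

    The k-th most recent sample (k = 0 is the newest) gets alpha * (1 - alpha) ** k;
    whatever weight is left over stays on the starting estimate.
    """
    return [alpha * (1 - alpha) ** (n_samples - 1 - i) for i in range(n_samples)]


if __name__ == "__main__":
    estimate = 0.0
    for sample in [100.0, 100.0, 0.0, 100.0]:
        estimate = blend(estimate, sample, alpha=0.5)
    print(estimate)                      # 68.75
    print(sample_weights(4, alpha=0.5))  # [0.0625, 0.125, 0.25, 0.5]
```

With a smaller alpha the weights decay more slowly, so older games keep more influence; a decreasing alpha, as in the previous trial, shifts weight toward the past as training goes on.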

After a bit more experimenting it seems that 0.5 is the best constant alpha for this problem. Higher or lower is worse.

Post has attachment
This is a continuation of my previous post (https://goo.gl/fhLLXr) on Reinforcement Learning for tic-tac-toe.

This is a graph of the running average of the X player's score. The red line averages the most recent 250 games; the blue line averages the most recent 2,500 games. As you can see, the score is converging to 50, which means X won half its games (a win scores 100) and tied the other half (a tie scores 0). It's interesting to see the acceleration in learning at about 3,000 games.
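A minimal sketch of how curves like the red and blue lines could be computed, assuming each game's score is recorded as a number (100 for a win and 0 for a tie, per the post; how losses were scored isn't stated and isn't needed here). The function name and the alternating win/tie data are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def running_average(scores: Iterable[float], window: int) -> Iterator[float]:
    """After each game, yield the mean score of the most recent `window` games.

    Until `window` games have been played, the mean covers all games so far.
    """
    recent = deque(maxlen=window)  # oldest entries fall off automatically
    total = 0.0
    for score in scores:
        if len(recent) == window:
            total -= recent[0]  # this score is about to be evicted by append()
        recent.append(score)
        total += score
        yield total / len(recent)


if __name__ == "__main__":
    scores = [100.0, 0.0] * 3000  # hypothetical: X alternately wins and ties
    red = list(running_average(scores, 250))    # most recent 250 games
    blue = list(running_average(scores, 2500))  # most recent 2,500 games
    print(red[-1], blue[-1])  # 50.0 50.0
```

The longer window smooths out more noise but reacts more slowly, which is why a jump in learning shows up in the 250-game line first.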

A follow-up is here: https://plus.google.com/+RussAbbott1/posts/4iKZj1u41Ek.

A first reinforcement learning exercise.

As I mentioned, I'm impressed enough with reinforcement learning that I decided to teach a class in it this Fall. OpenAI has what it calls a Gym for playing with reinforcement learning examples. It's free and open source, and one writes Python programs to use it. Since I've never used it, and since it's been a long time since I've written any Python, I decided to start getting used to it. (I think I mentioned this in an earlier post about sys-admin hell.)

In any event, an initial learning example now works. It's a joint effort by me and +Luis Fisher, one of our MS students.

Before getting into the program itself, I want to mention my surprise at how much Python has changed since I last used it. I remember it as a simple, intuitive, easy-to-use language. The language has morphed into a monster. It feels as if it was extended and extended until it included everything anyone liked from other languages. All the features are stuck together in a way that appears superficially consistent but has no underlying language principles. In some ways that's nice, but in other ways it's overpowering. That's quite a contrast with Haskell, which is much simpler and sparer. For example, I found it quite difficult to get around in the documentation. It's not that there isn't enough documentation; it's that it's hard to find things. In one case I wanted to look up the join() function. The documentation's search system didn't help; I found it by doing a Google search for "Python join".

As for the program itself, it plays tic-tac-toe. I know, it's a trivial game, but an interesting first challenge. Reinforcement learning can be done without neural nets, so to avoid having to work with two new technologies at once, that's how we did it. Let me show you one of the interesting games. In the following, the RL system plays O and an ad hoc bit of code plays X. That code doesn't try to be perfect; it just wins when possible and blocks when necessary.
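The post describes the X opponent only as code that wins when possible and blocks when necessary. A minimal sketch under that description; what it does otherwise isn't stated, so this version assumes it picks a random blank cell. Cells are numbered 0..8 row by row, as in the printed boards. The function names are hypothetical.

```python
import random
from typing import Optional, Sequence

# Every three-in-a-row, with cells numbered 0..8 row by row.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def completing_cell(board: Sequence[str], mark: str) -> Optional[int]:
    """Return a blank cell that would give `mark` three in a row, or None."""
    for line in LINES:
        values = [board[i] for i in line]
        if values.count(mark) == 2 and values.count(" ") == 1:
            return line[values.index(" ")]
    return None


def heuristic_move(board: Sequence[str], me: str = "X",
                   rng: Optional[random.Random] = None) -> int:
    """Win if possible, otherwise block the opponent, otherwise move at random.

    The random fallback is an assumption; the post doesn't say what X does then.
    """
    if rng is None:
        rng = random.Random()
    opponent = "O" if me == "X" else "X"
    for mark in (me, opponent):  # look for a win first, then for a block
        cell = completing_cell(board, mark)
        if cell is not None:
            return cell
    blanks = [i for i, v in enumerate(board) if v == " "]
    return rng.choice(blanks)
```

Trying its own win before checking for a block is what keeps this opponent from passing up a winning move to defend.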

During training, a player makes a move in one of the cells, given the contents of the other cells. The cells are numbered 0 .. 8. The state is represented simply as an array of X, O, or blank entries. The player is not told that it's playing tic-tac-toe, that the board is a square, or anything else. It just moves, is given feedback after each move, and updates its estimates of the moves based on that feedback. All it knows is that the state is an array as described and that its job is to select a number 0 .. 8 as a move. It's not told, for example, that it must select a number that corresponds to a blank cell.
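The post says the learner updates its estimates from feedback after each move and uses a learning rate alpha, but doesn't show its update rule. A minimal sketch, assuming standard tabular Q-learning (one possible choice, not necessarily the one used here): the table is keyed by (state, move), the state is the raw 9-entry array, and nothing tells the learner which moves are legal. The discount factor gamma and all names are assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple

State = Tuple[str, ...]  # nine entries, each "X", "O", or " "
ACTIONS = range(9)       # all the learner knows: a move is a number 0..8


def new_q_table() -> DefaultDict[Tuple[State, int], float]:
    """Q-table in which unseen (state, move) pairs start at 0.0."""
    return defaultdict(float)


def q_update(q, state: State, action: int, reward: float, next_state: State,
             done: bool, alpha: float, gamma: float = 1.0) -> None:
    """One tabular Q-learning step.

    The target is the immediate reward plus the discounted best Q-value in the
    next state O faces; a finished game has no future value.
    """
    # q.get avoids inserting entries for next-state moves we only looked at.
    future = 0.0 if done else max(q.get((next_state, a), 0.0) for a in ACTIONS)
    target = reward + gamma * future
    q[(state, action)] += alpha * (target - q[(state, action)])
```

Because illegal moves are just other entries in the table, a move into an occupied cell only learns to look bad through the negative reward it receives.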

Here's a trace of a game. It was played after about 50,000 training games.

Only O's moves are shown. For each move, you see:

a) The Q-values, which indicate the RL system's estimate of the value of the move based on feedback from previous games.

b) The selected move along with the RL parameter values.

c) The board.

The game starts with X making a move, in 3.

The Q-values are shown for the move that O is about to make. They show that cell 0 has an estimated value of 14.6, cell 1 has an estimated value of 0.8, and cell 2 has an estimated value of 6.5. Cell 3 has an estimated value of -74.7 because it is already taken; in previous games, selecting a cell that was already taken resulted in a large negative reward. And so on. Cell 6 has the highest value, 33.9, so O moves in cell 6. Then X (foolishly) replies in cell 0.

Q-values: {0: 14.6, 1: 0.8, 2: 6.5, 3: -74.7, 4: 5.5, 5: -1.5, 6: 33.9, 7: 2.5, 8: 2.2}
O {'move': 6, 'alpha': 0.0, 'eps': 0.1, 'reward': -1}
[['X' ' ' ' ']
['X' ' ' ' ']
['O' ' ' ' ']]
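O's choice here is simply "take the cell with the highest Q-value". The traces also show 'eps': 0.1; a common reading, assumed here because the post doesn't define it, is epsilon-greedy selection, where a random cell is chosen with probability eps so the player keeps exploring. A minimal sketch using O's first set of Q-values from the trace above; the function name is hypothetical.

```python
import random
from typing import Dict, Optional


def epsilon_greedy(q_values: Dict[int, float], eps: float,
                   rng: Optional[random.Random] = None) -> int:
    """With probability eps pick any cell at random (explore);
    otherwise pick the cell with the highest Q-value (exploit)."""
    if rng is None:
        rng = random.Random()
    if rng.random() < eps:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


if __name__ == "__main__":
    # O's Q-values before its first move in the trace.
    first_move = {0: 14.6, 1: 0.8, 2: 6.5, 3: -74.7, 4: 5.5,
                  5: -1.5, 6: 33.9, 7: 2.5, 8: 2.2}
    print(epsilon_greedy(first_move, eps=0.0))  # 6
```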

It's now O's turn to move. Based on the Q-values, you can see that the best move for O is cell 7, which it takes. The fact that cell 7 has a Q-value of 79.1 means that O estimates this to be a good move. (The Q-values range from -100 to +100.) X then takes cell 8 to block O.

Q-values: {0: -97.0, 1: -9.3, 2: -31.9, 3: -85.2, 4: -1.6, 5: -1.5, 6: -62.5, 7: 79.1, 8: 0.5}
O {'move': 7, 'alpha': 0.0, 'eps': 0.1, 'reward': -1}
[['X' ' ' ' ']
['X' ' ' ' ']
['O' 'O' 'X']]

O's best move now is cell 4, with a Q-value of 89. You can see that after this move O has guaranteed a win. (In fact, O's win was already assured once X replied in cell 0: O's move in cell 7 forced X to block in cell 8, leaving cell 4 open for O.) X blocks in cell 1.
Q-values: {0: -51.6, 1: -95.0, 2: -72.3, 3: -57.0, 4: 89.0, 5: -63.3, 6: -46.0, 7: -45.0, 8: -42.9}
O {'move': 4, 'alpha': 0.0, 'eps': 0.1, 'reward': -1}
[['X' 'X' ' ']
['X' 'O' ' ']
['O' 'O' 'X']]
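One way to check that O's move in cell 4 guarantees a win is to list the blank cells that would complete three O's in a row. A small sketch (the helper name is hypothetical; cells are numbered 0..8 row by row, as in the printed boards). The board below is the position right after O takes cell 4, before X's reply: O threatens two cells, and X can block only one of them.

```python
from typing import Sequence, Set

# Every three-in-a-row, with cells numbered 0..8 row by row.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winning_cells(board: Sequence[str], mark: str) -> Set[int]:
    """All blank cells that would complete three in a row for `mark`."""
    cells = set()
    for line in LINES:
        values = [board[i] for i in line]
        if values.count(mark) == 2 and values.count(" ") == 1:
            cells.add(line[values.index(" ")])
    return cells


if __name__ == "__main__":
    board = ["X", " ", " ",
             "X", "O", " ",
             "O", "O", "X"]
    print(sorted(winning_cells(board, "O")))  # [1, 2]
    print(winning_cells(board, "X"))          # set()
```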

O now wins with cell 2, which has a Q-value of 100. Since this move wins for O, the reward is 100. All previous rewards were -1, since those moves scored no points.
Q-values: {0: -62.3, 1: -30.8, 2: 100.0, 3: -1.2, 4: 0, 5: 0, 6: -5.2, 7: 0, 8: -10.2}
O {'move': 2, 'alpha': 0.0, 'eps': 0.1, 'reward': 100}
[['X' 'X' 'O']
['X' 'O' ' ']
['O' 'O' 'X']]
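The rewards in these traces follow a simple scheme: +100 for a winning move, -1 for any other move, and a large negative reward for picking a cell that is already taken. A minimal sketch of that scheme; the exact penalty for a taken cell isn't given in the post, so -100 is an assumption, as is treating a move that ends in a tie like any other non-winning move. The names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Every three-in-a-row, with cells numbered 0..8 row by row.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]
ILLEGAL_MOVE_PENALTY = -100.0  # assumption: the post only says "a large negative reward"


def is_win(board: Sequence[str], mark: str) -> bool:
    """True if `mark` occupies all three cells of some line."""
    return any(all(board[i] == mark for i in line) for line in LINES)


def reward_for(board: Sequence[str], move: int, mark: str) -> Tuple[List[str], float]:
    """Apply `move` for `mark`; return (new_board, reward)."""
    if board[move] != " ":
        return list(board), ILLEGAL_MOVE_PENALTY  # taken cell: board unchanged
    new_board = list(board)
    new_board[move] = mark
    if is_win(new_board, mark):
        return new_board, 100.0
    return new_board, -1.0  # any other move scores no points


if __name__ == "__main__":
    before_last = ["X", "X", " ", "X", "O", " ", "O", "O", "X"]
    print(reward_for(before_last, 2, "O")[1])  # 100.0
```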

What I like about this game is that even though O, moving second, is generally not likely to win, O found a way to win after X wasted its second move.

Since the preceding was all text, it's probably hard to follow what was going on.

I can say, though, that this first experiment turned out quite well. The next post (https://plus.google.com/+RussAbbott1/posts/BtUtmsj3RB5) contains a graph showing how the X player improved over time. (I originally shortened the link to https://goo.gl/mM8eT4, but Google's link-shortening system rejected it!)

Post has attachment
For how many states does the Trump Organization’s online store, TrumpStore.com, collect taxes? 3

For how many states does Amazon collect taxes? For all 45 states that have sales tax. (Five states — Alaska, Delaware, Montana, New Hampshire and Oregon — do not have a sales tax.) Amazon collects tax for the District of Columbia as well.

Some states, including Indiana, Maine and Ohio, have passed legislation requiring online retailers to collect sales tax regardless of physical presence. The Trump Organization [does not] collect tax for orders sent to those states. It therefore appears to be in violation of the laws of these states.

Looks like Trump should be tweeting insults at his own company.
washingtonpost.com

Post has shared content
Good question.

Well, Trump, is this guy an animal? What do you say?

Post has attachment
Michael Pollan, who is known for his work on food, has now written a book on the use of psychedelics in psychological therapy. The following is from a long article (click through for the whole thing), which is itself excerpted from his book on psychedelics.

The NYT has a (not especially insightful) review of Pollan's book (https://goo.gl/1HSn8v).

I would have preferred to have my own guided psilocybin session aboveground in the reassuring confines of a medical institution, but the teams at Hopkins and N.Y.U. weren’t currently working with so-called healthy normals (do I flatter myself?) — and I could lay claim to none of the serious mental problems they were studying. I wasn’t trying to fix anything big — not that there wasn’t room for improvement. Like many people in late middle age, I had developed a set of fairly dependable mental algorithms for navigating whatever life threw at me, and while these are undeniably useful tools for coping with everyday life and getting things done, they leave little space for surprise or wonder or change. After interviewing several dozen people who had undergone psychedelic therapy, I envied the radical new perspectives they had achieved. I also wasn’t sure I’d ever had a spiritual experience, and time was growing short. The idea of “shaking the snow globe” of my mental life, as one psychedelic researcher put it, had come to seem appealing.

In Mary, I had found an underground guide with whom I felt comfortable. Mary’s approach, in terms of dosage, also happened to approximate the aboveground experience, though she worked with whole mushrooms rather than the capsules of synthetic psilocybin used in the university trials.

Mary, speaking softly, asked if I wanted “a booster.” I sat up to receive another mushroom, for a total of about four grams. Mary was kneeling next to me, the mushroom in her upturned palm, and when I finally looked up into her face, I saw she had turned into María Sabina, the Mazatec curandera whom I had read about. Sixty years ago, Sabina gave psilocybin mushrooms to R. Gordon Wasson, supposedly the first Westerner to try them, in a dirt-floored basement of a thatch-roofed house in the remote mountains of Oaxaca. Mary’s hair was now black; her face, stretched taut over its high cheekbones, was anciently weathered; and she was wearing a simple white peasant dress. I took the desiccated mushroom from the woman’s wrinkled brown hand and looked away as I chewed; I didn’t think I should tell Mary what had happened to her.

When I put my eyeshades back on and lay down, I was disappointed to find myself back in computer world, but something had changed, no doubt a result of the stepped-up dose. Whereas before I navigated this landscape as myself, taking in the scene from a perspective recognizable as my own, with my attitudes intact (highly critical of the music, for instance), now I watched as that familiar self began to fall apart before my eyes, gradually at first and then all at once.

“I” now turned into a sheaf of little papers, no bigger than Post-its, and they were being scattered to the wind. But the “I” taking in this seeming catastrophe had no desire to chase after the slips and pile my old self back together. No desires of any kind, in fact. And then I looked and saw myself out there again, but this time spread over the landscape like paint, or butter, thinly coating a wide expanse of the world with a substance I recognized as me.

But who was this “I” that was able to take in the scene of its own dissolution? Good question. It wasn’t I, exactly. Here the limits of our language become a problem: In order to completely make sense of the divide that had opened up in my perspective, I would need a whole new first-person pronoun. For what was observing the scene was a vantage and mode of awareness entirely distinct from my accustomed self. Where that self had always been a subject encapsulated in this body, this one seemed unbounded by any body, even though I now had access to its perspective. That perspective was supremely indifferent, unperturbed even in the face of what should have been an unmitigated personal disaster. The very category “personal,” however, had been obliterated. Everything I once was and called me, this self six decades in the making, had been liquefied and dispersed over the scene. What had always been a thinking, feeling, perceiving subject based in here was now an object out there. I was paint!

Lots of other things happened in Mary’s room, and in my head, during the course of my journey that day. I gazed into the bathroom mirror and saw the face of my dead grandfather. I trudged through a scorched desert landscape littered with bleached bones and skulls. One by one appeared the faces of the people in my life who had died, relatives and friends and colleagues whom, I was being told, I had failed properly to mourn. I beheld Mary transformed once again, this time into a ravishing young woman in the full radiance of youth; she was so beautiful I had to turn away.