Matthew Brett
171 followers

The NHS sues a small volunteer organization making a Linux distribution for use by the NHS:

https://www.openhealthhub.org/t/nhos-closedown-the-final-straw/1385

The language hygiene initiative

You can fine me if I use any of these ugly phrases:

https://gist.github.com/matthew-brett/639c879af9c7426f0b9d3e1c8952c9c8


"""
Speaking to a steelworker at the White House in March, Trump informed the man: “Your father, Herman, he’s looking down, and he’s very proud of you right now.”

“Oh, he’s still alive,” the steelworker said.

“Then he’s even more proud of you,” Trump said.
"""

https://www.washingtonpost.com/opinions/trump-hears-from-the-dead-and-they-like-his-tax-policy/2018/07/20/716647b6-8c13-11e8-85ae-511bc1146b0b_story.html?utm_term=.a2e8d5fb34da

"""
What is Data Science?

Data Science is about drawing useful conclusions from large and diverse data sets through exploration, prediction, and inference.
"""

https://www.inferentialthinking.com/chapters/01/what-is-data-science

That sounds right to me.

I call bullshit on this email description of changes in PayPal's legal policy.

"""
We’re making some changes to our Legal Agreements, the documents that govern our relationship with you, so that we can continue to make PayPal even more secure, quick and easy to use.
"""

Now have a look at https://www.paypal.com/gb/webapps/mpp/ua/upcoming-policies-full and see if you think that they can reasonably be summarized as making "PayPal even more secure, quick and easy to use".

I've been reflecting from time to time on the mystery of Hadley Wickham's "Readings in Applied Data Science" at Stanford:
https://github.com/hadley/stats337

The mystery is the only required reading for the first week, on "What the *&!% is data science?". It's a very short blog post with the title "Data scientists mostly just do arithmetic and that's a good thing": https://m.signalvnoise.com/data-scientists-mostly-just-do-arithmetic-and-that-s-a-good-thing-c6371885f7f6 . Why choose this short throwaway thing as the main reading, rather than, say, one of the optional readings, like Donoho's big and thoughtful "50 Years of Data Science" (https://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf)?

I think I understand it now. The question that the blog post raises is a deep one. If many data scientists are doing arithmetic, what the *&!% are courses on data science going to teach?

At the same time, I finally read "Data scientist: the sexiest job of the 21st century" (https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century).

The impression that I came away with is that data scientists in industry are "data hackers", the data equivalent of the hacker movement in computing. They improvise, they invent, they build, they share.

Why did that happen now? Back to Hadley Wickham. It is because we have the tools now, particularly R and Python. It was technically possible to do this before, but it was too difficult, and it consumed too much time and technical effort. The analyst was buried in technical problems, making it harder for them to think, and taking away mind space for working on new problems. These languages have developed to the stage where they have made these problems vastly easier to solve and to explain. The result is an explosive growth in the range of tasks that data analysts can work on. Data scientists are the people who found these tools, and saw how they could apply them. Now let's work out what we should teach.

James Shaw Jr. on why he wrestled an assault rifle out of the hands of a man killing indiscriminately in a Tennessee waffle house.

"""
I did that completely out of a selfish act. I was completely doing it just to save myself. Now, me doing that, I did save other people. But I don't want people to think that I was the Terminator, or Superman or anybody like that. It was just, I figured if I was going to die, he was going to have to work for it.
"""

https://www.npr.org/sections/thetwo-way/2018/04/23/604879633/im-not-a-hero-says-james-shaw-jr-acclaimed-hero-of-waffle-house-attack

"""
The dirty little secret of the ongoing “data science” boom is that most of what people talk about as being data science isn’t what businesses actually need. Businesses need accurate and actionable information to help them make decisions about how they spend their time and resources. There is a very small subset of business problems that are best solved by machine learning; most of them just need good data and an understanding of what it means that is best gained using simple methods.
"""

https://m.signalvnoise.com/data-scientists-mostly-just-do-arithmetic-and-that-s-a-good-thing-c6371885f7f6

The Dunning-Kruger effect is "a cognitive bias wherein people of low ability suffer from illusory superiority, mistakenly assessing their cognitive ability as greater than it is".

https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

The basic observation is that people who objectively rank in the bottom 25% for performance estimate their rank as above average.

Another consistent finding is that people in the top 25% tend to underestimate their rank.

This is rather satisfying, but is it just an artifact? Joachim Krueger and Ross Mueller [1] point out that there is some randomness in the objective measure of performance, and in the estimate of performance. There is also a well-known "better-than-average" (BTA) effect, where people assess themselves as better than average over a wide range of tasks. The combination of BTA and regression to the mean can explain the Dunning-Kruger effect.

For example, imagine that people are generally very poor at estimating their own performance, so the values that they give are essentially random. Imagine that everyone, regardless of their ability, has a tendency to rate themselves better than average, say, at the 60th centile. On this model, all groups will tend to rate themselves at about the 60th centile. The poor performers rate themselves at the 60th centile, but of course they are making a big overestimate, because of the BTA effect. The top performers also rate themselves at the 60th centile, but they are making a smaller underestimate, because the BTA effect pulls their estimates up towards their true rank.
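
The model above can be sketched as a small simulation. This is only an illustration of the extreme case described in this post, not the analysis in Krueger and Mueller's paper. The function name, the number of people, and the spread of the noise around the 60th centile (`noise_sd`) are all assumptions made up for the sketch.

```python
import random
import statistics


def simulate_quartiles(n_people=10_000, bta_centile=60.0, noise_sd=15.0, seed=0):
    """Return (mean actual centile, mean self-estimated centile) per quartile.

    Assumed toy model: self-estimates carry no information about ability.
    Everyone guesses around `bta_centile`, plus random noise.
    """
    rng = random.Random(seed)
    # Objective performance: arbitrary scores, converted to centile ranks.
    scores = [rng.gauss(0, 1) for _ in range(n_people)]
    order = sorted(range(n_people), key=lambda i: scores[i])
    actual = [0.0] * n_people
    for rank, i in enumerate(order):
        actual[i] = 100.0 * (rank + 0.5) / n_people
    # Self-estimates ignore ability: BTA centre plus noise, clipped to 0-100.
    estimates = [min(100.0, max(0.0, rng.gauss(bta_centile, noise_sd)))
                 for _ in range(n_people)]
    results = []
    for q in range(4):
        lo, hi = 25 * q, 25 * (q + 1)
        members = [i for i in range(n_people) if lo <= actual[i] < hi]
        results.append((statistics.mean(actual[i] for i in members),
                        statistics.mean(estimates[i] for i in members)))
    return results


quartiles = simulate_quartiles()
for q, (actual_mean, estimate_mean) in enumerate(quartiles, start=1):
    print(f"Quartile {q}: actual {actual_mean:5.1f}, "
          f"self-estimate {estimate_mean:5.1f}, "
          f"error {estimate_mean - actual_mean:+6.1f}")
```

Under these assumptions, every quartile reports a self-estimate near the 60th centile, so the bottom quartile shows a large overestimate and the top quartile a smaller underestimate, even though nobody in the simulation has any insight, or lack of insight, that depends on their ability.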

Joyce Ehrlinger et al. [2] try to address this criticism, but I cannot see how their data can get round this rather fundamental problem.

[1] Krueger, Joachim, and Ross A. Mueller. "Unskilled, unaware, or both? The better-than-average heuristic and statistical regression predict errors in estimates of own performance." Journal of Personality and Social Psychology 82.2 (2002): 180.

[2] Ehrlinger, Joyce, et al. "Why the unskilled are unaware: Further explorations of (absent) self-insight among the incompetent." Organizational Behavior and Human Decision Processes 105.1 (2008): 98-121.

"We know now that we get it wrong an awful lot of the time".

A lawyer explains why the Texas legal system has become more open to the idea of wrongful convictions. It reminded me of discussions about failures to replicate scientific papers.

"""
[Keith Hampton]: I remember the evolution over the past fifteen years or so. It started out with, “Oh, we’ve got an exoneration in Dallas, DNA has cleared this guy.” The reaction from the prosecutors was, “Well, that’s just one.” The second and the third one came, and their reaction was, “This is proof that the system works. We’re done now.” Okay, then there were more—by the time you got to nineteen, the dam broke. Everyone’s like, “I think we’ve got a problem here, because this is an awful lot of innocent people.” And you would be hearing so much more, there would be so many more exonerees, had Houston saved the forensic evidence that Dallas did. I think you’d easily see double, just extrapolating from how often they get it wrong. We know now that we get it wrong an awful lot of the time. So that attitude has changed.
"""

https://www.texasmonthly.com/politics/i-believe-its-a-heroic-calling/