Profile

Lex Spoon
Works at Semmle
Attended Clemson University

Stream

Lex Spoon

Shared publicly  - 
 
I used to say this, and I even started converting machines over. Nowadays I'm not so sure.

As Brad says, the sysadmin overhead is not just heavy but error-prone: there's a new risk that you will lose data by botching a command somewhere.

Moreover, some of the best ways to reduce the sysadmin costs involve making the system no longer be explicitly RAIDed. In particular, you get most of the advantage if the RAID lives inside a single drive package, rather than spanning five separate drive packages that are all independently plugged in. To the extent that higher-reliability drives become important, I would think the place to start is buying higher-quality drives that have internal redundancy of various kinds.

An additional complication is that you only benefit from the error recovery if you have procedures in place to notice the failure and do something about it. Both the noticing, and the doing something about it, are something a lot of people are not going to bother with in a lot of contexts.

The main issue, though, is that in most cases, drive failure isn't even the biggest source of risk for data loss. It's much more likely that you will issue a stray delete command than that your SSD will fail at the wrong time! To defend against this more likely case, you end up wanting some sort of distributed backup system, such as Gmail for your email, or GitHub for your source code. Once you do that, though, you already have such a good data protection system that you no longer gain much benefit by RAIDing your local storage.

On the flip side, RAID remains excellent for performance. If you want to make your Witcher 3 levels load faster, then RAID could help you.


Lex Spoon

Shared publicly  - 
 
Two little things I wish Java would add
When geeking out about language design, it's tempting to focus on the things that require learning something new to even understand how it works. SAM types require understanding target typing, and type members require understanding path-dependent types. Fun...
Mitch Blevins's profile photo
 
Yup, both of those. Plus string interpolation.

Lex Spoon

Shared publicly  - 
 
"More or less ACID compliant". That tickles me more than it should. In fairness, the authors are probably just more honest than most.
Features. Small, ~100KB .jar file with no dependencies. Top-level API similar to working with any Map on top of a core index. Thread-safe, with write locking over multiple JVMs using extended Lucene locks. Optionally transactional. More or less ACID-compliant.

Lex Spoon

Shared publicly  - 
 
The given input on this question is clearly an abuse of the intended use of Unicode, but there's no clear, standards-supported way either to rule out the bad input or to display it in a sensible way. In fact, the authors of the software in question have made their software worse by dutifully following the standard and doing what it says. Now there are Stack Overflow questions about how to bypass this "help" and force text to act like it's supposed to.

I don't have a full answer, unless it's just "don't allow combining marks, and just live with the limitations". When you interact with computer software, you just have to live with its limitations.
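To make "don't allow combining marks" concrete, here is a minimal sketch in Python (stdlib only; the function name strip_combining is my own invention): decompose the text, then drop everything in the Unicode mark categories.

```python
import unicodedata

def strip_combining(text):
    # Decompose first so precomposed characters like "é" expose their marks,
    # then drop every mark character (Unicode categories Mn, Mc, Me).
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if not unicodedata.category(ch).startswith("M"))

print(strip_combining("h\u0351e\u0364l\u0489lo"))  # -> hello
print(strip_combining("café"))                     # -> cafe
```

Note the limitation you'd be living with: legitimate accents get stripped along with the abusive stacked marks.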

Lex Spoon

Shared publicly  - 
 
It's curious that being fixed-width (meaning 16-bit) was a defining characteristic of Unicode during its formative decade or so. Nowadays Unicode is not fixed-width in any sense; the 16-bit version has surrogate characters, the 8-bit version has taken over as an interchange format, and the 8-bit format even has a strong showing as an in-memory format. As well, the combining marks are an even deeper violation of the fixed-width principle. It seems like we have exactly the horror that the earlier designers said they were going to fix for everyone.

It's just mind-bending to write correct Unicode-processing code if you worry about surrogates, local-specific comparison, and combining marks. I yearn for a simplified Unicode that can't do everything but that does have standard character codes for most of the world's languages. Computer software processes text all over the place, and it's unfortunate if you have to choose between 7-bit ASCII and a format that feels like a word processor's internals.
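As a quick illustration of just how un-fixed-width things are (Python here, purely for demonstration): one character outside the 16-bit range is one code point, two UTF-16 code units, and four UTF-8 bytes, while a combining mark makes one visible glyph span two code points.

```python
clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the 16-bit range
print(len(clef))                           # 1 code point
print(len(clef.encode("utf-16-le")) // 2)  # 2 UTF-16 code units: a surrogate pair
print(len(clef.encode("utf-8")))           # 4 bytes in UTF-8

accented = "e\u0301"  # 'e' + COMBINING ACUTE ACCENT: one glyph, two code points
print(len(accented))                       # 2
```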
Early Years of Unicode. The Unicode® Standard "…begin at 0 and add the next character". Antecedents and The First Year - 1988. The concept of a 16-bit universal code was not new. Even the original 1984 principles of the ISO multi-byte character encoding are directed to that end: ...
Brian Slesinsky · Dominic Mitchell · Lex Spoon
3 comments
 
I agree UTF8 is awesome. I find combining marks pretty miserable. So I don't simplistically mean that Unicode should have stuck to the fixed-width idea. I just thought it an interesting historical quirk that fixed-width was such a defining characteristic of Unicode compared to its competitors, only to be immediately abandoned by the team that codified Unicode 1.0.

Lex Spoon

Shared publicly  - 
 
One thing that has changed since then is that UTF-8 has taken over as the standard interchange format.  Because of that, it's no longer a big deal to use the same type for strings and for byte arrays.

Maybe Python 4 can go back to that. In our strange world, it might be better for adoption of Python 4 if it is designed for the 2-to-4 upgrade path rather than the 3-to-4 upgrade path.
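Python 3's str/bytes split already leans on UTF-8 as the boundary format; a minimal sketch of the round trip (nothing here beyond the stdlib):

```python
text = "héllo"                # str: a sequence of Unicode code points
data = text.encode("utf-8")   # bytes: what goes on the wire or on disk

print(type(data).__name__)           # bytes
print(len(text), len(data))          # 5 code points, 6 bytes ("é" takes two)
print(data.decode("utf-8") == text)  # True: UTF-8 round-trips losslessly
```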
Also, clicking the slide images will jump into the full presentation at that point. The Symbola font is included, but will have to be downloaded before some of the special symbols will appear. Pragmatic Unicode~ or ~How Do I Stop the Pain? Hi, I'm Ned Batchelder. I've been writing in Python for ...
Joel Webber's profile photo
 
This is one thing I think Go got right, mostly by accident. Everything's basically a `[]byte`, with `string` being special only in that it's an immutable UTF-8 `[]byte`. If you don't care about the encoding, you get random access to bytes in the slice. If you do care about encoding, you have to walk the characters in order, but that's trivially easy to do. The Go team made some noise early on about this being "not good enough", but so far I've found it easy and efficient to work with.

The fact that "slice" is built-in, and ubiquitous, makes it easy to deal with these things efficiently, e.g., when getting strings from binary data and you don't want to copy them all over the place. Compare this with, say, the Eclipse CDT, which uses an insane collection of things built on Java `byte[]`s to avoid being overwhelmed by string munging costs.

Lex Spoon

Shared publicly  - 
 
Aside from the immediate problem of long paths being discussed here, it's not good for a module system to mix multiple versions of the same module. Instead of trying to do fancy deduplication to reduce the number of versions of each module that exist, I would be tempted to go cold turkey and say that in the next version of NPM, you get one and only one version of each module.
Philippe Lhoste · Lex Spoon
2 comments
 
Yes, the conflicts have to be solved somewhere. I believe a major factor in whether this is easy, or crazy hard and never really adequately solved, is whether your system includes some kind of "distribution". Distributions serve the important role of helping developers understand which compatibility problems are worth solving; any single compatibility problem (could I make this thing work with Lodash 4?) is usually not too hard.

Lex Spoon

Shared publicly  - 
 
This will be nice for Semmle if it works as advertised. Aside from the general benefits of compatibility (we can design for Linux and then port), Bash in particular is just a reasonable scripting language. Cmd.exe is not.
Brian Slesinsky · Lex Spoon
2 comments
 
The main trouble with PowerShell is that it's Windows-specific, so at least for us, it's never been a tempting implementation language for anything. The code samples I've seen online do look nice, though. I'm sure if you wrote native Windows apps all day, you would find PowerShell very tempting and use it all the time.

Also, if by "script" you mean reading in text files, putting them in dictionaries, and running simple computations across them, then Bash is terrible and Python is awesome. Probably Ruby and Lua as well; I just haven't used them much.

There's a certain kind of scripting, though, where you run commands, set optional command-line options, set up configuration files using string replacement, and other CLI kind of things. For those, Bash seems really nice, at least in my experience.

Lex Spoon

Shared publicly  - 
 
It's a familiar effect. In every conference I've gone to--program analysis, optimization, development tools--every single paper advertises that the documented technique was successful. People steeped in academia could be forgiven for observing this and concluding that the rest of the world is simply too uneducated to pick up all these goodies. Then again, people steeped in academia know full well the kind of shenanigans that go on, so they might be even more cynical than I imagine, and just play the game anyway.

As a familiar example from my world, people working toward a static analysis paper almost always modify the analysis algorithm for a particular code base being studied, and then report the final results on the improved algorithm. You can almost always make an algorithm look really good under conditions like that.

It must be really cheery to live in that world. Every day you come into the office, and in every direction, all that you survey and all that you read about is working great. 11/10--exceeds expectations!
This year's Economic Report of the President has a chapter on improving outcomes for disadvantaged children. It surveys the literature and finds that, in short, everything works. There is not a sin...
Aaron Novstrup · Lex Spoon
2 comments
 
Ah, well, I have a more easy-going view of it all nowadays. Everyone who is seriously involved in technology growth just doesn't do anything like the academic peer-review process. There's no obvious value to it and lots and lots of obvious problems with it, so they just don't do it.

Meanwhile, schools are just wonderful institutions. How could I not like them? And if professors want to further their own development, and go to conferences to compare notes? Again, that's great. All that said, I no longer like the glamorized version of academic research that I grew up with, any more than I like the glamorized view of real-world journalism that doesn't even seem like an approximation of the real world. It seems better to try and understand what real people are up to than to persist with these pretty myths.

Lex Spoon

Shared publicly  - 
 
I know that Windows support is hard, but it's a real blow to basic software development if you can't trust your Git checkout to have binary-correct files. I really dislike the autocrlf setting.

What I learned today is that one of the authors of msysGit not only agrees, but moreover is unimpressed by the implementation of the feature. That's interesting to me because in my world, msysGit is the main reason people get interested in this setting. When you install msysGit, it strongly encourages you to turn the feature on, even though Linus himself left it off by default. People trust that msysGit is looking out for them and so turn it on.

I'm very curious that msysGit is still doing this, given its authors don't seem in agreement about it. I would think there's a meta-default that if you are repackaging someone else's software (Git), you should leave its defaults alone without a clear reason otherwise. If they don't agree themselves, then the reason for overriding Linus's choice can't be all that clear.

More broadly, I wish there were a more general interest in removing CR pollution from the world. A lot of people just seem to assume that we've inherited this problem and so will always have it. Look what has happened with text encodings, though. Ten years ago, it was just assumed that any program processing text files must come with some out-of-band option to specify the particular text encoding to use. Nowadays, software assumes UTF-8 by default, and it's really caught on.

Speaking of character encodings, would anyone like to have a core.autoutf8 option on Git? After all, CP-1252 is the standard on Windows, isn't it? Maybe I should not ask, because there might be people out there who would say yes, that sounds great.

Digression aside, I think the same could happen with text files. People keep writing code that emits CR characters out of a misguided assumption that it's helping things. At this point, though, Windows has joined the Internet, so even on Windows, to a first approximation it doesn't matter what line endings a given file uses. Most things don't care, and a few things will work better, and a few things will work worse. Generating CR characters now doesn't usually help the software doing it (does it?), but it definitely contributes to an overall problem in the global computer ecosystem.
core.autocrlf considered half-assed. Date: Sat, 6 Mar 2010 00:23:33 +0100 (CET); From: Johannes Schindelin ; Subject: core.autocrlf considered half-assed. Hi, back then, I was not a fan of the core.autocrlf support. But I have to admit that in the meantime, ...
Joel Webber · Philippe Lhoste
2 comments
 
That's something that should be handled by the editor (auto-detect line ending), not by some filter behind the back of the coder... Same for auto-formatting and co.

Lex Spoon

Shared publicly  - 
 
I quite enjoyed this read. I have often heard the tales about "limeys" on English ships eating a lime a day (actually it was lemon juice...), but I had wondered how that policy got established. In fact it was a tortuous 40-year process with many turns, and it was later completely undone when the navy penny-pinched on its source of lemons and thereby lost the benefits. The particular form of penny-pinching was, most curious of all, to switch to actual limes instead of the lemons they had been calling limes.
Atkinson inclined to Almroth Wright's theory that scurvy is due to an acid intoxication of the blood caused by bacteria... There was little scurvy in Nelson's days; but the reason is not clear, since, according to modern research, lime-juice only helps to prevent it. We had, at Cape Evans, ...

Lex Spoon

Shared publicly  - 
 
Here's another thread that makes me think we are all building on top of sand.

If you store a timestamp in UTC, you at least know unambiguously what moment is being referred to. You can then fix up the timezone you want in later layers of the software. If you store a timestamp in local time, you have nowhere to start from: you don't generally know what locale was used to save it, and worse, if that locale uses daylight saving time, then certain time values refer to two different points in time.

I'm surprised a database implementer would encourage users to store data in local time. Given all the confusion around this topic, my own included, it's probably best to avoid the SQL "timestamp" type. The number of assumptions you have to vet before it will work is prohibitive compared to storing a UTC time as an integer.
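The fall-back ambiguity is easy to demonstrate with Python's stdlib zoneinfo (3.9+; the specific date and zone are just an example): the same naive wall-clock time names two different instants, while a UTC epoch integer is unambiguous.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ny = ZoneInfo("America/New_York")
# 2021-11-07 01:30 occurred twice in New York: once in EDT, once in EST.
naive = datetime(2021, 11, 7, 1, 30)
first = naive.replace(tzinfo=ny, fold=0).timestamp()   # the EDT occurrence
second = naive.replace(tzinfo=ny, fold=1).timestamp()  # the EST one, an hour later
print(second - first)  # 3600.0

# Storing a UTC epoch integer sidesteps the whole problem:
stamp = int(datetime(2021, 11, 7, 6, 30, tzinfo=timezone.utc).timestamp())
restored = datetime.fromtimestamp(stamp, tz=timezone.utc)
print(restored.isoformat())  # 2021-11-07T06:30:00+00:00
```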
Ray Cromwell · Lex Spoon · John A. Tamplin · Brian Slesinsky
7 comments
 
Even just recording event timestamps can be tricky. How much do you want to trust the client's clock? For consistency, it's often a good idea to always use the database's clock, but this doesn't work for offline operation.
People
Have him in circles
604 people
Thomas Broyer's profile photo
Emily Crutcher's profile photo
shalu kashyap's profile photo
Amit Manjhi's profile photo
Rick Elliott's profile photo
Ariel Feinerman's profile photo
Sean DeNigris's profile photo
Kris Nuttycombe's profile photo
Stefan Marr's profile photo
Collections Lex is following
Education
  • Clemson University
  • Georgia Institute of Technology
Story
Tagline
Software engineer at Semmle
Work
Occupation
software engineer
Employment
  • Semmle
    2012 - present
  • LogicBlox
    2012
  • Google
  • IBM
  • EPFL
Basic Information
Gender
Male
We've always had good experience with Estes, both for repair and for sales. Thanks to Raymond, Patrick, and Jonathan for excellent repairs and installation, and thanks to Chris on sales for telling me what I need to know but skipping any hard-sell tactics.
Public - a year ago
reviewed a year ago
They have excellent pizza and a good selection of beer. Definitely try the "monster slice". It is cooked to order and is accurately named.
Food: Excellent · Decor: Good · Service: Excellent
Public - 2 years ago
reviewed 2 years ago
2 reviews