Someone has generously pledged a pile of money and asked me to write about the War on Drugs. Preliminary research reveals that this is not actually anything to do with programming, which confuses and bewilders me, but I’ll give it a try anyway.
My gut reaction is to say “it’s bad”, on the basis of victimless crime and right to private action and all that, but that doesn’t make for a very interesting post, and there are plenty of thinkpieces along those lines anyway. I’ll try to do a teeny bit of research, so I’m not just paraphrasing Wikipedia.
I realize I don’t actually know why the drug war exists. The country started out with zero laws a few centuries ago, and something must have happened in the meantime for any given law to have been written, right? Again, my gut reaction is to handwave it off as moral panic, but that’s boring and uninsightful.
Turns out federal drug policy is only a hundred years old; the first acts of note were a set of three called the Harrison Narcotics Tax Act, which went into effect in 1915. Being split into three parts makes the legalese even harder to follow, but the gist is that they did their best to crack down on opium (and to a lesser extent, cocaine) use via Congress’s power to tax.
That’s only a few years before one of the most embarrassing warts in American legal history: the 18th Amendment, banning alcohol throughout the country, took effect in 1920. It’s largely blamed on the religiously-motivated “temperance movement” as well as progressive (!) concerns over the power of saloons and the rate of alcohol-fueled crimes. Oh and also it got tangled up in anti-immigration sentiment, of course. Anyway.
Opium sounds so archaic now that it barely registers as a drug; I think it’s been entirely superseded by heroin. I don’t know a lot about its use a century ago, but a now-archived NIH report mentions that the majority of users (addicts?) were women. And according to a survey from 1878:
The most frequent cause of the opium habit in females is the taking of opiates to relieve painful menstruation and diseases of the female organs of generation.
Apparently this was itself partly because women “were considered less capable of managing painful conditions and thus more in need of medication.” (It goes on; the entirety of page 5 of that report is quotes about this.) It’s considerably ironic, then, that the Harrison acts limited opium use to medicinal purposes only.
It’s also curious that cocaine was included in these laws, seeing as it’s not actually a narcotic: “narcotic” originally meant something that would make you drowsy, hence the shared root with “narcolepsy”, and cocaine is a stimulant. (Apparently this handwaving is partly to blame for our modern vague definition of “narcotic”.) Also, by the time the Act was passed, virtually every state had its own laws regulating cocaine. Clearly it had made enough of an impact to ruffle some feathers.
I know cocaine was first isolated from the coca leaf in the 1850s, and the coca leaf had been known to be psychoactive for centuries before that, but I’m not really clear when we first realized cocaine is addictive. Surely not in the last half of the 19th century — it’s fairly well-known that Coca-Cola had cocaine for its first few decades, but it was in all manner of off-the-shelf products. Its use as an anesthetic saw it put in toothache drops for children; its reputation as very nearly a miracle drug saw it put in all manner of medicines and even beauty products; and coca wine was a thing.
Of interest: the Sherlock Holmes novels were written in the last decade of the 19th century, and while the novels themselves clearly established Holmes’s cocaine habit as an undesirable thing, that same source mentions that “most Victorians [did not understand] the side effects of drug use”. Also that Freud recommended cocaine as medication for all manner of conditions, including… treating addictions to other drugs.
So I’m actually not sure what happened in the early 1900s to change the popular perception of cocaine from a wonder cure to a huge problem requiring federal action. Dr. Hamilton Wright, the first “Opium Commissioner”, had this to say about it in a New York Times article from 1911:
Of all the nations of the world, the United States consumes most habit-forming drugs per capita. Opium, the most pernicious drug known to humanity, is surrounded, in this country, with far fewer safeguards than any other nation in Europe fences it with.
Well okay, fair enough. Except, um:
Wright was a fanatic racist, announcing that “[i]t has been authoritatively stated that cocaine is often the direct incentive to the crime of rape by the Negroes of the South and other regions.” One of Wright’s favored authorities was Dr. Christopher Koch of the State Pharmacy Board of Pennsylvania. Koch testified before Congress in 1914 in support of the Harrison Bill, shortly to pass into law as the first criminalization of drug use. Said Koch: “Most of the attacks upon the white women of the South are the direct result of a cocaine-crazed Negro brain.” At the same hearing, Wright alleged that drugs made blacks uncontrollable, gave them superhuman powers and prompted them to rebel against white authority. These hysterical charges were trumpeted by the press, in particular the New York Times, which on February 8, 1914, ran an article by Edward Hunting Williams reporting how Southern sheriffs had upped the caliber of their weapons from .32 to .38 in order to bring down black men under the influence of cocaine. The Times’s headline for the article read, “Negro Cocaine ‘Fiends’ Are New Southern Menace: Murder and Insanity Increasing Among Lower-Class Blacks.”
Jesus fucking christ.
And what a surprise: it seems that by every measurement we can reasonably make, black users of cocaine and opiates were significantly in the minority. The same article mentions that cocaine use dropped precipitously across the board after 1907, coinciding with the Pure Food and Drug Act, which required medicine labels to list addictive ingredients like cocaine and opium. (Before this, there were no requirements for listing ingredients at all; the FDA came out of this very Act.)
Another choice quote:
One of the most unfortunate phases of smoking opium in this country is the large number of women who have become involved and were living as common-law wives or cohabitating with Chinese in the Chinatowns of our various cities.
So we put cocaine in our fucking soda and prescribed opium to women because they were delicate flowers, then decided we didn’t like it so we blamed it all on black and Chinese people who are clearly coming after our poor white women who are again delicate flowers.
I shouldn’t be surprised. It’s an American political tradition, after all: if you want to convince people to vote against a thing that they’re doing or benefiting from, just tie in some thinly-disguised racial fears. Want to outlaw cannabis too? Just refer to it as “marihuana”. Feel really strongly about cutting welfare? Extend the stereotype of black people as lazy and invent the idea of the black welfare queen; lo and behold, plenty of districts that rely heavily on food stamps will vote for the party trying to cut them. Want to go to war, or worried about immigration? Ho ho, those practically write themselves.
I might be a little cynical.
The Anti-Drug Abuse Act of 1986 added some mandatory minimum sentences, including 5 years for possessing 5g of crack cocaine or 500g of, uh, regular cocaine. It’s my understanding that (a) crack is much more likely to have other junk in it (even if it’s just baking soda) whereas powder cocaine is more likely to be pure, so 5g of crack is notably less potent than 5g of not-crack; (b) they are trivially different forms of the same compound, and in particular are equally habit-forming. So it’s especially strange that you have to have a hundred times more of the powder form to earn the same sentence as for crack.
You may think this is because crack is vastly more popular among black people, whereas powder is vastly more popular among white people. Yeah, I thought that too, but it’s not true:
More than 80 percent of those sentenced for dealing crack are Black, even though two-thirds of those who use the drug are either White or Hispanic.
It’s not news that drug convictions disproportionately target black people, of course, but this stark contrast means that even the straightforward explanation isn’t right. It’s possible that most black users of cocaine use crack, which is a different statistic than whether most users of crack are black. Or it could just be that there’s no statistical link of interest whatsoever and we all made it up. Because the words rhyme. Or something. Or I might be wrong and race didn’t factor into it even subtly — this LA Times op-ed suggests that when the law was passed, common wisdom held that crack was many times more addictive.
I can’t find a breakdown of crack users by race — a lot of government surveys seem to have stopped separating crack and powder cocaine in recent years. I did find a report on drug use among minorities, and Table 6 lists that powder cocaine is used 4 to 5 times more commonly than crack! In fact, counted separately, crack is the second least-used drug in that table, beaten only by heroin. I’m left completely baffled as to why crack was legislated so much more harshly.
To be fair, this was all fixed five years ago by the Fair Sentencing Act, which increased the minimum for crack cocaine to 28g instead. Haha, just kidding, that doesn’t actually fix it at all. In fact there’s a ridiculous history preceding it. Congress ignored the US Sentencing Commission’s proposal to equalize the thresholds in 1994, apparently the first time they had ever done so. At least ten bills were proposed between then and 2010 that would have reduced or eliminated the disparity, and none of them succeeded. No wonder we ended up only shrinking the gap from 100× to 18×.
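For anyone who wants to check the 100× and 18× figures, here is a quick sketch of the arithmetic in Python. It assumes the powder threshold stayed at 500g under both laws, which is what the numbers above imply; the function name is just for illustration.

```python
def disparity(powder_grams: float, crack_grams: float) -> float:
    """How many times more powder than crack triggers the same sentence."""
    return powder_grams / crack_grams


if __name__ == "__main__":
    # Crack thresholds in grams for the 5-year minimum, as quoted above.
    crack_thresholds = {"1986 Anti-Drug Abuse Act": 5, "2010 Fair Sentencing Act": 28}
    for law, crack_grams in crack_thresholds.items():
        print(f"{law}: {disparity(500, crack_grams):.1f}x")
```

The second ratio comes out just under 18, so "18×" is a rounding of 500/28.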
I forget what point I was going for here and got lost in statistics. But clearly there are some powerful factors at play here that have nothing to do with the substances themselves. Crack and powder cocaine are the same damn thing, with the same effects and the same risk of dependence. Crack is even made from powder cocaine, so it should be harder to make, thus more expensive, thus less popular. (Which is true!) That it remains impossible to punish people the same way for identical substances shows that a lot of very influential people have some very strong opinions based on something other than what the drugs actually do.
That was the original question, and I’m still fuzzy on the answer.
I won’t pretend that drug use never causes problems. People who are physiologically dependent on something expensive do sometimes inflict violence to get more of it. People do make themselves sick or worse.
But violence is already illegal, and outlawing activities that sometimes lead to other crimes doesn’t sit particularly well with me. And people have every right to harm themselves if they so wish, surely. Reacting to drug use by throwing people in jail just doesn’t make any sense to me. It’s clumsy and naïve and ultimately makes problems worse — like trying to defund Planned Parenthood to reduce the number of abortions, even though Planned Parenthood is also where a lot of people get birth control.
We saw this with American Prohibition, and now Mexico is embroiled in a drug war. How is that different?
“But Eevee,” someone cries, “those are drugs like cocaine and heroin, which are way more harmful—”
Yeah hang on I think it’s time for some charts. Like these ones, which ranked various substances by various forms of harm, and put alcohol as the most harmful, ahead of heroin and crack. (On the other hand, crack is ranked twice as harmful as powder cocaine. Hmm.) Or this one, where alcohol is ranked fifth most harmful, moderately safer than cocaine. Or this chart on active/lethal dosage that puts alcohol off to the right, second only to heroin in risk of overdose.
I would hardly call heroin and cocaine safe, but alcohol is more dangerous than you’d think given its availability, and we regard it as little more than a fun beverage. Have a glass of wine with dinner, have a beer to relax in the evening, take shots with your broworkers, cook with vodka.
Did you know that:
…within the first 1 to 2 years of use, an estimated 5 to 6 percent of cocaine users develop the clinical syndrome of cocaine dependence (Wagner and Anthony 2001). An estimated one in six cocaine users had developed cocaine dependence within 10 to 20 years of initial cocaine use (Anthony et al. 1994; Wagner and Anthony 2002).
Six percent? I sure didn’t. Cocaine is generally listed as really really habit-forming, so I was under the impression that you take it a couple times and you are doomed forever. These are not the harrowing statistics I was expecting. Meanwhile, various sources put alcohol dependence at anywhere from 1 in 12 to 1 in 3 adults in the US. That’s not adults who drink; that’s all adults.
The point is not that we should ban alcohol (again), but that our perceptions of drugs are heavily colored by cultural influences. Consider that cigarette smoking has been on a steady decline for decades; not because it’s ever been made illegal, but because it’s becoming less appealing culturally. Or look at the recent push for decriminalizing/legalizing pot — the drug hasn’t gotten any more or less dangerous, but perceptions of it have changed.
So, again, drug policy is largely informed by factors that have nothing to do with the drugs themselves. If we truly only cared about danger, we’d legalize everything safer than alcohol, or at the very least the consistently safest ones like ecstasy and LSD. We certainly wouldn’t keep cannabis on Schedule I, the list of drugs with “no medicinal value but a high potential for abuse”, when it’s frequently used medically and is about as habit-forming as caffeine. (You know, the thing that half the population ingests orally on a daily basis, and that we put in beverages we give to children.)
We might also take note that banning all known recreational drugs drives people to invent unknown ones, like bath salts.
And if our goal is really to reduce dependence on drugs, remember the Rat Park experiment, in which a population of rats in a stimulating happy rat world actively avoided water laced with habit-forming drugs, and isolated rats who had been heavy users of the drugged water gave it up once they were moved into the utopia. Then consider the stereotypes about people who become addicted to illegal drugs. What kind of environments are those people in? Are they happy and stimulated and surrounded by people who love them? Or are they poor, homeless, unemployed, stressed, discriminated against, ostracized, isolated, purposeless, and otherwise miserable?
Maybe we should’ve spent that trillion dollars on making the US a better place to live, rather than on banning the things people use to escape when it’s not. What a fucking concept.
I bought Super Mario Maker a few days ago. I was a little iffy on blowing $60 on a level editor, but I really like level editors, so here we are.
I’ve always liked level editors. They appeal the same way programming does: here’s a blank slate; here’s a bunch of individual components that may interact in interesting ways; see what you can come up with. In many cases level editors express logic via special invisible node objects, which really is programming.
I didn’t do much with it at first. I played a few demo levels, but the need to spam blocks (or make a lot of very bland levels) to unlock all the tools was a bit of a turn-off.
My mind was really on Doom mapping, for whatever reason. I’ve been distantly orbiting the ZDoom world on and off for over a decade, making the occasional wiki edit or giving the odd mapping tip here and there. But in all that time I’ve never made a completed map. I’m pretty sure that’s true even for the loosest possible definition of “completed”, which is “has an exit”.
I made some tweets reflecting on my attempts to build worlds, and realized I have much the same problem as I do in other spheres. I run out of obvious ideas for the big picture, which I hate, because it’s a problem with no fixed solution and no reliable approach for finding one. So I naturally drift towards fiddling with small details, which I do still have ideas for.
This is particularly bad with Doom because it has actual maps, not just tiles. If I put a crate in the corner of a room flush with the walls, I can’t easily move it later once I’ve figured out more about the general map layout — the walls of the room and the edges of the crate are the same line as far as the map format is concerned.
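To make the shared-line problem concrete, here is a rough sketch in Python. It is a simplified model, not the real map format: actual Doom maps route each linedef through sidedefs to reach a sector, and the sector names, coordinates, and helper functions below are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Linedef:
    start: int                         # index into `vertices`
    end: int                           # index into `vertices`
    front_sector: str                  # sector on one side of the line
    back_sector: Optional[str] = None  # None: nothing but solid void behind it


# Hypothetical 256x256 room with a 64x64 crate flush in the corner at (0, 0).
vertices = [
    (0, 0), (64, 0), (256, 0), (256, 256), (0, 256), (0, 64), (64, 64),
]

# Line facing/winding is ignored here; only ownership matters for the point.
linedefs = [
    Linedef(0, 1, "crate"),          # room wall AND crate edge: one line
    Linedef(1, 2, "room"),
    Linedef(2, 3, "room"),
    Linedef(3, 4, "room"),
    Linedef(4, 5, "room"),
    Linedef(5, 0, "crate"),          # room wall AND crate edge: one line
    Linedef(1, 6, "crate", "room"),  # crate edges facing into the room
    Linedef(6, 5, "crate", "room"),
]


def vertices_of(sector: str) -> set:
    """Indices of every vertex used by a line bordering `sector`."""
    used = set()
    for line in linedefs:
        if sector in (line.front_sector, line.back_sector):
            used.update((line.start, line.end))
    return used


def move_sector(sector: str, dx: int, dy: int) -> None:
    """Naively drag a sector by moving all of its vertices."""
    for i in vertices_of(sector):
        x, y = vertices[i]
        vertices[i] = (x + dx, y + dy)


if __name__ == "__main__":
    # Vertices the crate shares with the room's own outline.
    print(sorted(vertices_of("crate") & vertices_of("room")))
```

In this model, calling `move_sector("crate", 32, 32)` also moves the room's corner and two endpoints of the room's walls, so the room's outline gets bent out of shape along with the crate. Fixing that means splitting and rebuilding the wall lines by hand, which is the tedium described above.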
Someone suggested that I use Mario Maker to practice overall world design. I thought that was a pretty good idea.
I started a gist for keeping track of my levels, though this strikes me as the kind of thing that this, my personal website, ought to be able to handle, and I’m gonna beat on Pelican a bit sometime to see if I can make that work.
The first one was inspired by the realization that the two parts of a level (joined by pipes) can use any themes, and I thought it would be interesting to start with a fairly simple-looking grassy level that then transitions to something entirely unexpected. I ended up building an airship and filling it with, well, very airshipesque traps.
Rather than make it linear — go down pipe A, traverse airship, return through pipe B — I decided to make the player circle the airship and return the way they came. This meant that something on the airship had to make it possible to reach the goal when it wasn’t before, and there are only so many ways to do that with what’s available, so I went with giving the player a feather and making them high-jump up to the goal.
I somewhat regret this decision, since it’s kind of mean to make a level unbeatable if the player loses a particular powerup. I mean, part of the point of having a powerup is that you can survive another hit. (Maybe I should’ve given the player a helmet or shoe as well.) It was extra mean since I’d originally intended for the player to backtrack all the way back through the airship, but I discovered that you could skip all that with a single fairly easy jump, so I let that be.
I’ve played 100 Mario Challenge a few times, which subjects you to a random selection of uploaded levels, and I’ve gotten the faint impression that many level authors deliberately try to stop players from using creative alternate routes. I think that’s really against the spirit of Mario, and certainly against the spirit of many of the original levels — how many entire games can you skip with a feather? Or, christ, a P-Wing? Forcing players through your single blessed path does not a grand level make.
Also, being forced to beat my own course before I could upload it was fantastic — I went back and edited it so many times after I’d initially intended to publish it. Some of the pain points I discovered I never would’ve thought of just by looking at the map (or even playtesting from the middle) — for example, thrown wrenches blend into a lot of backgrounds really well, and I removed a few moles entirely because they were causing problems.
The only other thing I regret is that the level is kind of cramped in places, which makes for some jumps that are more awkward than they need to be. There’s not really any good reason for this, and it’s a shame when the intro part is very open. Not leaving enough breathing room (whitespace?) seems to be a fairly common mistake: I see it in a lot of uploaded Mario levels, and I know I have to try hard to resist making every hallway 64 units wide (about twice the width of the player) in Doom maps.
The second one was a Boo house, because I like Boo houses. They’re generally twisty and confusing and weird. I also put it in SMB style, which was kind of interesting since that game never had Boo houses of its own; all the music and most of the graphics were created specifically for this game.
The first part ends with the classic “ha ha this isn’t the real exit”, along with a light hint that may have been a bit too light. The second part has multiple different areas that look identical, along with a subtle hint as to how to escape.
It’s always tricky leaving hints for players. I can’t very well tell what other people will catch on to, and the whole point of a puzzle is for everyone else to actually solve it. In the end I added an alternate route that I thought players would likely discover if they missed the hint and flailed around wildly.
The main problem with this level, I think, is the Boo circles. I just had a hell of a time placing them well, and I think in a couple places they force trickier jumps than I’d intended. I still like it, but I think the unintended difficulty detracts from the theme too much.
Incidentally, I don’t think difficulty is fun. Challenge is fun, and that implies something that you can learn to overcome. Expert mode of 100 Mario Challenge (which I believe picks levels with very low completion rates) produces utterly nasty levels, requiring numerous pixel-perfect jumps in sequence, or incredible reflexes for dodging a dozen things on the screen at once, or wall-jumping through an entire level. Most of the levels don’t feel like something I could do if I were slightly better at platforming, but something designed to make me fail so the level designer can pat themselves on the back and chuckle about how “hard” they made their level. Well, congratulations, but if all you wanted was to stop me from winning, you could’ve just built a wall in front of the flag and called it a day. (I ran across a level that literally did this. I beat it anyway and left a comment spoiling it. Fuck that noise.)
My third level is the first one I feel was actually successful. It’s fairly straightforward platforming, except that there are several pipes that take you to an identical area where all the enemies are double-size. There are several puzzles throughout that are most easily solved by switching back and forth between the big and little worlds.
I had to pay a lot of attention to decorations here, though it was less tedious than you might expect. The background decorations (bushes, small trees, flowers, etc.) aren’t actually objects in their own right; they sprout from ground tiles when you place them, at random. Thankfully, you can copy decorated ground tiles around. So after I’d finished the level, I had to create a little “palette” of the decorations in both areas, and toggle back and forth making sure the same decorations were in the same places. (You can’t copy between areas, alas.)
Though I’d intended that the player has to switch back and forth several times, every single puzzle can actually be completed in either area. I really didn’t want to end up with an arduously difficult level again, so I’m going to say this is a good thing.
Overall I think this was a pretty successful exercise, and I’m going to keep playing around to see what I can come up with. I haven’t run into many uploaded levels that feel like Mario levels, just fun platforming and exploring, so I’d like to try improving on that.
I did go back to Doom mapping along the way, and I managed to get a map’s general layout from 20% to 70% done, after it had haunted me for ages. I don’t know if I can ascribe that to Mario Maker, but I’d like to think working with simpler constraints shook a few cobwebs loose.
Super Mario Maker is a great tool, don’t get me wrong. It has a good spread of objects that interact in neat ways, a pretty friendly editor, and some clever touches (like requiring you to beat your own level before you can upload it).
I find myself left a little wanting. The most blindingly obvious problem is that you can’t actually recreate World 1–1 from any of the four games it emulates:
- Super Mario Bros. has one-way pipes and variable powerups (fire flower if you’re super or better, otherwise a mushroom).
- Super Mario Bros. 3 has a split pipe, black pipes, and four colors of semisolid platform.
- Super Mario World has a checkpoint, slopes, Yoshi coins, Rexes (the dragons), Banzai Bill (the screen-filling bullet), hint blocks, and a Chargin’ Chuck (the football guys).
- New Super Mario Bros. U has a ridiculous title. But it also sports a checkpoint, flying squirrels, acorn mushrooms, star coins, fake foreground, moving semisolid platforms, colored pipes, red coins, moving coins, a coin heaven theme, and more complex pipe connections.
The various Worlds 1-2 are missing yet more objects: infinite platform elevators from SMB, pink note blocks from SMB3, berries from SMW, tilted pipes and rotating platforms from NSMBU.
I’m not just nitpicking, or lamenting that no one can be super creative and recreate all the old levels. I have actual design concerns here:
There are very few mechanisms in the game right now, things the player can interact with to change the state of the level. We’ve got P switches and vines, and that’s really about all. I miss the track switches from SMW, which gave the player something to actually do while riding a track. The blue warp doors from SMW only appeared when a P switch was active, and that helped a few Boo houses feel all the more bizarre. Control coins and silver coins interacted with P switches in interesting ways too. Looping levels from several of the castles in SMB were mean, but made for cute simple puzzles. SMW’s switch houses spanned levels, granted, but they could be emulated here to alter a level.
It’s hard to reward players at the moment. There are no Coin Heavens — there’s no coin heaven theme, vines can’t be climbed off the top of the screen as in SMB, and the pink note blocks are just for playing music. There are no Yoshi coins from SMW nor star coins from NSMBU, so nothing to collect. You can give the player coins and 1-Ups, of course, but those don’t feel very rewarding when they rarely persist and when bunches of levels drown you in 1-Ups anyway.
This is proving to be a little frustrating for me, since I absolutely love designing secrets but have very little to offer the player as a reward. I’ve been using 1-Ups just because they’re an obvious indicator, but jeez. Hurrah, here’s yet another 1-Up. You did it. There are no real secret exits, no warp zones, no key and keyhole, no 3-Up Moon, no bonus block.
You can’t even reward the player for clearing a difficult part by giving them a checkpoint, since those aren’t in either. As a designer this is actually fairly limiting, since if you have more than a couple of tricky spots, chances are you’re just going to make the player sick of your level.
Movement is largely restricted to jumping, flying, and standing on platforms. Which is a shame, because Mario is all about movement, and new Mario series tend to revolve around giving the player new ways to move around. SMB3 added slopes and sliding, water outside of water levels. SMW had the triangle block that let you run up walls, the fences in castles that you could climb on and punch Koopas through, ropes and buzzsaws for traversing tracks, the Power Balloon, and Yoshi’s wings. NSMBU has a lot of various rotating blocks and moving semisolid platforms. Even the platforms and skull rafts have limited range you can’t change, and you can’t make rotating platforms or multi-block winged platforms, which has led to a lot of clumsily overlapping objects in a lot of levels I’ve seen.
Decorations are hard to wrangle! All we really have to work with are three kinds of decorations on ground tiles, which show up whenever they want. You can use semisolid platforms to kind of fake a background, but they are really cumbersome to work with: you can’t directly place objects on top of a semisolid platform, because that’s interpreted as moving the platform. And that’s it. Anything else requires some really creative use of objects intended for other purposes.
Surprising omissions include the castle background from World 8 of SMB, and of course the large bushes from SMW.
There are a lot of enemies, and yet rather a lot of them just walk back and forth along the ground. I’d like a little more variety, like the angry sun, Roto-disc, Pile Driver Micro-Goomba (the tiny monsters that hide in blocks), Chargin’ Chuck, Pokey, Eerie, Blargg, and Big Bertha.
Also, it’s pretty tricky to make a boss battle that actually requires defeating a boss, rather than just going around it. Would be nice if a monster could drop… something… when defeated?
Some miscellaneous mechanics are altered or conspicuously missing. Yoshi has no color-related powers — no fireballs, no flight, no earthquake. No baby Yoshi, either. You can jump over the goalposts in SMW levels. Using Lakitu without trivially breaking a level is actually kind of hard, because there’s no way to make Lakitu not leave a cloud behind. You can’t spin jump on grinders. The more I think about how all the games (save for SMB) actually worked, the more shallow Mario Maker seems in comparison.
The world itself has some frustrating limitations. The start and goal are completely fixed, and in fact you can’t avoid having a seam between the starting point and the ground after it. The vertical size constraints (two screens high) mean you can’t make vertical cave levels, or paths that diverge too widely. If you want to have separate underground areas, doing it right means filling in at least 12 columns of solid blocks — which seriously eats into your limit of 2000 “environment” blocks. You can only have water as a full area style, not as a small pond or running across the bottom to make a Big Bertha level. Similarly, you can’t mix styles in the same level, so you can’t have a water/ground level like exist in SML2, and you can’t have a ground/underground level like SMB3 World 1–5. Doors can’t go to the other area, pipes can’t lead to the same area, and both doors and pipes must be symmetric two-way connections.
And browsing levels could use a teeny bit of work — you can’t easily offer a set of your own levels intended to be played as a single world, and I hear it’s actually surprisingly difficult to find levels made by your friends?
So I really hope they update the game to add some more of these classic mechanics and lift a few of the restrictions.
And while they’re at it, I wouldn’t mind seeing the game list expanded to Super Mario Land 2: Six Golden Coins, which was my first Mario and has some unique touches like the bunny suit and blocks that can only be destroyed by fireballs. Super Mario 3D World has a nice aesthetic, too, and a few interesting touches of its own (cat suit).
If you’ve made or found any interesting levels, I would love to see them! Gimme gimme.
I spent a good chunk of the last four days installing an Internet web forum, which claims it can be up and running in 30 minutes.
I like to think I’m pretty alright at computers. So what went wrong here? Well let me tell you.
I don’t want to name and shame here, because this is not my first such experience and the problem is larger than one individual product. (Let’s just say it rhymes with “piss horse”.)
The 30-minute claim came because the software only officially supports being installed via Docker, the shiny new container gizmo that everyone loves because it’s shiny and new, which already set off some red flags. I managed to install an entire interoperating desktop environment without needing Docker for any of it, but a web forum is so complex that it needs its own quasivirtualized OS? Hmm.
I tried installing the vendor Docker (I’m using Ubuntu 14.04, the current LTS release), but that’s 1.0, and Docker has gotten up to 1.7 in the intervening year and a half, and this software needs at least Docker 1.2. I stress that this web forum is so cutting-edge that it refuses to install without technology that did not exist two years ago.
So I tried installing current Docker via the officially condoned mechanism, which of course involves piping curl into your shell. That’s a fucking appalling idea, but security is kind of a joke with Docker anyway. It also didn’t work, giving me the rather useless “E: Unable to locate package docker-engine” instead. I’m sure glad Docker exists, to save me from all those package management nightmares!
Some digging revealed that Docker just doesn’t exist for 32-bit, even though they say it should work (as evidenced by the existence of a canonical 32-bit Ubuntu package), and they just don’t bother mentioning this in their README or installation docs or shell script that runs as root.
At this point I was pretty sick of Docker, so I decided to try installing the damn thing manually. It was just a Rails app, after all, and I’ve managed to install those before. How hard could it possibly be?
Ha, ha! After a git clone (because the app isn’t in rubygems??), I then spent maybe six hours fighting with RVM. (I’m sure you have a suggestion for a different Ruby environment thing I should be using instead, and I don’t care, shut up, I already had RVM installed and running something else.)
The problem was some extremely obtuse errors when running bundle install, which is supposed to install all of the app’s dependencies. Some library was complaining that a .a file in its own build directory didn’t exist, which didn’t make a lot of sense. Also, I spotted x86_64-linux in the path, which made even less sense.
See, I actually have a 64-bit kernel, but a 32-bit userspace. (There’s a perfectly good reason for this.) And the Ruby binary that RVM built was, of course, 32-bit — it wouldn’t have worked otherwise, since libc and everything else are all 32-bit. But those binaries thought they were on a 64-bit system (which they were), and rubygems uses the system architecture for building native extensions for some stupid fucking reason, so everything was built as 64-bit. In a way I’m lucky that this one particular package happened to fail, because all the others built just fine, and I only would’ve found the problem later when I actually tried to run the damn thing.
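The two competing notions of "architecture" are easy to see from inside any language runtime. Here's a minimal sketch in Python rather than Ruby, purely for illustration and not a description of what rubygems does internally: one value is what the kernel reports, the other is what the running binary actually is.

```python
import platform
import struct

# What the kernel reports via uname. On a 64-bit kernel this says
# x86_64 even when every userspace binary is 32-bit.
kernel_arch = platform.machine()

# Pointer size of the running interpreter itself, in bits. This is
# the number a native-extension build actually needs to match.
userspace_bits = struct.calcsize("P") * 8

print(f"kernel reports {kernel_arch}; this process is {userspace_bits}-bit")
```

On a mixed setup like mine, I'd expect the first to say x86_64 and the second to say 32, which is exactly the disagreement that bit me.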
I tried all kinds of environment variables and hand-editing of files and whatnot to convince Ruby that it was actually 32-bit, to no avail. Eventually I resorted to reading a bunch of RVM’s source code, and then I discovered a --32 flag that magically fixes everything. It’s not documented, but don’t worry! I found a GitHub issue comment from three and a half years ago, saying the docs will be fixed with RVM 2.0.
So now I had a working Ruby, and after some tedious rebuilding, I had a set of gems as well. Super.
Now I just had to figure out how to configure the damn app, which is tricky when the README just says “use Docker”. It had a config/app.conf.sample file, but this turned out to be sample configuration for Upstart, the Ubuntu service manager. I ended up discovering that there are still docs for installing on Ubuntu, just not linked from anywhere.
The next step was to migrate the database from “doesn’t exist” to “exists”, which is usually a breeze in Rails, by which I mean I have never once had it actually work without descending into a hellish nightmare and this time was no exception. The documentation claims the app needs to be superuser. Let’s see what PostgreSQL says about superusers.
Superuser status is dangerous and should be used only when really needed.
Yes, this definitely seems like something a web forum needs. I opted not to give it root on my entire database, which of course broke the migrations because they use CREATE EXTENSION to load binary extensions into my server, a perfectly reasonable thing for database migrations to be doing. I didn’t even have the required extension installed, and of course the documentation never once mentions needing it, so off I went to install it.
I installed postgresql-contrib, and then some funny things happened. Long story short, I was running Postgres 9.1, and the current Ubuntu version is 9.3. I’d originally installed the postgresql package, and using Arch Linux on my desktop has spoiled me into thinking that that will keep me on the latest version, but Ubuntu cares about trivialities like “not breaking your entire server” and had just kept me on 9.1 the whole time. But postgresql-contrib, unqualified, meant the current version, which was now 9.3, and had also installed the full server. Whoops! So I just took a quick detour to upgrade to 9.3, which I’ve done before and which is relatively painless.
Okay! Now I have a database.
At this point the docs take a wild detour into installing some Ruby process management library called Bluepill and copying some massive pile of “configuration” (actually just Ruby code, of course) and using that to run the app and also adding Bluepill to the user’s crontab as a @reboot and what the ever-loving fuck.
(I assumed this was some oblique Matrix reference, but someone later pointed out to me that it’s called bluepill and it keeps things up. Charming, but par for the course for Ruby.)
Anyway, I opted to not do all that, and just ran the thing directly with
Almost done. Now I just need to proxy nginx to it. The app helpfully provides some configuration for me, which is two hundred lines long and consists mostly of convoluted rules for which URLs are static assets and which should be proxied. I decided to hell with it and just proxied the whole thing and I’ll fix it later if I feel like it.
Now we’re up and running! Except I never get any signup email, and it turns out this is because I also have to run “sidekiq”, a job processor. And with that, now we’re done.
I tell you this story to make the point that this is all completely fucking ridiculous.
Set aside the oddball tool breakage and consider that if you follow the instructions to the letter, this web forum requires:
- Cloning (not installing!) the software’s source code and modifying it in-place.
- Copy-pasting hundreds of lines of configuration into nginx, as root, and hoping it doesn’t change when you upgrade.
- Copy-pasting hundreds of lines of Ruby for the sake of bluepill, and hoping it doesn’t change when you upgrade.
- Installing non-default Postgres extensions, as root.
- Running someone else’s arbitrary database commands as a superuser.
- Installing logrotate configuration, as root.
There’s nothing revolutionary here. It’s an app that wants to accept HTTP connections, use a database, and send email. Why is this so fucking complicated?
I’ll tell you why—
My experience is admittedly limited here, but as far as I can tell, installing a Rails app is impossible. It reads configuration from the source directory. It logs to the source directory. You have to manually precompile all the assets, which are of course also written to the source directory.
Rails is one of the most popular web frameworks in the world, championed by developers everywhere. And you can’t actually install anything written with it. This is a joke, right?
Back in the day, when Windows effectively didn’t have users and everyone just ran everything as an administrator, Unix nerds (myself included) would crow about how great Unix was for making heavy use of separate users for everything.
Boy, do I have egg on my face. Let’s recap here:
- If you’re missing a library or program, and that library or program happens to be written in C, you either need root to install it from your package manager, or you will descend into a Lovecraftian nightmare of attempted local builds from which there is no escape. You say you need lxml on shared hosting and they don’t have libxml2 installed? Well, fuck you.
- Only one thing can bind to port 80 and it has to run as root, so your options are to use nginx and need root to add a new app, or use Apache and do .htaccess or something equally atrocious.
- You want your app to start automatically, of course. You can add it to your crontab with @reboot, which is kind of a hack and also won’t restart it if it dies. So you can also install your own local process manager, like this app did. Or you can do what most people do and add it to the system’s daemon manager, as root. Allegedly many modern daemon manager things allow non-root users to set their own things up, but I’ve never seen this actually done or even explained very clearly.
- If you want to rotate your logs, well, that needs root.
- You think Docker solves any of this? Let me know how piping curl to a shell script that uses sudo works out for you. Oh, and if you’re in the docker group, you are root.
Modern Linux desktops are pretty alright at the multi-user case, which basically no one uses. On the server side, well, if you have a server everyone just assumes you have root anyway, so everything is a giant mess. Even RVM, which is designed for having multiple per-user Ruby installations, prompted me for my password so it could sudo apt-get install something.
We are really, really bad at enumerating and handling dependencies.
I mean, we can’t even express them in our own software. System package managers deal with it, and that’s great — but I’m a developer, not a packager. If I write a Python library that wraps a C library, there is no way to express that dependency. How would I? There’s no canonical repository of C/C++ packages, anywhere. Even if I could, what good would it do? Installing a shared C library locally is a gigantic pain in the ass, involving LD_LIBRARY_PATH, or maybe it was LDFLAGS=-rpath? See, I don’t even know. Virtually no one does it, because it’s a huge pain, because virtually no one does it.
So it should come as no surprise that there is no way whatsoever to list dependencies on services. You’d think that a web app could just have some metadata saying “I need Postgres and, optionally, Redis”, but this doesn’t exist. And the other side, where the system can enumerate the services it has available for a user, similarly doesn’t exist. So there’s no chance of discovery. If you’re missing a service the app needs but failed to document, or you set it up wrong, you’ll just find out on the first request.
For all the moving parts and all the things that can go wrong, there sure is a huge lack of reporting when it breaks. I basically rely on people tweeting at me or asking on IRC if something is broken. This particular app relied critically on a job queue, but didn’t notice it wasn’t running.
There are a few widgets that will email all crash logs to you, but what idiot came up with that? That’s completely fucking useless. I have over two thousand unread crash emails for my perfectly functional modest-traffic website. Almost all of them are some misconfigured crawler blowing up on bogus URLs in a way I don’t strongly care about fixing.
But if the app goes down and completely fails to start, I get zero email. If the app runs but every request takes 20 seconds, I get zero email. If every page 404s, I get zero email. And if real actual pages start to break, I get a flood of email that I’ll never notice because I don’t even look in that folder any more.
These are not unique problems. Yet the only solutions I’ve seen take the form of dozens of graphs you’re expected to keep an eye on manually.
We should have apps that install with one (1) command, take five minutes to configure, and scale up to multiple servers and down to shared hosting. If I cannot install your web forum on Dreamhost, you have failed spectacularly.
But we haven’t even tried to solve this, and all the people who are most capable of solving it are too busy scaling Twitter or Amazon up to ten million servers or whatever. Installing basic web software gets harder all the time, and shared hosting becomes less useful all the time, and web developers flock to garbage like Docker that basically runs a VM because we can’t figure out how to make two apps use the same damned database.
The thing I want, but never figured out how to build, is an intermediate web app for the express purpose of installing and managing web apps. Yes, sure, like cPanel or whatever, but not with ad-hoc support for some smattering of popular apps; I also want a protocol for apps to explain their own minimal requirements.
I want to be able to say “install the Ruby app ‘pisshorse’”. And it goes and finds that gem. And it sees what Ruby version it claims to work on, and installs an RVM environment with that version. And it makes a new gemset and installs the gem. And it looks at a metadata file in a Well-Known Place, and it sees that the file demands a Postgres database and a Redis instance. And it inspects the common ways you might expect to be able to connect to Postgres or Redis. And then it asks me where Postgres and Redis are, and it offers whatever it found as defaults, and it accepts something concise like postgresql:///pisshorse rather than ten separate fields that make no sense if you’re not connecting over TCP. And it double-checks that those are okay, and it writes them to a very small configuration file in ~/.config/webapps/pisshorse or wherever. At no point am I asked to configure some ridiculous value like the TTL of database connections, which no one cares about and which the computer should be smart enough to gauge on its own.
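To show how little the installer would need from a concise URL like that, here's a rough sketch using Python's standard urllib.parse. The function name parse_service_url and the dictionary keys are made up for illustration; nothing like this exists as a standard.

```python
from urllib.parse import urlsplit

def parse_service_url(url):
    """Split a concise service URL into the pieces a connector needs.

    Hypothetical helper: the returned keys are invented for this sketch.
    """
    parts = urlsplit(url)
    return {
        "kind": parts.scheme,
        # No host means "use the local default", e.g. a Unix socket.
        "host": parts.hostname,
        "port": parts.port,
        "name": parts.path.lstrip("/") or None,
    }

print(parse_service_url("postgresql:///pisshorse"))
print(parse_service_url("redis://localhost:6379/0"))
```

The first URL has no host or port at all, which is the whole point: if you aren't connecting over TCP, there's nothing to fill in.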
If this is a shared host and you only have one Postgres database, that’s totally fine, because this is a magical world where people actually know about and use Postgres schemata, and apps actually support them.
The metadata file also lists any system-level libraries or binaries that are required (or desired), and if any of them aren’t installed, you’ll be asked to install them, with a single packer command you can inspect and then run yourself. Again, if this is a shared host and you can’t install software yourself, then the installer can either attempt to do it locally or just give up, and everything’s fine because it turns out web forums don’t actually need optipng and can just carry on without it.
Then it adds the app to your user-scoped daemon manager, and if you don’t have one then it quietly pretends to be one, using the @reboot hack. And it sees that the app also needs a job queue running, so it adds that too. It uses gunicorn or unicorn or uwsgi or whatever, but you don’t actually care which, and if you do then you can ask for a different one. It defaults to only two workers, but it also keeps an eye on the load and spawns a few more if necessary, learning how much traffic is normal as it goes. If it thinks it’s eating too much of the machine, it sends you an email or pings you on IRC or whatever.
The app is bound to ~/.config/webapps/pisshorse/pisshorse.sock, which isn’t too useful to you. And this is the hard part that I haven’t figured out yet, because there’s not really a good way to determine what your HTTP vhost setup looks like, and if you’re using nginx then you still need root. But I have ideas for a couple (convoluted) workarounds, so let’s pretend that the world is a nice place and it can set up the reverse proxying for you, without needing root. It even adds rules for caching the static assets (also defined in the metadata file), and perhaps can ask you for a CDN if you have one.
Now the app runs, but it has no users, and you can’t log in because you don’t have a confirmed email yet. But that’s okay, because the metadata file also specifies a few administrative commands you can run from the command line, and of course the magical web GUI can also do this for you.
From here you can basically forget about the management GUI. But it quietly collects logs and stats, and there are graphs to look at if you please. If at any point the app fails to start, or there’s a sharp uptick in failures on pages that used to work, or it can’t keep up with requests, or the job queue is broken, you get a ping.
Eventually you’ll need to upgrade, and that’s also fine, because it’s just a single button click. Your current instance goes into read-only mode, which is a thing that all apps support, because it would be embarrassing if they didn’t. The job queue is shut down, the database is copied and upgraded, and a separate new instance of the app is launched. New requests are directed to the new code, the old instance is shut down, and the old database is archived. Or, if the new instance immediately starts to spew errors, the old code is kept up and an irate email is automatically sent to the app’s maintainer. Either way, the disruption is minimal.
And the app benefits as well, because it uses a small library that knows whether it’s running under gunicorn or uwsgi or something else, and can perform some simple tasks like inspect its own load or restart itself or run some simple code outside a request.
I can dream.
We’ve been doing this for 20 years. We should have this by now. It should work, it should be pluggable and agnostic, and it should do everything right — so if you threw away the web gui, it would look like something a very tidy sysadmin set up by hand, not autogenerated sludge.
Instead, we stack layer after layer of additional convoluted crap on top of what we’ve already got because we don’t know how to fix it. Instead, we flit constantly from Thin to Mongrel to Passenger to Heroku to Bitnami to Docker to whatever new way to deploy trivial apps came out yesterday. Instead, we obsess over adding better Sass integration to our frameworks.
And I’m really not picking on Ruby, or Rails, or this particular app. I hate deploying my own web software, because there are so many parts all over the system that only barely know about each other, but if any of them fail then the whole shebang stops working. I have at least five things just running inside tmux right now, because at least I can read the logs and restart them easily.
This is terrible and we should all be ashamed. No wonder PHP is so popular. How am I supposed to tell a new web developer that this is what they have to look forward to?
I’m assuming, if you are on the Internet and reading kind of a nerdy blog, that you know what Unicode is. At the very least, you have a very general understanding of it — maybe “it’s what gives us emoji”.
That’s about as far as most people’s understanding extends, in my experience, even among programmers. And that’s a tragedy, because Unicode has a lot of… ah, depth to it. Not to say that Unicode is a terrible disaster — more that human language is a terrible disaster, and anything with the lofty goals of representing all of it is going to have some wrinkles.
So here is a collection of curiosities I’ve encountered in dealing with Unicode that you generally only find out about through experience. Enjoy.
Also, I strongly recommend you install the Symbola font, which contains basic glyphs for a vast number of characters. They may not be pretty, but they’re better than seeing the infamous Unicode lego.
Unicode is a big table that assigns numbers (codepoints) to a wide variety of characters you might want to use to write text. We often say “Unicode” when we mean “not ASCII”, but that’s silly since of course all of ASCII is also included in Unicode.
UTF-8 is an encoding, a way of turning a sequence of codepoints into bytes. All Unicode codepoints can be encoded in UTF-8. ASCII is also an encoding, but only supports 128 characters, mostly English letters and punctuation.
A character is a fairly fuzzy concept. Letters and numbers and punctuation are characters. But so are Braille and frogs and halves of flags. Basically a thing in the Unicode table somewhere.
A glyph is a visual representation of some symbol, provided by a font. It might represent a single character, or it might represent several. Or both!
Unicode is divided into seventeen planes, numbered zero through sixteen. Plane 0 is also called the Basic Multilingual Plane, or just BMP, so called because it contains the alphabets of most modern languages. The other planes are much less common and are sometimes informally referred to as the astral planes.
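To make that vocabulary a bit more concrete, here's a small Python sketch. The helper name plane is mine; it just relies on the fact that each plane spans 0x10000 codepoints.

```python
text = "caf\u00e9"   # café, with é as the single codepoint U+00E9

# Four codepoints...
print([f"U+{ord(ch):04X}" for ch in text])

# ...but five bytes in UTF-8, because é needs two of them.
encoded = text.encode("utf-8")
print(encoded)

# ASCII has no number for é at all.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("not ASCII:", exc.reason)

def plane(ch):
    """Return the plane number (0 through 16) of a single character."""
    return ord(ch) >> 16

for ch in ["a", "\u00e9", "\ud55c", "\U0001F4A3"]:
    print(f"U+{ord(ch):04X} is in plane {plane(ch)}")
```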
If the only written language you’re familiar with is English, that goes doubly so.
Perhaps you want to sort text. A common enough problem. Let’s give this a try in Python. To simplify things, we’ll even stick to English text.
>>> words = ['cafeteria', 'caffeine', 'café']
>>> words.sort()
>>> words
['cafeteria', 'caffeine', 'café']
Oops. Turns out Python’s sorting just compares by Unicode codepoint, so the English letter “é” (U+00E9) is greater than the English letter “f” (U+0066).
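If you want to see the comparison happen, a quick sketch like this prints the codepoints that decide the order. The three words differ at their fourth letter, and that's the only place the sort has to look.

```python
words = ['cafeteria', 'caffeine', 'caf\u00e9']

# Strings compare codepoint by codepoint, so the first difference
# decides: 'e' (U+0065) < 'f' (U+0066) < 'é' (U+00E9).
for word in sorted(words):
    print(word, [f"U+{ord(ch):04X}" for ch in word[:4]])
```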
Did you know the German letter “ß” is supposed to sort equivalently to “ss”? Where do you sort the Icelandic letter “æ”? What about the English ligature “æ”, which is the same character?
What about case? The Turkish dotless “ı” capitalizes to the familiar capital “I”, but in Turkish, the lowercase of that is “ı” and the uppercase of “i” is “İ”. Is uppercase “ß” the more traditional “SS”, or maybe “Ss”, or the somewhat recent addition “ẞ”?
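For what it's worth, here's how Python 3 answers those questions. Its string methods apply Unicode's default case mappings, which know nothing about locale, so this is one possible set of answers rather than the right one.

```python
# Escapes are used so it's unambiguous which codepoints are involved.
sharp_s = '\u00df'        # ß
capital_sharp_s = '\u1e9e'  # ẞ
dotless_i = '\u0131'      # ı
dotted_capital_i = '\u0130'  # İ

print(sharp_s.upper())                  # SS
print(capital_sharp_s.lower())          # ß
print(dotless_i.upper())                # I
print('I'.lower())                      # i, even if the text was Turkish
print(ascii(dotted_capital_i.lower()))  # 'i\u0307': an i plus a combining dot
```

Note that lowercasing a single character can hand you back two codepoints, and uppercasing “ß” changes the length of the string.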
Or, how do you compare equality? Is “ß” equal to “ss”? Is “æ” equal to “ae”? Is “é” equal to “é”?
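A naive comparison says no to all three, including the last one, where the two strings look identical on screen. This sketch assumes one “é” is the single precomposed codepoint and the other is a plain “e” followed by a combining accent:

```python
pairs = [
    ('\u00df', 'ss'),       # ß vs ss
    ('\u00e6', 'ae'),       # æ vs ae
    ('\u00e9', 'e\u0301'),  # precomposed é vs e + COMBINING ACUTE ACCENT
]
for a, b in pairs:
    print(f"{a!r} == {b!r}: {a == b}  (lengths {len(a)} and {len(b)})")
```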
Ah, you say! I’ve heard about this problem and know how to solve it. I can just throw Unicode normalization at it, which will take care of combining characters and all that other nonsense. I can even strip out all combining characters and have nice normal English text left, because for some reason I am under the impression that English text is “normal” and all else is “abnormal”.
Sure, let’s give that a try.
>>> import unicodedata
>>> normalize = lambda s: ''.join(ch for ch in unicodedata.normalize('NFKD', s) if not unicodedata.combining(ch))
>>>
>>> normalize('Pokémon')
'Pokemon'
Great, problem solved.
>>> normalize('ı')
'ı'
>>> normalize('æ')
'æ'
>>> normalize('ß')
'ß'
>>> normalize('한글')
'한글'
>>> normalize('イーブイ')
'イーフイ'
Yes, it turns out that Unicode decomposition also decomposes Hangul (the alphabet used to write Korean) into its sub-components, which then may or may not still even render correctly, as well as splitting the diacritics off of Japanese kana, which significantly alters the pronunciation and meaning. Almost as if Unicode decomposition was never meant to help programmers forcibly cram the entire world back into ASCII.
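You can watch the decomposition happen by printing the name of every codepoint that comes out the other side. A small sketch, with a helper called show that I made up for the occasion:

```python
import unicodedata

def show(text):
    """Print each codepoint of the canonical decomposition (NFD) of text."""
    for ch in unicodedata.normalize('NFD', text):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

show('\ud55c\uae00')  # the two Hangul blocks become six conjoining Jamo
show('\u30d6')        # ブ becomes KATAKANA LETTER HU plus a combining voiced sound mark
```

Throwing away the combining mark is what turned ブ into フ above, which is a different syllable entirely.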
Even if you only care about English text, there’s more than one Latin alphabet in Unicode! Is “x” equivalent to “𝗑” or “𝘅” or “𝘹” or “𝙭” or “𝚡” or “ｘ” or “𝐱”? What about “×” or “х” or “⨯” or “ⅹ”? Ah, sorry, those last four are actually the multiplication sign, a Cyrillic letter, the symbol for cross product, and the Roman numeral for ten.
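Compatibility normalization (NFKC) will fold some of these back to a plain “x”, but not others, and which is which isn't obvious from looking. Here's a sketch over a handful of the lookalikes, written as escapes so there's no doubt about which character is which:

```python
import unicodedata

lookalikes = [
    '\U0001d5d1',  # mathematical sans-serif small x
    '\uff58',      # fullwidth x
    '\u2179',      # small Roman numeral ten
    '\u00d7',      # multiplication sign
    '\u0445',      # Cyrillic small letter ha
    '\u2a2f',      # vector or cross product
]
for ch in lookalikes:
    folded = unicodedata.normalize('NFKC', ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: folds to x? {folded == 'x'}")
```

The first three fold to “x” and the last three don't, so the Roman numeral gets treated as a letter while the Cyrillic letter stays a stranger.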
This is a particularly aggravating problem because most programming languages have facilities for comparing and changing the case of text built in, and most of them are extremely naïve about it. You can’t even correctly change the case of English-looking text without knowing what locale it came from — the title-case of “istanbul” may actually be “İstanbul” depending on language, because of Turkish’s dotted “i”.
The only library I’m aware of off the top of my head for correctly dealing with any of these problems is ICU, which is a hulking monstrosity hardly suited for shipping as part of a programming language. And while their homepage does list a lot of impressive users, I’ve only encountered it in code I’ve worked on once.
Typically we think of combining characters as being the floating diacritical marks that can latch onto the preceding letter, such as using U+0301 COMBINING ACUTE ACCENT to make “q́”, in case we are direly in need of it for some reason. There are a few other combining “diacriticals” that aren’t so related to language; for example, U+20E0 COMBINING ENCLOSING CIRCLE BACKSLASH can produce “é⃠”, the universal symbol for “my software only supports English, and also I am not aware that English has diacritics too”. Or perhaps you’d use U+20E3 COMBINING ENCLOSING KEYCAP to make “é⃣” and indicate that the user should press their é key.
All of these have an impact on the “length” of a string. You could write either of those “é” sequences with three codepoints: the letter “e”, the combining accent, and the combining border. But clearly they each only contribute one symbol to the final text. This isn’t a particularly difficult problem; just ignore combining characters when counting, right?
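A first stab at "just ignore combining characters" might look like the sketch below. One wrinkle: I'm checking the general category (anything starting with M is a mark) rather than calling unicodedata.combining(), because enclosing marks like the keycap have a combining class of zero and would slip through.

```python
import unicodedata

def count_non_marks(text):
    """Count codepoints whose general category is not a mark (Mn, Mc, Me)."""
    return sum(1 for ch in text
               if not unicodedata.category(ch).startswith('M'))

# e + COMBINING ACUTE ACCENT + COMBINING ENCLOSING KEYCAP
keycap_e = 'e\u0301\u20e3'
print(len(keycap_e))              # 3
print(count_non_marks(keycap_e))  # 1
```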
More interesting are the Unicode characters that are not combining characters, but compose in some way in practice anyway. The flag emoji, for example, don’t actually exist in Unicode. The Unicode Consortium didn’t want to be constantly amending a list of national flags as countries popped in and out of existence, so instead they cheated. They added a set of 26 regional indicator symbols, one for each letter of the English alphabet, and to encode a country’s flag you write its two-letter ISO country code with those symbols. So the Canadian flag, 🇨🇦, is actually the two characters U+1F1E8 REGIONAL INDICATOR SYMBOL LETTER C and U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A. But if you put a bogus combination together, you probably won’t get a flag glyph; you’ll get stand-ins for the characters instead. (For example, 🇿🇿.) So the “length” of a pair of these characters depends both on the display font (which may not support all flags), and on the current geopolitical state of the world. How’s that for depending on global mutable state?
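The mapping from country code to flag is plain arithmetic, which a short sketch makes obvious. The function name flag is mine, and it does no validation at all, so it will happily produce a "flag" for ZZ too.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A

def flag(country_code):
    """Map a two-letter code such as 'CA' to its regional indicator pair."""
    return ''.join(chr(REGIONAL_INDICATOR_A + ord(c) - ord('A'))
                   for c in country_code.upper())

canada = flag('CA')
print(canada, len(canada), [f"U+{ord(ch):04X}" for ch in canada])
```

Whether that prints as one flag or two boxed letters is up to your font, but len() says 2 either way.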
But it gets better! There’s a character called U+200D ZERO WIDTH JOINER, which is used to combine otherwise distinct characters in some languages (but has fairly general semantics). Apple has made creative use of this character to compose emoji together. The report on emoji has some examples. So now the length of some text is completely arbitrary, based on whatever arbitrary ligatures the font includes.
To be fair, that was already true anyway. You might argue that the length of text in human terms is not actually all that interesting a quantity, and you’d be right, but that’s why this section is about character width. Because I’m typing in a terminal right now, and terminals fit all their text in a grid.
Let’s return to the simpler world of letters and revisit that Hangul example:
>>> normalize('한글')
'한글'
Hangul characters are actually blocks composed of two or three parts called Jamo; both of the blocks above happen to have three. (Here’s gritty detail on Hangul, Jamo, and Unicode. It’s a really cool alphabet.) Applying Unicode decomposition actually breaks each character down into its component Jamo, which are then supposed to render exactly the same as the original. They aren’t marked as combining characters in the Unicode database, but if you have three of them in a row (arranged sensibly), you should only see one character. The actual decomposition for the text above is “ㅎㅏㄴ ㄱㅡㄹ”, written with separate characters that don’t combine. There are a good few languages that work this way — Devanagari (the script used for Hindi et al.) and Bengali rely heavily on character composition, and Hebrew uses it for rendering vowels.
And yet I ended up with four very different renderings. In this blog post, with my default monospace font, I see the full sequence of six Jamo. If I paste the same text somewhere with a proportional font, I see something very nearly identical to the original characters, albeit slightly fuzzier from being generated on the fly. In Konsole, I see only the first Jamo for each character: 'ㅎㄱ'. And in my usual libvte-based terminal, the combining behavior falls apart, and I see a nonsensical mess that I can’t even reproduce with Unicode:
I can only guess at what happened here. Clearly both terminals decided that each set of three Jamo was only one character wide, but for some reason they didn’t combine. Konsole adamantly refuses to render any Jamo beyond the first, even if I enter them independently; VTE dutifully renders them all but tries to constrain them to the grid, leading to overlap.
This is not the first width-related problem I’ve encountered with Unicode and terminals. Consider emoji, which tend to be square in shape. I might reasonably want to say to someone on IRC: “happy birthday! 🎁 hope it’s a good one.” (That’s U+1F381 WRAPPED PRESENT, if you didn’t take my advice and install Symbola.) But I use a terminal IRC client, and here’s how that displays, in VTE and Konsole:
You can see how VTE has done the same thing as with Hangul: it thinks the emoji should only take up one character cell, but dutifully renders the entire thing, allowing the contents to spill out and overlap the following space. You might think Konsole has gotten this one right, but look carefully — the final quote is slightly overlapping the cursor. Turns out that Konsole will print each line of text as regular text, so any character that doesn’t fit the terminal grid will misalign every single character after it. The cursor (and selection) is always fit to the grid, so if you have several emoji in the same row, the cursor might appear to be many characters away from its correct position. There are several bugs open on Konsole about this, dating back many years, with no response from developers. I actually had to stop using Konsole because of this sole issue, because I use ⚘ U+2698 FLOWER as my shell prompt, which misaligned the cursor every time.
All of these problems can be traced back to the same source: a POSIX function called wcwidth, which is intended to return the number of terminal columns needed to display a given character. It exists in glibc, which sent me on a bit of a wild goose chase. I originally thought that wcwidth must be reporting that the second and third Jamo characters are zero width, but this proved not to be the case:
>>> libc.wcwidth(c_wchar('\u1100'))  # initial Jamo
2
>>> libc.wcwidth(c_wchar('\u1161'))  # second Jamo
1
Hangul Jamo medial vowels and final consonants (U+1160-U+11FF) have a column width of 0.
Aha. So Konsole saw that the second and third Jamo took zero space, so it didn’t bother trying to print them at all.
Then what the hell is VTE doing? It defers to some utility functions in glib (GNOME’s library of… stuff), such as g_unichar_iszerowidth, which… explicitly says yes for everything between U+1160 and U+1200. Wouldn’t you know it, those are the secondary and tertiary Jamo characters. So VTE saw that they took zero space, so it didn’t make any extra room for them, but still tried to print them. I expect they didn’t combine in VTE because VTE has no idea they’re supposed to combine, so it printed each one individually.
Oh, but this madness gets even better. WeeChat, another terminal IRC client, outright strips emoji, everywhere. This is apparently the fault of… glibc’s implementation of wcwidth, which defaults to 1 for printable characters and 0 otherwise, which requires knowing what the characters are, which oops doesn’t work so well when glibc was using a vendored copy of the (pre-emoji) Unicode 5.0 database until glibc 2.22, which was released less than a month ago.
Beloved SSH replacement mosh has a similar problem, in this case blamed on the wcwidth implementation shipped with OS X. Gosh, I thought Apple was on the ball with Unicode.
We’re now up to at least four mutually incompatible and differently broken versions of this same function. Lovely.
I might be on the fringe here, but I’m pretty adamant that having a communication program silently and invisibly eat parts of your text is a bad thing.
While I’m at it: why are emoji left with a width of 1? They tend to be drawn to fit a square, just like CJK characters (which are why we need double-width character support in the first place), and they’re even of Japanese origin. My rendering problems would go away in both terminals if they used widths of 2. Hell, I’m going to go file bugs on both of them right now.
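For comparison, here's roughly what a width function could look like if it leaned on the East Asian Width property instead. This is a crude sketch of my own, not what any terminal or wcwidth implementation actually does, and the answer it gives for emoji depends on which Unicode version your Python ships with.

```python
import unicodedata

def rough_width(text):
    """Very rough column count: wide/fullwidth = 2, marks = 0, the rest = 1."""
    total = 0
    for ch in text:
        if unicodedata.category(ch).startswith('M'):
            continue  # combining marks ride along with the previous character
        total += 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
    return total

print(rough_width('abc'))           # 3
print(rough_width('\ud55c\uae00'))  # 4: two precomposed Hangul blocks
print(rough_width('e\u0301'))       # 1
```

It still gets decomposed Jamo wrong, since the trailing ones aren't marks, which is more or less the story of this whole section.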
WSpace property to a handful of codepoints. Seems like a good approach, except for this one unusual exception: ” ” is a space character, U+1680 OGHAM SPACE MARK. Ogham is an alphabet used in older forms of Irish, and its space character generally renders as a line. Surprise!
Complicating this somewhat further, there are actually two definitions of whitespace in Unicode. Unicode assigns every codepoint a category, and has three categories for what sounds like whitespace: “Separator, space”; “Separator, line”; and “Separator, paragraph”.
If you’re familiar with Unicode categories, you might be tempted to use these to determine what characters are whitespace. Except that CR, LF, tab, and even vertical tab are all categorized as “Other, control” and not as separators. You might think that at least LF should count as a line separator, but no; the only character in the “Separator, line” category is U+2028 LINE SEPARATOR, and the only character in “Separator, paragraph” is U+2029 PARAGRAPH SEPARATOR. I have never seen either of them used, ever. Thankfully, all of these have the WSpace property.
As an added wrinkle, the lone oddball character “⠀” renders like a space in most fonts. But it’s not whitespace, it’s not categorized as a separator, and it doesn’t have WSpace. It’s actually U+2800 BRAILLE PATTERN BLANK, the Braille character with none of the dots raised. (I say “most fonts” because I’ve occasionally seen it rendered as a 2×4 grid of open circles.) Everything is a lie.
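If you want to poke at the difference between the property and the categories yourself, JavaScript regexes with the u flag can test both. This is just a sketch; classify is a name I made up, and it assumes an engine with Unicode property escapes (ES2018 or later).

```javascript
// Compare the White_Space property against the general categories.
function classify(ch) {
  return {
    whiteSpace: /\p{White_Space}/u.test(ch), // the WSpace property
    separator: /\p{Z}/u.test(ch), // Zs, Zl, or Zp
    control: /\p{Cc}/u.test(ch), // "Other, control"
  };
}

const samples = {
  "LINE FEED": "\n",
  "OGHAM SPACE MARK": "\u1680",
  "LINE SEPARATOR": "\u2028",
  "BRAILLE PATTERN BLANK": "\u2800",
};

for (const [name, ch] of Object.entries(samples)) {
  console.log(name, classify(ch));
}
```

LF comes out as whitespace and a control character but not a separator, while the Braille blank fails all three tests.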
var bomb = "💣";
console.log(bomb.length); // 2
console.log(bomb.charCodeAt(0).toString(16)); // d83d
console.log(bomb.charCodeAt(1).toString(16)); // dca3
These aren’t actually characters. Everything from U+D800 through U+DFFF is permanently reserved as a non-character for the sake of encoding astral plane characters in UTF-16. The short version is that all BMP characters are two bytes in UTF-16, and all astral plane characters are two of these non-characters (called a surrogate pair) for a total of four bytes.
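For the curious, the arithmetic behind a surrogate pair is short enough to sketch. The function names here are mine, not any standard API; in real code you'd just use codePointAt and String.fromCodePoint.

```javascript
// Split an astral plane code point into its UTF-16 surrogate pair.
function toSurrogatePair(cp) {
  if (cp < 0x10000 || cp > 0x10ffff) {
    throw new RangeError("not an astral plane code point");
  }
  const offset = cp - 0x10000; // leaves a 20-bit number
  const high = 0xd800 + (offset >> 10); // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

// And back again.
function fromSurrogatePair(high, low) {
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

const pair = toSurrogatePair(0x1f4a3); // U+1F4A3 BOMB
console.log(pair.map((unit) => unit.toString(16))); // [ 'd83d', 'dca3' ]
console.log(fromSurrogatePair(pair[0], pair[1]).toString(16)); // 1f4a3
```

Those are the same two values charCodeAt reported for the bomb.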
char*) are just sequences of bytes, so you can’t fit more than Latin-1. Some libraries have historically tried to address this with “wide strings”,
wchar_t*, but the size of
Arguably, 16-bit faux strings are worse than 8-bit faux strings. It becomes pretty obvious pretty quickly that 8 bits is not enough to fit more than some European alphabets, and anyone but the most sheltered programmer is forced to deal with it the first time they encounter an em dash. But 16 bits covers the entire BMP, which contains all current languages, some ancient languages, dingbats, mathematical symbols, and tons of punctuation. So if you have 16-bit faux strings, it’s very easy to think you have all of Unicode automatically handled and then be sorely mistaken. Thankfully, the increasing availability and popularity of emoji, which are mostly not in the BMP (but see below), makes astral plane support a more practical matter.
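Here's a small sketch of how that false confidence bites. Reversing a string by code unit works fine for everything in the BMP and then quietly mangles anything astral. Both function names are made up for the example.

```javascript
// Reverses UTF-16 code units, which splits surrogate pairs apart.
function naiveReverse(text) {
  return text.split("").reverse().join("");
}

// Reverses code points instead. Still not perfect: combining sequences
// would get reordered, but at least surrogate pairs survive.
function codePointReverse(text) {
  return Array.from(text).reverse().join("");
}

console.log(naiveReverse("abc")); // cba
console.log(codePointReverse("a💣b")); // b💣a
console.log(naiveReverse("a💣b") === "b💣a"); // false
```

The naive version puts the low surrogate before the high one, leaving two lone surrogates that don't encode anything.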
This probably all dates back to the original design of Unicode, which assumed that we’d never possibly need any more than 65,536 different characters and promised that two bytes would be enough for everyone. Oops.
(This is the same reason that Chinese hanzi and Japanese kanji are merged into a single set of codepoints: they’re both huge alphabets and it was the only way to fit them both into two bytes. This is called Han unification, and I have seen it end friendships, so I prefer not to discuss it further.)
One more trivium: MySQL has a utf8 encoding, and it’s generally regarded as best practice to use that for all your text columns so you can store Unicode. But, oops, MySQL arbitrarily limits it to three bytes per character, which isn’t enough to encode any astral plane characters! What a great technical decision and not at all yet another thorn in the unusable sinkhole that is MySQL. Version 5.5 introduced a utf8mb4 encoding that fixes this, so have fun ALTERing some multi-gigabyte tables in production.
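If you're wondering whether your data is affected, the check is simple enough to sketch: anything above U+FFFF takes four bytes in UTF-8. The helper names below are hypothetical, and this assumes an environment with a global TextEncoder (browsers, or a reasonably recent Node).

```javascript
// How many bytes a string takes when encoded as UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// True if the text contains any code point outside the BMP, i.e. anything
// a three-byte-per-character encoding can't hold.
function needsFourByteUtf8(text) {
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xffff) return true;
  }
  return false;
}

console.log(utf8ByteLength("a")); // 1
console.log(utf8ByteLength("é")); // 2
console.log(utf8ByteLength("語")); // 3
console.log(utf8ByteLength("💣")); // 4
console.log(needsFourByteUtf8("日本語")); // false
console.log(needsFourByteUtf8("bomb 💣")); // true
```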
I exaggerate slightly.
The word “emoji” is generally used to mean “any character that shows as a colored picture on my screen”, much like the word “Unicode” is generally used to mean “any character not on my US QWERTY keyboard”. So what characters qualify as emoji?
There’s actually no Unicode block called “emoji”. The set of smiley faces is in a block called Emoticons, and most of the rest are in Miscellaneous Symbols and Pictographs and Transport and Map Symbols.
The Unicode Consortium has a technical report about emoji, which should be an immediate hint that this is not a trivial matter. In fact the report defines two levels of emoji, and look at how arbitrary these definitions are:
emoji character — A character that is recommended for use as emoji.
level 1 emoji character — An emoji character that is among those most commonly supported as emoji by vendors at present.
level 2 emoji character — An emoji character that is not a level 1 emoji character.
So emoji are defined somewhat arbitrarily, and even based on what’s treated as an emoji in the wild.
It’s tempting to just say that those few astral plane blocks are emoji, but you might be surprised at what else qualifies sometimes. There’s also a data table listing emoji levels, and it classifies as emoji a good handful of arrows and dingbats and punctuation, even though they’ve been in Unicode for many years. 🃏 U+1F0CF PLAYING CARD BLACK JOKER is a level 1 emoji, but nothing else in the entire Playing Cards block qualifies. Similarly, 🀄 U+1F004 MAHJONG TILE RED DRAGON is the only representative of Mahjong Tiles, and Domino Tiles aren’t represented at all.
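You can see the same patchiness from code. One caveat: the sketch below uses the Emoji property that JavaScript regexes expose, which comes from later Unicode emoji data and is not the level 1/level 2 table described above, so treat it as an approximation that depends on your engine's Unicode version. isEmoji is just a name I picked.

```javascript
// Tests a single character against the Unicode Emoji property.
const isEmoji = (ch) => /\p{Emoji}/u.test(ch);

console.log(isEmoji("🃏")); // U+1F0CF PLAYING CARD BLACK JOKER
console.log(isEmoji("🂡")); // U+1F0A1 PLAYING CARD ACE OF SPADES
console.log(isEmoji("🀄")); // U+1F004 MAHJONG TILE RED DRAGON
console.log(isEmoji("🀀")); // U+1F000 MAHJONG TILE EAST WIND
console.log(isEmoji("1")); // plain ASCII digit
```

The joker and the red dragon qualify while their neighbors don't, and the ASCII digits count too, because they're the base of the keycap emoji.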
I stress, also, that a colored graphic is not the only way emoji (however you define them) may be rendered. Here’s a screenshot of part of that table on my desktop:
That font is Symbola, which only has monochrome vector glyphs. So they’re no different than any other character.
I’ve been seeing an increasing trend lately of treating emoji as somehow completely unique. The IM programs WhatsApp and Telegram both use Apple’s emoji font on every platform, and I’ve seen even technically-inclined people passionately argue that this is a good state of affairs, because it means both parties will see exactly the same pixels. Wouldn’t want to confuse anyone by having them see a slightly different image of a steak! (You’d think that’s what sending images is for, but what do I know.)
This is somewhat troubling to me. The entire point of having these symbols exist in Unicode is so they can be transferred between different systems and treated just like any other text, because now they’re just text. They aren’t special in any way (besides being in an astral plane, I suppose), and there’s no reason you couldn’t construct an emoji font that displayed regular English characters as graphics. Hell, if you’re using Firefox, here’s a demo of SVG embedded in an OpenType font that displays the letter “o” as an animated soccer ball.
To wrap this up, here are some obscure characters I’ve had reason to use at times and think are interesting.
Control Pictures is an entire block of visual representations of control characters. So if you want to indicate there’s a NUL byte, instead of writing out “NUL” or “\x00”, you can use ␀ U+2400 SYMBOL FOR NULL. I’ve actually found it fairly useful once or twice to use ␤ U+2424 SYMBOL FOR NEWLINE to display multi-line text in a way that fits in a single line! I like it so much that I added a compose key shortcut for it: compose, n, l.
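The block is laid out so that the picture for control character N sits at U+2400 + N, which makes the conversion nearly a one-liner. A sketch, with a made-up function name, that uses ␤ for newlines as described above rather than the stricter U+240A SYMBOL FOR LINE FEED:

```javascript
// Replace C0 control characters (and DEL) with their Control Pictures.
function showControls(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    if (ch === "\n") return "\u2424"; // SYMBOL FOR NEWLINE
    if (cp < 0x20) return String.fromCodePoint(0x2400 + cp);
    if (cp === 0x7f) return "\u2421"; // SYMBOL FOR DELETE
    return ch;
  }).join("");
}

console.log(showControls("two\nlines\0")); // two␤lines␀
```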
Ruby characters are annotations used with Chinese and Japanese text to explain pronunciation, like the tiny characters here: 日本語（にほんご）. Usually they’re expressed with HTML’s <ruby> tag, but outside of HTML, what are you to do? Turns out Unicode actually supports this in the form of three “interlinear annotation” characters, and you could write the above as “[U+FFF9]日本語[U+FFFA]にほんご[U+FFFB]”. They tend not to have any rendering in fonts, since they’re control characters, and Unicode actually recommends they not be exposed directly to users at all, so there are no rules for how to actually display them. But if you want to store annotated Chinese or Japanese text without pretending all text is HTML, there you go.
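As a sketch of how you might actually use these, here's a pair of hypothetical helpers: one wraps a base text and its reading in U+FFF9 INTERLINEAR ANNOTATION ANCHOR, U+FFFA SEPARATOR, and U+FFFB TERMINATOR, and the other converts stored text to HTML ruby at display time. The function names are mine, and the HTML conversion does no escaping, so don't feed it untrusted text as-is.

```javascript
const ANCHOR = "\ufff9";
const SEPARATOR = "\ufffa";
const TERMINATOR = "\ufffb";

// Store a base text plus its annotation as plain text.
function annotate(base, reading) {
  return ANCHOR + base + SEPARATOR + reading + TERMINATOR;
}

// Convert annotated plain text to HTML ruby markup (no HTML escaping here).
function toRubyHtml(text) {
  const pattern = /\ufff9([^\ufff9-\ufffb]*)\ufffa([^\ufff9-\ufffb]*)\ufffb/g;
  return text.replace(pattern, "<ruby>$1<rt>$2</rt></ruby>");
}

const stored = annotate("日本語", "にほんご");
console.log(stored.length); // 10: seven visible characters plus three controls
console.log(toRubyHtml(stored)); // <ruby>日本語<rt>にほんご</rt></ruby>
```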
More well-known are U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK, which are part of the byzantine Unicode bi-directional text system. Among English speakers, though, the bidi controls are mainly known for being able to reverse text in unsuspecting websites, a trick that’s really the work of their sibling U+202E RIGHT-TO-LEFT OVERRIDE.
All the way back in humble ASCII, U+000C FORM FEED can be used to encode a page break in plain text. veekun’s database uses form feeds to mark where the Pokédex flavor text in Gold, Silver, and Crystal breaks across two pages.
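Consuming text stored that way is about as easy as it gets. A sketch, with a placeholder string rather than real flavor text and a made-up function name:

```javascript
// Split stored flavor text into pages at each form feed.
function splitPages(flavorText) {
  return flavorText.split("\f");
}

// Placeholder text, not an actual Pokédex entry.
const entry = "First page of the entry.\fSecond page of the entry.";
console.log(splitPages(entry));
// [ 'First page of the entry.', 'Second page of the entry.' ]
```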
Finally, here are some of my favorite Unicode blocks, good places to look for interesting squiggles. Especially if, say, you’re making a text-based game.
- Arrows: ↹ ⇝ ↻ ↯
- Mathematical Operators: ≈ ∞ ∀ ⊕ ⊠
- Miscellaneous Technical: ⌘ ⌚ ⌛ ⌨ ⏣ ⏏
- Box Drawing and Block Elements: ╟─╢ ░ ▒ ▓
- Geometric Shapes and Geometric Shapes Extended: ◎ 🞋 ◭ ▸ ▢ 🞠 🞴
- Miscellaneous Symbols, containing:
  - weather symbols: ☀ ☁ ☂ ☃ ☄
  - playing card suits: ♡ ♢ ♤ ♧ ♥ ♦ ♠ ♣
  - planetary and astrological symbols: ☿ ♀ ♁ ♂ ♃ ♄ ♅ ♆ ♇ ♈ ♉ ♊ ♋ ♌ ♍ ♎ ♏ ♐ ♑ ♒ ♓
  - chess pieces: ♔ ♕ ♖ ♗ ♘ ♙ ♚ ♛ ♜ ♝ ♞ ♟
  - dice: ⚀ ⚁ ⚂ ⚃ ⚄ ⚅
  - musical symbols: ♩ ♪ ♫ ♬ ♭ ♮ ♯
  - and other goodness: ⚢ ⚠ ☠ ☢ ☮ ☭ ⚰ ⚘ ⚙ ♲ ⛤ ⛓ ⛏
- Dingbats: ✭ ❁ ❄ ➤ ➠ ✎
- Mahjong Tiles: 🀀 🀟 🀩
- Domino Tiles: 🀳🁃🂃🁐
- Playing Cards: 🂪 🂫 🂭 🂮 🂡
- Alchemical Symbols: 🜱 🜲 🜻
Creating this list has made me realize that codepoints.net is not actually a very nice way to browse by block or copy-paste a lot of characters. There’s fileformat.info, if you weren’t aware, but it’s kind of barebones and clumsy. And most Unicode blocks have Wikipedia articles. But overall, everything is uniquely terrible; welcome to computers.