Emoji Modifiers and Sequence Combinations

Spec Dec 19, 2019

Following up on the incredible interest in my previous front-page articles, Unicode is Awesome and Hacking GitHub with Unicode's Turkish Dotless 'i', I've put together another article on my favorite features of Unicode's Emoji modifiers and sequences. Before I get started, I want to remind readers that I'm in no way a canonical source for the Unicode standard. Mark Davis and Peter Edberg are two of the many smart people behind the Emoji spec (tr51-16). I'm just a software developer who's given the spec a fair read-through and really enjoyed it. So here we go, let's dive in.

Currently, the Unicode Emoji spec is on v12, with v13 coming in 2020. Since Emoji v8 (mid 2015) we've had skin tone modifiers based on the six-point Fitzpatrick dermatology scale, and gender modifiers arrived in v11 (2018). This is really awesome. The Unicode Consortium has made a huge effort to better reflect and incorporate human diversity, including cultural practices. Naturally there's always more that can be done, but overall I'm impressed by what the consortium and implementing vendors have managed thus far.

By now nearly everyone has seen emojis with various skin tones. These are actually created by combining a skin tone modifier character with a base emoji. Other modifiers are often available, including hair color.

To use a modifier, just append one of the five skin tone modifiers to your desired base emoji, for example \u{1F466}\u{1F3FE}.

👦 + 🏾 → 👦🏾

Code     Skin Tone Modifier                   Sample
U+1F3FB  EMOJI MODIFIER FITZPATRICK TYPE-1-2  🏻
U+1F3FC  EMOJI MODIFIER FITZPATRICK TYPE-3    🏼
U+1F3FD  EMOJI MODIFIER FITZPATRICK TYPE-4    🏽
U+1F3FE  EMOJI MODIFIER FITZPATRICK TYPE-5    🏾
U+1F3FF  EMOJI MODIFIER FITZPATRICK TYPE-6    🏿
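
As a minimal sketch of the table above, a modifier is simply the next code point after the base emoji. The `SKIN_TONES` map and the `withSkinTone` helper are names I made up for illustration; only `String.fromCodePoint` is a built-in.

```javascript
// Skin tone modifiers from the table above, keyed by Fitzpatrick type.
// SKIN_TONES and withSkinTone are hypothetical names, not part of any spec or library.
const SKIN_TONES = {
  "1-2": 0x1f3fb,
  "3": 0x1f3fc,
  "4": 0x1f3fd,
  "5": 0x1f3fe,
  "6": 0x1f3ff,
};

// Append the modifier directly after the base code point (no joiner in between).
function withSkinTone(baseCodePoint, fitzpatrickType) {
  const modifier = SKIN_TONES[fitzpatrickType];
  if (modifier === undefined) {
    throw new RangeError(`Unknown Fitzpatrick type: ${fitzpatrickType}`);
  }
  return String.fromCodePoint(baseCodePoint, modifier);
}

const boy = withSkinTone(0x1f466, "5"); // U+1F466 BOY + TYPE-5
console.log(boy);
console.log(boy === "\u{1F466}\u{1F3FE}"); // true
console.log([...boy].length); // 2 code points behind one visible glyph
```

Whether the pair renders as a single tinted glyph depends on the font; where it isn't supported you would typically see the base emoji followed by a colored swatch.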

  • There are also Glyph direction modifiers, but I haven't found many details.

Combinations: Emoji Sequences using ZWJ

What's really cool is something called Emoji Sequences: basically, combining standalone emojis to create new emojis. The trick is to use the Zero Width Joiner (ZWJ) character, U+200D, between standalone emojis in a sequence to indicate that they should be combined into a single emoji, when available. There are two standard component groups often available, gender and hair type:



Code    Gender Component  Image
U+2640  Female            ♀️
U+2642  Male              ♂️



Code     Hair Type Component   Image
U+1F9B0  Red Hair Component    🦰
U+1F9B1  Curly Hair Component  🦱
U+1F9B2  Bald Head Component   🦲
U+1F9B3  White Hair Component  🦳

So now we can create, for example, Woman Technologist: Medium Skin Tone.

U+1F469 + U+1F3FD + U+200D + U+1F4BB

👩 + 🏽 + ZWJ + 💻 → 👩🏽‍💻

Notice that the ZWJ character isn't needed between the base emoji and its skin tone modifier. The ZWJ character merges two standalone base emojis together into one.
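
Here is a small sketch of that recipe in JavaScript. The variable names are my own; the code points are the ones listed above. It builds the sequence and then prints it back as code points so you can see where the joiner sits.

```javascript
const ZWJ = "\u{200D}"; // ZERO WIDTH JOINER

const woman = "\u{1F469}";
const mediumSkinTone = "\u{1F3FD}";
const laptop = "\u{1F4BB}";

// The modifier follows its base directly; the ZWJ goes between the two base emojis.
const womanTechnologist = woman + mediumSkinTone + ZWJ + laptop;

// Array.from iterates a string by code point, not by UTF-16 code unit.
const codePoints = Array.from(
  womanTechnologist,
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase()
);

console.log(womanTechnologist);
console.log(codePoints.join(" + ")); // U+1F469 + U+1F3FD + U+200D + U+1F4BB
```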

Multi-Person Groupings

But what's even cooler is the support for what's called Multi-Person Groupings, like a family, or two people holding hands. Now you can put together a somewhat diverse family or adult relationship. Again, the trick is to add a ZWJ character between each standalone emoji to indicate they should be grouped together, if available.

Basically:

Code Points  Recipe  Combined
U+1F469 U+200D U+2764 U+FE0F U+200D U+1F469  👩 + ❤️ + 👩  👩‍❤️‍👩 couple with heart: woman, woman
U+1F468 U+200D U+1F468 U+200D U+1F467 U+200D U+1F466  👨 + 👨 + 👧 + 👦  👨‍👨‍👧‍👦 family: man, man, girl, boy


// Try it out in your browser's console.
`\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}`
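
Going the other way is just as simple. As a rough sketch (the `joinEmoji` helper is a name I invented), joining an array on the ZWJ builds a grouping, and splitting on it recovers the members:

```javascript
const ZWJ = "\u{200D}";

// Hypothetical helper: glue standalone emojis together with ZWJ characters.
function joinEmoji(...parts) {
  return parts.join(ZWJ);
}

// family: man, man, girl, boy
const family = joinEmoji("\u{1F468}", "\u{1F468}", "\u{1F467}", "\u{1F466}");

// couple with heart: woman, woman (the heart carries its U+FE0F presentation selector)
const couple = joinEmoji("\u{1F469}", "\u{2764}\u{FE0F}", "\u{1F469}");

console.log(family, couple);
console.log(family.split(ZWJ)); // the four standalone members
console.log(family.split(ZWJ).length); // 4
```

If a platform has no artwork for a given grouping, the usual fallback is to show the members side by side, which is why the "if available" caveat matters.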


There are quite a few multi-person glyphs by spec, but here are the few glyphs actually implemented by the major vendors.

Hex      Char  CLDR Name
U+1F91D  🤝    handshake
U+1F46F  👯    people with bunny ears
U+1F93C  🤼    people wrestling
U+1F46B  👫    woman and man holding hands
U+1F46C  👬    men holding hands
U+1F46D  👭    women holding hands
U+1F48F  💏    kiss
U+1F491  💑    couple with heart
U+1F46A  👪    family, 1-2 adults, 0-2 children

A Note on Sequence Ordering

When representing emoji ZWJ sequences for an individual person, the following order should be used:

  • Base Emoji
  • Emoji modifier or emoji presentation selector
  • Hair component
  • Gender Modifier
  • Direction indicator—see Section 2.8, Emoji Glyph Facing Direction
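
The ordering above can be sketched as a small builder. This is only my reading of the list, not code from the spec: `composePerson` and its option names are hypothetical, and the direction indicator is left out because I haven't found enough detail on it.

```javascript
const ZWJ = "\u{200D}";

// Hypothetical helper following the order: base, modifier (or presentation
// selector), hair component, gender sign.
function composePerson({ base, modifier = "", hair = "", gender = "" }) {
  let sequence = base + modifier; // attaches directly, no ZWJ
  if (hair) sequence += ZWJ + hair; // components are joined with a ZWJ
  if (gender) sequence += ZWJ + gender;
  return sequence;
}

// woman running: medium skin tone
const runner = composePerson({
  base: "\u{1F3C3}",
  modifier: "\u{1F3FD}",
  gender: "\u{2640}\u{FE0F}", // female sign plus emoji presentation selector
});

// woman: medium skin tone, red hair
const redHead = composePerson({
  base: "\u{1F469}",
  modifier: "\u{1F3FD}",
  hair: "\u{1F9B0}",
});

console.log(runner, redHead);
```

The helper will happily build sequences that no vendor draws (hair and gender together, for instance); it only guarantees the order, not that a single glyph exists.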

Great, so we've covered emoji modifiers, emoji sequences, and emoji multi-person glyphs. Pretty cool. So where do things start to get messy?

Limitations

As mentioned earlier, the Unicode Consortium made a big push to go gender neutral where possible. Since vendors actually implement the artwork for each emoji, the Emoji spec mostly just adds recommendation notices and renames certain emoji from a gendered term to "person". Great. Allowing adult human emojis to be modified with skin tones (6), gender (3), and particular hair (4 or 5*) is absolutely amazing. But for one base emoji, that's 6 × 3 × 5 = 90 variations that we need PNG images for. With larger sequences like a family grouping of up to 4 humans, we'd be looking at 810 variations, so naturally vendors only implement a select few.
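
To make the single-person count concrete, here is a rough sketch that enumerates every combination for one base emoji. I'm assuming the counts mean five skin tone modifiers plus "none", four hair components plus "none", and female, male, or no gender sign; the variable names are mine.

```javascript
const ZWJ = "\u{200D}";

// "" stands for "no modifier / no component".
const skinTones = ["", "\u{1F3FB}", "\u{1F3FC}", "\u{1F3FD}", "\u{1F3FE}", "\u{1F3FF}"];
const hairTypes = ["", "\u{1F9B0}", "\u{1F9B1}", "\u{1F9B2}", "\u{1F9B3}"];
const genders = ["", "\u{2640}\u{FE0F}", "\u{2642}\u{FE0F}"];

const base = "\u{1F9D1}"; // U+1F9D1, the gender-neutral adult

// Build every base + tone [+ ZWJ hair] [+ ZWJ gender] combination.
const variants = skinTones.flatMap((tone) =>
  hairTypes.flatMap((hair) =>
    genders.map((gender) => [base + tone, hair, gender].filter(Boolean).join(ZWJ))
  )
);

console.log(variants.length); // 90
console.log(new Set(variants).size); // 90, every combination is a distinct string
```

Most of these strings are not recommended sequences, so don't expect them to render as single glyphs; the point is only how quickly the artwork budget grows.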

You might have also noticed that age isn't a modifier or emoji component; instead you must use the base emoji for "child" and "older" persons. I guess Apple, Facebook, and Google couldn't face having to implement further variations like children swimming (🏊‍♂️) and older folks rock climbing (🧗‍♀️).

With only a few exceptions, the spec won't handle the case of multiple skin tones in one grouping. Somewhat similarly, you generally cannot mix gendered and gender-neutral people in multi-person groupings. For the most part, only humans with the same demographic interact with each other (👯‍♀️), with the holding hands emoji seemingly being the one exception.

Gendered Emojis without gender-neutral alternatives

Emoji Name            Codepoints  Comments
Prince                U+1F934     No genderless alternative
Princess              U+1F478     No genderless alternative
Woman Dancing         U+1F483     No genderless alternative
Man Dancing           U+1F57A     No genderless alternative
Pregnant Woman        U+1F930     No alternatives
Woman with Headscarf  U+1F9D5     Male modifier available
Bearded Man           U+1F9D4     No alternatives
Breastfeeding Woman   U+1F931     No alternatives, breasted or bottled
Man in Tuxedo         U+1F935     Pairs with Woman in Veil

So why don't modifiers and components work to create some emojis?

  • Woman in tuxedo?
  • Person in headdress?
  • (Breast?)feeding Man?

It turns out that the Unicode Consortium put out a Recommended Emoji ZWJ Sequences guide to help vendors figure out which sequences to actually implement, given the explosive number of possible combinations. And it looks like they are trying to figure out a path to address gender-diverse inclusion in version 14, with some of the easier specification choices arriving in v13. Some of the officially recommended ZWJ sequences are particularly exciting.

Code Points                          Recipe   Combined  Name
U+1F3F4 U+200D U+2620 U+FE0F         🏴 + ☠️   🏴‍☠️        Pirate Flag
U+1F3F3 U+FE0F U+200D U+1F308        🏳️ + 🌈   🏳️‍🌈        Rainbow Flag
U+1F3F3 U+FE0F U+200D U+26A7 U+FE0F  🏳️ + ⚧️   🏳️‍⚧️        Transgender Flag
U+1F415 U+200D U+1F9BA               🐕 + 🦺   🐕‍🦺        Service Dog
U+1F43B U+200D U+2744 U+FE0F         -        -         Polar Bear (v13)
U+1F408 U+200D U+2B1B                -        -         Black Cat (v13)
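
If you want to try the sequences in this table yourself, a small parser for the code point notation does the job. This is just a sketch; `fromCodePointNotation` is a helper name I made up, and it accepts entries with or without the "U+" prefix.

```javascript
// Hypothetical helper: turn "U+1F3F4 U+200D U+2620 U+FE0F" into the actual string.
function fromCodePointNotation(notation) {
  const codePoints = notation
    .trim()
    .split(/\s+/)
    .map((token) => parseInt(token.replace(/^U\+/i, ""), 16));
  // String.fromCodePoint throws a RangeError on anything that isn't a valid code point.
  return String.fromCodePoint(...codePoints);
}

const pirateFlag = fromCodePointNotation("U+1F3F4 U+200D U+2620 U+FE0F");
const serviceDog = fromCodePointNotation("U+1F415 U+200D U+1F9BA");
const blackCat = fromCodePointNotation("1F408 200D 2B1B"); // prefix is optional

console.log(pirateFlag, serviceDog, blackCat);
```

On a system without artwork for a sequence you'll see the parts individually, for example a cat followed by a black square.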

Defining a specification and mandating that every vendor implement and package every emoji isn't currently feasible, and it's not the right way to look at the scope of Emojis. The spec works the other way around. Vendors implement the coolest, most in-demand emojis, then figure out how to attach meaning to them, in line with Unicode's Emoji specification. It seems a surprisingly reasonable method to classify and index meaning, if everyone follows along. Still, though: will a pirate puppy be a newly specified emoji, or just the combination of dog + safety vest + skull? Or is that a zombie dog?

Does any of it matter if everyone ends up continuing to only use 😂 "Tears of Joy" (1F602) or the cat variant, 😹"Cat With Tears of Joy" (1F639)?  

Closing

I thought I knew about Unicode code points vs code units, and how to properly count character length in languages like JavaScript. But now with Emojis having various modifiers, joiners, and presentation variants, I don't think any coding ecosystem is ready yet for the can of worms that is counting emojis. It's hard enough to figure out how many emojis there are. Perhaps I'll take a shot at counting emoji length over the winter holiday; it sounds like an exciting challenge after all.

[..."👨‍👩‍👧‍👦"] // ["👨", "\u200D", "👩", "\u200D", "👧", "\u200D", "👦"] (7 code points: 4 emojis + 3 ZWJs)

"👨‍👩‍👧‍👦".length // 11 (UTF-16 code units)

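One possible direction, sketched here under the assumption that your runtime ships `Intl.Segmenter` (many browsers and older Node versions don't), is to count grapheme clusters instead of code units or code points. The `countGraphemes` name is mine.

```javascript
// family: man, woman, girl, boy, written with escapes so the joiners are visible
const family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}";

// Hypothetical helper: count user-perceived characters (grapheme clusters).
// Assumes Intl.Segmenter exists in the runtime.
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  return [...segmenter.segment(text)].length;
}

console.log(family.length); // 11 UTF-16 code units
console.log([...family].length); // 7 code points
console.log(countGraphemes(family)); // 1 grapheme cluster
```

Grapheme segmentation follows Unicode's text segmentation rules rather than any vendor's artwork, so a sequence can count as one cluster even where the font draws it as several glyphs.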

Going Further

As mentioned earlier on, check out my two previous articles, Unicode is Awesome and Hacking GitHub with Unicode's Turkish Dotless 'i'. You should also dive in and check out Unicode's Emoji spec; it's actually a fairly readable document. If you just want to skim through the modifier components, try out a few of the section topics below.

Unicode is Awesome
A curated list of delightful Unicode tidbits, packages and resources. Foreword Unicode is Awesome! Prior to Unicode, international communication was grueling- everyone had defined their separate extended character set in the upperhalf of ASCII (called Code Pages) that would conflict- Just think, Ge…
Hacking GitHub with Unicode’s dotless ‘i’.
From combining emoji marks and astral planes, Unicode is under appreciated and poorly understood. One lesser known attack vector is Unicode Case Mapping Collisions— an edge case that many of the best devs don’t understand— even at Github.


