At issue is a six-color scale known as Fitzpatrick Skin Type
(FST), which dermatologists have used since the 1970s. Tech companies now rely
on it to categorize people and measure whether products such as facial
recognition systems or smartwatch heart-rate sensors perform equally well
across skin tones.
Critics say FST, which includes four categories for
"white" skin and one apiece for "black" and
"brown," disregards diversity among people of color. Researchers at
the U.S. Department of Homeland Security, during a federal technology standards
conference last October, recommended abandoning FST for evaluating facial
recognition because it poorly represents color range in diverse populations.
In response to Reuters' questions about FST, Google, for the
first time and ahead of peers, said that it has been quietly pursuing better
measures.
"We are working on alternative, more inclusive,
measures that could be useful in the development of our products, and will
collaborate with scientific and medical experts, as well as groups working with
communities of color," the company said, declining to offer details on the
effort.
The controversy is part of a larger reckoning over racism
and diversity in the tech industry, where the workforce is more white than in
sectors like finance. Ensuring technology works well for all skin colors, as
well as for different ages and genders, is assuming greater importance as new
products, often powered by artificial intelligence (AI), extend into sensitive
and regulated areas such as healthcare and law enforcement.
Companies know their products can be faulty for groups that
are under-represented in research and testing data. The concern over FST is
that its limited scale for darker skin could lead to technology that, for
instance, works for golden brown skin but fails for espresso red tones.
Numerous types of products offer palettes far richer than
FST. Crayola last year launched 24 skin tone crayons, and Mattel Inc's Barbie
Fashionistas dolls this year cover nine tones.
The issue is far from academic for Google. When the company
announced in February that cameras on some Android phones could measure pulse
rates via a fingertip, it said readings on average would err by 1.8% regardless
of whether users had light or dark skin.
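The kind of parity check described here, comparing average measurement error across skin-tone groups, can be sketched roughly as follows. This is an illustrative example, not Google's published method; the sample readings, the grouping of FST types, and the mean-absolute-percentage-error metric are assumptions made for demonstration.

# Illustrative sketch: compare heart-rate measurement error across
# skin-tone groups. Not Google's actual method; the data, grouping,
# and error metric here are assumptions for demonstration only.

from statistics import mean

# Hypothetical readings: (FST group, device reading, reference reading in bpm)
readings = [
    ("I-II", 72, 71), ("I-II", 88, 90),
    ("III-IV", 65, 66), ("III-IV", 101, 99),
    ("V-VI", 77, 79), ("V-VI", 93, 91),
]

def mean_abs_pct_error(pairs):
    """Mean absolute percentage error between device and reference values."""
    return mean(abs(device - ref) / ref * 100 for device, ref in pairs)

# Group readings by FST category, then report error per group.
groups = {}
for group, device, ref in readings:
    groups.setdefault(group, []).append((device, ref))

for group, pairs in groups.items():
    print(f"FST {group}: {mean_abs_pct_error(pairs):.1f}% mean error")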
The company later gave similar assurances that skin type would
not noticeably affect results of a feature for filtering backgrounds on
Meet video conferences, nor of an upcoming web tool for identifying skin
conditions, informally dubbed Derm Assist.
Those conclusions derived from testing with the six-tone
FST.
'STARTING POINT'
The late Harvard University dermatologist Dr. Thomas
Fitzpatrick invented the scale to personalize ultraviolet radiation treatment
for psoriasis, an itchy skin condition. He grouped the skin of
"white" people as Roman numerals I to IV by asking how much sunburn
or tan they developed after certain periods in the sun.
A decade later came type V for "brown" skin and VI
for "black." The scale is still part of U.S. regulations for testing
sunblock products, and it remains a popular dermatology standard for assessing
patients' cancer risk and more.
Some dermatologists say the scale is a poor and overused
measure for care, and that it is often conflated with race and ethnicity.
"Many people would assume I am skin type V, which
rarely to never burns, but I burn," said Dr. Susan Taylor, a University of
Pennsylvania dermatologist who founded Skin of Color Society in 2004 to promote
research on marginalized communities. "To look at my skin hue and say I am
type V does me disservice."
Technology companies, until recently, were unconcerned.
Unicode, an industry association overseeing emojis, referred to FST in 2014 as
its basis for adopting five skin tones beyond yellow, saying the scale was
"without negative associations."
A 2018 study titled "Gender Shades," which found
facial analysis systems more often misgendered people with darker skin,
popularized using FST for evaluating AI. The research described FST as a
"starting point," but scientists of similar studies that came later
told Reuters they used the scale to stay consistent.
"As a first measure for a relatively immature market,
it serves its purpose to help us identify red flags," said Inioluwa
Deborah Raji, a Mozilla fellow focused on auditing AI.
In an April study testing AI for detecting deepfakes,
Facebook Inc researchers wrote FST "clearly does not encompass the
diversity within brown and black skin tones." Still, they released videos
of 3,000 individuals to be used for evaluating AI systems, with FST tags
attached based on the assessments of eight human raters.
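Attaching a single FST tag per video from several raters implies some aggregation rule. A minimal sketch of one plausible approach, taking the median of the eight ordinal ratings, appears below; the study did not disclose how ratings were combined, so the median rule and the sample data are assumptions.

# Illustrative sketch: collapse several raters' FST labels (ordinal I-VI)
# into one tag per video. The median rule and sample data are assumptions;
# the study did not disclose its aggregation method.

from statistics import median

# Hypothetical ratings from eight raters for two videos (FST types 1-6).
rater_labels = {
    "video_001": [4, 4, 5, 4, 5, 4, 4, 5],
    "video_002": [6, 5, 6, 6, 6, 5, 6, 6],
}

ROMAN = {1: "I", 2: "II", 3: "III", 4: "IV", 5: "V", 6: "VI"}

for video, labels in rater_labels.items():
    tag = round(median(labels))  # median is robust to a single outlying rater
    print(f"{video}: FST {ROMAN[tag]}")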
The judgment of the raters is central. Facial recognition
software startup AnyVision last year gave celebrity examples to raters: former
baseball great Derek Jeter as a type IV, model Tyra Banks a V and rapper 50
Cent a VI.
AnyVision told Reuters it agreed with Google's decision to
revisit use of FST, and Facebook said it is open to better measures.
Microsoft Corp and smartwatch makers Apple Inc and Garmin
Ltd reference FST when working on health-related sensors.
But use of FST could be fueling "false assurances"
about heart rate readings from smartwatches on darker skin, University of
California San Diego clinicians, inspired by the Black Lives Matter social
equality movement, wrote in the journal Sleep last year.
Microsoft acknowledged FST's imperfections. Apple said it
tests on humans across skin tones using various measures, FST only at times
among them. Garmin said that, based on wide-ranging testing, it believes
readings are reliable.
Victor Casale, who founded makeup company Mob Beauty and
helped Crayola on the new crayons, said he developed 40 shades for foundation,
each different from the next by about 3%, or enough for most adults to
distinguish.
Color accuracy on electronics suggests tech standards should
have 12 to 18 tones, he said, adding, "you can't just have six."