Fun is also a means of subversion in China, especially when it comes to language. In the early 2000s, as online censors banned certain characters, computer users got around the state by switching to homophones. To mock the notion of a “harmonious society,” a Maospeak phrase popularized during Hu Jintao’s rule, they joked about crustaceans—hexie, river crab, is pronounced similarly to hexie, harmony. “Serve the people” became “smog the people.” Both contain the sound wu. The characters were neighbors in an input drop-down menu.
Alarmed by the proliferation of online sarcasm, the central government went so far as to ban homophones and other wordplay. So dissidents turned to other means of dissemination. “Activists saw new opportunities in video as it became easier and cheaper on cameras and phones to record, view, and also distribute,” said Dechen Pemba, a Tibetan human rights activist in London who edits the site High Peaks Pure Earth. But by the late aughts, the Communist Party had embarked on a quest to master speech technologies—one that ran in parallel with iFlytek’s growth as a consumer voice company.
In 2009, Meng Jianzhu, the head of China’s Ministry of Public Security, traveled to Hefei and visited iFlytek’s headquarters. According to a report posted on the central government’s website, he spoke there of the need for “public security organs to closely cooperate with technology companies” to create “prevention and control systems.” As the CCP has amped up its surveillance capabilities over the past decade, it has installed millions of cameras, introduced electronic ID cards and real-name registration online, and built tech-driven “smart” cities. iFlytek’s technology has helped the government to integrate audio signals into this network of digital surveillance, according to Human Rights Watch.
The company is emblematic of a broader Chinese government effort called “military-civil fusion,” which aims to harness advances in China’s tech sector for military might. “iFlytek is contributing to military-civil fusion quite actively,” says Elsa Kania, a fellow at the Center for a New American Security in Washington, DC, who studies artificial intelligence in China. “There are elements of the company that pursue consumer applications, but the public security, policing, and defense-oriented applications appear to be significant as well.” The company has promoted its products to the People’s Liberation Army, according to testimony that Kania presented to Congress last year. She adds, “It’s not clear that there are firewalls or divisions” between consumer and other state-oriented applications. (The spokesperson reached through Chartwell Strategy Group said that iFlytek does not develop military technologies and would not comment on the company’s security work or on whether data gathered through iFlytek’s consumer products is firewalled from its government projects.)
For the CCP, monitoring speech appears to be about more than censorship. “The collection of voice and video data assists with identifying people, networks, how people speak, what they care about, and what are the trends,” says Samantha Hoffman, an analyst at the Australian Strategic Policy Institute’s Cyber Centre in Canberra.
iFlytek has patented a system that can sift through large volumes of audio and video to identify files that have been copied or reposted, part of an operation that the patent describes as “very important in information security and monitoring public opinion.” iFlytek responded that “analyzing audio and video data can have a number of potential applications, including identifying popular songs, detecting spam callers, etc.”
But iFlytek does enable security work. In 2012 the Ministry of Public Security purchased intelligent voice technology machines from iFlytek. The ministry chose Anhui province, where iFlytek is headquartered, as one of the pilot locations for compiling a voice-pattern database, a catalog of people’s unique speech patterns that would enable authorities to identify speakers by the sound of their voice.
The project relies on an iFlytek product called the Forensic Intelligent Audio Studio, a workstation that includes speakers, a microphone, and a desktop tower. The unit, which according to a 2016 local government procurement announcement sells for around $1,700, can identify people based on the unique characteristics of their voices. An iFlytek white paper uploaded online in 2013 touts voiceprint or speaker recognition as the “only biometric identification method that can be operated remotely,” noting that “in the defense field, voiceprint identification technology can detect whether there are key speakers in a telephone conversation and then track the content of the conversation.” The workstation can take a snippet of audio, compare it against the voices of 200 speakers, and pick out the person talking in under two seconds, according to the white paper.