Human image synthesis explained

Human image synthesis is technology that can be applied to make believable and even photorealistic renditions[1] [2] of human likenesses, moving or still. It has effectively existed since the early 2000s. Many films using computer-generated imagery have featured synthetic images of human-like characters digitally composited onto real or otherwise simulated film material. Towards the end of the 2010s, deep learning artificial intelligence was applied to synthesize images and video that look like humans without the need for human assistance once the training phase has been completed, whereas the old-school 7D route required massive amounts of human work.

Timeline of human image synthesis

See main articles: History of computer animation and Timeline of computer animation in film and television.

Key breakthrough to photorealism: reflectance capture

In 1999, Paul Debevec et al. of USC performed the first known reflectance capture of the human face with their extremely simple light stage. They presented their method and results at SIGGRAPH 2000.[30]

The scientific breakthrough required isolating the subsurface light component (the simulation models glow slightly from within). This component can be separated using the knowledge that light reflected from the oil-to-air layer of the skin retains its polarization, whereas subsurface light loses its polarization. Equipped only with a movable light source, a movable video camera, two polarizers and a computer program doing extremely simple math, the researchers acquired the last piece required to reach photorealism.[30]
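The "extremely simple math" can be illustrated with a short sketch. It assumes two photographs per light position, one with the camera's polarizer parallel to the light's polarizer and one crossed at 90 degrees, and ideal polarizers: surface reflection keeps its polarization and is fully blocked in the crossed shot, while depolarized subsurface light is split evenly between the two. The function name `separate_reflectance` is hypothetical, and this is not presented as Debevec et al.'s actual code.

```python
import numpy as np


def separate_reflectance(parallel, cross):
    """Split a face image into specular (surface) and subsurface parts.

    parallel: image taken with the camera polarizer parallel to the light's polarizer
    cross:    image taken with the camera polarizer crossed at 90 degrees

    Assumes ideal linear polarizers (an illustrative simplification):
      parallel = specular + subsurface / 2
      cross    = subsurface / 2
    """
    parallel = np.asarray(parallel, dtype=float)
    cross = np.asarray(cross, dtype=float)
    # The crossed shot holds only half of the depolarized subsurface light.
    subsurface = 2.0 * cross
    # The parallel shot holds the other half plus all polarization-preserving
    # surface reflection; clip small negative values caused by sensor noise.
    specular = np.clip(parallel - cross, 0.0, None)
    return specular, subsurface
```

For a pixel with a true specular value of 0.3 and a true subsurface value of 0.4, the parallel shot would read 0.5 and the crossed shot 0.2, and the function recovers 0.3 and 0.4.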

For a believable result, both the light reflected from the skin (described by the BRDF) and the light scattered within the skin (a special case of the BTDF), which together make up the BSDF, must be captured and simulated.
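In the conventional notation for these functions, with ω_i the incoming and ω_o the outgoing light direction, the BSDF f_s is the sum of its reflective part f_r (the BRDF) and its transmissive part f_t (the BTDF). The symbols are standard usage rather than taken from the source:

```latex
f_s(\omega_i, \omega_o) = f_r(\omega_i, \omega_o) + f_t(\omega_i, \omega_o)
```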

Capturing

Synthesis

The whole process of making digital look-alikes, i.e. characters so lifelike and realistic that they can be passed off as pictures of humans, is a very complex task, as it requires photorealistically modeling, animating, cross-mapping and rendering the soft-body dynamics of the human appearance.

Synthesis is carried out on powerful computers using an actor and suitable algorithms. The actor's part is to supply the human expressions for still-picture synthesis and the human movement for motion-picture synthesis. Algorithms are needed to simulate the laws of physics and physiology and to map the models and their appearance, movements and interaction accordingly.

Often both physics- or physiology-based modeling (i.e. skeletal animation) and image-based modeling and rendering are employed in the synthesis. Hybrid models employing both approaches have shown the best results in realism and ease of use. Morph target animation reduces the workload by giving higher-level control: different facial expressions are defined as deformations of the model, which allows facial expressions to be tuned intuitively. Morph target animation can then morph the model between the defined facial expressions or body poses without much need for human intervention.
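Morph targets are commonly implemented as weighted vertex offsets (often called blend shapes): each sculpted expression stores how far every vertex moves from the neutral face, and an animator or animation system only sets one weight per expression. The following is a minimal sketch assuming every target shares the neutral mesh's vertex order; the function name and array layout are illustrative, not taken from any particular package.

```python
import numpy as np


def blend_morph_targets(base, targets, weights):
    """Blend a neutral mesh toward several morph targets.

    base:    (V, 3) array of neutral-pose vertex positions
    targets: (K, V, 3) array, one sculpted expression per target
    weights: (K,) blend weights, typically in [0, 1]

    Each target contributes its offset from the neutral pose, scaled by its weight:
        v = base + sum_k w_k * (target_k - base)
    """
    base = np.asarray(base, dtype=float)
    targets = np.asarray(targets, dtype=float)
    weights = np.asarray(weights, dtype=float)
    deltas = targets - base  # (K, V, 3) displacement of each target from neutral
    # Contract the weight axis against the target axis, giving a (V, 3) offset.
    return base + np.tensordot(weights, deltas, axes=1)
```

Setting all weights to zero returns the neutral face; animating the weights over time moves the face smoothly between expressions, which is why the approach needs little per-vertex human intervention.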

Displacement mapping plays an important part in achieving a realistic result with fine skin detail, such as pores and wrinkles as small as 100 μm.
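In its simplest form, displacement mapping moves each vertex of a finely subdivided mesh along its surface normal by an amount read from a height map. A minimal sketch, assuming the height map has already been sampled at each vertex (UV lookup and subdivision are left out) and that mesh units are millimetres, so a 100 μm pore corresponds to a displacement of 0.1. The function `displace_vertices` is a hypothetical helper.

```python
import numpy as np


def displace_vertices(positions, normals, heights, scale=1.0):
    """Offset each vertex along its unit normal by a sampled height value.

    positions: (V, 3) vertex positions (assumed to be in millimetres here)
    normals:   (V, 3) vertex normals; normalized below in case they are not unit length
    heights:   (V,) height-map values already sampled at each vertex's UV coordinate
    scale:     converts height-map units into mesh units
    """
    positions = np.asarray(positions, dtype=float)
    normals = np.asarray(normals, dtype=float)
    heights = np.asarray(heights, dtype=float)
    unit_normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    # Negative heights push the surface inward (pores), positive ones outward.
    return positions + (scale * heights)[:, None] * unit_normals
```

With a height of -1 and a scale of 0.1, a vertex is pushed 0.1 mm into the surface, about the depth range the paragraph above describes.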

Machine learning approach

In the late 2010s, machine learning, and more precisely generative adversarial networks (GANs), were used by NVIDIA to produce random yet photorealistic human-like portraits. The system, named StyleGAN, was trained on a database of 70,000 images from the image hosting website Flickr. The source code was made public on GitHub in 2019.[31] Outputs of the generator network from random input were made publicly available on a number of websites.[32] [33]
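The adversarial idea can be made concrete with the standard GAN losses: a discriminator outputs the probability that an image is real, and the generator is trained to make the discriminator accept its outputs. The sketch below computes those losses from the discriminator's outputs for one batch. It shows the generic GAN objective (with the common "non-saturating" generator loss), not NVIDIA's StyleGAN training code, which adds architectural and regularization details not shown here; `gan_losses` is a hypothetical helper name.

```python
import numpy as np


def gan_losses(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy losses for one GAN training step.

    d_real: discriminator probabilities that real images are real, shape (N,)
    d_fake: discriminator probabilities that generated images are real, shape (M,)

    The discriminator is trained to push d_real toward 1 and d_fake toward 0;
    the generator (non-saturating form) is trained to push d_fake toward 1.
    """
    # Clip to avoid log(0) for overconfident outputs.
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    d_loss = -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))
    g_loss = -np.mean(np.log(d_fake))
    return d_loss, g_loss
```

When the discriminator cannot tell real from generated images (every output is 0.5), the discriminator loss equals 2 ln 2 and the generator loss equals ln 2; training alternates between lowering each loss in turn.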

Similarly, since 2018, deepfake technology has allowed GANs to swap faces between actors; combined with the ability to fake voices, GANs can thus generate fake videos that seem convincing.[34]

Applications

Main applications fall within the domains of stock photography, synthetic datasets, virtual cinematography, computer and video games, and covert disinformation attacks.[35] [33] Some facial-recognition AI systems use images generated by other AI as synthetic data for training.[36]

Furthermore, some research suggests that it can have therapeutic effects, as "psychologists and counselors have also begun using avatars to deliver therapy to clients who have phobias, a history of trauma, addictions, Asperger’s syndrome or social anxiety." The strong memory imprint and brain activation effects caused by watching a digital look-alike avatar of yourself are dubbed the doppelgänger effect.[37] The harm caused by the doppelgänger effect can heal once a covert disinformation attack is exposed as such to the targets of the attack.

Related issues

Speech synthesis has been verging on being completely indistinguishable from a recording of a real human's voice since the 2016 introduction of the voice editing and generation software Adobe Voco, a prototype slated to be part of Adobe Creative Suite, and DeepMind WaveNet, a prototype from Google.[38] The ability to steal and manipulate other people's voices raises obvious ethical concerns.[39]

At the 2018 Conference on Neural Information Processing Systems (NeurIPS), researchers from Google presented the work 'Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis', which transfers learning from speaker verification to text-to-speech synthesis; the resulting system can be made to sound almost like anybody from a speech sample of only 5 seconds.
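The core idea of this transfer can be sketched in a few lines: a speaker-verification network summarizes a short reference recording as a fixed-length speaker embedding, and the text-to-speech synthesizer is conditioned on that embedding. This is a simplified sketch, not Google's implementation. It assumes the window-level embeddings have already been computed by a trained speaker encoder (not shown), that they are combined by averaging unit-length vectors, and that conditioning is done by appending the embedding to every text-encoder output step. All function names are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    x = np.asarray(x, dtype=float)
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def utterance_embedding(window_embeddings):
    """Average per-window speaker embeddings into one unit-length embedding.

    window_embeddings: (W, D) outputs of a speaker-verification encoder for
    short windows of the reference sample (assumed precomputed).
    """
    windows = l2_normalize(window_embeddings)
    return l2_normalize(windows.mean(axis=0))


def condition_on_speaker(text_encodings, speaker_embedding):
    """Append the same speaker embedding to every text-encoder time step.

    text_encodings:    (T, H) outputs of the synthesizer's text encoder
    speaker_embedding: (D,) unit-length speaker embedding
    returns:           (T, H + D) conditioned encodings
    """
    text_encodings = np.asarray(text_encodings, dtype=float)
    speaker_embedding = np.asarray(speaker_embedding, dtype=float)
    tiled = np.broadcast_to(
        speaker_embedding, (text_encodings.shape[0], speaker_embedding.shape[0])
    )
    return np.concatenate([text_encodings, tiled], axis=1)


def cosine_similarity(a, b):
    """Verification score: close to 1.0 when two embeddings point the same way."""
    return float(np.dot(l2_normalize(a), l2_normalize(b)))
```

Because the synthesizer only sees the embedding, a new voice needs no retraining: a few seconds of audio are enough to produce an embedding, which is what makes the approach both useful and open to misuse.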

Sourcing images for AI training raises privacy concerns, as the people whose images are used for training did not consent.[40]

Digital sound-alike technology has found its way into the hands of criminals: in 2019, Symantec researchers knew of three cases in which the technology had been used for crime.[41] [42]

Coupled with the fact that, as of 2016, techniques allowing near real-time counterfeiting of facial expressions in existing 2D video have been believably demonstrated, this increases the pressure on the disinformation situation.[14]

Notes and References

  1. Physics-based muscle model for mouth shape control. https://ieeexplore.ieee.org/document/568819
  2. Realistic 3D facial animation in virtual space teleconferencing. https://ieeexplore.ieee.org/document/531968
  3. Berlin, Isabelle. "Images de synthèse : palme de la longévité pour l'ombrage de Gouraud" (in French). Interstices. 3 October 2024. 14 September 2008.
  4. "Images de synthèse : palme de la longévité pour l'ombrage de Gouraud". 14 September 2008.
  5. Pighin, Frédéric. "Siggraph 2005 Digital Face Cloning Course Notes". 24 May 2017.
  6. "St. Andrews Face Transformer". Futility Closet. 7 December 2020. 30 January 2005.
  7. West, Marc. "Changing the face of science". Plus Magazine. 7 December 2020. 4 December 2007.
  8. Goddard, John. "The many faces of race research". thestar.com. 7 December 2020. 27 January 2010.
  9. TED talk video. http://www.ted.com/talks/paul_debevec_animates_a_photo_real_digital_face.html
  10. "ReForm – Hollywood's Creating Digital Clones". The Creators Project. YouTube. 24 May 2017.
  11. Debevec, Paul. "Digital Ira SIGGRAPH 2013 Real-Time Live". 24 May 2017. Archived from the original on 21 February 2015: https://web.archive.org/web/20150221212728/http://gl.ict.usc.edu/Research/DigitalIra/ (original link dead).
  12. "Scanning and printing a 3D portrait of President Barack Obama". University of Southern California. 2013. 24 May 2017. Archived from the original on 17 September 2015: https://web.archive.org/web/20150917140258/http://gl.ict.usc.edu/Research/PresidentialPortrait/ (original link dead).
  13. Giardina, Carolyn. "'Furious 7' and How Peter Jackson's Weta Created Digital Paul Walker". 25 March 2015. 24 May 2017.
  14. Thies, Justus. "Face2Face: Real-time Face Capture and Reenactment of RGB Videos". Proc. Computer Vision and Pattern Recognition (CVPR), IEEE. 2016. 24 May 2017.
  15. "Synthesizing Obama: Learning Lip Sync from Audio". grail.cs.washington.edu. 3 October 2024.
  16. Roettgers, Janko. "Porn Producers Offer to Help Hollywood Take Down Deepfake Videos". Variety. 21 February 2018. 28 February 2018.
  17. Takahashi, Dean. "Epic Games shows off amazing real-time digital human with Siren demo". 21 March 2018. 10 September 2018.
  18. Kuo, Lily. "World's first AI news anchor unveiled in China". 9 November 2018.
  19. Hamilton, Isobel Asher. "China created what it claims is the first AI news anchor — watch it in action here". 9 November 2018.
  20. Harwell, Drew. "Fake-porn videos are being weaponized to harass and humiliate women: 'Everybody is a potential target'". 30 December 2018. 14 March 2019. In September [of 2018], Google added “involuntary synthetic pornographic imagery” to its ban list.
  21. "NVIDIA Open-Sources Hyper-Realistic Face Generator StyleGAN". Medium.com. 9 February 2019. 3 October 2019.
  22. Paez, Danny. "This Person Does Not Exist Is the Best One-Off Website of 2019". 13 February 2019. 5 March 2018.
  23. "New state laws go into effect July 1". 24 June 2019.
  24. "§ 18.2–386.2. Unlawful dissemination or sale of images of another; penalty.". 1 January 2020.
  25. "Relating to the creation of a criminal offense for fabricating a deceptive video with intent to influence the outcome of an election". 14 June 2019. 2 January 2020. In this section, "deep fake video" means a video, created with the intent to deceive, that appears to depict a real person performing an action that did not occur in reality.
  26. Johnson, R.J. "Here Are the New California Laws Going Into Effect in 2020". 30 December 2019. 1 January 2020.
  27. Mihalcik, Carrie. "California laws seek to crack down on deepfakes in politics and porn". 4 October 2019. 14 October 2019.
  28. "China seeks to root out fake news and deepfakes with new online content rules". 29 November 2019. 8 December 2019.
  29. Statt, Nick. "China makes it a criminal offense to publish deepfakes or fake news without disclosure". 29 November 2019. 8 December 2019.
  30. Debevec, Paul. "Acquiring the reflectance field of a human face". Proceedings of the 27th annual conference on Computer graphics and interactive techniques - SIGGRAPH '00. ACM. 2000. pp. 145–156. doi:10.1145/344779.344855. ISBN 978-1581132083. 2860203. http://dl.acm.org/citation.cfm?id=344855. 24 May 2017.
  31. Synced. "NVIDIA Open-Sources Hyper-Realistic Face Generator StyleGAN". Synced. 9 February 2019. 4 August 2020.
  32. StyleGAN public showcase website. http://thispersondoesnotexist.com
  33. Porter, Jon. "100,000 free AI-generated headshots put stock photo companies on notice". The Verge. 20 September 2019. 7 August 2020.
  34. "What Is a Deepfake?". PCMAG.com. 8 June 2020. March 2020.
  35. Harwell, Drew. "Dating apps need women. Advertisers need diversity. AI companies offer a solution: Fake people". Washington Post. 4 August 2020.
  36. "Neural Networks Need Data to Learn. Even If It's Fake.". Quanta Magazine. 18 June 2023. 11 December 2023.
  37. Murphy, Samantha. "Scientific American: Your Avatar, Your Guide" (PDF). Scientific American / Uni of Stanford. 2023. 11 December 2023.
  38. "WaveNet: A Generative Model for Raw Audio". Deepmind.com. 8 September 2016. 24 May 2017. Archived from the original on 27 May 2017: https://web.archive.org/web/20170527161520/https://deepmind.com/blog/wavenet-generative-model-raw-audio/ (original link dead).
  39. "Adobe Voco 'Photoshop-for-voice' causes concern". 7 November 2016. 5 July 2016.
  40. Metz, Rachel. "If your image is online, it might be training facial-recognition AI". CNN. 4 August 2020. 19 April 2019.
  41. "Fake voices 'help cyber-crooks steal cash'". 8 July 2019. 16 April 2020.
  42. Harwell, Drew. "An artificial-intelligence first: Voice-mimicking software reportedly used in a major theft". Washington Post. 16 April 2020. 8 September 2019.