Nicholas Carlini Explained
Nicholas Carlini is an American researcher affiliated with Google DeepMind who has published research in the fields of computer security and machine learning. He is known for his work on adversarial machine learning, particularly the Carlini & Wagner attack, introduced in 2016. The attack proved effective in defeating defensive distillation, a method used to increase model robustness, and has since been shown to defeat many other proposed defenses against adversarial input.
In 2018, Carlini demonstrated an attack on Mozilla's DeepSpeech model, showing that hidden commands could be embedded in speech inputs and that the model would transcribe them even when they were inaudible to humans. He also led a team at UC Berkeley that broke seven of the eleven defenses against adversarial attacks presented at the 2018 International Conference on Learning Representations.
In addition to his work on adversarial attacks, Carlini has made significant contributions to understanding the privacy risks of machine learning models. In 2020, he revealed that large language models, like GPT-2, could memorize and output personally identifiable information. His research demonstrated that this issue worsened with larger models, and he later showed similar vulnerabilities in generative image models, such as Stable Diffusion.
Life and career
Nicholas Carlini obtained his Bachelor of Arts in Computer Science and Mathematics from the University of California, Berkeley, in 2013.[1] He continued his studies at the same university, pursuing a PhD under the supervision of David Wagner and completing it in 2018.[2] [3]
Carlini became known for his work on adversarial machine learning. In 2016, he and Wagner developed the Carlini & Wagner attack, a method of generating adversarial examples against machine learning models. The attack proved effective against defensive distillation, a popular defense in which a student model is trained on the softened outputs of a teacher model to increase its robustness and generalizability. The attack gained further prominence when it was shown to defeat most other proposed defenses as well.[4] [5] In 2018, Carlini demonstrated an attack against the Mozilla Foundation's DeepSpeech model, showing that malicious commands could be hidden inside ordinary speech input and that the model would transcribe those commands even when they were not discernible by humans.[6] [7] In the same year, Carlini and his team at UC Berkeley showed that seven of the eleven defenses against adversarial attacks presented in papers accepted at that year's ICLR conference could be broken.[8]
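At a high level, the Carlini & Wagner attack frames adversarial example generation as an optimization problem: keep the perturbation small while penalizing the model whenever the attacker's chosen label does not win. The sketch below illustrates that idea for the L2 variant against a hypothetical PyTorch classifier; the `model`, `target`, and hyperparameter names are illustrative, and the published attack adds refinements such as binary search over the trade-off constant c.

```python
# A minimal sketch of the optimization at the core of a Carlini & Wagner style
# L2 attack. Assumes `model` is a PyTorch classifier returning logits and `x` is
# a single input (with batch dimension) whose values lie in [0, 1].
import torch
import torch.nn as nn

def cw_l2_attack(model: nn.Module, x: torch.Tensor, target: int,
                 c: float = 1.0, kappa: float = 0.0,
                 steps: int = 200, lr: float = 0.01) -> torch.Tensor:
    """Search for a small perturbation of x that the model labels as `target`."""
    # Change of variables: x_adv = 0.5 * (tanh(w) + 1) keeps pixels inside [0, 1].
    w = torch.atanh((x.clamp(1e-6, 1 - 1e-6) * 2) - 1).clone().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)

    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)

        # f(x') = max(max_{i != t} Z(x')_i - Z(x')_t, -kappa): becomes negative
        # once the target class leads every other logit by a margin of kappa.
        target_logit = logits[0, target]
        other_logit = logits[0, torch.arange(logits.shape[1]) != target].max()
        f = torch.clamp(other_logit - target_logit, min=-kappa)

        # Objective: squared L2 distortion plus the weighted misclassification term.
        loss = torch.sum((x_adv - x) ** 2) + c * f

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return (0.5 * (torch.tanh(w) + 1)).detach()
```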
Since 2021, he and his team have been working on large language models, creating a questionnaire on which humans typically scored 35% whereas AI models scored in the 40% range; GPT-3 scored 38%, which could be improved to 40% through few-shot prompting. The best performer on the test was UnifiedQA, a model built on Google's T5 and developed specifically for question answering.[9] Carlini has also developed methods to cause large language models such as ChatGPT to answer harmful questions, such as how to construct bombs.[10] [11]
He is also known for his work studying the privacy of machine learning models. In 2020, he showed for the first time that large language models memorize some of the text data they were trained on; for example, he found that GPT-2 could output personally identifiable information.[12] He then led an analysis of larger models, studying how memorization increases with model size. In 2022, he showed the same vulnerability in generative image models, specifically diffusion models, demonstrating that Stable Diffusion could output images of people's faces that it was trained on.[13] Following this, Carlini showed that ChatGPT would also sometimes output exact copies of webpages it was trained on, including personally identifiable information.[14] Some of these studies have since been cited by courts debating the copyright status of AI models.[15]
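The training-data extraction results rest on a simple observation: a model tends to assign unusually low perplexity to text it has memorized. The sketch below is an illustrative approximation of one such membership signal rather than the published pipeline: it samples from the public GPT-2 model and ranks the samples by the ratio of the model's log-perplexity to their zlib-compressed size, so that text the model finds much "easier" than a generic compressor does rises to the top.

```python
# Generate samples from GPT-2 and rank them by a perplexity-to-zlib ratio,
# a rough proxy for the memorization signal used in this line of work.
import math
import zlib

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return math.exp(loss.item())

def memorization_score(text: str) -> float:
    """Lower scores suggest text the model predicts far better than zlib compresses it."""
    zlib_entropy = len(zlib.compress(text.encode("utf-8")))
    return math.log(perplexity(text)) / zlib_entropy

# Sample a handful of unconditional generations; the published attack generates
# many thousands and manually inspects the top-ranked candidates.
prompt = tokenizer(tokenizer.bos_token, return_tensors="pt")
samples = model.generate(prompt.input_ids, do_sample=True, max_length=64,
                         num_return_sequences=5, top_k=40,
                         pad_token_id=tokenizer.eos_token_id)
texts = [tokenizer.decode(s, skip_special_tokens=True) for s in samples]
texts = [t for t in texts if t.strip()]

for score, text in sorted((memorization_score(t), t) for t in texts):
    print(round(score, 4), text[:60].replace("\n", " "))
```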
Other work
Carlini received the Best of Show award at the 2020 IOCCC (the 27th International Obfuscated C Code Contest) for implementing a tic-tac-toe game entirely with calls to printf, expanding on a 2015 research paper of his. The judges commented on his submission: "This year's Best of Show (carlini) is such a novel way of obfuscation that it would be worth of a special mention in the (future) Best of IOCCC list!"[16]
Awards
- Best Student Paper Award, IEEE S&P 2017 ("Towards Evaluating the Robustness of Neural Networks")[17]
- Best Paper Award, ICML 2018 ("Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples")[18]
- Distinguished Paper Award, USENIX Security 2021 ("Poisoning the Unlabeled Dataset of Semi-Supervised Learning")[19]
- Distinguished Paper Award, USENIX Security 2023 ("Tight Auditing of Differentially Private Machine Learning")[20]
- Best Paper Award, ICML 2024 ("Stealing Part of a Production Language Model")[21]
- Best Paper Award, ICML 2024 ("Considerations for Differentially Private Learning with Large-Scale Public Pretraining")
Notes and References
- Web site: Nicholas Carlini . June 4, 2024 . nicholas.carlini.com . June 3, 2024 . https://web.archive.org/web/20240603235028/https://nicholas.carlini.com/ . live .
- Web site: Nicholas Carlini . June 4, 2024 . AI for Good . en-US . June 4, 2024 . https://web.archive.org/web/20240604055509/https://aiforgood.itu.int/speaker/nicholas-carlini/ . live .
- Web site: Graduates . June 4, 2024 . people.eecs.berkeley.edu.
- Book: Pujari . Medha . Cherukuri . Bhanu Prakash . Javaid . Ahmad Y . Sun . Weiqing . An Approach to Improve the Robustness of Machine Learning based Intrusion Detection System Models Against the Carlini-Wagner Attack . July 27, 2022 . 2022 IEEE International Conference on Cyber Security and Resilience (CSR) . https://ieeexplore.ieee.org/document/9850306 . IEEE . 62–67 . 10.1109/CSR54599.2022.9850306 . 978-1-6654-9952-1 . June 4, 2024 . February 2, 2023 . https://web.archive.org/web/20230202055428/https://ieeexplore.ieee.org/document/9850306/ . live .
- Web site: Schwab . Katharine . December 12, 2017 . How To Fool A Neural Network . June 4, 2023 . Fast Company . October 30, 2023 . https://web.archive.org/web/20231030175619/https://www.fastcompany.com/90153084/how-to-fool-a-neural-network . live .
- News: Smith . Craig S. . May 10, 2018 . Alexa and Siri Can Hear This Hidden Command. You Can't. . June 4, 2024 . The New York Times . en-US . 0362-4331 . January 25, 2021 . https://web.archive.org/web/20210125172430/https://www.nytimes.com/2018/05/10/technology/alexa-siri-hidden-command-audio-attacks.html . live .
- Web site: As voice assistants go mainstream, researchers warn of vulnerabilities . June 4, 2024 . CNET . en.
- Simonite . Tom . AI Has a Hallucination Problem That's Proving Tough to Fix . June 4, 2024 . Wired . en-US . 1059-1028 . June 11, 2023 . https://web.archive.org/web/20230611170023/https://www.wired.com/story/ai-has-a-hallucination-problem-thats-proving-tough-to-fix/ . live .
- Hutson . Matthew . March 3, 2021 . Robo-writers: the rise and risks of language-generating AI . Nature . en . 591 . 7848 . 22–25 . 10.1038/d41586-021-00530-0. 33658699 . 2021Natur.591...22H .
- Web site: Conover . Emily . February 1, 2024 . AI chatbots can be tricked into misbehaving. Can scientists stop it? . July 26, 2024 . Science News . en-US.
- News: Metz . Cade . July 27, 2023 . Researchers Poke Holes in Safety Controls of ChatGPT and Other Chatbots . July 26, 2024 . The New York Times . en-US . 0362-4331.
- Web site: What does GPT-3 "know" about me? . July 26, 2024 . MIT Technology Review . en.
- Web site: Edwards . Benj . February 1, 2023 . Paper: Stable Diffusion "memorizes" some images, sparking privacy concerns . July 26, 2024 . Ars Technica . en-us.
- Newman . Lily Hay . ChatGPT Spit Out Sensitive Data When Told to Repeat 'Poem' Forever . July 26, 2024 . Wired . en-US . 1059-1028 . July 26, 2024 . https://web.archive.org/web/20240726092303/https://www.wired.com/story/chatgpt-poem-forever-security-roundup/ . live .
- J. Doe 1 . United States District Court, Northern District of California . https://storage.courtlistener.com/recap/gov.uscourts.cand.403220/gov.uscourts.cand.403220.253.0_1.pdf . https://web.archive.org/web/20240709155147/https://storage.courtlistener.com/recap/gov.uscourts.cand.403220/gov.uscourts.cand.403220.253.0_1.pdf . live .
- Web site: The 27th IOCCC . July 26, 2024 . www.ioccc.org . September 8, 2024 . https://web.archive.org/web/20240908180733/https://www.ioccc.org/2020/index.html . live .
- Web site: IEEE Symposium on Security and Privacy 2017 . live . https://web.archive.org/web/20240902002724/https://www.ieee-security.org/TC/SP2017/awards.html . September 2, 2024 . September 2, 2024 . www.ieee-security.org.
- Web site: ICML 2018 Awards . live . https://web.archive.org/web/20240902002725/https://icml.cc/Conferences/2018/Awards . September 2, 2024 . September 2, 2024 . icml.cc.
- Carlini . Nicholas . 2021 . Poisoning the Unlabeled Dataset of Semi-Supervised Learning . USENIX Security 2021 . en . 1577–1592 . 978-1-939133-24-3.
- Nasr . Milad . Hayes . Jamie . Steinke . Thomas . Balle . Borja . Tramèr . Florian . Jagielski . Matthew . Carlini . Nicholas . Terzis . Andreas . 2023 . Tight Auditing of Differentially Private Machine Learning . live . USENIX Security 2023 . en . 1631–1648 . 978-1-939133-37-3 . https://web.archive.org/web/20240908180646/https://www.usenix.org/conference/usenixsecurity23/presentation/nasr . September 8, 2024 . September 2, 2024.
- Web site: ICML 2024 Awards . live . https://web.archive.org/web/20240908180823/https://icml.cc/virtual/2024/awards_detail . September 8, 2024 . September 2, 2024 . icml.cc.