PhD Proposal: Understanding and Connecting Fairness, Copyright, and Watermarking Through a Security Lens
As Artificial Intelligence (AI) systems become increasingly integral to modern society, the imperative for secure, robust, and trustworthy AI has intensified. While distinct subfields such as security, fairness, and data provenance have expanded rapidly, they are often studied in isolation. This proposal argues that a holistic understanding of Machine Learning (ML) systems requires a rigorous examination of the interactions between these developing domains.

First, we investigate the unintended cross-domain consequences of machine learning interventions. We show that fairness interventions and output watermarking can affect security and copyright enforcement, respectively, in unexpected ways. For instance, fairness interventions may inadvertently introduce security vulnerabilities, while output watermarking can hinder the detection of copyrighted training data.

Second, we analyze copyright compliance methods for Large Language Models (LLMs) through a security lens. We argue that because these methods do not address the root causes of the problems they aim to resolve, models remain fundamentally vulnerable to adaptive attacks. For instance, we show that commonly used techniques, such as training-data deduplication to prevent the generation of copyrighted content and standard membership inference attacks (MIAs) to detect copyright infringement, are insufficient: they do not resolve the core ambiguity between memorization and generalization, which adaptive attacks can exploit.

Finally, in future work, we aim to develop a framework for robust image watermarking. Building on recent advances in autoregressive image generation, we plan to introduce an in-generation watermarking technique that complements state-of-the-art post-generation methods. By combining the two into a unified hybrid strategy, we seek to improve watermark robustness and advance the broader goal of reliable data provenance.
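
To make the second point concrete, the sketch below shows a standard loss-thresholding MIA baseline of the kind the proposal treats as insufficient; the function name, calibration procedure, and parameter choices are illustrative assumptions rather than the exact attack variant studied in this work.

    import numpy as np

    def loss_threshold_mia(target_losses, calibration_losses, fpr=0.05):
        """Standard loss-thresholding membership inference: flag an example as a
        training member if the model's loss on it falls below a threshold
        calibrated on examples known to be non-members."""
        # Set the threshold so that only a fraction `fpr` of known non-members
        # would be (falsely) flagged as members.
        threshold = np.quantile(calibration_losses, fpr)
        # Low loss -> "member". The core weakness: a model that generalizes well
        # also assigns low loss to unseen in-distribution data, so low loss alone
        # cannot separate memorization from generalization, which is precisely
        # the ambiguity that adaptive attacks can exploit.
        return np.asarray(target_losses) < threshold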
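
The future-work direction can likewise be illustrated. The sketch below adapts green-list logit biasing, a technique from text watermarking, to the discrete token stream of an autoregressive image generator; every name, parameter, and design choice here is a hypothetical illustration of what an in-generation watermark could look like, not the proposal's final method.

    import torch

    def greenlist_biased_sampling(logits, prev_token, key, gamma=0.5, delta=2.0):
        """Hypothetical in-generation watermarking step for an autoregressive
        image model: bias next-token sampling toward a secret, context-dependent
        'green' subset of the image-token codebook.

        logits     : (vocab_size,) next-token logits over the image-token codebook
        prev_token : previously sampled token id, used to seed the partition
        key        : secret watermarking key
        gamma      : fraction of the codebook placed on the green list
        delta      : bias added to green-list logits before sampling
        """
        vocab_size = logits.shape[-1]
        # Derive a pseudo-random green list from the secret key and local context.
        gen = torch.Generator().manual_seed(int(key) ^ int(prev_token))
        perm = torch.randperm(vocab_size, generator=gen)
        green = perm[: int(gamma * vocab_size)]
        # Softly bias generation toward green tokens.
        biased = logits.clone()
        biased[green] += delta
        probs = torch.softmax(biased, dim=-1)
        return torch.multinomial(probs, num_samples=1).item()

Detection would then test whether the fraction of green-list tokens in a suspect image's token sequence is statistically higher than gamma, and a hybrid scheme could combine this in-generation signal with an existing post-generation watermark.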