I am a Ph.D. candidate in Computer Science at the University of Maryland, College Park. My research advisor is Prof. Tudor Dumitras. My broad research focus is adversarial machine learning. Specifically, I develop methods to distill the hidden information within deep neural networks into intuitive and often security-related metrics, such as overthinking. I have also explored practical threat models against ML systems, such as sneaky poisoning attacks and hardware-based attacks, and I recently started working on ML privacy, including differential privacy. As a huge psychology geek, I hope to bring concepts and techniques from cognitive psychology into my research.
My research life started in the summer of 2016, when I joined Prof. Dumitras' team at MC2 as a research intern. At MC2, I had the opportunity to experience a great research environment. After returning to Turkey and finishing my undergraduate degree, I decided to come back to UMD and pursue a Ph.D. with Prof. Dumitras.
I usually just go by Can, which is pronounced much like 'John'.
Research Projects
Practical targeted poisoning attacks and defenses against machine learning (2016)
Machine learning models learn from training data to make inferences on unseen data. This mode of learning creates an opportunity for adversaries to compromise the model by injecting 'poisonous' data into the training set. In this project, we study such poisoning attacks, especially 'targeted' poisoning attacks, in which the attacker wants the model to make mistakes only on specific unseen future inputs. Our attack, 'StingRay', is the first of its kind: a practical attack that operates under realistic constraints, such as inconspicuousness.
In addition to developing the attack, we also investigate a practical defense against this threat, 'Inception Guard', which leverages the influence of individual training instances and propagates trust between them to eliminate 'unreliable' ones. Some of the results of this project are presented in our USENIX Security Symposium 2018 paper, When Does Machine Learning FAIL? Generalized Transferability for Evasion and Poisoning Attacks.
Defining realistic threat models for machine learning systems (2017)
Recent developments in machine learning have also sparked research into the security of these models. Previous work has proposed many threats, such as test-time or training-time attacks, and defenses against them. One of the key elements of security research is defining the adversaries precisely and unambiguously. However, we observed that existing work in adversarial machine learning lacks a unifying adversary definition, which causes discordance between individual studies. Furthermore, we identified that existing threat models do not reflect the practical capabilities of adversaries. In this project, we address these two issues by designing a unifying framework to define the capabilities of adversaries against machine learning. Our 'FAIL' (Features, Algorithms, Instances, Leverage) framework is a step towards precise threat models in adversarial machine learning, and it promotes principled research by building a common ground. The results of this project are presented in our USENIX Security Symposium 2018 paper, When Does Machine Learning FAIL? Generalized Transferability for Evasion and Poisoning Attacks, and our International Workshop on Security Protocols 2018 paper, Too Big to FAIL: What You Need to Know Before Attacking a Machine Learning System.
Hardware-based attacks against deep learning (2018) (Project Website)
Research on attacks against deep learning mainly focuses on test-time evasion attacks or training-time poisoning attacks. However, the emergence of the 'Deep Learning as a Service' business model expands the attack surface by enabling low-level attacks against the hardware that runs the service. Deep learning requires extensive computation and specialized hardware, which makes hardware-based attacks an emerging threat. First, we focus on cache side-channel attacks for reverse-engineering deep learning models and stealing potentially proprietary network architectures, by leveraging the information leaked by the cache hardware. The results of this project are presented in our ICLR 2020 paper: How to 0wn the NAS in Your Spare Time. Second, we measure the vulnerability of a deep learning model to bit-flips facilitated by the prominent Rowhammer attack, which can trigger random or targeted bit-flips in physical memory. The results of this project are presented in our USENIX Security 2019 paper: Terminal Brain Damage: Exposing the Graceless Degradation in Deep Neural Networks Under Hardware Fault Attacks.
Towards cognitive neuroscience of the deep neural networks for security (2018) (Project Website)
Our theoretical understanding of deep neural networks (DNNs) is falling behind our practical achievements. As a result, recent work demonstrated that DNNs can be manipulated by adversaries in many ways that we cannot fully understand. Similarly, even though the inner mechanisms of the human brain are not well understood, recent brain imaging technologies, such as fMRI, have given rise to cognitive neuroscience. Cognitive neuroscience focuses on the neural connections in the brain involved in mental processes and seeks to find how cognitive activities are affected or controlled by neural circuits. In this project, we form a mapping between human cognitive tasks and the tasks a deep learning model performs. Cognitive neuroscience, by looking at the structure of the brain, aims to answer how these cognitive tasks are actuated; here, by analyzing the information flow in a network structure, we aim to answer how a network reaches a certain decision. Our primary goal is to build a foundation for analyzing the impact of attacks on the circuitry of a deep neural network, i.e., how the information flow is altered. We believe that such analysis has the potential to improve our conceptual understanding of DNNs and pave the way for more general defensive strategies that leverage this understanding. In our first research paper under this project, Shallow-Deep Networks: Understanding and Mitigating Network Overthinking, we drew a parallel between human cognition and DNNs and identified that DNNs can also suffer from 'overthinking'.
Deep neural networks (DNNs) are known to be highly vulnerable to adversarial examples (AEs) that include malicious perturbations. Assumptions about the statistical differences between natural and adversarial inputs are commonplace in many detection techniques. As a best practice, AE detectors are evaluated against 'adaptive' attackers who actively perturb their inputs to avoid detection. Due to the difficulties in designing adaptive attacks, however, recent work suggests that most detectors have incomplete evaluation. We aim to fill this gap by designing a generic adaptive attack against detectors: the 'statistical indistinguishability attack' (SIA). The SIA optimizes a novel objective to craft adversarial examples (AEs) that follow the same distribution as the natural inputs with respect to DNN representations. Our objective targets all DNN layers simultaneously as we show that AEs being indistinguishable at one layer might fail to be so at other layers. The SIA is formulated around evading distributional detectors that inspect a set of AEs as a whole and is also effective against four individual AE detectors, two dataset shift detectors, and an out-of-distribution sample detector, curated from published works. This suggests that the SIA can be a reliable tool for evaluating the security of a range of detectors.
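To give a concrete flavor of a layer-wise indistinguishability objective, here is a minimal sketch: it pushes an adversarial input's hidden activations toward the statistics of natural inputs at every layer while still forcing a target prediction. The tiny model, the mean-matching loss, and the weights are illustrative placeholders I chose for the sketch, not the exact SIA formulation from the paper.

# Illustrative sketch of a layer-wise "indistinguishability" objective: push an
# adversarial input's hidden activations toward the statistics of natural
# inputs at every layer, not just one. NOT the exact SIA objective; the model
# and loss weights are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                      nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

# Collect intermediate activations with forward hooks.
acts = {}
def make_hook(name):
    def hook(_module, _inputs, output):
        acts[name] = output
    return hook
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(make_hook(name))

natural = torch.randn(128, 1, 28, 28)            # stand-in for natural inputs
with torch.no_grad():
    model(natural)
    nat_means = {k: v.mean(dim=0) for k, v in acts.items()}

x = torch.randn(1, 1, 28, 28)                    # input to perturb
target = torch.tensor([3])                       # attacker's target label
delta = torch.zeros_like(x, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.01)

for _ in range(200):
    opt.zero_grad()
    logits = model(x + delta)
    attack_loss = F.cross_entropy(logits, target)
    # Match every layer's activations to the natural statistics.
    match_loss = sum(F.mse_loss(acts[k][0], nat_means[k]) for k in nat_means)
    (attack_loss + 0.1 * match_loss).backward()
    opt.step()
    delta.data.clamp_(-0.1, 0.1)                 # keep the perturbation small

In the actual attack, the distributional term is more sophisticated than matching per-layer means, but the sketch conveys why targeting all layers simultaneously matters.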
Quantization is a popular technique that transforms the parameter representation of a neural network from floating-point numbers into lower-precision ones (e.g., 8-bit integers). It reduces the memory footprint and the computational cost at inference, facilitating the deployment of resource-hungry models. However, the parameter perturbations caused by this transformation result in behavioral disparities between the model before and after quantization. For example, a quantized model can misclassify some test-time samples that are otherwise classified correctly. It is not known whether such differences lead to a new security vulnerability. We hypothesize that an adversary may control this disparity to introduce specific behaviors that activate upon quantization. To study this hypothesis, we weaponize quantization-aware training and propose a new training framework to implement adversarial quantization outcomes. Following this framework, we present three attacks we carry out with quantization: (i) an indiscriminate attack for significant accuracy loss; (ii) a targeted attack against specific samples; and (iii) a backdoor attack for controlling the model with an input trigger. We further show that a single compromised model defeats multiple quantization schemes, including robust quantization techniques. Moreover, in a federated learning scenario, we demonstrate that a set of malicious participants who conspire can inject our quantization-activated backdoor. Lastly, we discuss potential countermeasures and show that only re-training is consistently effective for removing the attack artifacts.
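The behavioral disparity this attack exploits is easy to observe directly. Below is a minimal sketch that applies a naive symmetric 8-bit weight quantizer and measures how often the quantized model disagrees with the float model; the model and data are placeholders, and this is only the disparity measurement, not the quantization-aware training attack from the paper.

# Sketch of the float-vs-int8 behavioral disparity, using naive symmetric
# per-tensor weight quantization. Illustrates the gap the attack exploits; it
# is not the attack itself.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(),
                      nn.Linear(128, 10))
model.eval()

def quantize_weights_int8(m: nn.Module) -> nn.Module:
    """Return a copy of the model with every parameter rounded to an 8-bit grid."""
    q = copy.deepcopy(m)
    with torch.no_grad():
        for p in q.parameters():
            scale = p.abs().max() / 127.0
            p.copy_(torch.round(p / scale).clamp(-127, 127) * scale)
    return q

quantized = quantize_weights_int8(model)

x = torch.randn(1000, 1, 28, 28)                 # stand-in for test inputs
with torch.no_grad():
    preds_fp = model(x).argmax(dim=1)
    preds_q = quantized(x).argmax(dim=1)

disagreement = (preds_fp != preds_q).float().mean().item()
print(f"fraction of inputs classified differently: {disagreement:.3f}")

The attack's training framework essentially shapes where these disagreements land, so that the disparity becomes a controlled, malicious behavior rather than random noise.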
Deep learning models often raise privacy concerns as they leak information about their training data. This leakage enables membership inference attacks (MIA) that can identify whether a data point was in a model's training set. Research shows that some 'data augmentation' mechanisms may reduce the risk by combatting overfitting, a key factor increasing the leakage. While many mechanisms exist, their effectiveness against MIAs and their privacy properties have not been studied systematically. Employing two recent MIAs, we explore the lower bound on the risk in the absence of formal upper bounds. First, we evaluate 7 mechanisms and differential privacy on three image classification tasks. We find that applying augmentation to increase the model's utility does not mitigate the risk, and protection comes with a utility penalty. Further, we also investigate why the popular label smoothing mechanism consistently amplifies the risk. Finally, we propose the 'loss-rank-correlation' (LRC) metric to assess how similar the effects of different mechanisms are. This, for example, reveals the similarity of applying high-intensity augmentation against MIAs to simply reducing the training time. Our findings emphasize the utility-privacy trade-off and provide practical guidelines on using augmentation to manage the trade-off.
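As a rough illustration of the kind of comparison LRC enables, here is a minimal sketch that rank-correlates the per-example losses of two trained models (e.g., one per augmentation mechanism). I am assuming LRC is a rank correlation over per-example losses, which is my reading of the metric's name rather than its exact definition in the paper; the losses below are made up.

# Sketch of a loss-rank-correlation (LRC) style comparison between two models:
# rank the training examples by the per-example loss each model assigns, then
# correlate the rankings. Assumes LRC is a rank correlation over per-example
# losses; the numbers are toy data.
import numpy as np
from scipy.stats import spearmanr

def loss_rank_correlation(losses_a: np.ndarray, losses_b: np.ndarray) -> float:
    """Spearman correlation between two models' per-example loss rankings."""
    rho, _ = spearmanr(losses_a, losses_b)
    return float(rho)

# Toy usage: two mechanisms whose rankings agree closely yield an LRC near 1,
# suggesting they affect the model (and its membership leakage) in similar ways.
rng = np.random.default_rng(0)
losses_mechanism_a = rng.exponential(size=1000)
losses_mechanism_b = losses_mechanism_a + 0.1 * rng.normal(size=1000)
print(f"LRC: {loss_rank_correlation(losses_mechanism_a, losses_mechanism_b):.3f}")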
Recent increases in the computational demands of deep neural networks (DNNs), combined with the observation that most input samples require only simple models, have sparked interest in input-adaptive multi-exit architectures, such as MSDNets or Shallow-Deep Networks. These architectures enable faster inferences and could bring DNNs to low-power devices, e.g., in the Internet of Things (IoT). However, it is unknown if the computational savings provided by this approach are robust against adversarial pressure. In particular, an adversary may aim to slow down adaptive DNNs by increasing their average inference time, a threat analogous to denial-of-service attacks on the Internet. In this paper, we conduct a systematic evaluation of this threat by experimenting with three generic multi-exit DNNs (based on VGG16, MobileNet, and ResNet56) and a custom multi-exit architecture, on two popular image classification benchmarks (CIFAR-10 and Tiny ImageNet). To this end, we show that adversarial sample-crafting techniques can be modified to cause slowdown, and we propose a metric for comparing their impact on different architectures. We show that a slowdown attack reduces the efficacy of multi-exit DNNs by 90%-100% and amplifies the latency by 1.5-5× in a typical IoT deployment. We also show that it is possible to craft universal, reusable perturbations and that the attack can be effective in realistic black-box scenarios, where the attacker has limited knowledge about the victim. Finally, we show that adversarial training provides limited protection against slowdowns. These results suggest that further research is needed for defending multi-exit architectures against this emerging threat.
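One simple way to quantify such a slowdown is to compare how deep inputs travel before exiting, with and without the attack. The sketch below computes an illustrative "efficacy destroyed" number from exit indices; the exact metric in the paper may differ, and the numbers here are toy values.

# Sketch of a slowdown measurement for a multi-exit model: compare the average
# exit depth (normalized by the number of exits) on clean vs. perturbed inputs.
# The paper's efficacy metric may differ; this is an illustrative stand-in.
from typing import List

def average_exit_fraction(exit_indices: List[int], num_exits: int) -> float:
    """Mean exit depth, normalized so 1.0 means every input used the full network."""
    return sum(i + 1 for i in exit_indices) / (num_exits * len(exit_indices))

def efficacy_destroyed(clean_exits: List[int], adv_exits: List[int],
                       num_exits: int) -> float:
    """Fraction of the clean-input computational savings the attack eliminates."""
    savings_clean = 1.0 - average_exit_fraction(clean_exits, num_exits)
    savings_adv = 1.0 - average_exit_fraction(adv_exits, num_exits)
    return 1.0 - savings_adv / savings_clean if savings_clean > 0 else 0.0

# Toy numbers: clean inputs exit early, perturbed inputs are pushed to the end.
clean = [1, 2, 1, 3, 2, 1]           # exit indices out of 6 internal classifiers
adversarial = [5, 5, 4, 5, 5, 5]
print(f"efficacy destroyed: {efficacy_destroyed(clean, adversarial, num_exits=6):.2f}")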
Machine learning algorithms are vulnerable to data poisoning attacks. Prior taxonomies that focus on specific scenarios, e.g., indiscriminate or targeted, have enabled defenses for the corresponding subset of known attacks. Yet, this introduces an inevitable arms race between adversaries and defenders. In this work, we study the feasibility of an attack-agnostic defense relying on artifacts that are common to all poisoning attacks. Specifically, we focus on a common element between all attacks: they modify gradients computed to train the model. We identify two main artifacts of gradients computed in the presence of poison: (1) their ℓ2 norms have significantly higher magnitudes than those of clean gradients, and (2) their orientation differs from clean gradients. Based on these observations, we propose the prerequisite for a generic poisoning defense: it must bound gradient magnitudes and minimize differences in orientation. We call this gradient shaping. As an exemplar tool to evaluate the feasibility of gradient shaping, we use differentially private stochastic gradient descent (DP-SGD), which clips and perturbs individual gradients during training to obtain privacy guarantees. We find that DP-SGD, even in configurations that do not result in meaningful privacy guarantees, increases the model's robustness to indiscriminate attacks. It also mitigates worst-case targeted attacks and increases the adversary's cost in multi-poison scenarios. The only attack we find DP-SGD to be ineffective against is a strong, yet unrealistic, indiscriminate attack. Our results suggest that, while we currently lack a generic poisoning defense, gradient shaping is a promising direction for future research.
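The two gradient-shaping operations DP-SGD provides are per-example clipping (bounding magnitude) and Gaussian noise (blurring orientation). Below is a minimal sketch of one such training step; the model, data, and hyperparameters are placeholders, and a real implementation would typically rely on a DP library such as Opacus rather than this manual loop.

# Sketch of a single DP-SGD step: clip each per-example gradient's L2 norm,
# add Gaussian noise to the sum, then average and apply the update. Model,
# data, and hyperparameters are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(20, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
clip_norm, noise_multiplier = 1.0, 1.1

x_batch = torch.randn(32, 20)
y_batch = torch.randint(0, 2, (32,))

summed = [torch.zeros_like(p) for p in model.parameters()]
for x, y in zip(x_batch, y_batch):                       # per-example gradients
    model.zero_grad()
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    loss.backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]
    total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = torch.clamp(clip_norm / (total_norm + 1e-12), max=1.0)  # bound magnitude
    for s, g in zip(summed, grads):
        s += g * scale

model.zero_grad()
for p, s in zip(model.parameters(), summed):
    noise = torch.randn_like(s) * noise_multiplier * clip_norm      # blur orientation
    p.grad = (s + noise) / len(x_batch)
optimizer.step()

Note that the clipping step alone already bounds how much any single poison sample can move the model, which is the intuition behind using DP-SGD as an exemplar of gradient shaping even outside meaningful privacy regimes.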
(6) Sanghyun Hong, Michael Davinroy, Yigitcan Kaya, Dana Dachman-Soled, Tudor Dumitras: How to 0wn the NAS in Your Spare Time. Accepted to ICLR 2020 (26.5% acceptance rate).
New data processing pipelines and unique network architectures increasingly drive the success of deep learning. In consequence, the industry considers top-performing architectures as intellectual property and devotes considerable computational resources to discovering such architectures through neural architecture search (NAS). This provides an incentive for adversaries to steal these unique architectures; when used in the cloud to provide Machine Learning as a Service (MLaaS), the adversaries also have an opportunity to reconstruct the architectures by exploiting a range of hardware side-channels. However, it is challenging to reconstruct unique architectures and pipelines without knowing the computational graph (e.g., the layers, branches or skip connections), the architectural parameters (e.g., the number of filters in a convolutional layer) or the specific pre-processing steps (e.g., embeddings). In this paper, we design an algorithm that reconstructs the key components of a unique deep learning system by exploiting a small amount of information leakage from a cache side-channel attack, Flush+Reload. We use Flush+Reload to infer the trace of computations and the timing for each computation. Our algorithm then generates candidate computational graphs from the trace and eliminates incompatible candidates through a parameter estimation process. We implement our algorithm for PyTorch and TensorFlow. We demonstrate experimentally that we can reconstruct MalConv, a novel data pre-processing pipeline for malware detection, and ProxylessNAS-CPU, a novel network architecture for ImageNet classification optimized to run on CPUs, without knowing the architecture family. In both cases, we achieve 0% error. These results suggest hardware side channels are a practical attack vector against MLaaS, and more efforts should be devoted to understanding their impact on the security of deep learning systems.
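To illustrate the candidate-elimination idea at a toy scale: the attacker observes a sequence of computation types (leaked, e.g., via Flush+Reload) and discards candidate architectures whose operation sequences could not have produced that trace. The candidates and trace below are made up, and the actual reconstruction algorithm is far more involved, since it also estimates architectural parameters from timing.

# Toy illustration of eliminating candidate computational graphs that are
# incompatible with an observed operation trace. Not the paper's algorithm;
# the candidates and the trace are invented for the example.
from typing import Dict, List

candidates: Dict[str, List[str]] = {
    "plain_cnn": ["conv", "relu", "conv", "relu", "pool", "fc"],
    "residual":  ["conv", "relu", "conv", "add", "relu", "fc"],
    "mlp":       ["fc", "relu", "fc", "relu", "fc"],
}

observed_trace = ["conv", "relu", "conv", "add", "relu", "fc"]

compatible = [name for name, ops in candidates.items() if ops == observed_trace]
print("architectures compatible with the observed trace:", compatible)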
Deep neural networks (DNNs) have been shown to tolerate "brain damage": cumulative changes to the network's parameters (e.g., pruning, numerical perturbations) typically result in a graceful degradation of classification accuracy. However, the limits of this natural resilience are not well understood in the presence of small adversarial changes to the DNN parameters' underlying memory representation, such as bit-flips that may be induced by hardware fault attacks. We study the effects of bitwise corruptions on 19 DNN models (six architectures on three image classification tasks) and we show that most models have at least one parameter that, after a specific bit-flip in their bitwise representation, causes an accuracy loss of over 90%. We employ simple heuristics to efficiently identify the parameters likely to be vulnerable. We estimate that 40-50% of the parameters in a model might lead to an accuracy drop greater than 10% when individually subjected to such single-bit perturbations. To demonstrate how an adversary could take advantage of this vulnerability, we study the impact of an exemplary hardware fault attack, Rowhammer, on DNNs. Specifically, we show that a Rowhammer-enabled attacker co-located in the same physical machine can inflict significant accuracy drops (up to 99%) even with single bit-flip corruptions and no knowledge of the model. Our results expose the limits of DNNs' resilience against parameter perturbations induced by real-world fault attacks. We conclude by discussing possible mitigations and future research directions towards fault attack-resilient DNNs.
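The kind of corruption we study is easy to reproduce in simulation: flip one bit in the IEEE-754 representation of a single weight and compare the model's predictions before and after. The sketch below does exactly that with a placeholder model and random data; it is not the paper's vulnerable-parameter search heuristic, just the basic fault model.

# Sketch of a single-bit parameter corruption: flip one bit of one float32
# weight and measure how many predictions change. Model and data are
# placeholders.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(),
                      nn.Linear(128, 10))
model.eval()

x = torch.randn(512, 1, 28, 28)
with torch.no_grad():
    preds_before = model(x).argmax(dim=1)

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant) in a float32's binary representation."""
    bits = np.array([value], dtype=np.float32).view(np.uint32)
    bits[0] ^= np.uint32(1 << bit)
    return float(bits.view(np.float32)[0])

# Corrupt one weight: bit 30 is the most significant exponent bit, where a
# single 0-to-1 flip can turn a small weight into a huge value.
weight = model[1].weight
with torch.no_grad():
    weight[0, 0] = flip_bit(weight[0, 0].item(), bit=30)

with torch.no_grad():
    preds_after = model(x).argmax(dim=1)
print("fraction of predictions changed:",
      (preds_before != preds_after).float().mean().item())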
We characterize a prevalent weakness of deep neural networks (DNNs)—overthinking—which occurs when a network can reach correct predictions before its final layer. Overthinking is computationally wasteful, and it can also be destructive when, by the final layer, the correct prediction changes into a misclassification. Understanding overthinking requires studying how each prediction evolves during a network’s forward pass, which conventionally is opaque. For prediction transparency, we propose the Shallow-Deep Network (SDN), a generic modification to off-the-shelf DNNs that introduces internal classifiers. We apply SDN to four modern architectures, trained on three image classification tasks, to characterize the overthinking problem. We show that SDNs can mitigate the wasteful effect of overthinking with confidence-based early exits, which reduce the average inference cost by more than 50% and preserve the accuracy. We also find that the destructive effect occurs for 50% of misclassifications on natural inputs and that it can be induced, adversarially, with a recent backdooring
attack. To mitigate this effect, we propose a new confusion metric to quantify the internal disagreements that will likely lead to misclassifications.
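The early-exit mechanism at the heart of this mitigation is straightforward to sketch: internal classifiers attached after intermediate blocks let an input exit as soon as its softmax confidence passes a threshold. The backbone, attachment points, and threshold below are illustrative choices of mine, not the exact Shallow-Deep Network recipe (which also specifies how the internal classifiers are trained).

# Sketch of confidence-based early exits with internal classifiers. The
# architecture and threshold are placeholders, not the SDN training procedure.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    def __init__(self, num_classes: int = 10, threshold: float = 0.9):
        super().__init__()
        self.threshold = threshold
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU()),
            nn.Sequential(nn.Linear(256, 128), nn.ReLU()),
            nn.Sequential(nn.Linear(128, 64), nn.ReLU()),
        ])
        # One internal classifier per block; the last one is the final classifier.
        self.exits = nn.ModuleList([
            nn.Linear(256, num_classes),
            nn.Linear(128, num_classes),
            nn.Linear(64, num_classes),
        ])

    @torch.no_grad()
    def forward(self, x):
        """Return (logits, exit_index) for one input, exiting early when confident."""
        for i, (block, exit_head) in enumerate(zip(self.blocks, self.exits)):
            x = block(x)
            logits = exit_head(x)
            confidence = F.softmax(logits, dim=1).max().item()
            if confidence >= self.threshold or i == len(self.blocks) - 1:
                return logits, i

torch.manual_seed(0)
net = EarlyExitNet().eval()
logits, exit_index = net(torch.randn(1, 1, 28, 28))
print(f"exited at internal classifier {exit_index}, prediction {logits.argmax().item()}")

Skipping the remaining blocks once an internal classifier is confident is what removes the "wasteful" portion of overthinking, while the confusion metric flags inputs whose internal predictions disagree along the way.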
Recent results suggest that attacks against supervised machine learning systems are quite effective, while defenses are easily bypassed by new attacks. However, the specifications for machine learning systems currently lack precise adversary definitions, and the existing attacks make diverse, potentially unrealistic assumptions about the strength of the adversary who launches them. We propose the FAIL attacker model, which describes the adversary's knowledge and control along four dimensions. Our model allows us to consider a wide range
of weaker adversaries who have limited control and incomplete knowledge of the features, learning algorithms and training instances utilized. To evaluate the utility of the FAIL model, we consider the problem of conducting targeted poisoning attacks in a realistic setting: the crafted poison samples must have clean labels, must be individually and collectively inconspicuous, and must exhibit a generalized form of transferability, defined by the FAIL model. By taking these constraints into account, we design StingRay, a targeted poisoning attack that is practical against 4 machine learning applications, which use 3 different learning algorithms, and can bypass 2 existing defenses. Conversely, we show that a prior evasion attack is less effective under generalized transferability. Such attack evaluations, under the FAIL adversary model, may also suggest promising directions for future defenses.
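To make the clean-label and inconspicuousness constraints more concrete, here is a toy sketch in their spirit: poisons start from base instances that already carry the label the attacker wants the target to receive and are nudged toward the target under a small perturbation budget. This is a generic template I wrote for illustration, not the StingRay crafting procedure itself.

# Toy illustration of clean-label targeted poisoning constraints. Bases keep
# their original (correct-looking) labels; the perturbation budget keeps each
# poison individually inconspicuous. Not the StingRay algorithm.
import numpy as np

rng = np.random.default_rng(0)

def craft_clean_label_poisons(bases: np.ndarray, target: np.ndarray,
                              budget: float = 0.1, step: float = 0.02,
                              iters: int = 50) -> np.ndarray:
    """Nudge base instances toward the target, clipping to an L-infinity budget."""
    poisons = bases.copy()
    for _ in range(iters):
        poisons += step * np.sign(target - poisons)                 # move toward target
        poisons = np.clip(poisons, bases - budget, bases + budget)  # stay inconspicuous
    return poisons

# Toy data: 16-dimensional "inputs"; bases come from the attacker's desired class.
target_instance = rng.normal(size=16)
base_instances = rng.normal(size=(8, 16))
poisons = craft_clean_label_poisons(base_instances, target_instance)
print("max per-feature change:", np.abs(poisons - base_instances).max())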
There is an emerging arms race in the field of adversarial machine learning (AML). Recent results suggest that machine learning (ML) systems are vulnerable to a wide range of attacks; meanwhile, there are no systematic defenses. In this position paper we argue that to make progress toward such defenses, the specifications for machine learning systems must include precise adversary definitions—a key requirement in other fields, such as cryptography or network security. Without common adversary definitions, new AML attacks risk making
strong and unrealistic assumptions about the adversary’s capabilities. Furthermore, new AML defenses are evaluated based on their robustness against adversarial samples generated by a
specific attack algorithm, rather than by a general class of adversaries. We propose the FAIL adversary model, which describes the adversary's knowledge and control along four dimensions: data Features, learning Algorithms, training Instances and crafting Leverage. We analyze several common assumptions, often implicit, from the AML literature, and we argue that the FAIL model can represent and generalize the adversaries considered in these references. The FAIL model allows us to consider a range of adversarial capabilities and enables systematic comparisons of attacks against ML systems, providing a clearer picture of the security threats that these attacks raise. By evaluating how much a new AML attack's success depends on the strength of the adversary along each of the FAIL dimensions, researchers will be able to reason about the real effectiveness of the attack. Additionally, such evaluations may suggest
promising directions for investigating defenses against the ML threats.
I was a reviewer for NeurIPS 2022, ICML 2022, ICLR 2022, NeurIPS 2021, ICML 2021, ICML 2020, NeurIPS 2020, and ACM TOPS 2020.
I had the chance to be an external reviewer for: USENIX 2017, CCS 2017, NDSS 2018, IEEE S&P 2018, USENIX 2018, RAID 2018, CCS 2018, NDSS 2019, RAID 2019, NDSS 2020, and USENIX 2022.
In the summer of 2018, I supervised an REU (Research Experiences for Undergraduates) group and guided another one on two different projects regarding the security of deep learning.
I organized a security reading group in the Maryland Cybersecurity Center for Fall 2018.
Awards
Accepted into the Clark School Future Faculty Program, which prepares selected fellows for successful academic careers (2022).
Received an Honorable Mention in the 2019 National Science Foundation (NSF) Graduate Research Fellowship Program (GRFP) competition.
Received the Dean's Fellowship, which provides $2,500 each year to new graduate students (2017 - 2018).
BSc in Computer Engineering
Bilkent University
September 2012 - January 2017, GPA: 3.62 / 4.00
Received a comprehensive scholarship that includes a full tuition waiver and a stipend. (2012 - 2017)
Internship in the cyber-security department. Developed tools for continuous integration, source code management, and code review. Worked on honeypot and virtualization systems.
Worked on the ATES (Smart Intrusion Detection System) project, using C/C++ as the main development language. Developed dynamically loaded modules that control and monitor various security tools, such as honeypots and IDS/IPS systems.
Worked on attacks and defenses against machine learning, especially targeted poisoning attacks and data sanitization defenses. Developed a reputation system for machine learning data as part of the poisoning defense. Co-authored a research paper.
Graduate Courses
CMSC828N - Database System Architecture and Implementation (Prof. Daniel Abadi - Fall 2017)
ENEE657 - Computer Security (Prof. Tudor Dumitras - Fall 2017)
CMSC828M - Applied Mechanism Design for Social Good (Prof. John Dickerson - Spring 2018)
CMSC764 - Advanced Numerical Optimization (Prof. Tom Goldstein - Spring 2018)
CMSC828L - Deep Learning (Prof. David Jacobs - Fall 2018)
CMSC727 - Neural Modeling (Prof. James Reggia - Fall 2018)
CMSC828X - Algorithms for Probabilistic and Deterministic Graphical Models (Prof. Rina Dechter - Spring 2019)
LING849C - Computational Psycholinguistics (Prof. Naomi Feldman - Spring 2019)