ACM SIGPLAN 2006 Conference on
Programming Language Design and Implementation
Expressing and Enforcing Security with Programming Languages
Andrew Myers (Cornell University)

Binary Program Rewriting with Diablo
Bjorn De Sutter (Ghent University)

bddbddb: Using Datalog and BDDs for Program Analysis
John Whaley (Stanford University)

The Fortress Programming Language
Guy L. Steele Jr., Jan-Willem Maessen, and Sukyoung Ryu (Sun Microsystems)
Computer security remains a fundamental, unsolved problem, both because we lack good ways to define precisely what security we need and because we lack good ways to enforce it. Because attackers can attack at any level of abstraction, computer security must similarly be enforced at every level of abstraction from the hardware on up. Enforcement at the language level offers a powerful set of tools for ensuring that our increasingly complex computing systems satisfy the security needs of their users. Language-based enforcement techniques include program analysis and transformation, which can be grounded in a formal semantics for program behavior, offering fundamentally stronger security assurance. Many of these techniques can be applied even to untrusted binary code. Language-based security also offers the opportunity to better express security requirements. When the programmer can express security policies as part of the programming language, it becomes possible to support application-specific security requirements that could not be expressed or enforced at the level of the operating system or the network.
This tutorial reviews the state of the art in using languages to obtain security assurance. We work our way up from lower-level security properties such as memory safety to higher-level properties such as encapsulation, access control, confidentiality, and integrity. At each level, we examine language-based approaches that have been employed, including security type systems and safe low-level programming languages, program rewriting techniques such as sandboxing, certifying compilation and other techniques for validating security of binaries, information flow analysis, and language-based protocol analysis and synthesis for building secure distributed systems.
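To make the information-flow idea concrete, here is a minimal sketch of dynamic label tracking in Python. The `Labeled` class, the `SECRET`/`PUBLIC` labels, and `declassify_to_channel` are illustrative names invented for this sketch, not the API of any real security-typed language such as Jif; they only show the core discipline: labels propagate through computation and are checked at release points.

```python
# Hypothetical sketch of information-flow tracking: every value carries
# a confidentiality label, labels propagate through operations, and a
# check at the output channel enforces the policy.

SECRET, PUBLIC = "secret", "public"

class Labeled:
    """A value paired with a confidentiality label."""
    def __init__(self, value, label):
        self.value = value
        self.label = label

def combine(a, b):
    # Information flows from both operands, so the result gets the
    # more restrictive (least upper bound) of the two labels.
    return SECRET if SECRET in (a.label, b.label) else PUBLIC

def add(a, b):
    return Labeled(a.value + b.value, combine(a, b))

def declassify_to_channel(x, channel_label):
    # Enforcement point: refuse to release secret data on a public channel.
    if x.label == SECRET and channel_label == PUBLIC:
        raise PermissionError("information-flow violation")
    return x.value

salary = Labeled(90000, SECRET)
bonus = Labeled(5000, PUBLIC)
total = add(salary, bonus)   # the sum depends on a secret, so it is SECRET
assert total.label == SECRET
```

A static security type system performs essentially this bookkeeping at compile time, so the violation is rejected before the program ever runs.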
Andrew Myers is an Associate Professor in the Computer Science Department at Cornell University in Ithaca, NY. He received his Ph.D. in Electrical Engineering and Computer Science from MIT in 1999. His research interests include computer security, programming languages, and distributed objects. His work on computer security has focused on practical, sound, expressive languages and systems for enforcing information security.
This tutorial will introduce attendees to the current state of the art in binary program rewriting. Applications of static binary rewriting such as compaction, optimization, customization, instrumentation, obfuscation, support for dynamic rewriting, etc. will be presented, as well as pitfalls and reliability concerns. The use of the retargetable, extensible and reliable binary rewriting framework Diablo to support these applications, and to facilitate experimentation will be discussed extensively. This discussion will include more abstract design aspects of Diablo, as well as more concrete data structures, algorithms and methods implemented in Diablo.
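As a toy illustration of one application named above, instrumentation, the sketch below inserts a profiling counter at the head of every basic block of a control-flow graph. The dictionary-of-strings representation is a stand-in invented for this example and is not Diablo's actual data structure or API.

```python
# Minimal sketch of an instrumentation pass of the kind a binary
# rewriter applies: visit every basic block in a control-flow graph
# and prepend a counter increment.

def instrument(cfg):
    """cfg: dict mapping block label -> list of instruction strings."""
    out = {}
    for label, insns in cfg.items():
        # Prepend profiling code; a real rewriter would also have to
        # fix up addresses and relocations after changing code size.
        out[label] = [f"inc counter[{label}]"] + list(insns)
    return out

cfg = {"entry": ["mov r0, #0", "b loop"],
       "loop": ["add r0, r0, #1", "cmp r0, r1", "blt loop"]}
instrumented = instrument(cfg)
assert instrumented["loop"][0] == "inc counter[loop]"
```

The hard part in practice, and a theme of the tutorial, is the address fix-up hinted at in the comment: changing code size invalidates jump targets and data references throughout the binary.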
The tutorial consists of four parts:
Bjorn De Sutter obtained his PhD in Computer Science from Ghent University in 2002 on the subject of link-time program compaction. Since then he has been working at that university on the broader subject of binary program rewriting. Applications for which methods and tools were developed include software compaction, Linux kernel customization, program instrumentation, and program obfuscation. Furthermore, during 2002-2003 he worked on Java whole-program optimization at the IBM T.J. Watson Research Center in Hawthorne, NY. Since the summer of 2005, Bjorn has also been working at IMEC (Interuniversity Micro-Electronics Center), where he is developing tool chains and methodologies for high-performance, low-power computing on coarse-grained reconfigurable architectures.
Many problems in program analysis can be expressed naturally and concisely in a declarative language like Datalog. This makes it easy to specify new analyses or extend or compose existing analyses. When combined with binary decision diagrams --- a data structure that can efficiently manipulate large relations --- the result is a potent combination that allows you to create sophisticated program analyses with just a few lines of code.
This four-hour tutorial will describe bddbddb (BDD-Based Deductive DataBase), an open-source tool for translating high-level analysis specifications written in Datalog into efficient implementations. People have used bddbddb to implement a variety of analyses, from a C analysis to discover format string vulnerabilities to a Java static race-detection analysis.
Using bddbddb to specify a program analysis is seductively simple; however, making your analysis run fast and scale up to large programs requires a lot of tuning, patience, knowledge, and luck. This tutorial will also cover many of the pitfalls, caveats, and idiosyncrasies of program analysis using BDDs.
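The flavor of a Datalog analysis can be shown with the classic inclusion-based points-to rules, which are a standard introductory example for this style of tool: `vP(v, h) :- vP0(v, h).` (v points to h if v was assigned `new h`) and `vP(v, h) :- assign(v, w), vP(w, h).` (an assignment v = w propagates points-to facts). bddbddb evaluates such rules over BDD-encoded relations; the sketch below substitutes plain Python sets and a hand-rolled fixpoint loop, so only the logic, not the technology, is faithful.

```python
# Fixpoint evaluation of two Datalog rules for points-to analysis:
#   vP(v, h) :- vP0(v, h).               # v = new h
#   vP(v, h) :- assign(v, w), vP(w, h).  # v = w
# Relations are sets of tuples; iterate until no new facts appear.

def points_to(vP0, assign):
    vP = set(vP0)
    changed = True
    while changed:
        changed = False
        for (v, w) in assign:
            for (x, h) in list(vP):
                if x == w and (v, h) not in vP:
                    vP.add((v, h))
                    changed = True
    return vP

vP0 = {("p", "h1"), ("q", "h2")}      # p = new h1; q = new h2
assign = {("r", "p"), ("s", "r")}     # r = p; s = r
assert ("s", "h1") in points_to(vP0, assign)
```

The naive nested loops here are exactly what BDDs let you avoid: a BDD-based engine performs each rule application as a bulk relational operation over compactly encoded relations, which is what makes context-sensitive analyses of large programs feasible.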
By the end of this tutorial, attendees will learn how to:
Prior familiarity with BDDs and Datalog is not necessary; however, attendees are encouraged to look at the following papers before the tutorial:
John Whaley is the author of many open source projects, including the joeq virtual machine and compiler infrastructure, the JavaBDD library, and bddbddb (BDD-Based Deductive DataBase). He has also worked at IBM Watson Research on the Jalapeno Java virtual machine and at IBM Tokyo Research Laboratory on the JIT compiler for the IBM Java Development Kit. He is the winner of numerous awards, including Best Paper Awards at OOPSLA 2001 and PLDI 2004, and a SIGSOFT Distinguished Paper Award at ISSTA 2002. His primary research interests are in program analysis, compilers, and virtual machines. He is completing his Ph.D. in Computer Science at Stanford University.
As part of the DARPA program for High Productivity Computing Systems, the Programming Language Research Group at Sun Microsystems Laboratories is developing Fortress, a language intended to support large-scale scientific computation with the same level of portability that the Java programming language provided for multithreaded commercial applications.
One of the design principles of Fortress is to expose at the level of source code aspects of language definition and implementation that are usually buried in the implementation of a compiler or interpreter. To this end, Fortress has relatively rich mechanisms for encapsulation and abstraction, including rather ambitious type declaration and type inference. The idea is that Fortress serves two audiences: application programmers and library programmers. The design of Fortress intentionally provides a fairly complicated language for library programmers that enables them to write libraries that present a relatively simple set of interfaces to the application programmer. Thus Fortress is as much a framework for language developers as it is a language for coding scientific applications.
This tutorial begins by describing the application language, primarily through examples. Where relevant we will discuss features and design decisions by comparison with the Fortran and Java programming languages. Two points of particular interest are that the surface syntax is designed to be much closer to traditional mathematical notation than is currently customary (which had interesting consequences for parsing and overloading), and that in Fortress parallelism is encouraged everywhere (for example, it is intentionally just a little bit harder to write a sequential loop than a parallel loop).
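The "parallelism everywhere" point can be illustrated outside Fortress itself: in Fortress a plain loop over a range may execute its iterations in parallel, so the sequential form is the one that takes extra effort to write. The Python sketch below mimics that default with a thread pool; the `parallel_for` helper is our invention for this illustration, not Fortress syntax or semantics.

```python
# Sketch of a parallel-by-default loop: iterations may run
# concurrently and in any order, so the body must not carry
# dependences from one iteration to the next.
from concurrent.futures import ThreadPoolExecutor

def parallel_for(body, n):
    # Run body(0) .. body(n-1), possibly in parallel, and collect
    # the results in index order (like a parallel comprehension).
    with ThreadPoolExecutor() as pool:
        return list(pool.map(body, range(n)))

squares = parallel_for(lambda i: i * i, 5)
assert squares == [0, 1, 4, 9, 16]
```

Inverting the default this way changes how programmers think: independence of iterations becomes the normal assumption that must be consciously given up, rather than a property painstakingly established for parallelization.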
The second half of the tutorial will cover features intended to support the coding of libraries, including ways in which a parameterized polymorphic type system can be used to organize multithreading and data distribution on large parallel machines. A sufficiently rich type system can take the place of certain kinds of flow analysis to guide certain kinds of code selection and optimization, thus moving policymaking out of the compiler and into libraries coded in the source language.
Fortress is a work in progress. We look forward to interactive participation and critique of the design with participants.
Guy L. Steele Jr. (PhD, MIT, 1980) is a Sun Fellow at Sun Microsystems, Inc. In 1975 he co-invented the Scheme programming language at MIT with Prof. Gerald Jay Sussman. He is author or co-author of four books on programming languages (Common Lisp, C, High Performance Fortran, and the Java programming language). He is an ACM Fellow and received the 1996 ACM SIGPLAN Programming Languages Achievement Award. He designed the original EMACS command set and was the first person to port TeX. At Sun Microsystems he is responsible for research in language design and implementation strategies, and architectural and software support.
Jan-Willem Maessen (PhD, MIT, 2002) spent over a decade at MIT as a student, working with Prof. Arvind and others, before joining Sun Microsystems. His dissertation, "Hybrid Eager and Lazy Evaluation for Efficient Compilation of Haskell," described a mostly-strict implementation of the Haskell programming language in which resource-bounded evaluation preserves the semantics of the original lazy program. Before working on Eager Haskell, he wrote the run-time system and collaborated on the compiler for pH, an implicitly-parallel programming language based on Haskell.
Sukyoung Ryu (PhD, KAIST, 2001) has studied compile-time program analysis, language design and implementation, formal semantics and types, and programming environments. At the Korea Advanced Institute of Science and Technology, she worked on static analysis techniques for Standard ML programs with Prof. Kwangkeun Yi. Before joining Sun Microsystems, she worked at Harvard University with Prof. Norman Ramsey on the Debugging Everywhere project.