Mike Furr
Static Type Inference for Ruby
Michael Furr, Jong-hoon (David) An, Jeffrey S. Foster, and Michael Hicks
In Proceedings of the 24th Annual ACM Symposium on Applied Computing, OOPS track.
Honolulu, HI. March 2009
[ Abstract |
pdf ]
Many general-purpose, object-oriented scripting languages are
dynamically typed, which provides flexibility but leaves the
programmer without the benefits of static typing, including early
error detection and the documentation provided by type
annotations. This paper describes Diamondback Ruby (DRuby), a tool
that blends Ruby’s dynamic type system with a static typing
discipline. DRuby provides a type language that is rich enough to
precisely type Ruby code we have encountered, without unneeded
complexity. When possible, DRuby infers static types to discover type
errors in Ruby programs. When necessary, the programmer can provide
DRuby with annotations that assign static types to dynamic code. These
annotations are checked at run time, isolating type errors to
unverified code. We applied DRuby to a suite of benchmarks and found
several bugs that would cause run-time type errors. DRuby also
reported a number of warnings that reveal questionable programming
practices in the benchmarks. We believe that DRuby takes a major step
toward bringing the benefits of combined static and dynamic typing to
Ruby and other object-oriented languages.
Profile-Guided Static Typing for Dynamic Scripting Languages
Michael Furr, Jong-hoon (David) An, and Jeffrey S. Foster
University of Maryland, Computer Science Department
Technical Report
CS-TR-4935
April 2009
[ Abstract |
pdf ]
Many popular scripting languages such as Ruby, Python, and Perl
include highly dynamic language constructs, such as an eval method
that evaluates a string as program text. While these constructs allow
terse and expressive code, they have traditionally obstructed static
analysis. In this paper we present PRuby, an extension to Diamondback
Ruby (DRuby), a static type inference system for Ruby. PRuby augments
DRuby with a novel dynamic analysis and transformation that allows us
to precisely type uses of highly dynamic constructs. PRuby’s analysis
proceeds in three steps. First, we use run-time instrumentation to
gather per-application profiles of dynamic feature usage. Next, we
replace dynamic features with statically analyzable alternatives based
on the profile. We also add instrumentation to safely handle cases
when subsequent runs do not match the profile. Finally, we run DRuby’s
static type inference on the transformed code to enforce type
safety. We used PRuby to gather profiles for a benchmark suite of
sample Ruby programs. We found that dynamic features are pervasive
throughout the benchmarks and the libraries they include, but that
most uses of these features are highly constrained and hence can be
effectively profiled. Using the profiles to guide type inference, we
found that DRuby can generally statically type our benchmarks modulo
some refactoring, and we discovered several previously unknown type
errors. These results suggest that profiling and transformation is a
lightweight but highly effective approach to bring static typing to
highly dynamic languages.
Checking Type Safety of Foreign Function Calls
Michael Furr, and Jeffrey S. Foster
University of Maryland, Computer Science Department
Technical Report
CS-TR-4845
December 2006
[ Abstract |
pdf ]
Foreign function interfaces (FFIs) allow components in different
languages to communicate directly with each other. While FFIs are
useful, they often require writing tricky, low-level code and
include little or no static safety checking, thus providing a rich
source of hard-to-find programming errors. In this paper, we study
the problem of enforcing type safety across the OCaml-to-C FFI and
the Java Native Interface (JNI). We present O-Saffire and
J-Saffire, a pair of
multilingual type inference systems that ensure C code that uses
these FFIs accesses high-level data safely. Our inference systems use
representational types to model C’s low-level view of OCaml
and Java values, and singleton types to track integers, strings,
memory offsets, and type tags through C. J-Saffire, our Java
system, uses a polymorphic, flow-insensitive, unification-based
analysis. Polymoprhism is important because it allows us to
precisely model user-defined wrapper functions and the more than 200
JNI functions. O-Saffire, our OCaml system, uses a monomorphic,
flow-sensitive analysis, because while polymorphism is much less
important for the OCaml FFI, flow-sensitivity is critical to track
conditional branches, which are used when ``pattern matching’’ OCaml
data in C. O-Saffire also tracks garbage collection
information to ensure that local C pointers to the OCaml heap are
registered properly, which is not necessary for the JNI. We have
applied O-Saffire and J-Saffire to a set of benchmarks and found
many bugs and questionable coding practices. These results suggest
that static checking of FFIs can be a valuable tool in writing
correct multilingual software.
Polymorphic Type Inference for the JNI
Michael Furr, and Jeffrey S. Foster
University of Maryland, Computer Science Department
Technical Report
CS-TR-4759
November 2005
[ Abstract |
pdf ]
We present a multi-lingual type inference system for checking type
safety of programs that use the Java Native Interface (JNI). The
JNI uses specially-formatted strings to represent class and field
names as well as method signatures, and so our type system tracks
the flow of string constants through the program. Our system embeds
string variables in types, and as those variables are resolved to
string constants during inference they are replaced with the
structured types the constants represent. This restricted form of
dependent types allows us to directly assign type signatures to each
of the more than 200 functions in the JNI. Moreover, it allows us
to infer types for user-defined functions that are parameterized by
Java type strings, which we have found to be common practice. Our
inference system allows such functions to be treated polymorphically
by using instantiation constraints, solved with semi-unification,
at function calls. Finally, we have implemented our
system and applied it to a small set of benchmarks. Although
semi-unification is undecidable, we found our system to be scalable
and effective in practice. We discovered 155 errors 36 cases of
suspicious programming practices in our benchmarks.
Checking Type Safety of Foreign Function Calls
Michael Furr, and Jeffrey S. Foster
University of Maryland, Computer Science Department
Technical Report
CS-TR-4627
November 2004
[ Abstract |
pdf ]
We present a multi-lingual type inference system for checking type
safety across a foreign function interface. The goal of our system
is to prevent foreign function calls from introducing type and
memory safety violations into an otherwise safe language. Our
system targets OCaml’s FFI to C, which is relatively lightweight and
illustrates some interesting challenges in multi-lingual type
inference. The type language in our system embeds OCaml types in C
types and vice-versa, which allows us to track type information
accurately even through the foreign language, where the original
types are lost. Our system uses a representational type that can
model multiple OCaml types, because C programs can observe that many
OCaml types have the same physical representation. Furthermore,
because C has a low-level view of OCaml data, our inference system
includes a dataflow analysis to track memory offsets and tag
information. Finally, our type system includes garbage collection
information to ensure that pointers from the FFI to the OCaml heap
are tracked properly. We have implemented our inference system and
applied it to a small set of benchmarks. Our results show that
programmers do misuse these interfaces, and our implementation has
found several bugs and questionable coding practices in our
benchmarks.