
Logic and Computation Seminar

Monday, February 18, 2002 - 4:31pm

Benjamin C. Pierce

University of Pennsylvania

Location

University of Pennsylvania

DRL 4C8

The recent rush to adopt XML can be attributed in part to the hope that the static typing provided by DTDs (or more sophisticated mechanisms such as XML-Schema) will improve the robustness of data exchange and processing. However, although XML _documents_ can be checked for conformance with DTDs, current XML processing languages offer no way of verifying that _programs_ operating on XML structures will always produce conforming outputs.

In previous work, we designed and implemented a domain-specific language for XML processing, called XDuce. The main novelties of XDuce are:

1) A type system based on REGULAR EXPRESSION TYPES. Regular expression types are a natural generalization of DTDs, describing structures in XML documents using regular expression operators (*, ?, |, etc.) and supporting a powerful form of subtyping.

2) A corresponding mechanism for REGULAR EXPRESSION PATTERN MATCHING, which supports concise "grep-style" patterns for extracting information from inside structured sequences.

The lessons learned from XDuce are now being incorporated into a new language, called Xtatic, whose design focuses on the smooth integration of these novel XML-processing features into mainstream object-oriented languages such as C#. The current vision is that Xtatic will be engineered as a lightweight extension to C#, offering native support for regular expression types and patterns while remaining completely interoperable at the binary level with ordinary C# programs and APIs.

The talk will describe the basic design principles of Xtatic, the technical issues that have been addressed so far (in particular, the integration of regular expression types with the type structures of objects and classes), and the issues that still require work.
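To make the idea of regular expression types and their subtyping a little more concrete, here is a minimal sketch in Python under simplifying assumptions. Documents are flattened to sequences of element tag names (nesting and text are ignored), types are built from sequence, union (|), repetition (*) and the empty sequence, and subtyping is taken to mean inclusion of the sets of sequences two types describe. The checks use Brzozowski derivatives, a standard technique swapped in here purely for illustration; XDuce itself works on trees and decides subtyping with tree automata. All names (`Label`, `Seq`, `Alt`, `Star`, `matches`, `is_subtype`, and so on) are hypothetical.

```python
# Hypothetical sketch (not XDuce's code): flat tag sequences + Brzozowski derivatives.
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set

class Re:
    """Base class of regular expression types."""

@dataclass(frozen=True)
class Empty(Re):  # the type with no values at all
    pass

@dataclass(frozen=True)
class Eps(Re):  # only the empty sequence ()
    pass

@dataclass(frozen=True)
class Label(Re):  # a single element with this tag
    name: str

@dataclass(frozen=True)
class Seq(Re):  # concatenation "left, right"
    left: Re
    right: Re

@dataclass(frozen=True)
class Alt(Re):  # union "a | b | ...", stored as a set of alternatives
    options: FrozenSet[Re]

@dataclass(frozen=True)
class Star(Re):  # repetition "body*"
    body: Re

def seq(a: Re, b: Re) -> Re:  # smart constructor that drops trivial cases
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    if isinstance(a, Eps):
        return b
    return a if isinstance(b, Eps) else Seq(a, b)

def alt(*rs: Re) -> Re:  # flatten nested unions, drop Empty, dedupe via a set
    opts = frozenset().union(*(r.options if isinstance(r, Alt) else {r} for r in rs))
    opts -= {Empty()}
    if not opts:
        return Empty()
    return next(iter(opts)) if len(opts) == 1 else Alt(opts)

def nullable(r: Re) -> bool:  # does r accept the empty sequence?
    if isinstance(r, Seq):
        return nullable(r.left) and nullable(r.right)
    if isinstance(r, Alt):
        return any(nullable(o) for o in r.options)
    return isinstance(r, (Eps, Star))  # Empty and Label are not nullable

def deriv(r: Re, a: str) -> Re:  # type of what may follow one element tagged a
    if isinstance(r, Label):
        return Eps() if r.name == a else Empty()
    if isinstance(r, Seq):
        d = seq(deriv(r.left, a), r.right)
        return alt(d, deriv(r.right, a)) if nullable(r.left) else d
    if isinstance(r, Alt):
        return alt(*(deriv(o, a) for o in r.options))
    if isinstance(r, Star):
        return seq(deriv(r.body, a), r)
    return Empty()  # Empty, Eps

def matches(r: Re, labels: Iterable[str]) -> bool:  # is this tag sequence in r?
    for a in labels:
        r = deriv(r, a)
    return nullable(r)

def labels_of(r: Re) -> Set[str]:  # every tag mentioned in r
    if isinstance(r, Label):
        return {r.name}
    if isinstance(r, Seq):
        return labels_of(r.left) | labels_of(r.right)
    if isinstance(r, Alt):
        return set().union(*(labels_of(o) for o in r.options))
    return labels_of(r.body) if isinstance(r, Star) else set()

def is_subtype(s: Re, t: Re) -> bool:  # is every sequence of s also in t?
    alphabet = labels_of(s) | labels_of(t)
    seen, todo = set(), [(s, t)]
    while todo:
        ds, dt = todo.pop()
        if (ds, dt) in seen:
            continue
        seen.add((ds, dt))
        if nullable(ds) and not nullable(dt):
            return False  # some sequence is accepted by s but not by t
        todo.extend((deriv(ds, a), deriv(dt, a)) for a in alphabet)
    return True

name, email, tel = Label("name"), Label("email"), Label("tel")
person = seq(name, seq(Star(email), alt(Eps(), tel)))  # name, email*, tel?
print(matches(person, ["name", "email", "email", "tel"]))  # True
print(matches(person, ["name", "tel", "email"]))  # False
print(is_subtype(seq(name, seq(email, email)), person))  # True
print(is_subtype(person, seq(name, Star(email))))  # False
```

Storing unions as flattened sets treats `|` as associative, commutative and idempotent, which by Brzozowski's result keeps the number of distinct derivatives finite, so the pair exploration in `is_subtype` should terminate. The last example fails because `(name, tel)` belongs to `person` but not to `(name, email*)`.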