Loris D'Antoni


Biography

University of Wisconsin Madison - Computer Science

Assistant Professor at University of Wisconsin-Madison
Computer Software
Madison, Wisconsin
2015-ongoing: Assistant professor at the University of Wisconsin, Madison

2010-2015: PhD at University of Pennsylvania.

Summer 2012 and 2013: Research Internship at Microsoft Research Redmond
2010: Research Fellowship at Università degli Studi di Torino.

-----------------------------------------------------------

My research focuses on building programming languages and designing techniques that use the power of decision procedures to make programming domain-specific tasks simpler, less error-prone, and more efficient.

During my Ph.D. I created three languages that use automata to simplify writing string and tree manipulating programs (Bex, Fast, and DReX), and I also built AutomataTutor, a tool for teaching finite automata to undergraduate students.

-----------------------------------------------------------

PUBLICATIONS:
DReX: A Declarative Language for Efficiently Evaluating Regular String Transformations, POPL15

R. Cochran, L. D'Antoni, B. Livshits, D. Molnar, and M. Veanes, POPL15
Program Boosting: Program Synthesis via Crowd-Sourcing

L. D'Antoni, and R. Alur, CAV14
Symbolic Visibly Pushdown Automata

L. D'Antoni et al., PLDI14
FAST: A Transducer-Based Language for Tree Manipulation

L. D'Antoni, and M. Veanes, POPL14
Minimization of Symbolic Automata

R. Alur et al., IJCAI13
Automated Grading of DFA Constructions

L. D'Antoni, and M. Veanes, CAV13
Equivalence of Symbolic Finite Transducers

B. Mozafari et al., ACM TODS13
High-Performance Complex Event Processing over Hierarchical Data

L. D'Antoni, M. Veanes, VMCAI13
Static Analysis of String Encoders and Decoders

R. Alur, L. D'Antoni, ICALP12
Streaming Tree Transducers

Specialties: Software verification, Automata and decision procedures, System Modelling, Java, J2EE, C#, ML, Coq


Experience

  • University of Pennsylvania

    PhD Student in Computer Science

    Research on Formal Methods for Computer Science.
    Use of rigorous mathematical techniques for the design and analysis of computer systems, and their application to a variety of software engineering problems (e.g., requirements engineering, model specification, code design/generation, testing, implementation verification and validation, software certification).

  • University of Pennsylvania

    Teaching Assistant

    Teaching assistant for Software Foundations (CIS 500, graduate)

  • Universita' degli Studi di Torino

    Web Designer

    Designed the orientation web site for the department of computer science.
    Used a CMS (XOOPS) to streamline the process and enable code reuse. The design required PHP and SQL skills.

  • Universita' degli Studi di Torino

    Fellowship researcher

    Research on formal methods to represent biological interactions in membranes.

    Application of the bidirectional programming pattern to switch from a model with more information to a model with less information and vice versa.

  • Microsoft

    Research Intern

    Research on transducers and design of a domain-specific language for symbolic tree transformations and analysis.
    Use of this analysis for augmented reality transformations and skeleton animation. Use of tree transducers for the analysis of HTML sanitizers.
    Use of string transducers for the analysis of encoders and decoders.

  • Microsoft

    Research Intern

    Research on transducers and design of a domain-specific language for symbolic tree transformations and analysis.
    Use of this analysis for augmented reality transformations and skeleton animation. Use of tree transducers for the analysis of HTML sanitizers.
    Use of string transducers for the analysis of encoders and decoders.

  • University of Wisconsin-Madison

    Assistant Professor

    Loris works at the University of Wisconsin-Madison as an Assistant Professor.

Education

  • MIT SAT/SMT Solver Summer School

    Participation

    SAT/SMT solvers and their applications
    Boolean SAT/SMT constraint solvers have seen dramatic progress in the last decade, and are being used in a diverse set of applications such as program analysis, testing, formal methods, program synthesis, hardware verification, electronic design automation, computer security, AI, operations research (MAXSAT), and biology. The goal of the school is to connect SAT/SMT developers and power users.

  • University of Pennsylvania

    PhD

    Computer Science
    Research on Formal Methods for Computer Science. Use of rigorous mathematical techniques for the design and analysis of computer systems, and their application to a variety of software engineering problems (e.g., requirements engineering, model specification, code design/generation, testing, implementation verification and validation, software certification).

  • International Summer School Marktoberdorf

    Participation

    Software and system safety
    Two-week summer school on software and system safety. The school focuses mainly on the specification and verification of reliable systems via formal methods: model checking, testing, and abstraction.

Publications

  • Global Progress in Dynamically Interleaved Multiparty Sessions

    19th International Conference on Concurrency Theory

    A multiparty session forms a unit of structured interactions among many participants which follow a prescribed scenario specified as a global type signature. This paper develops, besides a more traditional communication type system, a novel static interaction type system for global progress in dynamically interleaved multiparty sessions.

  • High-Performance Complex Event Processing over Hierarchical Data

    ACM TODS

    While complex event processing (CEP) constitutes a considerable portion of so-called Big Data analytics, current CEP systems can only process data having a simple structure, and are otherwise limited in their ability to efficiently support complex continuous queries on structured or semi-structured information. However, XML-like streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial feeds, configuration files, and similar applications requiring advanced CEP queries. In this paper, we present the XSeq language and system that support CEP on XML streams, via an extension of XPath that is both powerful and amenable to an efficient implementation. Specifically, the XSeq language extends XPath with natural operators to express sequential and Kleene-* patterns over XML streams, while remaining highly amenable to efficient execution. In fact, XSeq is designed to take full advantage of the recently proposed Visibly Pushdown Automata (VPA), where higher expressive power can be achieved without compromising the computationally attractive properties of finite state automata. Besides the efficiency and expressivity benefits, the choice of VPA as the underlying model also enables XSeq to go beyond XML streams and be easily applicable to any data with both sequential and hierarchical structures, including JSON messages, RNA sequences, and software traces. Therefore, we illustrate XSeq's power for CEP applications through examples from different domains and provide formal results on its expressiveness and complexity. Finally, we present several optimization techniques for XSeq queries. Our extensive experiments indicate that XSeq brings outstanding performance to CEP applications: an improvement of two orders of magnitude is obtained over the same queries executed in general-purpose XML engines.
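The visibly pushdown idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not XSeq and not taken from the paper: the event encoding and the function name `well_nested` are assumptions made for illustration. The point it shows is that the input symbol alone (call, return, or internal) decides whether the stack is pushed, popped, or left untouched, which is what lets hierarchical streams such as XML or JSON be checked in a single pass.

```python
from typing import Iterable, List, Tuple

# Hypothetical event encoding: (kind, label), where kind is
# "call" (e.g. an opening tag), "return" (a closing tag), or "internal" (text).
Event = Tuple[str, str]


def well_nested(events: Iterable[Event]) -> bool:
    """Accept streams whose calls and returns match like open/close tags."""
    stack: List[str] = []
    for kind, label in events:
        if kind == "call":
            # Call symbols always push exactly one stack symbol.
            stack.append(label)
        elif kind == "return":
            # Return symbols always pop; reject on a mismatch or empty stack.
            if not stack or stack.pop() != label:
                return False
        elif kind != "internal":
            raise ValueError(f"unknown event kind: {kind!r}")
        # Internal symbols never touch the stack.
    return not stack


stream = [("call", "feed"), ("call", "item"), ("internal", "text"),
          ("return", "item"), ("return", "feed")]
print(well_nested(stream))
```

Because the stack discipline is fixed by the input, a recognizer of this shape can process a stream event by event without buffering the whole document.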

  • Operating System Support For Augmented Reality Applications

    14th Workshop on Hot Topics in Operating Systems

    Augmented reality (AR) takes natural user input (NUI), such as gestures, voice, and eye gaze, and produces digital visual overlays on top of reality seen by a user. Today, multiple shipping AR applications exist, most notably titles for the Microsoft Kinect and smartphone applications such as Layar, Wikitude, and Junaio. Despite this activity, little attention has been paid to operating system support for AR applications. Instead, each AR application today does its own sensing and rendering, with the help of user-level libraries like OpenCV or the Microsoft Kinect SDK. In this paper, we explore how operating systems should evolve to support AR applications. Because AR applications work with fundamentally new inputs and outputs, an OS that supports AR applications needs to rethink the input and display abstractions exposed to applications. Unlike mouse and keyboard, which form explicit, separate channels for user input, NUI requires continuous sensing of the real-world environment, which often has sensitive data mixed with user input. Hence, the OS input abstractions must ensure that user privacy is not violated, and the OS must provide a fine-grained permission system for access to recognized objects like a user's face and skeleton. In addition, because visual outputs of AR applications mix real-world and virtual objects, the synthetic window abstraction in traditional GUIs is no longer viable, and OSes must rethink the display abstractions and their management. We discuss research directions for solving these and other issues and building an OS that lets multiple applications share one (augmented) reality.

  • Streaming Tree Transducers

    40th International Colloquium on Automata, Languages and Programming

    We introduce streaming tree transducers as an analyzable and expressive model for transforming hierarchically structured data in a single pass. Given a linear encoding of the input tree, the transducer makes a single left-to-right pass through the input, and computes the linear encoding of the output tree in linear time using a finite-state control, a finite number of string variables, and a visibly pushdown stack. We establish that the model is closed under regular look ahead: allowing the transducer to make decisions based on a regular property of the remaining input suffix does not increase expressiveness. The expressiveness of the model coincides with transductions definable using monadic second-order logic (MSO). Finally, we establish a NExpTime upper bound for checking functional inequivalence of two streaming tree transducers.
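As a rough illustration of the single-pass style described in the abstract above, the sketch below mirrors a tree (reverses the order of children at every node) while reading its linear encoding left to right. It is an assumption-laden sketch rather than the paper's formal model: the token format and the name `mirror_tree` are hypothetical, one variable holds the output for the current nesting level, and a stack saves that variable on every opening token. Python list concatenation is used for clarity, so this version does not achieve the linear-time bound stated in the abstract.

```python
from typing import List, Tuple

# Hypothetical linear encoding of a tree: ("open", label) ... ("close", label).
Token = Tuple[str, str]


def mirror_tree(tokens: List[Token]) -> List[Token]:
    """Reverse the order of children at every node in one left-to-right pass."""
    stack: List[Tuple[List[Token], str]] = []
    current: List[Token] = []  # output for the siblings seen so far at this depth
    for kind, label in tokens:
        if kind == "open":
            # Save the siblings built so far and start a fresh variable.
            stack.append((current, label))
            current = []
        elif kind == "close":
            if not stack or stack[-1][1] != label:
                raise ValueError("input is not a well-nested tree encoding")
            saved, _ = stack.pop()
            subtree = [("open", label)] + current + [("close", label)]
            # Prepending the finished subtree reverses the sibling order.
            current = subtree + saved
        else:
            raise ValueError(f"unknown token kind: {kind!r}")
    if stack:
        raise ValueError("input is not a well-nested tree encoding")
    return current


tree = [("open", "a"), ("open", "b"), ("close", "b"),
        ("open", "c"), ("close", "c"), ("close", "a")]
print(mirror_tree(tree))
```

Each update uses every saved value exactly once, which is the kind of restricted variable update that keeps such transformations analyzable.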

  • Regular Functions, Cost Register Automata, and Generalized Min-Cost Problems

    LICS13

    Motivated by the successful application of the theory of regular languages to formal verification of finite-state systems, there is a renewed interest in developing a theory of analyzable functions from strings to numerical values that can provide a foundation for analyzing quantitative properties of finite-state systems. In this paper, we propose a deterministic model for associating costs with strings that is parameterized by operations of interest (such as addition, scaling, and min), a notion of regularity that provides a yardstick to measure expressiveness, and study decision problems and theoretical properties of resulting classes of cost functions. Our definition of regularity relies on the theory of string-to-tree transducers, and allows associating costs with events that are conditional upon regular properties of future events. Our model of cost register automata allows computation of regular functions using multiple "write-only" registers whose values can be combined using the allowed set of operations. We show that classical shortest-path algorithms, as well as algorithms designed for computing discounted costs, can be adapted for solving the min-cost problems for the more general classes of functions specified in our model. Cost register automata with min and increment give a deterministic model that is equivalent to weighted automata, an extensively studied nondeterministic model, and this connection results in new insights and new open problems.
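A small example may help make the register model above concrete. The sketch below is an illustration under assumptions, not a construction from the paper: it uses a single control state, two registers combined only with increment and min, and a hypothetical function name `shortest_a_block`. The registers are "write-only" in the sense that their values are updated but never inspected to choose the next transition; only the input symbol decides which update runs.

```python
import math


def shortest_a_block(word: str) -> int:
    """Length of the shortest block of 'a's delimited by 'b's or the word ends."""
    x = 0          # register: length of the block currently being read
    y = math.inf   # register: minimum over all blocks already closed by a 'b'
    for symbol in word:
        if symbol == "a":
            x = x + 1                 # increment
        elif symbol == "b":
            x, y = 0, min(x, y)       # simultaneous update: reset x, fold it into y
        else:
            raise ValueError(f"symbol outside the alphabet {{a, b}}: {symbol!r}")
    # Output function: combine the registers once more at the end of the word.
    return min(x, y)


print(shortest_a_block("aababaaa"))
```

On the word `aababaaa` the blocks are `aa`, `a`, and `aaa`, so the function returns the length of the middle block.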

  • Static Analysis of String Encoders and Decoders

    Verification, Model Checking and Abstract Interpretation, 2013

    There has been significant interest in static analysis of programs that manipulate strings, in particular in the context of web security. Many types of security vulnerabilities are exposed through flaws in programs such as string encoders, decoders, and sanitizers. Recent work has focused on combining automata and satisfiability modulo theories techniques to address security issues in those programs. These techniques scale to larger alphabets such as Unicode, which is a de facto character encoding standard used in web software. One approach has been to use character predicates to generalize finite state transducers. This technique has made it possible to perform precise analysis of a large class of typical sanitization routines. However, it has not been able to cope well with decoders, which often need to read more than one character at a time. In order to overcome this limitation we introduce a conservative generalization of Symbolic Finite Transducers (SFTs) called Extended Symbolic Finite Transducers (ESFTs) that incorporates the notion of a bounded lookahead. We demonstrate the advantage of ESFTs on analyzing programs for which previous approaches did not scale. In our evaluation we use a UTF-16 to UTF-8 translator (utf8encoder) and a UTF-8 to UTF-16 translator (utf8decoder). We show, among other properties, that utf8encoder and utf8decoder are functionally correct.
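The combination of character predicates and bounded lookahead described above can be sketched with a toy UTF-16 to UTF-8 encoder. This is not the paper's utf8encoder and it performs no static analysis; the rule representation (`RULES`, a lookahead length, a guard predicate, and an output function) and all names are assumptions chosen for illustration. Each rule reads one or two 16-bit code units, and its guard is a predicate over ranges rather than an enumeration of individual characters.

```python
from typing import Callable, List, Sequence, Tuple


def is_high(u: int) -> bool:
    return 0xD800 <= u <= 0xDBFF   # high (leading) surrogate


def is_low(u: int) -> bool:
    return 0xDC00 <= u <= 0xDFFF   # low (trailing) surrogate


def four_bytes(hi: int, lo: int) -> List[int]:
    cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
    return [0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]


# Each rule: (lookahead length, guard predicate, output function).
Rule = Tuple[int, Callable[..., bool], Callable[..., List[int]]]
RULES: List[Rule] = [
    (1, lambda u: u <= 0x7F, lambda u: [u]),
    (1, lambda u: 0x80 <= u <= 0x7FF,
        lambda u: [0xC0 | (u >> 6), 0x80 | (u & 0x3F)]),
    (1, lambda u: 0x800 <= u <= 0xFFFF and not is_high(u) and not is_low(u),
        lambda u: [0xE0 | (u >> 12), 0x80 | ((u >> 6) & 0x3F), 0x80 | (u & 0x3F)]),
    (2, lambda h, l: is_high(h) and is_low(l), four_bytes),  # needs lookahead 2
]


def utf8_encode(units: Sequence[int]) -> bytes:
    """Translate UTF-16 code units to UTF-8 bytes, rejecting lone surrogates."""
    out: List[int] = []
    i = 0
    while i < len(units):
        for k, guard, emit in RULES:
            window = units[i:i + k]
            if len(window) == k and guard(*window):
                out.extend(emit(*window))
                i += k
                break
        else:
            raise ValueError(f"ill-formed UTF-16 at index {i}")
    return bytes(out)


print(utf8_encode([0x61, 0xE9, 0x20AC, 0xD83D, 0xDE00]))
```

The guards are mutually exclusive, so at most one rule applies at each position. The surrogate-pair rule is the one that cannot be expressed by reading a single character at a time.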

  • Global Progress in Dynamically Interleaved Multiparty Sessions

    19th International Conference on Concurrency Theory

    A multiparty session forms a unit of structured interactions among many participants which follow a prescribed scenario specified as a global type signature. This paper develops, besides a more traditional communication type system, a novel static interaction type system for global progress in dynamically interleaved multiparty sessions.

  • High-Performance Complex Event Processing over Hierarchical Data

    ACM TODS

    While complex event processing (CEP) constitutes a considerable portion of so-called Big Data analytics, current CEP systems can only process data having a simple structure, and are otherwise limited in their ability to efficiently support complex continuous queries on structured or semi-structured information. However, XML-like streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial feeds, configuration files, and similar applications requiring advanced CEP queries. In this paper, we present the XSeq language and system that support CEP on XML streams, via an extension of XPath that is both powerful and amenable to an efficient implementation. Specifically, the XSeq language extends XPath with natural operators to express sequential and Kleene-* patterns over XML streams, while remaining highly amenable to efficient execution. In fact, XSeq is designed to take full advantage of the recently proposed Visibly Pushdown Automata (VPA), where higher expressive power can be achieved without compromising the computationally attractive properties of finite state automata. Besides the efficiency and expressivity benefits, the choice of VPA as the underlying model also enables XSeq to go beyond XML streams and be easily applicable to any data with both sequential and hierarchical structures, including JSON messages, RNA sequences, and software traces. We therefore illustrate XSeq’s power for CEP applications through examples from different domains and provide formal results on its expressiveness and complexity. Finally, we present several optimization techniques for XSeq queries. Our extensive experiments indicate that XSeq brings outstanding performance to CEP applications: an improvement of two orders of magnitude is obtained over the same queries executed in general-purpose XML engines.

  • Operating System Support For Augmented Reality Applications

    14th Workshop on Hot Topics in Operating Systems

    Augmented reality (AR) takes natural user input (NUI), such as gestures, voice, and eye gaze, and produces digital visual overlays on top of reality seen by a user. Today, multiple shipping AR applications exist, most notably titles for the Microsoft Kinect and smartphone applications such as Layar, Wikitude, and Junaio. Despite this activity, little attention has been paid to operating system support for AR applications. Instead, each AR application today does its own sensing and rendering, with the help of user-level libraries like OpenCV or the Microsoft Kinect SDK. In this paper, we explore how operating systems should evolve to support AR applications. Because AR applications work with fundamentally new inputs and outputs, an OS that supports AR applications needs to rethink the input and display abstractions exposed to applications. Unlike mouse and keyboard, which form explicit, separate channels for user input, NUI requires continuous sensing of the real-world environment, which often has sensitive data mixed with user input. Hence, the OS input abstractions must ensure that user privacy is not violated, and the OS must provide a fine-grained permission system for access to recognized objects like a user's face and skeleton. In addition, because visual outputs of AR applications mix real-world and virtual objects, the synthetic window abstraction in traditional GUIs is no longer viable, and OSes must rethink the display abstractions and their management. We discuss research directions for solving these and other issues and building an OS that lets multiple applications share one (augmented) reality.

  • Streaming Tree Transducers

    40th International Colloquium on Automata, Languages and Programming

    We introduce streaming tree transducers as an analyzable and expressive model for transforming hierarchically structured data in a single pass. Given a linear encoding of the input tree, the transducer makes a single left-to-right pass through the input, and computes the linear encoding of the output tree in linear time using a finite-state control, a finite number of string variables, and a visibly pushdown stack. We establish that the model is closed under regular look ahead: allowing the transducer to make decisions based on a regular property of the remaining input suffix does not increase expressiveness. The expressiveness of the model coincides with transductions definable using monadic second-order logic (MSO). Finally, we establish a NExpTime upper bound for checking functional inequivalence of two streaming tree transducers.

  • Automated Grading of DFA Constructions

    23rd International Joint Conference on Artificial Intelligence

    One challenge in making online education more effective is to develop automatic grading software that can provide meaningful feedback. This paper provides a solution to automatic grading of the standard computation-theory problem that asks a student to construct a deterministic finite automaton (DFA) from the given description of its language. We focus on how to assign partial grades for incorrect answers. Each student’s answer is compared to the correct DFA using a hybrid of three techniques devised to capture different classes of errors. First, in an attempt to catch syntactic mistakes, we compute edit distance between the two DFA descriptions. Second, we consider the entropy of the symmetric difference of the languages of the two DFAs, and compute a score that estimates the fraction of the number of strings on which the student answer is wrong. Our third technique is aimed at capturing mistakes in reading of the problem description. For this purpose, we consider a description language MOSEL, which adds syntactic sugar to the classical Monadic Second Order Logic, and allows defining regular languages in a concise and natural way. We provide algorithms, along with optimizations, for transforming MOSEL descriptions into DFAs and vice versa. These allow us to compute the syntactic edit distance of the incorrect answer from the correct one in terms of their logical representations. We report an experimental study that evaluates hundreds of answers submitted by (real) students by comparing grades/feedback computed by our tool with human graders. Our conclusion is that the tool is able to assign partial grades in a meaningful way, and should be preferred over human graders for both scale and consistency.

  • Equivalence of Extended Symbolic Finite Transducers

    25th International Conference on Computer-Aided Verification

    Symbolic Finite Transducers augment classic transducers with symbolic alphabets represented as parametric theories. This extension enables succinctness and the use of potentially infinite alphabets while preserving closure and decidability properties. Extended Symbolic Finite Transducers further extend these objects by allowing transitions to read consecutive input elements in a single step. This extension does not add expressiveness when the alphabet is finite, but it does when the alphabet is symbolic. We show how this increase in expressiveness causes decision problems such as equivalence to become undecidable and closure properties such as composition to stop holding. We also investigate how the automata counterpart, Extended Symbolic Finite Automata, differs from Symbolic Finite Automata. We then introduce the subclass of Cartesian Extended Symbolic Finite Transducers in which guards are limited to conjunctions of unary predicates. Our main result is an equivalence algorithm for this subclass in the single-valued case. Finally, we model real-world problems with Cartesian Extended Symbolic Finite Transducers and use the equivalence algorithm to prove their correctness.

  • Global Progress in Dynamically Interleaved Multiparty Sessions

    19th International Conference on Concurrency Theory

    A multiparty session forms a unit of structured interactions among many participants which follow a prescribed scenario specified as a global type signature. This paper develops, besides a more traditional communication type system, a novel static interaction type system for global progress in dynamically interleaved multiparty sessions.

  • High-Performance Complex Event Processing over Hierarchical Data

    ACM TODS

    While complex event processing (CEP) constitutes a considerable portion of the so called Big Data analytics, current CEP systems can only process data having a simple structure, and are otherwise limited in their ability to efficiently support complex continuous queries on structured or semi-structured information. However, XML-like streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial feeds, configuration files, and similar applications requiring advanced CEP queries. In this paper, we present the XSeq language and system that support CEP on XML streams, via an extension of XPath that is both powerful and amenable to an efficient implementation. Specifically, the XSeq language extends XPath with natural operators to express sequential and Kleene-* patterns over XML streams, while remaining highly amenable to efficient execution. In fact, XSeq is designed to take full advantage of the recently proposed Visibly Pushdown Automata (VPA), where higher expressive power can be achieved without compromising the computationally attractive properties of finite state automata. Besides the efficiency and expressivity benefits, the choice of VPA as the underlying model also enables XSeq go beyond XML streams and be easily applicable to any data with both sequential and hierarchical structures, including JSON messages, RNA sequences, and software traces. Therefore, we illustrate the XSeq’s power for CEP applications through examples from different domains and provide formal results on its expressiveness and complexity. Finally, we present several optimization techniques for XSeq queries. Our extensive experiments indicate that XSeq brings outstanding performance to CEP applications: two orders of magnitude improvement is obtained over the same queries executed in general-purpose XML engines.

  • Operating System Support For Augmented Reality Applications

    14th Workshop on Hot Topics in Operating Systems

    Augmented reality (AR) takes natural user input (NUI), such as gestures, voice, and eye gaze, and produces digital visual overlays on top of reality seen by a user. Today, multiple shipping AR applications exist, most notably titles for the Microsoft Kinect and smartphone applications such as Layar, Wikitude, and Junaio. Despite this activity, little attention has been paid to operating system support for AR applications. Instead, each AR application today does its own sensing and rendering, with the help of user-level libraries like OpenCV or the Microsoft Kinect SDK. In this paper, we explore how operating systems should evolve to support AR applications. Because AR applications work with fundamentally new inputs and outputs, an OS that supports AR applications needs to re-think the input and display abstractions exposed to applications. Unlike mouse and keyboard, which form explicit, separate channels for user input, NUI requires continuous sensing of the real-world environment, which often has sensitive data mixed with user input. Hence, the OS input abstractions must ensure that user privacy is not violated, and the OS must provide a fine-grained permission system for access to recognized objects like a user's face and skeleton. In addition, because visual outputs of AR applications mix real-world and virtual objects, the synthetic window abstraction in traditional GUIs is no longer viable, and OSes must rethink the display abstractions and their management. We discuss research directions for solving these and other issues and building an OS that let multiple applications share one (augmented) reality.

  • Streaming Tree Transducers

    40th International Colloquium on Automata, Languages and Programming

    We introduce streaming tree transducers as an analyzable and expressive model for transforming hierarchically structured data in a single pass. Given a linear encoding of the input tree, the transducer makes a single left-to-right pass through the input, and computes the linear encoding of the output tree in linear time using a ?nite-state control, a ?nite number of string variables, and a visibly pushdown stack. We establish that the model is closed under regular look ahead: allowing the transducer to make decisions based on a regular property of the remaining input suffix does not increase expressiveness. The expressiveness of the model coincides with transductions de?nable using monadic second-order logic (MSO). Finally, we establish a NExpTime upper bound for checking functional inequivalence of two streaming tree transducers.

  • Regular Functions, Cost Register Automata, and Generalized Min-Cost Problems

    LICS13

    Motivated by the successful application of the theory of regular languages to formal verification of finite-state systems, there is a renewed interest in developing a theory of analyzable functions from strings to numerical values that can provide a foundation for analyzing quantitative properties of finite-state systems. In this paper, we propose a deterministic model for associating costs with strings that is parameterized by operations of interest (such as addition, scaling, and min), a notion of regularity that provides a yardstick to measure expressiveness, and study decision problems and theoretical properties of resulting classes of cost functions. Our definition of regularity relies on the theory of string-to-tree transducers, and allows associating costs with events that are conditional upon regular properties of future events. Our model of cost register automata allows computation of regular functions using multiple "write-only" registers whose values can be combined using the allowed set of operations. We show that classical shortest-path algorithms as well as algorithms designed for computing discounted costs, can be adopted for solving the min-cost problems for the more general classes of functions specified in our model. Cost register automata with min and increment give a deterministic model that is equivalent to weighted automata, an extensively studied nondeterministic model, and this connection results in new insights and new open problems.

  • Static Analysis of String Encoders and Decoders

    Verification, Model Checking and Abstract Interpretation, 2013

    There has been significant interest in static analysis of programs that manipulate strings, in particular in the context of web security. Many types of security vulnerabilities are exposed through flaws in programs such as string encoders, decoders, and sanitizers. Recent work has focused on combining automata and satisfiability modulo theories techniques to address security issues in those programs. These techniques scale to larger alphabets such as Unicode, that is a de facto character encoding standard used in web software. One approach has been to use character predicates to generalize finite state transducers. This technique has made it possible to perform precise analysis of a large class of typical sanitization routines. However, it has not been able to cope well with decoders, that often require to read more than one character at a time. In order to overcome this limitation we introduce a conservative generalization of Symbolic Finite Transducers (SFTs) called Extended Symbolic Finite Transducers (ESFTs) that incorporates the notion of a bounded lookahead. We demonstrate the advantage ESFTs on analyzing programs for which previous approaches did not scale. In our evaluation we use a UTF-16 to UTF-8 translator (utf8encoder) and a UTF-8 to UTF-16 translator (utf8decoder). We show, among other properties, that utf8encoder and utf8decoder are functionally correct.

  • Automated Grading of DFA Constructions

    23th International Joint Conference on Artificial Intelligence

    One challenge in making online education more effective is to develop automatic grading software that can provide meaningful feedback. This paper provides a solution to automatic grading of the standard computation-theory problem that asks a student to construct a deterministic finite automa- ton (DFA) from the given description of its language. We focus on how to assign partial grades for incorrect answers. Each student’s answer is compared to the correct DFA using a hybrid of three techniques devised to capture different classes of errors. First, in an attempt to catch syntactic mis- takes, we compute edit distance between the two DFA descriptions. Second, we consider the entropy of the symmetric difference of the languages of the two DFAs, and compute a score that estimates the fraction of the number of strings on which the student answer is wrong. Our third technique is aimed at capturing mistakes in reading of the problem de- scription. For this purpose, we consider a description language MOSEL, which adds syntactic sugar to the classical Monadic Second Order Logic, and allows defining regular languages in a concise and natural way. We provide algorithms, along with optimizations, for transforming MOSEL descriptions into DFAs and vice-versa. These allow us to compute the syntactic edit distance of the incorrect answer from the correct one in terms of their logical representations. We report an experimental study that evaluates hundreds of answers submitted by (real) students by comparing grades/feedback computed by our tool with human graders. Our conclusion is that the tool is able to assign partial grades in a meaningful way, and should be preferred over human graders for both scale and consistency.

  • Equivalence of Extended Symbolic Finite Transducers

    25th International Conference on Computer-Aided Verification

    Symbolic Finite Transducers augment classic transducers with symbolic alphabets represented as parametric theories. Such extension enables succinctness and the use of potentially infinite alphabets while preserving closure and decidability properties. Extended Symbolic Finite Transducers further extend these objects by allowing transitions to read consecutive input elements in a single step. While when the alphabet is finite this extension does not add expressiveness, it does so when the alphabet is symbolic. We show how such increase in expressiveness causes decision problems such as equivalence to become undecidable and closure properties such as composition to stop holding. We also investigate how the automata counterpart, Extended Symbolic Finite Automata, differs from Symbolic Finite Automata. We then introduce the subclass of Cartesian Extended Symbolic Finite Transducers in which guards are limited to conjunctions of unary predicates. Our main result is an equivalence algorithm for such subclass in the single-valued case. Finally, we model real world problems with Cartesian Extended Symbolic Finite Transducers and use the equivalence algorithm to prove their correctness.

  • Global Progress in Dynamically Interleaved Multiparty Sessions

    19th International Conference on Concurrency Theory

    A multiparty session forms a unit of structured interactions among many participants which follow a prescribed scenario specified as a global type signature. This paper develops, besides a more traditional communication type system, a novel static interaction type system for global progress in dynamically interleaved multiparty sessions.

  • High-Performance Complex Event Processing over Hierarchical Data

    ACM TODS

    While complex event processing (CEP) constitutes a considerable portion of the so-called Big Data analytics, current CEP systems can only process data having a simple structure, and are otherwise limited in their ability to efficiently support complex continuous queries on structured or semi-structured information. However, XML-like streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial feeds, configuration files, and similar applications requiring advanced CEP queries. In this paper, we present the XSeq language and system that support CEP on XML streams, via an extension of XPath that is both powerful and amenable to an efficient implementation. Specifically, the XSeq language extends XPath with natural operators to express sequential and Kleene-* patterns over XML streams, while remaining highly amenable to efficient execution. In fact, XSeq is designed to take full advantage of the recently proposed Visibly Pushdown Automata (VPA), where higher expressive power can be achieved without compromising the computationally attractive properties of finite state automata. Besides the efficiency and expressivity benefits, the choice of VPA as the underlying model also enables XSeq to go beyond XML streams and be easily applicable to any data with both sequential and hierarchical structures, including JSON messages, RNA sequences, and software traces. Therefore, we illustrate XSeq’s power for CEP applications through examples from different domains and provide formal results on its expressiveness and complexity. Finally, we present several optimization techniques for XSeq queries. Our extensive experiments indicate that XSeq brings outstanding performance to CEP applications: two orders of magnitude improvement is obtained over the same queries executed in general-purpose XML engines.

  • Operating System Support For Augmented Reality Applications

    14th Workshop on Hot Topics in Operating Systems

    Augmented reality (AR) takes natural user input (NUI), such as gestures, voice, and eye gaze, and produces digital visual overlays on top of reality seen by a user. Today, multiple shipping AR applications exist, most notably titles for the Microsoft Kinect and smartphone applications such as Layar, Wikitude, and Junaio. Despite this activity, little attention has been paid to operating system support for AR applications. Instead, each AR application today does its own sensing and rendering, with the help of user-level libraries like OpenCV or the Microsoft Kinect SDK. In this paper, we explore how operating systems should evolve to support AR applications. Because AR applications work with fundamentally new inputs and outputs, an OS that supports AR applications needs to re-think the input and display abstractions exposed to applications. Unlike mouse and keyboard, which form explicit, separate channels for user input, NUI requires continuous sensing of the real-world environment, which often has sensitive data mixed with user input. Hence, the OS input abstractions must ensure that user privacy is not violated, and the OS must provide a fine-grained permission system for access to recognized objects like a user's face and skeleton. In addition, because visual outputs of AR applications mix real-world and virtual objects, the synthetic window abstraction in traditional GUIs is no longer viable, and OSes must rethink the display abstractions and their management. We discuss research directions for solving these and other issues and building an OS that lets multiple applications share one (augmented) reality.

  • Streaming Tree Transducers

    40th International Colloquium on Automata, Languages and Programming

    We introduce streaming tree transducers as an analyzable and expressive model for transforming hierarchically structured data in a single pass. Given a linear encoding of the input tree, the transducer makes a single left-to-right pass through the input, and computes the linear encoding of the output tree in linear time using a finite-state control, a finite number of string variables, and a visibly pushdown stack. We establish that the model is closed under regular look ahead: allowing the transducer to make decisions based on a regular property of the remaining input suffix does not increase expressiveness. The expressiveness of the model coincides with transductions definable using monadic second-order logic (MSO). Finally, we establish a NExpTime upper bound for checking functional inequivalence of two streaming tree transducers.

  • Regular Functions, Cost Register Automata, and Generalized Min-Cost Problems

    LICS13

    Motivated by the successful application of the theory of regular languages to formal verification of finite-state systems, there is a renewed interest in developing a theory of analyzable functions from strings to numerical values that can provide a foundation for analyzing quantitative properties of finite-state systems. In this paper, we propose a deterministic model for associating costs with strings that is parameterized by operations of interest (such as addition, scaling, and min), a notion of regularity that provides a yardstick to measure expressiveness, and study decision problems and theoretical properties of resulting classes of cost functions. Our definition of regularity relies on the theory of string-to-tree transducers, and allows associating costs with events that are conditional upon regular properties of future events. Our model of cost register automata allows computation of regular functions using multiple "write-only" registers whose values can be combined using the allowed set of operations. We show that classical shortest-path algorithms as well as algorithms designed for computing discounted costs, can be adopted for solving the min-cost problems for the more general classes of functions specified in our model. Cost register automata with min and increment give a deterministic model that is equivalent to weighted automata, an extensively studied nondeterministic model, and this connection results in new insights and new open problems.
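
    To make the register model concrete, the following minimal sketch runs a deterministic machine whose registers are only written, never tested, and are combined with min and increment. The encoding (a transition table mapping to register-update functions), the reset of a register to a constant, and the names `run_cra` and `shortest_block` are assumptions chosen for illustration rather than the paper's formal definition.

```python
INF = float("inf")


def run_cra(transitions, start, init_regs, output, word):
    """Run a cost-register-style automaton and return the cost of `word`.

    transitions maps (state, symbol) -> (next_state, update), where update
    takes the current register values and returns the new ones. Control flow
    depends only on the state and the symbol, never on register contents.
    """
    state, regs = start, dict(init_regs)
    for ch in word:
        state, update = transitions[(state, ch)]
        regs = update(regs)
    return output[state](regs)


# Hypothetical example: over the alphabet {a, #}, the cost of a word is the
# length of its shortest '#'-separated block of a's.
shortest_block = {
    # Reading 'a' increments the length of the current block.
    ("q", "a"): ("q", lambda r: {"cur": r["cur"] + 1, "best": r["best"]}),
    # Reading '#' folds the finished block into the running minimum.
    ("q", "#"): ("q", lambda r: {"cur": 0, "best": min(r["best"], r["cur"])}),
}
final_output = {"q": lambda r: min(r["best"], r["cur"])}

cost = run_cra(shortest_block, "q", {"cur": 0, "best": INF}, final_output, "aa#a#aaa")
print(cost)
```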

  • Static Analysis of String Encoders and Decoders

    Verification, Model Checking and Abstract Interpretation, 2013

    There has been significant interest in static analysis of programs that manipulate strings, in particular in the context of web security. Many types of security vulnerabilities are exposed through flaws in programs such as string encoders, decoders, and sanitizers. Recent work has focused on combining automata and satisfiability modulo theories techniques to address security issues in those programs. These techniques scale to larger alphabets such as Unicode, which is a de facto character encoding standard used in web software. One approach has been to use character predicates to generalize finite state transducers. This technique has made it possible to perform precise analysis of a large class of typical sanitization routines. However, it has not been able to cope well with decoders, which often need to read more than one character at a time. In order to overcome this limitation we introduce a conservative generalization of Symbolic Finite Transducers (SFTs) called Extended Symbolic Finite Transducers (ESFTs) that incorporates the notion of a bounded lookahead. We demonstrate the advantage of ESFTs on analyzing programs for which previous approaches did not scale. In our evaluation we use a UTF-16 to UTF-8 translator (utf8encoder) and a UTF-8 to UTF-16 translator (utf8decoder). We show, among other properties, that utf8encoder and utf8decoder are functionally correct.
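
    The need for lookahead can be seen in a plain UTF-16 to UTF-8 translator, sketched below: a code point outside the Basic Multilingual Plane arrives as a surrogate pair, so the translator must consume two input units in one step. This is an ordinary hand-written routine, not the paper's ESFT model or its utf8encoder; the function name `utf16_units_to_utf8` is an assumption for this example.

```python
def utf16_units_to_utf8(units):
    """Translate a sequence of 16-bit UTF-16 code units into UTF-8 bytes."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: this step reads TWO consecutive input units.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                raise ValueError("unpaired high surrogate")
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            cp = u
            i += 1
        # Emit one to four bytes depending on the size of the code point.
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
    return bytes(out)


# Cross-check against Python's built-in codecs on a sample covering
# 1-, 2-, 3-, and 4-byte encodings.
sample = "a\u00e9\u20ac\U0001F600"
raw = sample.encode("utf-16-be")
sample_units = [int.from_bytes(raw[k:k + 2], "big") for k in range(0, len(raw), 2)]
print(utf16_units_to_utf8(sample_units) == sample.encode("utf-8"))
```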

  • Automated Grading of DFA Constructions

    23th International Joint Conference on Artificial Intelligence

    One challenge in making online education more effective is to develop automatic grading software that can provide meaningful feedback. This paper provides a solution to automatic grading of the standard computation-theory problem that asks a student to construct a deterministic finite automa- ton (DFA) from the given description of its language. We focus on how to assign partial grades for incorrect answers. Each student’s answer is compared to the correct DFA using a hybrid of three techniques devised to capture different classes of errors. First, in an attempt to catch syntactic mis- takes, we compute edit distance between the two DFA descriptions. Second, we consider the entropy of the symmetric difference of the languages of the two DFAs, and compute a score that estimates the fraction of the number of strings on which the student answer is wrong. Our third technique is aimed at capturing mistakes in reading of the problem de- scription. For this purpose, we consider a description language MOSEL, which adds syntactic sugar to the classical Monadic Second Order Logic, and allows defining regular languages in a concise and natural way. We provide algorithms, along with optimizations, for transforming MOSEL descriptions into DFAs and vice-versa. These allow us to compute the syntactic edit distance of the incorrect answer from the correct one in terms of their logical representations. We report an experimental study that evaluates hundreds of answers submitted by (real) students by comparing grades/feedback computed by our tool with human graders. Our conclusion is that the tool is able to assign partial grades in a meaningful way, and should be preferred over human graders for both scale and consistency.

  • Equivalence of Extended Symbolic Finite Transducers

    25th International Conference on Computer-Aided Verification

    Symbolic Finite Transducers augment classic transducers with symbolic alphabets represented as parametric theories. Such extension enables succinctness and the use of potentially infinite alphabets while preserving closure and decidability properties. Extended Symbolic Finite Transducers further extend these objects by allowing transitions to read consecutive input elements in a single step. While when the alphabet is finite this extension does not add expressiveness, it does so when the alphabet is symbolic. We show how such increase in expressiveness causes decision problems such as equivalence to become undecidable and closure properties such as composition to stop holding. We also investigate how the automata counterpart, Extended Symbolic Finite Automata, differs from Symbolic Finite Automata. We then introduce the subclass of Cartesian Extended Symbolic Finite Transducers in which guards are limited to conjunctions of unary predicates. Our main result is an equivalence algorithm for such subclass in the single-valued case. Finally, we model real world problems with Cartesian Extended Symbolic Finite Transducers and use the equivalence algorithm to prove their correctness.

  • Global Progress in Dynamically Interleaved Multiparty Sessions

    19th International Conference on Concurrency Theory

    A multiparty session forms a unit of structured interactions among many participants which follow a prescribed scenario specified as a global type signature. This paper develops, besides a more traditional communication type system, a novel static interaction type system for global progress in dynamically interleaved multiparty sessions.

  • High-Performance Complex Event Processing over Hierarchical Data

    ACM TODS

    While complex event processing (CEP) constitutes a considerable portion of the so called Big Data analytics, current CEP systems can only process data having a simple structure, and are otherwise limited in their ability to efficiently support complex continuous queries on structured or semi-structured information. However, XML-like streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial feeds, configuration files, and similar applications requiring advanced CEP queries. In this paper, we present the XSeq language and system that support CEP on XML streams, via an extension of XPath that is both powerful and amenable to an efficient implementation. Specifically, the XSeq language extends XPath with natural operators to express sequential and Kleene-* patterns over XML streams, while remaining highly amenable to efficient execution. In fact, XSeq is designed to take full advantage of the recently proposed Visibly Pushdown Automata (VPA), where higher expressive power can be achieved without compromising the computationally attractive properties of finite state automata. Besides the efficiency and expressivity benefits, the choice of VPA as the underlying model also enables XSeq go beyond XML streams and be easily applicable to any data with both sequential and hierarchical structures, including JSON messages, RNA sequences, and software traces. Therefore, we illustrate the XSeq’s power for CEP applications through examples from different domains and provide formal results on its expressiveness and complexity. Finally, we present several optimization techniques for XSeq queries. Our extensive experiments indicate that XSeq brings outstanding performance to CEP applications: two orders of magnitude improvement is obtained over the same queries executed in general-purpose XML engines.

  • Operating System Support For Augmented Reality Applications

    14th Workshop on Hot Topics in Operating Systems

    Augmented reality (AR) takes natural user input (NUI), such as gestures, voice, and eye gaze, and produces digital visual overlays on top of reality seen by a user. Today, multiple shipping AR applications exist, most notably titles for the Microsoft Kinect and smartphone applications such as Layar, Wikitude, and Junaio. Despite this activity, little attention has been paid to operating system support for AR applications. Instead, each AR application today does its own sensing and rendering, with the help of user-level libraries like OpenCV or the Microsoft Kinect SDK. In this paper, we explore how operating systems should evolve to support AR applications. Because AR applications work with fundamentally new inputs and outputs, an OS that supports AR applications needs to re-think the input and display abstractions exposed to applications. Unlike mouse and keyboard, which form explicit, separate channels for user input, NUI requires continuous sensing of the real-world environment, which often has sensitive data mixed with user input. Hence, the OS input abstractions must ensure that user privacy is not violated, and the OS must provide a fine-grained permission system for access to recognized objects like a user's face and skeleton. In addition, because visual outputs of AR applications mix real-world and virtual objects, the synthetic window abstraction in traditional GUIs is no longer viable, and OSes must rethink the display abstractions and their management. We discuss research directions for solving these and other issues and building an OS that let multiple applications share one (augmented) reality.

  • Streaming Tree Transducers

    40th International Colloquium on Automata, Languages and Programming

    We introduce streaming tree transducers as an analyzable and expressive model for transforming hierarchically structured data in a single pass. Given a linear encoding of the input tree, the transducer makes a single left-to-right pass through the input, and computes the linear encoding of the output tree in linear time using a ?nite-state control, a ?nite number of string variables, and a visibly pushdown stack. We establish that the model is closed under regular look ahead: allowing the transducer to make decisions based on a regular property of the remaining input suffix does not increase expressiveness. The expressiveness of the model coincides with transductions de?nable using monadic second-order logic (MSO). Finally, we establish a NExpTime upper bound for checking functional inequivalence of two streaming tree transducers.

  • Regular Functions, Cost Register Automata, and Generalized Min-Cost Problems

    LICS13

    Motivated by the successful application of the theory of regular languages to formal verification of finite-state systems, there is a renewed interest in developing a theory of analyzable functions from strings to numerical values that can provide a foundation for analyzing quantitative properties of finite-state systems. In this paper, we propose a deterministic model for associating costs with strings that is parameterized by operations of interest (such as addition, scaling, and min), a notion of regularity that provides a yardstick to measure expressiveness, and study decision problems and theoretical properties of resulting classes of cost functions. Our definition of regularity relies on the theory of string-to-tree transducers, and allows associating costs with events that are conditional upon regular properties of future events. Our model of cost register automata allows computation of regular functions using multiple "write-only" registers whose values can be combined using the allowed set of operations. We show that classical shortest-path algorithms as well as algorithms designed for computing discounted costs, can be adopted for solving the min-cost problems for the more general classes of functions specified in our model. Cost register automata with min and increment give a deterministic model that is equivalent to weighted automata, an extensively studied nondeterministic model, and this connection results in new insights and new open problems.

  • Static Analysis of String Encoders and Decoders

    Verification, Model Checking and Abstract Interpretation, 2013

    There has been significant interest in static analysis of programs that manipulate strings, in particular in the context of web security. Many types of security vulnerabilities are exposed through flaws in programs such as string encoders, decoders, and sanitizers. Recent work has focused on combining automata and satisfiability modulo theories techniques to address security issues in those programs. These techniques scale to larger alphabets such as Unicode, that is a de facto character encoding standard used in web software. One approach has been to use character predicates to generalize finite state transducers. This technique has made it possible to perform precise analysis of a large class of typical sanitization routines. However, it has not been able to cope well with decoders, that often require to read more than one character at a time. In order to overcome this limitation we introduce a conservative generalization of Symbolic Finite Transducers (SFTs) called Extended Symbolic Finite Transducers (ESFTs) that incorporates the notion of a bounded lookahead. We demonstrate the advantage ESFTs on analyzing programs for which previous approaches did not scale. In our evaluation we use a UTF-16 to UTF-8 translator (utf8encoder) and a UTF-8 to UTF-16 translator (utf8decoder). We show, among other properties, that utf8encoder and utf8decoder are functionally correct.

  • Automated Grading of DFA Constructions

    23th International Joint Conference on Artificial Intelligence

    One challenge in making online education more effective is to develop automatic grading software that can provide meaningful feedback. This paper provides a solution to automatic grading of the standard computation-theory problem that asks a student to construct a deterministic finite automa- ton (DFA) from the given description of its language. We focus on how to assign partial grades for incorrect answers. Each student’s answer is compared to the correct DFA using a hybrid of three techniques devised to capture different classes of errors. First, in an attempt to catch syntactic mis- takes, we compute edit distance between the two DFA descriptions. Second, we consider the entropy of the symmetric difference of the languages of the two DFAs, and compute a score that estimates the fraction of the number of strings on which the student answer is wrong. Our third technique is aimed at capturing mistakes in reading of the problem de- scription. For this purpose, we consider a description language MOSEL, which adds syntactic sugar to the classical Monadic Second Order Logic, and allows defining regular languages in a concise and natural way. We provide algorithms, along with optimizations, for transforming MOSEL descriptions into DFAs and vice-versa. These allow us to compute the syntactic edit distance of the incorrect answer from the correct one in terms of their logical representations. We report an experimental study that evaluates hundreds of answers submitted by (real) students by comparing grades/feedback computed by our tool with human graders. Our conclusion is that the tool is able to assign partial grades in a meaningful way, and should be preferred over human graders for both scale and consistency.

  • Equivalence of Extended Symbolic Finite Transducers

    25th International Conference on Computer-Aided Verification

    Symbolic Finite Transducers augment classic transducers with symbolic alphabets represented as parametric theories. Such extension enables succinctness and the use of potentially infinite alphabets while preserving closure and decidability properties. Extended Symbolic Finite Transducers further extend these objects by allowing transitions to read consecutive input elements in a single step. While when the alphabet is finite this extension does not add expressiveness, it does so when the alphabet is symbolic. We show how such increase in expressiveness causes decision problems such as equivalence to become undecidable and closure properties such as composition to stop holding. We also investigate how the automata counterpart, Extended Symbolic Finite Automata, differs from Symbolic Finite Automata. We then introduce the subclass of Cartesian Extended Symbolic Finite Transducers in which guards are limited to conjunctions of unary predicates. Our main result is an equivalence algorithm for such subclass in the single-valued case. Finally, we model real world problems with Cartesian Extended Symbolic Finite Transducers and use the equivalence algorithm to prove their correctness.

  • Global Progress in Dynamically Interleaved Multiparty Sessions

    19th International Conference on Concurrency Theory

    A multiparty session forms a unit of structured interactions among many participants which follow a prescribed scenario specified as a global type signature. This paper develops, besides a more traditional communication type system, a novel static interaction type system for global progress in dynamically interleaved multiparty sessions.

  • High-Performance Complex Event Processing over Hierarchical Data

    ACM TODS

    While complex event processing (CEP) constitutes a considerable portion of the so called Big Data analytics, current CEP systems can only process data having a simple structure, and are otherwise limited in their ability to efficiently support complex continuous queries on structured or semi-structured information. However, XML-like streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial feeds, configuration files, and similar applications requiring advanced CEP queries. In this paper, we present the XSeq language and system that support CEP on XML streams, via an extension of XPath that is both powerful and amenable to an efficient implementation. Specifically, the XSeq language extends XPath with natural operators to express sequential and Kleene-* patterns over XML streams, while remaining highly amenable to efficient execution. In fact, XSeq is designed to take full advantage of the recently proposed Visibly Pushdown Automata (VPA), where higher expressive power can be achieved without compromising the computationally attractive properties of finite state automata. Besides the efficiency and expressivity benefits, the choice of VPA as the underlying model also enables XSeq go beyond XML streams and be easily applicable to any data with both sequential and hierarchical structures, including JSON messages, RNA sequences, and software traces. Therefore, we illustrate the XSeq’s power for CEP applications through examples from different domains and provide formal results on its expressiveness and complexity. Finally, we present several optimization techniques for XSeq queries. Our extensive experiments indicate that XSeq brings outstanding performance to CEP applications: two orders of magnitude improvement is obtained over the same queries executed in general-purpose XML engines.

  • Operating System Support For Augmented Reality Applications

    14th Workshop on Hot Topics in Operating Systems

    Augmented reality (AR) takes natural user input (NUI), such as gestures, voice, and eye gaze, and produces digital visual overlays on top of reality seen by a user. Today, multiple shipping AR applications exist, most notably titles for the Microsoft Kinect and smartphone applications such as Layar, Wikitude, and Junaio. Despite this activity, little attention has been paid to operating system support for AR applications. Instead, each AR application today does its own sensing and rendering, with the help of user-level libraries like OpenCV or the Microsoft Kinect SDK. In this paper, we explore how operating systems should evolve to support AR applications. Because AR applications work with fundamentally new inputs and outputs, an OS that supports AR applications needs to re-think the input and display abstractions exposed to applications. Unlike mouse and keyboard, which form explicit, separate channels for user input, NUI requires continuous sensing of the real-world environment, which often has sensitive data mixed with user input. Hence, the OS input abstractions must ensure that user privacy is not violated, and the OS must provide a fine-grained permission system for access to recognized objects like a user's face and skeleton. In addition, because visual outputs of AR applications mix real-world and virtual objects, the synthetic window abstraction in traditional GUIs is no longer viable, and OSes must rethink the display abstractions and their management. We discuss research directions for solving these and other issues and building an OS that let multiple applications share one (augmented) reality.

  • Streaming Tree Transducers

    40th International Colloquium on Automata, Languages and Programming

    We introduce streaming tree transducers as an analyzable and expressive model for transforming hierarchically structured data in a single pass. Given a linear encoding of the input tree, the transducer makes a single left-to-right pass through the input, and computes the linear encoding of the output tree in linear time using a ?nite-state control, a ?nite number of string variables, and a visibly pushdown stack. We establish that the model is closed under regular look ahead: allowing the transducer to make decisions based on a regular property of the remaining input suffix does not increase expressiveness. The expressiveness of the model coincides with transductions de?nable using monadic second-order logic (MSO). Finally, we establish a NExpTime upper bound for checking functional inequivalence of two streaming tree transducers.

  • Regular Functions, Cost Register Automata, and Generalized Min-Cost Problems

    LICS13

    Motivated by the successful application of the theory of regular languages to formal verification of finite-state systems, there is a renewed interest in developing a theory of analyzable functions from strings to numerical values that can provide a foundation for analyzing quantitative properties of finite-state systems. In this paper, we propose a deterministic model for associating costs with strings that is parameterized by operations of interest (such as addition, scaling, and min), a notion of regularity that provides a yardstick to measure expressiveness, and study decision problems and theoretical properties of resulting classes of cost functions. Our definition of regularity relies on the theory of string-to-tree transducers, and allows associating costs with events that are conditional upon regular properties of future events. Our model of cost register automata allows computation of regular functions using multiple "write-only" registers whose values can be combined using the allowed set of operations. We show that classical shortest-path algorithms as well as algorithms designed for computing discounted costs, can be adopted for solving the min-cost problems for the more general classes of functions specified in our model. Cost register automata with min and increment give a deterministic model that is equivalent to weighted automata, an extensively studied nondeterministic model, and this connection results in new insights and new open problems.

  • Static Analysis of String Encoders and Decoders

    Verification, Model Checking and Abstract Interpretation, 2013

    There has been significant interest in static analysis of programs that manipulate strings, in particular in the context of web security. Many types of security vulnerabilities are exposed through flaws in programs such as string encoders, decoders, and sanitizers. Recent work has focused on combining automata and satisfiability modulo theories techniques to address security issues in those programs. These techniques scale to larger alphabets such as Unicode, that is a de facto character encoding standard used in web software. One approach has been to use character predicates to generalize finite state transducers. This technique has made it possible to perform precise analysis of a large class of typical sanitization routines. However, it has not been able to cope well with decoders, that often require to read more than one character at a time. In order to overcome this limitation we introduce a conservative generalization of Symbolic Finite Transducers (SFTs) called Extended Symbolic Finite Transducers (ESFTs) that incorporates the notion of a bounded lookahead. We demonstrate the advantage ESFTs on analyzing programs for which previous approaches did not scale. In our evaluation we use a UTF-16 to UTF-8 translator (utf8encoder) and a UTF-8 to UTF-16 translator (utf8decoder). We show, among other properties, that utf8encoder and utf8decoder are functionally correct.

  • Automated Grading of DFA Constructions

    23th International Joint Conference on Artificial Intelligence

    One challenge in making online education more effective is to develop automatic grading software that can provide meaningful feedback. This paper provides a solution to automatic grading of the standard computation-theory problem that asks a student to construct a deterministic finite automa- ton (DFA) from the given description of its language. We focus on how to assign partial grades for incorrect answers. Each student’s answer is compared to the correct DFA using a hybrid of three techniques devised to capture different classes of errors. First, in an attempt to catch syntactic mis- takes, we compute edit distance between the two DFA descriptions. Second, we consider the entropy of the symmetric difference of the languages of the two DFAs, and compute a score that estimates the fraction of the number of strings on which the student answer is wrong. Our third technique is aimed at capturing mistakes in reading of the problem de- scription. For this purpose, we consider a description language MOSEL, which adds syntactic sugar to the classical Monadic Second Order Logic, and allows defining regular languages in a concise and natural way. We provide algorithms, along with optimizations, for transforming MOSEL descriptions into DFAs and vice-versa. These allow us to compute the syntactic edit distance of the incorrect answer from the correct one in terms of their logical representations. We report an experimental study that evaluates hundreds of answers submitted by (real) students by comparing grades/feedback computed by our tool with human graders. Our conclusion is that the tool is able to assign partial grades in a meaningful way, and should be preferred over human graders for both scale and consistency.

  • Equivalence of Extended Symbolic Finite Transducers

    25th International Conference on Computer-Aided Verification

    Symbolic Finite Transducers augment classic transducers with symbolic alphabets represented as parametric theories. Such extension enables succinctness and the use of potentially infinite alphabets while preserving closure and decidability properties. Extended Symbolic Finite Transducers further extend these objects by allowing transitions to read consecutive input elements in a single step. This extension does not add expressiveness when the alphabet is finite, but it does when the alphabet is symbolic. We show how this increase in expressiveness causes decision problems such as equivalence to become undecidable and closure properties such as composition to stop holding. We also investigate how the automata counterpart, Extended Symbolic Finite Automata, differs from Symbolic Finite Automata. We then introduce the subclass of Cartesian Extended Symbolic Finite Transducers in which guards are limited to conjunctions of unary predicates. Our main result is an equivalence algorithm for this subclass in the single-valued case. Finally, we model real-world problems with Cartesian Extended Symbolic Finite Transducers and use the equivalence algorithm to prove their correctness.

  • Global Progress in Dynamically Interleaved Multiparty Sessions

    19th International Conference on Concurrency Theory

    A multiparty session forms a unit of structured interactions among many participants which follow a prescribed scenario specified as a global type signature. This paper develops, besides a more traditional communication type system, a novel static interaction type system for global progress in dynamically interleaved multiparty sessions.

  • High-Performance Complex Event Processing over Hierarchical Data

    ACM TODS

    While complex event processing (CEP) constitutes a considerable portion of the so-called Big Data analytics, current CEP systems can only process data having a simple structure, and are otherwise limited in their ability to efficiently support complex continuous queries on structured or semi-structured information. However, XML-like streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial feeds, configuration files, and similar applications requiring advanced CEP queries. In this paper, we present the XSeq language and system that support CEP on XML streams, via an extension of XPath that is both powerful and amenable to an efficient implementation. Specifically, the XSeq language extends XPath with natural operators to express sequential and Kleene-* patterns over XML streams, while remaining highly amenable to efficient execution. In fact, XSeq is designed to take full advantage of the recently proposed Visibly Pushdown Automata (VPA), where higher expressive power can be achieved without compromising the computationally attractive properties of finite state automata. Besides the efficiency and expressivity benefits, the choice of VPA as the underlying model also enables XSeq to go beyond XML streams and be easily applicable to any data with both sequential and hierarchical structures, including JSON messages, RNA sequences, and software traces. Therefore, we illustrate XSeq’s power for CEP applications through examples from different domains and provide formal results on its expressiveness and complexity. Finally, we present several optimization techniques for XSeq queries. Our extensive experiments indicate that XSeq brings outstanding performance to CEP applications: two orders of magnitude improvement is obtained over the same queries executed in general-purpose XML engines.

  • Operating System Support For Augmented Reality Applications

    14th Workshop on Hot Topics in Operating Systems

    Augmented reality (AR) takes natural user input (NUI), such as gestures, voice, and eye gaze, and produces digital visual overlays on top of reality seen by a user. Today, multiple shipping AR applications exist, most notably titles for the Microsoft Kinect and smartphone applications such as Layar, Wikitude, and Junaio. Despite this activity, little attention has been paid to operating system support for AR applications. Instead, each AR application today does its own sensing and rendering, with the help of user-level libraries like OpenCV or the Microsoft Kinect SDK. In this paper, we explore how operating systems should evolve to support AR applications. Because AR applications work with fundamentally new inputs and outputs, an OS that supports AR applications needs to rethink the input and display abstractions exposed to applications. Unlike mouse and keyboard, which form explicit, separate channels for user input, NUI requires continuous sensing of the real-world environment, which often has sensitive data mixed with user input. Hence, the OS input abstractions must ensure that user privacy is not violated, and the OS must provide a fine-grained permission system for access to recognized objects like a user's face and skeleton. In addition, because visual outputs of AR applications mix real-world and virtual objects, the synthetic window abstraction in traditional GUIs is no longer viable, and OSes must rethink the display abstractions and their management. We discuss research directions for solving these and other issues and building an OS that lets multiple applications share one (augmented) reality.

  • Streaming Tree Transducers

    40th International Colloquium on Automata, Languages and Programming

    We introduce streaming tree transducers as an analyzable and expressive model for transforming hierarchically structured data in a single pass. Given a linear encoding of the input tree, the transducer makes a single left-to-right pass through the input, and computes the linear encoding of the output tree in linear time using a finite-state control, a finite number of string variables, and a visibly pushdown stack. We establish that the model is closed under regular lookahead: allowing the transducer to make decisions based on a regular property of the remaining input suffix does not increase expressiveness. The expressiveness of the model coincides with transductions definable using monadic second-order logic (MSO). Finally, we establish a NExpTime upper bound for checking functional inequivalence of two streaming tree transducers.

  • Regular Functions, Cost Register Automata, and Generalized Min-Cost Problems

    LICS13

    Motivated by the successful application of the theory of regular languages to formal verification of finite-state systems, there is a renewed interest in developing a theory of analyzable functions from strings to numerical values that can provide a foundation for analyzing quantitative properties of finite-state systems. In this paper, we propose a deterministic model for associating costs with strings that is parameterized by operations of interest (such as addition, scaling, and min), a notion of regularity that provides a yardstick to measure expressiveness, and study decision problems and theoretical properties of resulting classes of cost functions. Our definition of regularity relies on the theory of string-to-tree transducers, and allows associating costs with events that are conditional upon regular properties of future events. Our model of cost register automata allows computation of regular functions using multiple "write-only" registers whose values can be combined using the allowed set of operations. We show that classical shortest-path algorithms, as well as algorithms designed for computing discounted costs, can be adapted for solving the min-cost problems for the more general classes of functions specified in our model. Cost register automata with min and increment give a deterministic model that is equivalent to weighted automata, an extensively studied nondeterministic model, and this connection results in new insights and new open problems.
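    A toy instance may help make the register model concrete. The sketch below is a hypothetical single-state machine with two registers, using only increment and min, that maps a word over {a, b} to the smaller of its two letter counts. The function name `min_letter_count` and the example are assumptions for illustration, not taken from the paper.

```python
def min_letter_count(word: str) -> int:
    """Return min(#a, #b) for a word over {'a', 'b'}."""
    # Two registers. They are "write-only" in the sense that control flow
    # never tests their values; they are only updated and finally combined.
    x = 0  # number of 'a' symbols read so far
    y = 0  # number of 'b' symbols read so far
    for symbol in word:
        if symbol == "a":
            x = x + 1  # increment update
        elif symbol == "b":
            y = y + 1  # increment update
        else:
            raise ValueError(f"symbol {symbol!r} is outside the alphabet")
    # Output function: combine the registers with the allowed operation min.
    return min(x, y)


print(min_letter_count("aabab"))  # 2
```

    The transitions depend only on the input symbol, so the machine stays deterministic; all quantitative information lives in the registers.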

CS 536

3.8(7)