Machine learning (ML) presents new challenges for reproducible software engineering, as the artifacts required for repeatably training models are not just versioned code, but also hyperparameters, code dependencies, and the exact version of the training data. Existing systems for tracking the lineage of ML artifacts, such as TensorFlow Extended or MLflow, are invasive, requiring developers to refactor their code so that it is controlled by the external system. In this paper, we present an alternative approach, which we call implicit provenance, where we instrument a distributed file system and its APIs to capture changes to ML artifacts; together with file naming conventions, this means that full lineage can be tracked for TensorFlow/Keras/PyTorch programs without requiring code changes. We address challenges related to adding strongly consistent metadata extensions to the distributed file system, while minimizing provenance overhead, and ensuring transparent eventually consistent replication of ext...
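The core idea of implicit provenance — inferring an artifact's role from where it is written, rather than from explicit API calls in the training code — can be sketched as follows. The directory layout, pattern, and function names below are illustrative assumptions, not the paper's actual conventions:

```python
# Hypothetical sketch of implicit provenance: an instrumented file-system
# layer infers an ML artifact's run and role from its path alone, so the
# training program needs no code changes.
# Assumed (illustrative) convention: Experiments/<app>/<run_id>/<artifact>
import re

PATTERN = re.compile(r"Experiments/(?P<app>[^/]+)/(?P<run>[^/]+)/(?P<name>[^/]+)$")

lineage = []  # ordered log of (run_id, artifact, operation) events

def record_write(path, operation="create"):
    """Hook invoked by the instrumented file-system API on every write."""
    m = PATTERN.match(path)
    if m:  # only paths matching the naming convention are tracked
        lineage.append((m.group("run"), m.group("name"), operation))

# The training program simply writes files as usual; provenance is
# captured as a side effect of the instrumented layer.
record_write("Experiments/mnist/run_42/model.h5")
record_write("Experiments/mnist/run_42/hyperparams.json")
```

The point of the sketch is that lineage capture lives entirely below the application: a write to a conventionally named path is itself the provenance event.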
Next week: examination of your solutions; you will be called; remember: bonus points… What is the next assignment: a theory assignment (no programming) and an extra voluntary assignment.
Modern enterprise applications are currently undergoing a paradigm shift away from purely transactional processing toward combined analytical and transactional processing. Combining these two opposing query types in a single database management system imposes additional requirements on transaction management as well. In this paper, we discuss our approach to achieving high throughput for transactional query processing while allowing concurrent analytical queries. We present our approach to distributed snapshot isolation and optimized two-phase commit protocols.
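For readers unfamiliar with the baseline protocol the paper optimizes, a minimal two-phase commit can be sketched as below. This is a toy in-process model (class and function names are my own), not the paper's distributed, optimized variant:

```python
# Minimal sketch of classic two-phase commit: a coordinator collects votes
# (phase 1) and then broadcasts a unanimous decision (phase 2).
class Participant:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        """Phase 1: vote to commit or abort."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):
        """Phase 2: apply the coordinator's decision."""
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]  # collect all votes
    decision = all(votes)                        # commit only if unanimous
    for p in participants:
        p.finish(decision)
    return decision
```

Optimizations such as those discussed in the paper typically reduce the round trips and logging this naive protocol requires, which is where the transactional-throughput gains come from.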
Proceedings of the ACM on Programming Languages, 2020
Oz is a programming language designed to support multiple programming paradigms in a clean, factored way that is easy to program in despite its broad coverage. It started in 1991 as a collaborative effort by the DFKI (Germany) and SICS (Sweden) and led to an influential system, Mozart, that was released in 1999 and widely used in the 2000s for practical applications and education. We give the history of Oz as it developed from its origins in logic programming, starting with Prolog, followed by concurrent logic programming and constraint logic programming, and leading to its two direct precursors, the concurrent constraint model and the Andorra Kernel Language (AKL). We give the lessons learned from the Oz effort, including successes and failures, and we explain the principles underlying the Oz design. Oz is defined through a kernel language, which is a formal model similar to a foundational calculus, but one that is designed to be directly useful to the programmer. The kernel language is orga...
This manual corresponds to SICStus Prolog release 2.1. The Prolog library comprises a number of packages which are thought to be useful in a number of applications. Note that the predicates in the Prolog library are not built-in predicates: one has to explicitly load each package to get access to its predicates. To load a library package Package, you will normally enter a query: | ?- use_module(library(Package)). Library packages may be compiled and consulted as well as loaded.
From its early days, the World Wide Web has demonstrated strong agglomeration trends, with a very small number of web sites capturing the larger part of the Internet population. At first glance, agglomeration over the virtual space sounds like a paradox. Web sites are numerous and highly diversified and can be easily reached from anywhere and by anybody, with no particular transportation or search cost. However, Internet users use only a small number of sites for searching for information and products, interacting with others, and socializing, thus producing dense concentrations and locational patterns similar to those observed in physical space, where a few cities and industrial clusters host the huge majority of the population and the entire industrial activity. Does this depend on the attractiveness of the popular web sites, or are there agglomeration economies providing incentives for users to be in a location that has been visited by other users or pointed to by other sites? Thi...
Proceedings of the 6th ACM Multimedia Systems Conference, 2015
In recent years, adaptive HTTP streaming protocols have become the de-facto standard in the industry for the distribution of live and video-on-demand content over the Internet. This paper presents SmoothCache 2.0, a distributed cache platform for adaptive HTTP live streaming content based on peer-to-peer (P2P) overlays. The contribution of this work is twofold. From a systems perspective, to the best of our knowledge, it is the only P2P platform which supports recent live streaming protocols based on HTTP as a transport and the concept of adaptive bitrate switching. From an algorithmic perspective, we describe a novel set of overlay construction and prefetching techniques that realize: i) substantial savings in terms of the bandwidth load on the source of the stream, and ii) CDN-quality user experience in terms of playback latency and the watched bitrate. In order to support our claims, we conduct a methodical evaluation on thousands of real consumer machines.
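The adaptive bitrate switching mentioned above can be illustrated with a toy rate-selection rule. The bitrate ladder, safety factor, and function name are illustrative assumptions; real players (and SmoothCache itself) use considerably more elaborate buffer- and throughput-based heuristics:

```python
# Toy sketch of adaptive bitrate selection in an HTTP streaming client:
# pick the highest available bitrate that fits within a safety fraction
# of the currently measured download throughput.
BITRATES = [400, 800, 1600, 3200]  # assumed bitrate ladder, in kbit/s

def pick_bitrate(throughput_kbps, safety=0.8):
    """Return the highest ladder entry at or below safety * throughput."""
    budget = throughput_kbps * safety
    candidates = [b for b in BITRATES if b <= budget]
    # fall back to the lowest bitrate when even that exceeds the budget
    return candidates[-1] if candidates else BITRATES[0]
```

Because clients switch bitrates per segment, a P2P cache layer must prefetch segments at the bitrates peers are likely to request next, which is what makes overlay construction and prefetching the algorithmically interesting part of the system.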
Papers by Seif Haridi