Byung-Gon Chun

My Research Areas

My research focuses on building operating and networked system platforms for smartphones, Big Data analytics, cloud computing, and mobile/cloud security. I’m interested in conferences such as SOSP, OSDI, NSDI, SOCC, MobiSys, USENIX ATC, USENIX Security, Oakland, and CCS. I present an overview of my research areas below.

Mobile-Cloud Systems

Mobile applications have become ubiquitous thanks to smartphones and tablets. These applications provide ever richer functionality with cloud services. I have worked on systems that serve data and computation efficiently between mobile devices and the cloud for performance and energy: systems that transparently and intelligently manage data (Mobius), and systems that dynamically adjust application execution between mobile devices and the cloud (e.g., offloading computation from device to cloud) (CloneCloud).

Intelligent Data Serving for Mobile Apps
Mobile application development is challenging for several reasons: intermittent and limited network connectivity, tight power constraints, server-side scalability concerns, and a number of fault tolerance issues. Developers handcraft complex solutions that include client-side caching, conflict resolution, disconnection tolerance, and backend database sharding. To simplify mobile app development, I have developed Mobius, which introduces MUD (Messaging Unified with Data). MUD presents the programming abstraction of a logical table of data that spans devices and clouds. Applications using Mobius can asynchronously read from/write to MUD tables. Applications can also receive notifications when tables change via continuous queries on the tables. The system combines dynamic client-side caching (with intelligent policies chosen on the server-side, based on usage patterns across multiple applications), notification services, flexible query processing, and a scalable and highly available cloud storage system.

[CCS+12] B.-G. Chun, C. Curino, R. Sears, A. Shraer, S. Madden, and R. Ramakrishnan. Mobius: Unified messaging and data serving for mobile apps. In ACM MobiSys, 2012.

Elastic Execution between Mobile Device and Cloud
CloneCloud addresses smartphones’ hardware limitations via seamlessly but partially offloading execution via thread migration from the smartphone to a computational infrastructure hosting a cloud of smartphone clones. The execution between a mobile device and the cloud is elastic and is optimized to improve performance or energy consumption by combining static analysis, dynamic profiling, and cost optimization. Specifically, CloneCloud improves the performance of applications from resource-starved devices such as smartphones, by opportunistically offloading them to available cloud resources in nearby datacenters or cloudlets. The system can support very expensive operations (e.g., image search, virus scanning, and data leak detection) (a) without requiring application designers to explicitly plan for cloning, (b) without draining the smartphone’s battery power, and (c) with significant performance improvement. Our evaluation shows that CloneCloud can adapt application partitioning to different environments, and it helps some applications achieve as much as a 20x execution speed-up and a 20-fold decrease of energy spent on the mobile device.

[CIM+11] B.-G. Chun, S. Ihm, P. Maniatis, M. Naik, and A. Patti. CloneCloud: Elastic execution between mobile device and cloud. In ACM EuroSys, 2011.
[CM09] B.-G. Chun and P. Maniatis. Augmented smart phone applications through clone cloud execution. In USENIX HotOS, 2009.

Predicting Application Performance Using Program Analysis
Predicting how applications will behave for a given input workload is key to helping users and operators better manage those applications. I proposed a new prediction framework that can automatically predict program performance with high accuracy by combining program analysis techniques with machine learning algorithms. The system extracts program execution features (information about program execution runs) through program instrumentation. It uses machine learning techniques to select features relevant to performance and creates prediction models as a function of the selected features. It shows a significant improvement over models that are oblivious to program features.

[KLY+13] Yongin Kwon, Sangmin Lee, Hayoon Yi, Donghyun Kwon, Seungjun Yang, Byung-Gon Chun, Ling Huang, Petros Maniatis, Mayur Naik, Yunheung Paek. Mantis: Automatic Performance Prediction for Smartphone Applications. In USENIX ATC, 2013.
[HJY+10] L. Huang, J. Jia, B. Yu, B.-G. Chun, P. Maniatis, and M. Naik. Predicting execution time of computer programs using sparse polynomial regression. In NIPS, 2010.

Big Data Systems

Modern clouds run complex applications with varying characteristics. It is important to understand what the workload looks like and how it impacts application performance and operational efficiency. I have been investigating big data system designs driven by the broad and deep analysis of workloads.

Workload-driven Big Data Analytics System Design
I have performed a detailed measurement study of production Hadoop clusters at Yahoo!. Hadoop is an open-source MapReduce implementation. We go beyond prior efforts by presenting a view that is both broad (i.e., considering all jobs in the cluster over an extended period of time) and deep (i.e., examining behaviors at multiple levels – users, jobs, tasks down to file system operations and physical resource utilization). We show that these workloads are extremely heterogeneous and predictable. While this heterogeneity has been observed in point cases in previous work, we show that heterogeneity of workloads is pervasive, occurring at different granularities and metrics (e.g., execution time, storage, input or output size, etc.). We also group jobs by the program that generates them, and show how recurrent and predictable they are. Based on the insights, we re-architect big data systems from the ground up.

Big Data Operating Systems

Current Big Data processing frameworks target specific data processing workloads. I have worked on a project that builds a common “operating system” layer for Big Data processing workloads. The layer supports common abstractions for different kinds of Big Data processing applications such as MapReduce, MPI, ML, and graph processing applications, and provides hooks for intelligently choosing what implementation to use.

[CCC+13] Byung-Gon Chun, Tyson Condie, Carlo Curino, Chris Douglas, Shravan Narayanamurthy, Raghu Ramakrishnan, Sriram Rao, Russell Sears, Markus Weimer. REEF: Retainable Evaluator Execution Framework (Demo Paper). In VLDB, 2013.

Secure Systems

Mobile devices such as smartphones, tablets, and glasses and cloud computing pose interesting security challenges: information leakage from personal devices and attacks on cloud services. I have been investigating securing systems by efficiently tracking information flow.

Detecting Private Information Leakage from Smartphones

TaintDroid is a smartphone system that provides users with adequate control over and visibility into how third-party applications use their private data in real time. The system is an efficient, system-wide dynamic taint tracking and analysis system, which can simultaneously track multiple sources of sensitive data. TaintDroid leverages Android’s virtualized execution environment. It provides real-time analysis by tracking taint information at the bytecode interpreter level and interposing operating system calls (e.g., IPC, networking, and file system I/O). TaintDroid incurs just 14% performance overhead on a CPU-bound micro-benchmark and imposes negligible overhead on interactive third-party applications. Using TaintDroid to monitor the behavior of 30 popular third-party Android applications, we found 68 instances of potential misuse of users’ private information (e.g., location information, device identifiers, and phone numbers) across 20 applications. TaintDroid informs users how their sensitive data is used by third-party applications and provides security service providers with valuable insights. TaintDroid source code is publicly available and it has been actively used by the system and security community.

[WGC+10] W. Enck, P. Gilbert, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth. Taintdroid: An information-flow tracking system for realtime privacy monitoring on smartphones. In USENIX OSDI, 2010.

I am also interested in improving mobile security with offline analysis and online protection approaches. Smartphones and “app” markets are raising concerns as to how third-party applications may misuse or improperly handle users’ privacy-sensitive data. Fortunately, unlike in the PC world, we have a unique opportunity to improve the security of mobile applications thanks to the centralized nature of app distribution through popular app markets. Thorough validation of apps, applied as part of the app market admission process, has the potential to significantly enhance mobile device security. I believe that automated offline validation of smartphone apps at the app-market level is a promising approach for drastically improving the security of smartphones. AppInspector, a project that I am working on, is such a system that analyzes apps and generates reports of potential security and privacy violations.

AppInspector first installs and loads the app on a virtual smartphone. An input generator running on the host PC then injects user interface events and sensor input. The smartphone application runtime is augmented with an execution explorer that aids in traversing possible execution paths of the app. While the app runs, an information-flow and action tracking component monitors privacy-sensitive information flows and generates logs. Finally, AppInspector provides security analysis tools which can be used after execution completes to interpret the logs and generate a report. I plan to continue this project, build the system, and deploy it in app markets or provide it as a third-party service running in cloud computing infrastructure.

[GCC+11] P. Gilbert, B.-G. Chun, L. P. Cox, and J. Jung. Automated Security Validation of Mobile Apps at App Markets. 2nd International Workshop on Mobile Cloud Computing and Services (MCS 2011), June 2011.

Securing Systems with Small Trusted Primitives

Traditionally, secure systems that handle Byzantine faults are built under the assumption that no trusted part exists in a node. For example, widely-known BFT replicated state machines (e.g., PBFT) can provide both safety and liveness with up to one-third faulty replicas, but if this fault assumption is violated there is no guarantee of safety and liveness. This bound has proven optimal when nodes are untrusted; thus, any improvement requires relaxing guarantees or restricting faults. I proposed a new way of building the systems with a new primitive, Attested Append-Only Memory (A2M), that attests only one (sequence number, value) binding for a particular sequence number. With the primitive I restructured well-known protocols, PBFT, SUNDR, and Q/U, and showed new protocols enabled by A2M tolerate more faults than existing ones. I further applied A2M to build a long-term service that preserves key-value bindings correctly despite short-term correlated faults.

[MC11] P. Maniatis and B.-G. Chun. Small trusted primitives for dependable systems. Operating Systems Review, 2011.
[CMS+09] B.-G. Chun, P. Maniatis, S. Shenker, and J. Kubiatowicz. Tiered fault tolerance for long-term integrity. In USENIX FAST, 2009.
[CMS+07] B.-G. Chun, P. Maniatis, S. Shenker, and J. Kubiatowicz. Attested append-only memory: Making adversaries stick to their word. In ACM SOSP, 2007.

Protecting Private Data from Web Services

DataPreserver is a novel architecture that protects customers’ sensitive information from web services in a protected access interface that discloses data in accordance only with the interface properties and the customer’s desires. Depending on the needs of the particular service and the level of dependability the customer requires, secure data preservers offer a variety of implementation choices (e.g., physical isolation, software isolation through virtualization or trusted hardware) for the same logical isolation between customer data and service code, which leads to greater dependability, greater privacy, and a more flexible choice between cost and performance.

[KMC11] J.Kannan, P.Maniatis, andB.-G.Chun. Secure data preservers for web services. In USENIX WebApps, 2011.

Networked Systems

I have worked in a number of projects that address challenges in networked systems.

Exploiting Parallelism to Build General-Purpose Scalable Software Routers

Recent advances in server technology such as a high-speed backbone between processing cores and memory enable high-speed parallel processing. We revisited the problem of scaling software routers and proposed a new software router architecture, RouteBricks, that parallelizes router functionality both across multiple servers and across multiple cores within a single server to exploit parallelism at every opportunity. RouteBricks carefully assigns packets to cores to avoid contention and copying. It routes packets across servers with a topology to maximize throughput while minimizing packet reorderings. A 35Gbps parallel router prototype has been demonstrated with four commodity Intel Nehalem servers. The software router can be linearly scaled through the use of additional servers. The prototype router is fully programmable using the familiar Linux environment and is built entirely from off-the-shelf, general-purpose server hardware. This work received a best paper award at SOSP, a prestigious operating systems conference, and the source code is publicly available and has been actively used in the systems and networking community.

[DEA+09] M. Dobrescu, N. Egi, K. Argyraki, B.-G. Chun, K. Fall, G. Iannaccone, A. Knies, M. Manesh, and S. Ratnasamy. Routebricks: Exploiting parallelism to scale software routers. In ACM SOSP, 2009.

Scalable OS Network I/O
We proposed MegaPipe, a new API for efficient, scalable network I/O for message-oriented workloads. The design of MegaPipe centers around the abstraction of a channel, a per-core, bidirectional pipe between the kernel and user space, used to exchange both I/O requests and event notifications. On top of the channel abstraction, we introduce three key concepts of MegaPipe: partitioning, lightweight socket (lwsocket), and batching. By embracing a clean-slate design approach, MegaPipe is able to exploit new opportunities for improved performance and ease of programmability. In macrobenchmarks on an 8-core server, MegaPipe improves the performance of a modified version of memcached between 15% and 320%. For a workload based on real-world HTTP traces, MegaPipe boosts the throughput of nginx by 75%.

[HMC+12] S. Han, S. Marshall, B.-G. Chun, and S. Ratnasamy. Megapipe: A new programming interface for scalable network i/o. In USENIX OSDI, 2012.

Data-Centric Networking
The Internet usage has shifted from host-centric to data-centric applications. The Data-Oriented Network Architecture (DONA) is a “clean-slate” design for naming and name resolution of the Internet for data-centric applications. The heart of DONA is a self-certifying name-based anycast primitive, sitting on top of the IP layer, which provides name persistence, authenticity, and reliability. With DONA, applications receive anycast support directly from the Internet, thus eliminating the necessity of specialized content distribution solutions like Akamai. DONA is pioneering work that introduced pub-sub style communication to the core Internet infrastructure for content-centric networking.

[KCC+07] T. Koponen, M. Chawla, B.-G. Chun, A. Ermolinskiy, K. H. Kim, S. Shenker, and I. Stoica. A data- oriented (and beyond) network architecture. In ACM SIGCOMM, 2007.

Capturing Networked System Complexity
Building networked systems is difficult. Hence, the systems and networking community sets great store in simple system designs. However, the simplicity of system designs often relies on intuition and qualitative discussion rather than rigorous quantitative metrics. To rectify this problem, we proposed a metric that quantifies complexity of algorithmic components of networked system designs. The metric is derived from the ensemble of distributed state dependencies and is based on how much work is needed to keep a state consistent from its inputs. We applied the metric to Inter- net routing, classical distributed systems, and resource discovery. We discovered interesting insights into designing “simple” networked systems.

[CRK08] B.-G. Chun, S. Ratnasamy, and E. Kohler. NetComplex: A complexity metric for networked system designs. In USENIX NSDI, 2008.

Written by bgchun

March 2, 2013 at 6:30 pm

%d bloggers like this: