Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Engineering

2018

Articles 1 - 30 of 126

Full-Text Articles in Physical Sciences and Mathematics

Paul Baran, Network Theory, And The Past, Present, And Future Of Internet, Christopher S. Yoo Dec 2018

All Faculty Scholarship

Paul Baran’s seminal 1964 article “On Distributed Communications Networks” that first proposed packet switching also advanced an underappreciated vision of network architecture: a lattice-like, distributed network, in which each node of the Internet would be homogeneous and equal in status to all other nodes. Scholars who have subsequently embraced the concept of a lattice-like network approach have largely overlooked the extent to which it is both inconsistent with network theory (associated with the work of Duncan Watts and Albert-László Barabási), which emphasizes the importance of short cuts and hubs in enabling networks to scale, and the actual way, the Internet …


Open Source Foundations For Spatial Decision Support Systems, Jochen Albrecht Dec 2018

Publications and Research

Spatial Decision Support Systems (SDSS) were a hot topic in the 1990s, when researchers tried to imbue GIS with additional decision support features. Successful practical developments such as HAZUS or CommunityViz have since been built, based on commercial desktop software and without much heed for theory other than what underlies their process models. Others, like UrbanSim, have been completely overhauled twice but without much external scrutiny. Both the practical and the theoretical foundations of decision support systems have developed considerably over the past 20 years. This article presents an overview of these developments and then looks at what corresponding tools …


Gmaim: An Analytical Pipeline For Microrna Splicing Profiling Using Generative Model, Kan Liu Dec 2018

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

MicroRNAs (miRNAs) are a class of short (~22 nt) single-stranded RNA molecules predominantly found in eukaryotes. Being involved in many major biological processes, miRNAs can regulate gene expression by targeting mRNAs to facilitate their degradation or translational inhibition. The imprecise splicing of miRNAs introduces severe variability in the sequences of miRNA products and in their corresponding downstream gene expression regulation. For example, to study the biogenesis of miRNAs, biologists usually deplete a gene in the miRNA biogenesis pathway and study the change in miRNA sequences, which can cause imprecision of miRNAs. Although high-throughput sequencing technologies such as …


Facepet: Enhancing Bystanders' Facial Privacy With Smart Wearables/Internet Of Things, Alfredo J. Perez, Sherali Zeadally, Luis Y. Matos Garcia, Jaouad A. Mouloud, Scott Griffith Dec 2018

Information Science Faculty Publications

Given the availability of cameras in mobile phones, drones and Internet-connected devices, facial privacy has become an area of major interest in the last few years, especially when photos are captured and can be used to identify bystanders who may not have consented to being photographed or identified. Some solutions to protect facial privacy in photos currently exist. However, many of these solutions do not give a choice to bystanders because they rely on algorithms that de-identify photos or protocols to deactivate devices and systems not controlled by bystanders, thereby being dependent on the …


Scale-Out Algorithm For Apache Storm In Saas Environment, Ravi Kiran Puttaswamy Dec 2018

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

The main appeal of the Cloud is its cost-effective and flexible access to computing power. Apache Storm is a data processing framework used to process streaming data. In our work, we explore the possibility of offering Apache Storm as a software service. Further, we take advantage of the cgroups feature in Storm to divide the computing power of a worker machine into smaller units to be offered to users. We predict that the compute bounds placed on the cgroups could be used to approximate the state of the workflow. We discuss the limitations of the current schedulers in facilitating …


Reducing The Tail Latency Of A Distributed Nosql Database, Jun Wu Dec 2018

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Request latency is an important performance metric of a distributed database, such as the popular Apache Cassandra, because of its direct impact on the user experience. Specifically, the latency of a read or write request is defined as the total time interval from the instant when a user makes the request to the instant when the user receives the response, and it involves not only the actual read or write time at a specific database node, but also various types of latency introduced by the distributed mechanism of the database. Most of the current work focuses only on reducing …


Deep Unsupervised Pixelization, Chu Han, Qiang Wen, Shengfeng He, Qianshu Zhu, Yinjie Tan, Guoqiang Han, Tien-Tsin Wong Dec 2018

Research Collection School Of Computing and Information Systems

In this paper, we present a novel unsupervised learning method for pixelization. Due to the difficulty in creating pixel art, preparing the paired training data for supervised learning is impractical. Instead, we propose an unsupervised learning framework to circumvent such difficulty. We leverage the dual nature of the pixelization and depixelization, and model these two tasks in the same network in a bi-directional manner with the input itself as training supervision. These two tasks are modeled as a cascaded network which consists of three stages for different purposes. GridNet transfers the input image into multi-scale grid-structured images with different aliasing …


Automatically ‘Verifying’ Discrete-Time Complex Systems Through Learning, Abstraction And Refinement, Jingyi Wang, Jun Sun, Shengchao Qin, Cyrille Jegourel Dec 2018

Research Collection School Of Computing and Information Systems

Precisely modeling complex systems like cyber-physical systems is challenging, which often renders model-based system verification techniques like model checking infeasible. To overcome this challenge, we propose a method called LAR to automatically ‘verify’ such complex systems through a combination of learning, abstraction and refinement from a set of system log traces. We assume that log traces and sampling frequency are adequate to capture ‘enough’ behaviour of the system. Given a safety property and the concrete system log traces as input, LAR automatically learns and refines system models, and produces two kinds of outputs. One is a counterexample with a bounded …


Co-Location Resistant Virtual Machine Placement In Cloud Data Centers, Amit Agarwal, Nguyen Binh Duong Ta Dec 2018

Research Collection School Of Computing and Information Systems

Due to the increasing number of avenues for conducting cross-virtual machine (VM) side-channel attacks, the security of public IaaS cloud data centers is a growing concern. These attacks allow an adversary to steal private information from a target user whose VM instance is co-located with that of the adversary. To reduce the probability of malicious co-location, we propose a novel VM placement algorithm called “Previously Co-Located Users First”. We perform a theoretical and empirical analysis of our proposed algorithm to evaluate its resource efficiency and security. Our results, obtained using real-world cloud traces containing millions of VM requests and thousands of …


Early Prediction Of Merged Code Changes To Prioritize Reviewing Tasks, Yuanrui Fan, Xin Xia, David Lo, Shanping Li Dec 2018

Research Collection School Of Computing and Information Systems

Modern Code Review (MCR) has been widely used by open source and proprietary software projects. Inspecting code changes costs reviewers much time and effort since they need to comprehend patches, and many reviewers are often assigned to review many code changes. Note that a code change might eventually be abandoned, which wastes time and effort. Thus, a tool that predicts early on whether a code change will be merged can help developers prioritize changes to inspect, accomplish more given a tight schedule, and not waste reviewing effort on low-quality changes. In this paper, motivated by the above …


Deploying, Improving And Evaluating Edge Bundling Methods For Visualizing Large Graphs, Jieting Wu Nov 2018

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

A tremendous increase in the scale of graphs has been witnessed in a wide range of fields, which demands efficient and effective visualization techniques to help users better understand large graphs. Conventional node-link diagrams are often used to visualize graphs, but excessive edge crossings can easily cause severe visual clutter in the node-link diagram of a large graph. Edge bundling can effectively remedy visual clutter and reveal high-level graph structures. Although significant efforts have been devoted to developing edge bundling, three challenging problems remain. First, edge bundling techniques are often computationally expensive and are not easy to deploy …


Controller Evolution And Divergence: A Software Perspective, Balaji Balasubramaniam Nov 2018

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Successful controllers evolve as they are refined, extended, and adapted to new systems and contexts. This evolution occurs in the controller design and also in its software implementation. Model-based design and controller synthesis can help to synchronize this evolution of design and software, but such synchronization is rarely complete as software tends to also evolve in response to elements rarely present in a control model, leading to mismatches between the control design and the software.

In this thesis, we perform a first-of-its-kind study on the evolution of two popular open-source safety-critical autopilot control software systems, ArduPilot and Paparazzi, to better …


Evoalloy: An Evolutionary Approach For Analyzing Alloy Specifications, Jianghao Wang Nov 2018

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Using mathematical notations and logical reasoning, formal methods precisely define a program’s specifications, from which we can instantiate valid instances of a system. With these techniques, we can perform a variety of analysis tasks to verify system dependability and rigorously prove the correctness of system properties. While there exist well-designed automated verification tools, including ones considered lightweight, they still lack strong adoption in practice. The essence of the problem is that when applied to large real-world applications, they are not scalable and applicable due to the expense of a thorough verification process. In this thesis, I present a new …


Delta Debugging Microservice Systems, Xiang Zhou, Xin Peng, Tao Xie, Jun Sun, Wenhai Li, Chao Ji, Dan Ding Nov 2018

Research Collection School Of Computing and Information Systems

Debugging microservice systems involves the deployment and manipulation of microservice systems in a containerized environment and faces unique challenges due to the high complexity and dynamism of microservices. To address these challenges, in this paper, we propose a debugging approach for microservice systems based on the delta debugging algorithm, which minimizes failure-inducing deltas of circumstances (e.g., deployment, environmental configurations) for effective debugging. Our approach includes novel techniques for defining, deploying/manipulating, and executing deltas following the idea of delta debugging. In particular, to construct a (failing) circumstance space for delta debugging to minimize, our approach defines a set of …


Multi-Robot Coordination And Scheduling For Deactivation & Decommissioning, Sebastian A. Zanlongo Nov 2018

FIU Electronic Theses and Dissertations

Large quantities of high-level radioactive waste were generated during WWII. This waste is being stored in facilities such as double-shell tanks in Washington and the Waste Isolation Pilot Plant in New Mexico. Due to the dangerous nature of radioactive waste, these facilities must undergo periodic inspections to ensure that leaks are detected quickly. In this work, we provide a set of methodologies to aid in the monitoring and inspection of these hazardous facilities. This allows dangerous regions to be inspected without a human operator, as well as locations that a person would not be physically able to enter.

First, …


A Comprehensive Framework To Replicate Process-Level Concurrency Faults, Supat Rattanasuksun Nov 2018

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Concurrency faults are one of the most damaging types of faults that can affect the dependability of today’s computer systems. Currently, concurrency faults such as process-level races, order violations, and atomicity violations represent the largest class of faults that has been reported to various Linux bug repositories. Clearly, existing approaches for testing such faults during software development processes are not adequate as these faults escape in-house testing efforts and are discovered during deployment and must be debugged.

The main reason concurrency faults are hard to test is that the conditions that allow them to occur can be difficult to replicate, …


Supporting Diverse Customers And Prioritized Traffic In Next-Generation Passive Optical Networks, Naureen Hoque Nov 2018

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

The already high demand for more bandwidth usage has been growing rapidly. Access network traffic is usually bursty in nature and the present traffic trend is mostly video-dominant. This motivates the need for higher transmission rates in the system. At the same time, the deployment costs and maintenance expenditures have to be reasonable. Therefore, Passive Optical Networks (PON) are considered promising next-generation access technologies. As the existing PON standards are not suitable to support future-PON services and applications, the FSAN (Full Service Access Network) group and the ITU-T (Telecommunication Standardization Sector of the International Telecommunication Union) have worked on developing …


Mrsh-Mem: Approximate Matching On Raw Memory Dumps, Lorenz Liebler, Frank Breitinger Nov 2018

Electrical & Computer Engineering and Computer Science Faculty Publications

This paper presents the fusion of two subdomains of digital forensics: (1) raw memory analysis and (2) approximate matching. Specifically, this paper describes a prototype implementation named MRSH-MEM that can compare hard drive images as well as memory dumps and can therefore answer the question of whether a particular program (installed on a hard drive) is currently running or loaded in memory. To answer this question, we only require both dumps or access to a public repository which provides the binaries to be tested. For our prototype, we modified an existing approximate matching algorithm named MRSH-NET and combined it with …


Is There Space For Violence?: A Data-Driven Approach To The Exploration Of Spatial-Temporal Dimensions Of Conflict, Tin Seong Kam, Vincent Zhi Nov 2018

Research Collection School Of Computing and Information Systems

With recent increases in incidences of political violence globally, the world has now become more uncertain and less predictable. Of particular concern is the case of violence against civilians, who are often caught in the crossfire between armed state or non-state actors. Classical methods of studying political violence and international relations need to be updated. Adopting the use of data analytic tools and techniques of studying big data would enable academics and policy makers to make sense of a rapidly changing world.


An Interpretable Neural Fuzzy Inference System For Predictions Of Underpricing In Initial Public Offerings, Di Wang, Xiaolin Qian, Chai Quek, Ah-Hwee Tan, Chunyan Miao, Xiaofeng Zhang, Geok See Ng, You Zhou Nov 2018

Research Collection School Of Computing and Information Systems

Due to their aptitude in both accurate data processing and human-comprehensible reasoning, neural fuzzy inference systems have been widely adopted in various application domains as decision support systems. Especially in real-world scenarios such as decision making in financial transactions, human experts may be more interested in knowing the comprehensive reasons for certain advice provided by a decision support system, in addition to how confident the system is in such advice. In this paper, we apply an integrated autonomous computational model termed genetic algorithm and rough set incorporated neural fuzzy inference system (GARSINFIS) to predict underpricing in initial public …


Optical Wireless Data Center Networks, Abdelbaset S. Hamza Oct 2018

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Bandwidth- and computation-intensive Big Data applications in disciplines like social media, bio- and nano-informatics, Internet-of-Things (IoT), and real-time analytics are pushing existing access and core (backbone) networks as well as Data Center Networks (DCNs) to their limits. Next-generation DCNs must support continuously increasing network traffic while satisfying minimum performance requirements of latency, reliability, flexibility and scalability. Therefore, a larger number of cables (i.e., copper cables and fiber optics) may be required in conventional wired DCNs. In addition to limiting the possible topologies, a large number of cables may result in design and development problems related to wire ducting and maintenance, heat …


Keyword-Based Patent Citation Prediction Via Information Theory, Farshad Madani, Martin Zwick, Tugrul U. Daim Oct 2018

Engineering and Technology Management Faculty Publications and Presentations

Patent citation shows how a technology impacts other inventions, so the number of patent citations (backward citations) is used in many technology prediction studies. Current prediction methods use patent citations, but since it may take a long time until a patent is cited by other inventors, identifying impactful patents based on their citations is not effective. The prediction method offered in this article predicts patent citations based on the content of patents. In this research, Reconstructability Analysis (RA), which is based on information theory and graph theory, is applied to predict patent citations based on keywords extracted from …


A Dexterous, Glove-Based Teleoperable Low-Power Soft Robotic Arm For Delicate Deep-Sea Biological Exploration, Brennan T. Phillips, Kaitlyn P. Becker, Shunichi Kurumaya, Kevin C. Galloway, Griffin Whittredge, Daniel M. Vogt, Clark B. Teeple, Michelle H. Rosen, Vincent A. Pieribone, David F. Gruber, Robert J. Wood Oct 2018

Publications and Research

Modern marine biologists seeking to study or interact with deep-sea organisms are confronted with few options beyond industrial robotic arms, claws, and suction samplers. This limits biological interactions to a subset of “rugged” and mostly immotile fauna. As the deep sea is one of the most biologically diverse and least studied ecosystems on the planet, there is much room for innovation in facilitating delicate interactions with a multitude of organisms. The biodiversity and physiology of shallow marine systems, such as coral reefs, are common study targets due to the easier nature of access; SCUBA diving allows for in situ delicate …


Indutivo: Contact-Based, Object-Driven Interactions With Inductive Sensing, Jun Gong, Xin Yang, Teddy Seyed, Josh Urban Davis, Xing-Dong Yang Oct 2018

Dartmouth Scholarship

We present Indutivo, a contact-based inductive sensing technique for contextual interactions. Our technique recognizes conductive objects (metallic primarily) that are commonly found in households and daily environments, as well as their individual movements when placed against the sensor. These movements include sliding, hinging, and rotation. We describe our sensing principle and how we designed the size, shape, and layout of our sensor coils to optimize sensitivity, sensing range, recognition and tracking accuracy. Through several studies, we also demonstrated the performance of our proposed sensing technique in environments with varying levels of noise and interference conditions. We conclude by presenting demo …


Phasorsec: Protocol Security Filters For Wide Area Measurement Systems, Prashant Anantharaman, Kartik Palani, Rafael Brantley, Sergey Bratus, Sean W. Smith Oct 2018

Dartmouth Scholarship

The addition of synchrophasors to the power grid to improve observability comes at the cost of an increased attack surface: the wide area measurement system. A common source of zero-days that can be used to exploit the system is improper input validation. The strict availability and timing requirements of the grid make it critical that input validation be done right and in a timely fashion. PhasorSec is a hardened security filter for the synchrophasor communication protocol, C37.118. PhasorSec is built using language-theoretic principles, which treat all input as a language with a specific grammar that defines what input must …


Battery-Free Eye Tracker On Glasses, Tianxing Li, Xia Zhou Oct 2018

Dartmouth Scholarship

This paper presents a battery-free wearable eye tracker that tracks both the 2D position and diameter of a pupil based on its light absorption property. With a few near-infrared (NIR) lights and photodiodes around the eye, NIR lights sequentially illuminate the eye from various directions while photodiodes sense spatial patterns of reflected light, which are used to infer the pupil’s position and diameter on the fly via a lightweight inference algorithm. The system also exploits characteristics of different eye movement stages and adjusts its sensing and computation accordingly for further energy savings. A prototype is built with off-the-shelf hardware components and …


Wireless Sensor Networks For Smart Communications, Mu Zhou, Qilian Liang, Hongyi Wu, Weixiao Meng, Kunjie Xu Oct 2018

Electrical & Computer Engineering Faculty Publications

(First paragraph) In the first edition of the special issue titled “Wireless Sensor Networks for Smart Communications”, a total of 22 manuscripts were received and 6 of these were accepted. This issue demonstrated that network congestion, user mobility, and adjacent spectrum interference are the main reasons for the degradation of communication quality in Wireless Sensor Networks (WSNs).


Interpretable Multimodal Retrieval For Fashion Products, Lizi Liao, Xiangnan He, Bo Zhao, Chong-Wah Ngo, Tat-Seng Chua Oct 2018

Research Collection School Of Computing and Information Systems

Deep learning methods have been successfully applied to fashion retrieval. However, the latent meaning of learned feature vectors hinders the explanation of retrieval results and integration of user feedback. Fortunately, there are many online shopping websites organizing fashion items into hierarchical structures based on product taxonomy and domain knowledge. Such structures help to reveal how humans perceive the relatedness among fashion products. Nevertheless, incorporating structural knowledge for deep learning remains a challenging problem. This paper presents techniques for organizing and utilizing the fashion hierarchies in deep learning to facilitate the reasoning of search results and user intent. The novelty of …


Efficient Attribute-Based Encryption With Blackbox Traceability, Shengmin Xu, Guomin Yang, Yi Mu, Ximeng Liu Oct 2018

Research Collection School Of Computing and Information Systems

Traitor tracing schemes can be used to identify a decryption key that is illegally used in public-key encryption. In CCS’13, Liu et al. proposed an attribute-based traitor tracing (ABTT) scheme with blackbox traceability which can trace decryption keys embedded in a decryption blackbox/device rather than tracing a well-formed decryption key. However, the existing ABTT schemes with blackbox traceability are based on composite-order groups, and the size of the decryption key depends on the policies and the number of system users. In this paper, we revisit blackbox ABTT and introduce a new primitive called attribute-based set encryption (ABSE) based on key-policy …


Augmenting And Structuring User Queries To Support Efficient Free-Form Code Search, Raphael Sirres, Tegawendé F. Bissyande, Dongsun Kim, David Lo, Jacques Klein, Kisub Kim, Yves Le Traon Oct 2018

Research Collection School Of Computing and Information Systems

Source code terms such as method names and variable types are often different from conceptual words mentioned in a search query. This vocabulary mismatch problem can make code search inefficient. In this paper, we present COde voCABUlary (CoCaBu), an approach to resolving the vocabulary mismatch problem when dealing with free-form code search queries. Our approach leverages common developer questions and the associated expert answers to augment user queries with the relevant, but missing, structural code entities in order to improve the performance of matching relevant code examples within large code repositories. To instantiate this approach, we build GitSearch, a code …