Articles 1 - 30 of 34
Full-Text Articles in Systems Architecture
Automated Program Profiling And Analysis For Managing Heterogeneous Memory Systems, Adam Palmer Howard
Masters Theses
Many promising memory technologies, such as non-volatile, storage-class memories and high-bandwidth, on-chip RAMs, are beginning to emerge. Since each of these new technologies presents tradeoffs distinct from conventional DRAMs, next-generation systems are likely to include multiple tiers of memory storage, each with its own type of device. To efficiently utilize the available hardware, such systems will need to alter their data management strategies to consider the performance and capabilities provided by each tier.
This work explores a variety of cross-layer strategies for managing application data in heterogeneous memory systems. We propose different program profiling-based techniques to automatically partition program allocation …
Design And Implementation Of A Stand-Alone Tool For Metabolic Simulations, Milad Ghiasi Rad
Department of Computer Science and Engineering: Dissertations, Theses, and Student Research
In this thesis, we present the design and implementation of a stand-alone tool for metabolic simulations. The system integrates custom-built SBML models with external user input and produces estimates of any reactant participating in the chain of reactions in the provided model, e.g., ATP, glucose, insulin, for the given duration using numerical analysis and simulation. The tool accepts food intake arguments in the calculations so that personalized metabolic characteristics are considered in the simulations. The tool has also been generalized to take temporal genomic information into consideration and be flexible for simulation …
Reengineering A 2-Tier Database Application With Software Architecture, Hong G. Jung
ASA Multidisciplinary Research Symposium
The purpose of this research is to demonstrate how to reengineer a legacy database application into a target system with MVC and 3-layered architecture. A Coffee Inventory Management database application serves as the legacy application. The benefits of the reengineering are discussed.
Software Reengineering: Reverse Engineering With Using 4+1 Architectural Views And Forward Engineering With MVC Architecture, Shane Mueller
ASA Multidisciplinary Research Symposium
As software complexity is increasing exponentially in our modern era, software architecture becomes increasingly important. The separation of concerns through architecture allows each programmer to limit the understanding they need to the portion of code for which they are responsible, thus saving large amounts of time.
AdaFT: A Resource-Efficient Framework For Adaptive Fault-Tolerance In Cyber-Physical Systems, Ye Xu
Doctoral Dissertations
Cyber-physical systems frequently have to use massive redundancy to meet application requirements for high reliability. While such redundancy is required, it can be activated adaptively, based on the current state of the controlled plant. Most of the time the physical plant is in a state that allows for a lower level of fault-tolerance. Avoiding the continuous deployment of massive fault-tolerance will greatly reduce the workload of CPSs. In this dissertation, we demonstrate a software simulation framework (AdaFT) that can automatically generate the sub-spaces within which our adaptive fault-tolerance can be applied. We also show the theoretical benefits of AdaFT, and …
Spring11: PDC In CS1/2 And A Mobile/Cloud Intermediate Software Design Course, Joseph P. Kaylor, Konstantin Läufer, Chandra N. Sekharan, George K. Thiruvathukal
Konstantin Läufer
Recent changes in the environment of Loyola University Chicago’s Department of Computer Science include a better differentiation of our four undergraduate majors, growing interest in computing among science majors, and an increased demand for graduates with mobile and cloud skills. In our continued effort to incorporate parallel and distributed computing topics into the undergraduate curriculum, we are focusing on these three existing courses: CS1: In response to a request from the physics department, we started to offer a CS1 section aimed at majors in physics and other hard sciences this spring semester. This section includes some material on numerical methods …
Building Capable, Energy-Efficient, Flexible Visualization And Sensing Clusters From Commodity Tablets, Thomas Delgado Dias, Xian Yan, Konstantin Läufer, George K. Thiruvathukal
Konstantin Läufer
We explore the application of clusters of commodity tablet devices to problems spanning a “trilogy” of concerns: visualization, sensing, and computation. We conjecture that such clusters provide a low-cost, energy-efficient, flexible, and ultimately effective platform to tackle a wide range of problems within this trilogy. This is a work in progress, and we now elaborate our position and give a preliminary status report. Android tablet devices are available in a wide range of prices and capabilities. “You get what you pay for” w.r.t. display resolution, sensors, and chipset, corresponding to the trilogy. $200 gets one a 1280x800-pixel touch display, …
Network Technologies Used To Aggregate Environmental Data, Paul Stasiuk, Konstantin Läufer, George K. Thiruvathukal
Konstantin Läufer
The goal of the Loyola Weather Service (LWS) project is to design and build a system of functioning environmental monitoring widgets that can intelligently and autonomously control the environment around them based on set thresholds and triggers. The widgets will also be able to aggregate their data and display it in various ways: through a user interface in the room where the widget is placed, via a web application, and programmatically via a RESTful web service.
Open Source Artificial Intelligence In A Biological/Ecological Context, Trevor Grant
Annual Symposium on Biomathematics and Ecology Education and Research
No abstract provided.
Green Cloud - Load Balancing, Load Consolidation Using VM Migration, Manh Duc Do
Masters Theses & Specialist Projects
Cloud computing has recently emerged as a trend in computer technology, with massive demand from clients. To meet this demand, many cloud data centers have been constructed since 2008, when Amazon published its cloud service. The rapid growth of data centers leads to the consumption of a tremendous amount of energy: even though cloud computing has improved in performance and energy consumption, cloud data centers still absorb an immense amount of energy. To raise their annual income, cloud providers have started considering green cloud concepts, which give an idea of how to optimize CPU usage …
Ideas & Graphs, Martin Zwick
Systems Science Faculty Publications and Presentations
A graph can specify the skeletal structure of an idea, onto which meaning can be added by interpreting the structure.
This paper considers graphs (but not hypergraphs) consisting of four nodes, and suggests meanings that can be associated with several different directed and undirected graphs.
Drawing on Bennett's "systematics," specifically on the Tetrad that systematics offers as a model of 'activity,' the analysis here shows that the Tetrad is a versatile model of problem-solving, regulation and control, and other processes.
Ancr—An Adaptive Network Coding Routing Scheme For Wsns With Different-Success-Rate Links †, Xiang Ji, Anwen Wang, Chunyu Li, Chun Ma, Yao Peng, Dajin Wang, Qingyi Hua, Feng Chen, Dingyi Fang
Department of Computer Science Faculty Scholarship and Creative Works
As the underlying infrastructure of the Internet of Things (IoT), wireless sensor networks (WSNs) have been widely used in many applications. Network coding is a technique in WSNs that combines multiple channels of data in one transmission, wherever possible, to save nodes’ energy as well as increase the network throughput. So far, most works on network coding rely on two assumptions to determine coding opportunities: (1) all the links in the network have the same transmission success rate; (2) each link is bidirectional and has the same transmission success rate in both directions. However, these assumptions may not be …
DeshadowNet: A Multi-Context Embedding Deep Network For Shadow Removal, Liangqiong Qu, Jiandong Tian, Shengfeng He, Yandong Tang, Rynson W. H. Lau
Research Collection School Of Computing and Information Systems
Shadow removal is a challenging task as it requires the detection/annotation of shadows as well as semantic understanding of the scene. In this paper, we propose an automatic and end-to-end deep neural network (DeshadowNet) to tackle these problems in a unified manner. DeshadowNet is designed with a multi-context architecture, where the output shadow matte is predicted by embedding information from three different perspectives. The first global network extracts shadow features from a global view. Two levels of features are derived from the global network and transferred to two parallel networks. While one extracts the appearance of the input image, the …
Design Of BATRUN Distributed Processing System, Fredy Tandiary, Suraj C. Kothari, Ashish Dixit
Suraj Kothari
This paper discusses the design of the BATRUN Distributed Processing System (DPS). We have developed this system to automate the execution of jobs in a cluster of workstations where machines belong to different owners. The objective is to use a general purpose cluster as one massive computer for processing large applications. In contrast to a dedicated cluster, the scheduling in BATRUN DPS must ensure that only idle cycles are used for distributed computing and that local users, when they are operating, have full control of their machines. BATRUN DPS has several unique features: (1) group-based scheduling policy to ensure execution …
Quo Vadis-A Framework For Intelligent Routing In Large Communication Networks., Armin Mikler, Johnny S. Wong, Vasant Honavar
Johnny Wong
This paper presents Quo Vadis, an evolving framework for intelligent traffic management in very large communication networks. Quo Vadis is designed to exploit topological properties of large networks as well as their spatio-temporal dynamics to optimize multiple performance criteria through cooperation among nodes in the network. It employs a distributed representation of network state information using local load measurements supplemented by a less precise global summary. Routing decisions in Quo Vadis are based on parameterized heuristics designed to optimize various performance metrics in an anticipatory or pro-active as well as compensatory or reactive mode and to minimize the overhead associated …
An Object Oriented Approach To Modeling And Simulation Of Routing In Large Communication Networks, Armin Mikler, Johnny S. Wong, Vasant Honavar
Johnny Wong
The complexity (number of entities, interactions between entities, and resulting emergent dynamic behavior) of large communication environments that contain hundreds of nodes and links makes simulation an important tool for the study of such systems. Given the difficulties associated with complete analytical treatment of complex dynamical systems, it is often the only practical tool that is available. This paper presents an example of a flexible, modular, object-oriented toolbox designed to support modeling and experimental analysis of a large family of heuristic knowledge representation and decision functions for adaptive self-managing communication networks with particular emphasis on routing strategies. It discusses in …
Quo Vadis - Adaptive Heuristics For Routing In Large Communication Networks, Armin Mikler, Johnny S. Wong, Vasant Honavar
Johnny Wong
This paper presents Quo Vadis, an evolving framework for intelligent traffic management in very large communication networks. Quo Vadis is designed to exploit topological properties of large networks as well as their spatio-temporal dynamics to optimize multiple performance criteria through cooperation among nodes in the network. It employs a distributed representation of network state information using local load measurements supplemented by a less precise global summary. Routing decisions in Quo Vadis are based on parameterized heuristics designed to optimize various performance metrics in an anticipatory or pro-active as well as compensatory or reactive mode and to minimize the overhead associated …
Utility-Theoretic Heuristics For Intelligent Adaptive Routing In Large Communication Networks, Armin Mikler, Vasant Honavar, Johnny S. Wong
Johnny Wong
Utility theory offers an elegant and powerful theoretical framework for design and analysis of autonomous adaptive communication networks. Routing of messages in such networks presents a real-time instance of a multi-criterion quasi-optimization problem in a dynamic and uncertain environment. In this paper, we examine several heuristic decision functions that can be used to guide messages along a near-optimal (e.g., minimum delay) path in a large network. We present an analysis of properties of such heuristics under a set of simplifying assumptions about the network topology and load dynamics. In particular, we identify the conditions under which one such utility-theoretic heuristic …
Custom T-Shirt Designs, Ranjan Khadka
Electronic Theses, Projects, and Dissertations
Custom T-shirt Designs is a web-based application. The purpose of this project is to provide a website that allows customers to customize a T-shirt and place an order for it. Customers can sign up, sign in, select a T-shirt color, add text, choose a font, choose a font color, upload an image, apply filters to images, transform text or images, choose a T-shirt size, and save designs for future reference. Customers can add the design to a cart, manage the cart, check out with their credit card to pay for the order, and view their previous orders. The …
Exploiting Android System Services Through Bypassing Service Helpers, Yachong Gu, Yao Cheng, Lingyun Ying, Yemian Lu, Qi Li, Purui Su
Research Collection School Of Computing and Information Systems
Android allows applications to communicate with system services via system service helpers so that applications can use various functions wrapped in the system services. Meanwhile, system services leverage the service helpers to enforce security mechanisms, e.g., input parameter validation, to protect themselves against attacks. However, service helpers can be easily bypassed, which poses severe security and privacy threats to system services, e.g., privilege escalation, function execution without users’ interactions, system service crash, and DoS attacks. In this paper, we perform the first systematic study on such vulnerabilities and investigate their impacts. We develop a tool to analyze all system services …
Switching Between Page Replacement Algorithms Based On Work Load During Runtime In Linux Kernel, Praveen Subramaniyam
Master's Projects
Today’s computers are equipped with multiple processor cores to execute multiple programs effectively at the same time. This increase in the number of cores must be matched by a large amount of physical memory to keep multiple applications in memory at once and to switch between them effectively, without being slowed by low-speed disk storage. Physical memory has become so cheap that computer systems are equipped with enough of it to run most applications effectively. Along with the memory, the sizes …
Programming Models' Support For Heterogeneous Architecture, Wei Wu
Doctoral Dissertations
Accelerator-enhanced computing platforms have drawn a lot of attention due to their massive peak computational capacity. Heterogeneous systems equipped with accelerators such as GPUs have become the most prominent components of High Performance Computing (HPC) systems. Even at the node level, the significant heterogeneity of CPU and GPU, i.e., hardware and memory space differences, leads to challenges for fully exploiting such complex architectures. Extending beyond the node scope only escalates such challenges.
Conventional programming models such as dataflow and message passing have been widely adopted in HPC communities. When moving towards heterogeneous systems, the lack of GPU integration causes …
Characterizing And Improving Power And Performance In HPC Networks, Taylor L. Groves
Computer Science ETDs
Networks are the backbone of modern HPC systems. They serve as a critical piece of infrastructure, tying together applications, analytics, storage and visualization. Despite this importance, we have not fully explored how evolving communication paradigms and network design will impact scientific workloads. As networks expand in the race towards Exascale (1×10^18 floating point operations a second), we need to reexamine this relationship so that the HPC community better understands (1) characteristics and trends in HPC communication; (2) how to best design HPC networks to save power or enhance the performance; (3) how to facilitate scalable, informed, and dynamic decisions within …
MACHS: Mitigating The Achilles Heel Of The Cloud Through High Availability And Performance-Aware Solutions, Manar Jammal
Electronic Thesis and Dissertation Repository
Cloud computing is continuously growing as a business model for hosting information and communication technology applications. However, many concerns arise regarding the quality of service (QoS) offered by the cloud. One major challenge is the high availability (HA) of cloud-based applications. The key to achieving availability requirements is to develop an approach that is immune to cloud failures while minimizing service level agreement (SLA) violations. To this end, this thesis addresses the HA of cloud-based applications from different perspectives. First, the thesis proposes a component HA-aware scheduler (CHASE) to manage the deployments of carrier-grade cloud applications while maximizing their …
Improving Discovery And Patron Experience Through Data Mining, Boyuan Guan, Jamie Rogers
Works of the FIU Libraries
As information professionals, we know simple database searches are imperfect. With rich and expansive digital collections, patrons may not find content that is buried in a long list of results. So, how do we improve discovery of pertinent materials and offer a serendipitous experience? Following the example of recommendation functionality in online applications like Netflix, we have developed a recommendation function for our digital library system that provides relevant content beyond the narrow scope of patrons' original search parameters. This session will outline the reasoning, methodology, and design of the recommendation system as well as preliminary results from implementation.
Design And Implementation Of An RFID-Based Customer Shopping Behavior Mining System, Zimu Zhou, Longfei Shangguan, Xiaolong Zheng, Lei Yang, Yunhao Liu
Research Collection School Of Computing and Information Systems
Shopping behavior data is of great importance in understanding the effectiveness of marketing and merchandising campaigns. Online clothing stores are capable of capturing customer shopping behavior by analyzing the click streams and customer shopping carts. Retailers with physical clothing stores, however, still lack effective methods to comprehensively identify shopping behaviors. In this paper, we show that backscatter signals of passive RFID tags can be exploited to detect and record how customers browse stores, which garments they pay attention to, and which garments they usually pair up. The intuition is that the phase readings of tags attached to items will demonstrate …
Neural Collaborative Filtering, Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, Tat-Seng Chua
Research Collection School Of Computing and Information Systems
In recent years, deep neural networks have yielded immense success on speech recognition, computer vision and natural language processing. However, the exploration of deep neural networks on recommender systems has received relatively less scrutiny. In this work, we strive to develop techniques based on neural networks to tackle the key problem in recommendation, collaborative filtering, on the basis of implicit feedback. Although some recent work has employed deep learning for recommendation, that work primarily used it to model auxiliary information, such as textual descriptions of items and acoustic features of music. When it comes to modeling the key factor in …
A Sandbox In Which To Learn And Develop Soar Agents, Daniel Lugo
Theses and Dissertations
It is common for military personnel to leverage simulations (and simulators) as cost-effective tools to train and become proficient at various tasks (e.g., flying an aircraft and/or performing a mission, among others). These training simulations often need to represent humans within the simulated world in a realistic manner. Realistic implies creating simulated humans that exhibit behaviors that mimic real-world decision making and actions. Typically, to create the decision-making logic, techniques developed from the domain of artificial intelligence are used. Although there are several approaches to developing intelligent agents, we focus on leveraging an open source project called Soar to define …
On Leveraging Multi-Path Transport In Mobile Networks, Yeon-Sup Lim
Doctoral Dissertations
Multi-Path TCP (MPTCP) is a new transport protocol that enables mobile devices to simultaneously use several physical paths through multiple network interfaces. MPTCP is particularly useful for mobile devices, which usually have multiple wireless interfaces such as IEEE 802.11 (WiFi), cellular (3G/LTE), and Bluetooth. However, applying MPTCP to mobile devices introduces new concerns since they operate in harsh environments with resource constraints due to intermittent path availability and limited power supply. The goal of this thesis is to resolve these problems so as to be able to practically deploy MPTCP in mobile devices. The first part of the thesis develops …
Detecting Similar Repositories On GitHub, Yun Zhang, David Lo, Pavneet Singh Kochhar, Xin Xia, Quanlai Li, Jianling Sun
Research Collection School Of Computing and Information Systems
GitHub contains millions of repositories, many of which are similar to one another (i.e., having similar source code or implementing similar functionalities). Finding similar repositories on GitHub can be helpful for software engineers as it can help them reuse source code, build prototypes, identify alternative implementations, explore related projects, find projects to contribute to, and discover code theft and plagiarism. Previous studies have proposed techniques to detect similar applications by analyzing API usage patterns and software tags. However, these prior studies either only make use of a limited source of information or use information not available for projects on GitHub. …