Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Data Storage Systems

Articles 1 - 30 of 43

Full-Text Articles in Computer Engineering

Data Management In Cloud Environments: Nosql And Newsql Data Stores, Katarina Grolinger, Wilson A. Higashino, Abhinav Tiwari, Miriam Am Capretz Dec 2013

Electrical and Computer Engineering Publications

Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and are facing challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volumes of data. Because of the …


Disaster Data Management In Cloud Environments, Katarina Grolinger Dec 2013

Electronic Thesis and Dissertation Repository

Facilitating decision-making in a vital discipline such as disaster management requires information gathering, sharing, and integration on a global scale and across governments, industries, communities, and academia. A large quantity of immensely heterogeneous disaster-related data is available; however, current data management solutions offer few or no integration capabilities and limited potential for collaboration. Moreover, recent advances in cloud computing, Big Data, and NoSQL have opened the door for new solutions in disaster data management.

In this thesis, a Knowledge as a Service (KaaS) framework is proposed for disaster cloud data management (Disaster-CDM) with the objectives of 1) facilitating information gathering …


Big Data, Predictive Analytics, And Data Visualization In The Construction Engineering, Joseph Shrestha Dec 2013

Joseph Shrestha

The term “big data” is associated with one or more of three characteristics: volume, variety, and velocity. The technologies associated with big data have already been proven in other sectors. Internet giants like Google, Facebook, and Netflix use big data collected from their users to present advertisements, friend recommendations, and TV shows and movies relevant to the specific user. Big data is also used for insurance fraud detection, improving bus systems by reducing congestion, predicting flight arrival times, weather forecasting, and genomic analysis.
For the construction industry, volume and variety become particularly relevant. From project planning to the …


Tascked: The Sanity Promoting Task Manager, Jake Tobin Dec 2013

Computer Science and Software Engineering

Personal task managers or various forms of to-do lists are abundant in our modern computing age. With the explosion of mobile computing technology, it is easier than ever to take notes digitally and make the data seemingly instantly available anywhere on the Internet. There is a fairly well defined core set of features in personal task managers available for public consumption, but it seems nothing that is publicly available provides feedback to the user or suggestions based on user history. Tascked is a task management solution, which records user history and solicits user feedback on progress. This allows the system …


An Open Source, Line Rate Datagram Protocol Facilitating Message Resiliency Over An Imperfect Channel, Christina Marie Smith Dec 2013

Graduate Theses and Dissertations

Remote Direct Memory Access (RDMA) is the transfer of data into buffers between two compute nodes that does not require the involvement of a CPU or Operating System (OS). The idea is borrowed from Direct Memory Access (DMA), which allows memory within a compute node to be transferred without transiting through the CPU. RDMA is termed a zero-copy protocol as it eliminates the need to copy data between buffers within the protocol stack. Because of this and other features, RDMA promotes reliable, high-throughput, low-latency transfer for packet-switched networking. While the benefits of RDMA are well known and …


A Simple Integration Of Social Relationship And Text Data For Identifying Potential Customers In Microblogging, Guansong Pang, Shengyi Jiang, Dongyi Chen Dec 2013

Research Collection School Of Computing and Information Systems

Identifying potential customers among a huge number of users in microblogging is a fundamental problem for microblog marketing. One challenge in potential customer detection in microblogging is how to generate an accurate characteristic description for users, i.e., user profile generation. Intuitively, the preferences of a user’s friends (i.e., the people followed by the user in microblogging) are of great importance in capturing the characteristics of the user. Also, a user’s self-defined tags are often concise and accurate carriers for the user’s interests. In this paper, for identifying potential customers in microblogging, we propose a method to generate user profiles via …


Adaptive Computer‐Generated Forces For Simulator‐Based Training, Expert Systems With Applications, Teck-Hou Teng, Ah-Hwee Tan, Loo-Nin Teow Dec 2013

Research Collection School Of Computing and Information Systems

Simulator-based training is in constant pursuit of increasing levels of realism. The transition from doctrine-driven computer-generated forces (CGF) to adaptive CGF represents one such effort. The use of doctrine-driven CGF is fraught with challenges such as modeling of complex expert knowledge and adapting to the trainees’ progress in real time. Therefore, this paper reports on how the use of adaptive CGF can overcome these challenges. Using a self-organizing neural network to implement the adaptive CGF, air combat maneuvering strategies are learned incrementally and generalized in real time. The state space and action space are extracted from the same hierarchical doctrine …


A Privacy-Aware Distributed Storage And Replication Middleware For Heterogeneous Computing Platform, Jilong Liao Dec 2013

Masters Theses

Cloud computing is an emerging research area that has drawn considerable interest in recent years. However, the current infrastructure raises significant concerns about how to protect users' privacy, in part because users store their data on the cloud vendors' servers. In this paper, we address this challenge by proposing and implementing a novel middleware, called Uno, which separates the storage of physical data and their associated metadata. In our design, users' physical data are stored locally on those devices under a user's full control, while their metadata can be uploaded to the commercial cloud. To ensure the …


D-Tunes: Configuration Engine For Geo-Replicated Cloud Storage, Jiawei Wang, Sanjay Rao Oct 2013

The Summer Undergraduate Research Fellowship (SURF) Symposium

When developing a web-based application, developers face stringent requirements to balance latency, scalability, and availability for their cloud database. Application developers need a specific replication configuration strategy based on the requirements of their application. To deal with this problem, some geo-replicated cloud storage systems, such as Cassandra, have emerged recently. This project serves to design a web tool that can help configure the best replication strategies for geo-distributed data stores that use quorum-based protocols. Currently, our web tool, D-Tunes, requires minimal input from users and generates specific scripts based on the inputs the user provides. The program running these …


Is Real-Time Mobile Content-Based Image Retrieval Feasible?, Colin G. Graber, Anup Mohan, Yung-Hsiang Lu Oct 2013

The Summer Undergraduate Research Fellowship (SURF) Symposium

Content-based image retrieval (CBIR) is a method of searching through a database of images by using another image as a query instead of text. Recent advances in the processing power of smart phones and tablets, collectively known as mobile devices, have prompted researchers to attempt to construct mobile CBIR systems. Most of the research that has been conducted on mobile CBIR has focused on improving either its accuracy or its run-time, but not both simultaneously. We set out to answer the question: is real-time CBIR with manageable accuracy possible on current mobile devices? To find the answer to this question, …


Error Recovered Hierarchical Classification, Shiai Zhu, Xiao-Yong Wei, Chong-Wah Ngo Oct 2013

Research Collection School Of Computing and Information Systems

Hierarchical classification (HC) is a popular and efficient way of detecting semantic concepts in images. However, conventional HC, which always follows the branch with the highest classification response, risks propagating serious errors from higher levels of the hierarchy to the lower levels. We argue that the highest-response-first strategy is too arbitrary, because the candidate nodes are considered individually, which ignores the semantic relationship among them. In this paper, we propose a novel method for HC, which is able to utilize the semantic relationship among candidate nodes and their children to recover …


Image Search By Graph-Based Label Propagation With Image Representation From Dnn, Yingwei Pan, Yao Ting, Kuiyuan Yang, Houqiang Li, Chong-Wah Ngo, Jingdong Wang, Tao Mei Oct 2013

Research Collection School Of Computing and Information Systems

Our objective is to estimate the relevance of an image to a query for image search purposes. We address two limitations of the existing image search engines in this paper. First, there is no straightforward way of bridging the gap between semantic textual queries as well as users’ search intents and image visual content. Image search engines therefore primarily rely on static and textual features. Visual features are mainly used to identify potentially useful recurrent patterns or relevant training examples for complementing search by image reranking. Second, image rankers are trained on query-image pairs labeled by human experts, making the …


Annotation For Free: Video Tagging By Mining User Search Behavior, Yao Ting, Tao Mei, Chong-Wah Ngo, Shipeng Li Oct 2013

Research Collection School Of Computing and Information Systems

The problem of tagging is mostly considered from the perspectives of machine learning and data-driven philosophy. A fundamental issue that underlies the success of these approaches is visual similarity, ranging from nearest neighbor search to manifold learning, to identify similar instances of an example for tag completion. The need to search millions of visual examples in high-dimensional feature space, however, makes the task computationally expensive. Moreover, the results can suffer from robustness problems when the underlying data, such as online videos, are rich in semantics and the similarity is difficult to learn from low-level features. This …


Rapport: Semantic-Sensitive Namespace Management In Large-Scale File Systems, Yu Hua, Hong Jiang, Yifeng Zhu, Dan Feng Sep 2013

Yifeng Zhu

Explosive growth in the volume and complexity of data exacerbates the key challenge of managing data effectively and efficiently in a way that fundamentally improves the ease and efficacy of their use. Existing large-scale file systems rely on a hierarchically structured namespace that leads to severe performance bottlenecks and renders it impossible to support real-time queries on multi-dimensional attributes. This paper proposes a novel semantic-sensitive scheme, called Rapport, to provide dynamic and adaptive namespace management and support complex queries. The basic idea is to build files’ namespace by utilizing their semantic correlation and exploiting dynamic evolution of attributes to support namespace management. …


Web-Scale Near-Duplicate Search: Techniques And Applications, Chong-Wah Ngo, Changsheng Xu, Wessel Kraaij, Abdulmotaleb El Saddik Sep 2013

Research Collection School Of Computing and Information Systems

This paper presents some of the most recent advances in the research on Web-scale near-duplicate search and explores the potential for bringing this research a substantial step further. It contains high-quality contributions addressing various aspects of the Web-scale near-duplicate search problem in a number of relevant domains. The topics range from feature representation, matching, and indexing from different novel aspects to the adaptation of current technologies for mobile media search and photo archaeology mining.


Near-Duplicate Video Retrieval: Current Research And Future Trends, Jiajun Liu, Zi Huang, Hongyun Cai, Heng Tao Shen, Chong-Wah Ngo, Wei Wang Aug 2013

Research Collection School Of Computing and Information Systems

The exponential growth of online videos, along with increasing user involvement in video-related activities, has been observed as a constant phenomenon during the last decade. The time users spend on video capturing, editing, uploading, searching, and viewing has risen to an unprecedented level. The massive publishing and sharing of videos has given rise to an already large amount of near-duplicate content. This imposes urgent demands on near-duplicate video retrieval as a key component of novel tasks such as video search, video copyright protection, video recommendation, and many more. Driven by its significance, near-duplicate video retrieval has recently attracted …


Click-Boosting Random Walk For Image Search Reranking, Xiaopeng Yang, Yongdong Zhang, Ting Yao, Zheng-Jun Zha, Chong-Wah Ngo Aug 2013

Research Collection School Of Computing and Information Systems

Image reranking is an effective way of improving the retrieval performance of keyword-based image search engines. A fundamental issue underlying the success of existing image reranking approaches is the ability to identify potentially useful recurrent patterns or relevant training examples from the initial search results. Ideally, these patterns and examples can be leveraged to upgrade the ranks of visually similar images, which are also likely to be relevant. The challenge, nevertheless, originates from the fact that keyword-based queries tend to be ambiguous, resulting in difficulty in predicting the search intention. Mining useful patterns and examples without understanding the query is …


The Taxation Of Cloud Computing And Digital Content, David Shakow Jul 2013

All Faculty Scholarship

“Cloud computing” raises important and difficult questions in state tax law, and for Federal taxes, particularly in the foreign tax area. As cloud computing solutions are adopted by businesses, items we view as tangible are transformed into digital products. In this article, I will describe the problems cloud computing poses for tax systems. I will show how current law is applied to cloud computing and will identify the difficulties current approaches face as they are applied to this developing technology.

My primary interest is how Federal tax law applies to cloud computing, particularly as the new technology affects international transactions. …


The Smart Way To Manage Research Data, Craig Napier, Despina Clancy, Tim Davies, Katie Elcombe Jun 2013

Craig Napier

The University of Wollongong's $62 million SMART (Simulation, Modelling, Analysis, Research, Teaching) Infrastructure Facility will become a research and development powerhouse with an unprecedented level of impact within the broader infrastructure sector nationally and overseas [1]. With a vision to be a world class intellectual leader and educator in 'integrated' infrastructure planning and management and the capacity to host 200 PhD students, comprising 30 integrated research laboratories, data demands and volume are increasing exponentially.


More Like This, Geoffrey Lawson Jun 2013

Computer Science and Software Engineering

No abstract provided.


Tmc Simulator, Stuart Heater Jun 2013

Computer Science and Software Engineering

The goal of this project was to design and implement a graphical user interface which simulates TriTech's VisiCad Inform computer-aided dispatch well enough for trainees to learn how to efficiently and accurately use the software in a risk-free environment. The simulator should also allow the training proctor to actively create new incidents during training in order to ensure that the trainees are able to respond properly. The structure of this project allowed me to work with both more- and less-experienced programmers, particularly those who are far more experienced with networking and hardware than myself. It was my first time …


Accurate Hardware Raid Simulator, Darrin Kalung Weng Jun 2013

Master's Theses

Computer data storage is growing at an astonishing rate. With cloud computing and the growth of the Internet, enterprise storage has been predicted to grow at rates as high as 300% per year. To fulfill this need, technologies such as Redundant Array of Independent Disks (RAID) are being used in industry today. Not only does RAID increase I/O performance, but it also provides redundancy measures to protect against hardware failure. Even though RAID has existed for some time now and is well understood, proprietary optimizations such as command scheduling and cache strategies that are employed by current RAID controllers are …


Efficient Extraction Of Json Information In Sas Using The Scanover Function, Murphy Choy, Kyong Jin Shim May 2013

Research Collection School Of Computing and Information Systems

JSON, otherwise known as JavaScript Object Notation, is a popular data interchange format which provides a human-readable format. It is language independent and can be read easily in a variety of computer languages. With the rise of Twitter and other unstructured data, there has been a move to incorporate such data as a way of disseminating information. Twitter currently provides a simple API for users to extract tweets using JSON format. While SAS does not currently have a direct way of reading JSON, the SCANOVER function in the SAS data step provides users with a simple and effective approach to getting JSON information into SAS …


Tower Of Babel: A Crowdsourcing Game Building Sentiment Lexicons For Resource-Scarce Languages, Yoonsung Hong, Haewoon Kwak, Youngmin Baek, Sue Moon May 2013

Research Collection School Of Computing and Information Systems

With the growing amount of textual data produced by online social media today, the demand for sentiment analysis is also rapidly increasing, and this is true worldwide. However, non-English languages often lack sentiment lexicons, a core resource in performing sentiment analysis. Our solution, Tower of Babel (ToB), is a language-independent sentiment-lexicon-generating crowdsourcing game. We conducted an experiment with 135 participants to explore the difference between our solution and a conventional manual annotation method. We evaluated ToB in terms of effectiveness, efficiency, and satisfaction. Based on the result of the evaluation, we conclude that sentiment classification via ToB is accurate, …


Vsfs: A Versatile Searchable File System For Hpc Analytics, Lei Xu, Ziling Huang, Hong Jiang, Lei Tian, David Swanson Apr 2013

CSE Technical Reports

Big-data/HPC analytics applications have urgent needs for file-search services to drastically reduce the scale of the input data to accelerate analytics. Unfortunately, the existing solutions either scale poorly to large systems or lack a well-integrated interface that applications can easily use. We propose a distributed searchable file system, VSFS, which provides a novel and flexible POSIX-compatible searchable file system namespace that can be seamlessly integrated with any legacy code without modification. Additionally, to provide real-time indexing and searching performance, VSFS uses a DRAM-based distributed consistent hashing ring to manage all file indices. The results of our evaluation show that VSFS …


Predicting Sql Injection And Cross Site Scripting Vulnerabilities Through Mining Input Sanitization Patterns, Lwin Khin Shar, Hee Beng Kuan Tan Apr 2013

Research Collection School Of Computing and Information Systems

Context: SQL injection (SQLI) and cross site scripting (XSS) have been the two most common and serious web application vulnerabilities over the past decade. To mitigate these two security threats, many vulnerability detection approaches based on static and dynamic taint analysis techniques have been proposed. Alternatively, there are also vulnerability prediction approaches based on machine learning techniques, which showed that static code attributes such as code complexity measures are cheap and useful predictors. However, current prediction approaches target general vulnerabilities, and most of these approaches locate vulnerable code only at software component or file levels. Some approaches also involve process attributes that …


Searching Visual Instances With Topology Checking And Context Modeling, Wei Zhang, Chong-Wah Ngo Apr 2013

Research Collection School Of Computing and Information Systems

Instance Search (INS) is a realistic problem initiated by TRECVID, which is to retrieve all occurrences of the querying object, location, or person from a large video collection. It is a fundamental problem with many applications, and also a challenging problem different from the traditional concept or near-duplicate (ND) search, since the relevancy is defined at instance level. True responses could exhibit various visual variations, such as being small on the image with different background, or showing a non-homography spatial configuration. Based on the Bag-of-Words model, we propose two techniques tailored for Instance Search. Specifically, we explore the use of …


Circular Reranking For Visual Search, Ting Yao, Chong-Wah Ngo Apr 2013

Research Collection School Of Computing and Information Systems

Search reranking is regarded as a common way to boost retrieval precision. The problem is nevertheless not trivial, especially when there are multiple features or modalities to be considered for search, which often happens in image and video retrieval. This paper proposes a new reranking algorithm, named circular reranking, that reinforces the mutual exchange of information across multiple modalities for improving search performance, following the philosophy that a strong-performing modality could learn from weaker ones, while a weak modality benefits from interacting with stronger ones. Technically, circular reranking conducts multiple runs of random walks through exchanging the ranking scores among …


Maximize Your Decision-Making: A Review Of Emerging Technology Trends, Carol Watson Mar 2013

Continuing Legal Education Presentations

Provides an overview of new technologies (both software and hardware) that could become the "next big thing."


Cloud Computing Trace Characterization And Synthetic Workload Generation, Salvatore Capra Mar 2013

Theses and Dissertations

This thesis researches cloud computing workload characteristics and synthetic workload generation. A heuristic presented in the work guides the process of workload trace characterization and synthetic workload generation. Analysis of a cloud trace provides insight into client request behaviors and statistical parameters. A versatile workload generation tool creates client connections, controls request rates, defines the number of jobs, produces tasks within each job, and manages task durations. The test system consists of multiple clients creating workloads and a server receiving requests, all contained within a virtual machine environment. Statistical analysis verifies that the synthetic workload experimental results are consistent with real workload …