Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Computer Sciences

Technological University Dublin

Cosine similarity

Articles 1 - 2 of 2

Full-Text Articles in Computer Engineering

Feature Augmentation For Improved Topic Modeling Of Youtube Lecture Videos Using Latent Dirichlet Allocation, Nakul Srikumar Jan 2021

Dissertations

The application of topic models to text mining of educational data, and more specifically to the text obtained from lecture videos, is a largely unexplored area of research that holds great potential. This work seeks empirical evidence that topic modeling improves when bigram tokens are pre-extracted and added as additional features to the Latent Dirichlet Allocation (LDA) algorithm, a widely recognized topic modeling technique. The dataset analyzed is a collection of transcripts of video lectures on Machine Learning scraped from YouTube. Using cosine similarity as the evaluation metric, the experiment showed …
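As a rough illustration of the two ideas named in this abstract, the following Python sketch appends frequent bigrams to a token list and compares two bag-of-words vectors by cosine similarity. It is not the dissertation's code: the function names, the `min_count` threshold, and the choice to count bigrams within a single transcript are all assumptions made for illustration, and the LDA step itself is omitted.

```python
from collections import Counter
import math


def augment_with_bigrams(tokens, min_count=2):
    """Return tokens plus every adjacent-word pair seen at least min_count times.

    Hypothetical helper: the threshold and the underscore-joined naming
    are illustrative choices, not taken from the dissertation.
    """
    pairs = list(zip(tokens, tokens[1:]))
    counts = Counter(pairs)
    bigrams = [f"{a}_{b}" for a, b in pairs if counts[(a, b)] >= min_count]
    return tokens + bigrams


def cosine_similarity(u, v):
    """Cosine similarity between two sparse count vectors (dicts or Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    transcript = "machine learning is fun machine learning".split()
    features = augment_with_bigrams(transcript)
    print(features)
    print(cosine_similarity(Counter(features), Counter(transcript)))
```

The augmented token lists would then be passed to whichever LDA implementation is in use in place of the plain unigram lists.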


Content-Based Filtering Recommendation Approach To Label Irish Legal Judgements, Sandesh Gangadhar Jan 2020

Dissertations

Machine learning approaches are applied across several domains to simplify or automate tasks, directly saving time or cost. Text document labelling is one such task: it requires immense human knowledge of the domain and effort to review, understand and label the documents. The company Stare Decisis summarises legal judgements and labels them as they are made available on the Irish public legal source www.courts.ie. This research presents a recommendation-based approach to reduce the time solicitors at Stare Decisis spend on labelling by narrowing the large number of available labels down to a concentrated few that potentially contain the …
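The idea of narrowing many labels down to a few candidates can be sketched as a simple content-based ranking. The Python example below scores each label by the cosine similarity between a judgement's word counts and a short text profile for that label, then returns the top `k`. This is only an assumed reading of the approach: the function name `recommend_labels`, the raw word-count representation, the label profiles and the sample labels are all hypothetical and do not come from the dissertation or from Stare Decisis.

```python
from collections import Counter
import math


def _cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_labels(document, label_profiles, k=3):
    """Return the k labels whose profile text is most similar to the document.

    label_profiles maps a label name to representative text for that label
    (a hypothetical structure chosen for this sketch). Ties are broken
    alphabetically so the result is deterministic.
    """
    doc_vec = Counter(document.lower().split())
    scored = [
        (_cosine(doc_vec, Counter(text.lower().split())), label)
        for label, text in label_profiles.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [label for _, label in scored[:k]]


if __name__ == "__main__":
    # Made-up label profiles, for illustration only.
    profiles = {
        "contract": "breach of contract damages",
        "criminal": "sentence conviction appeal",
        "family": "custody child maintenance",
    }
    print(recommend_labels("appeal against conviction and sentence", profiles, k=2))
```

A reviewer would then choose from the short ranked list rather than from the full label set.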