Figure: BLEU and ROUGE-1 scores of the model outputs on the Reddit text summarization task

Abstract

Summaries play a vital role in today’s vast online consumption of information. However, not all content has meaningful summaries, especially on crucial information channels such as social networking sites. One such site is Reddit, which, unlike many other channels, provides a secure platform that lets users share helpful information while remaining anonymous. In this project, we seek to create a deep neural network that automatically summarizes Reddit submissions.

Three models were explored: an LSTM-based sequence-to-sequence (Seq2Seq) model and two sizes of the Text-to-Text Transfer Transformer (T5-Small and T5-Large). We found that T5-Large provides the best summaries overall, although the Seq2Seq model may produce better summaries in some cases. In terms of performance metrics, T5-Small achieved the best BLEU score (12.88%), while T5-Large achieved the highest ROUGE-1 score (7.03%). The framework and results of this project may be extended to other use cases, such as medical encoding, financial research, and video scripting, where they could help stakeholders by augmenting the text summarization work performed in those industries.
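As a rough illustration of what the ROUGE-1 metric measures, the sketch below computes unigram-overlap precision, recall, and F1 between a reference summary and a generated summary. It is not the project's evaluation code. The abstract does not say which toolkit, tokenization, or ROUGE-1 variant (recall or F1) was used, so lowercasing with whitespace tokenization is an assumption here, and the example sentences are invented for illustration.

```python
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict[str, float]:
    """Unigram-overlap ROUGE-1 precision, recall, and F1.

    Assumption: lowercase + whitespace tokenization. Established ROUGE
    toolkits may additionally strip punctuation or apply stemming.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a unigram is credited at most min(count_ref, count_cand) times.
    overlap = sum((ref_counts & cand_counts).values())
    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical reference and model-generated summaries.
    reference = "my landlord kept my security deposit"
    candidate = "landlord kept the deposit"
    print(rouge1(reference, candidate))
```

In this toy pair, three of the four candidate words appear in the reference, so precision is 0.75 and recall is 0.5. BLEU, by contrast, combines clipped n-gram precisions (typically up to 4-grams) with a brevity penalty, which is one reason the two metrics can rank models differently.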