Automatic text summarization is a field that is experiencing much interest and research lately because of its many applications (automatically summarizing research papers, long reports, entire books, web pages, news, etc).
There are two ways of attempting an automatic text summarization: extractive and abstractive. The extractive is the easiest one, consisting of a “collage” of sentences extracted from the original document; the abstractive approach on the contrary focuses on obtaining a shorter version of the original text using different words and sentences, rephrasing the text in the same way as a human would do.
It is a difficult task because it implies a sort of semantic “understanding” of the text, something which it’s really borderline with strong AI objectives and that researchers have not yet fully obtained.
Deep learning for automatic summarization
Deep learning technologies resulted to be very promising for this particular task, because they try to mimic the way human brain works, handling several levels of abstraction and non linearly transforming a certain input into a certain output (in this process the output of one layer becomes the input of the other layer and so on). Obviously more are the layers, more is the deepness. Deep neural networks are widely used in NLP problems because their architecture works well with the complex structure of the language, for example each layer can manage a different task and then hand in the output to the next layer.
The LSTM CNN based approach
Long short term memory neural networks (for short LSTM) are a kind of recurrent neural network particularly fitted to sequence prediction problems because they are able to learn order dependencies. They have an internal state that represents context information: they keep “in mind” inputs for a certain, not a priori fixed amount of time, and they can thus generate outputs based on contextual flexible information, because they depend not only on the current state but also on previous inputs. Classical recurrent neural networks (RNN) are not capable of correctly handling the problem (referred to as the “vanishing gradient problem”) represented by the impact of each given input on the network, which can unpredictably be insignificant or massive. LSTM networks were designed to solve this problem and they are so capable of handling successfully “sequential processing task in which we suspect that a hierarchical decomposition may exist, but do not know in advance what this decomposition is” (Felix A. Gers, et al., Learning to Forget: Continual Prediction with LSTM, 2000), just like a text in which the comprehension of what is being said may depend on what is written before but also on what lies ahead in a continuous going back and forth.
For this reason, LSTM networks perform extremely well on complicated problems like speech recognition, translation, speech synthesis, audio and video analysis, and abstractive summarization where there is a high, complex hierarchy between one part of a text or of a sentence and another part.
Convolutional Neural Networks (CNN) are another kind of neural network that are particularly good when used for problems with spatial inputs, like images, videos or also letters (as they have a spatial as well as a symbolic aspect). They are applied to computer vision, object recognition, image classification, and so on.
Abstractive summarization using the LSTM CNN model
LSTM networks are not capable of handling problems where the input is spatial, but are particularly good where sequences are implied. And here is where the combination of the two comes in: a CNN-LSTM model can be used whenever you have inputs that have spatial characteristics but that come not isolated but as a sequence.
The CNN layer is used for feature extraction and then the LSTM layer works on sequence prediction using as inputs the outputs of the CNN layer. More in detail, the CNN layer is used to extract “semantic phrases” from sentences and then the LSTM network is used to produce text summaries that rephrase and shorten the original text.
If you think of it, a sentence, a paragraph or a document do have a spatial and also a temporal structure, so abstractive text summarization using lstm cnn based deep learning is truly a sensible approach that is gaining more and more adepts and that currently outperforms other approaches to abstractive text summarization achieving the highest values for ROUGE1, ROUGE2, and ROUGE-L evaluations.