DATASET
The dataset consists of pairs of tweets and their corresponding articles. We collected data from five domains across multiple Twitter handles.
The dataset contains approximately 70K tweet-article pairs.
Neural sequence-to-sequence models provide a viable approach to abstractive text summarization, meaning they are not restricted to selecting and rearranging passages from the original text. However, these models have two shortcomings: they are liable to reproduce factual details inaccurately, and they tend to repeat themselves. See et al. [5] propose a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information, while retaining the ability to produce novel words through the generator. The network can be viewed as a balance between extractive and abstractive approaches. We use coverage to keep track of what has already been summarized, which discourages repetition. The ability to produce out-of-vocabulary (OOV) words is one of the primary advantages of pointer-generator models.
The processed data is fed to the model for training. The model is an encoder-decoder model with an attention mechanism. The encoder uses LSTM cells with stack_bidirectional_dynamic_rnn. The decoder uses LSTM cells with BasicDecoder for training and BeamSearchDecoder for inference. We use the code available at [3] and [2].
Approach 3: SVO based approach
In this approach, we experiment with the linguistic information in a news article. We take inspiration from the LexRank algorithm, which ranks sentences by importance using graph-based information. We first resolve the coreferences in the article and extract tuples. We then rank these tuples by their importance scores. We are experimenting with different scoring and ranking techniques, including TextRank and LexRank. A tweet can be generated from the high-importance tuples and the highly ranked sentences that contain them.
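As an illustration of the pointer-generator mechanism of See et al. [5] described above, the sketch below mixes a generator distribution with a copy distribution and computes the coverage penalty. It is a minimal pure-Python sketch and not the code from [3] or [2]; the function names `final_distribution` and `coverage_loss` and the toy numbers are assumptions made for the example.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copying for one decoder step.

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source positions holding w)
    """
    # Generator part: scale the fixed-vocabulary distribution by p_gen.
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Pointer part: add the attention mass of every source token, including OOV tokens.
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


def coverage_loss(attention_steps):
    """Sum over decoder steps of sum_i min(a_i, c_i), where c accumulates past attention."""
    coverage = [0.0] * len(attention_steps[0])
    total = 0.0
    for attention in attention_steps:
        # Penalize attending again to positions that are already covered.
        total += sum(min(a, c) for a, c in zip(attention, coverage))
        coverage = [c + a for c, a in zip(coverage, attention)]
    return total


# Toy example (made-up numbers): "guatemala" is outside the fixed vocabulary
# but still receives probability through the copy mechanism.
vocab_dist = {"border": 0.5, "arrests": 0.3, "[UNK]": 0.2}
dist = final_distribution(0.6, vocab_dist, [0.25, 0.75], ["border", "guatemala"])
```

In this toy case the out-of-vocabulary source word receives probability only through the pointer term, which is how such a model can emit words the generator alone cannot produce.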
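The tuple-ranking step of the SVO approach could look roughly like the TextRank-style sketch below. It assumes that coreference resolution and tuple extraction have already been done and that each tuple is a (subject, verb, object) triple, as the approach name suggests. The example tuples are hand-written, and the Jaccard word-overlap similarity, the damping factor, and the function names are all assumptions made for illustration; the source does not specify its scoring function.

```python
def overlap(a, b):
    """Jaccard overlap between the word sets of two tuples (an assumed similarity)."""
    words_a = {w.lower() for part in a for w in part.split()}
    words_b = {w.lower() for part in b for w in part.split()}
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def rank_tuples(tuples, damping=0.85, iterations=50):
    """Score tuples with a PageRank-style iteration over a similarity graph."""
    n = len(tuples)
    weights = [[overlap(tuples[i], tuples[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the edge weight.
            incoming = sum(weights[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j] > 0)
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return sorted(zip(tuples, scores), key=lambda pair: pair[1], reverse=True)


# Hand-written example tuples (hypothetical, not produced by a real extractor).
tuples = [
    ("Border Patrol arrests", "dropped", "in September"),
    ("Border Patrol apprehensions", "measure", "illegal crossings"),
    ("surge", "caused", "overcrowding"),
]
ranked = rank_tuples(tuples)
```

Tuples that share vocabulary with other tuples reinforce each other, while an isolated tuple keeps only the baseline score, so it ends up at the bottom of the ranking.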
In this approach we follow the work of Putra et al., which compares the results of neural headline generation using the topic sentence and the first sentence as input to an encoder-decoder model. We modified this work to generate tweets, as the two tasks closely resemble each other. The idea is to reduce the input from the full article text to only the important sentences, avoiding a large number of input words, which could cause the vanishing gradient problem. We used spaCy, a Python module, along with regular expressions to extract the topic sentence from each news article. For the encoder-decoder, we used the vanilla implementation in OpenNMT with the default parameter configuration. We are also experimenting with different types of input for this approach, such as using one or more summaries in place of the first sentence or topic sentence.
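To make the input-reduction idea concrete, the sketch below produces only the first-sentence input using a regular expression. It is a simplification under stated assumptions: it does not reproduce the spaCy-based topic-sentence extraction, the function name `first_sentence` is hypothetical, and the pattern will split incorrectly on abbreviations such as "U.S.".

```python
import re


def first_sentence(article):
    """Return the text up to the first sentence-ending punctuation mark.

    A boundary is '.', '!' or '?' followed by optional whitespace and then
    either an uppercase letter or the end of the text. Allowing zero
    whitespace handles scraped text where the space after a period is missing.
    """
    text = article.strip()
    match = re.search(r"[.!?](?=\s*[A-Z]|\s*$)", text)
    return text[:match.end()] if match else text


article = (
    "Border Patrol arrests continued to drop in September."
    "There were nearly 40,000 arrests on the southern border."
)
encoder_input = first_sentence(article)
```

The resulting string, rather than the full article, would then be passed to the encoder-decoder model.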
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics for evaluating automatic text summarization as well as machine translation. It works by comparing an automatically produced summary or translation against a set of reference summaries.
The results are evaluated using the standard ROUGE and BLEU scores and are shared via the project's GitHub link.
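For intuition, unigram versions of the two metrics can be sketched as follows. This is a simplified illustration with whitespace tokenization and a single reference, not the evaluation script used for the reported results. Reporting ROUGE-1 as recall is an assumption, since the source does not say which variant it uses, and the function names are hypothetical.

```python
import math
from collections import Counter


def unigram_counts(candidate, reference):
    """Return (clipped overlap, candidate length, reference length) over lowercased tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    return sum((cand & ref).values()), sum(cand.values()), sum(ref.values())


def rouge_1_recall(candidate, reference):
    """Fraction of reference unigrams that also appear in the candidate."""
    overlap, _, ref_len = unigram_counts(candidate, reference)
    return overlap / ref_len if ref_len else 0.0


def bleu_1(candidate, reference):
    """Clipped unigram precision multiplied by the BLEU brevity penalty."""
    overlap, cand_len, ref_len = unigram_counts(candidate, reference)
    if cand_len == 0:
        return 0.0
    brevity = 1.0 if cand_len >= ref_len else math.exp(1 - ref_len / cand_len)
    return brevity * overlap / cand_len


reference = "US-Mexico border arrests continued to drop in September"
candidate = ("Border Patrol arrests on the US-Mexico border "
             "continued to drop in September")
recall = rouge_1_recall(candidate, reference)
precision_score = bleu_1(candidate, reference)
```

Here the candidate contains every reference word, so recall is 1.0, while the four extra candidate tokens lower the unigram precision to 8/12.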
Easy access to the internet, simple and user-friendly platforms, and the urge to connect with peers are some of the factors contributing to the popularity of social media. Social media platforms like Twitter and Facebook attract users from a wide spectrum of backgrounds and age groups. Every second, millions of people across the globe interact on social media. These platforms are known for their quick data generation and information dissemination. Because of their reach and popularity, it is imperative for all types of industries to tap into this user base. With growing internet connectivity, news publishing agencies like CNBC and BBC publish large numbers of news articles on the internet to reach a wider readership. Articles on these news portals are generally detailed and contain rich annotations like images and videos. For the majority of readers, however, a short text snippet containing most of the key information might be sufficient, compared with a long article and a 5-minute video. Posting article snippets to platforms like Twitter helps in reaching a wider audience and in disseminating information quickly. But writing a specific tweet for each article is a tedious and time-consuming task. Automated generation of tweets for these articles can contribute to diverse applications such as faster communication, wider reach for an advertising campaign, and concise information dispersal in critical scenarios.
Border Patrol arrests on the US-Mexico border continued to drop in September, marking a steady decline since the high earlier this spring when Trump administration officials struggled to stem the flow of migrants attempting to enter the US. There were nearly 40,000 arrests on the southern border in September, which was the lowest month this fiscal year, according to a source familiar with the data. For comparison, there were nearly 133,000 apprehensions in May. Apprehensions along the southern border were higher in fiscal year 2019 than in any fiscal year since 2007, with slightly more than 850,000 arrests, according to Border Patrol data. Border Patrol apprehensions, which are used as a measure of illegal crossings, are particularly important to President Donald Trump, who views them as a barometer of the situation along the southern border, according to administration officials. There was a dramatic spike in border crossings this year, led by an uptick in families and children from the Northern Triangle countries of Guatemala, Honduras and El Salvador coming to the US. The surge caused severe overcrowding in Border Patrol facilities and led to a number of administration measures aimed at reducing the flow of migrants.
US-Mexico border arrests continued to drop in September
Border Patrol arrests on the US-Mexico border continued to drop in September, marking a steady decline since the high earlier this spring when Trump administration officials struggled to stem the flow of migrants attempting to enter the US.
Approach | ROUGE score (1-gram) | BLEU score (1-gram) |
---|---|---|
Abstractive | 0.33 | 0.16 |
Extractive | 0.18 | 0.16 |
Extractive+Abstractive | 0.23 | 0.19 |