Revolutionising Content Consumption: The Necessity of an AI-Based RSS-Feed Summarizer

In today’s fast-paced digital world, information overload is an ever-present challenge. Readers often face the task of sifting through countless articles to find valuable insights.
At Ynpact, we are committed to staying informed about the latest developments in Amazon Web Services. However, with the overwhelming volume of weekly updates, we needed a way to filter and summarise this information efficiently. This led us to build an RSS-feed summarizer that uses the Claude generative AI model through Amazon Bedrock, so we can focus on the most relevant updates instead of reading every article.

By using AI to summarise the content, the tool allows users to:

Save time: Quickly scan essential points instead of reading full articles.
Stay updated: Access information in a format that is both comprehensive and concise.
Filter noise: Avoid irrelevant content and focus on key insights.

Architecture

The summarizer tool leverages several modern technologies, including cloud computing and machine learning, to offer a scalable solution for summarising and processing vast amounts of content.

This tool relies on Amazon Web Services (AWS) for its infrastructure, ensuring reliability, scalability, and seamless processing by using:

AWS Lambda: For serverless execution of the summarization and post-processing jobs.
Amazon S3: To store the summarised content, making it accessible and easily manageable.
Amazon Bedrock (AI/ML): The summarization models are invoked using Bedrock, ensuring that each feed gets high-quality AI-generated summaries.

The weekly workflow runs in eight steps:

Scheduled Event: Every Monday at 8 AM, an EventBridge cron rule triggers the RSS Summarizer Lambda function to capture the latest updates from the past week.
Extract RSS Feed: The Lambda function parses the RSS feed and extracts articles published within the week.
Send Raw Data to Bedrock: The raw text from each article is sent to Amazon Bedrock for AI processing.
Receive Summarised Content: Bedrock returns concise summaries, focused on specific AWS services (e.g., EC2, Lambda, S3).
Store Summaries in S3: Summarised articles are compiled into a JSON file and uploaded to an S3 bucket.
S3 Put Object Event: Storing the JSON file triggers an S3 event.
Notification Lambda Trigger: The S3 event triggers a Lambda function that retrieves the JSON file.
Send to Slack: Summaries are sent to a Slack channel as a consolidated message for the team.
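
To make the last three steps more concrete, here is a minimal sketch of what the notification Lambda could look like. It is not our exact code: it assumes the JSON file is a list of objects with title, summary and link fields, that Slack is reached through an incoming webhook, and that the webhook URL lives in a hypothetical SLACK_WEBHOOK_URL environment variable.

```python
import json
import os
import urllib.parse
import urllib.request


def parse_s3_event(event):
    """Extract (bucket, key) from the first record of an S3 Put Object event."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["object"]["key"])
    return bucket, key


def format_message(summaries):
    """Join all summaries into one consolidated Slack message.

    The title/summary/link field names are assumed for illustration.
    """
    lines = [
        f"*{item['title']}*\n{item['summary']}\n{item['link']}"
        for item in summaries
    ]
    return "\n\n".join(lines)


def post_to_slack(webhook_url, text):
    """POST the message to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def lambda_handler(event, context):
    import boto3  # available in the AWS Lambda Python runtime

    bucket, key = parse_s3_event(event)
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    # SLACK_WEBHOOK_URL is a hypothetical variable name, not taken from our setup.
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]
    return post_to_slack(webhook_url, format_message(json.loads(body)))
```

Keeping the event parsing and message formatting in small pure functions makes them easy to check locally without touching S3 or Slack.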

AI plays a central role in transforming raw RSS data into concise summaries. By using generative AI models through Amazon Bedrock, the tool ensures that summaries keep the core essence of the articles while cutting out unnecessary details.

Technical Breakdown

Fetching RSS feed

The tool reads the RSS feed URL from an environment variable, uses feedparser to parse the feed's XML, and extracts the articles published in the past 7 days.
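
As an illustration, the fetch-and-filter step could be sketched as follows. The variable name RSS_FEED_URL and the helper names are hypothetical, and the sketch assumes each entry carries the published_parsed date that feedparser exposes as a UTC struct_time.

```python
import calendar
import os
import time

SECONDS_PER_DAY = 86_400


def is_recent(entry, now=None, days=7):
    """Return True if the entry was published no more than `days` days ago."""
    published = entry.get("published_parsed")  # UTC struct_time set by feedparser
    if published is None:
        return False  # no usable date: skip the entry rather than guess
    now = time.time() if now is None else now
    age_seconds = now - calendar.timegm(published)
    return age_seconds <= days * SECONDS_PER_DAY


def fetch_recent_entries(feed_url=None, days=7):
    """Parse the feed and keep only the entries from the last `days` days."""
    # Third-party dependency, imported here so is_recent() works without it.
    import feedparser

    # RSS_FEED_URL is a hypothetical variable name used for illustration.
    feed_url = feed_url or os.environ["RSS_FEED_URL"]
    feed = feedparser.parse(feed_url)
    return [entry for entry in feed.entries if is_recent(entry, days=days)]
```

Entries without a parseable date are dropped here; a different policy (for example, keeping them) may suit other feeds better.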

Processing Each Article

For each article, the following steps occur:

Fetching the Full Article Content: The tool fetches the full content of the article using the article’s URL. This content is parsed using BeautifulSoup, which removes unnecessary elements like scripts and styles, leaving only the relevant text.
Generating Summaries: The full text of each article is then passed to the AI model (Claude-v2) using a prompt that requests a concise summary.

Building a Structured Output: The summarised content is wrapped in structured JSON blocks, which are later uploaded to an S3 bucket as a single JSON file.
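
The steps above can be sketched roughly like this. The function names and the JSON field names (title, link, summary) are assumptions made for the example, as is the choice of urllib for the download and of dropping empty summaries before serialising.

```python
import json
import urllib.request


def fetch_html(url, timeout=10):
    """Download the raw HTML of an article."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_text(html):
    """Remove script and style elements, then return the visible text."""
    from bs4 import BeautifulSoup  # third-party package: beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)


def build_block(title, link, summary):
    """Wrap one summary in a JSON-serialisable block (field names are assumed)."""
    return {"title": title, "link": link, "summary": summary}


def to_json_document(blocks):
    """Serialise all blocks with a non-empty summary into one JSON document."""
    kept = [block for block in blocks if block["summary"]]
    return json.dumps(kept, ensure_ascii=False, indent=2)
```

The string returned by to_json_document is what would be uploaded to the S3 bucket as the single weekly file.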

Prompt

The prompt used for AI model invocation has a well-structured format, designed to provide concise summaries while filtering irrelevant content. Here’s a deeper look at the structure and purpose:

summary_prompt_template = '''Write a concise summary (max 200 characters)
   including all the key facts of this article.
   Do not repeat the same concept.
   Ignore header and footer information.
   Just write the summary between the <summary></summary> XML tags with no text before and after.
   If the article's title doesn't relate to specific topics outlined in {special_instructions} ignore the article and return empty tags <summary></summary>
   <doc>
   {article}
   </doc>
'''

This prompt is designed to:

Summarise the article within 200 characters: Enforces brevity and focus on key facts.
Avoid redundancy: Ensures the summarization doesn’t repeat concepts, making the content more concise and efficient.
Filter irrelevant content: By using the {special_instructions}, the prompt ensures that only articles whose titles relate to topics like EC2, Lambda, S3, etc., are summarised. Articles outside these domains return an empty <summary></summary> block, saving downstream processing time and storage space.

Special Instructions:

   SPECIAL_INSTRUCTIONS = 'EC2, ECS, DYNAMODB, GENERATIVE AI, CONNECT, EKS, LAMBDA, STEPFUNCTIONS, ECR, CODE BUILD, CODE DEPLOY, S3, FARGATE'

These special instructions are embedded within the prompt and act as filtering criteria for the content. Articles that are not related to these topics are ignored by the model, returning an empty summary. This is critical in an environment with numerous articles, ensuring that only relevant content is processed, summarised, and stored.
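
To show how the template and the filter fit together, here is a small sketch of filling the placeholders and reading the answer back. The helper names are ours for this example; the only things taken from the prompt are the {article} and {special_instructions} placeholders and the <summary></summary> tags.

```python
import re

# Non-greedy match so only the text between the first pair of tags is captured.
SUMMARY_PATTERN = re.compile(r"<summary>(.*?)</summary>", re.DOTALL)


def build_prompt(template, article, special_instructions):
    """Fill the {article} and {special_instructions} placeholders of the template."""
    return template.format(article=article, special_instructions=special_instructions)


def extract_summary(completion):
    """Return the text inside the <summary> tags, or None if empty or missing."""
    match = SUMMARY_PATTERN.search(completion)
    if match is None:
        return None
    summary = match.group(1).strip()
    return summary or None
```

With the names from this article, a call would look like build_prompt(summary_prompt_template, article_text, SPECIAL_INSTRUCTIONS). An extract_summary result of None signals an article the model judged off-topic, which can then be skipped.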

Model Invocation

response = bedrock.invoke_model(body=body, modelId=TEXT_MODEL_ID, accept=ACCEPT, contentType=CONTENT_TYPE)

The prompt is sent to Amazon Bedrock, which hosts and runs Claude-v2. The model generates a summary of the input text according to the constraints laid out in the prompt (character limit, filtering based on the special instructions, etc.).
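
For readers who want to see the request around that call, here is a hedged sketch. The constant values, the max_tokens and temperature settings and the helper names are assumptions for illustration; the body follows the text-completion format that Claude v2 expects on Bedrock (a prompt wrapped in Human/Assistant turns plus max_tokens_to_sample). The bedrock argument would typically be created with boto3.client("bedrock-runtime").

```python
import json

# Assumed values; the article only shows the constant names.
TEXT_MODEL_ID = "anthropic.claude-v2"
ACCEPT = "application/json"
CONTENT_TYPE = "application/json"


def build_body(prompt, max_tokens=300):
    """Serialise a request body in Claude v2's text-completion format."""
    return json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,  # illustrative limit
        "temperature": 0.0,  # illustrative: favour deterministic summaries
    })


def invoke_summary(bedrock, prompt):
    """Send the prompt to Bedrock and return the raw completion text."""
    response = bedrock.invoke_model(
        body=build_body(prompt),
        modelId=TEXT_MODEL_ID,
        accept=ACCEPT,
        contentType=CONTENT_TYPE,
    )
    # The response body is a stream containing a JSON document.
    payload = json.loads(response["body"].read())
    return payload["completion"]
```

Passing the client in as an argument keeps the function easy to exercise with a stub, without calling Bedrock.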

Benefits to Users and Organisations

The RSS-feed summarizer offers a solution not just for individual users but also for businesses, media organisations, and researchers who need to stay updated on multiple topics without being overwhelmed by content volume.

For organisations, the tool can:

Improve productivity: By streamlining content consumption.
Facilitate decision-making: With quick access to key insights.
Enhance knowledge management: With AI-driven summaries that extract and store valuable information.

Future Opportunities

Looking ahead, the tool can evolve to integrate more advanced features such as:

Personalised summarization: Tailoring summaries to the user’s reading preferences.
Multi-language support: Allowing global users to access summaries in their native languages.
Deeper insights: Using natural language processing (NLP) to highlight not just the content but also trends or sentiments in the feeds.

Conclusion

The RSS-feed summarizer is an essential tool for modern content consumption. By leveraging AWS and AI technologies, it reduces information overload, making content consumption more efficient and manageable. This innovation represents a significant step toward the future of content delivery, catering to the growing need for speed and accuracy in information dissemination.

Ready to Give it a Try?

If you’re interested in integrating the RSS-feed summarizer tool into your workflow or want to explore its capabilities, feel free to fill out the contact form on our site, and we’ll be happy to provide you with more information.