Getting Started with MetaGPT Agent Development 3: Building a Subscription Agent (OSS)


Author: 肖锋钢 | Published 2024-01-16 22:58

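Goals for this lesson
Build an OSS (Open Source Software) subscription agent by hand to see how MetaGPT can handle problems that come up in day-to-day work.

The tasks are:

  • Implement two Actions for OSS:
    • Action 1: crawl the GitHub Trending page and extract each project's name, URL, and description
    • Action 2: crawl the Huggingface Papers page on your own. First collect the link to each paper (the href attribute of the link inside the title element), then follow the link to the paper's detail page (for example https://huggingface.co/papers/2312.03818) and extract the paper's title and abstract
  • Have OSS generate an outline for the summary, split the content into chunks by second-level heading, summarize each chunk, and assemble the results into a news digest;
  • Have OSS send the digest to a notification channel on a schedule (try implementing email delivery)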

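Steps for building a subscription agent with MetaGPT

image.png

As the figure above shows, building a subscription agent with MetaGPT involves the following steps:

  1. Implement the OSS Agent (based on Role), together with the crawler Action and the analysis Action it needs
  2. Implement the trigger, which decides when the Agent runs its Actions (crawling and analysis, for example)
  3. Implement the callback, which decides what happens with the result (pushing it to Discord or WeChat, or sending an email)
  4. Wire the OSS Agent, trigger, and callback together; that combination is SubscriptionRunner
    You can also skip SubscriptionRunner and write your own loop around role.run(). SubscriptionRunner is simply a reusable pattern.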
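
The trigger, role, and callback relationship described in the steps above can be sketched with plain asyncio. This is only an illustration of the pattern, not MetaGPT's implementation: EchoRole, once_trigger, and subscribe are hypothetical names, and the role here just echoes its input instead of calling an LLM.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List


class EchoRole:
    """Hypothetical stand-in for a MetaGPT Role: run() turns a request into a report."""

    async def run(self, request: str) -> str:
        return f"report for {request}"


async def once_trigger(requests: List[str]) -> AsyncIterator[str]:
    """Trigger: an async generator that decides when the role should work."""
    for request in requests:
        yield request
        await asyncio.sleep(0)  # a real trigger would sleep until the next run


async def subscribe(role, trigger: AsyncIterator[str],
                    callback: Callable[[str], Awaitable[None]]) -> None:
    """Wire trigger -> role -> callback, the job SubscriptionRunner takes over."""
    async for request in trigger:
        report = await role.run(request)
        await callback(report)


def run_demo() -> List[str]:
    delivered: List[str] = []

    async def collect(report: str) -> None:  # callback: where the result goes
        delivered.append(report)

    asyncio.run(subscribe(EchoRole(), once_trigger(["trending"]), collect))
    return delivered
```

Calling run_demo() drives one full cycle: the trigger yields a request, the role produces a report, and the callback receives it.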

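Implementation

Configuration

  1. Discord needs a global proxy configured
    Add the following to key.yaml:
GLOBAL_PROXY: http://127.0.0.1:8181  # change this to your own proxy server address
  2. Set the environment variables
export DISCORD_TOKEN=MTE5NzE4OTU2NzQ3Mjc0NjU1Ng.GqWXK2.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export DISCORD_CHANNEL_ID=1197190827886xxxxxx

If you run the script from PyCharm, set the same variables in PyCharm instead: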


image.png

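For DISCORD_TOKEN, see step 7 of the "Creating a Bot Account" section in the official discord.py documentation on Read the Docs. Get the token from the page shown below, and copy it as soon as it is generated.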

image.png

DISCORD_CHANNEL_ID is the ID of the channel you want the bot to post to, as shown below:

image.png
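
Since both values come from environment variables, it can help to validate them once at startup rather than fail halfway through a run. The following is a minimal sketch; load_discord_settings is a hypothetical helper that is not part of the original oss.py.

```python
import os
from typing import Mapping, Tuple


def load_discord_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Return (token, channel_id), failing early with a readable message."""
    missing = [k for k in ("DISCORD_TOKEN", "DISCORD_CHANNEL_ID") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    channel_id = env["DISCORD_CHANNEL_ID"]
    if not channel_id.isdigit():
        raise RuntimeError("DISCORD_CHANNEL_ID must be the numeric channel ID")
    # The Discord client expects the channel ID as an int
    return env["DISCORD_TOKEN"], int(channel_id)
```

Passing a plain dict instead of os.environ makes the helper easy to try out without touching the real environment.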

Code

The code below (oss.py) crawls GitHub Trending, summarizes it, publishes the result to Discord, and also sends it by email.


import asyncio
import os

import fire
import discord
import aiohttp
from bs4 import BeautifulSoup
from typing import Any

from metagpt.actions import Action
from metagpt.config import CONFIG
from metagpt.environment import Environment
from metagpt.logs import logger
from metagpt.roles import Role
from metagpt.roles.role import RoleReactMode
from metagpt.schema import Message
from metagpt.subscription import SubscriptionRunner


class CrawlOSSTrending(Action):
    async def run(self, url: str = "https://github.com/trending"):
        # return "https://github.com/trending"
        async with aiohttp.ClientSession() as client:
            async with client.get(url, proxy=CONFIG.global_proxy) as response:
                response.raise_for_status()
                html = await response.text()

        soup = BeautifulSoup(html, 'html.parser')

        repositories = []

        for article in soup.select('article.Box-row'):
            repo_info = {'name': article.select_one('h2 a').text.strip().replace("\n", "").replace(" ", ""),
                         'url': "https://github.com" + article.select_one('h2 a')['href'].strip()}

            # Description
            description_element = article.select_one('p')
            repo_info['description'] = description_element.text.strip() if description_element else None

            # Language
            language_element = article.select_one('span[itemprop="programmingLanguage"]')
            repo_info['language'] = language_element.text.strip() if language_element else None

            # Stars and Forks
            stars_element = article.select('a.Link--muted')[0]
            forks_element = article.select('a.Link--muted')[1]
            repo_info['stars'] = stars_element.text.strip()
            repo_info['forks'] = forks_element.text.strip()

            # Today's Stars
            today_stars_element = article.select_one('span.d-inline-block.float-sm-right')
            repo_info['today_stars'] = today_stars_element.text.strip() if today_stars_element else None

            repositories.append(repo_info)

        return repositories


class CrawlOSSHuggingfacePapers(Action):
    # Placeholder; the working version is given in the Huggingface Papers section
    async def run(self, msg: Message) -> str:
        logger.info(f"{msg}")
        return msg.content



TRENDING_ANALYSIS_PROMPT = """# Requirements
You are a GitHub Trending Analyst, aiming to provide users with insightful and personalized recommendations based on the latest
GitHub Trends. Based on the context, fill in the following missing information, generate engaging and informative titles, 
ensuring users discover repositories aligned with their interests.

# The title about Today's GitHub Trending
## Today's Trends: Uncover the Hottest GitHub Projects Today! Explore the trending programming languages and discover key domains capturing developers' attention. From ** to **, witness the top projects like never before.
## The Trends Categories: Dive into Today's GitHub Trending Domains! Explore featured projects in domains such as ** and **. Get a quick overview of each project, including programming languages, stars, and more.
## Highlights of the List: Spotlight noteworthy projects on GitHub Trending, including new tools, innovative projects, and rapidly gaining popularity, focusing on delivering distinctive and attention-grabbing content for users.
---
# Format Example

```
# [Title]

## Today's Trends
Today, ** and ** continue to dominate as the most popular programming languages. Key areas of interest include **, ** and **.
The top popular projects are Project1 and Project2.

## The Trends Categories
1. Generative AI
    - [Project1](https://github/xx/project1): [detail of the project, such as star total and today, language, ...]
    - [Project2](https://github/xx/project2): ...
...

## Highlights of the List
1. [Project1](https://github/xx/project1): [provide specific reasons why this project is recommended].
...
```

---
# Github Trending
{trending}
"""


class AnalysisOSSTrending(Action):

    async def run(
            self,
            trending: Any
    ):
        return await self._aask(TRENDING_ANALYSIS_PROMPT.format(trending=trending))


class OssWatcher(Role):
    name: str = "XiaoGang"
    profile: str = "OssWatcher"
    goal: str = "Generate an insightful GitHub Trending and Huggingface papers analysis report."
    constraints: str = "Only analyze based on the provided GitHub Trending and Huggingface papers data."

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._init_actions([CrawlOSSTrending, AnalysisOSSTrending])
        self._set_react_mode(RoleReactMode.BY_ORDER.value)

    async def _act(self) -> Message:
        logger.info(f"{self._setting}: to do {self.rc.todo}")
        todo = self.rc.todo
        msg = self.get_memories(k=1)[0]  # find the most recent messages
        new_msg = await todo.run(msg.content)
        msg = Message(content=str(new_msg), role=self.profile, cause_by=type(todo))
        self.rc.memory.add(msg)  # add the new message to memory
        return msg

async def discord_callback(msg: Message):
    intents = discord.Intents.default()
    intents.message_content = True
    client = discord.Client(intents=intents, proxy=CONFIG.global_proxy)
    token = os.environ["DISCORD_TOKEN"]
    channel_id = int(os.environ["DISCORD_CHANNEL_ID"])
    async with client:
        await client.login(token)
        channel = await client.fetch_channel(channel_id)
        lines = []
        for i in msg.content.splitlines():
            if i.startswith(("# ", "## ", "### ")):
                if lines:
                    await channel.send("\n".join(lines))
                    lines = []
            lines.append(i)

        if lines:
            await channel.send("\n".join(lines))


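async def mail_callback(msg: Message):
    # AsyncMailer is the helper class shown in the email section of this article;
    # import it from the module where you saved it
    async_mailer = AsyncMailer()
    await async_mailer.send(os.environ["MAIL_SENDER"], os.environ["MAIL_RECEIVER"], 'GitHub Trending Analysis', msg.content)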


async def oss_callback(discord: bool = True, mail: bool = True):
    callbacks = []
    if discord:
        callbacks.append(discord_callback)

    if mail:
        callbacks.append(mail_callback)
    if not callbacks:
        async def _print(msg: Message):
            print(msg.content)

        callbacks.append(_print)

    async def callback(msg: Message):
        await asyncio.gather(*[cb(msg) for cb in callbacks])

    return callback


async def oss_trigger():
    while True:
        yield Message(content="https://github.com/trending")
        await asyncio.sleep(3600 * 24)


async def main(discord: bool = True, mail: bool = True):
    runner = SubscriptionRunner()
    callback = await oss_callback(discord, mail)
    runner.model_rebuild()
    await runner.subscribe(OssWatcher(), oss_trigger(), callback)
    await runner.run()


if __name__ == "__main__":
    fire.Fire(main)
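
The oss_trigger generator above fires once at startup and then every 24 hours after that. If the report should instead arrive at a fixed time of day, the sleep interval can be computed from the clock. The sketch below assumes local time and a default of 09:00; seconds_until and daily_trigger are hypothetical names, and in oss.py the yielded string would be wrapped in a Message.

```python
import asyncio
from datetime import datetime, timedelta


def seconds_until(hour: int, minute: int, now: datetime) -> float:
    """Seconds from `now` until the next occurrence of hour:minute."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # today's slot has passed, use tomorrow's
    return (target - now).total_seconds()


async def daily_trigger(hour: int = 9, minute: int = 0):
    """Yield the trending URL once a day at hour:minute local time."""
    while True:
        await asyncio.sleep(seconds_until(hour, minute, datetime.now()))
        yield "https://github.com/trending"
```

Keeping the time arithmetic in a pure function means it can be checked without waiting for the clock.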

Log output

2024-01-18 00:39:45.138 | INFO     | metagpt.const:get_metagpt_package_root:32 - Package root set to D:\workspace\sourcecode\MetaGPT
2024-01-18 00:39:45.281 | INFO     | metagpt.config:get_default_llm_provider_enum:126 - API: LLMProviderEnum.ZHIPUAI
2024-01-18 00:39:48.908 | INFO     | metagpt.config:get_default_llm_provider_enum:126 - API: LLMProviderEnum.ZHIPUAI
2024-01-18 00:39:48.914 | INFO     | __main__:_act:123 - XiaoGang(OssWatcher): to do CrawlOSSTrending
2024-01-18 00:39:50.138 | INFO     | __main__:_act:123 - XiaoGang(OssWatcher): to do AnalysisOSSTrending
 Here's a title for today's GitHub Trending based on the provided data:

**Today's Trends: Explore the Hottest GitHub Projects in Programming Languages and Domains**

---

## Today's Trends

Today, JavaScript and Python continue to dominate as the most popular programming languages. Key areas of interest include generative AI, personal finance, and scalability. Discover the top popular projects like never before, from **TencentARC/PhotoMaker** to **linexjlin/GPTs**.

## The Trends Categories

1. Generative AI
    * [TencentARC/PhotoMaker](https://github.com/TencentARC/PhotoMaker): A powerful photo manipulation tool using AI.
    * [linexjlin/GPTs](https://github.com/linexjlin/GPTs): A collection of leaked GPT-3 prompts.
2. Personal Finance
    * [maybe-finance/maybe](<https://github.com/maybe-finance/maybe>: A comprehensive personal finance and wealth management app.
3. Scalability
    * [binhnguyennus/awesome-scalability](<https://github.com/binhnguyennus/awesome-scalability>: A curated list of patterns for building scalable, reliable, and performant large-scale systems.

## Highlights of the List

1. **TencentARC/PhotoMaker**: This project offers a powerful photo manipulation tool that uses AI to create stunning images. With over 2,000 stars and 150 forks, it's a must-watch repository for AI-driven image processing.
2. **maybe-finance/maybe**: This comprehensive personal finance and wealth management app has earned over 10,000 stars and 741 forks. It's a great resource for anyone looking to manage their finances effectively.
3. **linexjlin/GPTs**: This repository contains a collection of leaked GPT-3 prompts, earning it 22,916 stars and 3,291 forks. It's an interesting resource for those interested in exploring AI-generated text.

Check out these projects and more in the full list above! Stay tuned for more insightful and personalized recommendations based on the latest GitHub Trends.
2024-01-18 00:40:14.373 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.000 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 2858, completion_tokens: 526

Result in Discord

image.png
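
discord_callback sends one message per heading-delimited section. Discord limits a regular message to 2000 characters, so a very long section could still be rejected. The sketch below pulls the splitting logic into a pure function and additionally slices oversized sections; split_report and DISCORD_LIMIT are hypothetical names, not part of the original code.

```python
from typing import List

DISCORD_LIMIT = 2000  # assumed per-message character limit for regular messages


def split_report(text: str, limit: int = DISCORD_LIMIT) -> List[str]:
    """Split at '#', '##', '###' headings, then slice any oversized chunk."""
    chunks: List[str] = []
    lines: List[str] = []
    for line in text.splitlines():
        if line.startswith(("# ", "## ", "### ")) and lines:
            chunks.append("\n".join(lines))
            lines = []
        lines.append(line)
    if lines:
        chunks.append("\n".join(lines))

    result: List[str] = []
    for chunk in chunks:
        # Slice chunks that exceed the limit and skip whitespace-only pieces
        for start in range(0, len(chunk), limit):
            piece = chunk[start:start + limit]
            if piece.strip():
                result.append(piece)
    return result
```

Each returned piece could then be passed to channel.send in turn. Slicing by character count can cut a line in half, which is acceptable for a sketch but may deserve a line-aware split in practice.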

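Sending to email

This example uses a 163 mailbox, which requires the SMTP service to be enabled


image.png
image.png

MAIL_PASSWORD is not the mailbox login password; it is the credential generated when you enable the SMTP service. Set it as the MAIL_PASSWORD environment variable.
MAIL_SENDER and MAIL_RECEIVER in the code are the sender and recipient addresses, and are also set through environment variables.

The mail-sending class: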

import asyncio
import os
from email.mime.text import MIMEText
from email.header import Header
import aiosmtplib
from email.utils import formataddr

from metagpt.logs import logger


class AsyncMailer:
    def __init__(self, smtp_server="smtp.163.com", smtp_port=25):
        self.smtp_server = smtp_server
        self.smtp_port = smtp_port
        self.password = os.environ["MAIL_PASSWORD"]

    async def send(self, sender, receiver, title, content) -> None:
        message = MIMEText(content, 'plain', 'utf-8')
        message['From'] = formataddr((sender.split('@')[0], sender))  # sender display name
        message['To'] = formataddr((receiver.split('@')[0], receiver))  # recipient display name
        # message['Message-ID'] = Header('123456789', 'utf-8')  # message ID
        # MIMEText already sets Content-Type, Content-Transfer-Encoding and MIME-Version;
        # assigning them again would append duplicate headers, so they are not repeated here
        message['X-Priority'] = Header('3', 'utf-8')  # priority (3 = normal)
        message['X-Mailer'] = Header('Aiosmtplib', 'utf-8')  # mail client name
        message['X-AntiAbuse'] = Header('1', 'utf-8')  # anti-abuse marker header
        message['Subject'] = Header(title, 'utf-8')  # subject

        # Connect to the mail server asynchronously and log in
        smtp_connection = aiosmtplib.SMTP(hostname=self.smtp_server, port=self.smtp_port, local_hostname='localhost')
        await smtp_connection.connect()
        await smtp_connection.login(sender, self.password)

        # Send the mail asynchronously
        await smtp_connection.sendmail(sender, receiver, message.as_string())

        # Close the connection
        await smtp_connection.quit()
        logger.info("Mail sent successfully!")




async def main():
    async_mailer = AsyncMailer()
    await async_mailer.send(os.environ["MAIL_SENDER"], os.environ["MAIL_RECEIVER"], 'Mail Test', 'Hello World!')

if __name__ == '__main__':
    # Run the example
    asyncio.run(main())
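
As an alternative way to build the message, the standard library's EmailMessage fills in the MIME headers itself, so only the addresses, subject, and body need to be set. This is a sketch of a different construction technique, not the author's code; build_message is a hypothetical helper.

```python
from email.message import EmailMessage
from email.utils import formataddr


def build_message(sender: str, receiver: str, title: str, content: str) -> EmailMessage:
    """Build a UTF-8 plain-text mail; MIME headers are filled in by set_content()."""
    message = EmailMessage()
    # Use the part before '@' as the display name, as AsyncMailer does
    message["From"] = formataddr((sender.split("@")[0], sender))
    message["To"] = formataddr((receiver.split("@")[0], receiver))
    message["Subject"] = title
    message.set_content(content)  # sets Content-Type, transfer encoding, MIME-Version
    return message
```

The result's as_string() output could be handed to the existing sendmail call in place of the MIMEText message.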

Adding the email callback


async def mail_callback(msg: Message):
    async_mailer = AsyncMailer()
    await async_mailer.send(os.environ["MAIL_SENDER"], os.environ["MAIL_RECEIVER"], 'GitHub Trending Analysis', msg.content)

async def oss_callback(discord: bool = True, mail: bool = True):
    callbacks = []
    if discord:
        callbacks.append(discord_callback)

    if mail:
        callbacks.append(mail_callback)
    if not callbacks:
        async def _print(msg: Message):
            print(msg.content)

        callbacks.append(_print)

    async def callback(msg: Message):
        await asyncio.gather(*[cb(msg) for cb in callbacks])

    return callback
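
With asyncio.gather as used above, an exception in one channel propagates out of the combined callback. If a Discord outage should not hide whether the email went out (or the reverse), return_exceptions=True lets every callback finish and reports the failures afterwards. The sketch below uses plain strings instead of Message objects; fan_out is a hypothetical name.

```python
import asyncio
from typing import Awaitable, Callable, List

Callback = Callable[[str], Awaitable[None]]


def fan_out(callbacks: List[Callback]) -> Callable[[str], Awaitable[List[str]]]:
    """Combine callbacks; return the names of those that failed instead of raising."""

    async def callback(content: str) -> List[str]:
        results = await asyncio.gather(
            *[cb(content) for cb in callbacks], return_exceptions=True
        )
        # gather keeps result order, so results line up with callbacks
        return [cb.__name__ for cb, res in zip(callbacks, results)
                if isinstance(res, Exception)]

    return callback
```

The caller can log the returned names or retry just those channels.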

Result in the mailbox


image.png
image.png

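Crawling and summarizing the Huggingface Papers page

Next we crawl the Huggingface Papers page. This is Hugging Face's papers page, which shares research papers, articles, and resources on NLP and related fields, with details on models, algorithms, experiments, and more. The goal here is to collect the link to each paper from Huggingface Papers, follow each link to the paper's detail page, extract the title and abstract, then generate an outline for the summary, summarize each section, and assemble a news digest.

Crawling the Huggingface Papers page

Open the developer tools with F12 or via right-click | Inspect


image.png

Then locate the following part:


image.png
image.png

First, use bs4 to get the link to each paper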

def hg_article_urls(html_soup):
    _urls = []
    for article in html_soup.select('article.flex.flex-col.overflow-hidden.rounded-xl.border'):
        url = article.select_one('h3 a')['href']
        _urls.append('https://huggingface.co' + url)
    return _urls

Note that the link must be located via <h3><a href>, not a bare <a href>; that is, it should be written as above:

url = article.select_one('h3 a')['href']
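
The href values on the page are site-relative, which is why hg_article_urls prepends the host. A slightly more defensive way to do the same thing is urllib.parse.urljoin, which also leaves already-absolute links untouched. A minimal sketch, with absolute_url as a hypothetical helper:

```python
from urllib.parse import urljoin

BASE_URL = "https://huggingface.co"


def absolute_url(href: str, base: str = BASE_URL) -> str:
    """Resolve an href from the papers page against the site root."""
    return urljoin(base, href)
```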

This yields URLs such as https://huggingface.co/papers/2401.10020. Follow each URL to the paper's detail page to get the title and abstract.

Take https://huggingface.co/papers/2401.10020 as an example:

image.png

The code below reads the data-props attribute shown in the figure above. Its content is a JSON string, so json.loads is used to parse it into a Python dict.

        info = soup.select_one('section.pt-8.border-gray-100')
        data_props = json.loads(info.select_one('div.SVELTE_HYDRATER.contents')['data-props'])
image.png

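As the figure above shows, data_props gives us the paper's id, upvote count, publication time, title, and abstract.
This is saved as utility code in hg_parse.py; the complete code is as follows: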

import asyncio
import json

import aiohttp
from bs4 import BeautifulSoup

from metagpt.config import CONFIG
from metagpt.logs import logger


def get_local_html_soup(url, features='html.parser'):
    with open(url, encoding="utf-8") as f:
        html = f.read()
    soup = BeautifulSoup(html, features)
    return soup


async def get_html_soup(url: str):
    async with aiohttp.ClientSession() as client:
        async with client.get(url, proxy=CONFIG.global_proxy) as response:
            response.raise_for_status()
            html = await response.text()

    soup = BeautifulSoup(html, 'html.parser')
    return url, soup


def hg_article_urls(html_soup):
    _urls = []
    for article in html_soup.select('article.flex.flex-col.overflow-hidden.rounded-xl.border'):
        url = article.select_one('h3 a')['href']
        _urls.append('https://huggingface.co' + url)
    return _urls


def hg_article_infos(_url, html_soup):
    logger.info(f'Parsing {_url}')
    _article = {}
    info = html_soup.select_one('section.pt-8.border-gray-100')
    data_props = json.loads(info.select_one('div.SVELTE_HYDRATER.contents')['data-props'])
    paper = data_props['paper']
    _article['url'] = _url
    _article['id'] = paper['id']
    _article['title'] = paper['title']
    _article['upvotes'] = paper['upvotes']
    _article['publishedAt'] = paper['publishedAt']
    _article['summary'] = paper['summary']
    return _article


async def get_hg_articles():
    _, _soup = await get_html_soup("https://huggingface.co/papers")
    hg_urls = hg_article_urls(_soup)
    _soups = await asyncio.gather(*[get_html_soup(url) for url in hg_urls])
    hg_articles = map(lambda param: hg_article_infos(param[0], param[1]), _soups)

    return list(hg_articles)

if __name__ == "__main__":
    import asyncio
    for article in asyncio.run(get_hg_articles()):
        print(article)
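
The data-props step in hg_article_infos can be tried in isolation, without any network access. The sketch below assumes the same field names as above and reads them with dict.get, so a missing key yields None instead of a KeyError. SAMPLE_DATA_PROPS is made-up placeholder data and paper_from_props is a hypothetical helper.

```python
import json
from typing import Any, Dict

# Hypothetical placeholder data; real pages carry many more fields
SAMPLE_DATA_PROPS = json.dumps({
    "paper": {
        "id": "0000.00000",
        "title": "Example title",
        "upvotes": 3,
        "publishedAt": "2024-01-18T00:00:00.000Z",
        "summary": "Example abstract.",
    }
})


def paper_from_props(url: str, raw_props: str) -> Dict[str, Any]:
    """Pick the fields used in the report; missing keys become None."""
    paper = json.loads(raw_props).get("paper", {})
    fields = ("id", "title", "upvotes", "publishedAt", "summary")
    return {"url": url, **{name: paper.get(name) for name in fields}}
```

Tolerating missing keys is a design choice: if the page layout changes, one incomplete record is easier to spot in the report than a crash that drops the whole batch.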


Add the Action that crawls the Huggingface Papers page to the earlier oss.py:

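from hg_parse import get_hg_articles  # the helper module shown above


class CrawlOSSHuggingfacePapers(Action):
    async def run(self, msg: str) -> list:
        # msg is the content OssWatcher._act passes in; it is only logged here
        logger.info(f"{msg}")
        return await get_hg_articles()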

Summarizing the Huggingface Papers page

The summary Action is mostly a matter of writing the prompt. AnalysisOSSHuggingfacePapers is modeled on the GitHub Trending prompt:

HG_PAPERS_ANALYSIS_PROMPT = """# Requirements
You are a Huggingface Papers Analyst, aiming to provide users with insightful and personalized consultation based on the latest
Huggingface Papers abstract. Based on the context, fill in the following missing information, generate engaging and informative titles, 
ensuring users discover articles aligned with their interests.

# The title about Today's Huggingface Papers Consultation
## Today's Huggingface Papers Consultation: Uncover the Hottest Huggingface Papers Today! Explore the trending programming languages and discover key domains capturing developers' attention. From ** to **, witness the top papers like never before.
## The Papers Categories: Dive into Today's Huggingface Papers Domains! Explore featured papers in domains such as ** and **. Get a quick overview of each paper, including upvotes, and more.
## Highlights of the List: Spotlight noteworthy papers on Huggingface Papers, including new tools, new methods, innovative papers, and rapidly gaining popularity, focusing on delivering distinctive and attention-grabbing content for users.
---
# Format Example

```
# [Title]

## Today's Huggingface Papers Consultation
Today, ** and ** continue to dominate as the most popular research areas. Key areas of interest include **, ** and **.
The top popular papers are Paper1 and Paper2.

## The Papers Categories
1. Large Language Model
    - [Paper1](https://huggingface.co/papers/paper1): [Abstract of the paper, such as upvotes total ...]
    - [Paper2](https://huggingface.co/papers/paper2): ...
...

## Highlights of the List
1. [Paper1](https://huggingface.co/papers/paper1): [provide specific reasons why this paper is recommended].
...
```

---
# Huggingface Papers
{papers}
"""


class AnalysisOSSHuggingfacePapers(Action):
    async def run(
            self,
            papers: Any
    ):
        return await self._aask(HG_PAPERS_ANALYSIS_PROMPT.format(papers=papers))
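
The Action above formats the list of paper dicts straight into the prompt, so the model sees Python's repr of the list. One option, sketched below under the assumption that the dicts have the keys produced by hg_article_infos, is to render a compact markdown list first, ordered by upvotes and with abstracts trimmed to bound the prompt size. format_papers and max_summary are hypothetical names.

```python
from typing import Any, Dict, List


def format_papers(papers: List[Dict[str, Any]], max_summary: int = 500) -> str:
    """Render papers as a markdown list, most upvoted first, with trimmed abstracts."""
    ordered = sorted(papers, key=lambda p: p.get("upvotes") or 0, reverse=True)
    lines = []
    for paper in ordered:
        # Collapse newlines and repeated spaces, then cut to max_summary characters
        summary = " ".join((paper.get("summary") or "").split())[:max_summary]
        lines.append(
            f"- [{paper.get('title')}]({paper.get('url')}) "
            f"(upvotes: {paper.get('upvotes')}): {summary}"
        )
    return "\n".join(lines)
```

The returned string could be passed as the papers argument in place of the raw list.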

Finally, here is how the Huggingface Papers digest looks when delivered to Discord and by email:


image.png image.png
