Anthropic Tutorial: How to Build Your Own Legal AI Assistant

By aihubon
Dec 19, 2023 - 2 min read

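What is Claude?

Claude is a large language model created by Anthropic. Claude can help you as a chatbot, a summarization tool, a code-writing assistant, and more! Recently, Anthropic announced that Claude's context window is growing to 100k tokens, or roughly 75,000 words! That is a lot of capacity, and it lets people work through large documents and books much faster. Previously, just reading a text of that length would take a person about 5 hours. Now the model can read, summarize, and analyze the text and answer questions about it within minutes! Anthropic's Claude also places a strong focus on safety. In addition, users report that interactions with this LLM feel more human. Maybe a new leader is emerging, and we will all be using Anthropic apps in the near future?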
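As a rough check on those figures, the arithmetic can be sketched as follows. The 0.75 words-per-token ratio is implied by the numbers above (100k tokens for about 75,000 words); the reading speed of 250 words per minute is an assumption chosen for illustration, not a figure from Anthropic.

```python
# Back-of-the-envelope check of the context-window figures quoted above.
WORDS_PER_TOKEN = 0.75     # implied by 100k tokens ~ 75,000 words
READING_SPEED_WPM = 250    # assumed average human reading speed


def tokens_to_words(tokens: int) -> int:
    """Approximate number of English words that fit in `tokens` tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def reading_hours(words: int, wpm: int = READING_SPEED_WPM) -> float:
    """Hours a person would need to read `words` words at `wpm` words per minute."""
    return words / wpm / 60


words = tokens_to_words(100_000)
print(words, reading_hours(words))
```

Under these assumptions the 5-hour estimate and the 75,000-word figure line up with each other.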

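Maybe, but first let's test what we came here for and see for ourselves!

How to use it?

To use Claude, you must apply for early access!

Today I will use the Anthropic Python SDK to make working with the model easier. You can also use the API directly or the TypeScript/JavaScript SDK.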

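LegalTech: Legal AI

In the complex field of legal affairs, the ability to analyze and interpret legal documents accurately can make a significant difference. However, the intricate language and sheer length of these documents often make the process tedious and time-consuming. With this simple example, we can explore how Anthropic's Claude can help distill these large texts by skimming through them in seconds and extracting relevant information and comprehensive insights, including potential impact, sentiment, implications, and possible pitfalls or caveats in certain legal passages (for example, those in contracts).

The interesting part of what we explore here is not the capabilities we already know, such as summarization or prediction, but Claude's constitution as a constitutional AI and its handling of large, linguistically complex prompts.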

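What are we building?

In short, we will build a very simple API that uses the claude-v1-100k model to extract information from these large prompts.

Files

Ideally, we would have a legal database to query, or a more powerful search interface to automate the process further, but to keep this tutorial simple I want to use local files in the working directory. To begin with, I will use files whose token counts fall in the range [40000, 80000], but feel free to test the limits! Remember that the model is claude-v1-100k. Let's see how Claude handles them. The files will be in .pdf format, so we will use a PDF reader to process them.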
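To pick test files in that range before sending anything to the model, a small filter along these lines could help. `in_token_range`, `select_files`, and the file names are hypothetical, and the token counts are passed in as plain numbers here; in this tutorial they would come from `anthropic.count_tokens`.

```python
from typing import Dict, List

MIN_TOKENS = 40_000   # lower bound of the range used in this tutorial
MAX_TOKENS = 80_000   # upper bound of the range used in this tutorial


def in_token_range(n_tokens: int, low: int = MIN_TOKENS, high: int = MAX_TOKENS) -> bool:
    """True when the token count lies inside the inclusive [low, high] range."""
    return low <= n_tokens <= high


def select_files(token_counts: Dict[str, int]) -> List[str]:
    """Return the names of the files whose token count is in range, sorted by name."""
    return sorted(name for name, n in token_counts.items() if in_token_range(n))


# Hypothetical files with made-up token counts, for illustration only
counts = {"short_memo.pdf": 3_200, "appeal.pdf": 61_500, "full_record.pdf": 140_000}
print(select_files(counts))
```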

Dependencies

Let's start by creating a new directory and a virtual environment.

mkdir claude_tutorial
cd claude_tutorial
python3 -m venv venv
# Linux/MacOS
source venv/bin/activate
# Windows
venv\Scripts\activate.bat

For this tutorial, we will use PyPDF2 and the Anthropic SDK. Let's install them!

pip install PyPDF2 pycryptodome # PyPDF2 and pycryptodome are used to read PDF files
pip install anthropic # Anthropic SDK

We will also run everything in a FastAPI server, so let's add those dependencies as well.

pip install fastapi uvicorn # framework for creating APIs and a server to serve those APIs respectively

Now it's time to build our API

First, import the necessary libraries.

import os
from PyPDF2 import PdfReader
import anthropic
from fastapi import FastAPI, Response

I also have an API key, which I obtained through early access.

API_KEY = "sk-ant-..."
anthropic_client = anthropic.Client(API_KEY)
app = FastAPI()
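Hard-coding the key is fine for a quick experiment, but a common alternative is to read it from the environment so that it never ends up in version control. The sketch below assumes the key is stored in an environment variable named `ANTHROPIC_API_KEY`; both that name and the `load_api_key` helper are assumptions, not part of the original tutorial.

```python
import os


def load_api_key(var_name: str = "ANTHROPIC_API_KEY") -> str:
    """Read the API key from the environment and fail early if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

With this in place, `API_KEY = load_api_key()` could replace the hard-coded string before the client is created.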

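Usage

Let's start by defining a function that reads a PDF file and uses Claude to analyze the document. We will also define an output structure so that information is easy to extract from the response.

First, let's create a function that analyzes the legal case in a given PDF file. We pass in the path to the file, read it, check the length of the text, and, if it fits, send it to the API for analysis!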

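async def mine_case(path: str, input_prompt: str = "") -> str:
    # Read the PDF and join the text of all pages
    reader = PdfReader(path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)

    # Make sure the document fits in the 100k-token context window
    no_tokens = anthropic.count_tokens(text)
    print(f"Number of tokens in text: {no_tokens}")
    if no_tokens > 100000:
        raise ValueError(f"Text is too long {no_tokens}.")

    # Fall back to the default instruction when no custom prompt is given
    instruction = input_prompt or (
        "Understand then present the key pieces such as case ID, date, Plaintiff, "
        "Appellant, what is the case type, jurisdiction, a short summary, sentiment "
        "and its impact on business, and adverse findings, and outcome and put them "
        "in separate xml tags."
    )
    # HUMAN_PROMPT and AI_PROMPT already end with a colon, so none is added here.
    # The <case> tag name is an assumption; the original tag names were lost.
    prompt = (
        f"{anthropic.HUMAN_PROMPT} Here's a case file extract in <case></case> tags:\n"
        f"<case>{text}</case>\n\n{instruction}"
        f"{anthropic.AI_PROMPT}\n\ncase:"
    )
    res = anthropic_client.completion(
        prompt=prompt,
        model="claude-v1.3-100k",
        stop_sequences=[anthropic.HUMAN_PROMPT],
        max_tokens_to_sample=1000,
    )
    return res["completion"]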

Note that we use XML tags to structure both our prompt and the response. Feel free to customize the prompt!
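One way to keep that structure easy to customize is to build the prompt from parts. The sketch below uses a hypothetical `build_prompt` helper; the `document` default tag name is an assumption, and the two constants mirror the values of `anthropic.HUMAN_PROMPT` and `anthropic.AI_PROMPT` so that the example runs without the SDK installed.

```python
HUMAN_PROMPT = "\n\nHuman:"     # same value as anthropic.HUMAN_PROMPT
AI_PROMPT = "\n\nAssistant:"    # same value as anthropic.AI_PROMPT


def build_prompt(text: str, instruction: str, tag: str = "document") -> str:
    """Wrap the document text in XML tags, then append the instruction."""
    return (
        f"{HUMAN_PROMPT} Here is an extract in <{tag}></{tag}> tags:\n"
        f"<{tag}>\n{text}\n</{tag}>\n\n"
        f"{instruction}"
        f"{AI_PROMPT}"
    )


# Placeholder document text and instruction, for illustration only
prompt = build_prompt(
    "The jury found that the defendant violated the False Claims Act.",
    "Summarize the case and put the result in <summary> tags.",
    tag="case",
)
print(prompt)
```

Keeping the document and the instruction as separate arguments means the same helper can serve different tasks by changing only the instruction and the tag name.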

Note that the stop sequence is \n\nHuman:
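A stop sequence tells the model to end its completion as soon as it would otherwise start a new Human turn. The effect can be illustrated client-side with a small hypothetical helper, `truncate_at_stop`. The API applies stop sequences on the server, so this is only a sketch of the behavior, not something the tutorial's code needs.

```python
from typing import Sequence


def truncate_at_stop(completion: str, stop_sequences: Sequence[str] = ("\n\nHuman:",)) -> str:
    """Cut the text at the earliest occurrence of any stop sequence."""
    cut = len(completion)
    for stop in stop_sequences:
        idx = completion.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut]


# Made-up completion that runs on into a new Human turn
raw = "<summary>Qui tam action.</summary>\n\nHuman: and what about the penalties?"
print(truncate_at_stop(raw))
```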

Now we can use it to extract information from our cases! But first, let's create a quick endpoint that calls this function.

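# Add an endpoint that invokes the function we created earlier and returns the response as XML
@app.get("/case", response_class=Response, responses={200: {"content": {"application/xml": {}}}})
async def get_case():
    # Define our prompt and pass it to our function
    input_prompt = (
        "Understand then present the key pieces such as case ID, date, Plaintiff, "
        "Appellant, what is the case type, jurisdiction, a short summary, sentiment "
        "and its impact on business, and adverse findings, and outcome and put them "
        "in separate xml tags."
    )
    content = await mine_case("test.pdf", input_prompt)
    return Response(content=content, media_type="application/xml")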

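Now let's run our server and navigate to localhost to test the API through Swagger (by default, FastAPI serves the Swagger UI at http://localhost:8000/docs)!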

uvicorn main:app --reload

Results

2021
Michele Yates v. Pinellas Hematology & Oncology, P.A., 8:16-cv-00799-WFJ-CPT
Michele Yates
Pinellas Hematology & Oncology, P.A.
Qui Tam Action
United States Court of Appeals For the Eleventh Circuit
The jury in this qui tam case found that Pinellas Hematology & Oncology violated the False Claims Act, 31 U.S.C. § 3729 et seq., on 214 occasions, and that the United States had sustained $755.54 in damages. The district court trebled the damages and imposed statutory minimum penalties of $1,177,000.
Adverse
The adverse findings and penalties imposed can negatively impact the business and reputation of the company.
The jury found Pinellas Hematology & Oncology violated the False Claims Act by defrauding the federal government through the submission of 214 false claims resulting in damages of $755.54.
The district court imposed treble damages of $2266.62 and statutory penalties of $1,177,000.
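Because the prompt asks for each field in its own XML tag, the fields can be pulled out of the completion with a few lines of code. The sketch below is a minimal, assumption-laden example: `extract_tags` is a hypothetical helper, the tag names in the sample are illustrative (the model chooses its own), and nested tags are not handled.

```python
import re
from typing import Dict

# Matches <name>value</name> pairs; the backreference \1 requires a matching closing tag
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def extract_tags(completion: str) -> Dict[str, str]:
    """Map each top-level tag name to its stripped text content."""
    return {name: value.strip() for name, value in TAG_PATTERN.findall(completion)}


# Sample built from values in the result above; the tag names are assumed
sample = (
    "<case_type>Qui Tam Action</case_type>\n"
    "<jurisdiction>United States Court of Appeals For the Eleventh Circuit</jurisdiction>\n"
    "<sentiment> Adverse </sentiment>"
)
fields = extract_tags(sample)
print(fields["case_type"])
```

A regular expression is enough here because the response is a flat list of tags; a proper XML parser would be a safer choice if the model starts nesting tags or emitting unescaped characters such as `&`.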

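We could stop here, but as a bonus let's add an endpoint that distills research papers and gives us a TL;DR version of the key findings. This may offer more insight into how assigning a role in the prompt influences the output.

We can simply define a new prompt and a new endpoint, then start testing!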

# Example prompt
# HUMAN_PROMPT and AI_PROMPT already end with a colon, so none is added here.
# The <article> tag name is an assumption; the original tag names were lost.
prompt = (
    f"{anthropic.HUMAN_PROMPT} Here's a research article extract in <article></article> tags:\n"
    f"<article>{text}</article>\n\n"
    "As an expert researcher, peer reviewer and principal investigator, read and understand "
    "the article then present the key findings and supporting arguments in bullet points."
    f"{anthropic.AI_PROMPT}\n\nsummary of effective dissemination of this research:"
)

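We can keep exploring other use cases if you want to do some homework before the hackathon: building a "healthy and safe" news digest from the RSS feeds of questionable media outlets, finding loopholes in tricky contract language, creating child-friendly stories from music lyrics…

These should familiarize you with some of Anthropic's techniques for interacting with Claude more effectively!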

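Conclusion

As you can see, we gained insight into the key information of a case (100+ pages) in a few seconds. This shows that Anthropic's Claude can handle large texts. We could go further and use a prompt to summarize how the case developed in court, along with the most important arguments, and gain even more insights!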

Thank you!