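Knowledge graphs, such as those built by Microsoft's Graph RAG, can enhance RAG methods, but they are expensive to construct. Triplex cuts the cost of knowledge graph construction by 98%, running at 1/60th the cost of GPT-4, and supports local graph building through SciPhi's R2R.
Triplex, developed by SciPhi.AI, is a fine-tuned version of Phi3-3.8B for creating knowledge graphs from unstructured data. It works by extracting triples (simple statements made up of a subject, a predicate, and an object) from text or other data sources.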
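To make the idea of a triple concrete, here is a minimal sketch of how subject/predicate/object statements can be stored and grouped into a small graph. The `Triple` class, the `LOCATED_IN` predicate, and the hand-written values are illustrative assumptions; they are not Triplex's output format.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject --predicate--> object (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# Hand-written triples for illustration only, not produced by the model.
triples = [
    Triple("San Francisco", "POPULATION", "808,437"),
    Triple("San Francisco", "LOCATED_IN", "California"),
]

# Group the triples by subject to get a simple adjacency-list graph.
graph = defaultdict(list)
for t in triples:
    graph[t.subject].append((t.predicate, t.obj))

for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --{predicate}--> {obj}")
```

Each additional triple adds one edge, so a graph can grow incrementally as more text is processed.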
import json
from transformers import AutoModelForCausalLM, AutoTokenizer


def triplextract(model, tokenizer, text, entity_types, predicates):
    # Prompt template: entity types and predicates are passed in as JSON.
    input_format = """Perform Named Entity Recognition (NER) and extract knowledge graph triplets from the text. NER identifies named entities of given entity types, and triple extraction identifies relationships between entities using specified predicates.
**Entity Types:**
{entity_types}
**Predicates:**
{predicates}
**Text:**
{text}
"""
    message = input_format.format(
        entity_types=json.dumps({"entity_types": entity_types}),
        predicates=json.dumps({"predicates": predicates}),
        text=text,
    )
    messages = [{'role': 'user', 'content': message}]
    # Tokenize with the model's chat template; this assumes a CUDA device.
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")
    # generate() returns the prompt tokens followed by the new tokens,
    # so the decoded string contains both.
    output = tokenizer.decode(model.generate(input_ids=input_ids, max_length=2048)[0], skip_special_tokens=True)
    return output

# Load the model and tokenizer; the model is moved to a CUDA GPU.
model = AutoModelForCausalLM.from_pretrained("sciphi/triplex", trust_remote_code=True).to('cuda').eval()
tokenizer = AutoTokenizer.from_pretrained("sciphi/triplex", trust_remote_code=True)

# Entity types and predicates the model should look for.
entity_types = ["LOCATION", "POSITION", "DATE", "CITY", "COUNTRY", "NUMBER"]
predicates = ["POPULATION", "AREA"]
text = """
San Francisco,[24] officially the City and County of San Francisco, is a commercial, financial, and cultural center in Northern California.
With a population of 808,437 residents as of 2022, San Francisco is the fourth most populous city in the U.S. state of California behind Los Angeles, San Diego, and San Jose.
"""

prediction = triplextract(model, tokenizer, text, entity_types, predicates)
print(prediction)
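For decoder-only models, `generate` in transformers returns the prompt token ids followed by the newly generated ones, so the string decoded in the example above contains the prompt as well as the answer. If only the model's answer is wanted, the prompt portion can be sliced off before decoding. The sketch below shows the slicing on plain lists with made-up token ids; `generated_only` is a hypothetical helper, not part of Triplex or transformers.

```python
def generated_only(sequence_ids, prompt_length):
    """Keep only the ids that follow the prompt (works on a list or a 1-D tensor)."""
    return sequence_ids[prompt_length:]


# Stand-in ids for illustration. With the real model these would be
# model.generate(input_ids=input_ids, ...)[0] and input_ids.shape[-1].
prompt_ids = [101, 7, 8, 9]
sequence_ids = prompt_ids + [42, 43, 2]

completion_ids = generated_only(sequence_ids, len(prompt_ids))
print(completion_ids)
```

The sliced ids can then be passed to `tokenizer.decode(..., skip_special_tokens=True)` in place of the full sequence.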
1. Format your prompt as in the Hugging Face example above (the 'message' variable).
2. In your terminal, run the following command, then paste the formatted prompt:
    ollama run sciphi/triplex
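Step 1 can be done without loading the model. The sketch below reproduces only the prompt-building part of the Hugging Face example and prints the result so it can be pasted into the Ollama session. `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, the sample sentence is shortened for illustration, and the template's line layout is copied from this page, so it may differ in whitespace from the prompt the model was trained on.

```python
import json

# Template text copied from the Hugging Face example; the layout is an assumption.
PROMPT_TEMPLATE = """Perform Named Entity Recognition (NER) and extract knowledge graph triplets from the text. NER identifies named entities of given entity types, and triple extraction identifies relationships between entities using specified predicates.
**Entity Types:**
{entity_types}
**Predicates:**
{predicates}
**Text:**
{text}
"""


def build_prompt(text, entity_types, predicates):
    """Fill the template, encoding the type and predicate lists as JSON."""
    return PROMPT_TEMPLATE.format(
        entity_types=json.dumps({"entity_types": entity_types}),
        predicates=json.dumps({"predicates": predicates}),
        text=text,
    )


prompt = build_prompt(
    "San Francisco has a population of 808,437 residents as of 2022.",
    ["CITY", "NUMBER", "DATE"],
    ["POPULATION"],
)
print(prompt)
```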
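We want Triplex to be as widely accessible as possible, but we also need to keep commercial concerns in mind because we are still at an early stage. Research and personal use are fine, but we place some restrictions on commercial use.
The model weights are licensed under cc-by-nc-sa-4.0, but we will waive this license for any organization with under $5M USD in gross revenue in the most recent 12-month period. If you want to remove the GPL license requirements (dual-license) or use the weights commercially above the revenue limit, please contact our team at founders@sciphi.ai.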