Models and Weave Integration Demo

This is an interactive notebook. You can run it locally or use the links below:

Prerequisites

First, install the necessary libraries, set up your API keys, log in to wandb, and create a new wandb project.

Install weave, pandas, unsloth, wandb, litellm, pydantic, torch, and faiss-gpu using pip.

%%capture
!pip install weave wandb pandas pydantic litellm faiss-gpu

%%capture
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

Add the required API keys to your environment.

import os

from google.colab import userdata

os.environ["WANDB_API_KEY"] = userdata.get("WANDB_API_KEY")  # W&B Models and Weave
os.environ["OPENAI_API_KEY"] = userdata.get(
    "OPENAI_API_KEY"
)  # OpenAI - for retrieval embeddings
os.environ["GEMINI_API_KEY"] = userdata.get(
    "GEMINI_API_KEY"
)  # Gemini - for the base chat model
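
Outside Colab, `google.colab.userdata` is not available. The following is a minimal sketch of a fallback, assuming the keys are either already exported as environment variables or stored as Colab secrets. `get_secret` is a hypothetical helper, not part of wandb or weave.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from the environment, falling back to Colab secrets."""
    value = os.environ.get(name)
    if value:
        return value
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
    except ImportError as err:
        raise RuntimeError(
            f"{name} is not set; export it before starting the notebook"
        ) from err
    return userdata.get(name)


# Hypothetical usage: make sure each key ends up in os.environ.
# for key in ("WANDB_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY"):
#     os.environ[key] = get_secret(key)
```

Checking the environment first means the same cell can run locally and in Colab without edits.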

Log in to wandb and create a new project.

import pandas as pd
import wandb

import weave

wandb.login()

PROJECT = "weave-cookboook-demo"
ENTITY = "wandb-smle"

weave.init(ENTITY + "/" + PROJECT)

Download `ChatModel` from Models Registry and implement `UnslothLoRAChatModel`

In our scenario, a Llama-3.2 model has already been fine-tuned by the Model Team using the unsloth library for performance optimization, and is available in the wandb Models Registry. In this step, we retrieve the fine-tuned ChatModel from the Registry and convert it into a weave.Model to make it compatible with the RagModel.

The RagModel referenced below is a top-level weave.Model that can be considered a complete RAG Application. It contains a ChatModel, vector database, and a prompt. The ChatModel is also a weave.Model, which contains code to download an artifact from the wandb Registry. ChatModel can be changed modularly to support any kind of other LLM chat model as part of the RagModel. For more information, view the model in Weave.
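
The modular structure described above can be sketched without any W&B dependencies. This is an illustration only, using plain dataclasses instead of weave.Model; `EchoChatModel`, `SimpleRag`, and the word-overlap "retrieval" are invented stand-ins, not the classes used in this notebook.

```python
from dataclasses import dataclass, replace
from typing import Protocol


class ChatModel(Protocol):
    def predict(self, query: str) -> str: ...


@dataclass(frozen=True)
class EchoChatModel:
    """Stand-in chat model; a real one would call an LLM."""

    name: str

    def predict(self, query: str) -> str:
        return f"[{self.name}] {query}"


@dataclass(frozen=True)
class SimpleRag:
    chat_model: ChatModel
    documents: tuple[str, ...]  # stand-in for the vector database
    prompt: str

    def predict(self, question: str) -> str:
        # Naive retrieval: keep documents sharing a word with the question.
        words = set(question.lower().split())
        context = [d for d in self.documents if words & set(d.lower().split())]
        return self.chat_model.predict(
            self.prompt.format(context=" ".join(context), question=question)
        )


rag = SimpleRag(
    chat_model=EchoChatModel("base"),
    documents=("Berlin is the capital of Germany.",),
    prompt="Context: {context} Question: {question}",
)
# Swap only the chat model; documents and prompt are reused unchanged.
new_rag = replace(rag, chat_model=EchoChatModel("finetuned"))
```

Because the application depends only on the `predict` interface, any chat model that implements it can be dropped in.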

To load the ChatModel, unsloth.FastLanguageModel or peft.AutoPeftModelForCausalLM with adapters is used, enabling efficient integration into the app. After downloading the model from the Registry, you can set up the initialization and prediction logic in the model_post_init method. The required code for this step is available in the Use tab of the Registry and can be copied directly into your implementation.

The code below defines the UnslothLoRAChatModel class to manage, initialize, and use the fine-tuned Llama-3.2 model retrieved from the wandb Models Registry. UnslothLoRAChatModel uses unsloth.FastLanguageModel for optimized inference. The model_post_init method handles downloading and setting up the model, while the predict method processes user queries and generates responses. To adapt the code for your use case, update MODEL_REG_URL with the correct Registry path for your fine-tuned model and adjust parameters like max_seq_length or dtype based on your hardware or requirements.

from typing import Any

from pydantic import PrivateAttr
from unsloth import FastLanguageModel

import weave

class UnslothLoRAChatModel(weave.Model):
    """
    We define an extra ChatModel class to be able to store and version more parameters than just the model name.
    This is especially relevant for fine-tuning (locally or as a service) because of its specific parameters.
    """

    chat_model: str
    cm_temperature: float
    cm_max_new_tokens: int
    cm_quantize: bool
    inference_batch_size: int
    dtype: Any
    device: str
    _model: Any = PrivateAttr()
    _tokenizer: Any = PrivateAttr()

    def model_post_init(self, __context):
        # we can simply paste this from the "Use" tab from the registry
        run = wandb.init(project=PROJECT, job_type="model_download")
        artifact = run.use_artifact(f"{self.chat_model}")
        model_path = artifact.download()

        # unsloth version (enable native 2x faster inference)
        self._model, self._tokenizer = FastLanguageModel.from_pretrained(
            model_name=model_path,
            max_seq_length=self.cm_max_new_tokens,
            dtype=self.dtype,
            load_in_4bit=self.cm_quantize,
        )
        FastLanguageModel.for_inference(self._model)

    @weave.op()
    async def predict(self, query: list[dict]) -> str:
        # add_generation_prompt=True is required for generation
        input_ids = self._tokenizer.apply_chat_template(
            query,
            tokenize=True,
            add_generation_prompt=True,
            return_tensors="pt",
        ).to("cuda")

        output_ids = self._model.generate(
            input_ids=input_ids,
            max_new_tokens=64,
            use_cache=True,
            temperature=1.5,
            min_p=0.1,
        )

        decoded_outputs = self._tokenizer.batch_decode(
            output_ids[0][input_ids.shape[1] :], skip_special_tokens=True
        )

        return "".join(decoded_outputs).strip()

MODEL_REG_URL = "wandb32/wandb-registry-RAG Chat Models/Finetuned Llama-3.2:v3"

max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = (
    None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
)
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.

new_chat_model = UnslothLoRAChatModel(
    name="UnslothLoRAChatModelRag",
    chat_model=MODEL_REG_URL,
    cm_temperature=1.0,
    cm_max_new_tokens=max_seq_length,
    cm_quantize=load_in_4bit,
    inference_batch_size=max_seq_length,
    dtype=dtype,
    device="auto",
)

await new_chat_model.predict(
    [{"role": "user", "content": "What is the capital of Germany?"}]
)
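
In `predict`, the output of `generate` is sliced from `input_ids.shape[1]` onward before decoding, so that only the newly generated tokens are returned and the echoed prompt is dropped. Here is a minimal sketch of that slicing with plain lists, using made-up token IDs and a toy vocabulary rather than the real tokenizer:

```python
# Toy vocabulary; real token IDs come from the model's tokenizer.
VOCAB = {1: "<user>", 2: "capital", 3: "of", 4: "Germany", 5: "?", 6: "Berlin", 7: "."}

prompt_ids = [1, 2, 3, 4, 5]      # what the chat template produced
output_ids = prompt_ids + [6, 7]  # prompt tokens echoed back plus new tokens

# Same idea as output_ids[0][input_ids.shape[1]:] in predict():
new_ids = output_ids[len(prompt_ids):]
answer = "".join(VOCAB[i] for i in new_ids).strip()
```

Without the slice, the decoded string would begin with the prompt itself.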

Integrate the new `ChatModel` version into `RagModel`

Building a RAG application on top of a fine-tuned chat model improves conversational AI by using tailored components without having to rebuild the entire pipeline. In this step, we retrieve the existing RagModel from our Weave project and update its ChatModel to use the newly fine-tuned model. This seamless swap means that other components like the vector database (VDB) and prompts remain untouched, preserving the application’s overall structure while improving performance.

The code below retrieves the RagModel object using a reference from the Weave project. The chat_model attribute of the RagModel is then updated to use the new UnslothLoRAChatModel instance created in the previous step. After this, the updated RagModel is published to create a new version. Finally, the updated RagModel is used to run a sample prediction query, verifying that the new chat model is being used.

RagModel = weave.ref(
    "weave:///wandb-smle/weave-cookboook-demo/object/RagModel:cqRaGKcxutBWXyM0fCGTR1Yk2mISLsNari4wlGTwERo"
).get()

RagModel.chat_model.chat_model

await RagModel.predict("When was the first conference on climate change?")

# MAGIC: exchange chat_model and publish new version (no need to worry about other RAG components)
RagModel.chat_model = new_chat_model

RagModel.chat_model.chat_model

# first publish new version so that in prediction we reference new version
PUB_REFERENCE = weave.publish(RagModel, "RagModel")

await RagModel.predict("When was the first conference on climate change?")

Run a `weave.Evaluation`

In the next step, we evaluate the updated RagModel using an existing weave.Evaluation. This process ensures that the new fine-tuned chat model performs as expected within the RAG application. To streamline integration and enable collaboration between the Models and Apps teams, we log evaluation results both to the model’s wandb run and to the Weave workspace.

In Models:

The evaluation summary is logged to the wandb run used to download the fine-tuned chat model. This includes summary metrics and graphs displayed in a workspace view.
The evaluation trace ID is added to the run’s configuration, linking directly to the Weave page for easier traceability by the Model Team.

In Weave:

The artifact or registry link for the ChatModel is stored as an input to the RagModel.
The wandb run ID is stored as an extra column in the evaluation trace for better context.

The code below demonstrates how to retrieve an evaluation object, run the evaluation using the updated RagModel, and log the results to both wandb and Weave. Make sure the evaluation reference (WEAVE_EVAL) matches your project setup.

# MAGIC: we can simply get an evaluation with an eval dataset and scorers and use them
WEAVE_EVAL = "weave:///wandb-smle/weave-cookboook-demo/object/climate_rag_eval:ntRX6qn3Tx6w3UEVZXdhIh1BWGh7uXcQpOQnIuvnSgo"
climate_rag_eval = weave.ref(WEAVE_EVAL).get()

with weave.attributes({"wandb-run-id": wandb.run.id}):
    # use .call attribute to retrieve both the result and the call in order to save eval trace to Models
    summary, call = await climate_rag_eval.evaluate.call(climate_rag_eval, RagModel)

# log to models
wandb.run.log(pd.json_normalize(summary, sep="/").to_dict(orient="records")[0])
wandb.run.config.update(
    {"weave_url": f"https://wandb.ai/wandb-smle/weave-cookboook-demo/r/call/{call.id}"}
)
wandb.run.finish()
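
The logging call above flattens the nested evaluation summary with `pd.json_normalize(summary, sep="/")`, so that each metric is logged under a single key. Below is a pure-Python sketch of the same idea, with a made-up summary; the real keys depend on the scorers attached to the evaluation.

```python
def flatten(nested: dict, sep: str = "/", prefix: str = "") -> dict:
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, sep=sep, prefix=path))
        else:
            flat[path] = value
    return flat


# Hypothetical summary; scorer names and values are invented for illustration.
example_summary = {
    "correctness": {"mean": 0.8},
    "model_latency": {"mean": 1.2},
}
flat_metrics = flatten(example_summary)
```

Each flattened key, such as `correctness/mean`, then shows up as its own metric in the run.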

Save the new RAG Model to the Registry

To make the updated RagModel available for future use by both the Models and Apps teams, we push it to the wandb Models Registry as a reference artifact. The code below retrieves the weave object version and name for the updated RagModel and uses them to create reference links. A new artifact is then created in wandb with metadata containing the model’s Weave URL. This artifact is logged to the wandb Registry and linked to a designated registry path.

Before running the code, ensure that the ENTITY and PROJECT variables match your wandb setup, and that the target registry path is correctly specified. This process finalizes the workflow by publishing the new RagModel to the wandb ecosystem for easy collaboration and reuse.

MODELS_OBJECT_VERSION = PUB_REFERENCE.digest  # weave object version
MODELS_OBJECT_NAME = PUB_REFERENCE.name  # weave object name

models_url = f"https://wandb.ai/{ENTITY}/{PROJECT}/weave/objects/{MODELS_OBJECT_NAME}/versions/{MODELS_OBJECT_VERSION}"
models_link = (
    f"weave:///{ENTITY}/{PROJECT}/object/{MODELS_OBJECT_NAME}:{MODELS_OBJECT_VERSION}"
)

with wandb.init(project=PROJECT, entity=ENTITY) as run:
    # create new Artifact
    artifact_model = wandb.Artifact(
        name="RagModel",
        type="model",
        description="Models Link from RagModel in Weave",
        metadata={"url": models_url},
    )
    artifact_model.add_reference(models_link, name="model", checksum=False)

    # log new artifact
    run.log_artifact(artifact_model, aliases=[MODELS_OBJECT_VERSION])

    # link to registry
    run.link_artifact(
        artifact_model, target_path="wandb32/wandb-registry-RAG Models/RAG Model"
    )
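
The `models_link` built above follows the pattern `weave:///<entity>/<project>/object/<name>:<digest>`, the same shape as the references used earlier in this notebook. The following is a small sketch of splitting such a link back into its parts, for example to check that it points at the intended entity and project before linking. `parse_object_ref` is a hypothetical helper, and weave's own reference handling may cover cases this sketch ignores.

```python
from typing import NamedTuple


class WeaveObjectRef(NamedTuple):
    entity: str
    project: str
    name: str
    digest: str


def parse_object_ref(uri: str) -> WeaveObjectRef:
    """Split weave:///<entity>/<project>/object/<name>:<digest> into parts."""
    prefix = "weave:///"
    if not uri.startswith(prefix):
        raise ValueError(f"not a weave reference: {uri!r}")
    entity, project, kind, rest = uri[len(prefix):].split("/", 3)
    if kind != "object":
        raise ValueError(f"expected an object reference, got {kind!r}")
    name, _, digest = rest.partition(":")
    return WeaveObjectRef(entity, project, name, digest)


# Placeholder values, not a real project or digest.
ref = parse_object_ref("weave:///my-entity/my-project/object/RagModel:abc123")
```

Comparing `ref.entity` and `ref.project` against ENTITY and PROJECT gives a cheap sanity check before the artifact is logged.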
