# Tutorial eval

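To improve an application iteratively, you need a way to evaluate whether it is actually getting better. A common approach is to test against the same set of examples each time something changes. Weave provides a first-class way to track evaluations through the `Model` and `Evaluation` classes. The APIs make minimal assumptions so that they stay flexible enough to support a wide range of use cases.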

![Evals hero](https://mintlify.s3.us-west-1.amazonaws.com/wb-21fd5541-feature-automate-reference-docs-generation/ko/images/evals-hero.png)
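
The idea of re-testing every change against the same examples can be sketched in plain Python before bringing in Weave. The two `first_word_*` functions and the `accuracy` helper below are hypothetical stand-ins for two versions of an application; they are not part of the Weave API.

```python
from typing import Callable

# A fixed example set: the same inputs and targets are reused for every version.
EXAMPLES = [
    {"sentence": "Pounits are a bright green color.", "target": "pounits"},
    {"sentence": "Glowls have a pale orange tinge.", "target": "glowls"},
    {"sentence": "Neoskizzles are purple.", "target": "neoskizzles"},
]

def first_word_v1(sentence: str) -> str:
    """Hypothetical version 1: return the first word unchanged."""
    return sentence.split()[0]

def first_word_v2(sentence: str) -> str:
    """Hypothetical version 2: also lowercase the first word."""
    return sentence.split()[0].lower()

def accuracy(predict: Callable[[str], str]) -> float:
    """Fraction of examples where the prediction equals the target."""
    hits = sum(predict(ex["sentence"]) == ex["target"] for ex in EXAMPLES)
    return hits / len(EXAMPLES)

print(accuracy(first_word_v1), accuracy(first_word_v2))
```

Because both versions are scored on identical examples, any difference in the two numbers comes from the change itself rather than from the test data.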

## 1. `Model`

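A `Model` stores and versions information about your system, such as prompts and temperatures.
Weave automatically captures when models are used and updates the version when they change.

You declare a `Model` by subclassing `Model` and implementing a `predict` function, which takes one example and returns the response.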

<Important>
  **Known issue**: If you are using Google Colab, remove `async` from the following examples.
</Important>

<CodeGroup>
  ```python Python
  import json
  import openai
  import weave

  class ExtractFruitsModel(weave.Model):
      model_name: str
      prompt_template: str

      @weave.op()
      async def predict(self, sentence: str) -> dict:
          client = openai.AsyncClient()

          response = await client.chat.completions.create(
              model=self.model_name,
              messages=[
                  {"role": "user", "content": self.prompt_template.format(sentence=sentence)}
              ],
          )
          result = response.choices[0].message.content
          if result is None:
              raise ValueError("No response from model")
          parsed = json.loads(result)
          return parsed
  ```

  ```typescript TypeScript
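  // Note: weave.Model is not supported in TypeScript yet.
  // Instead, wrap your model-like function with weave.op

  import OpenAI from 'openai';
  import * as weave from 'weave';

  // The OpenAI client reads the API key from the OPENAI_API_KEY environment variable.
  const openaiClient = new OpenAI();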

  const model = weave.op(async function myModel({datasetRow}) {
    const prompt = `Extract fields ("fruit": <str>, "color": <str>, "flavor": <str>) from the following text, as json: ${datasetRow.sentence}`;
    const response = await openaiClient.chat.completions.create({
      model: 'gpt-3.5-turbo',
      messages: [{ role: 'user', content: prompt }],
      response_format: { type: 'json_object' }
    });
    return JSON.parse(response.choices[0].message.content);
  });
  ```
</CodeGroup>
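
The Python `predict` method passes the raw response straight to `json.loads`, which raises an error if the model wraps its answer in a Markdown code fence or returns something other than a JSON object. The sketch below shows one way to parse more defensively; `parse_json_response` is a hypothetical helper, not part of Weave or the OpenAI SDK.

```python
import json

FENCE = "`" * 3  # the three-backtick Markdown code fence marker

def parse_json_response(text: str) -> dict:
    """Parse a model response as a JSON object, tolerating a Markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag) ...
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        # ... and everything from the closing fence onward.
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    parsed = json.loads(cleaned)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(parsed, dict):
        raise ValueError(f"Expected a JSON object, got {type(parsed).__name__}")
    return parsed

fenced = FENCE + 'json\n{"fruit": "glowls"}\n' + FENCE
print(parse_json_response(fenced))
```

Under these assumptions, `predict` could call `parse_json_response(result)` in place of `json.loads(result)`.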

You can instantiate `Model` objects as usual:

<CodeGroup>
  ```python Python
  import asyncio
  import weave

  weave.init('intro-example')

  model = ExtractFruitsModel(
      model_name='gpt-3.5-turbo-1106',
      prompt_template='Extract fields ("fruit": <str>, "color": <str>, "flavor": <str>) from the following text, as json: {sentence}'
  )
  sentence = "There are many fruits that were found on the recently discovered planet Goocrux. There are neoskizzles that grow there, which are purple and taste like candy."
  print(asyncio.run(model.predict(sentence)))
  # if you're in a Jupyter Notebook, run:
  # await model.predict(sentence)
  ```

  ```typescript TypeScript
  await weave.init('intro-example');

  const sentence = "There are many fruits that were found on the recently discovered planet Goocrux. There are neoskizzles that grow there, which are purple and taste like candy.";
  const result = await model({ datasetRow: { sentence } });
  console.log(result);
  ```
</CodeGroup>

<Note>
  For more details, see the [Models](/ko/guides/core-types/models) guide.
</Note>

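## 2. Collect some examples

Next, you need a dataset to evaluate your model on. A `Dataset` is a collection of examples stored as a Weave object. In the Weave UI, you can download datasets, browse them, and run evaluations on them.

Here the list of examples is built in code, but you could also log them one at a time from a running application.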

<CodeGroup>
  ```python Python
  sentences = [
      "There are many fruits that were found on the recently discovered planet Goocrux. There are neoskizzles that grow there, which are purple and taste like candy.",
      "Pounits are a bright green color and are more savory than sweet.",
      "Finally, there are fruits called glowls, which have a very sour and bitter taste which is acidic and caustic, and a pale orange tinge to them."
  ]
  labels = [
      {'fruit': 'neoskizzles', 'color': 'purple', 'flavor': 'candy'},
      {'fruit': 'pounits', 'color': 'bright green', 'flavor': 'savory'},
      {'fruit': 'glowls', 'color': 'pale orange', 'flavor': 'sour and bitter'}
  ]
  examples = [
      {'id': '0', 'sentence': sentences[0], 'target': labels[0]},
      {'id': '1', 'sentence': sentences[1], 'target': labels[1]},
      {'id': '2', 'sentence': sentences[2], 'target': labels[2]}
  ]
  ```

  ```typescript TypeScript
  const sentences = [
    "There are many fruits that were found on the recently discovered planet Goocrux. There are neoskizzles that grow there, which are purple and taste like candy.",
    "Pounits are a bright green color and are more savory than sweet.",
    "Finally, there are fruits called glowls, which have a very sour and bitter taste which is acidic and caustic, and a pale orange tinge to them."
  ];
  const labels = [
    { fruit: 'neoskizzles', color: 'purple', flavor: 'candy' },
    { fruit: 'pounits', color: 'bright green', flavor: 'savory' },
    { fruit: 'glowls', color: 'pale orange', flavor: 'sour and bitter' }
  ];
  const examples = sentences.map((sentence, i) => ({
    id: i.toString(),
    sentence,
    target: labels[i]
  }));
  ```
</CodeGroup>
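
When the sentence and label lists grow, pairing them by index becomes error-prone. The following sketch builds the same row shape (`id`, `sentence`, `target`) with `zip` and checks the inputs first. `build_examples` and its required-key set are assumptions made for illustration; Weave itself does not require this validation.

```python
def build_examples(sentences: list[str], labels: list[dict]) -> list[dict]:
    """Pair each sentence with its label and assign a string id."""
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    required = {"fruit", "color", "flavor"}  # keys used by this tutorial's labels
    examples = []
    for i, (sentence, label) in enumerate(zip(sentences, labels)):
        missing = required - set(label)
        if missing:
            raise ValueError(f"label {i} is missing keys: {sorted(missing)}")
        examples.append({"id": str(i), "sentence": sentence, "target": label})
    return examples

demo = build_examples(
    ["Pounits are a bright green color and are more savory than sweet."],
    [{"fruit": "pounits", "color": "bright green", "flavor": "savory"}],
)
print(demo[0]["id"], demo[0]["target"]["fruit"])
```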

Then publish the dataset:

<CodeGroup>
  ```python Python
  import weave
  weave.init('intro-example')
  dataset = weave.Dataset(name='fruits', rows=examples)
  weave.publish(dataset)
  ```

  ```typescript TypeScript
  import * as weave from 'weave';
  await weave.init('intro-example');
  const dataset = new weave.Dataset({
    name: 'fruits',
    rows: examples
  });
  await dataset.save();
  ```
</CodeGroup>

<Note>
  For more details, see the [Datasets](/ko/guides/core-types/datasets) guide.
</Note>

## 3. Define scoring functions

An `Evaluation` assesses a `Model`'s performance on a set of examples using a list of specified scoring functions or `weave.scorer.Scorer` classes.

<CodeGroup>
  ```python Python
  import weave
  from weave.scorers import MultiTaskBinaryClassificationF1

  @weave.op()
  def fruit_name_score(target: dict, output: dict) -> dict:
      return {'correct': target['fruit'] == output['fruit']}
  ```

  ```typescript TypeScript
  import * as weave from 'weave';

  const fruitNameScorer = weave.op(
    function fruitNameScore({target, output}) {
      return { correct: target.fruit === output.fruit };
    }
  );
  ```
</CodeGroup>
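
The scorer above indexes `output['fruit']` directly, so it raises a `KeyError` if the model omits that field. The sketch below is a hypothetical variant that scores every labeled field and treats a missing field as incorrect. The `@weave.op()` decorator is left off so the snippet runs without Weave; adding it, as in the tutorial's scorer, is assumed to be what makes the function usable in an `Evaluation`.

```python
def field_match_score(target: dict, output: dict) -> dict:
    """Compare each labeled field and report whether the output matches it exactly."""
    return {f"{key}_correct": output.get(key) == value for key, value in target.items()}

target = {"fruit": "glowls", "color": "pale orange", "flavor": "sour and bitter"}
output = {"fruit": "glowls", "color": "orange"}  # "flavor" is missing on purpose
scores = field_match_score(target, output)
print(scores)
```

Using `output.get(key)` keeps one malformed prediction from aborting the whole run, at the cost of not distinguishing a missing field from a wrong one.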

<Note>
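  To write your own scoring functions, learn more in the [Scorers](/ko/guides/evaluation/scorers) guide.

  Some applications call for a custom `Scorer` class, for example when a standardized `LLMJudge` class should be created with specific parameters (such as the chat model and prompt), specific scoring of each row, and a specific calculation of the aggregate score. For more information, see the tutorial on defining a `Scorer` class in the next chapter, [Model-Based Evaluation of RAG applications](/ko/tutorial-rag#optional-defining-a-scorer-class).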
</Note>

## 4. Run the evaluation

Now you are ready to evaluate `ExtractFruitsModel` on the `fruits` dataset using your scoring functions.

<CodeGroup>
  ```python Python
  import asyncio
  import weave
  from weave.scorers import MultiTaskBinaryClassificationF1

  weave.init('intro-example')

  evaluation = weave.Evaluation(
      name='fruit_eval',
      dataset=dataset,
      scorers=[
          MultiTaskBinaryClassificationF1(class_names=["fruit", "color", "flavor"]),
          fruit_name_score
      ],
  )
  print(asyncio.run(evaluation.evaluate(model)))
  # if you're in a Jupyter Notebook, run:
  # await evaluation.evaluate(model)
  ```

  ```typescript TypeScript
  import * as weave from 'weave';

  await weave.init('intro-example');

  const evaluation = new weave.Evaluation({
    name: 'fruit_eval',
    dataset: dataset,
    scorers: [fruitNameScorer],
  });
  const results = await evaluation.evaluate(model);
  console.log(results);
  ```
</CodeGroup>

<Note>
  If you are running from a Python script, you need to use `asyncio.run`. If you are running from a Jupyter notebook, you can use `await` directly.
</Note>
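
To make the flow concrete, here is a rough plain-Python sketch of what an evaluation run does conceptually: predict on every row, score each output against its target, and aggregate. It is not Weave's implementation. `stub_predict` and `run_eval` are hypothetical names, and the stub replaces the OpenAI call so the snippet runs offline.

```python
import asyncio

async def stub_predict(sentence: str) -> dict:
    """Hypothetical stand-in for model.predict: treats the first word as the fruit."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"fruit": sentence.split()[0].lower()}

def fruit_name_score(target: dict, output: dict) -> dict:
    """Same comparison as the tutorial's scorer, without the Weave decorator."""
    return {"correct": target["fruit"] == output["fruit"]}

async def run_eval(rows: list[dict]) -> dict:
    """Predict every row concurrently, score each output, and aggregate the results."""
    outputs = await asyncio.gather(*(stub_predict(row["sentence"]) for row in rows))
    scores = [fruit_name_score(row["target"], out) for row, out in zip(rows, outputs)]
    correct = sum(score["correct"] for score in scores)
    return {"correct_count": correct, "correct_fraction": correct / len(rows)}

rows = [
    {"sentence": "Pounits are a bright green color.", "target": {"fruit": "pounits"}},
    {"sentence": "Finally, there are fruits called glowls.", "target": {"fruit": "glowls"}},
]
summary = asyncio.run(run_eval(rows))
print(summary)
```

In a Jupyter notebook, the last step would be `summary = await run_eval(rows)`, since an event loop is already running there.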

## 5. View your evaluation results

Weave automatically captures a trace of each prediction and score.

Click the link printed by the evaluation to view the results in the Weave UI.

![Evaluation results](https://mintlify.s3.us-west-1.amazonaws.com/wb-21fd5541-feature-automate-reference-docs-generation/ko/images/evals-hero.png)

## What's next?

Learn how to:

1. **Compare model performance**: Try different models and compare their results
2. **Explore built-in scorers**: Browse Weave's built-in scoring functions in the [Scorers guide](/ko/guides/evaluation/scorers)
3. **Build a RAG app**: Follow the [RAG tutorial](/ko/tutorial-rag) to learn about evaluating retrieval-augmented generation
4. **Advanced evaluation patterns**: Learn about [Model-Based Evaluation](/ko/guides/evaluation/scorers#model-based-evaluation) and using LLMs as judges