A Look at Spring AI's Evaluator

This article takes a look at Spring AI's Evaluator.

Evaluator

spring-ai-client-chat/src/main/java/org/springframework/ai/evaluation/Evaluator.java

@FunctionalInterface
public interface Evaluator {

    EvaluationResponse evaluate(EvaluationRequest evaluationRequest);

    default String doGetSupportingData(EvaluationRequest evaluationRequest) {
        List<Document> data = evaluationRequest.getDataList();
        return data.stream()
            .map(Document::getText)
            .filter(StringUtils::hasText)
            .collect(Collectors.joining(System.lineSeparator()));
    }

}

The Evaluator interface defines an evaluate method for assessing AI-generated content, helping to detect hallucinated responses. It has two implementations: RelevancyEvaluator and FactCheckingEvaluator.
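The default doGetSupportingData method can be exercised without any model. The sketch below uses simplified stand-in types (not the real Spring AI Document/EvaluationRequest classes) to show how the document list is flattened into a single context string:

```java
import java.util.List;
import java.util.stream.Collectors;

// Simplified stand-ins for illustration only -- "Document" here is NOT the
// Spring AI type, just enough structure to show the contract.
public class SupportingDataSketch {

    record Document(String text) {}

    // Mirrors Evaluator#doGetSupportingData: keep non-blank texts and join
    // them with the platform line separator to form the evaluation context.
    static String supportingData(List<Document> dataList) {
        return dataList.stream()
            .map(Document::text)
            .filter(t -> t != null && !t.isBlank())  // stands in for StringUtils.hasText
            .collect(Collectors.joining(System.lineSeparator()));
    }

    public static void main(String[] args) {
        String context = supportingData(List.of(
            new Document("Carina is a storage service."),
            new Document("   "),                  // blank -> filtered out
            new Document("It runs on Kubernetes.")));
        System.out.println(context);
    }
}
```

The joined string is what later gets substituted into the evaluation prompt as the context/document parameter.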

EvaluationRequest

org/springframework/ai/evaluation/EvaluationRequest.java

public class EvaluationRequest {

    private final String userText;

    private final List<Document> dataList;

    private final String responseContent;

    public EvaluationRequest(String userText, String responseContent) {
        this(userText, Collections.emptyList(), responseContent);
    }

    public EvaluationRequest(List<Document> dataList, String responseContent) {
        this("", dataList, responseContent);
    }

    public EvaluationRequest(String userText, List<Document> dataList, String responseContent) {
        this.userText = userText;
        this.dataList = dataList;
        this.responseContent = responseContent;
    }

    //......
}   

EvaluationRequest defines the userText, dataList, and responseContent properties: userText is the user's input, dataList is the context data (for example, content appended by RAG), and responseContent is the AI model's response.

EvaluationResponse

org/springframework/ai/evaluation/EvaluationResponse.java

public class EvaluationResponse {

    private final boolean pass;

    private final float score;

    private final String feedback;

    private final Map<String, Object> metadata;

    @Deprecated
    public EvaluationResponse(boolean pass, float score, String feedback, Map<String, Object> metadata) {
        this.pass = pass;
        this.score = score;
        this.feedback = feedback;
        this.metadata = metadata;
    }

    public EvaluationResponse(boolean pass, String feedback, Map<String, Object> metadata) {
        this.pass = pass;
        this.score = 0;
        this.feedback = feedback;
        this.metadata = metadata;
    }

    //......
}   

EvaluationResponse defines the pass, score, feedback, and metadata properties. Note that the constructor taking an explicit score is deprecated; the other constructor sets score to 0.

RelevancyEvaluator

org/springframework/ai/evaluation/RelevancyEvaluator.java

public class RelevancyEvaluator implements Evaluator {

    private static final String DEFAULT_EVALUATION_PROMPT_TEXT = """
                Your task is to evaluate if the response for the query
                is in line with the context information provided.\n
                You have two options to answer. Either YES/ NO.\n
                Answer - YES, if the response for the query
                is in line with context information otherwise NO.\n
                Query: \n {query}\n
                Response: \n {response}\n
                Context: \n {context}\n
                Answer: "
            """;

    private final ChatClient.Builder chatClientBuilder;

    public RelevancyEvaluator(ChatClient.Builder chatClientBuilder) {
        this.chatClientBuilder = chatClientBuilder;
    }

    @Override
    public EvaluationResponse evaluate(EvaluationRequest evaluationRequest) {

        var response = evaluationRequest.getResponseContent();
        var context = doGetSupportingData(evaluationRequest);

        String evaluationResponse = this.chatClientBuilder.build()
            .prompt()
            .user(userSpec -> userSpec.text(DEFAULT_EVALUATION_PROMPT_TEXT)
                .param("query", evaluationRequest.getUserText())
                .param("response", response)
                .param("context", context))
            .call()
            .content();

        boolean passing = false;
        float score = 0;
        if (evaluationResponse.toLowerCase().contains("yes")) {
            passing = true;
            score = 1;
        }

        return new EvaluationResponse(passing, score, "", Collections.emptyMap());
    }

}

RelevancyEvaluator asks the AI to judge whether the response is consistent with the provided context, answering yes or no. If the reply contains "yes", passing is true and score is 1; otherwise passing stays false and score stays 0.
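The verdict parsing boils down to a case-insensitive substring check, which is lenient: any reply containing "yes" passes, even a verbose one. A minimal self-contained sketch of that logic (extracted from the evaluate method above, no LLM involved):

```java
// Sketch of RelevancyEvaluator's verdict parsing: a lenient substring check,
// so "YES", "Yes.", and a verbose "Yes, it is in line..." all count as passing.
public class RelevancyVerdict {

    static boolean passes(String evaluationResponse) {
        return evaluationResponse.toLowerCase().contains("yes");
    }

    static float score(String evaluationResponse) {
        return passes(evaluationResponse) ? 1f : 0f;
    }

    public static void main(String[] args) {
        System.out.println(passes("YES"));                                  // true
        System.out.println(passes("No"));                                   // false
        System.out.println(passes("Yes, it is in line with the context.")); // true
    }
}
```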

Example

@Test
void testEvaluation() {

    dataController.delete();
    dataController.load();

    String userText = "What is the purpose of Carina?";

    ChatResponse response = ChatClient.builder(chatModel)
            .build().prompt()
            .advisors(new QuestionAnswerAdvisor(vectorStore))
            .user(userText)
            .call()
            .chatResponse();
    String responseContent = response.getResult().getOutput().getContent();

    var relevancyEvaluator = new RelevancyEvaluator(ChatClient.builder(chatModel));

    EvaluationRequest evaluationRequest = new EvaluationRequest(userText,
            (List<Content>) response.getMetadata().get(QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS), responseContent);

    EvaluationResponse evaluationResponse = relevancyEvaluator.evaluate(evaluationRequest);

    assertTrue(evaluationResponse.isPass(), "Response is not relevant to the question");

}

Here we first ask the AI with userText, then hand responseContent together with the documents retrieved under QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS to relevancyEvaluator, letting the AI evaluate the answer.

FactCheckingEvaluator

org/springframework/ai/evaluation/FactCheckingEvaluator.java

public class FactCheckingEvaluator implements Evaluator {

    private static final String DEFAULT_EVALUATION_PROMPT_TEXT = """
                Evaluate whether or not the following claim is supported by the provided document.
                Respond with "yes" if the claim is supported, or "no" if it is not.
                Document: \n {document}\n
                Claim: \n {claim}
            """;

    private static final String BESPOKE_EVALUATION_PROMPT_TEXT = """
                Document: \n {document}\n
                Claim: \n {claim}
            """;

    private final ChatClient.Builder chatClientBuilder;

    private final String evaluationPrompt;

    /**
     * Constructs a new FactCheckingEvaluator with the provided ChatClient.Builder. Uses
     * the default evaluation prompt suitable for general purpose LLMs.
     * @param chatClientBuilder The builder for the ChatClient used to perform the
     * evaluation
     */
    public FactCheckingEvaluator(ChatClient.Builder chatClientBuilder) {
        this(chatClientBuilder, DEFAULT_EVALUATION_PROMPT_TEXT);
    }

    /**
     * Constructs a new FactCheckingEvaluator with the provided ChatClient.Builder and
     * evaluation prompt.
     * @param chatClientBuilder The builder for the ChatClient used to perform the
     * evaluation
     * @param evaluationPrompt The prompt text to use for evaluation
     */
    public FactCheckingEvaluator(ChatClient.Builder chatClientBuilder, String evaluationPrompt) {
        this.chatClientBuilder = chatClientBuilder;
        this.evaluationPrompt = evaluationPrompt;
    }

    /**
     * Creates a FactCheckingEvaluator configured for use with the Bespoke Minicheck
     * model.
     * @param chatClientBuilder The builder for the ChatClient used to perform the
     * evaluation
     * @return A FactCheckingEvaluator configured for Bespoke Minicheck
     */
    public static FactCheckingEvaluator forBespokeMinicheck(ChatClient.Builder chatClientBuilder) {
        return new FactCheckingEvaluator(chatClientBuilder, BESPOKE_EVALUATION_PROMPT_TEXT);
    }

    /**
     * Evaluates whether the response content in the EvaluationRequest is factually
     * supported by the context provided in the same request.
     * @param evaluationRequest The request containing the response to be evaluated and
     * the supporting context
     * @return An EvaluationResponse indicating whether the claim is supported by the
     * document
     */
    @Override
    public EvaluationResponse evaluate(EvaluationRequest evaluationRequest) {
        var response = evaluationRequest.getResponseContent();
        var context = doGetSupportingData(evaluationRequest);

        String evaluationResponse = this.chatClientBuilder.build()
            .prompt()
            .user(userSpec -> userSpec.text(this.evaluationPrompt).param("document", context).param("claim", response))
            .call()
            .content();

        boolean passing = evaluationResponse.equalsIgnoreCase("yes");
        return new EvaluationResponse(passing, "", Collections.emptyMap());
    }

}

FactCheckingEvaluator is designed to assess the factual accuracy of AI-generated responses against a given context. It helps detect and reduce hallucinations in AI output by verifying whether a given claim is logically supported by the provided context (the document). When using FactCheckingEvaluator, the claim and the document are submitted to an AI model for evaluation.

This task can be handled by smaller, more efficient models such as Bespoke's Minicheck, a compact model purpose-built for fact checking. It analyzes a piece of factual information together with the generated output and verifies whether the claim is consistent with the document, answering "yes" if the document supports the claim and "no" otherwise. This makes it especially useful for retrieval-augmented generation (RAG) applications, where generated answers must stay grounded in the retrieved context.

Example

@Test
void testFactChecking() {
  // Set up the Ollama API
  OllamaApi ollamaApi = new OllamaApi("http://localhost:11434");

  ChatModel chatModel = new OllamaChatModel(ollamaApi,
                OllamaOptions.builder().model(BESPOKE_MINICHECK).numPredict(2).temperature(0.0d).build());


  // Create the FactCheckingEvaluator
  var factCheckingEvaluator = new FactCheckingEvaluator(ChatClient.builder(chatModel));

  // Example context and claim
  String context = "The Earth is the third planet from the Sun and the only astronomical object known to harbor life.";
  String claim = "The Earth is the fourth planet from the Sun.";

  // Create an EvaluationRequest
  EvaluationRequest evaluationRequest = new EvaluationRequest(context, Collections.emptyList(), claim);

  // Perform the evaluation
  EvaluationResponse evaluationResponse = factCheckingEvaluator.evaluate(evaluationRequest);

  assertFalse(evaluationResponse.isPass(), "The claim should not be supported by the context");

}

Here Ollama serves the bespoke-minicheck model with temperature set to 0.0, and both the context and the claim are passed to factCheckingEvaluator for evaluation.
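Note that FactCheckingEvaluator compares the whole reply with equalsIgnoreCase("yes"), which is stricter than RelevancyEvaluator's substring check; that is presumably why the example caps the model output with numPredict(2), so the reply is just the bare verdict. The difference in a self-contained sketch (just the two parsing rules, no model calls):

```java
// Contrast of the verdict checks used by the two evaluators:
// FactCheckingEvaluator requires the reply to BE "yes" (ignoring case),
// while RelevancyEvaluator only requires it to CONTAIN "yes".
public class VerdictParsing {

    static boolean factCheckPass(String reply) {
        return reply.equalsIgnoreCase("yes");
    }

    static boolean relevancyPass(String reply) {
        return reply.toLowerCase().contains("yes");
    }

    public static void main(String[] args) {
        String verbose = "Yes, the claim is supported.";
        System.out.println(relevancyPass(verbose));  // true
        System.out.println(factCheckPass(verbose));  // false: not exactly "yes"
        System.out.println(factCheckPass("YES"));    // true
    }
}
```

A chatty model would therefore fail the fact check even when it agrees, which is why constraining the output length matters here.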

Summary

Spring AI provides the Evaluator interface, which defines an evaluate method for assessing AI-generated content and detecting hallucinated responses. It has two implementations: RelevancyEvaluator, which evaluates relevance, and FactCheckingEvaluator, which evaluates factual accuracy.

doc
