Spaces: pelcra/llmlagbench

ppezik committed on Nov 15

Commit 5a9d5b9 (verified) · 1 Parent(s): 48f5f74

Update content.py

Files changed (1)

content.py CHANGED

@@ -14,7 +14,7 @@ LLMLagBench provides a systematic approach for **identifying the earliest probab
 an LLM's training data by evaluating its knowledge of recent events. The benchmark comprises of **1,700+ curated questions**
 about events sampled from news reports published between January 2021 and October 2025. Wwe plan to update the question set regularly. Each
 question could not be accurately answered before the event was reported in news media. We evaluate model
-responses using a **0-2 scale faithfulness metric** and apply the **PELT (Pruned Exact Linear Time)** changepoint
+responses using a **0-2 scale faithfulness metric** (which is basically accuracy of model responses to queries about time-sensitive knowledge when compared with gold answers) and apply the **PELT (Pruned Exact Linear Time)** changepoint
 detection algorithm to identify where model performance exhibits statistically significant drops,
 revealing their actual knowledge cutoffs.
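
The changed text mentions PELT applied to 0-2 faithfulness scores, but the commit does not show how the benchmark runs it. As a rough illustration only, the sketch below implements PELT with a least-squares (mean-shift) cost in plain Python. The function name `pelt_mean_shift`, the choice of cost, the penalty value, and the sample scores are all assumptions made for this example and are not taken from the LLMLagBench code.

```python
from typing import List


def pelt_mean_shift(scores: List[float], penalty: float) -> List[int]:
    """Hypothetical helper: indices where a new segment starts (L2 cost)."""
    n = len(scores)
    # Prefix sums give the cost of any segment in constant time.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(scores):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(a: int, b: int) -> float:
        # Sum of squared deviations from the mean of scores[a:b].
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)  # best[t]: minimal penalised cost of scores[:t]
    best[0] = -penalty
    last = [0] * (n + 1)    # last[t]: start of the final segment in that optimum
    candidates = [0]

    for t in range(1, n + 1):
        values = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last[t] = min(values)
        # Pruning step: drop segment starts that can no longer be optimal.
        candidates = [s for v, s in values if v - penalty <= best[t]]
        candidates.append(t)

    # Walk back through the stored segment starts to recover changepoints.
    changepoints = []
    t = n
    while last[t] > 0:
        changepoints.append(last[t])
        t = last[t]
    return sorted(changepoints)


# Invented per-period mean scores on the 0-2 scale (assumption, not real data).
scores = [1.8, 1.9, 1.7, 1.8, 1.9, 1.8, 0.3, 0.2, 0.4, 0.3, 0.2, 0.3]
detected = pelt_mean_shift(scores, penalty=1.0)
print(detected)
```

With these invented scores, the single detected index marks the period at which the mean score drops, which is the kind of point the description treats as a candidate knowledge cutoff. The penalty controls how large a drop must be before it counts as a changepoint, so a real analysis would need to choose it with care.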