ppezik committed on
Commit 5a9d5b9 · verified · 1 Parent(s): 48f5f74

Update content.py

Files changed (1)
  1. content.py +1 -1
content.py CHANGED
@@ -14,7 +14,7 @@ LLMLagBench provides a systematic approach for **identifying the earliest probab
  an LLM's training data by evaluating its knowledge of recent events. The benchmark comprises of **1,700+ curated questions**
  about events sampled from news reports published between January 2021 and October 2025. Wwe plan to update the question set regularly. Each
  question could not be accurately answered before the event was reported in news media. We evaluate model
- responses using a **0-2 scale faithfulness metric** and apply the **PELT (Pruned Exact Linear Time)** changepoint
+ responses using a **0-2 scale faithfulness metric** (which is basically accuracy of model responses to queries about time-sensitive knowledge when compared with gold answers) and apply the **PELT (Pruned Exact Linear Time)** changepoint
  detection algorithm to identify where model performance exhibits statistically significant drops,
  revealing their actual knowledge cutoffs.
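For readers of this change, the changepoint step described in the edited text can be illustrated with a minimal, dependency-free sketch of PELT. Everything here is an assumption made for illustration and is not taken from the LLMLagBench code: the function name `pelt_mean_shift`, the squared-error (mean-shift) segment cost, the penalty value, and the toy list of 0-2 scores ordered by event date.

```python
from itertools import accumulate


def pelt_mean_shift(scores, penalty):
    """Return changepoint indices under a piecewise-constant mean model.

    Hypothetical helper: the cost function and penalty are illustrative
    choices, not the benchmark's actual configuration.
    """
    n = len(scores)
    # Prefix sums make the cost of any segment an O(1) computation.
    s1 = [0.0] + list(accumulate(float(x) for x in scores))
    s2 = [0.0] + list(accumulate(float(x) ** 2 for x in scores))

    def cost(a, b):
        # Sum of squared deviations from the mean over scores[a:b].
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)   # best[t]: optimal penalised cost of scores[:t]
    best[0] = -penalty       # so the first segment pays exactly one penalty
    last = [0] * (n + 1)     # last[t]: start index of the final segment
    candidates = [0]         # segment starts that survive pruning

    for t in range(1, n + 1):
        values = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last[t] = min(values)
        # PELT pruning: drop starts that can never be optimal later on.
        candidates = [s for v, s in values if v - penalty <= best[t]]
        candidates.append(t)

    # Walk back through the stored segment starts to recover changepoints.
    changepoints = []
    t = n
    while t > 0:
        t = last[t]
        if t > 0:
            changepoints.append(t)
    return sorted(changepoints)


# Toy data: ten fully correct answers (2) followed by ten failures (0).
scores = [2] * 10 + [0] * 10
print(pelt_mean_shift(scores, penalty=3.0))  # → [10]
```

In this toy run the detected index 10 marks the first question after which the score drops, which is the kind of position the text interprets as a knowledge cutoff. A larger penalty makes the detector more conservative, so in practice the value would need tuning against real score series.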