Hi! I'm a PhD candidate in Data Science at the Technion, advised by Yevgeni Berzak in the Language, Computation and Cognition (LaCC) Lab.
My research lies at the intersection of Natural Language Processing and Cognition, focusing on cognitively driven readability and text simplification, and on the generation of eye-movement scanpaths during reading.
I co-develop EyeBench, a benchmark for predictive modeling from eye movements in reading.
Methods for scoring text readability have been studied for over a century and are widely used in research and in user-facing applications across many domains. Thus far, the development and evaluation of such methods have relied primarily on two types of offline behavioral data: performance on reading comprehension tests and ratings of text readability levels. In this work, we instead focus on a fundamental and understudied aspect of readability: real-time reading ease, captured with online reading measures using eye tracking. We introduce an evaluation framework for readability scoring methods that quantifies their ability to account for reading ease while controlling for content variation across texts. Applying this evaluation to prominent traditional readability formulas, modern machine learning systems, frontier Large Language Models, and commercial systems used in education suggests that they are all poor predictors of reading ease in English. This outcome holds across native and non-native speakers, reading regimes, and textual units of different lengths. The evaluation further reveals that existing methods are often outperformed by word properties commonly used in psycholinguistics for the prediction of reading times. Our results highlight a fundamental limitation of existing approaches to readability scoring, the utility of psycholinguistics for readability research, and the need for new, cognitively driven readability scoring approaches that can better account for reading ease.
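For intuition, here is a minimal sketch of how one can test whether a readability score accounts for reading times, in the spirit of (but not identical to) the paper's evaluation framework. The data, column names, and model below are hypothetical, and the paper's controls for content variation across texts are more involved than this toy example.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: 30 readers, 40 texts, one reading time per (reader, text).
rng = np.random.default_rng(0)
n_subjects, n_texts = 30, 40
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subjects), n_texts),
    "text": np.tile(np.arange(n_texts), n_subjects),
})
score = rng.normal(size=n_texts)  # hypothetical readability score per text
df["readability_score"] = score[df["text"].to_numpy()]
df["reading_time"] = 250 + 5 * df["readability_score"] + rng.normal(0, 30, len(df))

# Mixed-effects regression: does the score predict reading times beyond
# reader-level random intercepts? (The paper additionally controls for
# content variation across texts, which this sketch omits.)
model = smf.mixedlm("reading_time ~ readability_score", df, groups=df["subject"])
print(model.fit().summary())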
@article{grutekeklein2025readability,
title={Readability Formulas, Systems and LLMs are Poor Predictors of Reading Ease},
author={Gruteke Klein, Keren and Frenkel, Shachar and Shubi, Omer and Berzak, Yevgeni},
year={2025},
note={under review},
url={https://arxiv.org/abs/2502.11150}
}
We present EyeBench, the first benchmark designed to evaluate machine learning models that decode cognitive and linguistic information from eye movements during reading. EyeBench offers an accessible entry point to the challenging and underexplored domain of modeling eye tracking data paired with text, aiming to foster innovation at the intersection of multimodal AI and cognitive science. The benchmark provides a standardized evaluation framework for predictive models, covering a diverse set of datasets and tasks, ranging from assessment of reading comprehension to detection of developmental dyslexia. Progress on the EyeBench challenge will pave the way for both practical real-world applications, such as adaptive user interfaces and personalized education, and scientific advances in understanding human language processing. The benchmark is released as an open-source software package that includes data downloading and harmonization scripts, baselines and state-of-the-art models, as well as evaluation code, publicly available at https://github.com/EyeBench/eyebench.
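To give a flavor of the task family, here is an illustrative baseline only, not the EyeBench API: predicting a binary reading-comprehension label from aggregated eye-movement features. The feature names and data are hypothetical placeholders.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials = 500
# Hypothetical per-trial aggregates: mean fixation duration (ms),
# number of fixations, and proportion of regressive saccades.
X = np.column_stack([
    rng.normal(220, 40, n_trials),
    rng.poisson(80, n_trials),
    rng.beta(2, 8, n_trials),
])
y = rng.integers(0, 2, n_trials)  # e.g., comprehension question correct/incorrect

# A linear model over aggregate features is a common starting point
# before sequence models over raw fixation scanpaths.
clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())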
@inproceedings{shubieyebench,
title={EyeBench: Predictive Modeling from Eye Movements in Reading},
author={Shubi, Omer and Reich, David Robert and Gruteke Klein, Keren and Angel, Yuval and Prasse, Paul and J{\"a}ger, Lena Ann and Berzak, Yevgeni},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2025}
}
Text simplification is a common practice for making texts easier to read and easier to understand. To what extent does it achieve these goals, and which participant and text characteristics drive simplification benefits? In this work, we use eye tracking to address these questions for the first time for the population of adult native (L1) English speakers. We find that 42% of the readers exhibit reading facilitation effects, while only 2% improve in reading comprehension accuracy. We further observe that reading fluency benefits are larger for slower and less experienced readers, while comprehension benefits are more substantial for lower-comprehension readers, but not vice versa. Finally, we find that high-complexity original texts are key for enhancing reading fluency, while large complexity reduction is more pertinent to improving comprehension. Our study highlights the potential of cognitive measures in the evaluation of text simplification and distills empirically driven principles for enhancing simplification effectiveness.
@inproceedings{gruteke2025effect,
title={The Effect of Text Simplification on Reading Fluency and Reading Comprehension in L1 English Speakers},
author={Gruteke Klein, Keren and Shubi, Omer and Frenkel, Shachar and Berzak, Yevgeni},
booktitle={Proceedings of the Annual Meeting of the Cognitive Science Society},
volume={47},
year={2025}
}
The effect of surprisal on processing difficulty has been a central topic of investigation in psycholinguistics. Here, we use eye tracking data to examine three language processing regimes that are common in daily life but have not been addressed with respect to this question: information seeking, repeated processing, and the combination of the two. Using standard regime-agnostic surprisal estimates, we find that the prediction of surprisal theory regarding the presence of a linear effect of surprisal on processing times extends to these regimes. However, when using surprisal estimates from regime-specific contexts that match the contexts and tasks given to humans, we find that in information seeking, such estimates do not improve the prediction of processing times compared to standard surprisals. Further, regime-specific contexts yield near-zero surprisal estimates with no predictive power for processing times in repeated reading. These findings point to misalignments of task and memory representations between humans and current language models, and question the extent to which such models can be used for estimating cognitively relevant quantities. We further discuss theoretical challenges posed by these results.
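As background, surprisal is standardly defined as the negative log probability of a word given its preceding context, and the linear-effect prediction of surprisal theory tested here can be written as a simple regression of the form below (shown without the usual covariates such as word length and frequency):

\[
s(w_t) = -\log p(w_t \mid w_{<t}), \qquad \mathrm{RT}(w_t) = \alpha + \beta\, s(w_t) + \varepsilon
\]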
@inproceedings{klein-etal-2024-effect,
title = {The Effect of Surprisal on Reading Times in Information Seeking and Repeated Reading},
author = {Gruteke Klein, Keren and Meiri, Yoav and Shubi, Omer and Berzak, Yevgeni},
editor = {Barak, Libby and Alikhani, Malihe},
booktitle = {Proceedings of the 28th Conference on Computational Natural Language Learning},
month = nov,
year = {2024},
address = {Miami, FL, USA},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2024.conll-1.17/},
doi = {10.18653/v1/2024.conll-1.17},
pages = {219--230}
}
Talks & Presentations
EyeBench — Poster at EurIPS 2025, Copenhagen.
Text Readability Measures Do Not Predict Reading Ease — Poster at CPL 2025, Utrecht.
Differences in Text Simplification Effects in L1 vs L2 — Oral at IndiREAD 2025, Saarbrücken.
Text Simplification Effects on Reading Fluency and Reading Comprehension — Oral at CogSci 2025, San Francisco (Hybrid) and at the MultiplEYE Workshop 2025, Stuttgart.
Surprisal & Reading Times — Oral at CoNLL 2024, Miami; Poster at ISCOL 2024, Haifa.
Bio
🎓 Education.
PhD (Direct Track) in Data Science, Technion (Sep 2025–Present)
MSc (Direct Track) in Data Science, Technion (May 2024–Sep 2025), GPA 97.6
BSc in Data Science & Engineering, Technion (Oct 2020–Oct 2024)