Hi! I'm Caden.
Email: kh4dien [at] gmail [dot] com
Find me on GitHub | Blog
Current
Work
- Open Source Automated Interpretability for Sparse Autoencoder Features
Caden Juang*, Gonçalo Paulo*, Jacob Drori, Nora Belrose
Eleuther
post | code | demo
- NNsight and NDIF: Democratizing Access to Foundation Model Internals
Jaden Fiotto-Kaufman*, Alexander R Loftus*, Eric Todd, Jannik Brinkmann, Caden Juang, Koyena Pal, Can Rager, Aaron Mueller, Samuel Marks, Arnab Sen Sharma, Francesca Lucchetti, Michael Ripa, Adam Belfki, Nikhil Prakash, Sumeet Multani, Carla Brodley, Arjun Guha, Jonathan Bell, Byron Wallace, David Bau
arXiv
paper | code
- Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
Joshua Clymer*, Caden Juang*, Severin Field*
arXiv
paper
Other
Last updated: 07/2024