Learning Formality from Japanese-English Parallel Corpora



In this talk I present my master’s thesis work on using Japanese sentence formality markers to construct a large, semi-supervised labeled dataset for English formality. I discuss how models trained on this data are less topically biased and perform better than models trained on human-labeled formality data. Finally, I discuss techniques for adversarially decomposing style and content in latent vector representations of sentences.
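As a rough illustration of the labeling idea, and not the pipeline actually used in the thesis, the sketch below transfers a formality label from the Japanese side of a sentence pair to its English translation. The marker list (common polite sentence endings such as です and ます), the two label names, and the helper names `label_formality` and `build_dataset` are all assumptions made for this example.

```python
from typing import Iterable, List, Tuple

# Assumed marker set: common polite (teineigo) sentence endings.
# The thesis may use a different or richer set of markers.
POLITE_ENDINGS = (
    "です", "ます", "ました", "ません", "でした",
    "でしょう", "ましょう", "ください",
)

# Trailing punctuation and sentence-final particles to ignore
# before checking the ending.
TRAILING_PUNCT = "。！？!?. 　"
FINAL_PARTICLES = "かねよ"


def label_formality(ja_sentence: str) -> str:
    """Return 'formal' if the Japanese sentence ends in a polite form."""
    core = ja_sentence.strip().rstrip(TRAILING_PUNCT).rstrip(FINAL_PARTICLES)
    return "formal" if core.endswith(POLITE_ENDINGS) else "informal"


def build_dataset(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map (japanese, english) pairs to (english, label) examples.

    The label comes only from the Japanese side, so no human
    annotation of the English sentence is needed.
    """
    return [(en, label_formality(ja)) for ja, en in pairs]


if __name__ == "__main__":
    sample_pairs = [
        ("これはペンです。", "This is a pen."),
        ("これはペンだ。", "This is a pen."),
        ("行きますか？", "Are you going?"),
        ("行くよ！", "I'm going!"),
    ]
    for english, label in build_dataset(sample_pairs):
        print(f"{label}\t{english}")
```

A heuristic this simple mislabels some sentences (for example, quoted speech or polite forms in the middle of a sentence), which is one reason to treat the resulting labels as noisy supervision rather than ground truth.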


This talk was given on December 19, 2020, to a virtual panel of professors at the University of Pennsylvania in fulfillment of the requirements for a master’s degree in Robotics.