Hi, I’m Joseph Chee Chang. I work on HCI+NLP research at Carnegie Mellon University.


Revolt: Collaborative Crowdsourcing for Labeling Machine Learning Datasets

Joseph Chee Chang, Saleema Amershi, Ece Kamar. CHI 2017.
Work done during an internship at Microsoft Research, Redmond.
rate=25%, N=2,424

Generating comprehensive labeling guidelines for crowdworkers can be challenging for complex datasets. Revolt harnesses crowd disagreements to identify ambiguous concepts in the data and coordinates the crowd to collaboratively create rich structures for requesters to make post-hoc decisions, removing the need for comprehensive guidelines and enabling dynamic label boundaries.
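The core idea of routing items by crowd agreement can be sketched as follows. This is a minimal, hypothetical helper (not Revolt's actual code): items whose labels fall below an agreement threshold are flagged as ambiguous, so the requester can decide their label boundaries post hoc.

```python
from collections import Counter

def flag_ambiguous(labels_per_item, agreement_threshold=1.0):
    """Split items into confidently-labeled vs. ambiguous based on
    crowd agreement. Hypothetical sketch, not Revolt's implementation."""
    confident, ambiguous = {}, []
    for item, labels in labels_per_item.items():
        top_label, top_count = Counter(labels).most_common(1)[0]
        agreement = top_count / len(labels)
        if agreement >= agreement_threshold:
            confident[item] = top_label   # the crowd agrees; accept the vote
        else:
            ambiguous.append(item)        # disagreement: surface to the requester
    return confident, ambiguous
```

With the default threshold, any disagreement at all marks an item as ambiguous; relaxing the threshold trades requester effort for label noise.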

- CHI, Classification, Crowdsourcing, HCI, Labeling, Machine Learning, SIGCHI, Sensemaking

Intentionally Uncertain Input

Supporting Mobile Sensemaking Through Intentionally Uncertain Highlighting

Joseph Chee Chang, Nathan Hahn, and Aniket Kittur. UIST 2016.
rate=21%, N=384

Highlighting can be mentally taxing for learners, who are often unsure how much information they need to include. We introduce the idea of intentionally uncertain input in the context of highlighting on mobile devices. We present a system that uses force touch and fuzzy bounding boxes to support saving information while users are uncertain about where to highlight.
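One way to picture the interaction: touch force controls how far the highlight spreads around the touch point. The function below is an illustrative sketch only (the paper's actual force-to-span mapping differs); `anchor` is the character index under the finger.

```python
def fuzzy_highlight(text, anchor, force, max_radius=40):
    """Map touch force (0.0-1.0) to an uncertain highlight span around
    the anchor index. Illustrative sketch; names are assumptions."""
    radius = int(force * max_radius)
    start = max(0, anchor - radius)
    end = min(len(text), anchor + radius)
    # expand to word boundaries so the fuzzy span never splits a word
    while start > 0 and text[start - 1] != " ":
        start -= 1
    while end < len(text) and text[end] != " ":
        end += 1
    return text[start:end]
```

A light touch captures just the word under the finger; pressing harder sweeps in surrounding context, letting the user defer the precise boundary decision.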

- HCI, Information Foraging, Interaction, Sensemaking, UIST


Clustering with Crowds and Computation

Joseph Chee Chang, Aniket Kittur, and Nathan Hahn. CHI 2016.
rate=23%, N=2,435


Many crowd clustering approaches have difficulty providing workers with the global context needed to generate meaningful categories. Alloy uses a sample-and-search technique to provide global context, and combines the deep semantic knowledge of human computation with the scalability of machine learning models to create rich structures from unorganized documents with high quality and efficiency.
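The sample-and-search idea can be sketched in a few lines. In this toy version (a hypothetical API, not Alloy's pipeline), a small sample seeds the categories, standing in for the crowdworker step, and a machine "search" step then matches every document to its closest seed by word overlap.

```python
def jaccard(a, b):
    """Word-overlap similarity between two short texts."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def sample_and_search(items, make_category, sample_size=3):
    """Toy sketch of sample-and-search clustering. A real system
    samples randomly and iterates; here we take the first few items
    as seeds for determinism."""
    seeds = items[:sample_size]
    categories = {make_category(s): s for s in seeds}   # crowd names the samples
    clusters = {name: [] for name in categories}
    for item in items:                                  # machine searches for matches
        best = max(categories, key=lambda name: jaccard(item, categories[name]))
        clusters[best].append(item)
    return clusters
```

The division of labor is the point: humans supply a handful of semantically meaningful seed categories, and cheap automatic similarity does the scaling.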

- Crowdsourcing, HCI, Information Synthesis, Machine Learning, SIGCHI, Sensemaking

The Knowledge Accelerator

Big Picture Thinking in Small Pieces

Nathan Hahn, Joseph Chee Chang, and Aniket Kittur. CHI 2016.
rate=23%, N=2,435


People often search the web to find solutions to problems that go beyond factual questions, such as planning a road trip, writing a report, or buying a new camera. The Knowledge Accelerator uses crowdworkers to synthesize different information sources on the web in response to a query. We prototyped this system to explore crowdsourcing complex, high-context tasks in a microtask environment.

- Crowdsourcing, HCI, Information Foraging, Information Retrieval, Information Synthesis, SIGCHI, Sensemaking

Twitter Code-Switching

Recurrent-Neural-Network for Language Detection on Twitter Code-Switching Corpus

Joseph Chee Chang, Chu-Cheng Lin.
Final project for the Deep Learning course at CMU.

Code-switching is common on social media, used to express solidarity or establish authority. While past work on automatic code-switching detection relied on dictionary look-ups or named-entity recognition, our recurrent neural network model, which relies on only raw features, outperformed the top systems in the EMNLP'14 Code-Switching Workshop by 17% in error rate reduction.
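The shape of such a model can be sketched as a per-character forward pass. This is a toy, untrained sketch in pure Python, not the paper's model: one-hot characters feed a simple recurrent layer, and a softmax emits a language probability for each position. All weight names here are assumptions.

```python
import math

def rnn_language_tagger(chars, vocab, Wxh, Whh, Why, hidden=4):
    """Forward pass of a minimal character-level RNN language tagger.
    Toy sketch with made-up weights; emits [P(lang1), P(lang2)] per char."""
    h = [0.0] * hidden
    outputs = []
    for ch in chars:
        x = [1.0 if v == ch else 0.0 for v in vocab]  # one-hot raw character feature
        h = [math.tanh(sum(Wxh[i][j] * x[j] for j in range(len(vocab)))
                       + sum(Whh[i][k] * h[k] for k in range(hidden)))
             for i in range(hidden)]                  # recurrent state update
        logits = [sum(Why[c][i] * h[i] for i in range(hidden)) for c in range(2)]
        z = max(logits)                               # stable softmax
        exps = [math.exp(l - z) for l in logits]
        total = sum(exps)
        outputs.append([e / total for e in exps])
    return outputs
```

Because the hidden state carries context across characters, the tagger can change its language prediction mid-tweet, which is exactly the switch-point behavior the task requires.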

- Code-Switching, Deep Learning, NLP, Neural Network, arXiv, pre-print


Learning to Find Translations and Transliterations on the Web

Joseph Chee Chang, Jason S. Chang, and Roger Jang. ACL 2012.
rate=21%, N=369

TermMine is an information extraction system that automatically mines translation pairs of terms from the web. We used a small seed set of terms and their translations to gather mixed-code text from the web, then trained a CRF model that identifies translation pairs at run-time.
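The seed-driven gathering step can be sketched as follows. This hypothetical helper (the real pipeline queries the web and trains a CRF on the results) keeps only lines of mixed-code text where a seed term and its known translation appear close together, which become training examples.

```python
def collect_training_snippets(corpus_lines, seed_pairs, window=30):
    """Keep mixed-code lines where a seed term and its known translation
    co-occur within `window` characters. Hypothetical sketch only."""
    snippets = []
    for line in corpus_lines:
        for term, translation in seed_pairs:
            t, tr = line.find(term), line.find(translation)
            if t != -1 and tr != -1 and abs(t - tr) <= window:
                snippets.append((line, term, translation))
    return snippets
```

The proximity window is the key assumption: in mixed-code web text, a term and its translation usually sit in the same clause, so nearby co-occurrence is a cheap, high-precision filter for training data.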

- ACL, Information Extraction, Machine Learning, NLP, Translation


Supersense Tagging Named Entities on Wikipedia

Joseph Chee Chang, Richard Tsai, and Jason S. Chang. PACLIC 2009.

We introduced a method for classifying named entities into broad semantic categories in WordNet. We extracted rich features from Wikipedia, allowing us to classify named entities with high precision and coverage. The result is a large-scale named-entity semantic database with 1.2 million entries and over 95% accuracy, covering 80% of all named entities found on Wikipedia.

- Information Extraction, Machine Learning, NLP, PACLIC