- 1/11 course introduction
.ppt .pdf
Introduction to Linguistics for Natural Language Processing
- 1/18 NLP problems as classification, sparse data
.pptx .pdf
read_features.py
- 1/23 Zipf's law, low-count features, smoothing
.pptx .pdf
- 1/25 model selection
.pptx .pdf
plot-discriminants.py
- 1/30 perceptron, neural modeling
.pptx .pdf
use_pcn_sp12.py pcn_modified_sp12.py
- 2/1 support vector machines, stylometry, sentiment analysis
.pptx .pdf
Federalist papers: Fung (2003)
Sentiment analysis webpage
Sentiment analysis survey (2008)
Mining the Web for Feelings, Not Facts (NYT, August 23, 2009)
For $2 a Star, an Online Retailer Gets 5-Star Product Reviews (NYT, January 26, 2012)
Software That Listens for Lies (NYT, December 3, 2011)
- 2/6 information theory, decision trees, k-nearest neighbors
.pptx .pdf
- 2/8 ensemble learning, modeling the past tense with a neural network
.pptx .pdf
ensemble learning
Rumelhart & McClelland (1986): On Learning the Past Tenses of English Verbs (using a neural network)
- 2/13 probability theory
.pptx .pdf
- 2/15 Bayesian networks, Naive Bayes classifier (not ready yet)
.pptx .pdf
Pantel & Lin (1998): SpamCop
- 2/20 generative probabilistic models, language models
.pptx .pdf
- 2/22 sequence classification, POS tagging, Hidden Markov Models
.pptx .pdf
- 2/27 HMMs continued, POS tagging through transformation-based learning
.pptx .pdf
Toutanova & Manning (2000): POS tagging and evaluation
Brill (1995): transformation-based learning
- 2/29 NP chunking and named entity recognition
.pptx .pdf
Tjong Kim Sang & Veenstra (1999): representing text chunks
CoNLL 2000 chunking shared task website
Tjong Kim Sang & Buchholz (2000): CoNLL 2000 shared task results
Bikel et al. (1999): named entity recognition
CoNLL 2003 NER shared task website
Tjong Kim Sang & De Meulder (2003): results on NER shared task
- 3/5 CFGs, PCFGs, parsing with CKY
.pptx .pdf
- 3/7 parsing the Penn Treebank, improving statistical parsing
.pptx .pdf
Marcus et al. (1993): Penn Treebank
Klein & Manning (2003): accurate unlexicalized parsing
Petrov et al. (2006): splitting tags for parsing
- 3/19 Discriminative classifiers, word sense disambiguation, decision lists, logistic regression
.pptx .pdf
Yarowsky (1994): decision list (supervised)
Yarowsky (1995): decision list applied to WSD (semi-supervised)
book chapter: Naive Bayes and Logistic Regression
- 3/21 Undirected graphical models, maximum entropy classifier, conditional random fields, applications in NLP
.pptx .pdf
Ratnaparkhi (1998): maximum entropy models and applications
Klinger & Tomanek: generative/discriminative, directed/undirected graphical models
misconception example, from Koller & Friedman's book
parse reranking: Collins (2000), Charniak & Johnson (2005)
Sha & Pereira (2003): shallow parsing with CRFs
- 3/26 Unsupervised learning: transitional probability, segmentation
.pptx .pdf
Harris (1954): From phoneme to morpheme
Saffran et al. (1996): word segmentation by infants
Gambell & Yang (2005): segmentation with stress information
Hewlett & Cohen (2011): word segmentation as general chunking
Ando & Lee (2003): Japanese segmentation
- 3/28 Unsupervised learning: model selection, minimum description length, morphology
.pptx .pdf
Goldsmith (2001): unsupervised learning of morphology
Goldsmith (2007): towards a new empiricism
Chan & Lignos (2011): unsupervised learning of morphology
Yang (2009, MS): Who's Afraid of George Kingsley Zipf?
- 4/2 Unsupervised learning: morphology continued; word class induction, k-means clustering
.pptx .pdf
Christodoulopoulos et al. (2010): two decades of unsupervised POS induction
Labelle (2005): grammatical category acquisition
- 4/4 Unsupervised learning: distributional learning, agglomerative clustering, lexical category induction
.pptx .pdf
Redington, Chater, & Finch (1998)
Mintz, Newport, & Bever (2002)
Chan (2008)
- 4/9 Unsupervised learning: semantic topic models, latent semantic analysis
.pptx .pdf
Deerwester et al. (1990): Latent Semantic Analysis
LSA website
Hofmann (1999): probabilistic LSA
- 4/11 Semi-supervised learning: co-training, Yarowsky algorithm
.pptx .pdf
Banko & Brill (2001): classifier performance on large datasets
Joachims (1999): transductive SVM
Blum & Mitchell (1998): co-training
Yarowsky (1995): word sense disambiguation
Collins & Singer (1999): named entity recognition
- 4/16 Semi-supervised learning: learning relations for information extraction
.pptx .pdf
Riloff (1996): supervised learning of relation patterns
Yangarber (2000): learning relation patterns through bootstrapping
Yangarber (2003): learning relation patterns through countertraining
- 4/18 sentence alignment, paraphrase induction, coreference resolution
.pptx .pdf
Gale & Church (1993): sentence alignment
Barzilay & McKeown (2001): paraphrase induction
Haghighi & Klein (2009): coreference resolution
- 4/23 statistical machine translation
.pptx .pdf
Warren Weaver (1949): memorandum
Brown et al. (1990): statistical translation
Brown et al. (1993): statistical translation
- 4/25 syntax-based translation, history of MT
.pptx .pdf
Chiang (2007): synchronous grammars for translation
- 4/30 question answering, semantic entailment
.pptx .pdf
Banko et al. (2002): AskMSR
Ravichandran & Hovy (2002): learning patterns for question answering
Harabagiu et al. (2004): question answering through deep analysis
Dagan et al. (2006): Recognizing Textual Entailment
- 5/2 course summary, NLP and linguistic theory
.pptx .pdf
Abney (1996)
Pereira (2000)
Lee (2001/4)
Chater & Manning (2006)
Lappin & Shieber (2007)
Bod (2009)