Natural Language Processing is a subfield of Computer Science that studies the intersection of computation and human language. Idiomatic expressions are ubiquitous in human language: when someone is called a “diamond in the rough”, they are not literally an uncut diamond; rather, they are a person whose goodness is hidden by their surface appearance. However, “diamond in the rough” can also literally mean an uncut diamond; that is, the phrase has two senses, one idiomatic and one literal. Given a dictionary entry, a human can easily distinguish idiomatic definitions from literal ones, but doing so automatically is difficult because it requires asking whether the meaning expressed in the definition corresponds to the literal meaning of the phrase. This research leverages Wiktionary, an extremely large, collaboratively authored dictionary, to perform idiom identification at scale with machine learning algorithms. To this end, we are developing two sets of features (traits with which we describe a definition-phrase pair): selectional preference features and graph-based features. Selectional preference features capture cases where language is used in a way that violates its literal meaning, while graph-based features describe where the entry sits in relation to the other pages within Wiktionary (for example, which links the page contains). We are using two kinds of machine learning algorithms for idiom identification: supervised methods and semi-supervised methods. Supervised methods allow us to verify the validity of the features that we develop, and semi-supervised methods allow us to use these features for knowledge discovery, because they are well suited to finding definitions that are not yet marked as idiomatic but ought to be.
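As a rough illustration of what a graph-based feature might look like, the sketch below computes a few link statistics for a single entry. It is only a minimal sketch under stated assumptions: the function name `graph_features`, the three feature names, and the toy link data are all hypothetical and are not taken from this research or from Wiktionary itself.

```python
def graph_features(phrase, links):
    """Compute toy link-based features for one dictionary entry.

    `links` maps a page title to the set of page titles it links to.
    Both the data layout and the feature names are illustrative assumptions.
    """
    outgoing = links.get(phrase, set())
    # Pages whose link sets contain this entry.
    incoming = {src for src, targets in links.items() if phrase in targets}
    words = set(phrase.split())
    # Fraction of the phrase's component words that the entry links to.
    component_overlap = len(words & outgoing) / len(words) if words else 0.0
    return {
        "out_degree": len(outgoing),
        "in_degree": len(incoming),
        "component_overlap": component_overlap,
    }


# Hypothetical toy link graph, not real Wiktionary data.
toy_links = {
    "diamond in the rough": {"diamond", "rough", "potential"},
    "diamond": {"gem", "carbon"},
    "rough diamond": {"diamond in the rough"},
}

features = graph_features("diamond in the rough", toy_links)
print(features)
```

Feature dictionaries of this kind could then be paired with selectional preference features and passed to a supervised or semi-supervised learner; which statistics are actually informative is an empirical question that this sketch does not settle.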