Trey Grainger

Greenville, South Carolina, United States
3K followers 500+ connections

About

Experienced Engineering and Data Science Executive with a demonstrated history building…

Experience

  • Searchkernel

    United States

Education

  • Stanford University

    Pursued Master's-level study at Stanford University, taking Information Retrieval & Web Search for university credit under Professor Chris Manning.

Publications

  • Application of Statistical Relational Learning to Hybrid Recommendation Systems

    STARAI 2016

    (with Shuo Yang, Mohammed Korayem, Khalifeh AlJadda, Sriraam Natarajan)

    Recommendation systems usually involve exploiting the relations among known features and content that describe items (content-based filtering) or the overlap of similar users who interacted with or rated the target item (collaborative filtering). To combine these two filtering approaches, current model-based hybrid recommendation systems typically require extensive feature engineering to construct a user profile. Statistical Relational Learning (SRL) provides a straightforward way to combine the two approaches. However, due to the large scale of the data used in real-world recommendation systems, little research exists on applying SRL models to hybrid recommendation systems, and essentially none of that research has been applied on real big-data-scale systems. In this paper, we propose a way to adapt the state-of-the-art in SRL learning approaches to construct a real hybrid recommendation system. Furthermore, in order to satisfy a common requirement in recommendation systems (i.e., that false positives are more undesirable and therefore penalized more harshly than false negatives), our approach can also allow tuning the tradeoff between the precision and recall of the system in a principled way. Our experimental results demonstrate the efficiency of our proposed approach as well as its improved performance on recommendation precision.

  • Entity Type Recognition using an Ensemble of Distributional Semantic Models to Enhance Query Understanding

    IEEE COMPSAC 2016

    (with Walid Shalaby, Khalifeh AlJadda, Mohammed Korayem)

    We present an ensemble approach for categorizing search query entities in the recruitment domain. Understanding the types of entities expressed in a search query (Company, Skill, Job Title, etc.) enables more intelligent information retrieval based upon those entities compared to a traditional keyword-based search. Because search queries are typically very short, leveraging a traditional bag-of-words model to identify entity types would be inappropriate due to the lack of contextual information. Our approach instead combines clues from different sources of varying complexity in order to collect real-world knowledge about query entities. We employ distributional semantic representations of query entities through two models: 1) contextual vectors generated from encyclopedic corpora like Wikipedia, and 2) high-dimensional word embedding vectors generated from millions of job postings using word2vec. Additionally, our approach utilizes both entity linguistic properties obtained from WordNet and ontological properties extracted from DBpedia. We evaluate our approach on a data set created at CareerBuilder, the largest job board in the US. The data set contains entities extracted from millions of job seeker and recruiter search queries, job postings, and resume documents. After constructing the distributional vectors of search entities, we use supervised machine learning to infer search entity types. Empirical results show that our approach outperforms the state-of-the-art word2vec distributional semantics model trained on Wikipedia. Moreover, we achieve a micro-averaged F1 score of 97% using the proposed distributional representations ensemble.

  • Macro-optimization of email recommendation response rates harnessing individual activity levels and group affinity trends

    IEEE ICMLA 2016

    (with Khalifeh AlJadda, Mohammed Korayem)

    Recommendation emails are among the best ways to re-engage with customers after they have left a website. While on-site recommendation systems focus on finding the most relevant items for a user at the moment (right item), email recommendations add two critical additional dimensions: who to send recommendations to (right person) and when to send them (right time). It is critical that a recommendation email system not send too many emails to too many users in too short a time window, as users may unsubscribe from future emails or become desensitized and ignore future emails if they receive too many. Also, email service providers may mark such emails as spam if too many of their users are contacted in a short time window. Optimizing email recommendation systems such that they can yield a maximum response rate for a minimum number of email sends is thus critical for the long-term performance of such a system. In this paper, we present a novel recommendation email system that not only generates recommendations, but also leverages a combination of individual user activity data, as well as the behavior of the group to which they belong, in order to determine each user’s likelihood to respond to any given set of recommendations within a given time period. In doing this, we have effectively created a meta-recommendation system which recommends sets of recommendations in order to optimize the aggregate response rate of the entire system. The proposed technique has been applied successfully within CareerBuilder’s job recommendation email system to generate a 50% increase in total conversions while also decreasing sent emails by 72%.

  • The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain

    IEEE DSAA 2016

    (with Khalifeh AlJadda, Mohammed Korayem, Andries Smith)

    This paper describes a new kind of knowledge representation and mining system which we are calling the Semantic Knowledge Graph. At its heart, the Semantic Knowledge Graph leverages an inverted index, along with a complementary uninverted index, to represent nodes (terms) and edges (the documents within intersecting postings lists for multiple terms/nodes). This provides a layer of indirection between each pair of nodes and their corresponding edge, enabling edges to materialize dynamically from underlying corpus statistics. As a result, any combination of nodes can have edges to any other nodes materialize and be scored to reveal latent relationships between the nodes. This provides numerous benefits: the knowledge graph can be built automatically from a real-world corpus of data, new nodes - along with their combined edges - can be instantly materialized from any arbitrary combination of preexisting nodes (using set operations), and a full model of the semantic relationships between all entities within a domain can be represented and dynamically traversed using a highly compact representation of the graph. Such a system has widespread applications in areas as diverse as knowledge modeling and reasoning, natural language processing, anomaly detection, data cleansing, semantic search, analytics, data classification, root cause analysis, and recommendation systems. The main contribution of this paper is the introduction of a novel system - the Semantic Knowledge Graph - which is able to dynamically discover and score interesting relationships between any arbitrary combination of entities (words, phrases, or extracted concepts) through dynamically materializing nodes and edges from a compact graphical representation built automatically from a corpus of data representative of a knowledge domain.
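
    As a rough illustration of the idea in this abstract, the Python sketch below builds a toy inverted index and uninverted index, materializes an edge as the intersection of two postings lists, and scores it. The function names, the sample documents, and the lift-style `relatedness` score are assumptions made for illustration only; the abstract does not state the paper's scoring function, and this is not the authors' implementation.

```python
from collections import defaultdict


def build_indexes(docs):
    """Build an inverted index (term -> doc ids) and an uninverted index (doc id -> terms)."""
    inverted = defaultdict(set)
    uninverted = {}
    for doc_id, text in enumerate(docs):
        terms = set(text.lower().split())
        uninverted[doc_id] = terms
        for term in terms:
            inverted[term].add(doc_id)
    return inverted, uninverted


def edge(inverted, *terms):
    """Materialize an edge: the documents shared by the postings lists of all terms."""
    postings = [inverted.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def relatedness(inverted, n_docs, source, target):
    """Assumed score (lift): how much more often target occurs with source than overall."""
    foreground = inverted.get(source, set())
    background = inverted.get(target, set())
    if not foreground or not background:
        return 0.0
    p_foreground = len(edge(inverted, source, target)) / len(foreground)
    p_background = len(background) / n_docs
    return p_foreground / p_background


def related_terms(inverted, uninverted, source):
    """Traverse from a node: find candidate terms via the uninverted index, then score them."""
    n_docs = len(uninverted)
    candidates = set()
    for doc_id in inverted.get(source, set()):
        candidates |= uninverted[doc_id]
    candidates.discard(source)
    scored = [(t, relatedness(inverted, n_docs, source, t)) for t in candidates]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy corpus; a real corpus would be a domain-specific document collection.
DOCS = [
    "senior java developer",
    "java developer spring",
    "senior nurse hospital",
    "nurse hospital care",
    "java engineer",
]
INVERTED, UNINVERTED = build_indexes(DOCS)
```

    In this toy corpus, "developer" scores higher than "senior" as a neighbor of "java" because "senior" is spread across both job families, which is the kind of corpus-statistics signal the abstract describes. No graph is stored explicitly; every edge is computed on demand from set operations.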

  • Improving the Quality of Semantic Relationships Extracted from Massive User Behavioral Data

    IEEE Big Data 2015

    (with Khalifeh Aljadda and Mohammed Korayem)

    As the ability to store and process massive amounts of user behavioral data increases, new approaches continue to arise for leveraging the wisdom of the crowds to gain insights that were previously very challenging to discover by text mining alone. For example, through collaborative filtering, we can learn previously hidden relationships between items based upon users' interactions with them, and we can also perform ontology mining to learn which keywords are semantically-related to other keywords based upon how they are used together by similar users as recorded in search engine query logs. The biggest challenge to this collaborative filtering approach is the variety of noise and outliers present in the underlying user behavioral data. In this paper we propose a novel approach to improve the quality of semantic relationships extracted from user behavioral data. Our approach utilizes millions of documents indexed into an inverted index in order to detect and remove noise and outliers.

  • Query Sense Disambiguation Leveraging Large Scale User Behavioral Data

    IEEE Big Data 2015

    (with Mohammed Korayem, Camilo Ortiz, and Khalifeh Aljadda)

    Term ambiguity - the challenge of having multiple potential meanings for a keyword or phrase - can be a major problem for search engines. Contextual information is essential for word sense disambiguation, but search queries are often limited to very few keywords, making the available textual context needed for disambiguation minimal or non-existent. In this paper we propose a novel system to identify and resolve term ambiguity in search queries using large-scale user behavioral data. The proposed system demonstrates that, despite the lack of context in most keyword queries, multiple potential senses of a keyword or phrase within a search query can be accurately identified, disambiguated, and expressed in order to maximize the likelihood of fulfilling a user’s information need. The proposed system overcomes the immediate lack of context by leveraging large-scale user behavioral data from historical query logs. Unlike traditional word sense disambiguation methods that rely on knowledge sources or available textual corpora, our system is language-agnostic, is able to easily handle domain-specific terms and meanings, and is automatically generated so that it does not grow out of date or require manual updating as ambiguous terms emerge or undergo a shift in meaning. The system has been implemented using the Hadoop ecosystem and integrated within CareerBuilder’s semantic search engine.

  • Solr In Action

    Manning Publications Co.

    Solr in Action is a comprehensive guide to implementing scalable search using Apache Solr 4. This clearly written book walks you through well-documented examples ranging from basic keyword searching to scaling a system for billions of documents and queries. You'll gain a deep understanding of how to implement core Solr capabilities such as faceted navigation through search results, matched snippet highlighting, field collapsing and search results grouping, spell checking, query auto-complete, querying by functions, and geo-spatial searching.

  • Augmenting Recommendation Systems Using a Model of Semantically-related Terms Extracted from User Behavior

    ACM RecSys 2014

    (with Khalifeh Aljadda, Mohammed Korayem, Camilo Ortiz, Chris Russell, David Bernal, Lamar Payson, and Scott Brown)

    Common difficulties like the cold-start problem and a lack of sufficient information about users due to their limited interactions have been major challenges for most recommender systems (RS). To overcome these challenges and many similar ones that result in low accuracy (precision and recall) recommendations, we propose a novel system that extracts semantically-related search keywords based on the aggregate behavioral data of many users. These semantically-related search keywords can be used to substantially increase the amount of knowledge about a specific user's interests based upon even a few searches and thus improve the accuracy of the RS. The proposed system is capable of mining aggregate user search logs to discover semantic relationships between key phrases in a manner that is language agnostic, human understandable, and virtually noise-free. These semantically related keywords are obtained by looking at the links between queries of similar users which, we believe, represent a largely untapped source for discovering latent semantic relationships between search terms.

  • Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific Jargon

    IEEE Big Data 2014

    (with Khalifeh Aljadda, Mohammed Korayem, and Chris Russell)

    Most work in semantic search has thus far focused upon either manually building language-specific taxonomies/ontologies or upon automatic techniques such as clustering or dimensionality reduction to discover latent semantic links within the content that is being searched. The former is very labor intensive and is hard to maintain, while the latter is prone to noise and may be hard for a human to understand or to interact with directly. We believe that the links between similar users’ queries represent a largely untapped source for discovering latent semantic relationships between search terms. The proposed system is capable of mining user search logs to discover semantic relationships between key phrases in a manner that is language agnostic, human understandable, and virtually noise-free.
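
    One way to picture mining search logs for related key phrases is the Python sketch below, which counts how often two queries are issued by the same user and scores each pair with pointwise mutual information (PMI). The log format, the `min_users` threshold, and the choice of PMI are all assumptions for illustration; the abstract does not describe the paper's actual scoring or noise-filtering method.

```python
import math
from collections import Counter
from itertools import combinations


def related_queries(user_queries, min_users=2):
    """Score query pairs by PMI over users.

    user_queries maps a user id to the set of queries that user issued.
    Pairs seen for fewer than min_users users are dropped as likely noise.
    """
    n_users = len(user_queries)
    single = Counter()  # number of users who issued each query
    pair = Counter()    # number of users who issued both queries of a pair
    for queries in user_queries.values():
        ordered = sorted(set(queries))
        single.update(ordered)
        pair.update(combinations(ordered, 2))

    scores = {}
    for (a, b), n_ab in pair.items():
        if n_ab < min_users:
            continue
        # PMI = log( P(a, b) / (P(a) * P(b)) ), with probabilities estimated over users.
        scores[(a, b)] = math.log((n_ab * n_users) / (single[a] * single[b]))
    return scores


# Hypothetical search log: user id -> queries issued by that user.
LOGS = {
    "u1": {"rn", "registered nurse"},
    "u2": {"rn", "registered nurse", "icu"},
    "u3": {"java", "j2ee"},
    "u4": {"java", "j2ee", "rn"},
    "u5": {"java", "spring"},
}
SCORES = related_queries(LOGS)
```

    Because the signal comes from user behavior instead of from the text of the queries, a sketch like this needs no language-specific resources, which matches the language-agnostic claim in the abstract.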

  • PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems

    IEEE Big Data 2014

    (with Khalifeh Aljadda, Mohammed Korayem, Camilo Ortiz, John Miller, and William York)

    In the big data era, scalability has become a crucial requirement for any useful computational model. Probabilistic graphical models are very useful for mining and discovering data insights, but they are not scalable enough to be suitable for big data problems. Bayesian Networks particularly demonstrate this limitation when their data is represented using few random variables while each random variable has a massive set of values.

    With hierarchical data - data that is arranged in a treelike structure with several levels - one would expect to see hundreds of thousands or millions of values distributed over even just a small number of levels. When modeling this kind of hierarchical data across large data sets, Bayesian networks become infeasible for representing the probability distributions for the following reasons: i) Each level represents a single random variable with hundreds of thousands of values, ii) The number of levels is usually small, so there are also few random variables, and iii) The structure of the network is predefined since the dependency is modeled top-down from each parent to each of its child nodes, so the network would contain a single linear path for the random variables from each parent to each child node. In this paper we present a scalable probabilistic graphical model to overcome these limitations for massive hierarchical data. We believe the proposed model will lead to an easily-scalable, more readable, and expressive implementation for problems that require probabilistic-based solutions for massive amounts of hierarchical data. We successfully applied this model to solve two different challenging probabilistic-based problems on massive hierarchical data sets for different domains, namely, bioinformatics and latent semantic discovery over search logs.

Projects

  • AI-Powered Search

    - Present

    Great search is all about delivering the right results. Today’s search engines are expected to be smart, understanding the nuances of natural language queries, as well as each user’s preferences and context. AI-Powered Search teaches you the latest machine learning techniques to create search engines that continuously learn from your users and your content, to drive more domain-aware and intelligent search. Written by Trey Grainger, the Chief Algorithms Officer at Lucidworks, this authoritative book empowers you to create and deploy search engines that take advantage of user interactions and the hidden semantic relationships in your content to constantly get smarter and automatically deliver better, more relevant search experiences.

  • Solr In Action

    Solr in Action is a comprehensive guide to implementing scalable search using Apache Solr. This clearly written book walks you through well-documented examples ranging from basic keyword searching to scaling a system for billions of documents and queries. It will give you a deep understanding of how to implement core Solr capabilities.
