Marina Santini

63 SlideShares 283 Followers 34 Following

I am a computational linguist with a strong interest in textual and linguistic features, machine learning and intensive textual data processing. My personal challenge is to extract "contextualized" information from big unstructured textual data by leveraging the concept of "genre". The word "genre" means "type of text". Nowadays all kinds of businesses, enterprises and customer care services produce huge amounts of data in the form of many different "genres", e.g. emails, memos, notes from call centers, news, user groups, chats, reports, tweets, Facebook pages, blogs, forums, marketing material and so on. All these textual genres contain valuable but unstructured data. The exploitation of ...

machine learning language technology supervised classification weka computational semantics nlp decision trees sentiment analysis svm noise uppsala university supervised machine learning logistic regression semantic analysis in language technology text analytics gain ratio information gain divide and conquer entropy marina santini genre perceptron mesh wordle mira tag clouds word clouds inductive bias crossvalidation description logics best split similarity semantic analysis semi-supervised learning lexical semantics formal semantics evaluation sampling smoothing text mining unification independence semantics unstructured data opinion mining structured data predicate-argument structure dependency parsing semantic roles thematic roles semantic web owl rdf rules axioms of probability pointwise mutual information query log analysis conditional probability induction naive bayes baseline algorithm emotion wordnet web corpora corpus evaluation margin automatic genre identification selectional restrictions formal languages nearest neighbors flipped classroom semantics in language technology pruning supervised learning events clustering domain-specific statistical inference automata training set hypothesis testing maximum likelihood estimation (mle) spam filtering expectations z-test distance metric probabilities variance statistics algorithms for hmms smoothing for pos tagging markov assumptions pos tagging with hmms hidden markov models (hmms) em for naive bayes hidden and latent variables maximum likelihood estimation expectation-maximization problems for hmms stochastic variables naive bayes classifiers bayesian classification naive bayes in nlp frequency functions joint probabilities instance attributes estimation conditional probabilities evaluation criteria layout semantically-related words meaningful adjacencies finite state automata fsa non-deterministic deterministic regular languages pumping lemma regular expressions finite state machines non-terminals 
context-free grammars phrase structure grammars cfgs backus-naur form terminals addition rule probability theory probability theorems bayes law marginal probability multiplication rule examination cooperation k-nearest neighbors main theorem feature representation margin and separability the norm maximizing margin margin infused relaxed algorithm support vector machines max margin max log-likelihood minimum error compositionality corpus-based approaches event representations distributional semantics description logics & the web ontology language the semantics of first-order logic formal and computational representations latent semantic analysis topic models lambda calculus roles semantic role labelling ontologies semantic word clouds quantitative evaluation dissimilarity big data unsupervised classification overlap measure distance modified value difference metric lazy learning eager learning logistic regression/maximum entropy svms statistical software machine learning workbench k-nn classifiers support vector machine structured svms conditional random fields structured perceptron sequence tagging structured mira voting boosting bagging adaboost base learner stacking ensemble learner geographical information venues products news agi contextualized information actionable information query log search information architecture findwise italian swedish sentistrength cyberemotions query logs big textual data
stefan th. gries crisis analysis customer analytics actionable intelligence r information discovery hadoop business intelligence strata job title professional profile semantic-oriented applications affective states natural language processing affect regression hypothesis class type of machine learning reinforcement learning supervised learning definition of machine learning classification empirical error classification in nlp cross-validation types of classification unsupervised learning generalization model assessment statistical methods and natural language processing theorems of probability sample spaces independence and incompatibility notion of probability video lectures flip teaching lab sessions bootstrap resampling cascading ensemble recorded future gavagai cross-lingual learning part-of-speech tagging multilingual learning linguistic structure prediction incomplete supervision latent-variable model indirect supervision ambiguous supervision meetups named-entity recognition partial supervision multilinguality structured prediction computational lexical semantics representation of meaning topic sentence academic writing critical thinking argumentation peer reviewing job learning outcomes zellig harris ppmi cosine metric ner named entity recognition standard evaluation per token sequence classifier information extraction sequence labeling e-discovery calendaring standard evaluation per entity word shapes ir-based approaches knowledge-based approaches ibm's watson complex questions answer type taxonomy apple's siri mrr factoid questions ir-based question answering mean reciprocal rank wolframalpha hybrid approaches passage retrieval narrative questions distant supervision knowledge graph relation extractors dbpedia hyponymy corpus lesk word similarity word relatedness graph-based methods wsd thesaurus-based methods resnik method lin method semcor dictionary-based methods surprisal supervised methods lesk algorithm path-based similarity michael lesk elesk
word sense disambiguation extended lesk simplified lesk information content term-context matrix dot product marginals john rupert firth pmi cosine similarity measure joint probability vectors positive pointwise mutual information distributional models quantitative metrics compactness running time context-preserving word cloud visualisation cpewcv inflate and push realized adjacencies area utilization aspect ratio folksonomy social tagging automatic folksonomy construction cycle cover star forest distortion readability swedish-umeå corpus (suc) unsupervised machine learning agglomerative hierarchical clustering ward’s linkage domain ecare web corpus lay-specialized sublanguage corpus quality terminology extraction domainhood burstiness log-likelihood kullback–leibler divergence mann-whitney-wilcoxon test unsupervised learning from the web freebase databases of relations hand-written patterns ace bootstrapping abstracting topic signature-based content selection rouge recall oriented understudy for gisting evaluation extractive summarization snippets summarization in question answering abstractive summarization query-focused summarization unsupervised content selection
single vs. multiple documents shared semantic annotation dls tags web 3.0 shared understanding ontology tree of porphyry webprotege relations classes ontology learning sparql iri seam carving induction pipeline f-measure leave one out parameters recall stratification confusion matrix hyperparameters accuracy precision test set development set expected loss empirical error induction greediness inductive bias of the decision tree loss function surprisal constructing decision trees machine learning attribute selection confidence interval standard error inferential statistics multiplier interval estimation confidence level z critical value confidence interval for proportion confidence interval for the mean roc curves scalable platform cheating hybrid teaching/learning model plagiarism deduction machine learning models generalization underfitting training data learning algorithms overfitting inference algorithms elements of machine learning test data concepts missing data attributes sample normal distribution features population outliers mean median measures of dispersion data instances arff format mode sparse data measures of central tendency semantic role labeling sentiment lexica scherer typology mutual information turney algorithm affective meaning emotion classification connotational aspects sentiwordnet likelihood sentiment mining sentiment lexicons semi-supervised methods learning sentiment lexicons general inquirer manually-built sentiment lexicons word senses homonymy hypernymy senseval membership meronymy lemma polysemy antonymy babelnet part-whole meronymy synonymy wordform metonymy meronymy zeugma test occam's razor k-statistic lift charts cost-sensitive measures loss function recall-precision curves t-test counting the cost multiclass classification real-world implementations holdout estimation representation unbalanced data theoretical modelling bootstrap leave-one-out logic and language denotation formal theories logic meaning representation first-order logic
predicate logic computational semantics connotation propositional logic semantic role labelling framenet propbank shallow semantics shallow semantic representation kendall correlation coefficient

Presentations

Uppsala uni 4march2011

CityTimes

Towards Contextualized Information: How Automatic Genre Identification Can Help

SearchInFocus: Exploratory Study on Query Logs and Actionable Intelligence

How Emotional Are Users' Needs? Emotion in Query Logs

Text analytics and R - Open Question: is it a good match?

Lecture 01: Machine Learning for Language Technology - Introduction

Lecture 02: Machine Learning for Language Technology - Decision Trees and Nearest Neighbors

Lecture 03: Machine Learning for Language Technology - Linear Classifiers

Lecture 4: The Weka Package

Lecture 5: Structured Prediction

Lecture 6: Ensemble Methods

Lecture 7: Learning from Massive Datasets

Lecture 1: Semantic Analysis in Language Technology

Lecture 2: Introduction to the Essay Assignment

Lecture 2: Job Opportunities

Lecture 2: From Semantics To Semantic-Oriented Applications

Lecture 3: Structuring Unstructured Texts Through Sentiment Analysis

Lecture 2: Basic Concepts in Machine Learning for Language Technology

Lecture 3: Probability Theory

Lecture 1: Introduction to the Course: The Flipped Classroom

Lecture 4: Statistical Inference

Lecture 5: Bayesian Classification

Lecture 6: Hidden Variables and Expectation-Maximization

Lecture 7: Hidden Markov Models (HMMs)

Lecture 8: Decision Trees & k-Nearest Neighbors

Lecture 9: Perceptron

Lecture 10: SVM and MIRA

Lecture 11: Logistic Regression

Lecture 2: Computational Semantics

Documents

Towards a Quality Assessment of Web Corpora for Language Technology Applications

Recommendations

Il Booktrailer

Analytics Education in the era of Big Data

Evaluating Search Engines