eduroam-prg-og-1-28-119.net.univ-paris-diderot.fr 2025-10-7:15:13:14
24
M1 LOGOS . machine learning for NLP.md
Normal file
@@ -0,0 +1,24 @@
---
up:
  - "[[M1 LOGOS]]"
tags:
  - s/fac
  - s/informatique
aliases:
---

# Vocabulary

$\underbrace{(x_1, x_2, \dots, x_{n})}_{\text{vector of length } n} \in \mathbb{R}^{n}$

$x_{i} \in \mathbb{R}$ is a scalar

one-hot: binary vector whose entries are all zero except for a single one. Useful when each dimension represents a word of the vocabulary
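A minimal Python sketch of one-hot encoding. The function name `one_hot` is illustrative, and the vocabulary is the one used in the bag-of-words example of this note; each position of the vector corresponds to one vocabulary word.

```python
def one_hot(word, vocabulary):
    """Return a list with a 1 at the position of `word` and 0 everywhere else."""
    if word not in vocabulary:
        # Assumption: unknown words are treated as an error in this sketch.
        raise ValueError(f"{word!r} is not in the vocabulary")
    return [1 if w == word else 0 for w in vocabulary]


vocabulary = ["le", "un", "garcon", "lit", "livre", "regarde"]
print(one_hot("garcon", vocabulary))  # [0, 0, 1, 0, 0, 0]
```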

BOW: Bag Of Words
A sentence can be represented as follows:
Let our vocabulary be: `V = 'le' 'un' 'garcon' 'lit' 'livre' 'regarde'`
Then "le garcon lit le livre" is encoded by counting the occurrences of each vocabulary word in the sentence, giving the vector `2 0 1 1 1 0` (in APL: `sentence +⌿⍤(∘.≡) vocabulary`)
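The same count can be sketched in Python. The function name `bag_of_words` and the whitespace tokenization are assumptions made for illustration; the vocabulary and the sentence are the ones from the example above.

```python
from collections import Counter


def bag_of_words(sentence, vocabulary):
    """Count how often each vocabulary word occurs in `sentence`."""
    # Assumption: tokens are separated by whitespace.
    counts = Counter(sentence.split())
    # A Counter returns 0 for missing keys, so absent words get a count of 0.
    # Words of the sentence that are not in the vocabulary are ignored.
    return [counts[word] for word in vocabulary]


vocabulary = ["le", "un", "garcon", "lit", "livre", "regarde"]
print(bag_of_words("le garcon lit le livre", vocabulary))  # [2, 0, 1, 1, 1, 0]
```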

cosine similarity: $\cos(u, v) = \frac{u \cdot v}{\|u\| \|v\|}$
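A minimal Python sketch of this formula on two bag-of-words vectors. The function name `cosine` is illustrative, and the second sentence ("le garcon regarde le livre") is a hypothetical example that does not come from the note; both vectors use the vocabulary `le un garcon lit livre regarde`.

```python
import math


def cosine(u, v):
    """Cosine of the angle between u and v: (u . v) / (||u|| * ||v||)."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # The formula divides by the norms, so it is undefined for a zero vector.
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


u = [2, 0, 1, 1, 1, 0]  # "le garcon lit le livre"
v = [2, 0, 1, 0, 1, 1]  # "le garcon regarde le livre" (hypothetical)
print(cosine(u, v))  # 6/7, about 0.857
```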