eduroam-prg-og-1-28-119.net.univ-paris-diderot.fr 2025-10-7:15:13:14

2025-10-07 15:13:14 +02:00
parent 791ea1a8b4
commit 893b8a566a
24 changed files with 93 additions and 34 deletions
--- a/NLP.md
+++ b/NLP.md
@@ -0,0 +1,24 @@
+---
+up:
+  - "[[M1 LOGOS]]"
+tags:
+  - s/fac
+  - s/informatique
+aliases:
+---
+
+# Vocabulary
+
+$\underbrace{(x_1, x_2, \dots, x_{n})}_{\text{vector of length } n} \in \mathbb{R}^{n}$
+
+$x_{i} \in \mathbb{R}$ is a scalar
+
+one-hot: boolean vector in which every entry is 0 except for a single 1. Useful when each dimension represents a word of the vocabulary.
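
A minimal sketch of one-hot encoding, assuming the vocabulary is stored as a plain Python list of words. The `one_hot` helper and the example vocabulary are illustrative assumptions, not part of the note.

```python
def one_hot(word, vocabulary):
    """Return a 0/1 list with a single 1 at the position of `word`."""
    # list.index raises ValueError if the word is not in the vocabulary
    index = vocabulary.index(word)
    return [1 if i == index else 0 for i in range(len(vocabulary))]


# Illustrative vocabulary (assumption): one dimension per word
vocabulary = ["le", "un", "garcon", "lit", "livre", "regarde"]
print(one_hot("garcon", vocabulary))  # [0, 0, 1, 0, 0, 0]
```

The vector has one dimension per vocabulary word, so its length grows with the size of the vocabulary.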
+
+BOW: Bag of Words
+A sentence can be represented as follows:
+Let our vocabulary be: `V = 'le' 'un' 'garcon' 'lit' 'livre' 'regarde'`
+Then "le garcon lit le livre" is encoded as a vector that counts the occurrences of each vocabulary word in the sentence, so `2 0 1 1 1 0` (the formula is `sentence +⌿⍤(∘.≡) vocabulary`)
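
The counting step can be sketched in Python as follows, assuming words are separated by whitespace and the vocabulary is a list in a fixed order. The `bag_of_words` helper is a hypothetical name introduced for illustration.

```python
from collections import Counter


def bag_of_words(sentence, vocabulary):
    """Count how often each vocabulary word occurs in `sentence`."""
    # Assumption: whitespace tokenization is good enough for this example
    counts = Counter(sentence.split())
    # Counter returns 0 for words that never appear in the sentence
    return [counts[word] for word in vocabulary]


vocabulary = ["le", "un", "garcon", "lit", "livre", "regarde"]
print(bag_of_words("le garcon lit le livre", vocabulary))  # [2, 0, 1, 1, 1, 0]
```

Word order is lost in this representation: "le livre lit le garcon" yields the same vector.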
+
+$\cos(u, v) = \frac{u\cdot v}{\|u\| \| v\|}$
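
A minimal sketch of this cosine formula in plain Python, assuming `u` and `v` are equal-length lists of numbers. The `cosine` helper and the second example sentence are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # The formula divides by the norms, so zero vectors are excluded
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Count vectors over the vocabulary le, un, garcon, lit, livre, regarde
u = [2, 0, 1, 1, 1, 0]  # "le garcon lit le livre"
v = [2, 0, 1, 0, 1, 1]  # "le garcon regarde le livre" (illustrative)
print(cosine(u, v))  # 6/7, roughly 0.857
```

Here the dot product is 6 and both norms are $\sqrt{7}$, so the result is $6/7$; identical vectors give 1 and vectors with no shared words give 0.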
+