Study Notes: Positional Encoding

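Positional encoding injects position information into the input sequence so the model knows where each element sits. Self-attention on its own is order-agnostic, so without this step the model cannot distinguish different orderings of the same tokens.
Two main types of positional encoding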

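Type	Characteristics	Examples
Fixed positional encoding (Fixed)	Predefined; not learnable	Sinusoidal encoding of the original Transformer
Learned positional encoding (Learned)	Trainable parameters: essentially a lookup table indexed by position, applied to the embeddings after the tokenizer has turned the text into token IDs	`position embedding` in BERT and ViT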

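Sinusoidal Positional Encoding

This is the method proposed in the original Transformer paper ("Attention Is All You Need", 2017).
Core idea: generate the position codes with sine and cosine functions of different frequencies. The encoding is deterministic and fixed, and it takes no part in training. Because it is computed from a formula rather than looked up in a table, it can be evaluated for a position of any length; the paper hypothesized that this helps the model extrapolate to sequences longer than those seen in training.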

import torch
import torch.nn as nn
import math

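class SinusoidalPositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=512):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        # Column vector of positions 0..max_len-1, shape (max_len, 1)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        # 1 / 10000^(2i / d_model) for each even index 2i, computed in log space
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))

        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        # Slice div_term so an odd d_model (one fewer odd column) does not break the assignment
        pe[:, 1::2] = torch.cos(position * div_term[: d_model // 2])  # odd dimensions

        pe = pe.unsqueeze(0)  # (1, max_len, d_model)
        # Buffer: saved with the module and moved by .to(device), but never trained
        self.register_buffer('pe', pe)

    def forward(self, x):
        # x: (B, seq_len, d_model); seq_len must not exceed max_len
        x = x + self.pe[:, :x.size(1)]
        return x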

Encoding formula
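The formula itself did not survive in these notes. Reconstructed from the `SinusoidalPositionalEncoding` code and the definition in the original Transformer paper, it should read as follows, where `pos` is the position, `i` indexes a sine/cosine pair, and `d_model` is the embedding width:

```latex
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right),
\qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
```

In the code, `div_term` is the factor $10000^{-2i/d_{\text{model}}}$, computed as $\exp\!\left(-2i \cdot \ln(10000) / d_{\text{model}}\right)$.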

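Learned Positional Embedding

This is the method used by models such as BERT, ViT, and GPT. The positional encoding is a learnable parameter matrix in which each position maps to one vector, much like a word embedding. On many tasks it matches or slightly outperforms sinusoidal encoding (the original Transformer paper reported nearly identical results for the two). Its weakness is length: positions beyond the trained length have no trained vector, so performance drops sharply on longer sequences.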

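class LearnedPositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=512):
        super().__init__()
        # One trainable d_model-sized vector per position 0..max_len-1
        self.pos_embedding = nn.Embedding(max_len, d_model)

    def forward(self, x):
        # x: (B, seq_len, d_model); seq_len must not exceed max_len
        positions = torch.arange(x.size(1), device=x.device).unsqueeze(0)  # (1, seq_len)
        # The (1, seq_len, d_model) position vectors broadcast over the batch
        return x + self.pos_embedding(positions)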
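
The length limit of a learned table can be seen directly. The following is a minimal sketch, assuming CPU tensors and toy sizes (`max_len = 8`, `d_model = 4`) chosen only for illustration; the helper `lookup` is hypothetical and not part of any library. Looking up a position that has no row in the table fails outright instead of extrapolating.

```python
import torch
import torch.nn as nn

max_len, d_model = 8, 4  # toy sizes, for illustration only
pos_embedding = nn.Embedding(max_len, d_model)

def lookup(seq_len: int) -> torch.Tensor:
    """Hypothetical helper: fetch position vectors for positions 0..seq_len-1."""
    positions = torch.arange(seq_len).unsqueeze(0)  # (1, seq_len)
    return pos_embedding(positions)                 # (1, seq_len, d_model)

# Positions 0..7 all have a row in the table.
ok = lookup(max_len)

# Position 8 has no row, so the lookup raises on CPU.
try:
    lookup(max_len + 1)
    failed = False
except IndexError:
    failed = True
```

A sinusoidal encoding has no such hard limit because any position can be plugged into the formula, although the model may still behave poorly at lengths it never saw in training.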

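Rotary Position Embedding (RoPE)

RoPE encodes position as a rotation matrix: each vector is rotated by an angle that depends on its position, so the difference between two positions shows up as a difference in rotation. The rotation is applied to the query and key vectors, which makes their dot product depend on the relative position of the two tokens. Examples: LLaMA, ChatGLM, PaLM.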
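
The rotation idea can be sketched in a few lines. This is an illustrative sketch only, not the implementation used by any of the models named above: the function names `rope_angles` and `apply_rope`, the base of 10000, and the choice to pair adjacent (even, odd) features are assumptions made here, and real implementations differ in how they pair dimensions.

```python
import torch

def rope_angles(seq_len: int, head_dim: int, base: float = 10000.0) -> torch.Tensor:
    """Rotation angles, shape (seq_len, head_dim // 2). head_dim must be even."""
    assert head_dim % 2 == 0
    # One frequency per feature pair, same schedule as the sinusoidal encoding
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    return torch.outer(positions, inv_freq)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate each (even, odd) feature pair of x by the angle of its position.

    x: (..., seq_len, head_dim), angles: (seq_len, head_dim // 2)
    """
    cos, sin = angles.cos(), angles.sin()
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin  # 2D rotation, first component
    out[..., 1::2] = x_even * sin + x_odd * cos  # 2D rotation, second component
    return out

torch.manual_seed(0)
head_dim, seq_len = 8, 16
angles = rope_angles(seq_len, head_dim)

# Place the same query and key vector at every position, then rotate.
q = torch.randn(head_dim)
k = torch.randn(head_dim)
q_rot = apply_rope(q.repeat(seq_len, 1), angles)  # (seq_len, head_dim)
k_rot = apply_rope(k.repeat(seq_len, 1), angles)  # (seq_len, head_dim)

# scores[m, n] is the dot product of q at position m with k at position n.
scores = q_rot @ k_rot.T
```

Because a rotation preserves vector length and the dot product of two rotated vectors depends only on the difference of their angles, `scores[m, n]` is the same for every pair with the same offset `m - n`. This is how a rotation applied per absolute position ends up expressing relative position in the attention scores.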