โŒ

Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

Is this an appropriate trigram model?

import torch

import torch.nn.functional as F

words = open('names.txt', 'r').read().splitlines() #['emma', 'olivia', 'ava', 'isabella'...]

chars = ['.','a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

def stoi(string):
    """
    Converts a string to an integer using the chars list as the base.
    """
    result = 0
    base = len(chars)
    for char in string:
        result = result * base + chars.index(char)
    return result

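def itos(num, length=1):
    """
    Converts an integer back to a string of `length` characters using the chars list as the base.
    """
    result = []
    base = len(chars)
    # Emit a fixed number of digits so that '.' (index 0) is not dropped,
    # e.g. itos(0) == '.' and itos(stoi('.a'), 2) == '.a'.
    for _ in range(length):
        result.append(chars[num % base])
        num //= base
    result.reverse()
    return ''.join(result)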

xs = []
ys = []

for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2, ch3 in zip(chs,chs[1:],chs[2:]):
        xs.append(stoi(ch1+ch2))
        ys.append(stoi(ch3))

xs = torch.tensor(xs)
ys = torch.tensor(ys)

xenc = F.one_hot(xs, num_classes=729).float()
W = torch.randn((729,27), requires_grad=True)

for i in range(300):
    logits = xenc @ W
    counts = logits.exp()
    probs = counts / counts.sum(1, keepdims=True)
    loss = -probs[torch.arange(xs.nelement()), ys].log().mean()
    if i % 10 == 0:
        print(loss.item())

    W.grad = None
    loss.backward()

    W.data += -40 * W.grad

After watching Andrej Karpathy's bigram video (https://www.youtube.com/watch?v=PaCmpygFfXo), I made this trigram model as an exercise. Is this an appropriate trigram model? Please point out any errors or bugs, and suggest improvements.
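One way to sanity-check the training loss is to compare it with a count-based trigram model built from the same data, as the bigram video does for bigrams. The sketch below is an assumption-laden illustration, not part of the original code: `trigram_nll`, `smoothing`, and `pad` are hypothetical names. With `pad=1` it builds the same (ch1, ch2) -> ch3 examples as the loop above; `pad=2` would add a second start token so that the first letter of each name is also predicted.

```python
import math
from collections import Counter

def trigram_nll(words, smoothing=1.0, pad=1):
    """Average negative log-likelihood of a count-based trigram model.

    `smoothing` is add-k smoothing over the 27 symbols; `pad` is the number
    of '.' start tokens (1 matches the post, 2 is a possible variant).
    """
    alphabet = '.abcdefghijklmnopqrstuvwxyz'
    tri = Counter()  # counts of (ch1, ch2, ch3)
    ctx = Counter()  # counts of the context (ch1, ch2)
    for w in words:
        chs = ['.'] * pad + list(w) + ['.']
        for c1, c2, c3 in zip(chs, chs[1:], chs[2:]):
            tri[c1, c2, c3] += 1
            ctx[c1, c2] += 1
    total, n = 0.0, 0
    for (c1, c2, c3), k in tri.items():
        # Smoothed estimate of P(c3 | c1, c2)
        p = (k + smoothing) / (ctx[c1, c2] + smoothing * len(alphabet))
        total -= k * math.log(p)
        n += k
    return total / n
```

Calling `trigram_nll(words, smoothing=0.0)` on the same `words` list gives the lowest training loss any trigram table can reach, so the printed loss of the neural version should move toward that value if training is working.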

How to use safe RL with general constraints?

I have read many articles about safe RL and found that they all work within the CMDP framework: they attach a cost to the action taken at each step and constrain that cost to reduce the probability of unsafe actions. But if the constraint applies to the entire trajectory rather than to a single step's action, there seem to be no relevant articles.

I wonder if there is any way to solve a problem like this, or a similar scenario.
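To make the distinction concrete, the two kinds of constraint could be written roughly as below. This is only a sketch: the trajectory-level function $g$, the budget $d$, and the tolerance $\delta$ are assumed notation, not taken from any particular paper.

```latex
% CMDP: constraint on the expected sum of per-step costs c(s_t, a_t)
\max_{\pi}\ \mathbb{E}_{\tau\sim\pi}\Big[\sum_{t=0}^{T}\gamma^{t}\, r(s_t,a_t)\Big]
\quad\text{s.t.}\quad
\mathbb{E}_{\tau\sim\pi}\Big[\sum_{t=0}^{T}\gamma^{t}\, c(s_t,a_t)\Big]\le d

% Trajectory-level constraint: g depends on the whole trajectory \tau
% and need not decompose into a sum of per-step costs
\max_{\pi}\ \mathbb{E}_{\tau\sim\pi}\Big[\sum_{t=0}^{T}\gamma^{t}\, r(s_t,a_t)\Big]
\quad\text{s.t.}\quad
\Pr_{\tau\sim\pi}\big[\,g(\tau) > 0\,\big]\le \delta
```

The question is about the second form, where $g(\tau)$ cannot be split into a per-step cost.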

โŒ
โŒ