2020-05-11 Custom Differentiation in PyTorch

Author: lzjngu | Published: 2020-05-11 23:56

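Attributes (member variables)
saved_tensors: the tensors that forward() stored with ctx.save_for_backward(); they are read back in backward().
needs_input_grad: a tuple of booleans of length num_inputs; entry i says whether the i-th input of forward() needs a gradient.
It can be used to avoid saving buffers and computing gradients that the backward pass does not need.
num_inputs: the number of arguments passed to forward().
num_outputs: the number of values returned by forward().
requires_grad: a boolean indicating whether backward() will ever need to be called.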
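
The two attributes used most often, saved_tensors and needs_input_grad, can be inspected at run time. The following is a minimal sketch, assuming a current PyTorch release; the class name InspectMul and the dictionary seen are hypothetical names chosen only for this illustration.

```python
import torch
from torch.autograd import Function

seen = {}  # hypothetical container used only to record what backward() observes


class InspectMul(Function):
    """Element-wise product a * b that records ctx attributes in backward()."""

    @staticmethod
    def forward(ctx, a, b):
        ctx.save_for_backward(a, b)
        return a * b

    @staticmethod
    def backward(ctx, grad_output):
        a, b = ctx.saved_tensors
        seen["needs_input_grad"] = tuple(ctx.needs_input_grad)
        seen["num_saved"] = len(ctx.saved_tensors)
        # Only compute the gradients that are actually needed.
        grad_a = grad_output * b if ctx.needs_input_grad[0] else None
        grad_b = grad_output * a if ctx.needs_input_grad[1] else None
        return grad_a, grad_b


a = torch.tensor([1.0, 2.0], requires_grad=True)
b = torch.tensor([3.0, 4.0])  # does not require a gradient
InspectMul.apply(a, b).sum().backward()
print(seen, a.grad, b.grad)
```

Because b does not require a gradient, its entry in needs_input_grad is False and backward() returns None for it.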

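Member functions
forward()
forward() can take any number of inputs and return any number of outputs. Gradients are tracked only for tensor inputs; non-tensor arguments such as Python constants may also be passed (see MulConstant below).
backward()
Apart from ctx, backward() takes as many arguments as forward() has outputs and returns as many values as forward() has inputs.
Each argument of backward() is the gradient with respect to the corresponding output of forward() (the gradient arriving from the node that consumed that output in the graph),
and each return value of backward() is the gradient with respect to the corresponding input of forward(). For an input that does not need a gradient (check needs_input_grad)
or that is not differentiable, backward() can return None.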
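
The correspondence between the argument counts can be sketched with a Function that has two inputs and two outputs. This is an illustrative example, not taken from the original post; the name SumAndProduct is hypothetical.

```python
import torch
from torch.autograd import Function


class SumAndProduct(Function):
    """Two inputs (x, y) and two outputs (x + y, x * y)."""

    @staticmethod
    def forward(ctx, x, y):
        ctx.save_for_backward(x, y)
        return x + y, x * y

    @staticmethod
    def backward(ctx, grad_sum, grad_prod):
        # Two forward outputs -> two incoming gradients.
        # Two forward inputs  -> two returned gradients.
        x, y = ctx.saved_tensors
        grad_x = grad_sum + grad_prod * y
        grad_y = grad_sum + grad_prod * x
        return grad_x, grad_y


x = torch.tensor([2.0, 3.0], requires_grad=True)
y = torch.tensor([5.0, 7.0], requires_grad=True)
s, p = SumAndProduct.apply(x, y)
(s.sum() + p.sum()).backward()
print(x.grad, y.grad)
```

Here the loss is the sum of both outputs, so the gradient with respect to x is 1 + y and the gradient with respect to y is 1 + x.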

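Notes on forward

1. In old PyTorch releases (before 0.4) the input of a network was a Variable, and so was the output of every layer.
Inside the forward() of a custom Function, however, all Variable arguments were converted to tensors:
the autograd engine unpacked each Variable into a Tensor before calling forward(),
so forward() actually operated on tensors.
Since PyTorch 0.4, Variable and Tensor are merged, and forward() simply receives tensors. Arbitrary operations may be performed inside forward().
2. ctx is the context object. ctx.save_for_backward() stores tensors for the backward pass,
and backward() reads them back through ctx.saved_tensors.
3. save_for_backward() accepts only tensors (Variables in old releases). Values of any other type can be stored as attributes,
ctx.xyz = xyz, which makes them available in backward(). For example, ctx.constant = constant,
where constant is a plain number and cannot be passed to ctx.save_for_backward().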
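
The difference between save_for_backward() and a plain ctx attribute can be sketched as follows. This is a minimal example under the assumption of a current PyTorch release; the Function name Power and its integer argument exponent are hypothetical.

```python
import torch
from torch.autograd import Function


class Power(Function):
    """Computes x ** exponent, where exponent is a plain Python int."""

    @staticmethod
    def forward(ctx, x, exponent):
        ctx.save_for_backward(x)   # tensors go through save_for_backward
        ctx.exponent = exponent    # non-tensor values are stored as attributes
        return x ** exponent

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        n = ctx.exponent
        # d(x^n)/dx = n * x^(n-1); the non-tensor argument gets None.
        return grad_output * n * x ** (n - 1), None


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
Power.apply(x, 3).sum().backward()
print(x.grad)
```

backward() returns two values because forward() takes two arguments; the second one is None since exponent is not a tensor.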

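Notes on backward

Automatic differentiation works from the backward() of each op.
In old releases the operations performed inside backward() were themselves recorded into a graph, so backward() had to operate on Variables throughout (this is what makes higher-order derivatives possible),
whereas forward() only needed to operate on tensors. In current releases both methods receive tensors; to support higher-order derivatives, backward() should be written with differentiable tensor operations on the values taken from ctx.saved_tensors.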
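
That the operations inside backward() can themselves be differentiated is easiest to see with a second derivative. The following is a sketch assuming a current PyTorch release; the Function name Cube is hypothetical.

```python
import torch
from torch.autograd import Function


class Cube(Function):
    """y = x ** 3 with a backward() written in differentiable tensor ops."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 3

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # These operations are recorded when create_graph=True is requested,
        # so the result can be differentiated once more.
        return grad_output * 3 * x ** 2


x = torch.tensor(3.0, requires_grad=True)
y = Cube.apply(x)
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)  # 3 * x^2
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)               # 6 * x
print(dy_dx.item(), d2y_dx2.item())
```

With create_graph=True, autograd records the computation done inside backward(), which is what allows the second call to torch.autograd.grad to succeed.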

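y = x*w + b  # the custom LinearFunction
z = f(y)
In what follows, grad_output = dz/dy.
By the chain rule for composite functions:
1. dz/dx = dz/dy * dy/dx = grad_output * dy/dx = grad_output * w
2. dz/dw = dz/dy * dy/dw = grad_output * dy/dw = grad_output * x
3. dz/db = dz/dy * dy/db = grad_output * 1
In the matrix form used by the code below (output = input.mm(weight.t()) + bias), these become
grad_input = grad_output.mm(weight), grad_weight = grad_output.t().mm(input), and grad_bias = grad_output.sum(0).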
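
The matrix forms of these three gradients can be checked against PyTorch's built-in autograd. This is a small sketch with arbitrarily chosen shapes (batch of 4, 3 input features, 5 output features); the manual_* names are hypothetical.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
w = torch.randn(5, 3, dtype=torch.double, requires_grad=True)
b = torch.randn(5, dtype=torch.double, requires_grad=True)

# Forward pass with built-in ops: y = x @ w^T + b
y = x.mm(w.t()) + b

# Pretend this is dz/dy coming from the rest of the network.
grad_output = torch.randn_like(y)
y.backward(grad_output)

# Chain-rule results written out by hand.
manual_grad_x = grad_output.mm(w.detach())      # dz/dx = grad_output @ w
manual_grad_w = grad_output.t().mm(x.detach())  # dz/dw = grad_output^T @ x
manual_grad_b = grad_output.sum(0)              # dz/db = sum over the batch

print(torch.allclose(x.grad, manual_grad_x),
      torch.allclose(w.grad, manual_grad_w),
      torch.allclose(b.grad, manual_grad_b))
```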

import torch
from torch.autograd import Function

class LinearFunction(Function):
    # A subclass of torch.autograd.Function.
    # forward() and backward() must both be static methods.
    # The first argument is ctx, the second is input, the rest are optional.
    # ctx plays a role similar to self: what is stored on it in forward()
    # can be read back in backward().
    # forward() receives plain tensors (old releases unpacked every Variable
    # argument into a Tensor before calling it).
    @staticmethod
    def forward(ctx, input, weight, bias=None):
        print(type(input))  # shows that input is a torch.Tensor
        ctx.save_for_backward(input, weight, bias)  # keep the tensors for backward()
        output = input.mm(weight.t())  # t() transposes a 2-D tensor
        if bias is not None:
            # unsqueeze(0) adds a dimension at position 0; expand_as(output) is
            # equivalent to expand(output.size()) and repeats bias over the batch
            output += bias.unsqueeze(0).expand_as(output)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        # grad_output is the gradient with respect to the output of forward(),
        # passed back from the op that consumed it
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = None
        # Gradients of input, weight and bias respectively.
        # Compute each one only if the corresponding input needs a gradient.
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)  # chain rule
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)  # chain rule
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0)  # sum over the batch dimension

        return grad_input, grad_weight, grad_bias

# It is recommended to wrap the new operation in a function.
def linear(input, weight, bias=None):
    # A Function with static forward()/backward() is not instantiated;
    # it is invoked through apply(), which runs forward() and records
    # the operation for autograd.
    return LinearFunction.apply(input, weight, bias)

# Alternatively, simply give the apply method an alias
linear = LinearFunction.apply

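# Check that the backward() implementation is correct
import torch
from torch.autograd import gradcheck
# gradcheck takes a tuple of tensors as input, checks whether the gradients
# evaluated with these tensors are close enough to numerical
# approximations, and returns True if they all satisfy this condition.
# LinearFunction needs both an input and a weight, in double precision.
input = (torch.randn(20, 20, dtype=torch.double, requires_grad=True),
         torch.randn(30, 20, dtype=torch.double, requires_grad=True))
test = gradcheck(LinearFunction.apply, input, eps=1e-6, atol=1e-4)
print(test)  # prints True if the check passes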
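
To see what gradcheck catches, it helps to run it on a deliberately wrong backward(). The sketch below assumes a current PyTorch release; GoodSquare and BadSquare are hypothetical names, and raise_exception=False makes gradcheck return False instead of raising an error.

```python
import torch
from torch.autograd import Function, gradcheck


class GoodSquare(Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x  # correct: d(x^2)/dx = 2x


class BadSquare(Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * x  # wrong on purpose: the factor 2 is missing


x = torch.tensor([1.0, 2.0, 3.0], dtype=torch.double, requires_grad=True)
good_ok = gradcheck(GoodSquare.apply, (x,), eps=1e-6, atol=1e-4)
bad_ok = gradcheck(BadSquare.apply, (x,), eps=1e-6, atol=1e-4,
                   raise_exception=False)
print(good_ok, bad_ok)
```

Double precision inputs are used because the finite-difference comparison is too noisy in single precision.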

# An operation that multiplies a tensor by a constant
class MulConstant(Function):
    @staticmethod
    def forward(ctx, tensor, constant):
        # ctx is a context object that can be used to stash information
        # for backward computation
        ctx.constant = constant
        return tensor * constant

    @staticmethod
    def backward(ctx, grad_output):
        # We return as many input gradients as there were arguments.
        # Gradients of non-Tensor arguments to forward must be None,
        # so the gradient for constant is None.
        return grad_output * ctx.constant, None

# Build a Module from the custom Function
import torch.nn as nn
class Linear(nn.Module):
    def __init__(self, input_features, output_features, bias=True):
        super(Linear, self).__init__()
        self.input_features = input_features
        self.output_features = output_features
        # nn.Parameter is a special kind of tensor that gets
        # automatically registered as the Module's parameter once it is assigned
        # as an attribute. Important: Parameters require gradients by default.
        self.weight = nn.Parameter(torch.empty(output_features, input_features))
        if bias:
            self.bias = nn.Parameter(torch.empty(output_features))
        else:
            # You should always register all possible parameters, but the
            # optional ones can be None if you want.
            self.register_parameter('bias', None)
        # Not a very smart way to initialize weights
        nn.init.uniform_(self.weight, -0.1, 0.1)
        # Test self.bias, not the bias flag: the flag is a bool and never None
        if self.bias is not None:
            nn.init.uniform_(self.bias, -0.1, 0.1)

    def forward(self, input):
        # See the autograd section for explanation of what happens here.
        return LinearFunction.apply(input, self.weight, self.bias)

import torch


class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """

    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. Tensors needed in the
        backward pass are cached with the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda")  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs (no gradients needed).
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights; these require gradients.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # To apply our Function, we use Function.apply method. We alias this as 'relu'.
    relu = MyReLU.apply

    # Forward pass: compute predicted y; we compute
    # ReLU using our custom autograd operation.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent, outside of autograd tracking
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()
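
As a quick sanity check, the gradient of a hand-written ReLU can be compared with that of the built-in torch.relu. The sketch below repeats a compact copy of the custom Function so that it runs on its own (named MyReLUCheck here, a hypothetical name); the test values are chosen to avoid x = 0, where the two conventions for the subgradient may differ.

```python
import torch


class MyReLUCheck(torch.autograd.Function):
    """Compact copy of the custom ReLU, repeated so the snippet is standalone."""

    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (input,) = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0  # no gradient flows where the input was negative
        return grad_input


x_custom = torch.tensor([-2.0, -0.5, 0.5, 3.0], requires_grad=True)
x_builtin = x_custom.detach().clone().requires_grad_(True)

MyReLUCheck.apply(x_custom).sum().backward()
torch.relu(x_builtin).sum().backward()

print(x_custom.grad, x_builtin.grad)
```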

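References:
Defining a subclass of torch.autograd.Function to implement custom operations and their backward functions (定义torch.autograd.Function的子类，自己定义某些操作，且定义反向求导函数)
Getting Started with PyTorch (8): Implementing Custom Layers (Pytorch入门学习（八）-----自定义层的实现)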
