Author | Zheng Jianhua
1 Storage management methods of different Tensor types
The storage of a lazy tensor is managed by objects such as Runtime and Actor. Once the static graph is compiled, the number of objects and the amount of storage space required are fixed: Runtime and related objects allocate storage during initialization and reclaim the resources on exit.
In eager mode, a global tensor can be regarded as a distributed wrapper around local tensors: the local data of EagerGlobalTensorImpl is an EagerLocalTensorImpl object. Examining EagerLocalTensorImpl is therefore enough to understand tensor storage management in eager mode.
The sample code for reference is as follows:
import numpy as np
import oneflow as flow
a = np.random.randn(1, 4)
flow.tensor(a, device=flow.device("cpu"), dtype=flow.float)
2 Relationships among the storage-related classes
The storage-related class relationships of EagerLocalTensorImpl are as follows.
By following the execution of the sample code, we can see when and how the objects in the diagram are constructed, who holds the storage, and how the storage is allocated and released.
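To make the holding relationships concrete, here is a minimal Python sketch of the chain. The class names mirror the C++ ones, but the members shown are simplified assumptions, not the real fields:

```python
class VmTensorStorage:            # vm::TensorStorage: owns the raw buffer
    def __init__(self):
        self.blob_dptr_ = None    # set only when storage is actually allocated

class EagerBlobObject:            # adds shape/stride/dtype on top of the storage
    def __init__(self, tensor_storage):
        self.tensor_storage = tensor_storage

class OneTensorStorage:           # one::TensorStorage: adds releaser_hook_
    def __init__(self, vm_storage, releaser_hook):
        self.vm_storage = vm_storage
        self.releaser_hook_ = releaser_hook

class EagerLocalTensorImpl:       # holds both EagerBlobObject and one::TensorStorage
    def __init__(self):
        vm_storage = VmTensorStorage()
        self.eager_blob_object = EagerBlobObject(vm_storage)
        self.tensor_storage = OneTensorStorage(vm_storage,
                                               releaser_hook=lambda ebo: None)

impl = EagerLocalTensorImpl()
# The same vm::TensorStorage is reachable through both holders:
assert impl.eager_blob_object.tensor_storage is impl.tensor_storage.vm_storage
```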
3 Allocate storage for Tensor through virtual machine instructions
The tensor constructor is registered as PyTensorObject_init through the Python C API and is dispatched by functional::_legacy_tensor_ctor according to the argument signature.
The sample code resolves to TensorWithDataFunctor, which calls MakeLocalTensorFromData to construct the tensor. Storage is allocated inside this function by calling functional::Empty and EmptyFunctor: EmptyFunctor stores the relevant attributes in attrs and then calls OpInterpUtil::Dispatch, which allocates the storage while preparing the vm instruction for execution.
The tensor returned by EmptyFunctor has only storage space and no data; the data is copied in afterwards by CopyLocalTensorFromUntypedArray.
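This two-phase construction (allocate empty storage first, then copy the data in) can be illustrated with NumPy. The sketch below mirrors EmptyFunctor and CopyLocalTensorFromUntypedArray only conceptually; it is not OneFlow's implementation:

```python
import numpy as np

a = np.random.randn(1, 4)

# Phase 1: allocate storage only -- the contents are uninitialized,
# like the tensor returned by EmptyFunctor.
buf = np.empty_like(a, dtype=np.float32)

# Phase 2: copy the data into the already-allocated storage,
# like CopyLocalTensorFromUntypedArray.
np.copyto(buf, a)

assert buf.shape == (1, 4)
assert np.allclose(buf, a.astype(np.float32))
```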
3.1 Construction of storage-related objects
Because this is a local tensor in eager mode, OpInterpUtil::Dispatch forwards to NaiveInterpret for execution. For the sample code, the input parameters of this function are:
- inputs is an empty array
- outputs has a single element, which is a null pointer
Because the tensor pointer in outputs is null, an EagerLocalTensorImpl object needs to be created whose one::TensorStorage member variable is a null pointer.
Because the elements of output_eager_blob_objects have not been initialized, tensor_impl->InitEagerBlobObject is called to initialize them. Since tensor_storage_ is still empty, the process does the following:
- Create a vm::TensorStorage object
- Create an EagerBlobObject object
- set_eager_blob_object
- UpdateTensorStorage
- Create a one::TensorStorage object
- Set the callback function for releasing the tensor storage
The creation of the above objects only records related information; no tensor storage is allocated yet.
Note that the callback registered with one::TensorStorage is assigned to the member variable releaser_hook_; this function releases the tensor storage through a virtual machine instruction.
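The releaser_hook_ mechanism can be modeled in plain Python: creation only records the callback, and nothing is freed until the storage object's lifetime ends. The names below are illustrative, not OneFlow's real members:

```python
released = []

class OneTensorStorage:
    """Toy model of one::TensorStorage: it stores a release callback
    and invokes it on destruction (simplified, hypothetical)."""
    def __init__(self, blob_object, releaser_hook):
        self.blob_object = blob_object
        self.releaser_hook_ = releaser_hook

    def __del__(self):
        # In OneFlow this dispatches a release instruction to the vm.
        self.releaser_hook_(self.blob_object)

storage = OneTensorStorage("fake_blob_object",
                           releaser_hook=lambda ebo: released.append(ebo))
assert released == []        # creation records the hook, releases nothing
del storage                  # destruction triggers the hook
assert released == ["fake_blob_object"]
```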
3.2 Allocating tensor storage during instruction execution
The process of allocating tensor storage is as follows:
- vm::Instruction::Compute
- vm::InstructionPolicy::ComputeIf
- vm::OpCallInstructionPolicy::Compute
- OpCallInstructionUtil::Compute
- get the memory allocator
- OpCallInstructionUtil::AllocateOutputBlobsMemory
- blob_object->TryAllocateBlobBodyMemory
- allocator->Allocate
In EagerBlobObject::TryAllocateBlobBodyMemory, the storage address returned by the allocator is assigned to dptr; the address dptr and a Free function are then wrapped into a smart pointer, which is assigned to the blob_dptr_ member of vm::TensorStorage.
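The "address plus deleter" pairing can be sketched in Python with weakref.finalize standing in for the smart pointer's custom deleter. All names here are illustrative, not OneFlow's:

```python
import weakref

freed = []

class Allocator:
    def allocate(self, size):
        return bytearray(size)   # stands in for a raw device pointer

class BlobDptr:
    """Plays the role of the smart pointer bundling dptr with Free."""
    def __init__(self, dptr, free_fn):
        self.dptr = dptr
        weakref.finalize(self, free_fn)   # Free runs when the pointer is dropped

class VmTensorStorage:
    def __init__(self):
        self.blob_dptr_ = None

def try_allocate_blob_body_memory(storage, allocator, size):
    dptr = allocator.allocate(size)
    storage.blob_dptr_ = BlobDptr(dptr, lambda: freed.append(size))

storage = VmTensorStorage()
try_allocate_blob_body_memory(storage, Allocator(), 16)
assert storage.blob_dptr_ is not None and freed == []
storage.blob_dptr_ = None    # like blob_dptr_.reset(): Free is invoked
assert freed == [16]
```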
4 Release Tensor storage through virtual machine instructions
As mentioned in section 3.1, while initializing EagerBlobObject and creating one::TensorStorage, EagerLocalTensorImpl sets a callback function for releasing the tensor. The callback is stored in the variable releaser_hook_ and is invoked when one::TensorStorage is destructed. Putting this together, one::TensorStorage performs the following operations when it is destructed:
vm::InstructionList instruction_list;
InstructionsBuilder instructions_builder(&instruction_list);
// JUST(Build(&instructions_builder));
if (eager_blob_object->producer_stream().has_value()) {
  JUST(instructions_builder->ReleaseTensor(eager_blob_object));
}
JUST(vm::Run(instructions_builder.mut_instruction_list()));
In InstructionsBuilder::ReleaseTensor, if other streams have recently used eager_blob_object, they are synchronized through SoftSyncStreamBetween. This resolves storage dependencies across streams.
Under normal circumstances, storage is released through the tensor's producer_stream: the corresponding vm::Stream object is obtained from it, and the release instruction (holding eager_blob_object and vm_stream) is constructed accordingly. For the sample code, the instruction type is FastReleaseTensorInstructionPolicy, whose Compute method executes the actual storage-release logic. The process is as follows:
- ReleaseTensorInstructionPolicy::Release()
- eager_blob_object->DeallocateBlobDataPtr()
- tensor_storage_->Release()
- tensor_storage_->_Release()
- blob_dptr_.reset()
- The smart pointer is reset, and the Free function specified when the storage was allocated is invoked
5 Storage management for scenarios such as reshape
In scenarios such as reshape, slice, and transpose, the EagerLocalTensorImpl constructor is called with the input tensor's tensor_storage, so the new tensor's tensor_storage_ is not empty. When InitEagerBlobObject executes, only an EagerBlobObject is created, to provide information such as shape and stride; no new one::TensorStorage is created, and the input's storage is reused.
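NumPy exhibits the same storage-sharing behavior for reshape, slice, and transpose, which makes the idea easy to verify:

```python
import numpy as np

a = np.random.randn(2, 4)
b = a.reshape(4, 2)    # new tensor object, same underlying storage
c = a.T                # transpose: also a view
d = a[:, 1:3]          # slice: also a view

# All three share a's buffer; no new storage was allocated.
assert np.shares_memory(a, b)
assert np.shares_memory(a, c)
assert np.shares_memory(a, d)

# Writing through a view is visible in the original.
b[0, 0] = 123.0
assert a[0, 0] == 123.0
```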
6 Can the two TensorStorage types be merged?
Why is it that, when one::TensorStorage is destructed, the callback it saves is what releases the storage held by vm::TensorStorage?
Since one::TensorStorage only adds a releaser, can the two storage types be merged?
Under the current design, the two types cannot be merged. one::TensorStorage::releaser_hook_ holds a smart pointer to EagerBlobObject, and EagerBlobObject holds a smart pointer to vm::TensorStorage. If the two storage types were merged into one, there would be a circular reference: the objects could never be destructed, resulting in a memory leak.
Therefore, vm::TensorStorage is just a simple storage that can be shared among multiple tensors; EagerBlobObject combines the storage with per-tensor information such as shape, stride, and data_type; and one::TensorStorage is introduced to avoid the circular reference and is responsible for releasing the storage.
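The circular-reference problem has a close Python analogy: plain reference counting (Python's first-line mechanism, like C++ shared_ptr) can never free a cycle. Python falls back on its cycle collector; shared_ptr has no such collector, so a merged type would simply leak:

```python
import gc
import weakref

class Merged:
    """Hypothetical merged storage type: it would hold its releaser,
    and the releaser would (transitively) hold the storage again."""

gc.disable()                      # rule out an incidental cycle collection

storage = Merged()
releaser = Merged()
storage.releaser_hook = releaser  # one::TensorStorage -> EagerBlobObject
releaser.storage = storage        # EagerBlobObject -> vm::TensorStorage

probe = weakref.ref(storage)
del storage, releaser

# Reference counting alone cannot free the cycle -- the objects survive.
assert probe() is not None

# Python's cycle collector can break it; C++ shared_ptr cannot.
gc.collect()
assert probe() is None
gc.enable()
```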
7 Appendix
GDB breakpoint example
break oneflow::one::MakeLocalTensorFromData
break oneflow::one::NaiveInterpret
break oneflow::vm::VirtualMachineEngine::DispatchInstruction
break oneflow::vm::OpCallInstructionUtil::Compute
break oneflow::vm::OpCallInstructionUtil::AllocateOutputBlobsMemory
break oneflow::vm::EagerBlobObject::TryAllocateBlobBodyMemory
break oneflow::vm::ReleaseTensorInstructionPolicy::Release
break oneflow/core/eager/eager_blob_object.cpp:107
References
- OneFlow (https://github.com/Oneflow-Inc/oneflow/tree/b51cb72430619f6088e47bbb8b8226f37299573a)
- OneFlow source code analysis: Tensor type system and Local Tensor
Welcome to Star and try OneFlow: https://github.com/Oneflow-Inc/oneflow/