Chapter 2: Tensor Basics
📦 Learning Objectives
- Understand tensors and their properties
- Create tensors in various ways
- Manipulate tensor shapes and types
- Work with CPU and GPU tensors
What are Tensors?
Tensors are multi-dimensional arrays that are the fundamental building blocks of PyTorch. They are similar to NumPy arrays but with additional capabilities:
- GPU acceleration
- Automatic differentiation (autograd)
- Optimized for deep learning operations
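As a rough illustration of the autograd capability listed above, the sketch below tracks a small computation and asks PyTorch for its gradient. The variable names are illustrative only.

```python
import torch

# Ask autograd to track every operation applied to x
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y = x1^2 + x2^2 + x3^2, so the gradient dy/dx is 2 * x
y = (x ** 2).sum()
y.backward()

print(x.grad)  # tensor([2., 4., 6.])
```

A NumPy array has no equivalent of `requires_grad`, so the same gradient would have to be derived by hand.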
Tensors vs NumPy Arrays
Tensors behave much like NumPy arrays but add automatic differentiation and GPU execution. Converting between the two is cheap: torch.from_numpy() wraps an array as a tensor and .numpy() exposes a CPU tensor as an array, and in both cases the result shares memory with the original rather than copying it.
Understanding Dimensions
Think of tensor dimensions as nested lists. A 2D tensor is like a matrix (rows × columns), a 3D tensor is like a stack of matrices, and so on. The first dimension is often the batch size in deep learning.
Tensor Dimensions
| Dimension | Name | Example Shape | Use Case |
|---|---|---|---|
| 0D | Scalar | () | Single value |
| 1D | Vector | (n,) | Features, time series |
| 2D | Matrix | (n, m) | Grayscale image, tabular data |
| 3D | 3D Tensor | (n, m, k) | RGB image, sequence data |
| 4D | 4D Tensor | (batch, channels, height, width) | Batch of images |
| 5D+ | nD Tensor | (batch, time, channels, height, width) | Video data |
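To make the rows of the table concrete, here is a small sketch that builds one tensor per row and prints its number of dimensions and shape. The sizes (28, 32, and so on) are arbitrary placeholders, not values taken from a particular dataset.

```python
import torch

# One example tensor per row of the table (sizes are arbitrary)
examples = {
    "scalar": torch.tensor(3.14),            # 0D: a single value
    "vector": torch.zeros(5),                # 1D: five features
    "matrix": torch.zeros(28, 28),           # 2D: e.g. a grayscale image
    "image": torch.zeros(3, 32, 32),         # 3D: channels x height x width
    "batch": torch.zeros(8, 3, 32, 32),      # 4D: batch of images
    "video": torch.zeros(2, 4, 3, 32, 32),   # 5D: batch x time x image
}

for name, t in examples.items():
    print(f"{name}: dim={t.dim()}, shape={tuple(t.shape)}")
```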
Creating Tensors
From Python Lists/Tuples
import torch
# 1D tensor from list
x = torch.tensor([1, 2, 3, 4, 5])
print(f"1D tensor: {x}")
print(f"Shape: {x.shape}")
# 2D tensor from nested list
matrix = torch.tensor([[1, 2, 3],
                       [4, 5, 6]])
print(f"\n2D tensor:\n{matrix}")
print(f"Shape: {matrix.shape}")
# 3D tensor
tensor_3d = torch.tensor([[[1, 2], [3, 4]],
                          [[5, 6], [7, 8]]])
print(f"\n3D tensor:\n{tensor_3d}")
print(f"Shape: {tensor_3d.shape}")
Output:
1D tensor: tensor([1, 2, 3, 4, 5])
Shape: torch.Size([5])

2D tensor:
tensor([[1, 2, 3],
        [4, 5, 6]])
Shape: torch.Size([2, 3])

3D tensor:
tensor([[[1, 2],
         [3, 4]],

        [[5, 6],
         [7, 8]]])
Shape: torch.Size([2, 2, 2])
From NumPy Arrays
import numpy as np
import torch
# NumPy array to tensor
np_array = np.array([[1, 2, 3], [4, 5, 6]])
tensor = torch.from_numpy(np_array)
print(f"From NumPy:\n{tensor}")
# Tensor to NumPy array
back_to_numpy = tensor.numpy()
print(f"Back to NumPy:\n{back_to_numpy}")
# Note: They share memory!
np_array[0, 0] = 100
print(f"After modifying NumPy:\n{tensor}") # Tensor also changed!
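Because torch.from_numpy() shares memory, a change on either side shows up on the other. When that is not wanted, one option is to copy the data instead. The sketch below contrasts the two; the names `shared` and `copied` are illustrative.

```python
import numpy as np
import torch

np_array = np.array([1, 2, 3])

shared = torch.from_numpy(np_array)  # shares memory with np_array
copied = torch.tensor(np_array)      # torch.tensor() always copies the data

np_array[0] = 100

print(shared[0].item())  # 100: the shared tensor sees the change
print(copied[0].item())  # 1: the copy is unaffected
```

Calling `.clone()` on the result of torch.from_numpy() is another way to get an independent tensor.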
Initialization Functions
import torch
# Zeros
zeros = torch.zeros(3, 4)
print(f"Zeros:\n{zeros}")
# Ones
ones = torch.ones(2, 3)
print(f"\nOnes:\n{ones}")
# Identity matrix
identity = torch.eye(3)
print(f"\nIdentity:\n{identity}")
# Random values [0, 1) - uniform distribution
rand_uniform = torch.rand(2, 3)
print(f"\nRandom uniform:\n{rand_uniform}")
# Random values - standard normal distribution
rand_normal = torch.randn(2, 3)
print(f"\nRandom normal:\n{rand_normal}")
# Random integers
rand_int = torch.randint(0, 10, (3, 3))
print(f"\nRandom integers:\n{rand_int}")
# Full (constant value)
full = torch.full((2, 3), 7.5)
print(f"\nFull:\n{full}")
# Arange
arange = torch.arange(0, 10, 2) # start, end, step
print(f"\nArange: {arange}")
# Linspace
linspace = torch.linspace(0, 1, 5) # start, end, steps
print(f"Linspace: {linspace}")
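The range-style functions above differ in how they treat their end value, which is easy to get wrong. A short sketch of the difference, using the same arguments as the examples above:

```python
import torch

# arange: the end value (10) is excluded; the third argument is a step size
a = torch.arange(0, 10, 2)
print(a)  # tensor([0, 2, 4, 6, 8])

# linspace: the end value (1) is included; the third argument is a count
l = torch.linspace(0, 1, 5)
print(l)  # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])

# randint: the upper bound (10) is excluded, so values fall in 0..9
r = torch.randint(0, 10, (1000,))
print(r.min().item() >= 0, r.max().item() < 10)  # True True
```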
Creating Tensors Like Others
import torch
x = torch.tensor([[1, 2], [3, 4]])
# Zeros like x
zeros_like = torch.zeros_like(x)
print(f"Zeros like:\n{zeros_like}")
# Ones like x
ones_like = torch.ones_like(x)
print(f"Ones like:\n{ones_like}")
# Random like x (rand_like needs a floating-point tensor, so convert x first)
rand_like = torch.rand_like(x.float())
print(f"Random like:\n{rand_like}")
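The `*_like` functions copy the shape and, by default, the dtype and device of their input. The sketch below checks this and shows one alternative to calling `x.float()` first: passing `dtype` directly.

```python
import torch

x = torch.tensor([[1, 2], [3, 4]])  # integer tensor (int64)

zeros_like = torch.zeros_like(x)
print(zeros_like.dtype)  # torch.int64, inherited from x
print(zeros_like.shape)  # torch.Size([2, 2])

# Override the dtype rather than converting x beforehand
rand_like = torch.rand_like(x, dtype=torch.float32)
print(rand_like.dtype)   # torch.float32
print(rand_like.shape)   # torch.Size([2, 2])
```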
Tensor Properties
Data Types (dtypes)
import torch
# Different data types
int_tensor = torch.tensor([1, 2, 3], dtype=torch.int32)
float_tensor = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float32)
double_tensor = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
bool_tensor = torch.tensor([True, False, True], dtype=torch.bool)
Choosing Data Types
Use float32 for most deep learning tasks - it's the default and provides good precision with memory efficiency. Use float64 only when you need extra precision. Use float16 or bfloat16 for memory-constrained scenarios or mixed precision training.
Memory Considerations
float32 uses 4 bytes per element, while float64 uses 8 bytes. For large tensors, this difference can be significant. Most neural networks work fine with float32.
print(f"Int32: {int_tensor.dtype}")
print(f"Float32: {float_tensor.dtype}")
print(f"Float64: {double_tensor.dtype}")
print(f"Bool: {bool_tensor.dtype}")
# Default dtype
default = torch.tensor([1.0, 2.0])
print(f"Default dtype: {default.dtype}")  # float32
**Common Data Types:**
| PyTorch dtype | Python type | Description |
|---------------|-------------|-------------|
| `torch.float32` or `torch.float` | `float` | 32-bit floating point |
| `torch.float64` or `torch.double` | `float` | 64-bit floating point |
| `torch.float16` or `torch.half` | - | 16-bit floating point |
| `torch.int32` or `torch.int` | `int` | 32-bit integer |
| `torch.int64` or `torch.long` | `int` | 64-bit integer |
| `torch.int16` or `torch.short` | - | 16-bit integer |
| `torch.int8` | - | 8-bit integer |
| `torch.uint8` | - | 8-bit unsigned integer |
| `torch.bool` | `bool` | Boolean |
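The bit widths in the table translate directly into memory use. As a rough check, the sketch below multiplies `element_size()` by `numel()` for a one-million-element tensor; the tensor size and the `sizes` dictionary are illustrative choices.

```python
import torch

n = 1_000_000
sizes = {}  # bytes per element, keyed by dtype

for dtype in (torch.float16, torch.float32, torch.float64):
    t = torch.zeros(n, dtype=dtype)
    sizes[dtype] = t.element_size()
    total_bytes = t.element_size() * t.numel()
    print(f"{dtype}: {t.element_size()} bytes/element, {total_bytes / 1e6:.0f} MB total")
```

This counts only the tensor's data; it does not include any overhead from the allocator or from gradients stored alongside the tensor.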
### Type Conversion
```python
import torch
x = torch.tensor([1, 2, 3])
print(f"Original dtype: {x.dtype}")
# Convert to different types
x_float = x.float() # or x.to(torch.float32)
x_double = x.double()
x_long = x.long()
print(f"Float: {x_float.dtype}")
print(f"Double: {x_double.dtype}")
print(f"Long: {x_long.dtype}")
# Using .to()
x_half = x.to(torch.float16)
print(f"Half: {x_half.dtype}")
```
Device (CPU vs GPU)
import torch
# Check CUDA availability
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA device count: {torch.cuda.device_count()}")
Device Management Best Practice
Always use the device-agnostic pattern: device = torch.device('cuda' if torch.cuda.is_available() else 'cpu'). This makes your code work on both CPU and GPU without modification.
Device Mismatch
Operations between tensors on different devices will fail. Always ensure tensors are on the same device before operations. Use .to(device) to move tensors.
# Create tensor on CPU
cpu_tensor = torch.tensor([1, 2, 3])
print(f"Device: {cpu_tensor.device}")
# Move to GPU (if available)
if torch.cuda.is_available():
    gpu_tensor = cpu_tensor.to('cuda')
    print(f"GPU Device: {gpu_tensor.device}")
    # Or create directly on GPU
    gpu_tensor2 = torch.tensor([1, 2, 3], device='cuda')
    print(f"Created on GPU: {gpu_tensor2.device}")
    # Move back to CPU
    back_to_cpu = gpu_tensor.to('cpu')
    print(f"Back to CPU: {back_to_cpu.device}")
# Device-agnostic code
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tensor = torch.tensor([1, 2, 3]).to(device)
print(f"Using device: {device}")
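The device-mismatch warning above can be handled with the same device-agnostic pattern. The sketch below is written so that it runs on a CPU-only machine as well; the tensors `a`, `b`, and `c` are illustrative.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

a = torch.ones(3, device=device)  # lives on the selected device
b = torch.ones(3)                 # created on the CPU by default

# If `device` is a GPU, `a + b` would raise a RuntimeError at this point,
# because the operands live on different devices.
# Moving b first makes the addition valid on either kind of machine.
b = b.to(device)
c = a + b

print(c.device.type, c.cpu().tolist())
```

Note that `.to(device)` returns a new tensor when a move is needed, so the result has to be assigned back.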
### Shape and Size
```python
import torch
x = torch.randn(2, 3, 4)
# Get shape
print(f"Shape: {x.shape}") # torch.Size([2, 3, 4])
print(f"Size: {x.size()}") # Same as shape
# Get specific dimension
print(f"Dimension 0: {x.shape[0]}")
print(f"Dimension 1: {x.size(1)}")
# Number of dimensions
print(f"Number of dimensions: {x.dim()}")
# Total number of elements
print(f"Total elements: {x.numel()}")
# Check if empty
empty_tensor = torch.tensor([])
print(f"Is empty: {empty_tensor.numel() == 0}")
```
Tensor Attributes Summary
import torch
x = torch.randn(3, 4, dtype=torch.float32, device='cpu')
# All important attributes
print(f"Tensor: {x}")
print(f"Shape: {x.shape}")
print(f"Size: {x.size()}")
print(f"Dtype: {x.dtype}")
print(f"Device: {x.device}")
print(f"Requires grad: {x.requires_grad}")
print(f"Is leaf: {x.is_leaf}")
print(f"Dimensions: {x.dim()}")
print(f"Number of elements: {x.numel()}")
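Two of the attributes printed above, `requires_grad` and `is_leaf`, only become interesting once gradients are involved. A minimal sketch of how they differ between a tensor you create and a tensor produced by an operation:

```python
import torch

x = torch.randn(3, requires_grad=True)  # created directly by the user: a leaf
y = x * 2                               # result of an operation on x

print(x.requires_grad, x.is_leaf)  # True True
print(y.requires_grad, y.is_leaf)  # True False

z = torch.randn(3)                 # no gradient tracking requested
print(z.requires_grad, z.is_leaf)  # False True
```

By convention, tensors that do not require gradients are always reported as leaves.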
Indexing and Slicing
Basic Indexing
import torch
x = torch.tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])
# Single element
print(f"Element [0, 0]: {x[0, 0]}")
print(f"Element [1, 2]: {x[1, 2]}")
# Entire row
print(f"First row: {x[0]}")
print(f"Last row: {x[-1]}")
# Entire column
print(f"First column: {x[:, 0]}")
print(f"Second column: {x[:, 1]}")
# Slicing
print(f"First 2 rows: \n{x[:2]}")
print(f"First 2 columns: \n{x[:, :2]}")
print(f"Submatrix: \n{x[1:, 2:]}")
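One detail worth knowing about basic indexing and slicing: the result is a view into the original tensor, not a copy, so writing to it changes the original. A short sketch, with `.clone()` as one way to get an independent copy:

```python
import torch

x = torch.tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

row = x[0]       # a view into x, not a copy
row[0] = 100
print(x[0, 0])   # tensor(100): the original changed too

safe = x[:, 1].clone()  # independent copy of the second column
safe[0] = -1
print(x[0, 1])   # tensor(2): the original is untouched
```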
Advanced Indexing
import torch
x = torch.arange(1, 13).reshape(3, 4)
print(f"Original:\n{x}")
# Boolean indexing
mask = x > 6
print(f"\nMask (x > 6):\n{mask}")
print(f"Elements > 6: {x[mask]}")
# Fancy indexing
rows = torch.tensor([0, 2])
cols = torch.tensor([1, 3])
print(f"\nElements at [0,1] and [2,3]: {x[rows, cols]}")
# Using lists
print(f"Rows 0 and 2:\n{x[[0, 2]]}")
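Unlike basic slicing, indexing with a list, an index tensor, or a boolean mask returns a new tensor. A mask can also appear on the left side of an assignment to modify the selected elements in place. The following sketch shows both behaviours on the same tensor as above:

```python
import torch

x = torch.arange(1, 13).reshape(3, 4)

# Advanced indexing produces a copy: editing it leaves x alone
picked = x[[0, 2]]
picked[0, 0] = -1
print(x[0, 0])  # tensor(1)

# Masked assignment writes into x itself
x[x > 6] = 0
print(x)
# tensor([[1, 2, 3, 4],
#         [5, 6, 0, 0],
#         [0, 0, 0, 0]])
```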
Tensor Operations Preview
import torch
x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
y = torch.tensor([[5, 6], [7, 8]], dtype=torch.float32)
# Element-wise operations
print(f"x + y:\n{x + y}")
print(f"x * y:\n{x * y}")
print(f"x / y:\n{x / y}")
# Matrix multiplication
print(f"Matrix multiply:\n{x @ y}")
# or
print(f"torch.mm:\n{torch.mm(x, y)}")
# Transpose
print(f"Transpose:\n{x.T}")
Common Patterns
Creating Batches of Data
import torch
# Batch of 32 RGB images of size 224x224
batch_images = torch.randn(32, 3, 224, 224)
print(f"Batch shape: {batch_images.shape}")
# Shape: [batch_size, channels, height, width]
# Batch of sequences (NLP)
batch_sequences = torch.randn(16, 50, 300)
print(f"Sequence batch shape: {batch_sequences.shape}")
# Shape: [batch_size, sequence_length, embedding_dim]
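In practice a batch is usually assembled from individual samples rather than generated in one call. The sketch below shows two common ways to add the batch dimension, using small 32x32 images as an arbitrary stand-in for real data:

```python
import torch

# A single image: [channels, height, width]
image = torch.randn(3, 32, 32)

# Add a batch dimension of size 1 at the front
single = image.unsqueeze(0)
print(single.shape)  # torch.Size([1, 3, 32, 32])

# Stack several same-shaped images along a new first dimension
images = [torch.randn(3, 32, 32) for _ in range(4)]
batch = torch.stack(images)
print(batch.shape)   # torch.Size([4, 3, 32, 32])

# Indexing the first dimension recovers one sample
first = batch[0]
print(first.shape)   # torch.Size([3, 32, 32])
```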
Setting Random Seed
import torch
# For reproducibility
torch.manual_seed(42)
x1 = torch.rand(3, 3)
torch.manual_seed(42)
x2 = torch.rand(3, 3)
print(f"Same random values: {torch.all(x1 == x2)}")
# For CUDA
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)
    torch.cuda.manual_seed_all(42)  # For multi-GPU
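torch.manual_seed() resets the global random state, which affects every later random call in the program. When only one piece of code needs to be reproducible, a separate torch.Generator is one alternative. A minimal sketch, with `g1` and `g2` as illustrative names:

```python
import torch

# Two independent generators seeded identically
g1 = torch.Generator().manual_seed(42)
g2 = torch.Generator().manual_seed(42)

a = torch.rand(3, 3, generator=g1)
b = torch.rand(3, 3, generator=g2)

print(torch.equal(a, b))  # True: same seed, same sequence
```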
Practice Exercises
Exercise 1: Create Tensors
# Create the following tensors:
# 1. A 3x3 matrix of zeros
# 2. A 2x4 matrix of ones with dtype float64
# 3. A random 5x5 matrix from standard normal distribution
# 4. A tensor from [0, 2, 4, 6, 8, 10]
# Solutions:
zeros = torch.zeros(3, 3)
ones = torch.ones(2, 4, dtype=torch.float64)
randn_matrix = torch.randn(5, 5)
even_numbers = torch.arange(0, 11, 2)
Exercise 2: Indexing
# Given tensor:
x = torch.arange(1, 25).reshape(4, 6)
# Tasks:
# 1. Extract the first row
# 2. Extract the last column
# 3. Extract the 2x2 submatrix from center
# 4. Extract all elements > 15
# Solutions:
first_row = x[0]
last_col = x[:, -1]
center = x[1:3, 2:4]
greater_15 = x[x > 15]
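One way to check your own answers to Exercise 2 is to compare them against the values you expect by hand. The assertions below are a sketch of such a check, repeating the solutions so the block runs on its own:

```python
import torch

x = torch.arange(1, 25).reshape(4, 6)

first_row = x[0]
last_col = x[:, -1]
center = x[1:3, 2:4]
greater_15 = x[x > 15]

# x holds 1..24 row by row, six values per row
assert first_row.tolist() == [1, 2, 3, 4, 5, 6]
assert last_col.tolist() == [6, 12, 18, 24]
assert center.tolist() == [[9, 10], [15, 16]]
assert greater_15.tolist() == list(range(16, 25))
print("All checks passed")
```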
Next Steps
Continue to Chapter 3: Tensor Operations to learn about:
- Mathematical operations
- Reshaping and manipulation
- Broadcasting
- Reduction operations
Key Takeaways
- ✅ Tensors are multi-dimensional arrays optimized for deep learning
- ✅ Can create tensors from lists, NumPy arrays, or initialization functions
- ✅ Important attributes: shape, dtype, device, requires_grad
- ✅ Support indexing and slicing similar to NumPy
- ✅ Can move tensors between CPU and GPU with .to(device)