Perception

Perception is the process of transforming raw sensor data into structured information that a robot can use for decision-making.

Sensors provide raw inputs (images, point clouds, IMU data); perception algorithms turn them into higher-level information such as detected objects, maps, and the robot’s pose.

1. Mapping

Goal: Build a map of the environment while tracking robot motion
Input: Sensor data (LiDAR / Camera)
Output:
- 2D Occupancy Grid Map
- 3D Point Cloud Map
Example:
- 2D SLAM → slam_toolbox
- 3D SLAM → LiDAR-based or Visual SLAM
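
As a rough illustration of how a 2D occupancy grid gets filled, the sketch below marks the cell at the end of each LiDAR beam as occupied. It is not how slam_toolbox works internally: the function name `scan_to_occupancy_grid`, the grid size, and the resolution are hypothetical, the robot pose is assumed to be already known, and free-space ray tracing is left out. The cell values (-1 for unknown, 100 for occupied) follow the convention of the ROS `nav_msgs/OccupancyGrid` message.

```python
import math


def scan_to_occupancy_grid(ranges, angle_min, angle_increment, pose,
                           size=20, resolution=0.5, max_range=10.0):
    """Mark the end point of each valid beam as occupied.

    pose is (x, y, yaw) of the robot in the map frame (assumed known).
    Returns a size x size grid: -1 = unknown, 100 = occupied.
    """
    x, y, yaw = pose
    grid = [[-1] * size for _ in range(size)]
    origin = size // 2  # map-frame origin sits at the grid centre

    for i, r in enumerate(ranges):
        if not (0.0 < r < max_range):
            continue  # skip invalid or out-of-range returns
        angle = yaw + angle_min + i * angle_increment
        hit_x = x + r * math.cos(angle)
        hit_y = y + r * math.sin(angle)
        col = origin + math.floor(hit_x / resolution)
        row = origin + math.floor(hit_y / resolution)
        if 0 <= row < size and 0 <= col < size:
            grid[row][col] = 100
    return grid


# Two beams: one straight ahead, one 90 degrees to the left, both 2 m long.
grid = scan_to_occupancy_grid([2.0, 2.0], 0.0, math.pi / 2, (0.0, 0.0, 0.0))
```

A full SLAM system would also estimate the pose itself and clear the cells each beam passes through; this sketch only shows the map-update half of the problem.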

2. Localization

Goal: Estimate the robot’s position and orientation in the environment
Input: LiDAR / Camera / IMU
Output: Robot pose (x, y, z, roll, pitch, yaw)
Common Approaches:
- LiDAR-based localization (e.g., NDT, AMCL)
- Visual SLAM (e.g., ORB-SLAM, Isaac ROS Visual SLAM)
- Sensor fusion (e.g., EKF with IMU)
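
To give a feel for the sensor-fusion approach, here is a minimal sketch of a one-dimensional linear Kalman filter that fuses a gyro yaw rate with an absolute heading measurement. It is a simplification, not a full EKF: the state is a single yaw angle, angle wrap-around is ignored, and the function name `kalman_step` and the noise values `q` and `r` are hypothetical choices for illustration.

```python
def kalman_step(yaw, var, yaw_rate, measured_yaw, dt, q=0.01, r=0.1):
    """One predict/update cycle for a scalar yaw estimate.

    yaw, var      : current estimate and its variance
    yaw_rate      : gyro reading (rad/s), used in the prediction
    measured_yaw  : absolute heading (rad), e.g. from scan matching
    q, r          : assumed process and measurement noise variances
    """
    # Predict: integrate the gyro rate; uncertainty grows by q.
    yaw_pred = yaw + yaw_rate * dt
    var_pred = var + q

    # Update: blend prediction and measurement by their uncertainties.
    gain = var_pred / (var_pred + r)
    yaw_new = yaw_pred + gain * (measured_yaw - yaw_pred)
    var_new = (1.0 - gain) * var_pred
    return yaw_new, var_new


yaw, var = kalman_step(0.0, 1.0, yaw_rate=0.1, measured_yaw=0.2, dt=1.0)
```

The fused estimate lands between the gyro prediction and the measurement, and the variance shrinks after each update. An EKF applies the same predict/update structure to a nonlinear motion model over the full pose.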
ROS 2 Topics Example:
```
/localization/pose
/tf
```
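
ROS pose messages store orientation as a quaternion, while the output above is described in roll, pitch, and yaw. The sketch below shows the standard conversion for a unit quaternion (ZYX convention); the function name `quaternion_to_rpy` is hypothetical, and the input is assumed to be normalized.

```python
import math


def quaternion_to_rpy(qx, qy, qz, qw):
    """Convert a unit quaternion to (roll, pitch, yaw) in radians."""
    # Roll: rotation about the x-axis.
    roll = math.atan2(2.0 * (qw * qx + qy * qz),
                      1.0 - 2.0 * (qx * qx + qy * qy))

    # Pitch: rotation about the y-axis; clamp to guard against rounding.
    sin_pitch = max(-1.0, min(1.0, 2.0 * (qw * qy - qz * qx)))
    pitch = math.asin(sin_pitch)

    # Yaw: rotation about the z-axis.
    yaw = math.atan2(2.0 * (qw * qz + qx * qy),
                     1.0 - 2.0 * (qy * qy + qz * qz))
    return roll, pitch, yaw


# A 90-degree turn to the left about the z-axis.
half = math.pi / 4
roll, pitch, yaw = quaternion_to_rpy(0.0, 0.0, math.sin(half), math.cos(half))
```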

3. Object Detection

Goal: Identify and classify objects in the environment
Input: Camera images
Output: Bounding boxes, class labels, confidence scores
Example:
- YOLO (e.g., Isaac ROS YOLOv8)
- Detect people, obstacles, pallets, etc.
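
Detectors typically return many overlapping candidates, so the raw output is usually filtered before use. The sketch below shows one common post-processing step, a confidence threshold followed by per-class non-maximum suppression. The detection format (dicts with `label`, `confidence`, and `box` as `(x1, y1, x2, y2)`), the function names, and the thresholds are assumptions for illustration, not the Isaac ROS output format.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def filter_detections(detections, min_conf=0.5, iou_thresh=0.5):
    """Drop weak detections, then suppress same-class overlaps."""
    candidates = sorted(
        (d for d in detections if d["confidence"] >= min_conf),
        key=lambda d: d["confidence"],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Keep a box unless a stronger box of the same class overlaps it.
        if all(k["label"] != det["label"]
               or iou(k["box"], det["box"]) < iou_thresh
               for k in kept):
            kept.append(det)
    return kept


detections = [
    {"label": "person", "confidence": 0.9, "box": (0, 0, 10, 10)},
    {"label": "person", "confidence": 0.8, "box": (1, 1, 10, 10)},    # duplicate
    {"label": "pallet", "confidence": 0.7, "box": (20, 20, 30, 30)},
    {"label": "person", "confidence": 0.3, "box": (50, 50, 60, 60)},  # too weak
]
kept = filter_detections(detections)
```

Here the second person box overlaps the first with an IoU of 0.81 and is suppressed, and the last detection falls below the confidence threshold.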

Next Step: The processed information is passed to the Planning module, where the robot decides how to act.
