For moving objects equipped with environmental sensors, such as an object-tracking mobile robot with audio and video sensors, the environmental information acquired from the sensors keeps changing as the objects move. In such cases, conventional control schemes show limited control performance because of their lack of adaptability and their system complexity; sensory-motor systems, which can respond intuitively to various types of environmental information, are therefore desirable. To improve system robustness, it is also desirable to fuse multiple types of sensory information simultaneously. In this paper, we propose a sensory-motor based fusion system, built on Braitenberg's model, that tracks moving objects while adapting to environmental changes. Because of its direct-connection structure, the sensory-motor based fusion system can control each motor simultaneously, and neural networks are used to fuse information from the various types of sensors. Even if the system receives noisy information from one sensor, it continues to work robustly, because information from the other sensors compensates for the noise through sensor fusion. To examine its performance, the sensory-motor based fusion model is applied to an object-tracking four-legged robot equipped with audio and video sensors. The experimental results show that the sensory-motor based fusion system can track moving objects robustly with a simpler control mechanism than model-based control approaches.