Domain centralization and cross-modal reinforcement learning for vision-based robotic manipulation
Abstract
Vision-based robotic manipulation with deep learning has achieved substantial advances in agricultural automation, where it can be deployed for picking, sorting, and transporting agricultural products. Deep reinforcement learning (DRL) is a learning method that lets a robot learn a policy by itself through exploration and exploitation. Training real robots with DRL, however, is costly, which limits its scope of application. Some approaches train the DRL policy in simulation and deploy the model to a real robot by translating simulator images into real-world ones; this method still requires pre-collected images as training data for each real scene. In this paper, a domain-centralized approach is proposed as the sim-to-real perception module to capture the task-specific characteristics of the visual input regardless of the reality gap between simulated and real environments. Another challenge for vision-based manipulation is the learning difficulty caused by the high-dimensional visual input. We therefore propose a cross-modal reinforcement learning scheme that leverages the full system state to provide additional guidance. Experimental results show that the proposed method can perform a real-robot grasping task without any real-world data and outperforms current methods under the same experimental settings.
Keywords: sim-to-real, robotic manipulation, agricultural application, domain centralization, cross-modal learning
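The cross-modal scheme described in the abstract pairs a vision-only policy with guidance from the full simulator state, in the spirit of the asymmetric actor-critic cited below. The sketch here is a minimal, hypothetical illustration of that idea (all dimensions, learning rates, and the linear function approximators are assumptions, not the paper's architecture): the critic consumes the low-dimensional full state available in simulation, while the actor only ever sees the image.

```python
import numpy as np

# Cross-modal (asymmetric) actor-critic sketch.
# Assumed dimensions -- stand-ins, not the paper's values.
IMG_DIM, STATE_DIM, ACT_DIM = 64, 8, 4

rng = np.random.default_rng(0)
W_actor = rng.normal(scale=0.01, size=(ACT_DIM, IMG_DIM))       # image features -> action
W_critic = rng.normal(scale=0.01, size=(STATE_DIM + ACT_DIM,))  # (full state, action) -> Q

def actor(img):
    """Policy sees only the high-dimensional visual input."""
    return np.tanh(W_actor @ img)

def critic(state, action):
    """Critic sees the privileged low-dimensional simulator state."""
    return W_critic @ np.concatenate([state, action])

# One transition from the simulator (random stand-ins here).
img = rng.normal(size=IMG_DIM)
state = rng.normal(size=STATE_DIM)
reward, q_next, gamma, lr = 1.0, 0.0, 0.99, 1e-2

# Critic regresses toward the TD target using the full state;
# the actor never receives the state, only the image.
action = actor(img)
td_error = (reward + gamma * q_next) - critic(state, action)
W_critic += lr * td_error * np.concatenate([state, action])
```

Because the privileged state is only consumed by the critic during training, the deployed actor needs nothing beyond camera input, which is what makes the sim-to-real transfer of the perception module possible.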
DOI: 10.33440/j.ijpaa.20200302.77
Citation: Yang K, Zhang Z P, Cheng H, Wu H D, Guo Z Y. Domain centralization and cross-modal reinforcement learning for vision-based robotic manipulation. Int J Precis Agric Aviat, 2020; 3(2): 48–55.
References
Rodríguez F, Moreno J C, Sánchez J A, and Berenguel M. Grasping in agriculture: State-of-the-art and main characteristics. In Mechanisms and Machine Science, chapter 15, 2013. doi: 10.1007/978-1-4471-4664-3_15.
Kentaro K, Masahiro K, and Tetsuo S. Development of the stem grasping and bioinformation measuring robot for the precision agriculture of the tomato. Hokurikushinetsu Branch, 2018.
Fountas S, Borja E G, Kasimati A, et al. The Future of Digital Agriculture: Technologies and Opportunities. IT Professional, 2020, 22(1): 24–28. doi: 10.1109/MITP.2019.2963412.
Bousmalis K, Irpan A, Wohlhart P, Bai Y F, Kelcey M, Kalakrishnan M, et al. Using simulation and domain adaptation to improve efficiency of deep robotic grasping. In 2018 IEEE International Conference on Robotics and Automation, 4243–4250, 2018. doi: 10.1109/ICRA.2018.8460875.
Spinello L and Arras K O. Leveraging rgb-d data: Adaptive fusion and domain adaptation for object detection. In 2012 IEEE International Conference on Robotics and Automation, pages 4469–4474. Citeseer, 2012. doi: 10.1109/ICRA.2012.6225137.
Tobin J, Fong R, Ray A, Schneider J, Zaremba W, and Abbeel P. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 23–30, 2017. doi: 10.1109/IROS.2017.8202133.
Tobin J, Biewald L, Duan R, et al. Domain Randomization and Generative Models for Robotic Grasping. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018. doi: 10.1109/IROS.2018.8593933.
Robotics Online Marketing Team. Agricultural robots: Understanding professional service robots in agriculture, 2019. Available: https://www.robotics.org/blogarticle.cfm/Agricultural-Robots-UnderstandingProfessional-Service-Robots-in-Agriculture/151.
Pinto L, Andrychowicz M, Welinder P, Zaremba W, and Abbeel P. Asymmetric actor critic for image-based robot learning. Robotics: Science and Systems, 2018. doi: 10.15607/RSS.2018.XIV.008.
Wang Y, Patel A, Shen Y L, and Jin H X. A deep reinforcement learning based multimodal coaching model (dcm) for slot filling in spoken language understanding (slu). Proc. Interspeech 2018, pages 3444–3448, 2018. doi: 10.21437/Interspeech.2018-1379.
Sadeghi F, Toshev A, Jang E, and Levine S. Sim2real viewpoint invariant visual servoing by recurrent control. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4691–4699, 2018. doi: 10.1109/CVPR.2018.00493.
Mahler J, Liang J, Niyaz S, Laskey M, Doan R, Liu X Y, et al. Dex-net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. In Robotics: Science and Systems, 2017. doi: 10.15607/RSS.2017.XIII.058.
Viereck U, Pas A T, Saenko K, and Platt R. Learning a visuomotor controller for real world robotic grasping using simulated depth images. In Conference on Robot Learning, pages 291–300, 2017.
James S, Wohlhart P, Kalakrishnan M, Kalashnikov D, Irpan A, Ibarz J, et al. Sim-to-real via sim-to-sim: Data-efficient robotic grasping via randomized-to-canonical adaptation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 12627–12637, 2019. doi: 10.1109/CVPR.2019.01291.
Pinto L and Gupta A. Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours. In 2016 IEEE International Conference on Robotics and Automation, pages 3406–3413, 2016. doi: 10.1109/ICRA.2016.7487517.
Levine S, Pastor P, Krizhevsky A, Ibarz J, and Quillen D. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International Journal of Robotics Research, 37(4-5): 421–436, 2018. doi: 10.1177/0278364917710318.
Schulman J, Wolski F, Dhariwal P, Radford A, and Klimov O. Proximal policy optimization algorithms. arXiv:1707.06347, 2017.
Kalashnikov D, Irpan A, Pastor P, Ibarz J, Herzog A, Jang E, et al. Qt-opt: Scalable deep reinforcement learning for vision-based robotic manipulation. arXiv:1806.10293, 2018.
Bousmalis K, Trigeorgis G, Silberman N, Krishnan D, and Erhan D. Domain separation networks. In Advances in neural information processing systems, pages 343–351, 2016.
Wulfmeier M, Posner I, and Abbeel P. Mutual alignment transfer learning. arXiv:1707.07907, 2017.
Chebotar Y, Handa A, Makoviychuk V, Macklin M, Issac J, Ratliff N, et al. Closing the sim-to-real loop: Adapting simulation randomization with real world experience. In 2019 International Conference on Robotics and Automation, pages 8973–8979. IEEE, 2019. doi: 10.1109/ICRA.2019.8793789.
Lillicrap T P, Hunt J J, Pritzel A, Heess N M O, Erez T, Tassa Y, et al. Continuous control with deep reinforcement learning. arXiv:1509.02971, 2015.
Bojarski M, Testa D D, Dworakowski D, Firner B, Flepp B, Goyal P, et al. End to end learning for self-driving cars. arXiv:1604.07316, 2016.
Andrychowicz M, Wolski F, Ray A, Schneider J, Fong R, Welinder P, et al. Hindsight experience replay. In Advances in Neural Information Processing Systems, pages 5048– 5058, 2017.
Quillen D, Jang E, Nachum O, Finn C, Ibarz J, and Levine S. Deep reinforcement learning for vision-based robotic grasping: A simulated comparative evaluation of off-policy methods. In 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018. doi: 10.1109/ICRA.2018.8461039.
Zhu Y K, Wang Z Y, Merel J, Rusu A A, Erez T, Cabi S, et al. Reinforcement and imitation learning for diverse visuomotor skills. Robotics: Science and Systems XIV, abs/1802.09564, 2018. doi: 10.15607/RSS.2018.XIV.009.
Rohmer E, Singh S, and Freese M. V-rep: A versatile and scalable robot simulation framework. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1321–1326, 2013. doi: 10.1109/IROS.2013.6696520.
Kingma D P and Ba J. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
Singh A, Yang L, and Levine S. Gplac: Generalizing vision-based robotic skills using weakly labeled images. In Proceedings of the IEEE International Conference on Computer Vision, pages 5851– 5860, 2017. doi: 10.1109/ICCV.2017.623.
Johnson J, Alahi A, and Li F F. Perceptual losses for real-time style transfer and super resolution. In European conference on computer vision, pages 694–711. Springer, 2016. doi: 10.1007/978-3-319-46475-6_43.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.