Modeling robotic operations controlled by natural language
Yu CHENG 1,2, Jiatong BAO 1, Yunyi JIA 3, Zhihui DENG 1, Zhiyong SUN 2, Sheng BI 2, Congjian LI 2, Ning XI 2†
1. Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48823, U.S.A.;
2. Department of Industrial and Manufacturing Systems Engineering, The University of Hong Kong, Hong Kong;
3. Department of Automotive Engineering, Clemson University, Greenville, South Carolina 29607, U.S.A.
There are multiple ways to control a robotic system. Most of them require users to have prior knowledge about robots or to be trained before using them. Natural language based control has attracted increasing attention due to its versatility and the fewer requirements it places on users. Since natural language instructions cannot be understood by robots directly, the linguistic input has to be processed into a formal representation that captures the task specification and removes the ambiguity inherent in natural language. Most existing natural language controlled robotic systems assume that the given language instructions are already in the correct order. However, untrained users are quite likely to give commands in a mixed order based on their direct observation and intuitive thinking, and simply following the order of the commands can lead to task failures. To remedy this problem, we propose a novel framework named the dependency relation matrix (DRM) to model and organize the semantic information extracted from the language input, in order to figure out an executable sequence of subtasks for later execution. In addition, the proposed approach projects abstract language input and detailed sensory information into the same space, and uses the difference between the goal specification and the temporal status of the task under implementation to monitor the progress of task execution. In this paper, we describe the DRM framework in detail and illustrate the utility of this approach with experimental results.
Keywords: Natural language control, task planning, human-machine interaction
Traditionally, robots have been controlled through sets of predefined rules [1, 2], but these methods are not designed for lay users: people have to possess relevant expertise or be trained before using the robots. To make robot control easier for untrained users, and communication between robots and people smoother, natural language has been used as the medium to control robots. One key step of natural language based robot control is to process the natural language commands into more formal and unambiguous representations so that robots are able to understand and implement the instructions.
Based on how the language commands are processed into executable action plans, existing methods can be divided into two categories. The first translates linguistic commands into executable action plans based on a set of predefined rules distilled from people's experience. Lauria et al. [3] use hand-built grammars to map navigational commands into predefined action templates. MacMahon et al. [4] define a set of logic rules to convert linguistic input into predicate-argument structure. Rybski et al. [5] propose a method that searches for keywords in instructions given in a conditional branch structure and then transforms them into conditional structures similar to those of a programming language. Cantrell et al. [6] propose a similar method; the difference is that they map instructions with a set of templates instead of keyword searching. Kress-Gazit et al. [7, 8] use linear temporal logic (LTL) formulas to represent language commands and then compute an automaton based on the LTL expressions to serve as the task controller.
The other category translates language commands into executable action plans by using data-driven approaches that learn implicit rules from data instead of explicit ones defined by designers. These probability based methods differ in their probabilistic models (e.g., linear models [9], hidden Markov models (HMM) [10], Markov logic networks (MLN) [11], conditional random fields (CRF) [12]), formal representations (e.g., predicate-argument structure [13], lambda calculus [14], graphical representations [15]), designed features for training the model, and so on.
Both types of methods assume that the natural language commands from instructors are either single-sentence commands or chunks of commands that are already in the correct order. They do not consider that lay users are very likely to issue instructions in a mixed order based on direct observation and intuitive thinking; simply following mixed-order instructions will lead to task failures. To solve this issue, we propose a framework that organizes the subgoals of the assigned task according to the linguistic input and temporal sensory feedback.
In the proposed framework, we design a formal representation named the DRM. It represents the language description of the task in matrix form, and each element of the matrix represents the logical relation between the two involved items/events. In addition, sensory information about the workspace can also be represented as a DRM. By comparing the task DRM with the initial DRM (the DRM that captures the environment configuration before the task starts), the proposed framework is able to figure out a feasible subtask sequence to complete the task; by comparing the task DRM with the sensory DRM during task execution, the planner is able to monitor the progress of the task.
The rest of the paper is organized as follows. Section 2 illustrates the proposed framework in detail. Section 3 applies the proposed framework to an assembly task for further illustration and validates the method with experimental results. Finally, Section 4 concludes the paper.
An overview of the system framework is shown in Fig. 1. The system first translates the given set of natural language instructions into the DRM representation, then applies the proposed approach to organize the order of the subtasks and sends it to the lower level for execution. The main components are the task organization, the task planner, and the action planner. The roles of these components are described below.
The task organization takes the DRM from the natural language processing module as its input and outputs a sequence of final states corresponding to each subtask. The subtasks are organized into a feasible order according to the task specifications. In most natural language controlled robotic systems, which mainly rely on the instructor to correctly order the execution steps, this task organization layer is missing, which makes those systems less scalable to human-robot collaboration applications.
The task planner and the action planner below it are both realized by employing supervisory control [16]. In fact, they form a hierarchical supervisory control structure [17] to avoid the complexity issues that arise in system design and control due to combinatorial explosion [18]. The supervisory controller contains a set of concatenation rules for actions. Each action is represented as a state change from preconditions to postconditions. As a task planner, it generates action plans for the given task according to the goal states (the output from task decomposition) and the temporal states (extracted from temporal sensory information). The backward algorithm [19] is employed to search for a feasible action path with the least number of actions. If exceptions occur during task implementation, the abrupt state change will be reflected via state feedback and a new plan will be generated. In addition, when the natural language instructions include new commands that are not stored in the lexicon, a notification of undefined mapping will be sent to the human user to invoke a learning subroutine. Following step-by-step instructions from the human user to conduct the unknown task, the initial states before the implementation and the final states after the implementation are recorded. Comparing the two sets of states and eliminating the unchanged ones, the remaining two sets of states are considered to be the preconditions and postconditions of the new skill, respectively [20].
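A minimal sketch of this skill-acquisition step, assuming states are represented as sets of grounded predicate strings (the predicate names below are hypothetical illustrations, not the paper's lexicon):

```python
# Sketch: deriving a new skill's pre/postconditions from recorded states.
def extract_skill(initial_states: set, final_states: set):
    """Eliminate the unchanged states; what remains in the initial set are
    the preconditions, and what remains in the final set the postconditions."""
    preconditions = initial_states - final_states   # true only before
    postconditions = final_states - initial_states  # true only after
    return preconditions, postconditions

before = {"On(bottle, table)", "Open(hand)", "At(robot, table)"}
after = {"In(bottle, box)", "Open(hand)", "At(robot, table)"}
pre, post = extract_skill(before, after)
print(pre)   # {'On(bottle, table)'}
print(post)  # {'In(bottle, box)'}
```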
Fig. 1 Overview of the proposed system framework.
The action planner contains complete action schemas for each primitive action in the supervisory controller. Each plan schema is realized by an automaton. All parameters of a plan are linked to input states from the supervisory controller and sensory feedback. Plan schemas are guaranteed to be controllable, nonblocking [16], and asymptotically stable [21] at the discrete event level.
The natural language commands are transformed into an n×n matrix. Each element of the matrix captures the dependency relation between two items or events, where n is the number of items/events appearing in the natural language commands. The elements of a DRM are defined based on logical relations, i.e., spatial or temporal relations, and the value of a relation reflects its priority. For a DRM based on spatial relations, the element at the intersection of the ith row and jth column of the DRM can be defined as follows.
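One consistent instantiation of equation (1), stated here as an assumption chosen to agree with the worked assembly example later in the paper, is

\[
[\mathrm{DRM}]_{ij} =
\begin{cases}
k, & \text{item } i \text{ is } k \text{ levels above item } j,\\
-k, & \text{item } i \text{ is } k \text{ levels below item } j,\\
0, & \text{no spatial dependency between items } i \text{ and } j.
\end{cases}
\]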
Similarly, for a DRM based on temporal relations, the element at the intersection of the ith row and jth column of the DRM can be defined as follows.
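Analogously, a plausible instantiation of equation (2), assumed here so as to match the compositional navigation/manipulation example discussed later, is

\[
[\mathrm{DRM}]_{ij} =
\begin{cases}
k, & \text{event } i \text{ occurs } k \text{ steps after event } j,\\
-k, & \text{event } i \text{ occurs } k \text{ steps before event } j,\\
0, & \text{events } i \text{ and } j \text{ are unordered.}
\end{cases}
\]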
The NLP module translates natural language commands or descriptions of the task into a set of states corresponding to the intermediate steps and organizes the states in matrix form as illustrated above. The raw output contains only the relations that appear in the commands. To figure out the order of subtasks and track the task implementation process, we derive a goal DRM (GDRM) that contains all the dependency relations between each pair of items/events using the following rules.
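A plausible reading of these completion rules, stated as an assumption consistent with the antisymmetry and transitive property invoked later, is

\[
[\mathrm{GDRM}]_{ii} = 0,
\]
\[
[\mathrm{GDRM}]_{ij} = \mathit{relation} \;\Longrightarrow\; [\mathrm{GDRM}]_{ji} = -\mathit{relation},
\]
\[
[\mathrm{GDRM}]_{ij} = \mathit{relation}_1 \;\wedge\; [\mathrm{GDRM}]_{jk} = \mathit{relation}_2 \;\Longrightarrow\; [\mathrm{GDRM}]_{ik} = \mathit{relation}_1 + \mathit{relation}_2,
\]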
where relation ∈ R and R = {..., −2, −1, 0, 1, 2, ...} depends on the relation definition.
At the same time, sensory information is processed and transformed into the DRM representation in a similar way. The temporal DRM (TDRM) from sensory feedback plays two roles. First, before the robot executes the task, the GDRM and the initial TDRM are used to derive a feasible sequence of subtasks. Subtracting the initial TDRM from the GDRM, we obtain the error DRM, denoted as DRM_e in Fig. 1. Its elements represent the difference between the initial configuration and the goal configuration.
The smaller the relation value, the higher the priority of the corresponding subtask. Subtasks with higher priorities should be implemented earlier than those with lower priorities. By summing up all the relation values of each item/event with respect to the other items/events, as represented by equation (6), we obtain a vector whose values are used as the criteria for subtask planning. The order of the subtasks is then obtained by sorting the elements of the vector in ascending order.
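A plausible form of this summation, assumed here to be what equation (6) denotes by f(X), is the vector of row sums

\[
f(X) = [v_1, \dots, v_n]^{\mathrm{T}}, \qquad v_i = \sum_{j=1}^{n} [X]_{ij}, \quad i = 1, \dots, n.
\]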
The second role of the TDRM is to monitor the task execution. After the completion of each subtask, the TDRM should converge toward the GDRM. We use the matrix 2-norm to measure the similarity between the GDRM and the TDRM, as shown by equation (7).
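Reading the matrix 2-norm entrywise (an assumption on our part), equation (7) plausibly takes the form

\[
V(k) = \Big( \sum_{i=1}^{n} \sum_{j=1}^{n} \big( g_{ij} - t_{ij}(k) \big)^2 \Big)^{1/2},
\]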
where g_ij = [GDRM]_ij, t_ij(k) = [TDRM(k)]_ij, and k ∈ N+. If some abrupt event occurs during the task implementation and causes ΔV = V(k) − V(k−1) > 0, the system will stop executing the task for further diagnosis.
The proposed approach has been implemented and tested on a mobile manipulator, as shown in Fig. 2. It is comprised of a 7 degree-of-freedom (DOF) Schunk robot arm and a four-wheeled Segway base that provides mobility. The robotic system has a Kinect mounted on its left shoulder, which identifies objects in the working environment and sends their information back to the robot. The visual recognition system can detect object features such as color, 3-D position, and dimensions (height, width, and length), as shown in Fig. 3 [22]. The attribute information is also used for language grounding through perceptive feedback.
The natural language processing module processes natural language commands in two steps [13, 23]. First, it parses the commands into grounded action frames with a combinatory categorial grammar (CCG) parser. An action frame includes the action and its roles; the roles are further represented by objects' features, which are captured by the equipped sensors and transmitted to the robot through perceptive feedback. Then the grounded action frames are transformed into a state representation with predicate-argument structure. For example, the task "Put the bottle into the box" can be represented as In(bottle, box). The generated states are then organized into the DRM representation. In this paper, we use typed commands instead of speech and skip the voice recognition module.
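A minimal sketch of the frame-to-state step (the frame layout, role names, and action-to-predicate lexicon below are hypothetical illustrations; the paper's parser is CCG-based [13, 23]):

```python
# Sketch: turning a grounded action frame into a predicate-argument state.
def frame_to_state(frame: dict) -> str:
    """Map a frame such as {"action": "Put", "theme": "bottle", "goal": "box"}
    to a goal-state predicate such as "In(bottle, box)"."""
    action_to_predicate = {"Put": "In", "Stack": "On"}  # assumed lexicon
    predicate = action_to_predicate[frame["action"]]
    return f"{predicate}({frame['theme']}, {frame['goal']})"

print(frame_to_state({"action": "Put", "theme": "bottle", "goal": "box"}))
# -> In(bottle, box)
```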
Fig. 2 Overview of the robotic platform.
Fig. 3 Description of perceived workpieces through the visual recognition system.
The simulated assembly task described in Fig. 4 has been used to test the proposed method. Five parts of a product are placed on the worktable without overlaps. The robot is tasked to assemble a block castle, as shown in the right subfigure of Fig. 4. The list of states used to represent the robot status and the environment status is shown in Table 1. One set of natural language task descriptions from untrained users is as follows:
A blue workpiece is on top of the green one. The grey and yellow parts are on the red workpiece. The green workpiece is above the grey and yellow parts.
The language instructions are issued in a mixed order.
Most existing natural language based control approaches cannot finish the task with that order. By adding the task organization layer, the system is able to figure out an executable plan for the task.
Fig. 4 Initial setup and desired configuration of the task.
Table 1 List of states used to represent the task and robot status.
As shown in the left subfigure of Fig. 4, the state representation of the initial setup is as follows.
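A plausible instantiation of this state list, in the assumed predicate notation of the NLP example, is

\[
\mathrm{On}(W_i, \mathrm{table}), \quad \forall i \in I,
\]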
where W_i denotes the ith workpiece and I = {1, 2, 3, 4, 5}. Following the DRM element definition and the rules presented in equations (1), (3), and (4), the IDRM for this task is obtained as follows.
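Because no workpiece initially rests on another, every pairwise spatial relation is zero, so the IDRM is presumably the zero matrix:

\[
\mathrm{IDRM} = \mathbf{0}_{5 \times 5},
\]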
where the rows from top to bottom correspond to the five objects denoted W_1 to W_5, following the object indices assigned by the visual recognition system. Each row shows one object's dependency relations with the others.
From the natural language instructions, a partial set of goal states can be obtained as follows.
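Reading the three instructions directly gives, in the same assumed notation (On and Above are assumed predicate names),

\[
\mathrm{On}(\mathrm{blue}, \mathrm{green}), \quad \mathrm{On}(\mathrm{grey}, \mathrm{red}), \quad \mathrm{On}(\mathrm{yellow}, \mathrm{red}), \quad \mathrm{Above}(\mathrm{green}, \mathrm{grey}), \quad \mathrm{Above}(\mathrm{green}, \mathrm{yellow}).
\]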
Here, "partial" means that direct translation of the natural language can only yield a subset of the goal representation.
The corresponding GDRM representation is then completed from this partial goal set by applying the derivation rules above. Subtracting the IDRM from the GDRM, we have DRM_e; with the overlap-free initial setup, DRM_e coincides with the GDRM. Applying the function f(X) defined in equation (6) to DRM_e, we get the vector that contains the quantified priorities of the subtasks. Sorting its elements in ascending order yields the manipulation order of the blocks, as reconstructed in the sketch below.
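The following sketch reconstructs this computation under two stated assumptions: the workpiece indices map to colors as W_1 = blue, W_2 = grey, W_3 = yellow, W_4 = green, W_5 = red (hypothetical, chosen so that W_2 and W_3 share a priority, as noted below), and the GDRM elements follow the level-difference definition assumed for equation (1).

```python
import numpy as np

# Hypothetical color assignment for W1..W5 (chosen so that the grey and
# yellow pieces are W2 and W3, matching the tie noted in the text).
colors = ["blue", "grey", "yellow", "green", "red"]

# Stacking level of each piece in the goal castle: red at the bottom (0),
# grey/yellow on red (1), green above them (2), blue on top (3).
level = {"red": 0, "grey": 1, "yellow": 1, "green": 2, "blue": 3}

n = len(colors)
# Assumed element definition: [GDRM]_ij = level(i) - level(j).
GDRM = np.array([[level[colors[i]] - level[colors[j]] for j in range(n)]
                 for i in range(n)])
IDRM = np.zeros((n, n), dtype=int)  # no overlaps initially
DRMe = GDRM - IDRM                  # here DRMe equals GDRM

priorities = DRMe.sum(axis=1)       # equation (6): row sums
order = np.argsort(priorities, kind="stable")
print(priorities.tolist())          # [8, -2, -2, 3, -7]
print([colors[i] for i in order])   # ['red', 'grey', 'yellow', 'green', 'blue']
```

Under these assumptions, the red workpiece is manipulated first, the grey and yellow pieces tie, and the blue piece is placed last, which is consistent with the goal configuration.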
The temporal snapshots of the task execution are shown in Fig. 5.
The task execution is monitored using the matrix 2-norm of the difference between the goal DRM and the temporal DRM. Fig. 6 shows the plot of ‖DRM_e‖_2 during the task execution.
We also notice that the workpieces W_2 and W_3 have the same priority. This means the order of operating these two blocks will not impact the completion of the task. Since only one arm is equipped, we randomly select one of them to manipulate first. The set of goal states corresponding to each subtask is then sent to the task planner.
Fig. 5 Robot implementation of the assembly task.
Fig. 6 ‖DRM_e‖_2 during the task execution.
The proposed method projects both the linguistic input from people and the analog information from sensors into the same space. Language commands and sensory information are represented as the goal and the feedback that drive the system forward until the goal has been reached. Since each item/event is operated on only once in the organization process, the assembly test above assumes that the items are initially scattered in the workspace without overlaps; otherwise, extra operations are required and the calculated order may fail. To adapt to such situations, if there are overlaps initially, the whole task is split into two phases: phase I removes the overlaps and phase II implements the job.
The proposed approach acts as an intermediate layer between natural language processing and the task planner. It can also be integrated with other natural language control frameworks to figure out a correct subtask order for their task planner modules.
In addition, the DRM is compositional: it enables complicated tasks to be represented from simple ones. Consider a more complex scenario: the robot is tasked to first move to location A and perform a manipulation with parts a and b, then move to location B and execute another job with parts c, d, and e. The task consists of four subtasks, two manipulations and two navigations, represented as M1, M2, N1, and N2, respectively. Modeling the task by using the DRM element definition from equation (2) and the transitive property, the GDRM is as follows.
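Under the temporal element definition assumed for equation (2), with the required execution order N1 → M1 → N2 → M2 and with rows and columns ordered (M1, M2, N1, N2), a plausible GDRM is

\[
\mathrm{GDRM} =
\begin{bmatrix}
0 & -2 & 1 & -1 \\
2 & 0 & 3 & 1 \\
-1 & -3 & 0 & -2 \\
1 & -1 & 2 & 0
\end{bmatrix}.
\]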
Initially, none of the subtasks has been implemented, i.e., they have no logical preference. The IDRM is built as follows.
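Since no logical preference yet exists among the subtasks, a natural reading is the 4×4 zero matrix:

\[
\mathrm{IDRM} = \mathbf{0}_{4 \times 4}.
\]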
The following procedure is the same as presented in the last section: obtain DRM_e by subtracting the IDRM from the GDRM, and then apply the function of equation (6) to get a vector containing each subtask's priority (under the assumed GDRM above, the row sums are −2, 6, −6, and 2 for M1, M2, N1, and N2, respectively).
Sorting these values in ascending order gives the order of the four subtasks: N1, M1, N2, M2.
Each subtask is modeled in a similar way as presented in the assembly task. Representing tasks hierarchically from simpler ones helps to alleviate the state and action explosion as tasks become more complicated and to organize the components of a task in a consistent way. In addition, it helps to model natural language controlled operations given as large sets of instructions.
As discussed above, the proposed method is based on the assumption that the robot makes a difference to the physical environment. However, in the applications presented in [10, 24], the authors command a humanoid to perform body movements without changing its surroundings. The current DRM framework may not model such natural language controlled robotic operations well.
In this work, we propose a framework to solve the problem of language instructions given in mixed order. It represents the dependency relations between each pair of items/events as elements of the dependency relation matrix, and reasons out a feasible and correct subtask sequence for the task planner. This makes it suitable for integration into other natural language control frameworks to improve their scalability. It also puts fewer constraints on users.
Language commands and sensory information are the same in essence: both describe physical world configurations, but in different modalities. The proposed method projects linguistic input and sensory input into the same space, which helps to monitor the task execution.
[1] T. Lozano-Perez. Robot programming. Proceedings of the IEEE, 1983, 71(7): 821–841.
[2] G. Biggs, B. MacDonald. A survey of robot programming systems. Proceedings of the Australasian Conference on Robotics and Automation, Brisbane, Australia, 2003.
[3] S. Lauria, G. Bugmann, T. Kyriacou, et al. Personal robot training via natural-language instructions. IEEE Intelligent Systems, 2001, 16(3): 38–45.
[4] M. MacMahon, B. Stankiewicz, B. Kuipers. Walk the talk: Connecting language, knowledge, and action in route instructions. Proceedings of the National Conference on Artificial Intelligence, Austin: AAAI, 2006: 1475–1482.
[5] P. E. Rybski, J. Stolarz, K. Yoon, et al. Using dialog and human observations to dictate tasks to a learning robot assistant. Intelligent Service Robotics, 2008, 1(2): 159–167.
[6] R. Cantrell, K. Talamadupula, P. Schermerhorn, et al. Tell me when and why to do it! Run-time planner model updates via natural language instruction. Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, Boston: IEEE, 2012: 471–478.
[7] H. Kress-Gazit, G. E. Fainekos, G. J. Pappas. Temporal-logic-based reactive mission and motion planning. IEEE Transactions on Robotics, 2009, 25(6): 1370–1381.
[8] C. Lignos, V. Raman, C. Finucane, et al. Provably correct reactive control from natural language. Autonomous Robots, 2015, 38(1): 89–105.
[9] N. Shimizu, A. R. Haas. Learning to follow navigational route instructions. Proceedings of the International Joint Conference on Artificial Intelligence, Pasadena: ACM, 2009: 1488–1493.
[10] W. Takano, I. Kusajima, Y. Nakamura. Generating action descriptions from statistically integrated representations of human motions and sentences. Neural Networks, 2016, 80(C): 1–8.
[11] G. Lisca, D. Nyga, F. Bálint-Benczédi, et al. Towards robots conducting chemical experiments. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Hamburg: IEEE, 2015: 5202–5208.
[12] D. K. Misra, J. Sung, K. Lee, et al. Tell me Dave: Context-sensitive grounding of natural language to manipulation instructions. The International Journal of Robotics Research, 2015, 35(1–3): 281–300.
[13] L. She, Y. Cheng, J. Y. Chai, et al. Teaching robots new actions through natural language instructions. Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication, Edinburgh: IEEE, 2014: 868–873.
[14] C. Matuszek, E. Herbst, L. Zettlemoyer, et al. Learning to parse natural language commands to a robot control system. Experimental Robotics, Heidelberg: Springer, 2013: 403–415.
[15] T. Kollar, S. Tellex, M. R. Walter, et al. Generalized grounding graphs: A probabilistic framework for understanding grounded language. Journal of Artificial Intelligence Research, 2013: 1–35.
[16] P. J. Ramadge, W. M. Wonham. The control of discrete event systems. Proceedings of the IEEE, 1989, 77(1): 81–98.
[17] P. Hubbard, P. E. Caines. Dynamical consistency in hierarchical supervisory control. IEEE Transactions on Automatic Control, 2002, 47(1): 37–52.
[18] M. Sampath, S. Lafortune, D. Teneketzis. Active diagnosis of discrete-event systems. IEEE Transactions on Automatic Control, 1998, 43(7): 908–929.
[19] M. Ghallab, D. Nau, P. Traverso. Automated Planning: Theory & Practice. San Francisco: Elsevier, 2004.
[20] Y. Cheng, J. Bao, Y. Jia, et al. Analytic approach for natural language based supervisory control of robotic manipulations. Proceedings of the IEEE International Conference on Robotics and Biomimetics, Qingdao: IEEE, 2016: 331–336.
[21] K. M. Passino, A. N. Michel, P. J. Antsaklis. Lyapunov stability of a class of discrete event systems. IEEE Transactions on Automatic Control, 1994, 39(2): 269–279.
[22] J. Bao, Y. Jia, Y. Cheng, N. Xi. Saliency-guided detection of unknown objects in RGB-D indoor scenes. Sensors, 2015, 15(9): 21054–21074.
[23] L. She, S. Yang, Y. Cheng, et al. Back to the blocks world: Learning new actions through situated human-robot dialogue. Proceedings of the Annual Meeting of the Special Interest Group on Discourse and Dialogue, Philadelphia: ACL, 2014: 89–97.
[24] W. Takano, M. Kanazawa, Y. Nakamura. Motion-language association model for human-robot communication. Experimental Robotics, Berlin: Springer, 2014: 17–30.
Received 12 September 2017; revised 13 November 2017; accepted 13 November 2017.
DOI: https://doi.org/10.1007/s11768-017-7099-5
†Corresponding author. E-mail: xining@hku.hk. Tel.: +852 28592593.
This paper is dedicated to Professor T. J. Tarn on the occasion of his 80th birthday.
This work was partially supported by the National Science Foundation (Nos. CNS-1320561, IIS-1208390), the U.S. Army Research Laboratory, and the U.S. Army Research Office (Nos. W911NF-11-D-0001, W911NF-09-1-0321, W911NF-10-1-0358, W911NF-14-1-0327, W911NF-16-1-0572).
© 2017 South China University of Technology, Academy of Mathematics and Systems Science, CAS, and Springer-Verlag GmbH Germany
Yu CHENG is currently a Ph.D. candidate in the Department of Electrical and Computer Engineering at Michigan State University, MI, U.S.A. His research interests include human-machine interaction, tactile sensing, and robot modeling and control. E-mail: chengyu9@msu.edu.
Jiatong BAO received the B.S. and M.S. degrees in Computer Science and Technology from Yangzhou University, Yangzhou, Jiangsu, China in 2005 and 2008, and the Ph.D. degree in Instrument Science and Technology from Southeast University, Nanjing, Jiangsu, China in 2012. Since 2012, he has been an Assistant Professor in the Department of Electrical Engineering, Yangzhou University. His research interests include robotic sensing and control, human-robot interaction, and natural language based robot task programming. E-mail: jtbao@yzu.edu.cn.
Yunyi JIA is currently an Assistant Professor in the Department of Automotive Engineering at Clemson University and the Clemson University International Center for Automotive Research (CU-ICAR). He received his Ph.D. degree in Electrical Engineering from Michigan State University, Michigan, U.S.A., in 2014. His research interests mainly include robotics, autonomous driving, human-robot interaction, intelligent manufacturing, and advanced sensing systems. He is a member of IEEE, ASME, and SAE. E-mail: yunyij@clemson.edu.
Zhihui DENG is a Ph.D. candidate in Mechanical Design and Theory at Jiangsu University, Zhenjiang, China. He is currently an Associate Professor with the School of Mechanical and Electrical Engineering, Changzhou College of Information Technology, Changzhou, China. His current research interests include machine learning and the sensing and control technology of robots. E-mail: zhdeng@msu.edu.
Zhiyong SUN received his Ph.D. degree in Mechatronics Engineering from Northeastern University, Shenyang, China, in July 2016. He was a visiting student in the Department of Electrical and Computer Engineering at Michigan State University from February 2013 to February 2015. Currently, he is a Postdoctoral Fellow in the Department of Industrial and Manufacturing Systems Engineering, The University of Hong Kong. His main research interests include robotic automation, smart material sensors/actuators, and bioMEMS. E-mail: sunzy@hku.hk.
Sheng BI received the M.Sc. degree in 2003 and the Ph.D. degree in 2010 from South China University of Technology, Guangzhou, China. He is a Postdoctoral Fellow in the Emerging Technologies Institute, The University of Hong Kong. He is also an Associate Professor with the School of Computer Science and Engineering, South China University of Technology. His research interests include intelligent robots, embedded intelligent terminals, and smartphone development. E-mail: shengbi@hku.hk.
Congjian LI received the B.E. degree in mechanical engineering and automation from Ningxia University, Yinchuan, China, in 2013 and the M.E. degree in mechanical engineering from Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China, in 2016. He is currently working towards the Ph.D. degree with the Department of Industrial and Manufacturing Systems Engineering, The University of Hong Kong. His research interests include robotic vision and compressive sensing. E-mail: u3004435@connect.hku.hk.
Ning XI received the D.Sc. degree in Systems Science and Mathematics from Washington University, St. Louis, MO, U.S.A., in 1993. He is the Chair Professor of Robotics and Automation in the Department of Industrial and Manufacturing Systems Engineering, and the Director of the Emerging Technologies Institute, The University of Hong Kong. Before joining The University of Hong Kong, he was a University Distinguished Professor and the John D. Ryder Professor of Electrical and Computer Engineering at Michigan State University, East Lansing, MI, U.S.A. From 2011 to 2013, he served as the founding Head of the Department of Mechanical and Biomedical Engineering, City University of Hong Kong. His research interests include robotics, manufacturing automation, micro/nano manufacturing, nano sensors and devices, and intelligent control and systems. Prof. Xi served as the President of the IEEE Nanotechnology Council (2010–2011). He is currently the President-Elect of the IEEE Robotics and Automation Society (2017–2018). E-mail: xining@hku.hk.