Creating heuristic-based agents for Project Malmo (Individual only)

For this assignment, you will be creating agents that can traverse a maze in Project Malmo (Minecraft).

Now that we know some general tree-search algorithms (if you're unclear on them, ask me or go back and check the resources I've given you, such as the slides), we're going to implement an agent that can find a goal block in a Minecraft maze.

We’ll be using Project Malmo for this assignment: https://github.com/Microsoft/malmo

Minecraft is waiting for your agent!

Once the Malmo environment has completely loaded (see the appendix for instructions), you can run your agent using the usual commands.

The MazeSim code below can be run (after your Minecraft server has already been loaded) with the usual command: python MazeSim.py

Objective

The objective of this assignment is to make an agent that can use both Breadth-First Search and A* Search. To complete both, you’ll need to decide how to implement the data structures needed to represent the tree and search it (e.g., the frontier set).
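One reasonable set of choices, sketched here with Python's standard library: a first-in, first-out queue for the breadth-first frontier, a min-heap ordered by f = g + h for the A* frontier, and a set for explored cells. Using collections.deque and heapq is our suggestion rather than something the starter code requires, and representing cells as (row, col) tuples is likewise an assumption.

```python
from collections import deque
import heapq

# Breadth-first search: the frontier is a FIFO queue, so the shallowest
# (oldest) node is always expanded next.
bfs_frontier = deque()
bfs_frontier.append((0, 0))
bfs_frontier.append((0, 1))
first_out = bfs_frontier.popleft()  # (0, 0): first in, first out

# A* search: the frontier is a priority queue keyed on f = g + h.
# heapq keeps the smallest tuple at index 0, so store (f, cell) pairs.
astar_frontier = []
heapq.heappush(astar_frontier, (5, (2, 3)))
heapq.heappush(astar_frontier, (3, (1, 1)))
best_f, best_cell = heapq.heappop(astar_frontier)  # lowest f comes out first

# Explored set: cells already expanded, with fast membership tests.
explored = set()
explored.add(first_out)
print(first_out, best_f, best_cell, (0, 0) in explored)
# (0, 0) 3 (1, 1) True
```

If two entries ever share an f value, heapq falls back to comparing the cells themselves, which works for tuples; adding a counter as a tie-breaker is a common alternative.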

Your agent will be given a 2D list that represents the initial state of the agent, the goal state, possible places in the grid to move, and places to which your agent cannot move (the representations for each of these are included in the code documentation).
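For example, a small helper (the name find_cell is ours, not part of the starter code) can locate the start and goal cells using the integer codes documented in MazeSim.__fill_grid: 2 for the initial location, 3 for the goal, 1 for valid blocks, and 0 for invalid ones. The sample grid is made up for illustration and is much smaller than the real 60 by 60 maze.

```python
# Cell codes produced by MazeSim.__fill_grid (see its docstring).
INACCESSIBLE, ACCESSIBLE, START, GOAL = 0, 1, 2, 3


def find_cell(grid, code):
    """Return the (row, col) of the first cell holding `code`, or None."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == code:
                return (r, c)
    return None


# A tiny hand-made grid (not a real Malmo maze) for illustration.
sample = [
    [2, 1, 0],
    [0, 1, 0],
    [0, 1, 3],
]
start = find_cell(sample, START)
goal = find_cell(sample, GOAL)
print(start, goal)  # (0, 0) (2, 2)
```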

A bit about the 2D list/grid

Normally Malmo (as it’s set up with this file/maze) would give you a flattened list of strings that represent where each object is in a 2D grid. For example, the list that corresponds to the figure below would be [“0”,“1”,“2”,“3”,“4”,“5”,“6”,“7”,“8”]. Going from “0” to “3” would be moving north 1 block, or “movenorth 1”. Thus, it would look like the figure below:

Malmo Normal Grid

To help out with this, I parse the list and give you a 2D grid that should prove a bit more natural for your path-planning algorithm: [[“0”,“1”,“2”], [“3”,“4”,“5”], [“6”,“7”,“8”]]. I’ve also simplified things by converting the strings to integers that mark the initial (2), goal (3), accessible (1), and inaccessible (0) states, where an inaccessible state is a point on the grid where the agent cannot go.
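If you want to see exactly where each flattened entry lands, the sketch below copies the row/column arithmetic from MazeSim.__fill_grid (the helper name fill_grid_like_mazesim is ours, and it keeps the raw strings instead of calling conv_obs_str). Because that formula counts back from the end of the list, the 3x3 example comes out reversed relative to the illustration above, so it is worth printing the grid your agent receives before relying on a particular orientation.

```python
def fill_grid_like_mazesim(observations, size):
    """Reproduce the index arithmetic of MazeSim.__fill_grid.

    The flat list is walked from the front, but each element is placed
    counting back from the *end*, so the resulting grid is reversed.
    """
    grid = [[None] * size for _ in range(size)]
    n = len(observations)
    for i in range(n):
        row = ((n - 1) - i) // size
        col = ((n - 1) - i) % size
        grid[row][col] = observations[i]
    return grid


flat = ["0", "1", "2", "3", "4", "5", "6", "7", "8"]
print(fill_grid_like_mazesim(flat, 3))
# [['8', '7', '6'], ['5', '4', '3'], ['2', '1', '0']]
```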

Requirements (what in the hell do I turn in and how should it look?)

For this assignment you’ll need to complete the MazeAgent.py class so that its get_path method returns the correct movements to the goal. You should also complete the __plan_path_breadth and __plan_path_astar methods. Both should work within the get_path method, but in the MazeAgent.py file you turn in, only one of them needs to actually be used within get_path (basically, the file you turn in should let me run the MazeSim.py file without errors).

In addition to the above file, you should also turn in a README file that includes the following:

Also, remember that if your agent is not working as you expect, you can just terminate the agent script and keep the server running (this should save you from having to restart the server). You can exit with “Save and Quit to Title”, but if you do this before stopping the agent/simulation script, the server may not respond the next time and will have to be restarted.

Grading

Grade item                              Points
Breadth-First & A* implemented          5 pts
Well documented (including README)      3 pts

The MazeAgent (starter) code

Below is the starter code for your maze agent. If you add functions or change names or functions, make sure you document the changes and that your agent still works!

For our MazeAgent class, we are going to have just a few attributes:

__grid holds our actual representation of the map, __goal_state holds the representation for our goal, and __frontier_set and __explored_set hold the frontier and explored sets for your search. I did not specify these representations so that you can figure out how you want to represent the problem, but the MazeSim code has constants and numbers that tell us what kind of state each grid cell is in (take a look at that code afterward).

class MazeAgent(object):
    '''
    Agent that uses path planning algorithm to figure out path to take to reach goal
    Built for Malmo discrete environment and to use Malmo discrete movements
    '''

    def __init__(self, grid=None):
        '''
        Arguments
            grid -- (optional) a 2D list that represents the map
        '''
        self.__frontier_set = None
        self.__explored_set = None
        self.__goal_state = None
        self.__grid = grid

Normal accessors & mutators... not much to see here ;-)

    def get_eset(self):
        return self.__explored_set

    def get_fset(self):
        return self.__frontier_set

    def get_goal(self):
        return self.__goal_state

    def set_grid(self, grid):
        self.__grid = grid

You should put your functionality into the __plan methods and use (one of) them within get_path.
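As a starting point, here is a minimal breadth-first sketch written as a standalone function rather than as __plan_path_breadth, so you can test it without Minecraft running. Everything in it is an assumption to adapt: bfs_path and MOVES are hypothetical names, cells are (row, col) tuples, and only the movenorth direction (row + 1) comes from the get_path docstring; south is taken as its opposite, and the east/west column signs are inferred from how __fill_grid reverses the flattened observation list, so check them with a short test run. It records every reached cell, which makes it a graph search; a pure tree search can revisit cells and loop forever in a maze with cycles. Inside MazeAgent you would also keep the frontier and explored collections in self.__frontier_set and self.__explored_set, because MazeSim prints their lengths after running.

```python
from collections import deque

# Assumed grid deltas (row, col) for each command; verify them in your maze.
MOVES = {
    "movenorth 1": (1, 0),   # from the get_path docstring: [0][0] -> [1][0]
    "movesouth 1": (-1, 0),  # assumed opposite of north
    "moveeast 1": (0, -1),   # inferred from __fill_grid's reversed indexing
    "movewest 1": (0, 1),    # inferred from __fill_grid's reversed indexing
}


def bfs_path(grid, start, goal):
    """Breadth-first search over cells whose value is non-zero (0 = blocked).

    Returns the commands in execution order, or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])  # FIFO queue of cells waiting to be expanded
    parent = {start: None}     # cell -> (previous cell, move); also the reached set
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            moves = []
            while parent[cell] is not None:  # walk back from the goal to the start
                cell, move = parent[cell]
                moves.append(move)
            return moves[::-1]               # flip into start-to-goal order
        for move, (dr, dc) in MOVES.items():
            r, c = cell[0] + dr, cell[1] + dc
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] != 0 and (r, c) not in parent:
                parent[(r, c)] = (cell, move)
                frontier.append((r, c))
    return None


maze = [
    [2, 1, 1],
    [0, 0, 1],
    [0, 0, 3],
]
print(bfs_path(maze, (0, 0), (2, 2)))
# ['movewest 1', 'movewest 1', 'movenorth 1', 'movenorth 1']
```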


    def __plan_path_breadth(self):
        '''Breadth-First tree search'''
        pass

    def __plan_path_astar(self):
        '''A* tree search'''
        pass

    def get_path(self):
        '''Should return a list of strings where each string gives a movement command.
            MazeSim executes them with movements.pop(), so put them in reverse
            order (the last element is the first move to make).
            Example:
             ["movenorth 1", "movesouth 1", "moveeast 1", "movewest 1"]
             (these are also the only four commands that can be used, you
             cannot move diagonally)
             On a 2D grid (list), "move north" would move us
             from, say, [0][0] to [1][0]
        '''
        pass
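Here is a minimal A* sketch to compare against whatever you write for __plan_path_astar. The names astar_path and manhattan are hypothetical, cells are (row, col) tuples, and the MOVES table is an assumption: only movenorth (row + 1) comes from the get_path docstring above, and the other three deltas are inferred, so verify them. The Manhattan distance is admissible and consistent on a four-connected grid where every move costs 1, so the first time the goal is popped its path is optimal; the counter in each heap entry only breaks ties between equal f values.

```python
import heapq
import itertools

# Assumed grid deltas (row, col) for each command; verify them in your maze.
MOVES = {
    "movenorth 1": (1, 0),   # from the get_path docstring: [0][0] -> [1][0]
    "movesouth 1": (-1, 0),  # assumed opposite of north
    "moveeast 1": (0, -1),   # inferred from __fill_grid's reversed indexing
    "movewest 1": (0, 1),    # inferred from __fill_grid's reversed indexing
}


def manhattan(a, b):
    """Admissible, consistent heuristic for 4-way moves that each cost 1."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def astar_path(grid, start, goal):
    """A* search over non-zero cells; returns commands in execution order or None."""
    rows, cols = len(grid), len(grid[0])
    tie = itertools.count()                  # tie-breaker so equal f values never compare cells
    frontier = [(manhattan(start, goal), next(tie), start)]  # heap of (f, tie, cell)
    g_cost = {start: 0}                      # best known path cost to each cell
    parent = {start: None}                   # cell -> (previous cell, move)
    explored = set()
    while frontier:
        _, _, cell = heapq.heappop(frontier)
        if cell == goal:
            moves = []
            while parent[cell] is not None:  # walk back from the goal to the start
                cell, move = parent[cell]
                moves.append(move)
            return moves[::-1]
        if cell in explored:                 # stale entry left behind by a cheaper push
            continue
        explored.add(cell)
        for move, (dr, dc) in MOVES.items():
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or grid[nxt[0]][nxt[1]] == 0:
                continue
            new_g = g_cost[cell] + 1
            if new_g < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = new_g
                parent[nxt] = (cell, move)
                heapq.heappush(frontier, (new_g + manhattan(nxt, goal), next(tie), nxt))
    return None


maze = [
    [2, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 3, 0, 1],
    [0, 1, 1, 1],
]
print(astar_path(maze, (0, 0), (2, 1)))
# ['movenorth 1', 'movenorth 1', 'movewest 1']
```

Swapping manhattan for a function that always returns 0 turns this into uniform-cost search, which makes a handy sanity check: both should return paths of the same length.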

The MazeSim class

The MazeSim class creates the environment and runs your MazeAgent.

Use this code to run your agent. You should expect that when I run your agent, I will use this class.

Thus, don't use a modified version of this for your own testing!

Most of this code is just setup and things you won't need to worry about. However, below the full code, I highlight where your code affects the running of the simulation.

'''
 Modified from Maze python example by Chris Dancy @ Bucknell University for
 AI & Cognitive Science course
'''

import os, random, argparse, sys, time, json, errno
import malmoenv
from MazeAgent import MazeAgent

class MazeSim():
	MAP_SIZE = 60
	MS_PER_TICK = 50

	FLOOR_BLOCK = "grass"
	GAP_BLOCK = "stone"
	PATH_BLOCK = "sandstone"
	START_BLOCK = "emerald_block"
	GOAL_BLOCK = "gold_block"

	#Canvas params
	CANVAS_BORDER = 20
	CANVAS_WIDTH = 400
	CANVAS_HEIGHT = CANVAS_BORDER + ((CANVAS_WIDTH - CANVAS_BORDER))
	CANVAS_SCALEX = (CANVAS_WIDTH-CANVAS_BORDER)/MAP_SIZE
	CANVAS_SCALEY = (CANVAS_HEIGHT-CANVAS_BORDER)/MAP_SIZE
	CANVAS_ORGX = -MAP_SIZE/CANVAS_SCALEX
	CANVAS_ORGY = -MAP_SIZE/CANVAS_SCALEY

	DEFAULT_MAZE = '''
		<MazeDecorator>
			<SizeAndPosition length="''' + str(MAP_SIZE-1) + '''"\
				width="''' + str(MAP_SIZE-1) + '''" \
				yOrigin="225" zOrigin="0" height="180"/>
			<GapProbability variance="0.4">0.5</GapProbability>
			<Seed>15</Seed>
			<MaterialSeed>random</MaterialSeed>
			<AllowDiagonalMovement>false</AllowDiagonalMovement>
			<StartBlock fixedToEdge="true" type="emerald_block"/>
			<EndBlock fixedToEdge="true" type="''' + GOAL_BLOCK + '''" height="12"/>
			<PathBlock type="''' + PATH_BLOCK + '''" colour="WHITE ORANGE MAGENTA LIGHT_BLUE YELLOW LIME PINK GRAY SILVER CYAN PURPLE BLUE BROWN GREEN RED BLACK" height="1"/>
			<FloorBlock type="''' + FLOOR_BLOCK + '''"/>
			<GapBlock type="'''+ GAP_BLOCK + '''" height="2"/>
			<AddQuitProducer description="finished maze"/>
		</MazeDecorator>
	'''

	def __init__(self, maze_str=None, agent=None):
		if (not(maze_str is None)):
			self.__maze_str = maze_str
		else:
			self.__maze_str = MazeSim.DEFAULT_MAZE

		self.__maze_grid = [["Empty" for x in range(MazeSim.MAP_SIZE)] \
							for x in range(MazeSim.MAP_SIZE)]
		self.agent = agent

	def get_mission_xml(self):
		return '''<?xml version="1.0" encoding="UTF-8" ?>
		<Mission xmlns="http://ProjectMalmo.microsoft.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
			<About>
				<Summary>Run the maze!</Summary>
			</About>

			<ModSettings>
				<MsPerTick>''' + str(MazeSim.MS_PER_TICK) + '''</MsPerTick>
			</ModSettings>

			<ServerSection>
				<ServerInitialConditions>
					<AllowSpawning>false</AllowSpawning>
				</ServerInitialConditions>
				<ServerHandlers>
					<FlatWorldGenerator generatorString="3;7,220*1,5*3,2;3;,biome_1" />
					''' + self.__maze_str + '''
					<ServerQuitFromTimeUp timeLimitMs="45000"/>
					<ServerQuitWhenAnyAgentFinishes />
				</ServerHandlers>
			</ServerSection>

			<AgentSection mode="Survival">
				<Name>A* Smart Guy</Name>
				<AgentStart>
					<Placement x="10" y="228" z="1"/>
				</AgentStart>
				<AgentHandlers>
					<VideoProducer want_depth="false">
						<Width>640</Width>
						<Height>480</Height>
					</VideoProducer>
					<ObservationFromGrid>
						<Grid name="World" absoluteCoords="true">
							<min x="0" y="226" z="0"/>
							<max x="''' + str(MazeSim.MAP_SIZE-1) + \
							'''" y="226" z="''' + str(MazeSim.MAP_SIZE-1) + '''"/>
						</Grid>
					</ObservationFromGrid>
					<DiscreteMovementCommands />
				</AgentHandlers>
			</AgentSection>

		</Mission>'''

	def __fill_grid(self, observations):
		'''
		Converts observation string grid (which is flat) into 2D list of observations
		This simplifies the list so that the initial location is marked with a 2,
		goal location is marked with a 3,
		 all invalid locations/blocks are marked with a 0,
		 and all valid moves/blocks are marked with a 1
		Arguments:
			observations -- list of strings in order by a flattened grid
		'''
		flat_grid_max = len(observations)
		grid_max = len(self.__maze_grid)
		for i in range(flat_grid_max):
			curr_row = ((flat_grid_max-1)-i)//grid_max
			curr_col = ((flat_grid_max-1)-i)%grid_max
			self.__maze_grid[curr_row][curr_col] = self.conv_obs_str(observations[i])

	def conv_obs_str(self, obs_str):
		'''
		Converts a given object string to the numerical representation for our
		 grid according to a few simple tests
		'''
		if (obs_str == MazeSim.GOAL_BLOCK):
			return 3
		elif (obs_str == MazeSim.START_BLOCK):
			return 2
		elif (obs_str == MazeSim.PATH_BLOCK):
			return 1
		else:
			return 0

	def create_actions(self):
		'''Returns dictionary of actions that make up agent action space (discrete movements)
		'''
		actions = [0] * 5
		actions[0] = "movenorth 1"
		actions[1] = "moveeast 1"
		actions[2] = "movesouth 1"
		actions[3] = "movewest 1"
		actions[4] = "move 0"

		return (actions)


	def run_sim(self, exp_role, num_episodes, port1, serv1, serv2, exp_id, epi, rsync):
		'''Code to actually run simulation
		'''
		validate = True
		movements = None

		env = malmoenv.make()

		env.init(self.get_mission_xml(),
				 port1, server=serv1,
				 server2=serv2, port2=(port1 + exp_role),
				 role=exp_role,
				 exp_uid=exp_id,
				 episode=epi,
				 resync=rsync,
				 action_space = malmoenv.ActionSpace(self.create_actions()))

		max_num_steps = 1000

		for r in range(num_episodes):
			print("Reset [" + str(exp_role) + "] " + str(r) )
			movements = None
			max_retries = 3

			env.reset()
			num_steps = 0

			sim_done = False
			total_reward = 0
			total_commands = 0

			(obs, reward, sim_done, info) = env.step(4)
			while not sim_done:
				num_steps += 1

				if (info is None or len(info) == 0):
					(obs, reward, sim_done, info) = env.step(4)
				elif (movements is None):
					info_json = json.loads(info)
					self.__fill_grid(info_json["World"])
					self.__maze_grid

					self.agent.set_grid(self.__maze_grid)

					#You need to make it so this works! :-)
					if (movements is None):
						movements = self.agent.get_path()
						print(movements)
						print(len(movements))

				else:
					try:
						#Moves are presented in reverse order (last move 1st)
						next_move = movements.pop()
						(obs, reward, sim_done, info) = env.step(env.action_space.actions.index(next_move))
					except RuntimeError as e:
						print("Issue with command/action: ",e)
						pass
				time.sleep(0.05)


			#print "Mission has stopped."
			time.sleep(0.5) # Give mod a little time to get back to dormant state.

#Change the MazeAgent as needed, but that should be the only part of the code that
# you need to change
if __name__ == '__main__':

	parser = argparse.ArgumentParser(description='malmoenv test')
	parser.add_argument('--port', type=int, default=9000, help='the mission server port')
	parser.add_argument('--server', type=str, default='127.0.0.1', help='the mission server DNS or IP address')
	parser.add_argument('--server2', type=str, default=None, help="(Multi-agent) role N's server DNS or IP")
	parser.add_argument('--port2', type=int, default=9000, help="(Multi-agent) role N's mission port")
	parser.add_argument('--episodes', type=int, default=10, help='the number of resets to perform - default is 10')
	parser.add_argument('--episode', type=int, default=0, help='the start episode - default is 0')
	parser.add_argument('--resync', type=int, default=0, help='exit and re-sync on every N - default 0 meaning never')
	parser.add_argument('--experimentUniqueId', type=str, default='test1', help="the experiment's unique id.")
	args = parser.parse_args()
	if args.server2 is None:
		args.server2 = args.server

	smart_guy = MazeAgent()
	#smart_guy = MazeAgent(None, "astar")  # only works if your MazeAgent.__init__ takes a second parameter
	smart_guy_sim = MazeSim(agent=smart_guy)
	smart_guy_sim.run_sim(0, args.episodes, args.port, args.server, args.server2,
					args.experimentUniqueId, args.episode, args.resync)
	print(len(smart_guy.get_eset()))
	print(len(smart_guy.get_fset()))

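So, your agent is going to affect the running of the simulation in a simple way: within the main loop, we use your agent (self.agent) to get the path that we should follow! In the MazeAgent starter code, I explain that this should be a list of actions (and what those actions should look like). Note that the loop executes the actions with movements.pop(), which removes the last element first, so the list get_path returns should be in reverse order. Each string must also exactly match one of the commands in create_actions, because the simulation looks it up with env.action_space.actions.index(next_move). Of course, we also initialize your agent (an instance of MazeAgent) in the code at the bottom of the file.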
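Because the loop above executes moves with movements.pop(), which removes the last element first, the list from get_path has to be reversed relative to the order the agent should move in (the code comment says the same: last move first). The sketch below (to_mazesim_order and ALLOWED are hypothetical names) checks each command against the five strings MazeSim.create_actions registers, reverses the plan, and replays it with the same pop() call. Checking early is worthwhile because create_actions returns a plain list, so actions.index() raises ValueError for an unknown string, and the loop's except RuntimeError does not catch that.

```python
# The five action strings registered by MazeSim.create_actions.
ALLOWED = ["movenorth 1", "moveeast 1", "movesouth 1", "movewest 1", "move 0"]


def to_mazesim_order(plan):
    """Return the plan reversed so that list.pop() yields the first move.

    Raises ValueError early for any command MazeSim would not recognize.
    """
    for command in plan:
        if command not in ALLOWED:
            raise ValueError("unknown command: " + repr(command))
    return list(reversed(plan))


plan = ["movewest 1", "movenorth 1", "movenorth 1"]  # start-to-goal order
movements = to_mazesim_order(plan)
executed = []
while movements:
    executed.append(movements.pop())  # the same call MazeSim's loop makes
print(executed == plan)  # True
```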

	while not sim_done:
		num_steps += 1

		if (info is None or len(info) == 0):
			(obs, reward, sim_done, info) = env.step(4)
		elif (movements is None):
			info_json = json.loads(info)
			self.__fill_grid(info_json["World"])

			self.agent.set_grid(self.__maze_grid)

			#You need to make it so this works! :-)
			if (movements is None):
				movements = self.agent.get_path()
				print(movements)
				print(len(movements))

		else:
			try:
				#Moves are presented in reverse order (last move 1st)
				next_move = movements.pop()
				(obs, reward, sim_done, info) = env.step(env.action_space.actions.index(next_move))
			except RuntimeError as e:
				print("Issue with command/action: ",e)
				pass
		time.sleep(0.05)

Appendix

In the sections that follow, you'll find installation instructions for Malmo, the simulation code (I'd suggest calling it MobSim.py), and some Neural Net starter code.

Instructions

Provided Files

Java 8

Getting Started with Malmo

Create a virtual environment for the Python dependencies that you'll install for the project:

1. On any platform (Mac/Linux/Windows)

Create a virtual environment called malmoEnv

username$ python3 -m venv malmoEnv
2a. On a Mac/Linux machine, in the terminal

Activate the environment called malmoEnv

username$ source malmoEnv/bin/activate

To deactivate any virtual environment you are in (do not do this until you no longer want to use your virtual environment):

$ deactivate
2b. On a Windows machine, in cmd

Activate environment called malmoEnv

malmoEnv\Scripts\activate.bat

To deactivate any virtual environment you are in (do not do this until you no longer want to use your venv):

deactivate

You should now see the name of your env next to your command line in the terminal/shell; this is how you know that your virtual environment is activated.

Install the dependencies needed with pip

(malmoEnv) username$ python -m pip install gym lxml numpy pillow
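If you want to double-check the install before moving on, a short script like the one below (our suggestion; the file name is up to you) uses importlib.util.find_spec to report any of those packages that the active environment cannot find. Note that pillow is imported under the name PIL.

```python
import importlib.util

# Import names for the packages installed above (pillow installs as "PIL").
REQUIRED = ["gym", "lxml", "numpy", "PIL"]


def missing_modules(names):
    """Return the top-level module names the active environment cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("all dependencies found" if not missing else "missing: " + ", ".join(missing))
```

Run it with your virtual environment activated; if something is listed as missing, rerun the pip command above.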

Congrats, you should have everything you need installed! Now let's move on to getting the files you need.

Running the Malmo/Minecraft server

If you are on a lab computer, you will need to run module load java/1.8 before starting the Minecraft server each time in the same terminal

Running your agent

username$ python MazeSim.py

Hint: