orarbel · June 29, 2025 07:34 · mikikop · Jul 20, 2025 · orarbel · Jul 21, 2025
 Hi LinkedIn friend!

 Here is how I reduced token usage in the agent I'm building.

 1. My agent has 62 tools (and the number is growing quickly).
 2. Each tool has a description. I was sending all 62 tools and their descriptions on every agent turn.

 That came out to 10k tokens before even counting the system prompt and the user prompt, ON EVERY TURN.
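
To put that number in perspective, here is a rough back-of-the-envelope calculation. The 62 tools and 10k tokens come from above; the 20-turn conversation, the 15-tool selection, and the idea that every tool costs about the same number of tokens are assumptions made only for illustration.

```python
TOOL_COUNT = 62                # from the post
TOOL_TOKENS_PER_TURN = 10_000  # from the post

# Average cost of one tool definition, assuming all tools are roughly equal in size.
avg_tokens_per_tool = TOOL_TOKENS_PER_TURN / TOOL_COUNT


def tool_overhead(turns, tokens_per_turn=TOOL_TOKENS_PER_TURN):
    """Total tokens spent on tool definitions over a conversation."""
    return turns * tokens_per_turn


def estimated_tokens(selected_tool_count):
    """Rough per-turn cost if only some tools are sent (equal-size assumption)."""
    return round(selected_tool_count * avg_tokens_per_tool)


print(round(avg_tokens_per_tool))  # 161
print(tool_overhead(20))           # 200000 (hypothetical 20-turn conversation)
print(estimated_tokens(15))        # 2419 (hypothetical 15-tool selection)
```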

 The solution I found was to do a preflight LLM request to select only the relevant tools for the user request.

 Exact prompt: 

 "You are a tool selection expert. Given a user request, conversation context, and a list of available tools, select ONLY the tools that are most likely to be needed.

 {context_section}
 Current User Request: "{user_message}"

 Available Tools:
 {json.dumps(tool_descriptions, indent=2)}

 Instructions:
 1. Analyze BOTH the current request AND the conversation context to understand what the user is trying to accomplish
 2. Select tools that are directly relevant to completing the task in context
 3. Always include these core tools: retrieve_from_memory, store_in_memory, async_search_google
 4. For specific domains (ads, analytics, social media), include relevant tools
 5. If the context mentions LinkedIn, include LinkedIn tools; if it mentions Google Ads, include ads tools, etc.
 6. Aim for 6-15 tools maximum to optimize performance
 7. Return ONLY a JSON array of tool names, nothing else

 Example response format:
 ["retrieve_from_memory", "store_in_memory", "async_search_google", "google_ads_api_request", "create_spreadsheet"]"
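
Since the prompt asks for ONLY a JSON array, the reply still has to be parsed and checked before it is used. Below is a minimal sketch of that step, not the exact code from my agent. The function name `parse_selected_tools`, the fallback to the full tool list on a bad reply, and the hard cap of 15 are assumptions for illustration.

```python
import json

# Core tools that the prompt says to always include.
CORE_TOOLS = ["retrieve_from_memory", "store_in_memory", "async_search_google"]
MAX_TOOLS = 15  # upper bound taken from instruction 6 of the prompt


def parse_selected_tools(raw_reply, known_tools):
    """Turn the selector's reply into a validated list of tool names."""
    try:
        names = json.loads(raw_reply.strip())
    except json.JSONDecodeError:
        # Assumption: if the reply is not valid JSON, fall back to all tools.
        return list(known_tools)
    if not isinstance(names, list):
        return list(known_tools)

    known = set(known_tools)
    # Start with the core tools, then add the model's picks in order.
    selected = [tool for tool in CORE_TOOLS if tool in known]
    for name in names:
        # Drop anything that is not a known tool name (e.g. a hallucinated one).
        if isinstance(name, str) and name in known and name not in selected:
            selected.append(name)
    return selected[:MAX_TOOLS]
```

Falling back to the full list costs tokens on that turn, but it avoids running the agent with a required tool missing.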

 This way, the tool-definition part of the context becomes much smaller.
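
As a rough illustration of the effect, the sketch below filters a tool registry down to the selected names and compares the serialized sizes. It assumes each tool is a dict with a `name` key; the registry contents are placeholders, and character count is only a proxy for tokens, not a real token count.

```python
import json


def filter_tools(all_tools, selected_names):
    """Keep only the tool definitions whose name was selected."""
    wanted = set(selected_names)
    return [tool for tool in all_tools if tool["name"] in wanted]


def payload_chars(tools):
    """Size of the serialized tool list in characters (a rough proxy for tokens)."""
    return len(json.dumps(tools))


# Hypothetical registry: 62 tools with placeholder names and descriptions.
all_tools = [
    {"name": f"tool_{i}", "description": "Placeholder description. " * 5}
    for i in range(62)
]
selected = ["tool_0", "tool_7", "tool_41"]  # stand-in for the preflight result

reduced = filter_tools(all_tools, selected)
print(len(all_tools), payload_chars(all_tools))
print(len(reduced), payload_chars(reduced))
```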

 However, it took some prompt engineering, plus switching back and forth between GPT-4.1-mini and GPT-4o-mini, before the selector stopped missing a tool or two that the user request required.
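
One way to make that comparison less of a guessing game is a small labeled test set: for each sample request, list the tools it needs, then check which ones the selector misses. The sketch below is an assumption about how this could be done, not the process I actually followed. `stub_selector` is a keyword-based stand-in for the real preflight LLM call, and the sample requests are made up.

```python
def missed_tools(selected, required):
    """Required tools that the selector did not return."""
    return sorted(set(required) - set(selected))


def evaluate_selector(select_fn, labeled_cases):
    """Return (share of cases with nothing missed, misses per failing request)."""
    if not labeled_cases:
        return 0.0, {}
    misses = {}
    for request, required in labeled_cases:
        missing = missed_tools(select_fn(request), required)
        if missing:
            misses[request] = missing
    passed = len(labeled_cases) - len(misses)
    return passed / len(labeled_cases), misses


def stub_selector(request):
    """Stand-in for the preflight LLM call; swap in a real model call per model."""
    tools = ["retrieve_from_memory", "store_in_memory", "async_search_google"]
    if "google ads" in request.lower():
        tools.append("google_ads_api_request")
    return tools


# Hypothetical labeled cases: (user request, tools the request needs).
cases = [
    ("Pull last week's Google Ads spend", ["google_ads_api_request"]),
    ("Put the results in a spreadsheet", ["create_spreadsheet"]),
]

rate, misses = evaluate_selector(stub_selector, cases)
print(rate)    # 0.5
print(misses)  # {'Put the results in a spreadsheet': ['create_spreadsheet']}
```

Running the same cases against each model and prompt variant gives a number to compare instead of an impression.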