WorFBench: a unified agentic workflow generation benchmark with miscellaneous scenarios and intricate graph-structured workflows.

A set of subtasks with execution dependencies is typically referred to as a workflow. Workflows can serve as an intermediate state for solving complex tasks, helping agents bridge the gap between tasks and specific executable actions. Existing workflow evaluation frameworks either focus solely on holistic performance or suffer from limitations such as restricted scenario coverage, simplistic workflow structures, and lax evaluation standards. To address these limitations, this paper introduces WORFBENCH, a unified workflow generation benchmark, and WORFEVAL, a systemic evaluation protocol.

𝗞𝗲𝘆 𝗰𝗼𝗻𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻𝘀:
- WORFBENCH, a unified workflow generation benchmark with multi-faceted scenarios and complex workflow structures
- WORFEVAL, which uses effective subsequence and subgraph matching algorithms to evaluate the workflow generation ability of LLM agents on both chain and graph structures

𝗪𝗢𝗥𝗙𝗕𝗘𝗡𝗖𝗛
i) Task formulation
- Given a specific task and a candidate action list, the goal is for the language agent to generate a graph-structured workflow whose nodes satisfy the minimum executable granularity.
ii) Benchmark construction
- Function-call data is collected from ToolBench and ToolAlpaca.
- For embodied tasks, EACT-format gold trajectories are collected for ALFWorld and WebShop from ETO, and for OS from AgentInstruct.
- Problem-solving tasks such as math, commonsense, and multimodal reasoning are drawn from LUMOS and WikiHow.
iii) Quality control
- Logical node ordering is guaranteed by constructing the node chain from the sequence of gold function calls.
- To check that each node accurately decomposes the task, each synthesized node is used as a query to retrieve the function list.
- Quality control on the node chain filtered out 15.36% of the data.
- For the workflow graph, 29.77% of data points whose topological sorting results did not align with the node chains were discarded.

𝗪𝗢𝗥𝗙𝗘𝗩𝗔𝗟
- Quantitatively evaluates both the node chain and the workflow graph using strict algorithms.
- Computes a similarity matrix between gold workflow nodes and edges and agent-predicted nodes and edges.
- Since a predicted node may match multiple gold nodes and a gold node may be matched by multiple predicted nodes, a max-weight bipartite matching algorithm is used to find the best matches (a toy sketch follows after this post).

𝗥𝗲𝘀𝘂𝗹𝘁𝘀
- There are distinct gaps between the sequence-planning and graph-planning capabilities of LLM agents, with even GPT-4 exhibiting a gap of around 15%.
- The ability to predict graph-structured workflows falls far short of real-world requirements, with even GPT-4 achieving only 52.47%.

𝗣𝗮𝗽𝗲𝗿: https://lnkd.in/epAgYS2P
𝗖𝗼𝗱𝗲: https://lnkd.in/e2RDYPMg
𝗗𝗮𝘁𝗮𝘀𝗲𝘁: https://lnkd.in/eJncc2VH
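For illustration, here is a minimal Python sketch of the node-matching step WORFEVAL describes. The similarity function and the normalization are assumptions for the sketch, not the paper's actual implementation:

```python
# Hedged sketch of WORFEVAL's node-matching idea (not the authors' code):
# build a similarity matrix between gold and predicted workflow nodes,
# then align them one-to-one via max-weight bipartite matching.
import numpy as np
from difflib import SequenceMatcher          # stand-in for a real semantic similarity model
from scipy.optimize import linear_sum_assignment

def node_similarity(a: str, b: str) -> float:
    # Placeholder string similarity; the paper likely uses a stronger measure.
    return SequenceMatcher(None, a, b).ratio()

def match_nodes(gold: list[str], pred: list[str]) -> float:
    # Similarity matrix: rows = gold nodes, cols = predicted nodes.
    sim = np.array([[node_similarity(g, p) for p in pred] for g in gold])
    # Max-weight bipartite matching: each gold node pairs with at most one prediction.
    rows, cols = linear_sum_assignment(sim, maximize=True)
    # Score = matched similarity mass, normalized by the number of gold nodes.
    return sim[rows, cols].sum() / len(gold)

print(match_nodes(["search flights", "book hotel"],
                  ["look up flights", "reserve a hotel", "rent car"]))
```

The same alignment idea extends to edges: once nodes are matched, predicted edges can be scored against gold edges under the induced node correspondence.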
Task Workflow Modeling
Explore top LinkedIn content from expert professionals.
Summary
Task workflow modeling is the process of breaking down complex jobs into smaller, organized steps and mapping how tasks connect to achieve a specific goal. It’s a crucial practice for designing systems—especially in AI and UX—that helps teams understand, automate, and improve how work gets done.
- Map key steps: Start by listing every action needed to complete a workflow, then organize these into logical sequences or branches to reveal dependencies and possible improvements.
- Test and refine: Use real user or system data to run workflows, track completion rates and accuracy, and adjust tasks to keep operations smooth and reliable.
- Prioritize automation: Break workflows into individual tasks and identify which repetitive or time-consuming steps make sense to automate, making sure you keep tasks that build expertise or need a human touch.
agents that learn your workflows > agents that relearn you every day.

I’m sharing a standout research report: Log2Plan, an adaptive GUI automation framework powered by task mining. It learns from real interaction logs, builds a reusable plan, and then adapts each step to the live screen. Think: global plan + local grounding, so agents get more reliable the longer you use them.

↳ Why this matters for UX/UI:
➤ Personalization without hero prompts; the system internalizes how you work (file paths, naming, exception paths).
➤ Recoverable runs; step-level checks and quick human-assist beat brittle macro replays.
➤ Transparent actions; structured plans you can read, audit, and improve.
➤ Resilience to UI drift; intent stays stable even when buttons and layouts move.

↳ What’s actually new here:
➤ Task mining turns messy click/keystroke logs into reusable “Task Groups” (ENV / ACT / Title / Description).
➤ Retrieval-augmented planning pulls the right pieces for a new goal, then the local planner fits them to the current screen.
➤ A clear separation of plan vs. interaction that reduces token bloat and flaky screenshot reasoning.

↳ Try this week (operator’s cut):
➤ Pick one high-volume desktop flow (e.g., monthly report collation).
➤ Curate 2–3 clean traces into “Task Groups” (a sketch of what such a record might look like follows below).
➤ Define success metrics (success rate, sub-task completion, time per task, assist rate).
➤ Add human-assist checkpoints for sensitive steps and ship a small pilot.

Follow for more UX/UI & AI implementations. Re-share with your network.
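For illustration only, a minimal sketch of how a mined “Task Group” record could be represented. The ENV / ACT / Title / Description fields come from the post; everything else is an assumption, not Log2Plan’s actual schema:

```python
# Hypothetical representation of a mined "Task Group" record.
from dataclasses import dataclass, field

@dataclass
class TaskGroup:
    title: str                                        # short human-readable name
    description: str                                  # what the group accomplishes
    env: str                                          # ENV: application/context, e.g. "Excel"
    actions: list[str] = field(default_factory=list)  # ACT: ordered UI steps

# Curating one clean trace into a reusable group:
collate = TaskGroup(
    title="Collate monthly report",
    description="Merge exported CSVs into the monthly summary sheet",
    env="Excel",
    actions=["open exports folder", "select *.csv", "paste into Summary tab"],
)
print(collate.title, "->", len(collate.actions), "steps")
```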
-
I'm not surprised MIT's report shows 95% of AI prototypes fail.

This summer, Forerunner automated 2 critical workflows for our research and diligence efforts. The patience and trial & error required far exceeded our expectations. The results? AWESOME.

No doubt AI is powerful. After automating the workflows, I would say the power today is one part potential and one part reality. Building an AI workflow is still quite difficult, especially if you are not technical.

So how did we do it? Here's an overview of our first step.
1️⃣ Define workflows
2️⃣ Break workflows down into tasks
3️⃣ Define automation evaluation criteria
4️⃣ Evaluate automation potential & impact
5️⃣ Prioritize tasks for automation

The first-principles thinking required here set us up for success. In fact, the findings of MIT's report support this approach -- "pick one pain point, execute well, and partner smartly with companies who use their tools."

Across all of our research & diligence workflows, we separated our workflows into 120+ tasks and selected five characteristics to evaluate automation potential & impact (a rough scoring sketch follows below).
1️⃣ Task requires a unique Forerunner lens
2️⃣ Automating task would save significant time
3️⃣ Task is routine & frequent
4️⃣ AI can create desired output
5️⃣ Task is essential to building expertise

Having this framework clarified which tasks made sense to automate first and, as importantly, which tasks did not make sense to automate today (or possibly ever). We believe certain tasks are more important for skill & perspective building even if they could be automated.

More to come on what tool we selected, how we automated the workflows, and what we learned along the way -- this is where the patience and perseverance come in.
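A minimal, hypothetical sketch of how such a five-criteria screen could be scored. The weights and cutoff are invented for illustration, not Forerunner's actual method:

```python
# Hypothetical scoring of the five automation criteria from the post.
# Weights and the threshold are illustrative assumptions.
def automation_score(task: dict[str, bool]) -> int:
    score = 0
    score += 2 if task["saves significant time"] else 0
    score += 2 if task["routine & frequent"] else 0
    score += 2 if task["ai can create output"] else 0
    score -= 1 if task["requires unique lens"] else 0
    score -= 3 if task["essential to expertise"] else 0  # keep skill-building tasks human
    return score

task = {
    "requires unique lens": False,
    "saves significant time": True,
    "routine & frequent": True,
    "ai can create output": True,
    "essential to expertise": False,
}
print("automate first" if automation_score(task) >= 5 else "keep human")
```

The point of any such rubric is the negative weights: a task can score high on automability and still stay human because it builds expertise.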
-
✅ How To Run Task Analysis In UX (https://lnkd.in/e_s_TG3a), a practical step-by-step guide on how to study user goals, map users’ workflows, understand top tasks and then use them to inform and shape design decisions. Neatly put together by Thomas Stokes.

🚫 Good UX isn’t just high completion rates for top tasks.
🤔 Better: high accuracy, low time on task, high completion rates.
✅ Task analysis breaks down user tasks to understand user goals.
✅ Tasks are goal-oriented user actions (start → end point → success).
✅ Usually presented as a tree (hierarchical task-analysis diagram, HTA).
✅ First, collect data: users, what they try to do and how they do it.
✅ Refine your task list with stakeholders, then get users to vote.
✅ Translate each top task into goals, starting point and end point.
✅ Break down: user’s goal → sub-goals; sub-goal → single steps (see the sketch after this post).
✅ For non-linear/circular steps: mark alternate paths as branches.
✅ Scrutinize every single step for errors, efficiency, opportunities.
✅ Attach design improvements as sticky notes to each step.
🚫 Don’t lose track in small tasks: come back to the big picture.

Personally, I've been relying on top task analysis for years now, kindly introduced by Gerry McGovern. Of all the techniques to capture the essence of user experience, it’s a reliable way to do so. Bring it together with task completion rates and task completion times, and you have a reliable metric to track your UX performance over time.

Once you identify 10–12 representative tasks and get them approved by stakeholders, you can track how well a product is performing over time. Refine the task wording and recruit the right participants. Then give these tasks to 15–18 actual users and track success rates, time on task and accuracy of input. That gives you an objective measure of success for your design efforts. And you can repeat it every 4–8 months, depending on the velocity of the team.

It’s remarkably easy to establish and run, but also has high visibility and impact — especially if it tracks the heart of what the product is about.

Useful resources:
Task Analysis: Support Users in Achieving Their Goals (attached image), by Maria Rosala https://lnkd.in/ePmARap3
What Really Matters: Focusing on Top Tasks, by Gerry McGovern https://lnkd.in/eWBXpCQp
How To Make Sense Of Any Mess (free book), by Abby Covert https://lnkd.in/enxMMhMe
How We Did It: Task Analysis (Case Study), by Jacob Filipp https://lnkd.in/edKYU6xE
How To Optimize UX and Improve Task Efficiency, by Ella Webber https://lnkd.in/eKdKNtsR
How to Conduct a Top Task Analysis, by Jeff Sauro https://lnkd.in/eqWp_RNG

[continues in the comments below ↓]
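A toy sketch of the goal → sub-goal → step breakdown as a hierarchical task-analysis (HTA) tree. The node structure and the example tasks are illustrative assumptions:

```python
# Illustrative HTA tree: a goal decomposes into sub-goals, sub-goals into steps.
from dataclasses import dataclass, field

@dataclass
class HTANode:
    label: str
    children: list["HTANode"] = field(default_factory=list)  # empty = single step
    is_branch: bool = False       # marks an alternate (non-linear) path

def print_hta(node: HTANode, depth: int = 0) -> None:
    marker = " [branch]" if node.is_branch else ""
    print("  " * depth + node.label + marker)
    for child in node.children:
        print_hta(child, depth + 1)

checkout = HTANode("Goal: buy a product", [
    HTANode("Sub-goal: find the product", [
        HTANode("Search by name"),
        HTANode("Browse categories", is_branch=True),  # alternate path
    ]),
    HTANode("Sub-goal: complete checkout", [
        HTANode("Enter shipping address"),
        HTANode("Pay"),
    ]),
])
print_hta(checkout)
```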
-
You must know these 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗦𝘆𝘀𝘁𝗲𝗺 𝗪𝗼𝗿𝗸𝗳𝗹𝗼𝘄 𝗣𝗮𝘁𝘁𝗲𝗿𝗻𝘀 as an 𝗔𝗜 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿.

If you are building Agentic Systems in an Enterprise setting, you will soon discover that the simplest workflow patterns work best and bring the most business value. At the end of last year Anthropic did a great job summarising the top patterns for these workflows, and they still hold strong.

Let’s explore what they are and where each can be useful (a minimal code sketch of the first two patterns follows after this post):

𝟭. 𝗣𝗿𝗼𝗺𝗽𝘁 𝗖𝗵𝗮𝗶𝗻𝗶𝗻𝗴: This pattern decomposes a complex task and solves it in manageable pieces by chaining them together. The output of one LLM call becomes the input to another.
✅ In most cases such decomposition results in higher accuracy at the cost of latency.
ℹ️ In heavy production use cases, Prompt Chaining is combined with the following patterns: a pattern replaces an LLM Call node in the Prompt Chaining pattern.

𝟮. 𝗥𝗼𝘂𝘁𝗶𝗻𝗴: In this pattern, the input is classified into one of multiple potential paths and the appropriate path is taken.
✅ Useful when the workflow is complex and specific topology paths could be more efficiently solved by a specialized workflow.
ℹ️ Example: Agentic Chatbot - should I answer the question with RAG, or should I perform some actions the user has prompted for?

𝟯. 𝗣𝗮𝗿𝗮𝗹𝗹𝗲𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻: The initial input is split into multiple queries to be passed to the LLM, then the answers are aggregated to produce the final answer.
✅ Useful when speed is important and multiple inputs can be processed in parallel without needing to wait for other outputs. Also when additional accuracy is required.
ℹ️ Example 1: Query rewrite in Agentic RAG to produce multiple different queries for majority voting. Improves accuracy.
ℹ️ Example 2: Multiple items are extracted from an invoice; all of them can be processed further in parallel for better speed.

𝟰. 𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮𝘁𝗼𝗿: An orchestrator LLM dynamically breaks down tasks and delegates them to other LLMs or sub-workflows.
✅ Useful when the system is complex and there is no clear hardcoded topology path to achieve the final result.
ℹ️ Example: Choice of datasets to be used in Agentic RAG.

𝟱. 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗼𝗿-𝗼𝗽𝘁𝗶𝗺𝗶𝘇𝗲𝗿: A Generator LLM produces a result, then an Evaluator LLM evaluates it and provides feedback for further improvement if necessary.
✅ Useful for tasks that require continuous refinement.
ℹ️ Example: Deep Research Agent workflow, where refinement of a report paragraph via continuous web search is required.

𝗧𝗶𝗽𝘀:
❗️ Before going for full-fledged Agents, you should always try to solve the problem with the simpler Workflows described in the article.

What are the most complex workflows you have deployed to production? Let me know in the comments 👇

#LLM #AI #MachineLearning
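A minimal sketch of the first two patterns, prompt chaining and routing. `call_llm` is a hypothetical stand-in for whatever model client you use, and the prompts are illustrative:

```python
# Sketch of Prompt Chaining and Routing; call_llm() is a hypothetical
# stand-in for your model client (OpenAI, Anthropic, local, etc.).
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

# 1. Prompt Chaining: the output of one call becomes the input to the next.
def chain(document: str) -> str:
    outline = call_llm(f"Outline the key points of:\n{document}")
    draft = call_llm(f"Write a summary from this outline:\n{outline}")
    return call_llm(f"Tighten this summary to 3 sentences:\n{draft}")

# 2. Routing: classify the input, then take the appropriate path.
def route(user_message: str) -> str:
    label = call_llm(
        "Classify as 'question' or 'action' (one word):\n" + user_message
    ).strip().lower()
    if label == "question":
        return call_llm(f"Answer using retrieved context: {user_message}")  # RAG path
    return call_llm(f"Plan the tool calls needed for: {user_message}")      # action path
```

Note how routing keeps each path simple: the classifier call is cheap, and each branch can be a specialized chain of its own.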
-
If you’re building anything with LLMs, your system architecture matters more than your prompts.

Most people stop at “call the model, get the output.” But LLM-native systems need workflows, blueprints that define how multiple LLM calls interact, how routing, evaluation, memory, tools, or chaining come into play.

Here’s a breakdown of 6 core LLM workflows I see in production:

🧠 LLM Augmentation
Classic RAG + tools setup. The model augments its own capabilities using:
→ Retrieval (e.g., from vector DBs)
→ Tool use (e.g., calculators, APIs)
→ Memory (short-term or long-term context)

🔗 Prompt Chaining Workflow
Sequential reasoning across steps. Each output is validated (pass/fail) → passed to the next model. Great for multi-stage tasks like reasoning, summarizing, translating, and evaluating.

🛣 LLM Routing Workflow
Input routed to different models (or prompts) based on the type of task. Example: classification → Q&A → summarization all handled by different call paths.

📊 LLM Parallelization Workflow (Aggregator)
Run multiple models/tasks in parallel → aggregate the outputs. Useful for ensembling or sourcing multiple perspectives.

🎼 LLM Parallelization Workflow (Synthesizer)
A more orchestrated version with a control layer. Think: multi-agent systems with a conductor + synthesizer to harmonize responses.

🧪 Evaluator–Optimizer Workflow
The most underrated architecture. One LLM generates. Another evaluates (pass/fail + feedback). This loop continues until quality thresholds are met (a minimal loop sketch follows after this post).

If you’re an AI engineer, don’t just build for single-shot inference. Design workflows that scale, self-correct, and adapt.

📌 Save this visual for your next project architecture review.

〰️〰️〰️
Follow me (Aishwarya Srinivasan) for more AI insight and subscribe to my Substack to find more in-depth blogs and weekly updates in AI: https://lnkd.in/dpBNr6Jg
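A hedged sketch of the evaluator–optimizer loop. `call_llm` is again a hypothetical model client, and the PASS/fail protocol is an assumption for illustration:

```python
# Sketch of an Evaluator-Optimizer loop: one model generates, another
# critiques, and we iterate until the evaluator passes the draft.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def evaluator_optimizer(task: str, max_rounds: int = 3) -> str:
    draft = call_llm(f"Complete this task:\n{task}")
    for _ in range(max_rounds):
        verdict = call_llm(
            "Reply 'PASS' if this fully answers the task, else list "
            f"concrete fixes.\nTask: {task}\nDraft:\n{draft}"
        )
        if verdict.strip().upper().startswith("PASS"):
            break  # quality threshold met
        draft = call_llm(
            f"Revise the draft to address this feedback.\n"
            f"Feedback: {verdict}\nDraft:\n{draft}"
        )
    return draft
```

The `max_rounds` cap matters in practice: without it, a picky evaluator can loop forever and burn tokens.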
-
You don’t need to be an AI agent to be agentic.

No, that’s not an inspirational poster. It’s my research takeaway for how companies should build AI into their business.

Agents are the equivalent of a self-driving Ferrari that keeps driving itself into the wall. It looks and sounds cool, but there is a better use for your money. AI workflows offer a more predictable and reliable way to sound super cool while also yielding practical results.

Anthropic defines both agents and workflows as agentic systems, specifically in this way:
𝗪𝗼𝗿𝗸𝗳𝗹𝗼𝘄𝘀: systems where predefined code paths orchestrate the use of LLMs and tools
𝗔𝗴𝗲𝗻𝘁𝘀: systems where LLMs dynamically decide their own path and tool uses

For any organization leaning into Agentic AI, don’t start with agents. You will just overcomplicate the solution. Instead, try these workflows from Anthropic’s guide to effectively building AI agents (a small sketch of the parallelization pattern follows after this post):

𝟭. 𝗣𝗿𝗼𝗺𝗽𝘁-𝗰𝗵𝗮𝗶𝗻𝗶𝗻𝗴: The type A of workflows, this breaks a task down into a sequence of organized, logical steps, with each step building on the last. It can include gates where you can verify the information before going through the entire process.

𝟮. 𝗣𝗮𝗿𝗮𝗹𝗹𝗲𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻: The multi-tasker workflow, this separates tasks across multiple LLMs and then combines the outputs. This is great for speed, but also collects multiple perspectives from different LLMs to increase confidence in the results.

𝟯. 𝗥𝗼𝘂𝘁𝗶𝗻𝗴: The task master of workflows, this breaks down complex tasks into different categories and assigns those to specialized LLMs that are best suited for the task. Just as you don’t want to give an advanced task to an intern or a basic task to a senior employee, this finds the right LLM for the right job.

𝟰. 𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮𝘁𝗼𝗿-𝘄𝗼𝗿𝗸𝗲𝗿𝘀: The middle manager of the workflows, this has an LLM that breaks down the tasks and delegates them to other LLMs, then synthesizes their results. This is best suited for complex tasks where you don’t quite know what subtasks are going to be needed.

𝟱. 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗼𝗿-𝗼𝗽𝘁𝗶𝗺𝗶𝘇𝗲𝗿: The peer review of workflows, this uses an LLM to generate a response while another LLM evaluates and provides feedback in a loop until it passes muster.

View my full write-up here: https://lnkd.in/eZXdRrxz
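A minimal sketch of the parallelization pattern using a thread pool. `call_llm` is a hypothetical client, and the aggregate-by-majority-vote step is one illustrative way to combine outputs:

```python
# Sketch of the Parallelization pattern: fan the same question out to
# several models (or prompts), then aggregate by majority vote.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def call_llm(model: str, prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def parallel_vote(prompt: str, models: list[str]) -> str:
    # Fan out: each call is independent, so run them concurrently.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        answers = list(pool.map(lambda m: call_llm(m, prompt), models))
    # Aggregate: majority vote across the collected perspectives.
    return Counter(a.strip() for a in answers).most_common(1)[0][0]

# e.g. parallel_vote("Is this invoice a duplicate? yes/no",
#                    ["model-a", "model-b", "model-c"])
```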
-
SAP Workflow Overview

SAP Workflows are tools within the SAP system that automate and streamline business processes by allowing organizations to design, execute, and monitor workflows involving multiple steps and participants across departments.

Key Features
1. Process Automation: Reduces manual effort and errors.
2. Integration: Works with various SAP modules like FI, MM, and SD.
3. Flexibility: Customizable to fit specific business needs.
4. Monitoring and Reporting: Real-time tracking and performance analysis.
5. Role-Based Access: Assigns tasks based on roles and responsibilities.

Components of SAP Workflows
1. Workflow Builder: A tool for designing and modeling workflows with steps, decision points, and conditions.
2. Business Objects: Represent real-world entities like invoices or purchase orders.
3. Tasks: Steps within a workflow, which can involve user interaction or automation.
4. Events: Triggers for starting workflows or specific tasks.
5. Agents: Users responsible for executing tasks.
6. Workflow Container: Stores data and parameters for workflow execution.

Benefits
• Efficiency: Automates repetitive tasks, reducing processing time.
• Consistency: Standardizes processes, ensuring compliance.
• Transparency: Provides visibility into process status and bottlenecks.
• Scalability: Adapts to changing business needs and growth.

Structure of SAP Workflows
1. Workflow Definition: The blueprint of the workflow process.
2. Steps: Include activity, decision, event creator, and wait steps.
3. Containers and Bindings: Hold and map data between steps.
4. Rules and Conditions: Determine execution paths based on data or logic.
5. Error Handling: Mechanisms for managing errors and escalations.

Development Process
1. Analysis: Identify the process to be automated and gather requirements.
2. Design: Use Workflow Builder to design the process.
3. Configuration: Set up tasks, events, agents, and data containers.
4. Testing: Simulate and test the workflow for expected behavior.
5. Deployment: Activate and deploy the workflow to production.
6. Monitoring and Optimization: Use SAP tools to monitor and improve workflow performance.

Practical Examples
1. Purchase Requisition Approval: Automates approval, ensuring compliance with budgets.
• Trigger: Requisition creation.
• Steps: Approval by manager, processing by purchasing.
• Outcome: Timely approval and compliance.
2. Employee Onboarding: Coordinates tasks for new hires, like setting up IT equipment and scheduling orientation.
• Trigger: New hire added to HR system.
• Steps: Notifications to departments, task coordination.
• Outcome: Smooth onboarding process.
3. Sales Order Processing: Automates inventory checks, production scheduling, and shipping coordination.
• Trigger: Sales order entry.
• Steps: Inventory check, decision points for out-of-stock items.
• Outcome: Improved order fulfillment and customer satisfaction.
-
Most of us overcomplicate our first agentic system. .. definitely not me 😅

Too many developers jump straight to complex orchestration patterns when prompt chaining would solve 80% of their use cases. We get excited about the technology! But each pattern has its sweet spot.

Aurimas Griciūnas recently shared 5 workflow patterns that actually work for enterprise AI agents, and the simplest ones sometimes deliver the most business value.

- 𝗣𝗿𝗼𝗺𝗽𝘁 𝗖𝗵𝗮𝗶𝗻𝗶𝗻𝗴 breaks complex tasks into manageable pieces. Higher accuracy, trades off latency. Perfect for most enterprise workflows.
- 𝗥𝗼𝘂𝘁𝗶𝗻𝗴 classifies inputs and takes the right path. Essential when you need specialized workflows (think RAG vs. action execution in chatbots).
- 𝗣𝗮𝗿𝗮𝗹𝗹𝗲𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻 splits work for speed and accuracy. Game-changer for invoice processing or multi-query RAG systems.
- 𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮𝘁𝗼𝗿 dynamically delegates tasks. Use this when there's no clear hardcoded path to the result.
- 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗼𝗿-𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗲𝗿 creates continuous refinement loops. Critical for research agents that need to iterate on quality.

The pattern here? Start simple, prove value, then add complexity as needed. Too many teams build the orchestrator first and wonder why their agent hallucinates or costs $50 per query.

#AI #AgenticSystems #AIAgents #SoftwareFirst