Imagine this: Your AI agent is happily automating a complex workflow across desktop and Android apps. Suddenly, the connection drops for a split second. Instead of gracefully recovering, or at least failing fast, it spirals into an endless retry loop, burning tokens, time, and your patience. With the release of AskUI Python SDK v0.30.0, that nightmare scenario is now firmly in the past. This focused update transforms how agents handle real-world friction and how you observe their behavior, making them far more robust, observable, and trustworthy for serious automation projects.
If you're new to AskUI, think of it as the "eyes and hands" for AI agents in the world of user interfaces. Traditional automation tools, such as Selenium, Appium, or PyAutoGUI, rely on fragile selectors (XPath, IDs, coordinates) that break the moment a UI changes even slightly. AskUI takes a completely different, vision-first approach:
It captures screenshots (or live video streams) of any screen: desktop (Windows, macOS, Linux), mobile (Android, iOS), or even embedded/HMI systems.
Powerful multimodal AI models (you can bring your own from Anthropic, Google, OpenAI, or others) analyze the visual content much as a human would, recognizing buttons, text, icons, layouts, and context.
The agent then reasons in natural language or structured code, decides on actions (click, type, scroll, verify), executes them via mouse/keyboard/touch control, and learns from visual feedback.
The open-source Python SDK is your gateway to this world. It lets you create agents like ComputerAgent, AndroidAgent, or even a MultiDeviceAgent that can orchestrate multiple devices simultaneously, all from clean, readable Python code. Key superpowers include:
High-level, goal-oriented instructions (“Log into the CRM and create a new customer record”)
Automatic generation of beautiful, screenshot-rich HTML reports
Support for local execution or AskUI’s cloud infrastructure
Seamless integration with modern LLMs and vision models
In essence, AskUI turns large language models from “chatty thinkers” into reliable doers that can interact with any software the way a real user would. It’s a game-changer for QA automation, robotic process automation (RPA), workflow orchestration, data extraction, and repetitive task handling. No more brittle scripts. No more platform-specific hacks. Ready to try it?
pip install askui
Then head to the excellent documentation at docs.askui.com for quickstarts and examples.
While not a massive feature explosion, v0.30.0 delivers targeted, high-impact improvements that address real pain points in agentic automation. Here’s the breakdown:
1. Smart Infrastructure Error Handling – No More Retry Hell
One of the most valuable additions is a new infrastructure-error handling prompt integrated into the core act() loop for Computer, Android, and Multi-Device agents. Previously, transient issues (lost controller connection, expired session, RPC errors, stream closed, service unavailable, or controller timeouts) could cause agents to loop indefinitely, wasting time and money while producing confusing reports. Now, the agent receives clear instructions:
Retry the same tool call once on infrastructure errors.
If it fails again, stop immediately and mark the conversation status as BROKEN.
This prevents unrecoverable loops, ensures failures are reported accurately and early, and keeps your agents graceful under pressure. It’s a small prompt change with outsized reliability gains.
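In the SDK, this behavior comes from a prompt that instructs the model. It is not hard-coded retry logic. Still, expressing the policy in code makes it easy to reason about. The sketch below is purely illustrative: `InfrastructureError`, `ConversationStatus.RUNNING`, and `call_with_single_retry` are hypothetical names, and only the BROKEN status comes from the release notes.

```python
from enum import Enum
from typing import Any, Callable, Tuple


class ConversationStatus(Enum):
    # Hypothetical status values; only BROKEN is named in the release notes.
    RUNNING = "running"
    BROKEN = "BROKEN"


class InfrastructureError(Exception):
    """Hypothetical stand-in for transient failures such as a lost controller
    connection, an expired session, RPC errors, or controller timeouts."""


def call_with_single_retry(tool: Callable[[], Any]) -> Tuple[ConversationStatus, Any]:
    """Run a tool call and retry it exactly once on an infrastructure error.

    Returns (status, result). If the retry fails too, the status is BROKEN and
    the result is the last error, so the caller can stop immediately instead
    of looping. Other exception types are not caught and propagate unchanged.
    """
    last_error = None
    for _attempt in range(2):  # the original attempt plus one retry
        try:
            return ConversationStatus.RUNNING, tool()
        except InfrastructureError as err:
            last_error = err
    return ConversationStatus.BROKEN, last_error
```

A tool that fails once and then succeeds yields RUNNING with its result; a tool that fails twice yields BROKEN after exactly two attempts, which is the "fail fast" outcome the new prompt aims for.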
2. Dramatically Enriched HTML Reports
The SimpleHtmlReporter has been supercharged with per-conversation intelligence:
Step count for each conversation.
Human-readable duration (e.g., “2m 14s” or precise “00:02:14.567”) calculated from started_at and ended_at UTC timestamps.
Cache token statistics: cache_creation_input_tokens and cache_read_input_tokens, crucial for understanding and optimizing prompt caching costs.
These additions give you crystal-clear visibility into efficiency, cost, and performance at a glance. Reports now feel like professional audit trails rather than raw logs, perfect for sharing with teams, debugging tricky flows, or proving ROI to stakeholders. Large base64 images are automatically truncated for better readability and smaller file sizes.
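To make the duration figures concrete, here is a minimal sketch of how a per-conversation summary could derive both duration formats from UTC `started_at` and `ended_at` timestamps. `ConversationSummary` and its methods are hypothetical and do not reflect the SDK's internal classes; only the field names follow the release notes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ConversationSummary:
    """Hypothetical per-conversation record; not the SDK's own class."""

    steps: int
    started_at: datetime  # UTC timestamp
    ended_at: datetime  # UTC timestamp
    cache_creation_input_tokens: int = 0
    cache_read_input_tokens: int = 0

    def duration(self) -> timedelta:
        return self.ended_at - self.started_at

    def human_duration(self) -> str:
        """Coarse, human-readable form such as '2m 14s'."""
        total = self.duration() // timedelta(seconds=1)  # whole seconds
        hours, rem = divmod(total, 3600)
        minutes, seconds = divmod(rem, 60)
        if hours:
            return f"{hours}h {minutes}m {seconds}s"
        if minutes:
            return f"{minutes}m {seconds}s"
        return f"{seconds}s"

    def precise_duration(self) -> str:
        """Precise HH:MM:SS.mmm form such as '00:02:14.567'."""
        total_ms = self.duration() // timedelta(milliseconds=1)  # exact integer ms
        hours, rem = divmod(total_ms, 3_600_000)
        minutes, rem = divmod(rem, 60_000)
        seconds, millis = divmod(rem, 1000)
        return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


if __name__ == "__main__":
    start = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    summary = ConversationSummary(
        steps=7,
        started_at=start,
        ended_at=start + timedelta(minutes=2, seconds=14, milliseconds=567),
    )
    print(summary.steps, summary.human_duration(), summary.precise_duration())
    # prints: 7 2m 14s 00:02:14.567
```

Integer division of `timedelta` values keeps the arithmetic exact, which avoids floating-point rounding in the millisecond field.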
3. Quieter, Saner Logging
Tool failures are no longer logged as noisy WARNING entries. They’ve been demoted to INFO level with structured extra fields (tool name + error message). Your terminals stay clean during normal operation while still providing all the debugging context you need when something goes wrong.
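The same pattern works in your own wrappers around agent calls. This sketch uses only Python's standard `logging` module. The logger name `agent.tools` and the field names `tool_name` and `error_message` are illustrative assumptions, not the SDK's actual identifiers.

```python
import logging

# Hypothetical logger name; the SDK's own logger may be named differently.
logger = logging.getLogger("agent.tools")


def log_tool_failure(tool_name: str, error: Exception) -> None:
    """Record a failed tool call at INFO level with structured fields."""
    # Keys in `extra` become attributes on the LogRecord, so handlers and
    # formatters can read them without parsing the message text.
    logger.info(
        "Tool call failed",
        extra={"tool_name": tool_name, "error_message": str(error)},
    )


if __name__ == "__main__":
    # This format string only suits records that carry the two extra fields,
    # so the handler is attached to this logger alone.
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter(
            "%(levelname)s %(message)s tool=%(tool_name)s error=%(error_message)s"
        )
    )
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    log_tool_failure("click", RuntimeError("element not found"))
    # writes to stderr: INFO Tool call failed tool=click error=element not found
```

With the effective level at WARNING, which is the default, these records are filtered out entirely. Lowering the level to INFO brings them back while you debug.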
Breaking Change (Easy to Fix):
UsageTrackingCallback has been renamed to the much more descriptive ConversationStatisticsCallback. This reflects its expanded role in tracking per-conversation and per-step summaries, including timestamps, durations, and detailed token/cost data.
Before v0.30.0:
Agent hits a connection blip → endless retries → high costs + vague failure
Reports show basic token usage but no durations or step counts
Tool errors spam your logs with warnings
Callback named generically, making the intent less clear
After v0.30.0:
Agent retries once intelligently → fails fast and clearly with “BROKEN” status
Rich HTML reports show exact steps, durations, cache savings, and costs
Clean logs with structured info
Explicit ConversationStatisticsCallback for better code readability
How to Upgrade and Migrate
Upgrading is straightforward:
pip install --upgrade askui
Then update your imports:
# Old
# from askui import UsageTrackingCallback
# New
from askui import ConversationStatisticsCallback
from askui.reporting import SimpleHtmlReporter
# Example usage with a reporter
reporter = SimpleHtmlReporter(...)
callback = ConversationStatisticsCallback(reporter=reporter)  # a pricing configuration can optionally be passed as well
The SDK automatically appends the new callback in AgentBase when you provide a reporter, so most setups will pick up the improvements with minimal changes. Full release notes and changelog are available on GitHub.
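If a code base has to run against SDK versions from both before and after the rename during a transition, you can resolve the class name at import time. The helper below is a hypothetical convenience built on `importlib`. It is not part of the SDK, and the commented usage assumes only the two class names given in the release notes.

```python
import importlib
from typing import Any


def first_available(module_name: str, *attr_names: str) -> Any:
    """Return the first attribute in `attr_names` that exists in `module_name`.

    Hypothetical helper for surviving renames across library versions.
    """
    module = importlib.import_module(module_name)
    for name in attr_names:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"none of {attr_names!r} found in module {module_name!r}")


# Hypothetical usage: prefer the v0.30.0 name and fall back to the old one.
# StatsCallback = first_available(
#     "askui", "ConversationStatisticsCallback", "UsageTrackingCallback"
# )
```

Once every environment is on v0.30.0 or later, the plain `from askui import ConversationStatisticsCallback` is simpler and should replace the shim.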
Why This Update Matters for the Bigger Picture
In the fast-evolving world of agentic AI, reliability and observability are what separate impressive demos from dependable production systems. v0.30.0 focuses precisely on those foundations, making agents more resilient to the complex realities of infrastructure and providing developers with the insights needed to iterate quickly and control costs. AskUI continues to push the boundaries of vision-based automation, proving that AI doesn’t just need to understand screens: it needs to handle them gracefully when things inevitably go sideways.
Whether you’re automating QA across a fleet of devices, orchestrating complex business workflows, or extracting data from legacy desktop apps, this release makes your agents noticeably more trustworthy. Try the new version today, explore the enhanced reports, and let the agents handle the tedium while you focus on higher-value work.
Resources:
GitHub Release: v0.30.0
Full Documentation: docs.askui.com
Examples & Community: AskUI GitHub
Online Trainings: AskUI Videos
The era of reliable, vision-powered UI agents is accelerating. What repetitive or complex task are you excited to hand off to an AskUI agent now? Drop your ideas in the comments. I’d love to hear what you build!
Building top-notch software doesn’t have to be a struggle. At NUCIDA, we’ve cracked the code with our B/R/AI/N Testwork testing solution, which pairs our QA expertise with your test management tool to deliver streamlined processes, slick automation, and results you can count on. On time. Hassle-free. Ready to ditch future headaches? Let NUCIDA show you how!
Among others, NUCIDA's QA experts are certified consultants for Testiny, SmartBear, TestRail, and Xray software testing tools.
Why Choose NUCIDA?
For us, digitization does not just mean modernizing what already exists but, most importantly, reshaping the future. That is why we have made it our goal to provide our customers with sustainable support in digitizing the entire value chain. Our work has only one goal: your success!
Don’t let testing slow you down. Explore how our consulting services can make your software quality soar, headache-free! Got questions? We’ve got answers. Let’s build something amazing together!
With these updates, AskUI is more powerful, flexible, and user-friendly than ever. Whether you’re automating web or Android tasks, integrating custom models, or exploring chat-based automation, AskUI has the tools to make it happen. Of course, the latest AskUI update offers much more; this blog post highlights only a few of its changes. Head to our AskUI page to get started and see how these new features can elevate your automation workflows and software testing.
Have questions or want to know more about the latest developments in AskUI and AI-driven automation? The NUCIDA QA Team is here to help! Until then, let’s keep pushing the boundaries of testing together. Stay tuned, and happy testing!
Want to know more? Watch our YouTube video, AskUI Vision Agent Demo 2025, to learn more about the usage and benefits of AskUI.
Logo and pictures from pixabay.com, AskUI, and NUCIDA Group
Article written and published by Torsten Zimmermann