Windows Event Collection (aka WEC / WEF) is an awesome feature of Windows that allows you to build a logging pipeline connecting every Windows system on your network to a few central collectors from which your various SIEM, UEBA and security analytics tools can then efficiently consume all your logs.
But like many Windows features, it’s a universal foundation technology that you have to design, adapt and manage. One of the biggest challenges with WEC is manageability when it comes to thousands of forwarders and multiple collectors.
For years I’ve been helping some of the world’s largest companies use WEC to collect logs from hundreds of thousands of servers and workstations (forwarders), and we’ve learned a lot.
Especially in the past year: we’ve come up with an entirely new way to efficiently balance large numbers of forwarders between WEC collectors. But to appreciate the innovation you must understand how companies, including Microsoft, distributed the load of forwarders between collectors in the past.
Let’s say you have 30,000 Windows systems and 3 collectors. You would create the same WEC subscription on each collector, defining the same source logs and events to collect and the same destination log on the collector. Then you would create 3 groups in AD, one for each collector, and add 10,000 of those 30,000 computers to each group. Finally, you’d assign one of the groups to the subscription on each collector. As long as you took into account each computer’s status in AD (enabled/disabled and LastLogonTimeStamp) and kept up with new computers and attrition of old ones, you then had a nicely balanced log collection pipeline.
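The group-based split described above can be sketched in a few lines of Python. This is only an illustration of the balancing arithmetic, not a tool that talks to AD: the Computer record, the 90-day staleness cutoff, and the collector names WEC01 through WEC03 are all assumptions made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Computer:
    # Hypothetical record; in practice these values would come from AD.
    name: str
    enabled: bool
    days_since_last_logon: int

def balance(computers, collectors, max_stale_days=90):
    """Assign active computers to collectors round-robin.

    Returns a dict mapping each collector to the list of computer
    names that would go into that collector's AD group.
    """
    # Skip disabled accounts and machines that have not logged on recently
    # (the cutoff is an assumed value, not one from the article).
    active = [c for c in computers
              if c.enabled and c.days_since_last_logon <= max_stale_days]
    groups = {collector: [] for collector in collectors}
    for i, comp in enumerate(sorted(active, key=lambda c: c.name)):
        groups[collectors[i % len(collectors)]].append(comp.name)
    return groups

# 30,000 active systems plus a few stale or disabled ones that get filtered out.
fleet = [Computer(f"PC{n:05d}", True, 3) for n in range(30000)]
fleet.append(Computer("OLD00001", False, 400))
fleet.append(Computer("OLD00002", True, 200))

collectors = ["WEC01", "WEC02", "WEC03"]
groups = balance(fleet, collectors)
sizes = {name: len(members) for name, members in groups.items()}
print(sizes)
```

Round-robin over a sorted list keeps the groups within one member of each other, but every change it produces is still a group membership change, so it inherits the delay described next.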
Except for one big gotcha – the Achilles’ heel of Windows Event Forwarding. Since you were assigning computers to WEC subscriptions based on group membership, the computer only subscribed and started forwarding its events once it finally recognized it was now a member of the corresponding group. I say finally because it took a long time – a Windows computer doesn’t see group membership changes involving its computer account in AD until it reboots. So, it could take weeks or longer for a Windows Event Collection pipeline to ramp up. The only way to speed it up was to schedule mass reboots or to reach inside systems and run klist to purge Kerberos tickets – not very practical. To help companies deal with this I even wrote a free utility that slept as a Windows service in the background until it noted from AD that the local computer’s group memberships had changed and then ran klist. But none of these were good options.
One day someone asked me on a forum why not just assign computers directly to a subscription so there’s no group membership change to wait around for the computer to see. It wasn’t the first time I’d thought of that, but I’d always dismissed it because it seemed so unwieldy to dump tens of thousands of computers into a single subscription’s Allowed Domain Computers list. And I had real concerns about the performance impact. But faced with the growing problem of group-based load balancing I decided to actually test it.
I immediately ran into a roadblock. WEC simply doesn’t allow you to add more than about 1,800 entries to a given subscription. Like many things in WEC, the limit isn’t documented, but it seems to stem from a hardcoded memory allocation limit. But we came up with a solution, and companies that have been testing our new load balancing method can spin up a new subscription and see thousands of computers respond within seconds. One company had an urgent requirement to start collecting PowerShell events because of a new living-off-the-land attack, and they were able to respond immediately and start getting those events from systems across their global network without touching a single system.
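To put the roughly 1,800-entry cap in perspective, here is a small Python sketch of what a naive split would involve if you simply cut a collector’s forwarder list into pieces that each fit under the cap. This is not the method presented in the webinar, which isn’t described here; the cap value is the approximate figure quoted above, and the computer names are invented.

```python
ASSUMED_CAP = 1800  # approximate, undocumented limit quoted in the text

def chunk(names, size=ASSUMED_CAP):
    """Split a list of forwarder names into consecutive pieces of at most `size`."""
    return [names[i:i + size] for i in range(0, len(names), size)]

# One collector's share of the fleet from the earlier 30,000 / 3 example.
forwarders = [f"PC{n:05d}" for n in range(10000)]
pieces = chunk(forwarders)

print(len(pieces))                  # number of lists needed
print([len(p) for p in pieces])     # size of each list
```

Even for a single collector’s 10,000 forwarders, a plain split means maintaining several separate lists, which is part of why the cap looked like a roadblock.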
In this real training for free event I will give anyone new to Windows Event Collection a quick introduction to how it works, and then I will show you how we were able to vastly simplify and accelerate logging pipeline construction based on WEC by:
- Eliminating reliance on AD groups
- Eliminating the need for special AD permissions or OUs
- Overcoming the 1,800-forwarder limitation on WEC subscriptions
- Confirming there is no negative performance or throughput impact
Our new method for scaling WEC has turned out better than I ever imagined, and we have performance charts and real-world field experience to back it up. After my educational presentation, Barry Vista, from our sponsor LOGbinder, will demonstrate how Supercharger for WEC incorporates this innovation and provides automated management of WEC-based logging pipelines.
Please join us for this real training for free event.