I think what most people are misunderstanding when it comes to Anti's example is…
People who are into any MOBA or fighting game are able to understand and appreciate, in a broad sense, what is going on. Showing quick clips is far from ideal for making this point. I think it's easier to see what is going on in an RTS, MOBA or fighting game than it is in an FPS. This is due to multiple reasons…
Quake is actually a good example of this…
Most people find it easy to watch and understand Quake because, to the spectator, it is simple: two players fight it out on a map over items for kills. The fact that it's only two players means there are only really two perspectives to take into account. Fighting games take this one step further and have ALL the information displayed on the screen at the same time. MOBAs, on the other hand, have just the one perspective that does not switch. From a spectator's point of view this is the most like a traditional sport.
The challenge for any would-be esports FPS right now is framing the action…
You can do it by having a game that is slow-paced enough for multiple points of view to not be a problem (CS:GO).
Or you can focus the action and story around one point of convergence (Shootmania Elite, Left 4 Dead and Bombing Run).
Any game that requires multiple points of view, is fast-paced and/or separates points of action is going to struggle to communicate events.
A prime example of how you can build around this problem, however, is found in SMITE and Natural Selection, where the spectator's view and the player's view are vastly different.
The problem is that even if SD manage to solve the problems with perspective, they still have major issues with pacing.
During game time you need time for exposition, story building and hype building.
You also want different stages/phases of play (Start, Middle and End) to avoid boredom and repetition.
Most of all, the one thing that Stopwatch lacks in the first half, and Objective lacks completely, is contrast.
It is hard to get hyped during the first round of Stopwatch unless the time is amazingly crazy good, simply because you have nothing to compare that time to… Without a point of reference, building hype is difficult and meaningless. The double whammy is that the second round is often decided in a small amount of time, with a large chunk of dead time tacked onto the end of the game. If the first team did a great run and the second team messes up the first spawn, it can be physically impossible for them to win but you still have 5 minutes left on the map. Or it could have been a full hold for the first team, and the second team finishes the first objective in 30 seconds.
All this is without mentioning that the speed + detail of Dirty Bomb means streams are going to look like pants unless rendered at 4 Mbps minimum, and that performance is going to limit the player base. Gameplay speed makes seeing what you want to see from a first-person perspective difficult in spectator mode, and appreciation of skill is diminished due to the lack of lasting consequence for previous actions.
Little design choices further limit understanding and enjoyment for spectators. The ability to switch Mercs mid-round, the lack of visual excitement and many other little things all add up to the game not being fun to watch for anyone who isn't a HARDCORE DB player.
TL;DR - Dirty Bomb suffers from its spectator points of view, which means its chance of becoming a popular esport is very diminished. Other games such as MOBAs, RTS, fighting games, duel FPS and CS:GO have already managed to solve some of the problems that many games suffer from in this department. ET, ETQW, BRINK and DIRTY BOMB have always failed, and to this very day still fail massively, to communicate the skill, story and hype to anyone who is not already from the hardcore community.

