Galvis' lifetime OPS, per b-r, is .648. That said, I'd recommend looking at OPS+ rather than OPS, because it's season- and park-adjusted. Freddy's lifetime OPS+ is 76, and he's been at 79 in each of the past two seasons (even though his unadjusted OPS was better in 2016 than in 2015 - another reason to prefer OPS+, since it washes out league/environmental changes from season to season).
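For anyone who wants the mechanics: the commonly cited back-of-envelope version of OPS+ is 100 * (OBP/lgOBP + SLG/lgSLG - 1), with a park adjustment. B-r's actual computation differs in its details, so treat this as an approximation only; the inputs below are made up, chosen to land in Galvis-ish territory.

```python
# A minimal sketch of the commonly cited OPS+ approximation. This is NOT
# b-r's exact method (theirs uses runs-based park factors, etc.) - just
# the back-of-envelope version. All numbers below are illustrative.

def ops_plus(obp, slg, lg_obp, lg_slg, park_factor=1.0):
    """Approximate OPS+: 100 * (OBP/lgOBP + SLG/lgSLG - 1), park-adjusted."""
    return 100 * (obp / lg_obp + slg / lg_slg - 1) / park_factor

# Hypothetical line: a .275 OBP / .370 SLG hitter in a .320 OBP / .420 SLG league
print(round(ops_plus(0.275, 0.370, 0.320, 0.420)))  # -> 74, i.e., well below average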
Whatever its weaknesses, I tend to prefer OPS+, along with some kind of overall evaluation of defense (not dWAR, per se - see below), rather than WAR. A principal problem with WAR, at least as a predictor, is that it's a counting stat. If a player is relatively poor - barely above replacement - but his club decides to give him 650 plate appearances anyway, he will accumulate positive WAR. That doesn't necessarily mean he has any real value to a competitive team. And while we all probably understand that a player who generates 2 WAR in 350 PA is more valuable than one who accumulates 2 WAR in 650 PA, we don't routinely cite or use WAR as if we understood that. Take Freddy Galvis: in 2016, per b-r, he produced 1.3 WAR. That's "nice"...but the Phillies, as a club, ranked 12th in the NL in wins above average at the shortstop position, and Galvis took 94% of the club's plate appearances at SS.
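To make the rate-vs-counting point concrete, here's a trivial normalization - prorating WAR to 600 PA, a conventional full season. The numbers are hypothetical, not anyone's actual line:

```python
# A trivial sketch of the rate-vs-counting-stat point: normalize WAR to a
# common playing-time denominator (600 PA as a conventional full season).
# Numbers are hypothetical.

def war_per_600(war, pa):
    """WAR prorated to 600 plate appearances."""
    return war * 600 / pa

print(round(war_per_600(2.0, 350), 1))  # part-timer: ~3.4 WAR per 600 PA
print(round(war_per_600(2.0, 650), 1))  # everyday player: ~1.8 WAR per 600 PA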
Good point on the deficiency of OPS (or OPS+) - it somewhat overstates the value of a player like Galvis, whose ISO is relatively good (for a shortstop), but whose BB/PA is poor (in Galvis' case, about 4.7% lifetime, 4.0% in 2016).
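For anyone following along, both measures are trivial to compute; the inputs below are illustrative only, chosen to land near the rates cited above rather than to reproduce an exact stat line:

```python
# The two measures in question. Inputs are illustrative, picked to roughly
# match the rates cited above - not an exact stat line.

def iso(slg, avg):
    """Isolated power: slugging percentage minus batting average."""
    return slg - avg

def bb_rate(bb, pa):
    """Walks per plate appearance."""
    return bb / pa

print(round(iso(0.399, 0.241), 3))  # ~.158 - decent pop for a shortstop
print(round(bb_rate(25, 624), 3))   # ~.040 - the poor walk rate in question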
Umm...yeah. Without having done any kind of statistical analysis, my impression is that dWAR typically varies significantly from year to year - enough that a single-season dWAR figure looks close to useless. As andyb notes, UZR is subjective. Andyb's being kind, IMHO, in saying "there is some subjectivity..." - there is "some subjectivity" in official scorers' decisions about errors, too, and sabermetricians tend to recognize that and discount error counts - even as they tout WAR, which is built on UZR and depends on the same kind of subjective judgments about what "should have" happened.
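If somebody wanted to test that impression rather than eyeball it, the standard check is a year-over-year correlation: line up each player's year-N dWAR against his year-N+1 dWAR and see how weakly they track. A sketch with made-up data (you'd substitute actual seasonal dWAR figures; statistics.correlation needs Python 3.10+):

```python
# A sketch of testing the "seasonal dWAR is noisy" impression: correlate
# each player's year-N dWAR with his year-N+1 dWAR. Data here are made up;
# you'd substitute actual seasonal figures for a pool of players.

from statistics import correlation  # Python 3.10+

# (year_n_dwar, year_n_plus_1_dwar) for a hypothetical set of players
pairs = [(1.2, -0.3), (0.5, 0.9), (-0.8, 0.1), (2.0, 0.7), (0.0, -0.5)]
year_n = [p[0] for p in pairs]
year_n1 = [p[1] for p in pairs]

# A correlation near zero would support the impression that a single
# season of dWAR tells you little about the next one.
print(round(correlation(year_n, year_n1), 2))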
My opinion... we tolerate UZR because we have access to it. All of our statistical tools have flaws/weaknesses, of course, and we use what we have. But IMHO, we collectively don't differentiate much between those measures on the basis of how flawed they are; particularly with defensive measures/tools, we just don't have much that isn't seriously flawed. We long ago rejected fielding percentage because of its dependence on subjective scorer judgment (and its complete inability to evaluate range); we've rejected RF/9 because of questions about the opportunities available (differences in pitching staffs, the impact of neighboring fielders, etc.). UZR attempts to address the problems with RF/9, but does so by depending on the subjective judgment of scorers - the problem with FP. Are we going in circles?
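For reference, the two rejected measures in code form - each comment noting why it was rejected (illustrative inputs only):

```python
# The two rejected measures, for reference. Inputs are illustrative.

def fielding_pct(po, a, e):
    """Fielding percentage: (PO + A) / (PO + A + E).
    Rejected: errors are scorer-judged, and plays never reached don't count."""
    return (po + a) / (po + a + e)

def rf9(po, a, innings):
    """Range factor per 9 innings: 9 * (PO + A) / innings.
    Rejected: ignores differences in opportunities (staff, neighbors, etc.)."""
    return 9 * (po + a) / innings

print(round(fielding_pct(250, 400, 12), 3))  # -> .982
print(round(rf9(250, 400, 1300), 2))         # -> 4.5 chances converted per 9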
A thought - and I have no idea how to even obtain an answer here... If a given player's dWAR/UZR varies wildly season to season...is there any way to analyze the relationship between the reported UZR and the individuals who are making the decisions about whether a play should have been made? In other words, is the variance (or part of the variance) in seasonal UZR an artifact of who is doing the scoring?
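One way to frame it, if per-play data tagged with the scorer's identity were ever available (a big if - I don't know of any public source for that): a crude ANOVA-style decomposition of a player's per-play UZR-type credits into between-scorer and within-scorer variance. Everything below is hypothetical:

```python
# A sketch of the scorer-attribution question. Assumes each batted-ball
# judgment could be tagged with the scorer who made it (hypothetical data;
# no public source for this that I know of). Asks: how much of the variance
# in a player's UZR-style credits sits BETWEEN scorers vs. WITHIN them?

from collections import defaultdict
from statistics import mean, pvariance

# (scorer_id, per-play UZR-style credit) - entirely made-up numbers
plays = [("A", 0.02), ("A", -0.01), ("A", 0.03),
         ("B", -0.04), ("B", -0.06), ("B", -0.02),
         ("C", 0.01), ("C", 0.00), ("C", 0.02)]

by_scorer = defaultdict(list)
for scorer, credit in plays:
    by_scorer[scorer].append(credit)

grand_mean = mean(c for _, c in plays)

# Between-scorer variance: spread of each scorer's mean around the grand mean
between = mean((mean(v) - grand_mean) ** 2 for v in by_scorer.values())
# Within-scorer variance: average spread inside each scorer's own judgments
within = mean(pvariance(v) for v in by_scorer.values())

# If 'between' dominates, scorer identity is doing a lot of the work -
# i.e., part of the seasonal UZR variance is a scoring artifact.
print(round(between / (between + within), 2))  # -> 0.74 with these made-up numbers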