Rare Events, Gun Violence, and the Nefarious Large Organization

Suppose you are the Nefarious Large Organization (NLO) and you want to kill lots of random Americans, but you do not want your fingerprints on the weapons.

How might you go about doing this? Having a long time frame helps. You're going to need a large political arm that can help with loosening laws to create the conditions you need. You'll need to grow your NLO.

A two-pronged approach can work wonders. You need to get large numbers of weapons into the hands of lots and lots of people. That raises two questions: which weapons, and which people?

First, which weapons? Knives don't kill many people, are very personal, and are hard to work with. You might get hurt trying to use a knife, especially if someone fights back. Bombs are difficult to make and have a habit of blowing up their amateur makers as much as blowing up the intended targets. Plus they need to be hidden. There is a military complex that makes bombs, and there is a cottage industry in bomb making in parts of the third world, but in the first world, bombs perhaps aren't optimal. You would need better bomb training and if your NLO started marketing bomb making courses, people might catch on when bombs with your designs started killing lots of people. Guns would be better, as other companies make them. Your NLO doesn't need to make guns, just advocate for their use and acquisition. Although guns take some skill, that skill is easily acquired. Or you can shoot into large crowds where it is hard to miss. But Saturday night specials and revolvers have a limited number of bullets. Having guns with lots of bullets that shoot quickly in the hands of many people is paramount for achieving your goal of killing lots of Americans. Thus the need for automatic firearms.

This is where your NLO political operation becomes paramount. It takes a long time frame and plenty of cash to get politicians to set the stage where you can get these automatic firearms into the hands of lots of people. This may take you decades. Advertising can lend a cachet to owning automatic firearms, and once the population of owners gets large enough, network effects will work to spread their availability and ownership.

If you can get enough weapons into enough hands, the 'which people' part will solve itself as there are always some people willing to work with you in your goal of wanting to kill lots of Americans without your fingerprints on the weapons. You won't even need to communicate directly with those people.

Humans do not classify neatly into 'good people' or 'bad people'. We're all good some of the time and we're all bad at least occasionally. The fractions of good/bad vary from person to person and over time within a person. Toss in a rare event (what people call 'bad luck' or 'an accident') and something terrible can happen. You speed, you drink and drive, you walk atop a wall, you clamber up a steep cliff. People can be immature, then they grow up, get a job, get married and settle down. People can be fine, then get depressed or desperate. Your spouse or mother dies and you have no one caring for you, keeping you safe and on an even keel, keeping those mental demons quiet. You suddenly are isolated, alone. You have difficulty in school, get teased a lot.

It's a fiction that we just have to keep the guns out of the hands of the 'bad people'. Truth is that most of our killers are 'good people', legally speaking, right up until they actually start killing people. Medically, it's virtually impossible to distinguish people at risk of doing bad things some time in the future from those who are not. Certainly before they bought or inherited that first gun, before their friend or neighbor or local advertising circular introduced them to that gun, they were fine, legally speaking. They buy that first gun, they still classify as 'good people'. Before something bad happened to them, before they lost their job, before their last parent died, before they were picked on, before they became depressed or suicidal, people were 'good people'.

You, the NLO, need to get those guns into the hands of millions of people. Some of those people are going to have problems. They become sick. They may get depressed. They get in a car accident. Socially they are isolated, or they become isolated. They don't fit in. They get teased. They feel like they are teased, even if they are not. Mild paranoia may set in. They think strangers and foreigners are going to take what is rightfully theirs. They may not have been raised with a full set of social skills to handle modern society. These people aren't all that common, they're not the majority. Most of us humans are doing fine, are decent people and aren't teased so badly that we think about becoming killers or drug addicts. That's why you need millions of gun owners. Not all of those millions of gun owners are going to have good sense. They may give or sell their guns to some troubled soul. They may encourage some troubled soul to buy guns as a way to self-worth. They themselves may grow to have psychological problems even though they've been fine for decades. Take millions of people, and some of them are just going to be 'off'.

You make things easy, those bad things you want to have happen will happen more often. Make it easy to commit suicide, more people will commit suicide. Make it easier to commit mass murder, more people will commit mass murder. We are not talking about making the average person commit suicide or mass murder. Just those in the extremes. Just loosen the laws enough so that one person in a million has the ability to shoot someone else or to shoot themselves.

Similarly, reduce health care, screw up schools, reduce social services, make lives more miserable, increase access to drugs, do what you can to cause more difficulties for more people. This will provide a lush environment to grow people with the potential to have further problems.

Maybe these troubled individuals are only one in a million. This is the problem of rare events, and it is where you, the NLO, can hide. How could we foresee some random, rare event occurring? Who, us? How could we have figured out that lone individual in Las Vegas or Parkland or Orlando or Sandy Hook might be that one individual who feels so aggrieved that we wish we could have seen them coming? Gee, not my fault, you say.

If an event is one in a million, then you need millions of attempts to get a 'success'. If you're the NLO, you need millions of gun owners, so that your guns end up in the hands of that one person who decides that killing a lot of others is the solution to their problems. Low probability events happen when you have lots of chances. Someone always wins the lottery. Your NLO political arm has been working tirelessly for years. Loosening laws, fighting hard, fighting dirty, making glib arguments, buying off amenable politicians. You can tolerate rare events, because you've gotten your guns into the hands of millions of people. You've got millions of chances to hit the jackpot.
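
The arithmetic behind "low probability events happen when you have lots of chances" can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not in the essay: every chance is independent and has the same probability `p`. The one-in-a-million figure is the essay's own illustrative number, and the function name `prob_at_least_one` is made up for this sketch.

```python
import math


def prob_at_least_one(p: float, n: int) -> float:
    """Probability that a rare event happens at least once in n chances.

    Assumes (for illustration only) independent chances, each with the
    same probability p, where 0 <= p < 1.
    """
    # 1 - (1 - p)**n, written with log1p/expm1 so tiny p stays accurate.
    return -math.expm1(n * math.log1p(-p))


if __name__ == "__main__":
    p = 1e-6  # the essay's "one in a million"
    for n in (1_000, 1_000_000, 5_000_000, 50_000_000):
        print(f"{n:>11,d} chances: "
              f"P(at least one) = {prob_at_least_one(p, n):.4f}, "
              f"expected count = {n * p:g}")
```

With a thousand chances the event is still very unlikely. With millions of chances it becomes more likely than not, and with tens of millions it is close to certain.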

You don't need everyone to want to do your bidding, you just need one person. One person in a million.

Modeling is important and useful. Having killers getting lots of press attention in the popular media is very helpful to your NLO plan. More and more of your automatic rifle owners see that killing lots of people is a really neat way to get some attention. The media is on your side, NLO, because the media will spread around the how-to and the what-to and the fame of these killers.

And when you've done your job right, your NLO fingerprints aren't on the trigger. When the killing is over, there is no obvious link between killers. But you've achieved your purpose. Setting many many small probability events in motion, incubating, waiting to see who cracks next, who decides to shoot next.

As the smoke settles, as survivors decide whether to tear down or rebuild the building, as survivors make memorials and attend funerals, your spokespeople say: don't politicize this, don't make decisions in the heat of the moment, don't do something you'll regret later. Your political arm materials just write themselves, don't they?

Sadly, the memorials need to be written too. She wanted to be a scientist. He was everyone's friend. They just wanted to watch a good movie. She was an inspiring teacher. He was a great football coach. He was here on vacation. She was in 1st grade. I don't suppose, NLO, that you'd like to contribute to writing the memorials for the people killed today? How about the people killed yesterday? Your machinations are working, NLO. Congratulations. Maybe you'll work on the future memorials for those killed tomorrow? We appreciate your assistance.

Time to Update the P-Value Dichotomy to a Trichotomy

In executing a classical hypothesis test, a small $p$-value allows us to reject the null hypothesis and declare that the alternative hypothesis is true.

This classical decision requires a leap of faith: if the $p$-value is small, either something unusual occurred or the null hypothesis must be false.

These days we should add a third possibility: that we searched over several models and methods to find a small $p$-value. We need to update the $p$-value oath of decision making to state: Either something unusual happened, or we searched to find a small $p$-value, or the null hypothesis is false.
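
A small simulation can make the third possibility concrete. This is a sketch under assumptions that are not in the post: the null hypothesis is true in every analysis, each analysis yields a standard normal $z$ statistic, and the `k` analyses we search over are independent. Real model searches usually reuse the same data, so their $p$-values are correlated and the inflation is smaller than shown here, though still present. The function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z statistic under a standard normal null."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def search_rejection_rate(k: int, alpha: float = 0.05,
                          n_sims: int = 20_000, seed: int = 0) -> float:
    """Fraction of simulated studies whose smallest of k p-values is < alpha.

    Every null hypothesis is true, so each rejection is a false positive.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Search over k independent analyses and keep the best-looking one.
        best_p = min(two_sided_p(rng.gauss(0.0, 1.0)) for _ in range(k))
        hits += best_p < alpha
    return hits / n_sims


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        theory = 1.0 - (1.0 - 0.05) ** k  # exact rate for independent tests
        print(f"k = {k:2d}: simulated {search_rejection_rate(k):.3f}, "
              f"theory {theory:.3f}")
```

With a single analysis the false positive rate stays near the nominal 5%. Once we search over several analyses and report the smallest $p$-value, a "significant" result becomes common even though nothing unusual happened and every null hypothesis is true.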

Note that being Bayesian doesn't necessarily avoid this problem. Suppose we have a regression model $Y = X\beta+ \mbox{error}$. Apologies for not defining notation, except that $\beta$ is a $p$-vector with elements $\beta_k$. One way to define a one-sided Bayesian $p$-value is the posterior probability that $\beta_k$ is less than zero. If this probability $P(\beta_k \lt 0 \mid Y)$ is near 0 or near 1, then we declare "significance". Basically the Bayesian $p$-value tells us how much certainty we have about the sign of $\beta_k$. The usual classical $p$-value is approximately twice the smaller of $P(\beta_k \lt 0 \mid Y)$ and $P(\beta_k \gt 0 \mid Y)$. How close the approximation is depends on the strength of the prior information relative to the information in the data (the observed Fisher information). The Bayesian $p$-value is subject to the same maximization by search over models as the classical $p$-value.
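
The relationship between the two kinds of $p$-value can be checked numerically in a deliberately simple setting. The sketch below assumes a single coefficient whose estimate `beta_hat` is normal with known standard error `se`, and a normal prior centered at zero with standard deviation `prior_sd`. None of these names or distributional choices come from the post. With a flat prior (`prior_sd` infinite) the approximation becomes exact in this setting, and a tighter prior pulls the two apart.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def classical_p(beta_hat: float, se: float) -> float:
    """Two-sided classical p-value for H0: beta = 0 (known standard error)."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(beta_hat / se)))


def posterior_prob_negative(beta_hat: float, se: float,
                            prior_sd: float = math.inf) -> float:
    """P(beta < 0 | data) under a Normal(0, prior_sd**2) prior.

    prior_sd = inf is a flat prior, so the posterior is
    Normal(beta_hat, se**2).
    """
    data_precision = 1.0 / se ** 2
    prior_precision = 0.0 if math.isinf(prior_sd) else 1.0 / prior_sd ** 2
    post_var = 1.0 / (data_precision + prior_precision)
    post_mean = post_var * data_precision * beta_hat  # shrunk toward zero
    return STD_NORMAL.cdf(-post_mean / math.sqrt(post_var))


if __name__ == "__main__":
    beta_hat, se = 2.0, 1.0
    print("classical p-value:", round(classical_p(beta_hat, se), 4))
    for prior_sd in (math.inf, 10.0, 1.0):
        q = posterior_prob_negative(beta_hat, se, prior_sd)
        print(f"prior_sd = {prior_sd}: "
              f"twice the smaller tail = {2 * min(q, 1 - q):.4f}")
```

As the prior becomes more informative relative to the data, twice the smaller posterior tail probability drifts away from the classical $p$-value.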

Bayesians have an alternative to merely searching over models, however. We can fit a mixture model (George and McCulloch 1993, JASA; Kuo and Mallick 1998, Sankhyā B) that incorporates all the models we've searched over into a single model, and calculate the $p$-value from that.

Why Be Bayesian? Let Me Count the Ways

In answer to an old friend's question.

  1. Bayesians have more fun.
    1. Our conferences are in better places too.
  2. It's the model not the estimator.
  3. Life's too short to be a frequentist: In an infinite number of replications ...
  4. Software works better.
    1. Rather surprisingly, Bayesian software is a lot more general than frequentist software.
  5. Small sample inference comes standard with most Bayesian model fitting these days.
    1. But if you like your inference asymptotic, that's available, just not high on anyone's priority list.
    2. We can handle the no-data problem, all the way up to very large problems.
    3. Don't need a large enough sample to allow for a bootstrap.
  6. Hierarchical random effects models are better fit with Bayesian models and software.
    1. If a variance component is small, the natural Bayes model doesn't allow zero as an estimate, while the natural maximum likelihood algorithms do allow zero. If you get a zero estimate, then you're going to get poor estimates of standard errors of fixed effects. [More discussion omitted.]
    2. Can handle problems where there are more parameters than data.
  7. Logistic regression models fit better with Bayes
    1. If there's perfect separation on a particular variable, the maximum likelihood estimate of the coefficient is plus or minus infinity which isn't a good estimate.
    2. Bayesian modeling offers (doesn't guarantee it, there's no insurance against stupidity) the opportunity to do the estimation correctly.
    3. Same thing if you're trying to estimate a very tiny (or very large) probability. Suppose you observe 20 out of 20 successes on something that you know doesn't have 100% successes.
    4. To rephrase a bit: In small samples or with rare events, Bayesian estimates shrink towards sensible point estimates, (if your prior is sensible) thus avoiding the large variance of point estimates.
  8. Variance bias trade-off is working in your favor.
  9. Frequentists keep reinventing Bayesian methods
    1. Shrinkage estimates
    2. Empirical Bayes
    3. Lasso
    4. Penalized likelihood
    5. Ridge regression
    6. James-Stein estimators
    7. Regularization
    8. Pitman estimation
    9. Integrated likelihood
    10. In other words, it's just not possible to analyze complex data structures without Bayesian ideas.
  10. Your answers are admissible if you're Bayesian but usually not if you're a frequentist.
    1. Admissibility means never having to say you're sorry.
    2. Alternatively, admissibility means that someone else can't prove that they can do a better job than you.
    3. And if you're a frequentist, someone is clogging our journals with proofs that the latest idiocy is admissible or not.
    4. Unless they are clogging it with yet more ways to estimate the smoothing parameter for a nonparametric estimator.
  11. Bayesian models are generalizations of classical models. That's what the prior buys you: more models
  12. Can handle discrete, categorical, ordered categorical, trees, densities, matrices, missing data and other odd parameter types.
  13. Data and parameters are treated on an equal playing field.
  14. I would argue that cross-validation works because it approximates Bayesian model selection tools.
  15. Bayesian Hypothesis Testing
    1. Treats the null and alternative hypotheses on equal terms
    2. Can handle two or more than two hypotheses
    3. Can handle hypotheses that are
      1. Disjoint
      2. Nested
      3. Overlapping but neither disjoint nor nested
    4. Gives you the probability the alternative hypothesis is true.
    5. Classical inference can only handle the nested null hypothesis problem.
    6. We're all probably misusing p-values anyway.
  16. Provides a language for talking about modeling and uncertainty that is missing in classical statistics.
    1. And thus provides a language for developing new models for new data sets or scientific problems.
    2. Provides a language for thinking about shrinkage estimators and why we want to use them and how to specify the shrinkage.
    3. Bayesian statistics permits discussion of the sampling density of the data given the unknown parameters.
    4. Unfortunately this is all that frequentist statistics allows you to talk about.
    5. Additionally: Bayesians can discuss the distribution of the data unconditional on the parameters.
    6. Bayesian statistics also allows you to discuss the distribution of the parameters.
    7. You may discuss the distribution of the parameters given the data. This is called the posterior, and is the conclusion of a Bayesian analysis.
    8. You can talk about problems that classical statistics can't handle: The probability of nuclear war for example.
  17. Novel computing tools -- but you can often use your old tools as well.
  18. Bayesian methods allow pooling of information from diverse data sources.
    1. Data can come from books, journal articles, older lab data, previous studies, people, experts, the horse's mouth, rats a** or it may have been collected in the traditional form of data.
    2. It isn't automatic, but there is language to think about how to do this pooling.
  19. Less work.
    1. Bayesian inference is via laws of probability, not by some ad hoc procedure that you need to invent for every problem or validate every time you use it.
    2. Don't need to figure out an estimator.
    3. Once you have a model and data set, the conclusion is a computing problem, not a research problem.
    4. Don't need to prove a theorem to show that your posterior is sensible. It is sensible if your assumptions are sensible.
    5. Don't need to publish a bunch of papers to figure out sensible answers given a novel problem
    6. For example, estimating a series of means $\mu_1, \mu_2, \ldots$ that you know are ordered $\mu_j \le \mu_{j+1}$ is a computing problem in Bayesian inference, but was the source of numerous papers in the frequentist literature. Finding a (good) frequentist estimator and finding standard errors and confidence intervals took lots of papers to figure out.
  20. Yes, you can still use SAS.
    1. Or R or Stata.
  21. Can incorporate utility functions, if you have one.
  22. Odd bits of other information can be incorporated into the analysis, for example
    1. That a particular parameter, usually allowed to be positive or negative, must be positive.
    2. That a particular parameter is probably positive, but not guaranteed to be positive.
    3. That a given regression coefficient should be close to zero.
    4. That group one's mean is larger than group two's mean.
    5. That the data comes from a distribution that is not a Poisson, Binomial, Exponential or Normal. For example, the data may be better modeled by a t or gamma distribution.
    6. That a collection of parameters come from a distribution that is skewed, or has long tails.
    7. Bayesian nonparametrics can allow you to model an unknown density as a non-parametric mixture of normals (or other density). The uncertainty in estimating this distribution is incorporated in making inferences about group means and regression coefficients.
  23. Bayesian modeling is about the science.
    1. You can calculate the probability that your hypothesis is true.
    2. Bayesian modeling asks if this model describes the data, mother nature, the data generating process correctly, or sufficiently correctly.
    3. Classical inference is all about the statistician and the algorithm, not the science.
    4. In repeated samples, how often (or how accurately) does this algorithm/method/model/inference scheme give the right answer?
    5. Classical inference is more about the robustness (in repeated sampling) of the procedure. In that way, it provides robustness results for Bayesian methods.
  24. Bayesian methods have had notable successes, to wit:
    1. Covariate selection in regression problems
    2. Model selection
    3. Model mixing
    4. And mixture models
    5. Missing data
    6. Multi-level and hierarchical models
    7. Phylogeny
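
The 20-out-of-20 example in the list above can be worked through in a few lines. This is a minimal sketch, assuming a binomial likelihood and a conjugate Beta(a, b) prior. The prior choices below, including the default uniform Beta(1, 1), are illustrative assumptions and not a recommendation from the list. The function names are made up for the sketch.

```python
from fractions import Fraction


def mle(successes: int, trials: int) -> Fraction:
    """Maximum likelihood estimate of a binomial success probability."""
    return Fraction(successes, trials)


def beta_posterior_mean(successes: int, trials: int,
                        a: int = 1, b: int = 1) -> Fraction:
    """Posterior mean under a Beta(a, b) prior and binomial likelihood.

    The posterior is Beta(a + successes, b + failures), whose mean is
    (a + successes) / (a + b + trials).
    """
    return Fraction(a + successes, a + b + trials)


if __name__ == "__main__":
    y, n = 20, 20
    print("MLE:", float(mle(y, n)))
    for a, b in ((1, 1), (2, 2), (9, 1)):
        post = beta_posterior_mean(y, n, a, b)
        print(f"Beta({a}, {b}) prior: posterior mean = {float(post):.4f}")
```

The maximum likelihood estimate is exactly 1, which we already know is wrong. Any proper Beta prior pulls the estimate below 1, and how far it pulls depends on how much weight the prior puts on failures.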

The bottom line: More tools. Faster progress.


Student name: Changhee Lee
Department: Electrical engineering