What makes mathematical thinking different from all other kinds of thinking is that it is exact. Mathematical thinking proceeds by the application of exact rules, and produces answers which are guaranteed to be correct. And no matter how many steps occur within a sequence of mathematical deductions, if each step is mathematically correct, then the final result will be correct.
In the "real world", most of our thinking is inexact. We apply rules which work most of the time. Like: "if a girl smiles at me, then she likes me". Or "birds can fly".
In the real world, there is the concept of over-thinking. If we think about something too much (which, for example, tends to happen when people do philosophy), we are likely to come to conclusions which are ridiculous.
But in mathematics, there is no such thing as over-thinking. Once the rules of exact thinking have been laid down, we are free to apply them as many times as we wish, without any fear of falling into error.
The "benefit" of learning mathematics, if indeed there is such a benefit (and this can vary somewhat depending on the learner and the circumstances of their life), is the benefit of learning how to think "exactly", and of learning that there is indeed such a thing as exact thinking, which may be somewhat different from the ordinary "inexact" thinking that makes up most of one's everyday thoughts.
A mathematical system is typically defined formally as follows:

- There is a set of axioms, which are the initial theorems of the system.
- There is a set of rules of deduction, which specify how new theorems may be derived from already-proven theorems.
- The theorems of the system are exactly those statements which can be derived from the axioms by repeatedly applying the rules of deduction.
(I've left out a few technical details here, like how the axioms and theorems have to be written in some symbolic language, and how the rules have to be given as a set of operations which are permitted to be performed on theorems expressed as sequences of symbols, in order to generate new sequences of symbols representing newly proven theorems.)
A set of rules like this tells us how to do mathematics, within a particular mathematical system, but it doesn't tell us what the theorems in the system actually mean.
In order for the results of our theorem-proving activities to be useful, we want the theorems to be statements about something, so that the process of proving theorems is then telling us something new that is useful to know.
If our knowledge and understanding of the real world were itself exact, then we could freely apply mathematical thinking to all aspects of thinking about the real world. Unfortunately we don't have an exact knowledge and understanding of the real world, which somewhat limits the applicability of mathematics to the daily problems of real life.
However, we can always assume that we have exact knowledge and understanding of some component of reality. Having made such an assumption – or assumptions – we can freely apply mathematical thinking to deduce any number of consequences of our assumptions, and we can compare those deductions to our observations of the real world.
In effect, this is a simplified definition of what science is, and a description of how science is done. (When talking about this from a scientific point of view, the "assumptions" are normally called theories, and the "deductions" are normally called predictions.)
The benefits of this process are somewhat indirect. After all, our initial assumptions could simply be wrong, and they could be wrong even if no discrepancy is detected between the consequences deduced from them and our observations of the real world. However, in practice, if we maximise the simplicity of our assumptions, and maximise the number of consequences that we deduce and test, then any assumptions that pass enough tests usually turn out to give us some useful information about the world, in the sense that such assumptions typically continue to give us correct answers. And even when a previously un-falsified set of assumptions is falsified by some new observation (or by a new deduction compared against an existing observation), we can typically evolve the falsified set of assumptions into a new, better, more encompassing set, taking into account how the old assumptions succeeded for all the observations which they did explain.
Mathematics is most useful for understanding a system which follows exact and known rules. Which means that mathematics is very good for talking and thinking about mathematics.
In particular, we can formulate one mathematical system, let us call it System X, and then we can formulate a second mathematical system, System Y, and we can interpret the theorems of System Y as telling us statements about the provability or otherwise of theorems of System X.
It might not be immediately clear what the benefit of such an arrangement is. If a theorem T can be proven in System X, what is the point of being able to prove a theorem T' in System Y whose meaning is that T is a theorem of System X?
In practice, the benefit is that often the number of steps to prove theorem T' may be much smaller than the number of steps required to prove the original theorem T.
To give a very simple example, System X might represent a theory of arithmetic which tells us how to add numbers by simple counting (like 3 + 4 = (counting up from 3: 4,5,6,7), the answer is 7), and System Y might be a theory about System X that tells us how to add decimal numbers using the normal system of adding digits from the right and carrying where necessary. So I can use System Y to add 341 + 299 to get 640 and System Y is telling me that I would get the same answer if I started at 341 and counted 299 steps to get to 640, which is how I would have to do the addition in System X. (I've left out a technical detail here that I am assuming numbers already have a decimal representation, and that System X includes rules about counting forward with decimal numbers.)
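The two addition systems sketched above can be illustrated in code (a minimal sketch with invented function names; "System X" and "System Y" are this article's own labels, not standard terminology):

```python
# System X: add by simple counting, one step at a time.
def add_by_counting(a, b):
    result = a
    for _ in range(b):
        result += 1  # one counting step
    return result

# System Y: add decimal digits from the right, carrying where necessary.
def add_by_digits(a, b):
    xs, ys = str(a)[::-1], str(b)[::-1]  # digits, least significant first
    digits, carry = [], 0
    for i in range(max(len(xs), len(ys))):
        dx = int(xs[i]) if i < len(xs) else 0
        dy = int(ys[i]) if i < len(ys) else 0
        total = dx + dy + carry
        digits.append(total % 10)
        carry = total // 10
    if carry:
        digits.append(carry)
    return int("".join(str(d) for d in digits[::-1]))

print(add_by_counting(341, 299))  # 640, after 299 counting steps
print(add_by_digits(341, 299))   # 640, after just 3 digit additions
```

Both procedures give the same answer, but System Y's digit-wise method takes a number of steps proportional to the number of digits, rather than to the size of the numbers themselves, which is the practical benefit of the meta-level system.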
One of the more profound discoveries of modern mathematics (made by the Austrian mathematician Kurt Gödel) is that there are limits to how far mathematics can be applied even to itself. In particular, no (sufficiently powerful and consistent) mathematical system which has an interpretation as describing itself can be complete as a description of truths about itself (i.e. there will be statements which are true under that interpretation, but which cannot be proven within the system), and no such system can be used to prove its own consistency (i.e. to prove that it will never give wrong answers).
However these limitations do not alter the fact that a lot of mathematics is about other mathematics, and that such "meta-mathematics" often saves us a lot of work in practice.
In theory, mathematics and computation are the same thing. This follows because we can describe computation in the same terms in which I formally defined mathematical deduction: the initial input data plays the role of the axioms, the computational rules play the role of the rules of deduction, and the computed results play the role of the proven theorems.
So, for example, we can regard 341 + 299 = 640 as a computation, or, we can regard it as a theorem which is proven to be true.
In practice, the difference is that mathematics is something that people do, and computation is something that machines do (or which a machine could do, even if a person might do it sometimes, so the distinction relates to the difficulty of the thinking involved).
In this article I've tried to explain what I think is the essence of mathematics.
To keep it simple, I've left out all sorts of important details, and I've probably even said a few things that aren't completely true. In this section I attempt to make up for some of these shortcomings (or at least confess to them).
In the discussion above, I mocked (within parentheses) the inexact nature of philosophical thinking. But of course this whole article is an article about the Philosophy of Mathematics, and I don't think there is any way it can be reduced to the application of a set of formally defined rules of deduction to a set of initial axioms.
There are some parts of mathematics which explicitly deal with types of knowledge which are not exact. The biggest of these is Probability and Statistics.
It is a peculiar fact about probability that its ultimate definition is somewhat circular, i.e. the Law of Large Numbers, which more or less says that if you observe an event with probability p a "large" number of times, you will probably observe that it occurs with a frequency "close" to p.
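The Law of Large Numbers is easy to observe empirically. Here is a small simulation (the probability 1/6 and the sample sizes are arbitrary choices for illustration):

```python
import random

# Simulate an event with probability p (e.g. rolling a particular
# face of a fair die) and watch the observed frequency approach p
# as the number of observations grows.
random.seed(0)  # fixed seed, so the run is reproducible
p = 1 / 6
for n in (100, 10_000, 1_000_000):
    hits = sum(random.random() < p for _ in range(n))
    print(n, hits / n)  # the frequency settles toward p as n grows
```

Note the circularity the text describes: the claim is only that the frequency is *probably* close to p, so the statement about probability itself appeals to probability.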
Another type of mathematics dealing with "inexactness" is Modal Logic, which, among other things, sometimes deals with statements which "might" be true (without assigning any specific probability to the truth of such statements).
Mathematicians can make mistakes. Computers can also make mistakes (actually computers can make several kinds of mistakes for various reasons, including mistakes by the people who program them, faulty hardware, and "acts of nature" such as high-energy cosmic rays).
There are even areas of mathematics which deal with methods for reducing the probability of errors in systems which can't avoid the occurrence of certain types of errors. In effect one can think exactly about how to mitigate the inexactness of exact thinking.
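A classic example of such exact thinking about error is the repetition code from coding theory. The sketch below (my own illustration, with invented names) sends each bit three times through a channel that flips each copy independently with probability q, and decodes by majority vote:

```python
import random

def transmit(bit, q, rng):
    """Send three copies of `bit` through a noisy channel that flips
    each copy with probability q; decode by majority vote."""
    copies = [bit ^ (rng.random() < q) for _ in range(3)]
    return int(sum(copies) >= 2)

rng = random.Random(42)
q = 0.1  # per-copy error rate
trials = 100_000
errors = sum(transmit(0, q, rng) != 0 for _ in range(trials))
# Decoding fails only if at least 2 of the 3 copies are flipped:
# theoretical error rate = 3*q**2*(1-q) + q**3 = 0.028, well below q = 0.1.
print(errors / trials)
```

The errors themselves cannot be avoided, but exact reasoning about their probabilities lets us drive the overall error rate down as far as we like (by repeating more times, or by using more sophisticated codes).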
A mathematical proof is defined by a precise series of formal steps, but in practice actually filling in all the details is a lot of work. When publishing or explaining a proof, mathematicians typically provide just enough detail that a reader could, in principle, fill in the rest if they were so inclined.
However, a modern alternative is the interactive proof assistant, such as Coq or Isabelle. These software tools are like very strict math journal reviewers which refuse to accept a proof unless every single required detail is provided (or provided in such a manner that the proof assistant itself can fill in any gaps).
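To give a taste of what such a tool looks like, here is a tiny sketch in Lean (a proof assistant in the same family as Coq and Isabelle; the examples are mine, chosen for simplicity):

```lean
-- A statement the kernel accepts by pure computation: n + 0 reduces to n
-- by the definition of addition, so `rfl` (reflexivity) closes the proof.
theorem add_zero_small (n : Nat) : n + 0 = n := rfl

-- A statement with real content: commutativity of addition on the
-- naturals, here invoked from the standard library rather than reproved.
example (a b : Nat) : a + b = b + a := Nat.add_comm a b
```

If any step were missing or wrong, the assistant would reject the proof outright; there is no "the reader can fill in the details" at this level.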
Furthermore, for many of these assistants there are proofs of major mathematical theorems which have been posted online and verified to the satisfaction of the assistants. (For example, the Archive of Formal Proofs is an online archive of theorems proven using Isabelle.)
"Standard" set-theory combined with "classical" logic is supposedly "about" something, yet some of its axioms are explicitly "non-constructive", which implies that they have no computational meaning.
On the one hand, such non-constructive mathematics is simpler to do than explicitly constructive mathematics (because with non-constructive mathematics you believe in "more truths" to start with, which makes it easier to prove new theorems); on the other hand, the value of theorems proved within it is less certain.
As it happens, most of modern mathematics is supposedly based on this foundation of non-constructive set theory; however, in practice, much of the mathematics which actually matters for real-world applications can be proven using constructive methods only.
In my informal definition of formal deduction given above, I did not say anything about the order in which rules are to be applied, and this is because there is no specified order. At any point in a procedure for proving new theorems, one has to choose which deduction to apply next.
Much of the "skill" of human mathematicians consists of deciding which choice to make next. Since every choice from the list of available choices is valid, there are no rules at all as to which valid choice should be chosen. Mathematics is all about following rules, yet the practical doing of mathematics involves making choices, and there are no rules saying which choice should be made.
The caveat "in a practical sense" matters, because if we are prepared to be enormously patient, then it is always possible to define a deterministic enumeration of all possible deductions which is guaranteed to eventually enumerate every provable theorem in a mathematical system. One could even say that the "art" of mathematics consists of picking a finite amount of "good stuff" out of an infinite amount of "junk", and getting there sooner rather than later.
Which raises a further question: What is "good mathematics"? This is an open-ended subject in itself, and an interesting read on that subject is the essay "What is good mathematics?" by Terence Tao.