Much of what we think we know about music comes from how we think music relates to other things, either in how other things interact with music, or in how similar music is to other things.
The general theme of the following propositions is that we typically get all or most of these relationships wrong, which means we end up thinking that we know a whole lot of things about music that actually we don't.
One theory of music is that music primarily serves some sexual purpose, for example, perhaps music exists so that men can seduce women by singing to them.
The problem with this hypothesis is that, for the most part, music and sex interact very weakly. Even when both happen at once, it is unclear that this is anything more than two pleasurable things coinciding. In some cases music interacts with emotion, and those emotions may include emotions that we have about the opposite sex.
In order for a few rock stars to get laid, a lot of other people have to do a lot of work just wanting to listen to music. It's not at all clear what evolutionary advantage there is to being one of millions of people who enjoy a particular rock band (and you might be a man or a woman), just so that four band members can get a few hundred easy lays.
An alternative hypothesis to the "music gets rock stars laid" theory is that it is intrinsically sexy to a woman to perceive that a man has an immediate and positive effect on how a large number of other people feel.
By this criterion, a male guitar player at a rock concert is sexy, but so is a male politician, or any other man who is seen to have direct and immediate power and influence over other people.
Music is sometimes considered one of the "arts", and it can be considered art if we define art to be any form of entertainment based on contrived sensory stimuli.
This can lead to the conclusion that "explaining" music is just a sub-problem of the larger problem of "explaining" art.
However, I think there are a number of significant differences between music and visual art which suggest that even if we can explain visual art, we are still no closer to explaining what music is:
Another group of the arts are what we might call the "story-telling" arts, which includes fictional books, much of television, and most movies.
Music is an almost essential component of movies, due mainly to its interaction with emotions (and perhaps even more so its interaction with fictional emotions), but music itself does not tell a story. (Yes, some songs "tell a story", but this would appear to be a totally non-essential aspect of music, as non-vocal music does not have any explicit story associated with it, and even many songs have lyrics which do not "tell a story".)
In short: humour tells us to take stuff less seriously, whereas music tells us to take stuff more seriously.
Humour can be understood as a mechanism for discovering certain types of mistakes in our beliefs. One consequence of this is that many forms of humour have a very limited usage, i.e. a joke is only funny once, which would never be the case for music (i.e. if you liked it once, you will almost certainly like it a second time and a third time, even if boredom may eventually set in).
A typical joke may lead us to believe something, and then suddenly show us that we were wrong to assume something leading to that belief.
Music, on the other hand, can encourage us to believe something (especially if there is an emotion associated with the thing to be believed in), or to assume that something is true and ... that's it. There is no denouement.
Music encourages us to believe more, and does not do anything to take that belief back away from us. Thus the use of music as a background to particular types of emotional scenes in movies, and also in situations where extra "believing" is required, like political gatherings, and religious ceremonies.
It is possible to combine music and humour, but there will always be a tension between the two opposed tendencies. My own personal observation is that the most intense music cannot also be humorous.
(A couple of examples of recent mainstream pop music with a humorous element: "The Lazy Song" by Bruno Mars, and "I'm on a Boat" by The Lonely Island, featuring T-Pain.)
Jokes are usually not funny if told a second time. Some types of humour are more repeatable, because they are based on assumptions about life that "resist" the tendency to not be taken seriously.
Stories are usually only truly entertaining the first time we hear them, or read them or see them.
Music, on the other hand, eventually becomes boring, especially if we hear the same song over and over again, but music can stand a lot more repetition than humour or story-telling. Furthermore, music performances typically contain substantial repetition within the one performance, e.g. a song may have a verse and a chorus, and both of these components are repeated several times within the performance.
Any theory which attempts to explain music in terms of expectation or "surprise" has to deal with this observation that music is relatively resistant to the development of boredom, and that whatever it is that makes us enjoy music, we will enjoy a song even if it is exactly the same song we have heard many times before, and even if it is exactly the same recording of the song.
Yes, people do go to concerts to listen to music with other people, and yes, people do perform music in groups.
But equally, we can play music to ourselves, or listen to music completely by ourselves, if we have a stereo or something that plays music without requiring the presence of a human performer.
It's not clear that music is any more necessarily social than, for instance, eating. A substantial number of the world's meals may be prepared by people in groups and/or eaten by people in groups, but there is nothing essentially social about cooking and there is nothing essentially social about eating. Socialisation is certainly not the purpose of either cooking or eating.
Unfortunately it is almost impossible to determine, from direct observation, whether or not animal music exists, because we have no objective definition of what music is, and there is no way to share subjective experiences of music (or anything else for that matter) with non-human animals.
So I will give some reasons, based on other hypotheses advanced here, as to why animal music probably doesn't exist:
That music has some relationship to emotion is reasonably obvious to anyone who subjectively experiences music (which is most of us).
But it is not at all clear precisely what this relationship is. Much has been written by those attempting to pin this relationship down, but without coming to any definite conclusion. So I will advance a hypothesis.
One circumstance where music acts strongly on emotion is when a person thinks about something that has happened, or which they have previously discovered is going to happen.
This circumstance is distinct from the actual situation where the person first learned of the event or situation in question.
To give a simple example: if you are going to break up with your girlfriend, who loves you, you would not play sad music when you were about to tell her, but probably she would play sad music to herself, later, when she was ruminating on the fact that you had broken up with her.
A different circumstance where music acts on emotion is when the emotion is fictional emotion (such as in a movie), which is also, due to the fictional nature of the emotion, somewhat removed from the direct original experience of the emotion. (Note: the use of music is not limited to fictional emotion, as music is also often used in documentaries to accentuate emotional responses to events and situations described in the documentary.)
Music does seem similar to speech in various ways, but it is not at all clear what the relationship between music and speech actually is.
I will elaborate on some of the details of this relationship in the following items.
Two major aspects of music and speech that relate are:
Another aspect plausibly found in both is tree-like structuring with the tree branching mostly binary (i.e. one branch always branches into two).
In particular, musical rhythm is very regular, and is usually based on nested regular beats, e.g. 4 beats to a bar, with 4 beats grouped into 2 groups of 2, and individual beats optionally divided further into halves and quarters (which in this case results in a total of 5 different regular beats with beat lengths of 1 bar, 1/2 bar, 1/4 bar, 1/8 bar and 1/16 bar).
And in the case of melody, musical pitches are restricted to values on a scale, whereas speech pitch will typically be a continuously varying function of time. (Although musical melody will often be approximately continuous, if we regard any move of one step on the scale as a continuous variation.)
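The nested regular beats described above can be listed explicitly. The following is a minimal Python sketch, assuming strictly binary subdivision of a bar; the function names `nested_beat_lengths` and `onsets_in_bar` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def nested_beat_lengths(levels):
    """Beat lengths, as fractions of a bar, for `levels` nested regular beats.

    Assumption: each level halves the previous one (binary subdivision).
    """
    return [Fraction(1, 2 ** k) for k in range(levels)]


def onsets_in_bar(beat_length):
    """Positions within one bar at which a regular beat of this length falls."""
    count = int(1 / beat_length)
    return [k * beat_length for k in range(count)]


if __name__ == "__main__":
    # The five regular beats from the 4-beats-to-a-bar example.
    print([str(b) for b in nested_beat_lengths(5)])
    # prints ['1', '1/2', '1/4', '1/8', '1/16']
```

Under this assumption every onset of a slower beat is also an onset of each faster beat, which is one way of stating what "nested" means here.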
Binary structures in music are typically balanced, i.e. if one branch has sub-branches to a depth of N, then so will the second branch. Within normal speech, the most obvious branching structure is grammatical, and this has no requirement at all to be balanced. If anything, the second branch will on average have more branches than the first branch, as this reduces processing requirements for the listener.
This is more of a hypothesis than a confirmed observation, as it requires a certain amount of interpretation as to what constitutes a correspondence between a musical aspect and a speech aspect.
For example, there are no scales in speech, but scales can be interpreted as a constraint which exists in the musical aspect of melody, and which does not exist in the corresponding speech aspect of melody.
Particular aspects of music which may raise difficulty with this hypothesis are harmony, rhyming and repetition of phrases.
Harmony is generally defined to be the simultaneous occurrence of different pitch values in music, and this is one thing that obviously does not occur in normal speech. However, we can define the perception of harmony to be the perception of relationships (particularly consonant relationships) between different pitch values, whether or not those different pitch values occur simultaneously or at different times.
By this definition, harmony is primarily a property of a single melody, in that harmonic relationships are perceived between different pitch values occurring at different times, but is perceived in such a manner that it can be perceived in relationships between different pitch values occurring simultaneously (and more strongly in that case), even though such simultaneous pitch values never occur in normal speech.
Rhyming is something that does not occur in normal speech, but does occur very consistently in vocal music (at least it occurs in almost all modern mainstream popular music).
However, it can be observed that rhyming almost always occurs in a manner which helps to define the binary structure of music, which suggests that it occurs as part of the perception of that binary structure (perhaps to emphasise the intended structure of the music), and is thus not a separate primary aspect of music in itself.
Repetition of phrases is an obvious feature of music (and here I am referring to repetition of phrases within a melody, and not to repetition of a beat, or repetition of a whole melody within a performance).
Such repetition does not occur very much in normal speech. For example, English is very intolerant of repetition in normal speech, although it can be allowed informally for particular effect (for example: "he was talking, talking, talking"). Other languages may allow repetition of words for certain types of emphasis.
One feature of the perception of repetition of phrases in normal speech is that if it does occur, it is certainly noticed. This suggests that it is always available as a component of language, even if many languages choose not to use it.
It is also possible that the perception of speech as a sequence of words occurs by a basic mechanism that does not reliably distinguish repetitions, because it calculates state as a function of the previous n words, and therefore some additional special mechanism is required to keep track of repetitions, should they occur. (This is a somewhat open-ended topic for discussion and speculation, so I won't take it any further just here.)
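To illustrate why a state computed only from the previous n words cannot reliably distinguish repetitions, here is a minimal Python sketch. It assumes, purely for illustration, that the "state" is literally the tuple of the last n words; the function name `final_state` is hypothetical, and real speech perception is certainly not this simple.

```python
def final_state(words, n):
    """State after hearing `words`, if state depends only on the last n words."""
    return tuple(words[-n:])


if __name__ == "__main__":
    twice = "he was talking talking".split()
    thrice = "he was talking talking talking".split()
    # With n = 2, both utterances leave the listener in the same state,
    # so the number of repetitions is lost.
    print(final_state(twice, 2) == final_state(thrice, 2))
```

In this toy model, telling two repetitions from three would need either a larger n or some separate counting mechanism, which is the point made above.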
To render a reasonable version of most melodies requires just one consonant and one vowel, but no more than that. Thus a tune can be sung in recognisable form with all the syllables sung as "la", or, "doo". At least one vowel is required, to sing the pitch values, and although one can try to sing with just one vowel sound, using a consonant makes it much easier to clearly define the rhythm of the melody (in fact if you use just one vowel, you will end up defining the beginning of each syllable with a "break" sound, which is almost a consonant anyway, and actually is a consonant in some languages).
This suggests that music (or at least music perception), might have "evolved" at a time when human language actually did have just one vowel and one consonant.
These two symmetries are time scaling, under which the perception of rhythm is invariant, and pitch translation, under which the perception of melody is invariant. (Technically pitch translation is a form of time-scaling, but the perceptual machinery involved is quite distinct, due to the vastly different time scales involved, so it is simplest to regard them as distinct symmetries with distinct purposes and distinct implementations.)
Time scaling is a functional symmetry of speech perception, because it means that the rhythms of speech can be perceived the same in speech that may be faster or slower.
Pitch translation is a functional symmetry of speech perception, because it means that the melody of speech can be perceived the same in the speech of low-pitched and high-pitched speakers.
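The two symmetries can be made concrete with a small Python sketch. It assumes rhythm is represented as a list of onset times and melody as a list of pitches in semitones (an illustrative representation, not one taken from the text); the function names are hypothetical. Time scaling multiplies every onset time by a constant, and pitch translation adds a constant to every pitch.

```python
from fractions import Fraction


def duration_ratios(onsets):
    """Ratios between successive inter-onset gaps; unchanged by time scaling."""
    gaps = [Fraction(b) - Fraction(a) for a, b in zip(onsets, onsets[1:])]
    return [later / earlier for earlier, later in zip(gaps, gaps[1:])]


def pitch_intervals(semitones):
    """Steps between successive pitches; unchanged by pitch translation."""
    return [b - a for a, b in zip(semitones, semitones[1:])]


if __name__ == "__main__":
    rhythm = [0, 2, 3, 4, 6]
    faster = [Fraction(t) * Fraction(2, 3) for t in rhythm]   # time scaling
    print(duration_ratios(rhythm) == duration_ratios(faster))

    melody = [60, 62, 64, 60]
    higher = [p + 5 for p in melody]                          # pitch translation
    print(pitch_intervals(melody) == pitch_intervals(higher))
```

Both comparisons hold because the quantities compared (gap ratios and pitch steps) are exactly the ones the respective transformation leaves alone.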
The calibration of these symmetries requires the occurrence of some naturally occurring relationship which is readily observable (by a plausible biological mechanism), and which is invariant under the relevant symmetry.
In the case of pitch translation invariance, the most plausible candidate for a calibration method is via the consonant relationships between the frequencies of harmonics in the vowel sounds that occur in normal human speech, and these same consonant relationships are significant in music (among other things, they are significant in harmony). (For evidence that the perception of consonant intervals is a function of the perception of harmonics in vowel sounds in normal speech, see "The Statistical Structure of Human Speech Sounds Predicts Musical Universals".)
In the case of time scaling invariance, the most plausible candidate for calibration is the relationship between beat lengths that are very simple integer ratios, particularly 1:2 or 1:3. Such relationships are significant in music in that they are the ratios between different note lengths, and also the ratios between different regular beats that occur in the same time signature. (Unfortunately there is no peer-reviewed scientific research to support this hypothesis, but a plausible model of calibration is that interval 2x can be determined to be twice the length of interval x by observing the occurrence of three beats separated by a time of length x, given that the first and third beats are also separated by a time of length 2x.)
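The calibration model in the parenthesis above can be sketched in Python as follows. This is only an illustration under stated assumptions: beats are given as a list of onset times, "separated by a time of length x" is checked with a small tolerance, and the function name `doubled_intervals` is hypothetical.

```python
def doubled_intervals(onsets, tolerance=1e-9):
    """Find (x, 2x) observations from consecutive triples of beats.

    For each three consecutive beats whose two gaps are equal (within
    `tolerance`), report the gap x together with the span from the first
    beat to the third, which is then an observed interval of length 2x.
    """
    observations = []
    for first, second, third in zip(onsets, onsets[1:], onsets[2:]):
        gap_a = second - first
        gap_b = third - second
        if abs(gap_a - gap_b) <= tolerance:
            observations.append((gap_a, third - first))
    return observations


if __name__ == "__main__":
    # Beats at 0.0, 0.5 and 1.0 are evenly spaced; the beat at 2.0 is not.
    print(doubled_intervals([0.0, 0.5, 1.0, 2.0]))
    # prints [(0.5, 1.0)]
```

Each reported pair is one naturally occurring example of an interval and its double, which is the kind of observation the hypothesis says could be used for calibration.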
Almost all speech occurs with the intention of communicating information from the speaker to the listener, although there may be certain stereotyped speech that occurs in certain circumstances that does not communicate anything substantial to the listener.
However, for practical purposes, music never has the intention to communicate. It may seem that music is communicating something, given that performers are performing (and maybe singing words), and the listeners, if any, are listening. Yet, it is hard to argue that any useful communication is occurring in most of the circumstances in which music is performed or listened to.
Typically a musical item is close to an exact duplicate of a musical item which the performers have previously performed, and which the listeners have previously listened to. So if there is any information contained in the performance of the music, it cannot be much more than the amount of information required to identify the music out of all the musical items known to the listeners, and the rest of the "information" (contained in the specification of the timings and pitches of the note values for that particular musical item) is essentially redundant (because the listeners are already highly familiar with that musical item).
Even in circumstances where the musicians improvise, there is little reason to believe that the listeners remember the additional information implicit in the improvisation, or do anything useful with that information.
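The redundancy argument above can be put in rough numerical terms. The Python sketch below is a back-of-envelope illustration only: all the numbers are made-up assumptions, the note-by-note estimate crudely treats every note as an independent, equally likely choice, and the function names are hypothetical.

```python
import math


def identification_bits(known_items):
    """Bits needed to pick one item out of `known_items` equally likely items."""
    return math.log2(known_items)


def specification_bits(note_count, pitch_choices, duration_choices):
    """Crude upper estimate of bits needed to spell out every note of an item."""
    return note_count * math.log2(pitch_choices * duration_choices)


if __name__ == "__main__":
    # Hypothetical listener who knows 1024 songs.
    print(identification_bits(1024))          # prints 10.0
    # Hypothetical 200-note melody, 12 pitch values and 4 note lengths per note.
    print(specification_bits(200, 12, 4) > identification_bits(1024))
```

On these assumed figures, naming a familiar song takes far fewer bits than specifying its notes, which is the sense in which the note-level "information" is redundant for a listener who already knows the song.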