Disclaimer: I have no idea what is the state of the art in the environmental map sampling. In fact, I have very little knowledge about this topic. So this will not be complete answer but I will formulate the problem mathematically and analyze it. I do this mainly for myself, so I make it clear for my self but I hope that OP and others will find it useful.

$$
\newcommand{\w}{\omega}
$$

We want to calculate direct illumination at a point i.e. we want to know the value of the integral
$$
I =\int_{S^2} f(\omega_i,\omega_o,n) \, L(\omega_i) \,(\omega_i\cdot n)^+ d\omega_i
$$
where $f(\omega_i,\omega_o,n)$ is BSDF function(I explicitly state dependance on the normal which will be useful later), $L(\omega_i)$ is radiance of environmental map and $(\omega_i \cdot n)^+$ is the cosine term together with the visibility(that what the $+$ is for) i.e. $(\omega_i \cdot n)^+=0$ if $(\omega_i \cdot n)<0$

We estimate this integral by generating $N$ samples $\omega_i^1,\dots,\omega_i^N$ with the respect to the probability density function $p(\omega_i)$, the estimator is
$$
I \approx \frac1N \sum_{k=1}^N \frac{f(\omega_i^k,\omega_o,n) \, L(\omega_i^k) \,(\omega_i^k\cdot n)^+ }{p(\omega_i^k)}
$$

The question is: How do we choose the pdf $p$ such that we are able to generate the samples in acceptable time and the variance of the above estimator is reasonably small.

~~Best~~ method Pick $p$ proportional to the integrand
$$
p(\omega_i) \sim f(\omega_i,\omega_o,n) \, L(\omega_i) \,(\omega_i\cdot n)^+
$$
But most of the times it is very expensive to generate a sample according to this pdf, so it is not useful in practice.

Methods suggested by OP:

**Method one**: Choose $p$ proportional to the cosine term
$$
p(\omega_i) \sim (\omega_i\cdot n)^+
$$
**Method two**: Choose $p$ proportional to the EM
$$
p(\omega_i) \sim L(\omega_i)
$$

Based on the names of mentioned papers I can partially guess what they do(unfortunately I do not have the time and energy to read them right now). But before discussing what they most probably do, let's talk about power series a little bit :D

If we have a function of one real variable e.g. $f(x)$. Then if it is well behaved then it can be expanded into a power series
$$
f(x) = \sum_{k=0}^\infty a_k x^k
$$
Where $a_k$ are constants. This can be used to approximate $f$ by truncating the sum at some step $n$
$$
f(x) \approx \sum_{k=0}^n a_k x^k
$$
If $n$ is sufficiently high then the error is really small.

Now if we have function in two variables e.g. $f(x,y)$ we can expand it only in the first argument
$$
f(x,y) = \sum_{k=0}^\infty b_k(y) \, x^k
$$
where $b_k(y)$ are functions only in $y$. It can be also expanded in both arguments
$$
f(x,y) = \sum_{k,l=0}^\infty c_{kl} x^k y^l
$$
where $c_{kl}$ are constants. So function with real arguments can be expanded as sum of powers of that argument. Something similar can be done for functions defined on sphere.

Now, let's have a function defined on sphere e.g. $f(\omega)$. Such a function can be also expanded in similar fashion as function of one real parameter
$$
f(\omega) =\sum_{k=0}^\infty \alpha_k S_k(\omega)
$$
where $\alpha_k$ are constants and $S_k(\omega)$ are spherical harmonics. Spherical harmonics are normally indexed by two indices and are written as function in spherical coordinates but that is not important here. The important thing is that $f$ can be written as a sum of some known functions.

Now function which takes two points on sphere e.g. $f(\omega,\omega')$ can be expanded only in its first arguments
$$
f(\omega,\omega') = \sum_{k=0}^\infty \beta_k(\omega') \, S_k(\omega)
$$
or in both its arguments
$$
f(\omega,\omega') = \sum_{k,l=0}^\infty \gamma_{kl} \, S_k(\omega)S_l(\omega')
$$

So how is this all useful?

I propose the **CMUNSM**(Crazy mental useless no sampling method):
Lets assume that we have expansions for all the function i.e.
\begin{align}
f(\omega_i,\omega_o,n) &= \sum_{k,l,m=0}^\infty \alpha_{klm} S_k(\omega_i)S_l(\omega_o) S_m(n) \\
L(\omega_i ) &= \sum_{n=0}^\infty \beta_n S_n(\omega) \\
(\omega_i\cdot n)^+ &= \sum_{p,q=0}^\infty \gamma_{pq} S_p(\omega_i)S_q(n)
\end{align}
If we plug this into the integral we get
$$
I = \sum_{k,l,m,n,p,q=0}^\infty \alpha_{klm} \beta_n \gamma_{pq} S_l(\omega_o) S_m(n) S_q(n) \int_{S^2} S_k(\omega_i) S_n(\omega) S_p(\omega_i) d\omega_i
$$

Actually we no longer need Monte Carlo because we can calculate values of the integrals $\int_{S^2} S_k(\omega_i) S_n(\omega) S_p(\omega_i) d\omega_i$ beforehand and then evaluate the sum(actually approximate the sum, we would sum only first few terms) and we get desired result.

This is all nice **but** we might not know the expansions of BSDF or environmental map or the expansions converge very slowly therefore we would have to take a lots of terms in the sum to get reasonably accurate answer.

So the idea is not to expand in all arguments. One method which might be worth investigating would be to ignore BSDF and expand only the environmental map i.e.
$$
L(\omega_i) \approx \sum_{n=0}^K \beta_n S_n(\omega_i)
$$
this would lead to pdf:
$$
p(\omega_i) \sim \sum_{n=0}^K \beta_n S_n(\omega_i) (\omega \cdot n)^+
$$

We already know how to do this for $K=0$, this is nothing but the **method one**. My guess is, it is done in one of the papers for higher $K$.

Further extensions. You can expand different functions in different arguments and do similar stuff as above. Another thing is, that you can expand in different basis, i.e. do not use spherical harmonics but different functions.

So this is my take on the topic, I hope you have found it at least a little bit useful and now I'm off to GoT and bed.