I wrote an article discussing the applications of t-distributed stochastic neighbor embedding (t-SNE) as an improved technique for dimensionality reduction and cluster association compared to the overused Principal Component Analysis (PCA). Although t-SNE is the emerging hot topic in the world of environmental forensics, some data scientists already consider it a dated technique.
“Novel” is a very subjective term because different fields, such as environmental science and data science, innovate on very different timelines. There is also a large knowledge gap that formal education in one specific field cannot fill on its own. I remember being blown away in my first year of graduate school by a chemometrics course that introduced the concept of multivariate dimensionality reduction using PCA. Fast forward to now and you’d be hard-pressed to convince me to use PCA to show any similarity association in your data, knowing what I now know about how it can misrepresent multivariate variability.
As scientists with rigorous formal education, we often confine ourselves to our own expertise as the source of innovation rather than exploring alternative sources of information. To excel as a subject matter expert, it is imperative to be open to information from outside your own small technical niche. Confinement in your own field can be a dangerous route to follow. Writing this reminded me of a great example of such confinement leaving experts in the dark about innovation. In a recent webinar between the US Air Force and Elon Musk (founder of SpaceX and Tesla, and eccentric billionaire), the panel was discussing recent innovations in aviation. Halfway through the discussion, Elon stated that his ventures had already implemented these technologies over a decade ago and that the infrastructure to make all these machines autonomous (with no need for a pilot) was already available. He added that all jobs could soon be obsolete if they implemented his technologies. For the next few moments you could hear crickets in the audience. I bring this up because it reflects what can happen when we are not aware of what lies outside our technical niches.
Let's dive into the technical reasons why we should move on from t-SNE (at this point PCA should be considered primitive) and embrace Uniform Manifold Approximation and Projection (UMAP) as the staple for dimensionality reduction. We should also accept that, although UMAP is currently the most up-to-date technique, a time will come when we need to explore the newer methods in circulation.
Disadvantages of t-SNE:
https://jlmelville.github.io/uwot/umap-examples.html
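If you want to try the comparison on your own data, the following is a minimal Python sketch rather than a definitive workflow. It assumes NumPy for a linear PCA baseline, and the third-party scikit-learn and umap-learn packages for t-SNE and UMAP. The synthetic data set, the function names (`make_clusters`, `pca_embed`, `tsne_embed`, `umap_embed`) and the parameter values are illustrative assumptions, not settings taken from any of our projects.

```python
import numpy as np


def make_clusters(n_per_cluster=50, n_features=10, n_clusters=3, seed=0):
    """Hypothetical stand-in for a real sample-by-variable data matrix."""
    rng = np.random.default_rng(seed)
    # Well-separated cluster centres in n_features dimensions.
    centers = rng.normal(scale=5.0, size=(n_clusters, n_features))
    X = np.vstack(
        [c + rng.normal(size=(n_per_cluster, n_features)) for c in centers]
    )
    labels = np.repeat(np.arange(n_clusters), n_per_cluster)
    return X, labels


def pca_embed(X, n_components=2):
    """Linear baseline: project centred data onto its top principal axes."""
    Xc = X - X.mean(axis=0)
    # Singular values come back sorted from largest to smallest.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]


def tsne_embed(X, perplexity=30.0, random_state=42):
    """Nonlinear embedding with t-SNE (assumes scikit-learn is installed)."""
    from sklearn.manifold import TSNE  # lazy import: optional dependency

    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=random_state)
    return tsne.fit_transform(X)


def umap_embed(X, n_neighbors=15, min_dist=0.1, random_state=42):
    """Nonlinear embedding with UMAP (assumes umap-learn is installed)."""
    import umap  # lazy import: optional dependency

    reducer = umap.UMAP(
        n_components=2,
        n_neighbors=n_neighbors,
        min_dist=min_dist,
        random_state=random_state,
    )
    return reducer.fit_transform(X)


# The PCA baseline runs with NumPy alone.
X, labels = make_clusters()
pca_coords, pca_var = pca_embed(X)
print("Fraction of variance kept by two principal components:", pca_var.sum())
```

With the optional packages installed, `tsne_embed(X)` and `umap_embed(X)` each return a two-column array that can be plotted next to `pca_coords` and coloured by `labels`. The values of `perplexity`, `n_neighbors` and `min_dist` strongly affect how the embeddings look, so treat the defaults here as starting points only.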
At Chemistry Matters we aim to stay current with the evolution of data science within the field of environmental forensics. Not only do we keep up with the trends in our own discipline, but we also seek to adopt the trends set by the “popular kids” and pioneer a data-driven focus in the field of environmental forensics.
References:
https://towardsdatascience.com/how-exactly-umap-works-13e3040e1668