“When we don’t have linear separable set of training data like the example above (in real life scenario most of the data-set are quite complicated), the Kernel trick comes handy. The idea is mapping the non-linear separable data-set into a higher dimensional space where we can find a hyperplane that can separate the samples.”
I think this is a little bit confusing. While it is true that the kernel trick maps the non-linearly separable data-set into a higher-dimensional space, I believe this is not its main idea. The main idea of the kernel trick is to make the SVM algorithm computationally feasible.
You can of course map the non-linearly separable data-set into a higher-dimensional space explicitly and compute the dot products there. The kernel function skips all of these explicit computations, as you have described.
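To make this concrete, here is a minimal sketch of the point I mean, using NumPy and a degree-2 polynomial kernel as the example (the names `phi` and `poly_kernel` are just illustrative):

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for a 2-D input:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def poly_kernel(x, y):
    """Degree-2 polynomial kernel: K(x, y) = (x . y)^2.
    Gives the same value as phi(x) . phi(y), but without ever
    constructing the higher-dimensional vectors."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

# Both lines print 121.0; the kernel skips the explicit mapping.
print(np.dot(phi(x), phi(y)))
print(poly_kernel(x, y))
```

For a degree-d polynomial kernel on n-dimensional inputs, the explicit feature space grows combinatorially with d and n, so building `phi(x)` quickly becomes impractical; evaluating the kernel directly is exactly the computation the kernel trick saves.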