Attention Mechanisms In Artificial Intelligence

Authors: Mrs. M. Poongodi, Ms. M. Janani
Journal Name: International Journal of Science, Engineering and Technology
Volume: 14, Issue: 1
Abstract:
Transformers and Large Language Models (LLMs) have become foundational architectures in modern artificial intelligence, particularly in natural language processing and generative modeling. Their effectiveness rests on mathematical principles drawn from linear algebra, probability theory, optimization, and information theory. This work presents a mathematical perspective on the core components of transformer-based models, including vector embeddings, positional encoding, self-attention, and multi-head attention. The probabilistic formulation of language modeling, softmax-based output distributions, and the cross-entropy loss are examined to explain how these models learn and perform inference. Optimization techniques, including gradient-based methods and adaptive optimizers, are also discussed as means of training large-scale models efficiently. By emphasizing the mathematical structures that govern representation, learning, and generalization, this work provides a rigorous foundation for understanding how transformers and LLMs achieve scalability, robustness, and high predictive performance. It aims to help students, researchers, and educators develop a deeper theoretical understanding of contemporary language models.
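To make the core computation concrete, here is a minimal pure-Python sketch (an illustration, not the authors' code) of three components named in the abstract: sinusoidal positional encoding (the scheme used in the original Transformer), single-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V, and the cross-entropy loss of a predicted token distribution. It assumes no learned projection matrices (Q, K and V are taken directly from the position-augmented embeddings), no masking, and made-up toy inputs; all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (a is n x m, b is m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def sinusoidal_positions(seq_len: int, d_model: int) -> Matrix:
    """PE[pos][2i] = sin(pos / 10000^(2i/d_model)), PE[pos][2i+1] = cos(same angle)."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i plays the role of 2i in the formula
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V, with the softmax taken over each row of scores."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def cross_entropy(probs: List[float], target: int) -> float:
    """Negative log-probability assigned to the correct (target) token."""
    return -math.log(probs[target])


# Toy example: 3 tokens with 4-dimensional embeddings (made-up numbers).
x = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [1.0, 1.0, 0.0, 0.0]]
pe = sinusoidal_positions(len(x), len(x[0]))
h = [[a + b for a, b in zip(xr, pr)] for xr, pr in zip(x, pe)]

# Self-attention: queries, keys and values all come from h (no learned projections).
out = scaled_dot_product_attention(h, h, h)

# Cross-entropy of a hypothetical next-token distribution over a 3-token vocabulary.
loss = cross_entropy(softmax([2.0, 1.0, 0.1]), target=0)
```

Because each row of attention weights is a softmax output, every output row is a convex combination of the value rows. Multi-head attention repeats this computation over several learned projections of the input and concatenates the results.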

Indexed on Zenodo (Research Data Repository): https://zenodo.org/records/18698659
DOI: https://doi.org/10.5281/zenodo.18698659