Bayesian networks utilize a graphical notation for expressing conditional independence and compactly representing joint distributions through a directed, acyclic graph of nodes. Each node has a conditional probability distribution based on its parent nodes, and the structure encodes causal relationships, which facilitates efficient information processing. The construction of these networks involves selecting an appropriate variable ordering and identifying parents, leading to more manageable representations compared to full joint distributions.