To accurately predict future positions of different agents in traffic scenarios is crucial for safely deploying intelligent autonomous systems in the real-world environment. However, it remains a challenge due to the behavior of a target agent being affected by other agents dynamically and there being more than one socially possible paths the agent could take. In this paper, we propose a novel framework, named Dynamic Context Encoder Network (DCENet). In our framework, first, the spatial context between agents is explored by using self-attention architectures. Then, the two-stream encoders are trained to learn temporal context between steps by taking the respective observed trajectories and the extracted dynamic spatial context as input. The spatial-temporal context is encoded into a latent space using a Conditional Variational Auto-Encoder (CVAE) module. Finally, a set of future trajectories for each agent is predicted conditioned on the learned spatial-temporal context by sampling from the latent space, repeatedly. DCENet is evaluated on one of the most popular challenging benchmarks for trajectory forecasting Trajnet and reports a new state-of-the-art performance. It also demonstrates superior performance evaluated on the benchmark inD for mixed traffic at intersections. A series of ablation studies is conducted to validate the effectiveness of each proposed module. Our code is available at https://github.com/wtliao/DCENet.