This paper presents a framework for imitating the principal investor's behavior in order to price and hedge options optimally. We construct a non-deterministic Markov decision process to model stock price changes driven by the principal investor's decision making. However, the low signal-to-noise ratio and instability inherent in equity markets make it difficult both to determine the state transition (the stock price change) that follows an action (the principal investor's decision) and to decide on an action based on the current state (the spot price). To address these challenges, we use a Bayesian deep neural network to compute the predictive distribution of the state transition induced by an action. Additionally, instead of exploring a state-action relationship to formulate a policy, we seek an episode-based visible-hidden state-action relationship to probabilistically imitate the principal investor's successive decision making. Unlike conventional option pricing, which employs analytical stochastic processes or time series analysis to model and sample the underlying stock price movements, our algorithm simulates stock price paths by imitating the principal investor's behavior, which requires no preset probability distribution and fewer predetermined parameters. Finally, the optimal option price is learned by reinforcement learning so as to maximize the cumulative risk-adjusted return of a dynamically hedged portfolio over the simulated price paths.