In this study, we design a two-stream spatial-temporal network to address the low recognition rates and complex processing pipelines of traditional micro-expression recognition methods. We use a local binary pattern (LBP) to extract texture features from the SMIC and CASME II micro-expression databases and feed them into a combined 3D convolutional neural network (3D CNN) and convolutional long short-term memory (ConvLSTM) network, which extracts temporal and spatial information simultaneously. We add dropout to the model so that it learns richer, more diverse features while reducing the risk of overfitting. On the SMIC and CASME II micro-expression databases, the recognition rate reached 67.30% and 65.34%, respectively. Compared with existing recognition methods, the proposed model improves both the network training speed and the micro-expression recognition rate.
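The local binary pattern step mentioned above can be illustrated with a minimal, dependency-free sketch. The abstract does not say which LBP variant is used, so this assumes the basic 3x3, 8-neighbour operator applied to a single grayscale frame. The function name `lbp_3x3`, the clockwise neighbour ordering, and the `>=` threshold are illustrative choices, not details taken from the paper.

```python
def lbp_3x3(image):
    """Compute basic 8-neighbour LBP codes for the interior pixels of a frame.

    `image` is a 2D list of grayscale intensities. The result is a 2D list
    that is two rows and two columns smaller, because border pixels lack a
    full 3x3 neighbourhood and are skipped.
    """
    height, width = len(image), len(image[0])
    # Neighbour offsets (dy, dx), clockwise from the top-left pixel.
    # This ordering is an assumed convention; any fixed ordering works.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            center = image[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright
                # as the centre pixel.
                if image[y + dy][x + dx] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


if __name__ == "__main__":
    frame = [
        [10, 0, 0],
        [0, 5, 0],
        [0, 0, 0],
    ]
    print(lbp_3x3(frame))
```

In a pipeline like the one described, such a texture map would be computed for each frame of a clip before the frames are stacked and passed to the network. In practice a library implementation would normally replace this loop.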