html: add option to set MaxBuf in Parse #214
This PR (HEAD: cbd34d5) has been imported to Gerrit for code review. Please visit Gerrit at https://go-review.googlesource.com/c/net/+/593635.
Message from Gopher Robot: Patch Set 1: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/593635.
Message from Gopher Robot: Patch Set 1: Congratulations on opening your first change. Thank you for your contribution! Next steps: Most changes in the Go project go through a few rounds of revision. This can be ... During May-July and Nov-Jan the Go project is in a code freeze, during which ... Please don’t reply on this GitHub thread. Visit golang.org/cl/593635.
Message from Ian Lance Taylor: Patch Set 2: Hold+1 (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/593635.
I encountered an issue when using html.Parse that triggers the following
call chain: html.Parse -> ParseWithOptions -> p.parse() -> p.tokenizer.Next()
-> readByte(). In the readByte() function, there's a logic block:
    if z.maxBuf > 0 && z.raw.end-z.raw.start >= z.maxBuf {
        z.err = ErrBufferExceeded
        return 0
    }
This check only takes effect when maxBuf is positive. However, html.Parse
gives no way to call SetMaxBuf, and there is no exported ParseOption that
sets it for ParseWithOptions either. As a result, when parsing a very large
HTML document, such as this page: http://vod.culture.ihns.cas.cn, the
tokenizer's buffer is never capped and memory usage can increase significantly.
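To make the effect of that check concrete, here is a toy sketch that uses only the standard library. `toyTokenizer`, `span`, and `consume` are hypothetical stand-ins, not the real html.Tokenizer; only the maxBuf comparison is copied from the snippet above, and placing it after the byte is consumed is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// ErrBufferExceeded stands in for the error of the same name in the html package.
var ErrBufferExceeded = errors.New("max buffer exceeded")

// span holds the start and end offsets of the token currently being read.
type span struct{ start, end int }

// toyTokenizer is a hypothetical, stripped-down stand-in for html.Tokenizer.
type toyTokenizer struct {
	data   []byte
	raw    span
	maxBuf int // 0 means "no limit", as in the check quoted above
	err    error
}

// readByte returns the next byte, or 0 with z.err set on EOF or when the
// current token has grown to maxBuf bytes.
func (z *toyTokenizer) readByte() byte {
	if z.raw.end >= len(z.data) {
		z.err = io.EOF
		return 0
	}
	x := z.data[z.raw.end]
	z.raw.end++
	if z.maxBuf > 0 && z.raw.end-z.raw.start >= z.maxBuf {
		z.err = ErrBufferExceeded
		return 0
	}
	return x
}

// consume treats the whole input as one token and reports how many bytes
// were read successfully before an error stopped the loop.
func consume(data string, maxBuf int) (int, error) {
	z := &toyTokenizer{data: []byte(data), maxBuf: maxBuf}
	n := 0
	for {
		z.readByte()
		if z.err != nil {
			return n, z.err
		}
		n++
	}
}

func main() {
	n, err := consume("abcdefgh", 0)
	fmt.Println(n, err) // no cap: the whole input is read before EOF

	n, err = consume("abcdefgh", 4)
	fmt.Println(n, err) // capped: reading stops with ErrBufferExceeded
}
```

With maxBuf left at 0 the guard never fires, which is the situation html.Parse callers are in today.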
To solve this problem, I wrote a function using reflection:
    func ParseOptionSetMaxBuf(maxBuf int) html.ParseOption {
        funcValue := reflect.MakeFunc(
            reflect.FuncOf([]reflect.Type{reflect.TypeOf((*html.ParseOption)(nil)).Elem().In(0)}, nil, false),
            func(args []reflect.Value) (results []reflect.Value) {
                parserValue := args[0].Elem()
                ...
    }
And then used it as follows:
    html.ParseWithOptions(bytes.NewReader(data), util.ParseOptionSetMaxBuf(len(data)*3))
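The reflection listing above is cut off, so the following is only a sketch of the general technique such a workaround depends on: reaching a pointer stored in an unexported struct field and calling an exported method on it. `Tokenizer`, `parser`, and `setMaxBuf` here are local, hypothetical stand-ins defined in the example itself, and this is not the author's actual function body.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// Tokenizer and parser are hypothetical stand-ins for the html package types.
type Tokenizer struct{ maxBuf int }

// SetMaxBuf mirrors the exported setter the PR description refers to.
func (z *Tokenizer) SetMaxBuf(n int) { z.maxBuf = n }

type parser struct {
	tokenizer *Tokenizer // unexported, like the field the workaround targets
}

// setMaxBuf takes a pointer to a struct with an unexported "tokenizer" field
// and calls SetMaxBuf on the tokenizer it holds.
func setMaxBuf(p any, maxBuf int) {
	field := reflect.ValueOf(p).Elem().FieldByName("tokenizer")
	// A Value obtained through an unexported field is read-only, so calling
	// Interface() on it would panic. Rebuilding the Value from the field's
	// address yields an unrestricted view of the same memory.
	tok := reflect.NewAt(field.Type(), unsafe.Pointer(field.UnsafeAddr())).Elem()
	tok.Interface().(*Tokenizer).SetMaxBuf(maxBuf)
}

func main() {
	p := &parser{tokenizer: &Tokenizer{}}
	setMaxBuf(p, 1<<20)
	fmt.Println(p.tokenizer.maxBuf) // prints 1048576
}
```

This relies on the unsafe package and on the private field name staying the same, which is why an exported option is the more robust request.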
In my testing, setting maxBuf to at least 1.04 times the body length was
enough for parsing to complete normally.
Therefore, would it be feasible to introduce a function similar to
ParseOptionEnableScripting that allows users to set MaxBuf?
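As a rough illustration of the request, the sketch below shows how such an option could sit next to ParseOptionEnableScripting in a functional-options design. The `parser` struct, its fields, and `newParser` are hypothetical stand-ins written for this example; a real implementation would presumably forward the value to the tokenizer's SetMaxBuf instead of storing it on the parser.

```go
package main

import "fmt"

// parser is a hypothetical stand-in for the html package's unexported parser.
type parser struct {
	scripting bool
	maxBuf    int
}

// ParseOption configures a parser; the shape func(*parser) is assumed here.
type ParseOption func(p *parser)

// ParseOptionEnableScripting is modeled on the existing option of that name.
func ParseOptionEnableScripting(enable bool) ParseOption {
	return func(p *parser) { p.scripting = enable }
}

// ParseOptionSetMaxBuf is the kind of option this request asks for.
func ParseOptionSetMaxBuf(maxBuf int) ParseOption {
	return func(p *parser) { p.maxBuf = maxBuf }
}

// newParser applies the options in order, the way a ParseWithOptions-style
// constructor would.
func newParser(opts ...ParseOption) *parser {
	p := &parser{scripting: true}
	for _, opt := range opts {
		opt(p)
	}
	return p
}

func main() {
	p := newParser(ParseOptionEnableScripting(false), ParseOptionSetMaxBuf(4096))
	fmt.Println(p.scripting, p.maxBuf) // prints: false 4096
}
```

Callers who do not pass the option keep the current unlimited behavior, so the addition would be backward compatible under these assumptions.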
Environment: