Java library for parsing text files

It supports left-recursive productions. It can automatically generate an AST.

Rekex is a new PEG parser generator for Java. It unifies grammar definition and AST construction in the most natural and intuitive way, leading to the simplest approach to writing parsers.

Rekex is a new parser generator with a novel approach that flips writing a parser on its head. With traditional parser generators you write a grammar and then the generated parser produces a parse tree. One issue with this approach is that the parse tree is rarely what you want.

So, you need to post-process the parse tree to create a data structure that fits your program. This can be a long process in itself, particularly if you are dealing with a large grammar. You might be forced to optimize the grammar for parsing performance, but this leads to a convoluted parse tree.

You then have to spend more time creating a sensible AST for your end users. The authors of Rekex created this parser generator to overcome this flaw. Rekex reverses the process: you design the parser starting from the desired AST. You can read more about the whole approach in the official introduction to Rekex. Aside from this intro, the documentation is quite thorough, with a guide and a specification.

There are even a few examples in their repository and an explanation of the subprojects used by the library. This is crucial given the novel approach of this library, so that every user can understand whether it is a good fit for them. In practice, you write the parser using a well-defined structure and conventions. In other words, the grammar is Java 17 code, made a bit more complicated by annotations.
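To make the "start from the desired AST" idea concrete, here is a minimal sketch in plain Java. It does not use Rekex or its annotations: the types Expr, Num and Add and the hand-written parse method are hypothetical stand-ins, shown only to illustrate that the data types you want at the end are declared first and the parser produces them directly.

```java
// Hypothetical AST for sums such as "1+2+3"; these names are illustrative, not Rekex API.
interface Expr {}
record Num(int value) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}

final class AstFirstSketch {
    // Hand-written stand-in for the parser that a generator would derive from the AST types.
    static Expr parse(String input) {
        String[] parts = input.split("\\+");
        Expr result = new Num(Integer.parseInt(parts[0].trim()));
        for (int i = 1; i < parts.length; i++) {
            // Build a left-leaning tree: ((1 + 2) + 3)
            result = new Add(result, new Num(Integer.parseInt(parts[i].trim())));
        }
        return result;
    }

    // The AST is directly usable by the program, with no parse-tree post-processing step.
    static int eval(Expr e) {
        if (e instanceof Num n) return n.value();
        if (e instanceof Add a) return eval(a.left()) + eval(a.right());
        throw new IllegalArgumentException("unknown node: " + e);
    }
}
```

With these definitions, AstFirstSketch.parse("1+2") yields an Add node holding two Num nodes, which eval reduces to 3.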

Parser combinators allow you to create a parser simply with Java code, by combining different pattern matching functions that are equivalent to grammar rules. They are generally considered best suited for simpler parsing needs. Given that they are just Java libraries, you can easily introduce them into your project: you do not need any specific generation step and you can write all of your code in your favorite Java editor.

Their main advantage is that they can be integrated into your traditional workflow and IDE. In practice this means that they are very useful for all the little parsing problems you find. If the typical developer encounters a problem that is too complex for a simple regular expression, these libraries are usually the solution.

In short, if you need to build a parser, but you do not actually want to, a parser combinator may be your best option.

Jparsec is a port of Haskell's parsec library. Parser combinators are usually used in one phase, that is to say without a lexer. This is simply because it can quickly become too complex to manage all the combinator chains directly in the code.

Having said that, jparsec has a special class to support lexical analysis. It does not support left-recursive rules, but it provides a special class for the most common use case: managing the precedence of operators.
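To show why operator precedence is the usual pain point when left recursion is unavailable, here is a small hand-written sketch of precedence climbing in plain Java. It is not jparsec code and does not reproduce jparsec's class; the class name PrecedenceSketch and its methods are assumptions made for illustration. A loop over operators replaces the left-recursive rule.

```java
final class PrecedenceSketch {
    private final String src;
    private int pos = 0;

    private PrecedenceSketch(String src) { this.src = src.replace(" ", ""); }

    // Evaluates integer expressions with + - * / and parentheses.
    static int evaluate(String src) {
        PrecedenceSketch p = new PrecedenceSketch(src);
        int value = p.expr(0);
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return value;
    }

    // Precedence of each binary operator; -1 means "not an operator".
    private static int precedence(char op) {
        switch (op) {
            case '+': case '-': return 1;
            case '*': case '/': return 2;
            default: return -1;
        }
    }

    // Consumes operators whose precedence is at least minPrec, iterating instead of left-recursing.
    private int expr(int minPrec) {
        int left = atom();
        while (pos < src.length()) {
            char op = src.charAt(pos);
            int prec = precedence(op);
            if (prec < 0 || prec < minPrec) break;
            pos++;
            int right = expr(prec + 1); // prec + 1 makes the operator left-associative
            left = apply(op, left, right);
        }
        return left;
    }

    // A number or a parenthesized expression.
    private int atom() {
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++;
            int value = expr(0);
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("')' expected at " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("number expected at " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }

    private static int apply(char op, int a, int b) {
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            default: return a / b;
        }
    }
}
```

For instance, PrecedenceSketch.evaluate("2+3*4") gives 14 and PrecedenceSketch.evaluate("10-4-3") gives 3, so both precedence and left associativity hold without any left-recursive rule.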

The library is quite popular, but it does not seem to be actively maintained anymore (the last edit was at the beginning of …).

The objective of parboiled is to provide an easy to use and understand way to create small DSLs in Java. It puts itself in the space between a simple bunch of regular expressions and an industrial-strength parser generator like ANTLR. A parboiled grammar can include actions with custom code, included directly in the grammar code or through an interface.

Parboiled works a bit like a cross between a parser combinator and a parser generator. You create rules in code, with ready-to-use methods like Sequence or Optional, just like a parser combinator. However, the end result is a parser class that you are supposed to use like a generated parser.

Parboiled is not suited to creating individually used rules; you use it to parse a coherent language. It does not build an AST for you, but it provides a parse tree and some classes that make building one easier. It sounds quite appropriate to the project objective, and some of our readers find the approach better than a straight AST.

The documentation is very good: it explains features, shows examples, and compares the ideas behind parboiled with the other available options. There are some example grammars in the repository, including one for parsing Java 6 itself.

It is used by several projects, including important ones like neo4j.

PetitParser combines ideas from scannerless parsing, parser combinators, parsing expression grammars and packrat parsers to model grammars and parsers as objects that can be reconfigured dynamically.

PetitParser also sits between a parser combinator and a traditional parser generator. All the information is written in the source code, but the source code is divided into two files. In one file you define the grammar, while in the other one you define the actions corresponding to the various elements.

The idea is that it should allow you to dynamically redefine grammars. While it is smartly engineered, it is debatable whether it is also smartly designed. Given that it departs from the usual design of parser combinators, it can be confusing for parsing experts. For example, it uses the function star to indicate zero or more elements, where other libraries use many. In short, it can sometimes feel like an experiment. You can see that the example JSON grammar is lengthier than one would expect.

An excerpt from the example parser definition file that defines the actions for the rules for JSON.

… Duponcheel, and draws inspiration from various parsers in the Haskell world, as well as the ParsecJ library. The library wants to provide a simple internal Domain Specific Language to express grammars. In this example from the documentation you can see how it is possible to combine the parser for integers, intr, and the one for characters, chr, to parse a simple sum expression.

The expression is also evaluated using the map function to call the normal sum function of Java for integers.
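The library's own example is not reproduced here. As a rough illustration of the same idea, the sketch below hand-rolls a tiny combinator core in plain Java. The names intr, chr and map are borrowed from the description above, but the Parser interface, the Result record and the then method are assumptions made for this sketch, not the library's actual API.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Function;

final class CombinatorSketch {
    // A successful parse: the value produced and the input that is left over.
    record Result<T>(T value, String rest) {}

    // A parser maps input to an optional result; empty means the parser failed.
    interface Parser<T> {
        Optional<Result<T>> parse(String input);

        // Transforms the parsed value and leaves the remaining input untouched.
        default <R> Parser<R> map(Function<T, R> f) {
            return in -> parse(in).map(r -> new Result<R>(f.apply(r.value()), r.rest()));
        }

        // Runs this parser, then next on the leftover input, and combines both values.
        default <U, R> Parser<R> then(Parser<U> next, BiFunction<T, U, R> combine) {
            return in -> parse(in).flatMap(a ->
                    next.parse(a.rest()).map(b ->
                            new Result<R>(combine.apply(a.value(), b.value()), b.rest())));
        }
    }

    // Matches exactly the character c.
    static Parser<Character> chr(char c) {
        return in -> {
            if (in.isEmpty() || in.charAt(0) != c) return Optional.empty();
            return Optional.of(new Result<Character>(c, in.substring(1)));
        };
    }

    // Matches one or more decimal digits and yields them as an Integer.
    static Parser<Integer> intr() {
        return in -> {
            int end = 0;
            while (end < in.length() && Character.isDigit(in.charAt(end))) end++;
            if (end == 0) return Optional.empty();
            return Optional.of(new Result<Integer>(
                    Integer.parseInt(in.substring(0, end)), in.substring(end)));
        };
    }

    // integer '+' integer, with map calling Java's ordinary Integer.sum on the operands.
    static Parser<Integer> sum() {
        return intr()
                .then(chr('+'), (left, plus) -> left)
                .then(intr(), (left, right) -> List.of(left, right))
                .map(operands -> Integer.sum(operands.get(0), operands.get(1)));
    }

    public static void main(String[] args) {
        System.out.println(sum().parse("12+30").get().value()); // prints 42
    }
}
```

Each combinator returns a new parser, so larger grammars are built by composing small ones, and the evaluation step stays an ordinary Java function passed to map.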

In addition to that there are a few utility functions to deal with input. There are also a few interesting functions to combine and manipulate the parsers and their results, like the map one we talked about. So, it covers a small space of the parsers world, but it covers it very well. The documentation is short, but complete. It also briefly explains the basics of parsing and how to design a parser using the library. The library is not very popular, but it has been recommended by Danny van Bruggen, the maintainer of JavaParser, so you know it is good.

There is one special case that requires a specific comment: the case in which you want to parse Java code in Java. In this case we suggest using a library named JavaParser. Incidentally we heavily contribute to JavaParser, but this is not the only reason why we suggest it. The fact is that JavaParser is a project with tens of contributors and thousands of users, so it is pretty robust.

Parsing in Java is a broad topic and the world of parsers is a bit different from the usual world of programmers.

The TextFieldParser object allows you to parse and process very large files that are structured as delimited or fixed-width columns of text, such as log files or legacy database information. Parsing a text file with TextFieldParser is similar to iterating over a text file, while the parse method used to extract fields of text is similar to the string manipulation methods used to tokenize delimited strings.

Text files may have fields of various widths, delimited by a character such as a comma or a tab. In that case, define TextFieldType and the delimiter, for example with the SetDelimiters method to define a tab-delimited text file.
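TextFieldParser belongs to .NET, so it cannot be used from Java directly. As a rough Java analogue of reading a tab-delimited file, the sketch below splits each line on a configurable delimiter using only the standard library. The class name DelimitedSketch and its parse method are hypothetical, and quoted fields that contain the delimiter are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class DelimitedSketch {
    // Splits every non-empty line on the delimiter; the -1 limit keeps trailing empty fields.
    static List<String[]> parse(String text, String delimiter) {
        String splitter = Pattern.quote(delimiter); // treat the delimiter literally, not as a regex
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {     // \R matches any line break
            if (line.isEmpty()) continue;
            rows.add(line.split(splitter, -1));
        }
        return rows;
    }
}
```

Calling DelimitedSketch.parse(text, "\t") returns one String array per line, so a record such as "1\t2\t3" becomes the three fields "1", "2" and "3".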

Other text files may have field widths that are fixed.

Parsing Text Files in Java (July 14)

The following code is designed to parse comma, tab, etc.

Text files that meet the following criteria are supported: The file is to be parsed into a list of records, where each record is a simple name-value pair (so, none of the fields are lists or anything complex). Every record is delimited by either one of a set of single-line regular expressions, or the end of the file. Every value in the record can be identified by a multiple-line regular expression.

While the regex may have multiple groups, the value comes from a single group. The group attribute identifies the group to populate the value with.

Configuration

The parser is configured using an XML 'template' file.
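The criteria above can be sketched with java.util.regex alone. The code below is not the library's implementation and does not read its XML template: the Field record, the parse method and the choice to treat the delimiter as the start of a record are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RecordRegexSketch {
    // Hypothetical field definition: a name, a regex, and the capture group that holds the value.
    record Field(String name, Pattern pattern, int group) {}

    static List<Map<String, String>> parse(String text, Pattern recordStart, List<Field> fields) {
        // Locate where each record begins; a record runs to the next start or to the end of the text.
        List<Integer> starts = new ArrayList<>();
        Matcher startMatcher = recordStart.matcher(text);
        while (startMatcher.find()) starts.add(startMatcher.start());

        List<Map<String, String>> records = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            int end = i + 1 < starts.size() ? starts.get(i + 1) : text.length();
            String chunk = text.substring(starts.get(i), end);
            Map<String, String> record = new LinkedHashMap<>();
            for (Field field : fields) {
                Matcher m = field.pattern().matcher(chunk);
                // Only the configured group supplies the value, even if the regex has several groups.
                if (m.find()) record.put(field.name(), m.group(field.group()));
            }
            records.add(record);
        }
        return records;
    }
}
```

Each record comes back as an ordered map of name-value pairs, and fields whose regex does not match within a record are simply absent from that map.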
