csv

RFC-4180 compliant CSV format

In this module:

Language  Name (Aliases)  Description             Filename(s)  Mime Type(s)
Csv       CSV             Comma-separated values  *.csv        text/csv

class Csv

Bases: parce.language.Language

RFC-4180 compliant CSV format.

root

Split a file into records.

record

Split a record into escaped (string) and non-escaped fields.

string

Handle a quoted string, in which a doubled quote ("") stands for one literal quote.
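The division of labour between the three lexicons can be illustrated with a small standard-library tokenizer. This is only a sketch under assumptions: the patterns, the token kind names and the `tokenize` helper are hypothetical and are not parce's actual lexicon rules.

```python
import re

# Hypothetical patterns for the "root" and "record" levels.
RECORD_TOKEN = re.compile(r'''
    (?P<newline>\r?\n)          # ends the current record
  | (?P<separator>,)            # separates two fields
  | (?P<quote>")                # opens a quoted (escaped) field
  | (?P<name>[^",\r\n]+)        # a non-escaped field
''', re.VERBOSE)

# Hypothetical patterns for the "string" level.
STRING_TOKEN = re.compile(r'''
    (?P<escape>"")              # doubled quote, one literal quote
  | (?P<end>")                  # closing quote
  | (?P<text>[^"]+)             # anything else, including commas and newlines
''', re.VERBOSE)

def tokenize(text):
    """Return a list of records, each a list of (kind, text) pairs."""
    records, record = [], []
    pos, in_string = 0, False
    while pos < len(text):
        pattern = STRING_TOKEN if in_string else RECORD_TOKEN
        m = pattern.match(text, pos)
        if m is None:
            # Stray character (for example a lone '\r'): skip it.
            pos += 1
            continue
        kind = m.lastgroup
        if kind == 'newline':
            records.append(record)
            record = []
        else:
            record.append((kind, m.group()))
            if kind == 'quote':
                in_string = True
            elif kind == 'end':
                in_string = False
        pos = m.end()
    if record:
        records.append(record)
    return records
```

Because the string patterns are only active between an opening and a closing quote, commas and newlines inside a quoted field never end a field or a record.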

class CsvTransform

Bases: parce.transform.Transform

Transform for comma-separated values that creates a list of tuples, one tuple per record.

For example:

>>> import parce.transform
>>> parce.transform.transform_text(parce.find('csv'), 'a,b,,c\nd,"",e,"x,y,z"')
[('a', 'b', None, 'c'), ('d', '', 'e', 'x,y,z')]

root(items)

Return the list of records.

record(items)

Return the tuple of the fields of one record.

Adjacent commas yield None, but empty quoted strings ("") are returned as empty strings.
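The rule can be sketched in plain Python. The `SEP` marker and the `fields_from_items` helper are hypothetical names chosen for illustration; the items parce actually passes to `record()` may be represented differently.

```python
SEP = object()  # hypothetical marker standing in for a separator token

def fields_from_items(items):
    """Build a tuple of fields from a flat sequence of fields and separators.

    A field that is missing between two separators becomes None, while an
    empty string that is explicitly present is kept as ''.
    """
    fields = []
    pending = None              # value of the field currently being read
    for item in items:
        if item is SEP:
            fields.append(pending)
            pending = None      # nothing seen yet for the next field
        else:
            pending = item
    fields.append(pending)      # the last field has no trailing separator
    return tuple(fields)

# Items corresponding to the records 'a,b,,c' and 'd,"",e,"x,y,z"':
first = fields_from_items(['a', SEP, 'b', SEP, SEP, 'c'])
second = fields_from_items(['d', SEP, '', SEP, 'e', SEP, 'x,y,z'])
```

Here `first` is `('a', 'b', None, 'c')` and `second` is `('d', '', 'e', 'x,y,z')`, matching the example output shown for `transform_text` above.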

string(items)

Return a string comprising the contents of the quoted string.

Each doubled quote inside is reduced to a single quote, and the outer quotes are not included.
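As a rough illustration of that behaviour on a raw quoted field, the following sketch uses a hypothetical `unquote_field` helper. It operates on the field text directly, whereas the transform method works on the items of the string context.

```python
def unquote_field(quoted):
    """Return the contents of a quoted CSV field such as '"a ""b"" c"'."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("not a quoted field: %r" % (quoted,))
    # Drop the outer quotes, then turn each doubled quote into a single one.
    return quoted[1:-1].replace('""', '"')

contents = unquote_field('"Type ""Extended Edition"""')
empty = unquote_field('""')
```

`contents` is `Type "Extended Edition"` and `empty` is the empty string.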

Example:

Root lexicon Csv.root and text:

Text rendered using default theme

jaar,merk,type,omschrijving,prijs
1997,Ford,E350,"airco, abs, moon","3000,00"
1999,Chevy,"Type ""Extended Edition""",,"4900,00"
1996,Jeep," Grand Cherokee ","IS VERKOCHT!
air, moon roof, loaded","4799,00"

Result tree:

<Context Csv.root at 0-204 (4 children)>
 ├╴<Context Csv.record at 0-33 (9 children)>
 │  ├╴<Token 'jaar' at 0:4 (Name)>
 │  ├╴<Token ',' at 4:5 (Delimiter.Separator)>
 │  ├╴<Token 'merk' at 5:9 (Name)>
 │  ├╴<Token ',' at 9:10 (Delimiter.Separator)>
 │  ├╴<Token 'type' at 10:14 (Name)>
 │  ├╴<Token ',' at 14:15 (Delimiter.Separator)>
 │  ├╴<Token 'omschrijving' at 15:27 (Name)>
 │  ├╴<Token ',' at 27:28 (Delimiter.Separator)>
 │  ╰╴<Token 'prijs' at 28:33 (Name)>
 ├╴<Context Csv.record at 34-77 (9 children)>
 │  ├╴<Token '1997' at 34:38 (Name)>
 │  ├╴<Token ',' at 38:39 (Delimiter.Separator)>
 │  ├╴<Token 'Ford' at 39:43 (Name)>
 │  ├╴<Token ',' at 43:44 (Delimiter.Separator)>
 │  ├╴<Token 'E350' at 44:48 (Name)>
 │  ├╴<Token ',' at 48:49 (Delimiter.Separator)>
 │  ├╴<Context Csv.string at 49-67 (3 children)>
 │  │  ├╴<Token '"' at 49:50 (Literal.String.Start)>
 │  │  ├╴<Token 'airco, abs, moon' at 50:66 (Literal.String)>
 │  │  ╰╴<Token '"' at 66:67 (Literal.String.End)>
 │  ├╴<Token ',' at 67:68 (Delimiter.Separator)>
 │  ╰╴<Context Csv.string at 68-77 (3 children)>
 │     ├╴<Token '"' at 68:69 (Literal.String.Start)>
 │     ├╴<Token '3000,00' at 69:76 (Literal.String)>
 │     ╰╴<Token '"' at 76:77 (Literal.String.End)>
 ├╴<Context Csv.record at 78-127 (8 children)>
 │  ├╴<Token '1999' at 78:82 (Name)>
 │  ├╴<Token ',' at 82:83 (Delimiter.Separator)>
 │  ├╴<Token 'Chevy' at 83:88 (Name)>
 │  ├╴<Token ',' at 88:89 (Delimiter.Separator)>
 │  ├╴<Context Csv.string at 89-116 (6 children)>
 │  │  ├╴<Token '"' at 89:90 (Literal.String.Start)>
 │  │  ├╴<Token 'Type ' at 90:95 (Literal.String)>
 │  │  ├╴<Token '""' at 95:97 (Literal.String.Escape)>
 │  │  ├╴<Token 'Extended Edition' at 97:113 (Literal.String)>
 │  │  ├╴<Token '""' at 113:115 (Literal.String.Escape)>
 │  │  ╰╴<Token '"' at 115:116 (Literal.String.End)>
 │  ├╴<Token ',' at 116:117 (Delimiter.Separator)>
 │  ├╴<Token ',' at 117:118 (Delimiter.Separator)>
 │  ╰╴<Context Csv.string at 118-127 (3 children)>
 │     ├╴<Token '"' at 118:119 (Literal.String.Start)>
 │     ├╴<Token '4900,00' at 119:126 (Literal.String)>
 │     ╰╴<Token '"' at 126:127 (Literal.String.End)>
 ╰╴<Context Csv.record at 128-204 (9 children)>
    ├╴<Token '1996' at 128:132 (Name)>
    ├╴<Token ',' at 132:133 (Delimiter.Separator)>
    ├╴<Token 'Jeep' at 133:137 (Name)>
    ├╴<Token ',' at 137:138 (Delimiter.Separator)>
    ├╴<Context Csv.string at 138-156 (3 children)>
    │  ├╴<Token '"' at 138:139 (Literal.String.Start)>
    │  ├╴<Token ' Grand Cherokee ' at 139:155 (Literal.String)>
    │  ╰╴<Token '"' at 155:156 (Literal.String.End)>
    ├╴<Token ',' at 156:157 (Delimiter.Separator)>
    ├╴<Context Csv.string at 157-194 (3 children)>
    │  ├╴<Token '"' at 157:158 (Literal.String.Start)>
    │  ├╴<Token 'IS VERKOCHT!... roof, loaded' at 158:193 (Literal.String)>
    │  ╰╴<Token '"' at 193:194 (Literal.String.End)>
    ├╴<Token ',' at 194:195 (Delimiter.Separator)>
    ╰╴<Context Csv.string at 195-204 (3 children)>
       ├╴<Token '"' at 195:196 (Literal.String.Start)>
       ├╴<Token '4799,00' at 196:203 (Literal.String)>
       ╰╴<Token '"' at 203:204 (Literal.String.End)>

Transformed result (pretty-printed):

[('jaar', 'merk', 'type', 'omschrijving', 'prijs'),
 ('1997', 'Ford', 'E350', 'airco, abs, moon', '3000,00'),
 ('1999', 'Chevy', 'Type "Extended Edition"', None, '4900,00'),
 ('1996',
  'Jeep',
  ' Grand Cherokee ',
  'IS VERKOCHT!\nair, moon roof, loaded',
  '4799,00')]
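
Since the transformed result is an ordinary list of tuples, it can be processed further with plain Python. The sketch below, which is not part of parce, treats the first record as a header and maps the remaining records to dictionaries; the names `header`, `rows` and `cars` are illustrative, and only the first three records of the result are repeated here.

```python
records = [
    ('jaar', 'merk', 'type', 'omschrijving', 'prijs'),
    ('1997', 'Ford', 'E350', 'airco, abs, moon', '3000,00'),
    ('1999', 'Chevy', 'Type "Extended Edition"', None, '4900,00'),
]

# Use the first record as column names for all following records.
header, *rows = records
cars = [dict(zip(header, row)) for row in rows]

# A field that was missing in the source stays None in the dictionary.
missing = [car['merk'] for car in cars if car['omschrijving'] is None]
```

Keeping `None` distinct from `''` makes it possible to tell a missing description (the Chevy record) from one that was explicitly given as an empty quoted string.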